talk-data.com
Activities & events
Databricks London Meetup
2025-09-24 · 18:00
Holly Smith – Developer Advocate @ Databricks
Simon Whiteley – CTO @ Advancing Analytics
Gavi Regunath – Chief AI Officer (CAIO) @ Advancing Analytics
We’re excited to be back at Big Data LDN this year. Huge thanks to the organisers for hosting Databricks London once more! Join us for an evening of insights, networking, and community with the Databricks team and Advancing Analytics!

🎤 Agenda:

6:00 PM – 6:10 PM | Kickoff & Warm Welcome
Grab a drink, say hi, and get the lowdown on what’s coming up. We’ll set the scene for an evening of learning and laughs.

6:10 PM – 6:50 PM | The Metadata Marathon: How Three Projects Are Racing Forward – Holly Smith (Staff Developer Advocate, Databricks)
With the enormous amount of discussion about open storage formats between nerds and even non-nerds, it can be hard to keep track of who’s doing what and how any of it actually affects day-to-day data projects. Holly will take a closer look at the three big projects in this space: Delta, Hudi and Iceberg. They’re all trying to solve similar data problems and have tackled the various challenges in different ways. Her talk will start with the basics of how we got here and the history, before diving deep into the underlying tech, the projects’ roadmaps, and their impact on the data landscape as a whole.

6:50 PM – 7:10 PM | What’s New in Databricks & Databricks AI – Simon Whiteley & Gavi Regunath
Hot off the press! Simon and Gavi will walk you through the latest and greatest from Databricks, including shiny new AI features and platform updates you’ll want to try ASAP.

7:10 PM onwards | Q&A Panel + Networking
Your chance to ask the experts anything, then stick around for drinks, snacks, and some good old-fashioned data geekery.
Event: Big Data LDN 2025
MLOps Days NYC: AI/ML gathering
2025-03-04 · 21:00
Join us in the heart of New York City for a free ML/AI mega-meetup featuring an incredible lineup of ML experts, data scientists, and DevOps professionals. Dive into scaling AI in production, optimizing ML workflows, and more across data science. This is a great opportunity to mingle with the best tech minds NYC has to offer over some good food. Register now on LUMA and save your spot!

Featuring:

Brought to you by JFrog!
Join us on January 29 for a chance to connect, learn, and get inspired by the latest and greatest in AI. Whether you're a pro at AI or just getting started, this event is all about building our community and shaping the future of AI development. Get ready for an evening packed with awesome sessions, discussions, and networking opportunities! Let's dive into the exciting world of AI together! Agenda
Program Sessions

| Session Title | Session Description | Speaker |
| :--- | :--- | :--- |
| Empowering Enterprises with LLM Agents: Assistants for Data-Driven Insights | Large Language Model (LLM) agents are emerging as powerful tools that can revolutionize how organizations interact with their data. This session will delve into the concept of LLM agents and their potential to transform internal operations by creating intelligent virtual assistants on top of your data warehouse. We will explore how these agents can be leveraged to generate SQL query statements dynamically, enabling seamless access to data stored in various data warehouses such as Databricks and Microsoft Fabric. Attendees will gain insights into the architecture and implementation of these virtual assistants, and discover how they can enhance productivity, streamline decision-making processes, and unlock valuable insights from their data. | Sammy Deprez |
| Simplifying AI in your .NET Application | Discover how to effortlessly integrate AI into your .NET applications with the Microsoft.Extensions.AI library. Learn to add chat features, embedding generation, and tool calling seamlessly. Plus, explore innovative techniques for real-time prompt management, allowing you to test and refine AI prompts live without stopping your application. Join us for a fast-paced session packed with practical insights and cutting-edge strategies to optimize your development workflow. | Maria Naggaga Nakanwagi |
| AutoGen 0.4: A Programming Framework for Agentic AI Reimagined | AutoGen is an open-source framework for building AI agent systems. It simplifies the creation of event-driven, distributed, scalable, and resilient agentic applications using multi-agent architectures. In this talk, Jack will provide an overview of the AutoGen framework and dig into 0.4, a reimagining of the framework. | Jack Gerrits |
| AI in Action: Lessons and Success Stories | Join us for an engaging panel discussion featuring practitioners of artificial intelligence. Our panelists will share their firsthand experiences, challenges, and successes in implementing solutions across various industries. Learn about innovative use cases and discover best practices from experts who are at the forefront of AI technology. | Lisa Qu, Vivian Lei, Samir Mahmoud, Justin Trugman |
Empire State of AI: An Evening with Microsoft, .NET, and the Global AI Community
Frank Munz: A Journey in Space with Apache Kafka data streams from NASA
2024-12-06 · 21:26
Frank Munz – TMM Principal @ Databricks
🌟 Session Overview 🌟

Session Name: Supernovas, Black Holes, and Streaming Data: A Journey in Space with Apache Kafka data streams from NASA
Speaker: Frank Munz

Session Description: In this fun, hands-on, and in-depth how-to, we explore NASA's GCN project, which publishes various events in space as Kafka topics. The focus of the talk is on end-to-end data engineering, from consuming the data and ELT-ing the stream to using generative AI tools for analytics. We will analyze GCN data in real time, specifically targeting the data stream from exploding supernovas. This data triggers dozens of terrestrial telescopes to potentially reposition and point toward the event.

Frank will kick off the session by contrasting various ways of ingesting and transforming the data, discussing their trade-offs: Should you use a declarative data pipeline, or can a data analyst manage with SQL only? Alternatively, when would it be better to follow the classic approach of orchestrating Spark notebooks to get the data ingested? He will also answer the question: Does a data engineer working with streaming data benefit from generative AI-based tools and assistants today? Is it worth it, or is it just hype?

The demo is easy to replicate at home, and Frank will share the notebooks in a GitHub repository so you can analyze real NASA data yourself! This session is ideal for data engineers, data architects who enjoy some coding, generative AI enthusiasts, or anyone fascinated by technology and the sparkling stars in the night sky. While the focus is clearly on tech, the demo will run on the open-source and open-standards-based Databricks Intelligence Platform (so inevitably, you'll get a high-level overview of that too).

🚀 About Big Data and RPA 2024 🚀

Unlock the future of innovation and automation at Big Data & RPA Conference Europe 2024!
🌟 This unique event brings together the brightest minds in big data, machine learning, AI, and robotic process automation to explore cutting-edge solutions and trends shaping the tech landscape. Perfect for data engineers, analysts, RPA developers, and business leaders, the conference offers dual insights into the power of data-driven strategies and intelligent automation. 🚀 Gain practical knowledge on topics like hyperautomation, AI integration, advanced analytics, and workflow optimization while networking with global experts. Don’t miss this exclusive opportunity to expand your expertise and revolutionize your processes, all from the comfort of your home! 📊🤖✨

📅 Yearly Conferences: Curious about how the field has evolved? Check out our archive of past Big Data & RPA sessions. Watch the strategies and technologies evolve in our videos! 🚀

🔗 Find Other Years' Videos:
2023 Big Data Conference Europe: https://www.youtube.com/playlist?list=PLqYhGsQ9iSEpb_oyAsg67PhpbrkCC59_g
2022 Big Data Conference Europe Online: https://www.youtube.com/playlist?list=PLqYhGsQ9iSEryAOjmvdiaXTfjCg5j3HhT
2021 Big Data Conference Europe Online: https://www.youtube.com/playlist?list=PLqYhGsQ9iSEqHwbQoWEXEJALFLKVDRXiP

💡 Stay Connected & Updated 💡
Don’t miss out on any updates or upcoming event information from Big Data & RPA Conference Europe. Follow us on our social media channels and visit our website to stay in the loop!
🌐 Website: https://bigdataconference.eu/, https://rpaconference.eu/
👤 Facebook: https://www.facebook.com/bigdataconf, https://www.facebook.com/rpaeurope/
🐦 Twitter: @BigDataConfEU, @europe_rpa
🔗 LinkedIn: https://www.linkedin.com/company/73234449/, https://www.linkedin.com/company/75464753/
🎥 YouTube: http://www.youtube.com/@DATAMINERLT
Event: DATA MINER Big Data Europe Conference 2020
Meetup #10
2024-09-03 · 17:00
Hello hello everyone! Thanks to all who attended the last meetup on Data & AI: The Product Perspective. It was a full house and we hope you all enjoyed the slightly different panel-discussion style. This time we are back to the norm: two fantastic speakers with more of an engineering focus, giving 25–30 minute talks followed by an open Q&A. Drinks, networking and, of course, tons of pizza. Look forward to seeing you all then!

Holly Smith | Staff Developer Advocate @ Databricks
Data Engineers’ AI Survival Guide
I have a hypothesis that 90% of people doing gen AI today weren’t doing it two years ago. The landscape is full of people stumbling their way through it, from AI academics learning that code written for papers is not software-development ready, all the way to data experts suddenly needing to learn a new skill. In this talk, we’ll go through what data engineers need to know to help get those AI projects off the ground, starting with picking the right projects and execution plans, through to the toolsets and skills that will make you shine.
About me: Holly Smith is a multi-award-winning Data and AI expert with over a decade of experience working with data and AI teams in a variety of capacities, from individual contributor all the way up to leadership. In her role at Databricks she has worked with many multinational companies as they embark on their journey to the cutting edge of data. She is a renowned public speaker, teacher and minority-in-tech advocate.

Zack Akil | Senior ML Engineer & Developer Advocate @ Google
The "IVO" Design Pattern for Pragmatic GenAI
Come learn about the "IVO" design pattern with live demos and cautionary tales. This pragmatic pattern serves as a foundation for building effective tools with generative AI. Whether you're a developer or simply a user, IVO will guide you in crafting the right questions and tasks to get the most out of generative AI.
About me: Zack Akil (@ZackAkil) is a senior machine learning engineer & developer advocate at Google specialising in practical AI/ML, with a background in full-stack application development and data science. He spends most of his time building end-to-end ML applications (often inspired by his many random hobbies) that help inspire developers on how they can solve problems with ML.

Please aim to arrive at 6pm. The talks will start around 6.30pm (give or take a few minutes), and there will be pizza and refreshments on arrival as always :) Look forward to seeing you all soon!
Talk with Data Experts in Databricks
2024-08-10 · 13:00
Event Overview: We are thrilled to invite you to the London Data Engineering Coffee Chat. This time we will host two speakers from Databricks, who will share their data careers, in an exciting gathering dedicated to exploring the latest advancements and insights in Databricks. This event will bring together industry leaders, experts, and enthusiasts for an engaging series of discussions, presentations, and networking opportunities.

Speakers:
Liping Huang, Data Architect at Databricks and founder of the YouTube channel Data Leaps.
Dustin Smith, Data Architect at Databricks with 10+ years in big data engineering and machine learning.

Why Attend?
Engage: Participate in Q&A sessions, panel discussions, and networking.
PyData Bristol - 28th Meetup - 16th of May!
2024-05-16 · 17:00
Date set! Join us once again for the next PyData Bristol meetup! We’re meeting at a lovely venue, Amdaris. Big thank you for hosting our event this time! Massive thanks to our other sponsor, Adlib, for the scrumptious pizza and thirst-quenching refreshments.

Here is an intro from our new hosts: "Hi, we are Amdaris, an Insight company, and at our core we specialise in extending teams with highly skilled software experts. With bespoke software development, product design, strategy and consultation, managed services and data solutions, we seamlessly integrate into your business and culture, bringing passion, care, and technical proficiency directly to you."

Agenda for the evening:
🚪 6:00 pm - Doors open
🕡 6:30 pm - Talks commence (sharp!)
📚 Deep Dive 50 Minute Talk:
📢 Community announcements
🤝 Relaxed networking over beers and soft drinks

Interested in sharing your knowledge or experience at this or a future event? Fill out this form to submit your talk proposal: PyData Bristol Talk Proposal. We look forward to seeing you there for another fantastic evening of Python, Data Science, and camaraderie!

Talks

Better ETL with Managed Airflow in ADF, by Niall Langley
Building complex data workflows using Azure Data Factory can get a little clunky: as your orchestration needs get more complex you hit limitations like not being able to nest loops or conditionals, running simple Python, bash or PowerShell scripts is difficult, and costs can grow quickly as you are charged per task execution. Recently another option became available: Managed Airflow in ADF. Apache Airflow is a code-centric open-source platform for developing, scheduling and monitoring batch-based data workflows, built using the Python language data engineers know and love. But until Managed Airflow, getting it working in Azure was a complex task for customers more used to PaaS services such as ADF, Databricks and Fabric. It is also an important ETL orchestrator on AWS and GCP, so cross-cloud compatibility becomes simpler to achieve. In this session we’ll look at what Airflow is, how it’s different from ADF, and what advantages Managed Airflow in ADF gives us. We’ll talk about the idea of a DAG for building the workflow, and then work through some demos to show just how easy it is to use Python to write Airflow DAGs and import them into the Managed Airflow environment as pipelines. We’ll then dive into the excellent monitoring UI and find out just how easy it is to trigger a pipeline, view it to see the dependencies between tasks, and monitor runs. By the end of the session attendees will have a good understanding of what Airflow is, when to use it, and how it fits into the Azure Data Platform.
🕖 LOGISTICS: Talks kick off at 18:30 sharp, then networking in Left Handed Giant from 20:40. If you realise you can't make it, please un-RSVP in good time to free up your place for your fellow community members. Follow @pydatabristol (PyData Bristol (@PyDataBristol) / X) for updates on this and future events, as well as news from the global PyData community.

📜 CODE OF CONDUCT: The PyData Code of Conduct governs this meetup. To discuss any issues or concerns relating to the code of conduct or the behaviour of anyone at the PyData meetup, please contact the PyData Bristol organisers, or you can submit a report of any potential Code of Conduct violation directly to NumFOCUS (NumFOCUS Code of Conduct Report Form).
Databricks & Fabric Tuning, and MLOps
2023-09-14 · 17:00
Sponsored by Amdaris (https://amdaris.com) and Advancing Analytics (https://www.advancinganalytics.co.uk/). Amdaris is your trusted partner for high-velocity extended delivery teams. Advancing Analytics specialises in building cutting-edge data platforms for data science and data engineering.

Location: Amdaris, Finzels Reach, Aurora, Bristol BS1 6BX

AGENDA

18:00 – 18:30 Meet & Greet

18:30 – 19:15 Introduction to Tuning Spark on Databricks and Fabric, by Niall Langley
More and more organisations are building data platforms in the cloud, often utilising Spark with tools like Databricks and the recently announced Fabric to build data processing pipelines. These distributed computing tools can take a while to learn, and teams migrating older on-premises data warehouses to cloud solutions like the lakehouse often concentrate on getting good data over getting the best performance. However, in the cloud any performance improvement can have a big impact on monthly cost, making it much easier to justify spending time getting things running faster and more efficiently. This talk aims to point you at common pain points when working with Spark on Databricks or Fabric, showing you where to look, what to look for, and what can be done to improve things.

19:15 – 19:45 Pizza and Networking

19:45 – 20:30 Want End-to-End MLOps? Look No Further than Databricks!, by Tori Tompkins
Arguably the largest challenge in ML today is effectively deploying reliable and efficient models into production, with experts estimating that as many as 90% of models created never make it to production. MLOps streamlines the process of taking machine learning models to production, and then maintaining and monitoring them. With new MLOps micro-vendors popping up every day, is there a tool that does everything?

In this session, we will consider Databricks as an end-to-end MLOps tool, exploring collaborative workspaces, feature stores, model registries and model serving. We will also touch upon other critical MLOps practices such as model fairness, explainability and monitoring. Including practical demos of Databricks Feature Store, MLflow and real-time Model Serving, this session is suitable for data scientists and machine learning engineers of all levels. https://www.linkedin.com/in/tompkinstori/

20:30 Pub

Notes
About Amdaris: Of course, we’re obsessed with cutting-edge technology, but it’s how much we care about people that sets us apart. Whether that’s looking after our clients, our staff or the next generation of tech talent, we know that exceptional software development is only possible with exceptional teamwork. If you need help extending your team, building your big idea or application support, we offer a better way to do software.
IFC's MALENA Provides Analytics for ESG Reviews in Emerging Markets Using NLP and LLMs
2023-07-26 · 21:10
International Finance Corporation (IFC) is using data and AI to build machine learning solutions that create analytical capacity to support the review of ESG issues at scale. This includes natural language processing, and requires entity recognition and other applications, to support the work of IFC’s experts and other investors working in emerging markets. These algorithms are available via IFC’s Machine Learning ESG Analyst (MALENA) platform to enable rapid analysis, increase productivity, and build investor confidence. In this manner IFC, a development finance institution with the mandate to address poverty in emerging markets, is making use of its historical datasets and open-source AI solutions to build custom AI applications that democratize access to ESG capacity to read and classify text. In this session, you will learn about the unique flexibility of the Apache Spark™ ecosystem from Databricks, and how it has allowed IFC’s MALENA project to connect to scalable data lake storage, use different natural language processing models, and seamlessly adopt MLOps.

Talk by: Atiyah Curmally and Blaise Sandwidi

Connect with us:
Website: https://databricks.com
Twitter: https://twitter.com/databricks
LinkedIn: https://www.linkedin.com/company/databricks
Instagram: https://www.instagram.com/databricksinc
Facebook: https://www.facebook.com/databricksinc
Databricks SQL: Why the Best Serverless Data Warehouse is a Lakehouse
2023-07-26 · 21:08
Cyrielle Simeone
Miranda Luna – Product Management @ Databricks
Many organizations rely on complex cloud data architectures that create silos between applications, users and data. This fragmentation makes it difficult to access accurate, up-to-date information for analytics, often resulting in the use of outdated data. Enter the lakehouse, a modern data architecture that unifies data, AI, and analytics in a single location. This session explores why the lakehouse is the best data warehouse, featuring success stories, use cases and best practices from industry experts. You'll discover how to unify and govern business-critical data at scale to build a curated data lake for data warehousing, SQL and BI. Additionally, you'll learn how Databricks SQL can help lower costs, how to get started in seconds with on-demand, elastic serverless SQL warehouses, and how to empower analytics engineers and analysts to quickly find and share new insights using their preferred BI and SQL tools such as Fivetran, dbt, Tableau, or Power BI.

Talk by: Miranda Luna and Cyrielle Simeone

Connect with us:
Website: https://databricks.com
Twitter: https://twitter.com/databricks
LinkedIn: https://www.linkedin.com/company/databricks
Instagram: https://www.instagram.com/databricksinc
Facebook: https://www.facebook.com/databricksinc
Simon + Denny Live: Ask Us Anything
2023-07-26 · 21:06
Denny Lee – PM Director, Developer Relations @ Databricks
Simon Whiteley – CTO @ Advancing Analytics
Simon and Denny have been discussing and debating all things Delta, Lakehouse and Apache Spark™ on their regular webshow. Whether you want advice on lake structures, want to hear their opinions on the latest trends and hype in the data world, or simply have a tech implementation question to throw at two seasoned experts, these two will have something to say on the matter. In their previous shows, Simon and Denny focused on building out a sample lakehouse architecture, refactoring and tinkering as new features came out, but now we're throwing the doors open for any and every question you might have. So if you've had a persistent question and think these two can help, this is the session for you. There will be a question submission form shared prior to the event, so the team will be prepped with a whole bunch of topics to talk through. Simon and Denny want to hear your questions, which they can field drawing on a wealth of industry experience, wide-ranging community engagement and their differing perspectives as external consultant and Databricks insider respectively. There's also a chance they'll get distracted and go way off track talking about coffee, sci-fi, nerdery or the English weather. It happens.

Talk by: Simon Whiteley and Denny Lee

Connect with us:
Website: https://databricks.com
Twitter: https://twitter.com/databricks
LinkedIn: https://www.linkedin.com/company/databricks
Instagram: https://www.instagram.com/databricksinc
Facebook: https://www.facebook.com/databricksinc
Lakehouse Federation: Access and Governance of External Data Sources from Unity Catalog
2023-07-25 · 23:11
Can Efeoglu – Staff Product Manager @ Databricks
Todd Greenstein – Product Manager @ Databricks
Are you tired of spending time and money moving data across multiple sources and platforms to access the right data at the right time? Join our session and discover Databricks' new Lakehouse Federation feature, which allows you to access, query, and govern your data in place without leaving the Lakehouse. Our experts will demonstrate how you can leverage the latest enhancements in Unity Catalog, including query federation, the Hive interface, and Delta Sharing, to discover and govern all your data in one place, regardless of where it lives.

Talk by: Can Efeoglu and Todd Greenstein

Connect with us:
Website: https://databricks.com
Twitter: https://twitter.com/databricks
LinkedIn: https://www.linkedin.com/company/databricks
Instagram: https://www.instagram.com/databricksinc
Facebook: https://www.facebook.com/databricksinc
Level Up Your Data Platform With Active Metadata
2022-06-19 · 23:00
Prukalpa Sankar – Co-founder @ Atlan
Tobias Macey – host
Summary

Metadata is the lifeblood of your data platform, providing information about what is happening in your systems. A variety of platforms have been developed to capture and analyze that information to great effect, but they are inherently limited in their utility due to their nature as storage systems. In order to level up their value, a new trend of active metadata is being implemented, allowing use cases like keeping BI reports up to date, auto-scaling your warehouses, and automating data governance. In this episode Prukalpa Sankar joins the show to talk about the work she and her team at Atlan are doing to push this capability into the mainstream.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their new managed database service you can launch a production-ready MySQL, Postgres, or MongoDB cluster in minutes, with automated backups, 40 Gbps connections from your application hosts, and high-throughput SSDs. Go to dataengineeringpodcast.com/linode today and get a $100 credit to launch a database, create a Kubernetes cluster, or take advantage of all of their other services. And don’t forget to thank them for their continued support of this show!

RudderStack helps you build a customer data platform on your warehouse or data lake. Instead of trapping data in a black box, they enable you to easily collect customer data from the entire stack and build an identity graph on your warehouse, giving you full visibility and control. Their SDKs make event streaming from any app or website easy, and their state-of-the-art reverse ETL pipelines enable you to send enriched data to any cloud tool. Sign up free… or just get the free t-shirt for being a listener of the Data Engineering Podcast at dataengineeringpodcast.com/rudder.
Data teams are increasingly under pressure to deliver. According to a recent survey by Ascend.io, 95% in fact reported being at or over capacity. With 72% of data experts reporting demands on their team going up faster than they can hire, it’s no surprise they are increasingly turning to automation. In fact, while only 3.5% report having current investments in automation, 85% of data teams plan on investing in automation in the next 12 months. 85%!!! That’s where our friends at Ascend.io come in. The Ascend Data Automation Cloud provides a unified platform for data ingestion, transformation, orchestration, and observability. Ascend users love its declarative pipelines, powerful SDK, elegant UI, and extensible plug-in architecture, as well as its support for Python, SQL, Scala, and Java. Ascend automates workloads on Snowflake, Databricks, BigQuery, and open-source Spark, and can be deployed in AWS, Azure, or GCP. Go to dataengineeringpodcast.com/ascend and sign up for a free trial. If you’re a Data Engineering Podcast listener, you get credits worth $5,000 when you become a customer.

Today’s episode is sponsored by Prophecy.io, the low-code data engineering platform for the cloud. Prophecy provides an easy-to-use visual interface to design and deploy data pipelines on Apache Spark and Apache Airflow. Now all data users can use software engineering best practices: git, tests and continuous deployment with a simple-to-use visual designer. How does it work? You visually design the pipelines, and Prophecy generates clean Spark code with tests on git; then you visually schedule these pipelines on Airflow. You can observe your pipelines with built-in metadata search and column-level lineage. Finally, if you have existing workflows in AbInitio, Informatica or other ETL formats that you want to move to the cloud, you can import them automatically into Prophecy, making them run productively on Spark. Create your free account today at dataengineeringpodcast.com/prophecy.
Your host is Tobias Macey and today I’m interviewing Prukalpa Sankar about how data platforms can benefit from the idea of "active metadata" and the work that she and her team at Atlan are doing to make it a reality.

Interview
- Introduction
- How did you get involved in the area of data management?
- Can you describe what "active metadata" is and how it differs from the current approaches to metadata systems?
- What are some of the use cases that "active metadata" can enable for data producers and consumers?
- What are the points of friction that those users encounter in the current formulation of metadata systems?
- Central metadata systems/data catalogs came about as a solution to the challenge of integrating every data tool with every other data tool, giving a single place to integrate. What are the lessons being learned from the "modern data stack" that can be applied to centralized metadata?
- Can you describe the approach that you are taking at Atlan to enable the adoption of "active metadata"?
- What are the architectural capabilities that you had to build to power the outbound traffic flows?
- How are you addressing the N x M integration problem for pushing metadata into the necessary contexts at Atlan?
- What are the interfaces necessary for receiving systems to be able to make use of the metadata being delivered?
- How does the type/category of metadata impact the type of integration that is necessary?
- What are some of the automation possibilities that metadata activation offers for data teams?
- What are the cases where you still need a human in the loop?
- What are the most interesting, innovative, or unexpected ways that you have seen active metadata capabilities used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on activating metadata for your users?
- When is an active approach to metadata the wrong choice?
- What do you have planned for the future of Atlan and active metadata?
Contact Info
- LinkedIn
- @prukalpa on Twitter

Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements
Thank you for listening! Don’t forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you’ve learned something or tried out a project from the show then tell us about it! Email [email protected] with your story. To help other people find the show please leave a review on Apple Podcasts and tell your friends and co-workers.

Links
- Atlan
- What is Active Metadata?
- Segment (Podcast Episode)
- Zapier
- ArgoCD
- Kubernetes
- Wix
- AWS Lambda
- Modern Data Culture Blog Post

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Support Data Engineering Podcast
Democratize Data Cleaning Across Your Organization With Trifacta
2021-07-09 · 23:00
Adam Wilson – CEO @ Trifacta
Tobias Macey – host
Summary

Every data project, whether it’s analytics, machine learning, or AI, starts with the work of data cleaning. This is a critical step and benefits from being accessible to the domain experts. Trifacta is a platform for managing your data engineering workflow to make curating, cleaning, and preparing your information more approachable for everyone in the business. In this episode CEO Adam Wilson shares the story behind the business, discusses the myriad ways that data wrangling is performed across the business, and how the platform is architected to adapt to the ever-changing landscape of data management tools. This is a great conversation about how deliberate user experience and platform design can make a drastic difference in the amount of value that a business can provide to its customers.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

You listen to this show to learn about all of the latest tools, patterns, and practices that power data engineering projects across every domain. Now there’s a book that captures the foundational lessons and principles that underlie everything you hear about here. I’m happy to announce that I collected wisdom from the community to help you in your journey as a data engineer and worked with O’Reilly to publish it as 97 Things Every Data Engineer Should Know. Go to dataengineeringpodcast.com/97things today to get your copy!

When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform.
Go to dataengineeringpodcast.com/linode today and get a $100 credit to try out a Kubernetes cluster of your own. And don't forget to thank them for their continued support of this show!
Are you bored with writing scripts to move data into SaaS tools like Salesforce, Marketo, or Facebook Ads? Hightouch is the easiest way to sync data into the platforms that your business teams rely on. The data you're looking for is already in your data warehouse and BI tools. Connect your warehouse to Hightouch, paste a SQL query, and use their visual mapper to specify how data should appear in your SaaS systems. No more scripts, just SQL. Supercharge your business teams with customer data using Hightouch for Reverse ETL today. Get started for free at dataengineeringpodcast.com/hightouch.
Atlan is a collaborative workspace for data-driven teams, like GitHub for engineering teams or Figma for design teams. By acting as a virtual hub for data assets ranging from tables and dashboards to SQL snippets and code, Atlan enables teams to create a single source of truth for all their data assets and collaborate across the modern data stack through deep integrations with tools like Snowflake, Slack, Looker, and more. Go to dataengineeringpodcast.com/atlan today and sign up for a free trial. If you're a Data Engineering Podcast listener, you get credits worth $3000 on an annual subscription.
Your host is Tobias Macey and today I'm interviewing Adam Wilson about Trifacta, a platform for modern data workers to assess quality, transform, and automate data pipelines.
Interview
Introduction
How did you get involved in the area of data management?
Can you describe what Trifacta is and the story behind it?
Across your site and materials you focus on using the term "data wrangling". What is your personal definition of that term, and in what ways do you differentiate it from ETL/ELT?
How does the deliberate use of that terminology influence the way that you think about the design and features of the Trifacta platform?
What is Trifacta's role in the overall data platform/data lifecycle for an organization?
What are some examples of tools that Trifacta might replace?
What tools or systems does Trifacta integrate with?
Who are the target end users of the Trifacta platform, and how do those personas direct the design and functionality?
Can you describe how Trifacta is architected?
How have the goals and design of the system changed or evolved since you first began working on it?
Can you talk through the workflow and lifecycle of data as it traverses your platform, and the user interactions that drive it?
How can data engineers share and encourage proper patterns for working with data assets with end users across the organization?
What are the limits of scale, in volume and complexity of data assets, that users are able to manage through Trifacta's visual tools?
What are some strategies that you and your customers have found useful for pre-processing the information that enters your platform to make it easier for end users to self-serve?
What are the most interesting, innovative, or unexpected ways that you have seen Trifacta used?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on Trifacta?
When is Trifacta the wrong choice?
What do you have planned for the future of Trifacta?
Contact Info
LinkedIn
@a_adam_wilson on Twitter
Parting Question
From your perspective, what is the biggest gap in the tooling or technology for data management today?
Closing Announcements
Thank you for listening! Don't forget to check out our other show, Podcast.__init__, to learn about the Python language, its community, and the innovative ways it is being used.
Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.
To help other people find the show please leave a review on iTunes and tell your friends and co-workers.
Join the community in the new Zulip chat workspace at dataengineeringpodcast.com/chat.
Links
Trifacta
Informatica
UC Berkeley
Stanford University
Citadel
Podcast Episode
Stanford Data Wrangler
dbt
Podcast Episode
Pig
Databricks
Sqoop
Flume
SPSS
Tableau
SDLC == Software Delivery Life-Cycle
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
Support Data Engineering Podcast |
|