talk-data.com
People (1 result)
Activities & events
Sept 30 - Getting Started with FiftyOne for Manufacturing Use Cases
2025-09-30 · 16:00
Day and Time: Sept 30 at 9 AM Pacific. Location: Virtual; register for the Zoom.

Are you working with computer vision in manufacturing and need deeper visibility into your datasets and models? Join us for a free 90-minute hands-on workshop and learn how to leverage the open-source FiftyOne toolset to optimize your visual AI workflows, from anomaly detection on the production line to worker safety and quality assurance in additive manufacturing.

In this session, we'll take a data-centric approach to computer vision, starting with importing and exploring industrial visual data, including defects, wear patterns, and worker posture. You'll learn to query and filter datasets to surface edge cases, then use plugins and native integrations to streamline workflows. We'll walk through generating candidate ground truth labels and evaluating fine-tuned foundation models, which is particularly relevant to manufacturers using pre-trained models for tasks like defect segmentation or object localization in dynamic environments. By the end, you'll see how the FiftyOne App and SDK work together to enable deeper insight into visual AI systems. We'll conclude with a demo showcasing 3D view reconstruction for industrial inspection, revealing how visual AI can bridge the physical and digital layers of your production process.

Prerequisites: Basic knowledge of Python and computer vision fundamentals.

Resources Provided: All attendees will receive access to tutorials, videos, and the workshop codebase.

About the Instructor: Paula Ramos has a PhD in Computer Vision and Machine Learning, with more than 20 years of experience in the technology field. She has been developing novel integrated engineering technologies, mainly in computer vision, robotics, and machine learning applied to agriculture, since the early 2000s in Colombia.
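To make that workflow concrete, here is a minimal sketch of the import/query/evaluate/explore loop using FiftyOne's open-source Python SDK. The dataset path, directory layout, and the pre-existing `predictions` field (populated by your own model) are illustrative assumptions, not workshop materials.

```python
# Hypothetical sketch of the workshop's core loop; the path, directory
# layout, and "predictions" field are assumptions for illustration.
import fiftyone as fo
from fiftyone import ViewField as F

# Import industrial images organized as one folder per class
dataset = fo.Dataset.from_dir(
    dataset_dir="/data/production_line",  # hypothetical location
    dataset_type=fo.types.ImageClassificationDirectoryTree,
    name="defect-inspection",
)

# Query/filter: surface uncertain predictions for a defect class
edge_cases = dataset.filter_labels(
    "predictions", (F("label") == "scratch") & (F("confidence") < 0.5)
)

# Evaluate model predictions against ground truth labels
results = dataset.evaluate_classifications(
    "predictions", gt_field="ground_truth", eval_key="eval"
)
print(results.metrics())

# Open the FiftyOne App on exactly the samples the filter surfaced
session = fo.launch_app(edge_cases)
```

The same pattern scales from this toy setup to the manufacturing datasets covered in the workshop; the final call opens a browser session for visual inspection of the filtered view.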
Eindhoven Data Community meetup 19 - ASML
2024-11-21 · 16:00
We're excited to return to ASML for our annual meetup! This year, we have two concurrent tracks featuring a total of four talks. We're also thrilled to welcome a special guest from the US coming over to ASML: Joe Reis, author of "Fundamentals of Data Engineering," who will be doing an AMA.

Joe | Ask Me Anything about Data Engineering or Otherwise
Joe Reis is here to answer all of your questions about data engineering, the state of the industry and technology, and anything else on your mind. This is a very rare chance to have a free-flowing conversation with Joe Reis.

Cristiano & Shashank | Automating the Creation of Trusted Data Products: a developer experience-driven approach
Creating high-quality data products is a complex task that often burdens data professionals with repetitive activities. Our trusted dataset creation framework aims to alleviate this challenge by providing a comprehensive mechanism that automates essential processes in data product development. This presentation will delve into how it not only simplifies workflows but also improves developer experience by tightening feedback loops and reducing cognitive load.

Juan | Standardization of Predictive Maintenance Pipelines
Juan will show how his team, The Model Factory, is currently setting up a framework that ensures all our predictive maintenance pipelines follow standards guaranteeing 1) short time-to-market, 2) maintainability, and 3) interpretability of outputs and intermediate calculations.

Ismael & Ricardo | Airflow 3.0: A New Perspective on MLOps and GenAI
The new version of Airflow is more than just a tool for data orchestration, and it is coming in early 2025. Airflow is evolving to meet the needs driven by the explosion of GenAI applications, and it is even changing its internal architecture to be faster and more flexible. In this talk, we'll discuss how Airflow 3.0 is evolving to support the requirements of modern applications. We'll also provide a practical example of using Airflow with a RAG implementation (see the sketch after this listing). It's a look at the future of Airflow, and we hope you'll join us.

Program
17:00 – 18:00 🍕 Food
Talks run in two concurrent tracks (Track 1 and Track 2)
20:00 – 21:00 🥤 Drinks
20:15 – 21:00 Tour of the ASML Experience Center

Joe Reis | Author, data engineer, "recovering data scientist"
Joe Reis, a "recovering data scientist" with 20 years in the data industry, is the co-author of the best-selling O'Reilly book "Fundamentals of Data Engineering." He's also the instructor for the wildly popular Data Engineering Professional Specialization on Coursera, created with DeepLearning.ai and AWS. Joe's extensive experience encompasses data engineering, data architecture, machine learning, and more. He regularly keynotes major data conferences globally, advises and invests in innovative data product companies, writes at Practical Data Modeling and his personal blog, and hosts the popular data podcasts "The Monday Morning Data Chat" and "The Joe Reis Show." In his free time, Joe is dedicated to writing new books and articles and thinking of ways to advance the data industry.

Cristiano Rocha | Lead Data Engineer
Cristiano is a lead engineer at ASML with an educational background in distributed and parallel computing. With over 15 years of experience in on-premise and cloud data-based solutions, Cristiano has a wealth of knowledge in building and maturing high-impact data platforms and self-service analytics programs for large organizations. He has extensive experience in a variety of roles, including data infrastructure engineer, self-service analytics platform engineer, data engineer, big data competence lead, DataOps competence lead, machine learning engineer, and data analyst.

Shashank Shekhar | Senior Data Engineer
Shashank is a Senior Data Engineer at ASML with extensive expertise in cross-cloud technologies and in architecting and optimizing data pipelines that drive actionable insights. With over 7 years in the industry, Shashank has successfully executed complex data projects, enabling organizations to harness the full potential of their data.

Juan Manuel Ortiz Sevillano | Machine Learning Engineer
Juan is originally a data scientist who turned into a machine learning engineer, driven by the need to make ML models produce actual value. He currently focuses on reducing time-to-market and improving maintainability of predictive maintenance pipelines at ASML.

Ismael Cabral | Author, Machine Learning Engineer
Ismael is a Machine Learning Engineer and Airflow trainer at Xebia Data in the Netherlands. He is currently co-authoring the 2nd edition of "Data Pipelines with Apache Airflow."

Ricardo Granados | Author, Analytics Engineer
Ricardo Granados, co-author of "Fundamentals of Analytics Engineering," is an analytics engineer specializing in data engineering and analysis. With a master's in IT management and a focus on data science, he is proficient in various programming languages and tools. Ricardo is skilled in exploring efficient alternatives and has contributed to multicultural teams, creating business value with data products using modern data stack solutions. As an analytics engineer, he helps companies enhance data value through data modeling, best practices, task automation, and data quality improvement.

Note: For security reasons, we must register all visitors in advance. When registering, we ask for additional information such as first and last name, e-mail address, and, if you want to use a parking facility, the license plate of your vehicle. Please use the extra field "Reason for visiting" to register your license plate. Please note: bring a valid ID!
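The Airflow 3.0 talk promises a practical RAG example. As a hedged illustration of what such a pipeline can look like, here is a minimal sketch using the stable Airflow 2.x TaskFlow API (Airflow 3.0 was unreleased at the time of this meetup); the extract/embed/index functions are hypothetical placeholders, not the speakers' demo code.

```python
# Hedged sketch of an Airflow-orchestrated RAG ingestion pipeline using
# the Airflow 2.x TaskFlow API; all three steps are placeholders.
from datetime import datetime

from airflow.decorators import dag, task


@dag(schedule="@daily", start_date=datetime(2024, 11, 1), catchup=False)
def rag_ingest():
    @task
    def extract_documents() -> list[str]:
        # Pull raw documents from a source system (placeholder data)
        return ["doc about lithography", "doc about metrology"]

    @task
    def embed(docs: list[str]) -> list[list[float]]:
        # Call an embedding model here; fixed vectors keep the sketch runnable.
        # Large artifacts would normally go to external storage rather than
        # through XCom as done here for brevity.
        return [[0.1, 0.2, 0.3] for _ in docs]

    @task
    def index(vectors: list[list[float]]) -> None:
        # Upsert into the vector store backing your RAG application
        print(f"indexed {len(vectors)} embeddings")

    index(embed(extract_documents()))


rag_ingest()
```

The TaskFlow decorators wire task dependencies from the function calls themselves, which is the style of declarative orchestration the talk builds on.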
Exploring data technologies using free tools
2024-10-22 · 17:00
AGENDA

18:00 – 18:20 Meet & Greet

18:20 – 19:10 Exploring data technologies using free tools, Dom Winsor and Phil Austin
An overview, with demos, of how to develop your data skills using free tools for data manipulation, visualisation, and transformation, using languages like SQL and Python in the cloud and on your (Windows) laptop. A minimal example of this zero-cost approach follows this listing.

Bios

Dom Winsor
Creative and data technologist turned product manager. Dom has worked across digital disciplines, primarily in the education and public sectors, starting out doing visual/interface design and development for web and eLearning products, leading to a specialism in data-driven interactive applications, then data collection, analysis, and presentation solutions. Many of these multi-faceted roles required agile development and delivery skills, which he currently applies to product management for large-scale data integration platforms. Dom also helps with local tech community activities, currently including volunteering at a kids' code club, helping organise the Data Bristol user group, and, earlier this year, co-organising a Product "un-conference."

Phil Austin
Senior Consultant at Telefónica Tech, speaker. Phil has spent over 20 years down the data mines, working for household names and total unknowns. He knows a fair bit about BI and data warehousing using SQL Server and Azure. He also takes an interest in testing, query tuning, automation, the development lifecycle, and what is now known as DevOps, but don't hold that against him. Outside of that you might see him hacking around Bristol on a bike, sometimes for charity. If you buy him a drink he might tell you about the time a Nokia executive told him Apple don't know anything about phones.

19:10 – 19:40 Pizza and Networking

19:40 – 20:30 Q&A Panel Discussion
We had some great feedback from the group earlier in the year, so we have taken this on board and are going to host a panel of data experts covering data engineering to data products. The theme is: Developing Data Engineering Capabilities for Yourself and for Your Organisation. James Yarrow will host the panel with some set questions and live questions from you, the audience, via Slido! We will share a link on the day. The panel will feature:
Anna Wykes, dual Microsoft MVP and Databricks Champion, currently at Databricks US, organiser for Data Bristol, Data Summit, and more
Niall Langley, data engineer and architect, blogger and speaker
Dom Winsor, data integration platform product manager, speaker, and master of socials for Data Bristol
And hopefully a guest speaker!

Event sponsors
We would like to thank our generous sponsors for supporting us: Ovo Energy (www.ovoenergy.com)

Location
The venue is: Ovo Energy, 1 Rivergate, Temple Quay

Photos
We ask that you do NOT take photos at this meetup. We will invite people to be included in a group photo(s) during the event. Speakers will let you know if it's okay to photograph their presentation (excluding other attendees). You may see organisers taking photos during the talks. These will be of speakers, if they have agreed to this, and will not include faces of attendees.
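In the spirit of the session's "free tools" theme, the sketch below shows one such zero-cost combination: SQL and Python together via the open-source DuckDB library on a laptop. The CSV file and column names are invented for illustration; the talk's actual tool choices may differ.

```python
# Zero-cost SQL + Python on a laptop; 'sales.csv' and its columns are
# invented for illustration.
import duckdb

con = duckdb.connect()  # in-memory database: nothing to install or host
df = con.execute(
    """
    SELECT category, COUNT(*) AS n, AVG(amount) AS avg_amount
    FROM read_csv_auto('sales.csv')   -- hypothetical local file
    GROUP BY category
    ORDER BY n DESC
    """
).fetchdf()  # hand the result to pandas for charting or further wrangling
print(df)
```

Everything here is pip-installable (`pip install duckdb pandas`) and runs entirely locally, which is the point of the session.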
Reducing The Barrier To Entry For Building Stream Processing Applications With Decodable
2023-10-15 · 23:00
Eric Sammer – Founder @ Decodable, Tobias Macey – host
Summary
Building streaming applications has gotten substantially easier over the past several years. Despite this, it is still operationally challenging to deploy and maintain your own stream processing infrastructure. Decodable was built with a mission of eliminating all of the painful aspects of developing and deploying stream processing systems for engineering teams. In this episode Eric Sammer discusses why more companies are including real-time capabilities in their products and the ways that Decodable makes it faster and easier.

Announcements
Hello and welcome to the Data Engineering Podcast, the show about modern data management.
Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations for you to create complete customer profiles. Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack.
This episode is brought to you by Datafold, a testing automation platform for data engineers that finds data quality issues before the code and data are deployed to production. Datafold leverages data-diffing to compare production and development environments and column-level lineage to show you the exact impact of every code change on data, metrics, and BI tools, keeping your team productive and stakeholders happy. Datafold integrates with dbt, the modern data stack, and seamlessly plugs into your data CI for team-wide and automated testing. If you are migrating to a modern data stack, Datafold can also help you automate data and code validation to speed up the migration. Learn more about Datafold by visiting dataengineeringpodcast.com/datafold.
You shouldn't have to throw away the database to build with fast-changing data. You should be able to keep the familiarity of SQL and the proven architecture of cloud warehouses, but swap the decades-old batch computation model for an efficient incremental engine to get complex queries that are always up-to-date. With Materialize, you can! It's the only true SQL streaming database built from the ground up to meet the needs of modern data products. Whether it's real-time dashboarding and analytics, personalization and segmentation, or automation and alerting, Materialize gives you the ability to work with fresh, correct, and scalable results, all in a familiar SQL interface. Go to dataengineeringpodcast.com/materialize today to get 2 weeks free!
As more people start using AI for projects, two things are clear: it's a rapidly advancing field, and it's tough to navigate. How can you get the best results for your use case? Instead of being subjected to a bunch of buzzword bingo, hear directly from pioneers in the developer and data science space on how they use graph tech to build AI-powered apps. Attend the dev and ML talks at NODES 2023, a free online conference on October 26 featuring some of the brightest minds in tech. Check out the agenda and register today at Neo4j.com/NODES.
Your host is Tobias Macey and today I'm interviewing Eric Sammer about starting your stream processing journey with Decodable.

Interview
Introduction
How did you get involved in the area of data management?
Can you describe what Decodable is and the story behind it?
What are the notable changes to the Decodable platform since we last spoke (October 2021)?
What are the industry shifts that have influenced the product direction?
What are the problems that customers are trying to solve when they come to Decodable?
When you launched, your focus was on SQL transformations of streaming data (see the sketch after these notes). What was the process for adding full Java support in addition to SQL?
What are the developer experience challenges that are particular to working with streaming data? How have you worked to address that in the Decodable platform and interfaces?
As you evolve the technical and product direction, what is your heuristic for balancing the unification of interfaces and system integration against the ability to swap different components or interfaces as new technologies are introduced?
What are the most interesting, innovative, or unexpected ways that you have seen Decodable used?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on Decodable?
When is Decodable the wrong choice?
What do you have planned for the future of Decodable?

Contact Info
esammer on GitHub
LinkedIn

Parting Question
From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements
Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning.
Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.
To help other people find the show please leave a review on Apple Podcasts and tell your friends and co-workers.

Links
Decodable (Podcast Episode) · Understanding the Apache Flink Journey · Flink (Podcast Episode) · Debezium (Podcast Episode) · Kafka · Redpanda (Podcast Episode) · Kinesis · PostgreSQL (Podcast Episode) · Snowflake (Podcast Episode) · Databricks · Startree · Pinot (Podcast Episode) · Rockset (Podcast Episode) · Druid · InfluxDB · Samza · Storm · Pulsar (Podcast Episode) · ksqlDB (Podcast Episode) · dbt · GitHub Actions · Airbyte · Singer · Splunk · Outbox Pattern

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
Sponsored By:
Neo4j: NODES 2023 is a free online conference focused on graph-driven innovations with content for all skill levels. Its 24 hours are packed with 90 interactive technical sessions from top developers and data scientists across the world covering a broad range of topics and use cases. The event tracks:
- Intelligent Applications: APIs, Libraries, and Frameworks – Tools and best practices for creating graph-powered applications and APIs with any software stack and programming language, including Java, Python, and JavaScript
- Machine Learning and AI – How graph technology provides context for your data and enhances the accuracy of your AI and ML projects (e.g. graph neural networks, responsible AI)
- Visualization: Tools, Techniques, and Best Practices – Techniques and tools for exploring hidden and unknown patterns in your data and presenting complex relationships (knowledge graphs, ethical data practices, and data representation)
Don't miss your chance to hear about the latest graph-powered implementations and best practices for free on October 26 at NODES 2023. Go to Neo4j.com/NODES today to see the full agenda and register!
Rudderstack: Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations for you to create complete customer profiles. Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack
Materialize: You shouldn't have to throw away the database to build with fast-changing data. Keep the familiar SQL, keep the proven architecture of cloud warehouses, but swap the decades-old batch computation model for an efficient incremental engine to get complex queries that are always up-to-date. That is Materialize, the only true SQL streaming database built from the ground up to meet the needs of modern data products: fresh, correct, scalable, all in a familiar SQL UI. Built on Timely Dataflow and Differential Dataflow, open source frameworks created by cofounder Frank McSherry at Microsoft Research, Materialize is trusted by data and engineering teams at Ramp, Pluralsight, Onward and more to build real-time data products without the cost, complexity, and development time of stream processing. Go to materialize.com today and get 2 weeks free!
Datafold: This episode is brought to you by Datafold, a testing automation platform for data engineers that finds data quality issues before the code and data are deployed to production. Datafold leverages data-diffing to compare…
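Decodable's pitch is SQL-defined pipelines on managed Apache Flink. As a rough, self-managed analogue of the kind of streaming SQL transformation discussed in the episode, the sketch below uses PyFlink's Table API with a synthetic source; the table names and connectors are illustrative, and this is PyFlink, not Decodable's own API.

```python
# Illustrative self-managed analogue of SQL-defined stream processing;
# the tables are synthetic and this is not Decodable's API.
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# A generated stream stands in for a Kafka/Kinesis source
t_env.execute_sql("""
    CREATE TABLE clicks (
        user_id INT,
        url STRING
    ) WITH ('connector' = 'datagen', 'rows-per-second' = '5')
""")

# A print sink stands in for a warehouse or downstream topic
t_env.execute_sql("""
    CREATE TABLE click_counts (
        user_id INT,
        cnt BIGINT
    ) WITH ('connector' = 'print')
""")

# The part a SQL-first platform user actually writes: one statement
t_env.execute_sql("""
    INSERT INTO click_counts
    SELECT user_id, COUNT(*) AS cnt
    FROM clicks
    GROUP BY user_id
""").wait()  # streaming job runs continuously until cancelled
```

The operational pain the episode describes lives in the source/sink plumbing and cluster management around that final statement, which is the part a managed platform aims to absorb.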
Dan Delorey – VP of Data @ SoFi, Tobias Macey – host
Summary
Dan Delorey helped to build the core technologies of Google's cloud data services for many years before embarking on his latest adventure as the VP of Data at SoFi. From being an early engineer on the Dremel project, to helping launch and manage BigQuery, on to helping enterprises adopt Google's data products, he learned all of the critical details of how to run services used by data platform teams. Now he is the consumer of many of the tools that his work inspired. In this episode he takes a trip down memory lane to weave an interesting and informative narrative about the broader themes throughout his work and their echoes in the modern data ecosystem.

Announcements
Hello and welcome to the Data Engineering Podcast, the show about modern data management.
When you're ready to build your next pipeline, or want to test out the projects you hear about on the show, you'll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it's now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you've got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $100 credit to try out a Kubernetes cluster of your own. And don't forget to thank them for their continued support of this show!
Atlan is a collaborative workspace for data-driven teams, like GitHub for engineering or Figma for design teams. By acting as a virtual hub for data assets ranging from tables and dashboards to SQL snippets and code, Atlan enables teams to create a single source of truth for all their data assets, and collaborate across the modern data stack through deep integrations with tools like Snowflake, Slack, Looker, and more. Go to dataengineeringpodcast.com/atlan today and sign up for a free trial. If you're a Data Engineering Podcast listener, you get credits worth $3000 on an annual subscription.
So now your modern data stack is set up. How is everyone going to find the data they need, and understand it? Select Star is a data discovery platform that automatically analyzes and documents your data. For every table in Select Star, you can find out where the data originated, which dashboards are built on top of it, who's using it in the company, and how they're using it, all the way down to the SQL queries. Best of all, it's simple to set up, and easy for both engineering and operations teams to use. With Select Star's data catalog, a single source of truth for your data is built in minutes, even across thousands of datasets. Try it out for free and double the length of your free trial today at dataengineeringpodcast.com/selectstar. You'll also get a swag package when you continue on a paid plan.
Your host is Tobias Macey and today I'm interviewing Dan Delorey about his journey through the data ecosystem as the current head of data at SoFi, prior engineering leader with the BigQuery team, and early engineer on Dremel.

Interview
Introduction
How did you get involved in the area of data management?
Can you start by sharing what your current relationship to the data ecosystem is and the CliffsNotes version of how you ended up there?
Dremel was a ground-breaking technology at the time. What do you see as its lasting impression on the landscape of data both in and outside of Google?
You were instrumental in crafting the vision behind "querying data in place" (what they called federated data) at Dremel and BigQuery (see the sketch after these notes). What do you mean by this? How has this approach evolved? What are some challenges with this approach?
How well did the Drill project capture the core principles of Dremel as outlined in the eponymous white paper?
Following your work on Drill you were involved with the development and growth of BigQuery and the broader suite of Google Cloud's data platform.
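For a concrete sense of "querying data in place," here is a small hedged sketch using the google-cloud-bigquery client to run SQL over a federated (external) table whose files never leave Cloud Storage. The project, bucket, and schema are placeholders, not anything from the episode.

```python
# Hedged sketch of "querying data in place": BigQuery scans CSV files
# that live in Cloud Storage, with no load job. Project, bucket, and
# schema are placeholders.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # hypothetical project

# Describe an external (federated) table over files that stay in GCS
external_config = bigquery.ExternalConfig("CSV")
external_config.source_uris = ["gs://my-bucket/events/*.csv"]
external_config.autodetect = True  # infer the schema at query time

job_config = bigquery.QueryJobConfig(
    table_definitions={"events": external_config}
)

rows = client.query(
    "SELECT COUNT(*) AS n FROM events", job_config=job_config
).result()
for row in rows:
    print(row.n)
```

Because the external table is resolved at query time, the underlying files can change in place without any load jobs, which is the essence of the federated approach the question refers to.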
Maxime Beauchemin – guest, Tobias Macey – host
Summary
Data engineering is still a relatively new field that is going through a continued evolution as new technologies are introduced and new requirements are understood. In this episode Maxime Beauchemin returns to revisit what it means to be a data engineer and how the role has changed over the past 5 years.

Announcements
Hello and welcome to the Data Engineering Podcast, the show about modern data management.
When you're ready to build your next pipeline, or want to test out the projects you hear about on the show, you'll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it's now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you've got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $100 credit to try out a Kubernetes cluster of your own. And don't forget to thank them for their continued support of this show!
Struggling with broken pipelines? Stale dashboards? Missing data? If this resonates with you, you're not alone. Data engineers struggling with unreliable data need look no further than Monte Carlo, the world's first end-to-end, fully automated Data Observability Platform! In the same way that application performance monitoring ensures reliable software and keeps application downtime at bay, Monte Carlo solves the costly problem of broken data pipelines. Monte Carlo monitors and alerts for data issues across your data warehouses, data lakes, ETL, and business intelligence, reducing time to detection and resolution from weeks or days to just minutes. Start trusting your data with Monte Carlo today! Visit dataengineeringpodcast.com/montecarlo to learn more. The first 10 people to request a personalized product tour will receive an exclusive Monte Carlo swag box.
Are you bored with writing scripts to move data into SaaS tools like Salesforce, Marketo, or Facebook Ads? Hightouch is the easiest way to sync data into the platforms that your business teams rely on. The data you're looking for is already in your data warehouse and BI tools. Connect your warehouse to Hightouch, paste a SQL query, and use their visual mapper to specify how data should appear in your SaaS systems. No more scripts, just SQL. Supercharge your business teams with customer data using Hightouch for Reverse ETL today. Get started for free at dataengineeringpodcast.com/hightouch.
Your host is Tobias Macey and today I'm interviewing Maxime Beauchemin about the impacts that the evolution of the modern data stack has had on the role and responsibilities of data engineers.

Interview
Introduction
How did you get involved in the area of data management?
What is your current working definition of a data engineer?
How has that definition changed since your article on the "rise of the data engineer" and episode 3 of this show about "defining data engineering"?
How has the growing availability of data infrastructure services shifted the foundational skills and knowledge that are necessary to be effective?
How should a new/aspiring data engineer focus their time and energy to become effective?
One of the core themes in this current spate of technologies is the "democratization of data". In your post on the downfall of the data engineer you called out the pressure on data engineers to maintain control with so many contributors with varying levels of skill and understanding. How well is the "modern data stack" balancing these concerns?
An interesting impact of the growing usage of data is the constrained availability of data engineers. How do you see the effects of the job market on driving evolution of tooling and services?
With the explosion of tools and services for working with data, a new problem has evolved of which ones to use for a given organization. What do you see as