talk-data.com talk-data.com

Topic

Datafold

data_diffing data_quality data_observability

3

tagged

Activity Trend

13 peak/qtr
2020-Q1 2026-Q1

Activities

Showing filtered results

Filtering by: Nick Schrock ×

Summary In this episode of the Data Engineering Podcast, host Tobias Macey welcomes back Nick Schrock, CTO and founder of Dagster Labs, to discuss Compass - a Slack-native, agentic analytics system designed to keep data teams connected with business stakeholders. Nick shares his journey from initial skepticism to embracing agentic AI as model and application advancements made it practical for governed workflows, and explores how Compass redefines the relationship between data teams and stakeholders by shifting analysts into steward roles, capturing and governing context, and integrating with Slack where collaboration already happens. The conversation covers organizational observability through Compass's conversational system of record, cost control strategies, and the implications of agentic collaboration on Conway's Law, as well as what's next for Compass and Nick's optimistic views on AI-accelerated software engineering.

Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data managementData teams everywhere face the same problem: they're forcing ML models, streaming data, and real-time processing through orchestration tools built for simple ETL. The result? Inflexible infrastructure that can't adapt to different workloads. That's why Cash App and Cisco rely on Prefect. Cash App's fraud detection team got what they needed - flexible compute options, isolated environments for custom packages, and seamless data exchange between workflows. Each model runs on the right infrastructure, whether that's high-memory machines or distributed compute. Orchestration is the foundation that determines whether your data team ships or struggles. ETL, ML model training, AI Engineering, Streaming - Prefect runs it all from ingestion to activation in one platform. Whoop and 1Password also trust Prefect for their data operations. If these industry leaders use Prefect for critical workflows, see what it can do for you at dataengineeringpodcast.com/prefect.Data migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details. Your host is Tobias Macey and today I'm interviewing Nick Schrock about building an AI analyst that keeps data teams in the loopInterview IntroductionHow did you get involved in the area of data management?Can you describe what Compass is and the story behind it?context repository structurehow to keep it relevant/avoid sprawl/duplicationproviding guardrailshow does a tool like Compass help provide feedback/insights back to the data teams?preparing the data warehouse for effective introspection by the AILLM selectioncost managementcaching/materializing ad-hoc queriesWhy Slack and enterprise chat are important to b2b softwareHow AI is changing stakeholder relationshipsHow not to overpromise AI capabilities How does Compass relate to BI?How does Compass relate to Dagster and Data Infrastructure?What are the most interesting, innovative, or unexpected ways that you have seen Compass used?What are the most interesting, unexpected, or challenging lessons that you have learned while working on Compass?When is Compass the wrong choice?What do you have planned for the future of Compass?Contact Info LinkedInParting Question From your perspective, what is the biggest gap in the tooling or technology for data management today?Closing Announcements Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.Links DagsterDagster LabsDagster PlusDagster CompassChris Bergh DataOps EpisodeRise of Medium Code blog postContext EngineeringData StewardInformation ArchitectureConway's LawTemporal durable execution frameworkThe intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Summary In this episode of the Data Engineering Podcast we welcome back Nick Schrock, CTO and founder of Dagster Labs, to discuss the evolving landscape of data engineering in the age of AI. As AI begins to impact data platforms and the role of data engineers, Nick shares his insights on how it will ultimately enhance productivity and expand software engineering's scope. He delves into the current state of AI adoption, the importance of maintaining core data engineering principles, and the need for human oversight when leveraging AI tools effectively. Nick also introduces Dagster's new components feature, designed to modularize and standardize data transformation processes, making it easier for teams to collaborate and integrate AI into their workflows. Join in to explore the future of data engineering, the potential for AI to abstract away complexity, and the importance of open standards in preventing walled gardens in the tech industry.

Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data managementThis episode is brought to you by Coresignal, your go-to source for high-quality public web data to power best-in-class AI products. Instead of spending time collecting, cleaning, and enriching data in-house, use ready-made multi-source B2B data that can be smoothly integrated into your systems via APIs or as datasets. With over 3 billion data records from 15+ online sources, Coresignal delivers high-quality data on companies, employees, and jobs. It is powering decision-making for more than 700 companies across AI, investment, HR tech, sales tech, and market intelligence industries. A founding member of the Ethical Web Data Collection Initiative, Coresignal stands out not only for its data quality but also for its commitment to responsible data collection practices. Recognized as the top data provider by Datarade for two consecutive years, Coresignal is the go-to partner for those who need fresh, accurate, and ethically sourced B2B data at scale. Discover how Coresignal's data can enhance your AI platforms. Visit dataengineeringpodcast.com/coresignal to start your free 14-day trial. Data migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details. This is a pharmaceutical Ad for Soda Data Quality. Do you suffer from chronic dashboard distrust? Are broken pipelines and silent schema changes wreaking havoc on your analytics? You may be experiencing symptoms of Undiagnosed Data Quality Syndrome — also known as UDQS. Ask your data team about Soda. With Soda Metrics Observability, you can track the health of your KPIs and metrics across the business — automatically detecting anomalies before your CEO does. It’s 70% more accurate than industry benchmarks, and the fastest in the category, analyzing 1.1 billion rows in just 64 seconds. And with Collaborative Data Contracts, engineers and business can finally agree on what “done” looks like — so you can stop fighting over column names, and start trusting your data again.Whether you’re a data engineer, analytics lead, or just someone who cries when a dashboard flatlines, Soda may be right for you. Side effects of implementing Soda may include: Increased trust in your metrics, reduced late-night Slack emergencies, spontaneous high-fives across departments, fewer meetings and less back-and-forth with business stakeholders, and in rare cases, a newfound love of data. Sign up today to get a chance to win a $1000+ custom mechanical keyboard. Visit dataengineeringpodcast.com/soda to sign up and follow Soda’s launch week. It starts June 9th.Your host is Tobias Macey and today I'm interviewing Nick Schrock about lowering the barrier to entry for data platform consumersInterview IntroductionHow did you get involved in the area of data management?Can you start by giving your summary of the impact that the tidal wave of AI has had on data platforms and data teams?For anyone who hasn't heard of Dagster, can you give a quick summary of the project?What are the notable changes in the Dagster project in the past year?What are the ecosystem pressures that have shaped the ways that you think about the features and trajectory of Dagster as a project/product/community?In your recent release you introduced "components", which is a substantial change in how you enable teams to collaborate on data problems. What was the motivating factor in that work and how does it change the ways that organizations engage with their data?tension between being flexible and extensible vs. opinionated and constrainedincreased dependency on orchestration with LLM use casesreducing the barrier to contribution for data platform/pipelinesbringing application engineers into the mixchallenges of meeting users/teams where they are (languages, platform investments, etc.)What are the most interesting, innovative, or unexpected ways that you have seen teams applying the Components pattern?What are the most interesting, unexpected, or challenging lessons that you have learned while working on the latest iterations of Dagster?When is Dagster the wrong choice?What do you have planned for the future of Dagster?Contact Info LinkedInParting Question From your perspective, what is the biggest gap in the tooling or technology for data management today?Links Dagster+ EpisodeDagster Components Slide DeckThe Rise Of Medium CodeLakehouse ArchitectureIcebergDagster ComponentsPydantic ModelsKubernetesDagster PipesRuby on RailsdbtSlingFivetranTemporalMCP == Model Context ProtocolThe intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Summary

Data systems are inherently complex and often require integration of multiple technologies. Orchestrators are centralized utilities that control the execution and sequencing of interdependent operations. This offers a single location for managing visibility and error handling so that data platform engineers can manage complexity. In this episode Nick Schrock, creator of Dagster, shares his perspective on the state of data orchestration technology and its application to help inform its implementation in your environment.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations for you to create complete customer profiles. Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack This episode is brought to you by Datafold – a testing automation platform for data engineers that finds data quality issues before the code and data are deployed to production. Datafold leverages data-diffing to compare production and development environments and column-level lineage to show you the exact impact of every code change on data, metrics, and BI tools, keeping your team productive and stakeholders happy. Datafold integrates with dbt, the modern data stack, and seamlessly plugs in your data CI for team-wide and automated testing. If you are migrating to a modern data stack, Datafold can also help you automate data and code validation to speed up the migration. Learn more about Datafold by visiting dataengineeringpodcast.com/datafold You shouldn't have to throw away the database to build with fast-changing data. You should be able to keep the familiarity of SQL and the proven architecture of cloud warehouses, but swap the decades-old batch computation model for an efficient incremental engine to get complex queries that are always up-to-date. With Materialize, you can! It’s the only true SQL streaming database built from the ground up to meet the needs of modern data products. Whether it’s real-time dashboarding and analytics, personalization and segmentation or automation and alerting, Materialize gives you the ability to work with fresh, correct, and scalable results — all in a familiar SQL interface. Go to dataengineeringpodcast.com/materialize today to get 2 weeks free! Your host is Tobias Macey and today I'm welcoming back Nick Schrock to talk about the state of the ecosystem for data orchestration

Interview

Introduction How did you get involved in the area of data management? Can you start by defining what data orchestration is and how it differs from other types of orchestration systems? (e.g. container orchestration, generalized workflow orchestration, etc.) What are the misconceptions about the applications of/need for/cost to implement data orchestration?

How do those challenges of customer education change across roles/personas?

Because of the multi-faceted nature of data in an organization, how does that influence the capabilities and interfaces that are needed in an orchestration engine? You have been working on Dagster for five years now. How have the requirements/adoption/application for orchestrators changed in that time? One of the challenges for any orchestration engine is to balance the need for robust and extensible core capabilities with a rich suite of integrations to the broader data ecosystem. What are the factors that you have seen make the most influence in driving adoption of a given engine? What are the most interesting, innovative, or unexpected ways that you have seen data orchestration implemented and/or used? What are the most interesting, unexpected, or challenging lessons that you have learned while working o