talk-data.com
Topic
Data Contracts
64
tagged
This session dives into building a modern data platform on Google Cloud with AI-powered data management. Explore how to leverage data mesh architectures to break down data silos and enable efficient data sharing. Learn how data contracts improve reliability, and discover how real-time ingestion empowers immediate insights. We'll also examine the role of data agents in automating data discovery, preparation, and delivery for optimized AI workflows.
This Session is hosted by a Google Cloud Next Sponsor.
Visit your registration profile at g.co/cloudnext to opt out of sharing your contact information with the sponsor hosting this session.
Building a Scalable Data Foundation in Health Tech | Anna Swigart | Shift Left Data Conference 2025
In healthcare technology, protecting patient privacy while scaling data operations requires reimagining where quality and governance live. This presentation explores Helix's journey of shifting critical processes left in its precision medicine business—from implementing automated data classification and privacy workflows to enlisting cross-functional expertise in refining operational workflows. For clinical data management, we've partnered with healthcare systems to implement OMOP standards and data contracts at the source, creating a robust foundation for research and commercial opportunities. Through practical examples, we'll demonstrate how this upstream approach has transformed our data operations, encouraged internal alignment, and strengthened partner relationships.
Panel: Shift Left Across the Data Lifecycle—Data Contracts, Transformations, Observability, and Catalogs | Prukalpa Sankar, Tristan Handy, Barr Moses, Chad Sanderson | Shift Left Data Conference 2025
Join industry-leading CEOs Chad (Data Contracts), Tristan (Data Transformations), Barr (Data Observability), and Prukalpa (Data Catalogs), who are pioneering new approaches to operationalizing data by “Shifting Left.” This engaging panel will explore how embedding rigorous data management practices early in the data lifecycle reduces issues downstream, enhances data reliability, and empowers software engineers with clear visibility into data expectations. Attendees will gain insights into how data contracts define accountability, how effective transformations ensure data usability at scale, how proactive data and AI observability drives continuous confidence in data quality, and how catalogs enable data discoverability, accelerating innovation and trust across organizations.
Wayfair’s Multi-year Data Mesh Journey | Nachiket Mehta and Piyush Tiwari | Shift Left Data Conference 2025
Wayfair’s multi-year Data Mesh journey involved shifting from a monolithic, centralized data model to a decentralized, domain-driven architecture built on microservices. By embracing Data Mesh principles, Wayfair empowered domain teams to take end-to-end ownership of their data.
Key enablers included a data contract management platform to ensure trusted, discoverable data products, and the development of Taxon, an internal ontology and knowledge graph that unified semantics across domains while supporting the company's tech modernization.
Organizationally, Wayfair introduced an Embedded Data Engineering model – embedding data engineers within domain teams – to instill a “Data-as-a-Product” mindset among data producers. This sociotechnical shift ensured that those who create data also own its quality, documentation, and evolution, rather than relying on a centralized BI team. As a result, Wayfair’s data producers are now accountable for well-defined, high-quality data products, and data consumers can more easily discover and trust data through the unified catalog and ontology.
The presentation will highlight how Wayfair has adopted “shift left” (pushing data ownership and quality to the source teams) and is now heading toward “shift right” (focusing on consumer-driven data products and outcomes) to unlock business outcomes. This session will share both technical strategies and business results from Wayfair’s Data Mesh journey.
Data Contracts in the Real World, the Adevinta Spain Implementation | Sergio Catoira | Shift Left Data Conference 2025
This talk covers Adevinta Spain's transition from a best-effort governance model to a governed data integration system by design. By creating source-aligned data products, this shift aims to enhance data quality and reliability from the moment data is ingested.
Shifting From Reactive to Proactive at Glassdoor | Zakariah Siyaji | Shift Left Data Conference 2025
As Glassdoor scaled to petabytes of data, ensuring data quality became critical for maintaining trust and supporting strategic decisions. Glassdoor implemented a proactive, “shift left” strategy focused on embedding data quality practices directly into the development process. This talk will detail how Glassdoor leveraged data contracts, static code analysis integrated into the CI/CD pipeline, and automated anomaly detection to empower software engineers and prevent data issues at the source. Attendees will learn how proactive data quality management reduces risk, promotes stronger collaboration across teams, enhances operational efficiency, and fosters a culture of trust in data at scale.
Mark Freeman joins me to chat about data contracts, the crazy life of being the first employee at a hot startup, writing books and creating content, and much more.
#DataEngineering #Startups #AI #DataQuality #DataContracts
In today's episode of the Data Engineering Central Podcast, we talk about a few hot topics: AWS S3 Tables, Databricks raising money, whether Data Contracts are dead, and the Lake House storage format battle! It's a good one, buckle up!
This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit dataengineeringcentral.substack.com/subscribe
🌟 Session Overview 🌟
Session Name: Data Contracts In Practice With Debezium and Apache Flink
Speaker: Gunnar Morling
Session Description: Log-based change data capture (CDC) is an invaluable part of the data engineering toolbox: it enables a variety of use cases such as real-time analytics, full-text search, or cache invalidation by publishing data change events from your database. But when publishing change event streams across context or team boundaries, aren’t you tying external consumers to your application’s data model, thus limiting yourself in evolving the same?
Enter data contracts—consciously designed abstractions between your internal data model and the outside world. Come and join us for this session to learn about:
Challenges you may encounter when exposing table-level change event streams and how data contracts can mitigate them
Implementation strategies for data contracts, such as the outbox pattern and stream processing
Evolving your data model and the corresponding data contracts without breaking any existing consumers
We’ll also touch on some advanced topics at the intersection of CDC and stream processing, such as hydrating partial change events, using the popular change stream processing duo of Debezium and Apache Flink.
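To make the outbox pattern mentioned above more concrete, here is a minimal sketch in Python using the standard-library sqlite3 module. It is an illustration only and is not taken from the session: the `orders` and `outbox` tables, the `place_order` helper, and the event field names are all hypothetical. The idea it shows is that the application writes its internal row and a contract-shaped event in the same transaction, so a CDC tool can publish the outbox table instead of the internal table.

```python
import json
import sqlite3

# Hypothetical schema: an internal `orders` table plus an `outbox` table
# whose payload follows the published contract, not the internal row layout.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, cust_ref TEXT, total_cents INTEGER);
    CREATE TABLE outbox (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        aggregate_id INTEGER, event_type TEXT, payload TEXT
    );
""")


def place_order(conn, order_id, customer, total_cents):
    """Write the business row and its contract event in one transaction."""
    # Contract-shaped payload: stable field names, decoupled from column names.
    event = {"orderId": order_id, "customerId": customer, "total": total_cents / 100}
    with conn:  # commits both inserts together, or rolls both back on error
        conn.execute(
            "INSERT INTO orders VALUES (?, ?, ?)", (order_id, customer, total_cents)
        )
        conn.execute(
            "INSERT INTO outbox (aggregate_id, event_type, payload) VALUES (?, ?, ?)",
            (order_id, "OrderPlaced", json.dumps(event)),
        )


place_order(conn, 1, "c-42", 1999)
outbox_rows = conn.execute("SELECT event_type, payload FROM outbox").fetchall()
```

Because consumers only ever see the outbox payload, the internal `orders` columns can be renamed or restructured as long as the code that builds the event keeps producing the agreed shape.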
🚀 About Big Data and RPA 2024 🚀
Unlock the future of innovation and automation at Big Data & RPA Conference Europe 2024! 🌟 This unique event brings together the brightest minds in big data, machine learning, AI, and robotic process automation to explore cutting-edge solutions and trends shaping the tech landscape. Perfect for data engineers, analysts, RPA developers, and business leaders, the conference offers dual insights into the power of data-driven strategies and intelligent automation. 🚀 Gain practical knowledge on topics like hyperautomation, AI integration, advanced analytics, and workflow optimization while networking with global experts. Don’t miss this exclusive opportunity to expand your expertise and revolutionize your processes—all from the comfort of your home! 📊🤖✨
📅 Yearly Conferences: Curious about the evolution of QA? Check out our archive of past Big Data & RPA sessions. Watch the strategies and technologies evolve in our videos! 🚀
🔗 Find Other Years' Videos:
2023 Big Data Conference Europe https://www.youtube.com/playlist?list=PLqYhGsQ9iSEpb_oyAsg67PhpbrkCC59_g
2022 Big Data Conference Europe Online https://www.youtube.com/playlist?list=PLqYhGsQ9iSEryAOjmvdiaXTfjCg5j3HhT
2021 Big Data Conference Europe Online https://www.youtube.com/playlist?list=PLqYhGsQ9iSEqHwbQoWEXEJALFLKVDRXiP
💡 Stay Connected & Updated 💡
Don’t miss out on any updates or upcoming event information from Big Data & RPA Conference Europe. Follow us on our social media channels and visit our website to stay in the loop!
🌐 Website: https://bigdataconference.eu/, https://rpaconference.eu/
👤 Facebook: https://www.facebook.com/bigdataconf, https://www.facebook.com/rpaeurope/
🐦 Twitter: @BigDataConfEU, @europe_rpa
🔗 LinkedIn: https://www.linkedin.com/company/73234449/admin/dashboard/, https://www.linkedin.com/company/75464753/admin/dashboard/
🎥 YouTube: http://www.youtube.com/@DATAMINERLT
Building a Data Mesh
The Data Product Management In Action podcast, brought to you by Soda and executive producer Scott Hirleman, is a platform for data product management practitioners to share insights and experiences. In Episode 23 of Data Product Management in Action, our host Frannie Helforoush is joined by Soheil Mirchi, a technical product manager. Soheil discusses his company’s shift from a centralized data lake to a decentralized data mesh architecture. He outlines the three types of data products (source-aligned, aggregated, and customer-facing) and highlights the importance of data contracts and testing. Learn about strategies for measuring success through metrics and customer feedback, along with lessons on starting small and fostering data democratization. Tune in for essential insights on effective data management!
About our host Frannie Helforoush: Frannie's journey began as a software engineer and evolved into a strategic product manager. Now, as a data product manager, she leverages her expertise in both fields to create impactful solutions. Frannie thrives on making data accessible and actionable, driving product innovation, and ensuring product thinking is integral to data management. Connect with Frannie on LinkedIn.
About our guest Soheil Mirchi: Soheil is a Technical Product Manager at Temedica, a health insights company focused on transforming complex healthcare and pharmaceutical data into actionable insights. Leading a team of data engineers, scientists, and analysts, Soheil drives the development of cutting-edge data products while guiding the company’s transition to a data mesh architecture. He is passionate about empowering teams with the autonomy to manage their own data products and believes in a collaborative approach to driving innovation in the health tech space. Connect with Soheil on LinkedIn.
All views and opinions expressed are those of the individuals and do not necessarily reflect their employers or anyone else.
Join the conversation on LinkedIn. Apply to be a guest or nominate someone that you know. Do you love what you're listening to? Please rate and review the podcast, and share it with fellow practitioners you know. Your support helps us reach more listeners and continue providing valuable insights!
Jonny will showcase how the team at EQT, one of the world's largest private equity firms, is leveraging the dbt Discovery API, data contracts, tagging, and other dbt features to power discovery through their intranet — and by extension, how this also enables the team to support LLMs for live querying of their data.
Speaker: Jonny Reichwald, Analytics Lead, EQT
Read the blog to learn about the latest dbt Cloud features announced at Coalesce, designed to help organizations embrace analytics best practices at scale: https://www.getdbt.com/blog/coalesce-2024-product-announcements
The Good, the Bad and the Ugly
Amy is a Senior Data Solutions and Integration Manager at BayWa r.e. Her responsibility was enabling Data Governance, Data Products and Data Mesh. The challenge was building a unified data decentralization framework for dozens of organizations that historically used different stacks, metrics, and processes. Data Mesh is a complex concept, and every organisation views it differently. Amy will share the framework she implemented, for which her team gained leadership buy-in. She will discuss what her team managed to execute, what they've achieved, and what's on their roadmap. She will also share her learnings from this exciting journey, including securing buy-in from different business units.
At 'Journey Building Data Mesh: The Good, The Bad, and The Ugly,' Amy will focus on:
Why Data Mesh, and when is the right time to start prioritizing it?
How did they implement data contracts at scale, and what is the current progress?
What would Amy's team do differently today on their journey to Data Mesh?
In this session, Chad Sanderson, CEO of Gable.ai and author of the upcoming O’Reilly book: "Data Contracts," tackles the necessity of modern data management in an age of hyper iteration, experimentation, and AI. He will explore why traditional data management practices fail and how the cloud has fundamentally changed data development. The talk will cover a modern application of data management best practices, including data change detection, data contracts, observability, and CI/CD tests, and outline the roles of data producers and consumers.
Attendees will leave with a clear understanding of modern data management's components and how to leverage them for better data handling and decision-making.
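As a rough illustration of how data change detection and CI/CD tests can work together, the sketch below compares two versions of a contract's schema and reports changes that would break existing consumers. It is a minimal example under stated assumptions, not the approach presented in the talk: the `breaking_changes` helper, the field names, and the flat field-to-type dictionaries are all hypothetical.

```python
def breaking_changes(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """List changes between two schema versions that could break consumers.

    Removed fields and changed types are treated as breaking; newly added
    fields are treated as safe. This is a simplifying assumption.
    """
    problems = []
    for field, old_type in old.items():
        if field not in new:
            problems.append(f"removed field: {field}")
        elif new[field] != old_type:
            problems.append(f"type changed: {field} {old_type} -> {new[field]}")
    return problems


# Hypothetical contract versions: field name -> declared type.
v1 = {"order_id": "int", "total": "float", "currency": "str"}
v2 = {"order_id": "str", "total": "float", "placed_at": "timestamp"}

changes = breaking_changes(v1, v2)
# changes == ["type changed: order_id int -> str", "removed field: currency"]
```

A CI job could run a check like this against the proposed schema and fail the build whenever the returned list is non-empty, so the producer sees the impact before the change is merged.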
The Data Product Management In Action podcast, brought to you by Soda and executive producer Scott Hirleman, is a platform for data product management practitioners to share insights and experiences. We've released a special edition series of minisodes of our podcast. Recorded live at Data Connect 2024, our host Michael Toland engages in short, sweet, informative, and delightful conversations with five prevalent practitioners who are forging their way forward in data and technology.
About our host Michael Toland: Michael is a Product Management Coach and Consultant with Pathfinder Product, a Test Double Operation. Since 2016, Michael has worked on large-scale system modernizations and migration initiatives at Verizon. Outside his professional career, Michael serves as the Treasurer for the New Leaders Council, mentors with Venture for America, sings with the Columbus Symphony, and writes satire for his blog Dignified Product. He is excited to discuss data product management with the podcast audience. Connect with Michael on LinkedIn.
About our guest Jean-Georges Perrin: Jean-Georges “jgp” Perrin is the Chief Innovation Officer at AbeaData, where he focuses on developing cutting-edge data tooling. He chairs the Open Data Contract Standard (ODCS) at the Linux Foundation's Bitol project, co-founded the AIDA User Group, and has authored several influential books, including Implementing Data Mesh (O'Reilly) and Spark in Action, 2nd Edition (Manning). With over 25 years in IT, Jean-Georges is recognized as a Lifetime IBM Champion, a PayPal Champion, and a Data Mesh MVP. His expertise spans data engineering, governance, and the industrialization of data science. Outside of tech, he enjoys exploring Upstate New York and New England with his family. Connect with J-GP on LinkedIn.
All views and opinions expressed are those of the individuals and do not necessarily reflect their employers or anyone else.
Join the conversation on LinkedIn. Apply to be a guest or nominate a practitioner. Do you love what you're listening to? Please rate and review the podcast, and share it with fellow practitioners you know. Your support helps us reach more listeners and continue providing valuable insights!
The Data Product Management In Action podcast, brought to you by Soda and executive producer Scott Hirleman, is a platform for data product management practitioners to share insights and experiences. We've released a special edition series of minisodes of our podcast. Recorded live at Data Connect 2024, our host Michael Toland engages in short, sweet, informative, and delightful conversations with five prevalent practitioners who are forging their way forward in data and technology.
Recorded on Day 2 of Data Connect 2024, Michael sits down with Kim Theis, CEO of AbeaData, for her first appearance both at Data Connect and in Columbus, Ohio. Kim shares insights from her recent talk at the conference, where she focused on maximizing ROI from data-driven projects, a topic she has extensively researched. Since transitioning from her role at PayPal, Kim has been on a mission to help individuals and organizations better understand the journey toward adopting data products and the tangible benefits they can offer.
About our host Michael Toland: Michael is a Product Management Coach and Consultant with Pathfinder Product, a Test Double Operation. Since 2016, Michael has worked on large-scale system modernizations and migration initiatives at Verizon. Outside his professional career, Michael serves as the Treasurer for the New Leaders Council, mentors with Venture for America, sings with the Columbus Symphony, and writes satire for his blog Dignified Product. He is excited to discuss data product management with the podcast audience. Connect with Michael on LinkedIn.
About our guest Kim Theis: Kim is the CEO and co-founder of AbeaData, a company transforming how data is leveraged as a product and driving the success of AI initiatives across industries. With over 20 years of experience, Kim has a proven track record of delivering data innovation and excellence in various leadership roles. Before founding AbeaData, she served as the Head of Intelligence Automation at PayPal, where she reorganized the team to support a decentralized data framework, launched a data mesh in under a year, and led the team that pioneered the world's first open-source Data Contract. Her extensive experience also includes executive data consulting and technology leadership roles within Fortune 500 and FTSE 100 companies. Connect with Kim on LinkedIn.
All views and opinions expressed are those of the individuals and do not necessarily reflect their employers or anyone else.
Join the conversation on LinkedIn. Apply to be a guest or nominate a practitioner. Do you love what you're listening to? Please rate and review the podcast, and share it with fellow practitioners you know. Your support helps us reach more listeners and continue providing valuable insights!
Summary
Data contracts are both an enforcement mechanism for data quality and a promise to downstream consumers. In this episode Tom Baeyens returns to discuss the purpose and scope of data contracts, emphasizing their importance in achieving reliable analytical data and preventing issues before they arise. He explains how data contracts can be used to enforce guarantees and requirements, and how they fit into the broader context of data observability and quality monitoring. The discussion also covers the challenges and benefits of implementing data contracts, the organizational impact, and the potential for standardization in the field.
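To illustrate what "enforcing guarantees" can look like inside a pipeline, here is a minimal Python sketch of a contract expressed as a list of named checks, with a gate that stops a batch from moving downstream when any check fails. This is an assumption-laden illustration, not Soda's contract syntax or any tool's API: `Check`, `ContractViolation`, `enforce`, and the example checks are all hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Callable

Rows = list[dict[str, Any]]


@dataclass
class Check:
    """One guarantee in the contract: a name plus a predicate over the batch."""
    name: str
    passes: Callable[[Rows], bool]


class ContractViolation(Exception):
    """Raised when a batch breaks the contract, halting propagation."""


def enforce(rows: Rows, checks: list[Check]) -> Rows:
    """Return the batch unchanged if every check passes, otherwise raise."""
    failed = [check.name for check in checks if not check.passes(rows)]
    if failed:
        raise ContractViolation(", ".join(failed))
    return rows


# Hypothetical contract for an orders batch.
contract = [
    Check("id is never null", lambda rows: all(r.get("id") is not None for r in rows)),
    Check("amount is non-negative", lambda rows: all(r["amount"] >= 0 for r in rows)),
    Check("batch is not empty", lambda rows: len(rows) > 0),
]

good = [{"id": 1, "amount": 10.0}, {"id": 2, "amount": 0.0}]
bad = [{"id": None, "amount": -5.0}]

published = enforce(good, contract)  # passes through unchanged
# enforce(bad, contract) would raise ContractViolation
```

Raising instead of logging is the design choice that makes this a gate: a failing batch never reaches the next step, so consumers see late data rather than wrong data.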
Announcements
Hello and welcome to the Data Engineering Podcast, the show about modern data management.
Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst is an end-to-end data lakehouse platform built on Trino, the query engine Apache Iceberg was designed for, with complete support for all table formats including Apache Iceberg, Hive, and Delta Lake. Trusted by teams of all sizes, including Comcast and Doordash. Want to see Starburst in action? Go to dataengineeringpodcast.com/starburst and get $500 in credits to try Starburst Galaxy today, the easiest and fastest way to get started using Trino.
At Outshift, the incubation engine from Cisco, they are driving innovation in AI, cloud, and quantum technologies with the powerful combination of enterprise strength and startup agility. Their latest innovation for the AI ecosystem is Motific, addressing a critical gap in going from prototype to production with generative AI. Motific is your vendor and model-agnostic platform for building safe, trustworthy, and cost-effective generative AI solutions in days instead of months. Motific provides easy integration with your organizational data, combined with advanced, customizable policy controls and observability to help ensure compliance throughout the entire process. Move beyond the constraints of traditional AI implementation and ensure your projects are launched quickly and with a firm foundation of trust and efficiency. Go to motific.ai today to learn more!
Your host is Tobias Macey and today I'm interviewing Tom Baeyens about using data contracts to build a clearer API for your data.

Interview
Introduction
How did you get involved in the area of data management?
Can you describe the scope and purpose of data contracts in the context of this conversation?
In what way(s) do they differ from data quality/data observability?
Data contracts are also known as the API for data, can you elaborate on this?
What are the types of guarantees and requirements that you can enforce with these data contracts?
What are some examples of constraints or guarantees that cannot be represented in these contracts?
Are data contracts related to the shift-left?
The obvious application of data contracts is in the context of pipeline execution flows to prevent failing checks from propagating further in the data flow. What are some of the other ways that these contracts can be integrated into an organization's data ecosystem?
How did you approach the design of the syntax and implementation for Soda's data contracts?
Guarantees and constraints around data in different contexts have been implemented in numerous tools and systems. What are the areas of overlap in e.g. dbt, Great Expectations?
Are there any emerging standards or design patterns around data contracts/guarantees that will help encourage portability and integration across tooling/platform contexts?
What are the most interesting, innovative, or unexpected ways that you have seen data contracts used?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on data contracts at Soda?
When are data contracts the wrong choice?
What do you have planned for the future of data contracts?

Contact Info
LinkedIn

Parting Question
From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements
Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.
Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.

Links
Soda
Podcast Episode
JBoss
Data Contract
Airflow
Unit Testing
Integration Testing
OpenAPI
GraphQL
Circuit Breaker Pattern
SodaCL
Soda Data Contracts
Data Mesh
Great Expectations
dbt Unit Tests
Open Data Contracts
ODCS == Open Data Contract Standard
ODPS == Open Data Product Specification

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
Jean-Georges Perrin is a serial startup founder, currently co-founder of AbeaData [https://abeadata.com/], and co-author of "Implementing Data Mesh." He championed PayPal's data contract project, which is now part of Bitol and the Linux Foundation. In this episode, JGP speaks about building and maintaining open-source data contract solutions using open standards. He shares a lot about why and how he came to it and the challenges of maintaining it to avoid appropriation of the solution. JGP discusses how they balance the interests of different groups in developing a community around open data contract standards. More importantly, he shares how data contracts can positively change the life of every data engineer.
Check out JGP's LinkedIn.
Check out Bitol - Open Standards for Data Contracts and become a contributor.
Andrew Jones, principal engineer at GoCardless, is the author of the book "Driving Data Quality with Data Contracts." During this session, we talked a lot about what a data platform is, who data platform engineers are, what it takes to make a data platform reliable, and, most importantly, how Andrew and his team managed to build a reliable platform at GoCardless. Sure enough, we touched a little on data contracts, their implementation, and the possibility of vendors doing the same as Andrew's team did.
Andrew's LinkedIn - https://www.linkedin.com/in/andrewrhysjones/
This presentation explores the challenges and evolution in data and technology, emphasizing the growth of data sources, user concurrency, and distributed ownership. It delves into the strategic significance of a unified analytics tier to enable a Phase 2 of virtualization. Key components like an Entitlement Service, Data Catalog(s), MALT, and Data Contracts are discussed, along with the lessons learned that forced those technologies and patterns to evolve as well. This presentation underscores the importance of aligning with business outcomes, tracking the influence on Total Cost of Ownership (TCO), and establishing or improving a governance strategy as the organization matures.