talk-data.com

Zhamak Dehghani

Speaker · 12 talks

Creator of Data Mesh · Nextdata

Zhamak Dehghani is a pioneering technologist, author, and thought leader known for creating the Data Mesh paradigm and the concept of Autonomous Data Products, implemented by Nextdata OS. Her work has redefined data architecture by promoting decentralized, domain-oriented infrastructure that treats data as a product. Born in Iran, Dehghani holds a Bachelor of Engineering in Computer Software from Shahid Beheshti University and a Master’s in IT Management from the University of Sydney. With over two decades of experience as a software engineer and technologist, she has contributed to multiple patents in distributed systems. As Director of Emerging Technologies at ThoughtWorks, she introduced Data Mesh in 2018. Today, she is the founder and CEO of Nextdata, a software company providing a scalable, federated platform for Autonomous Data Products. Dehghani is also the author of Data Mesh: Delivering Data-Driven Value at Scale and co-author of Software Architecture: The Hard Parts, and a frequent keynote speaker worldwide.

Bio from: Databricks DATA + AI Summit 2023


Talks & appearances

12 activities


As enterprises scale their deployment of Generative AI (Gen AI), a central constraint has come into focus: the primary limitation is no longer model capability, but data infrastructure. Existing platforms, optimized for human interpretation and batch-oriented analytics, are misaligned with the operational realities of autonomous agents that consume, reason over, and act upon data continuously at machine scale. 

In this talk, Zhamak Dehghani — originator of the Data Mesh and a leading advocate for decentralized data architectures — presents a framework for data infrastructure designed explicitly for the AI-native era. She identifies the foundational capabilities required by Gen AI applications: embedded semantics, runtime computational policy enforcement, and agent-centric, context-driven discovery.

The session contrasts the architectural demands of AI with the limitations of today’s fragmented, pipeline-driven systems—systems that rely heavily on human intervention and customized orchestration. Dehghani introduces autonomous data products as the next evolution: self-contained, self-governing services that continuously sense and respond to their environment. She offers an architectural deep dive and showcases their power with real-world use cases.  

Attendees will learn the architecture of “Data 3.0”: how to use Gen AI to transition to this new architecture, and how this new architecture serves Gen AI agents at scale.
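The capabilities the talk names — embedded semantics, runtime computational policy enforcement, and agent-centric discovery — can be illustrated with a minimal sketch. All class, field, and policy names here are assumptions for illustration, not the Nextdata OS API:

```python
from dataclasses import dataclass, field

# Hypothetical sketch of an autonomous data product: it carries its own
# semantics, enforces policy computationally at access time, and exposes
# machine-readable context so agents can discover it without human mediation.
@dataclass
class AutonomousDataProduct:
    name: str
    # Embedded semantics: meaning travels with the data, not in a separate catalog.
    semantics: dict = field(default_factory=dict)
    # Runtime computational policy: predicates evaluated on every access.
    policies: list = field(default_factory=list)

    def describe(self) -> dict:
        """Agent-centric discovery: return machine-readable context."""
        return {"name": self.name, "semantics": self.semantics}

    def serve(self, agent: str, purpose: str, rows: list) -> list:
        """Enforce every policy at read time, per agent and purpose."""
        for policy in self.policies:
            if not policy(agent, purpose):
                raise PermissionError(f"{agent} denied for purpose '{purpose}'")
        return rows

# Example policy: only fraud-detection use may read raw transactions.
product = AutonomousDataProduct(
    name="transactions",
    semantics={"amount": "settled value in USD"},
    policies=[lambda agent, purpose: purpose == "fraud-detection"],
)
print(product.serve("agent-7", "fraud-detection", [{"amount": 10}]))  # → [{'amount': 10}]
```

The key design point the sketch captures is that policy is code evaluated inside the product at runtime, rather than documentation enforced by a human gatekeeper.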


It’s now over six years since the publication of “How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh” by Zhamak Dehghani, an article that had a major impact on the data and analytics industry.

It highlighted major data architecture failures and called for a rethink of data architecture and data provisioning: creating a data supply chain and democratising data engineering so that business domains can build reusable data products, made available as self-governing services.

Since then, we have seen many companies adopt Data Mesh strategies, the repositioning of some software products, and the emergence of new ones that emphasize democratisation. But has what has happened since fully addressed the problems that Data Mesh was intended to solve? And what new problems are arising as organizations try to make data safely available to AI projects at machine scale?

In this unmissable session, Big Data LDN Chair Mike Ferguson sits down with Zhamak Dehghani to discuss what has happened since Data Mesh emerged. The conversation will look at:

● The drivers behind Data Mesh

● Revisiting Data Mesh to clarify what a data product is and what problems Data Mesh is intended to solve

● Did data architecture really change, or are companies still using existing architectures to implement this?

● What about technology to support this: is Data Fabric the answer, or best-of-breed tools?

● How critical is organisation to successful Data Mesh implementation?

● Roadblocks in the way of success, e.g. the lack of metadata standards

● How does Data Mesh impact AI?

● What’s next on the horizon?

Zhamak Dehghani (creator of Data Mesh, CEO of Nextdata) joins me to chat about what she thinks is next in data: autonomous data products, decentralized data and AI, and much more.

Zhamak is one of the people I most respect in our industry. She's a once-in-a-generation phenomenon who will change the trajectory of our industry.

We talked about:

Zhamak’s background
What is Data Mesh?
Domain ownership
Determining what to optimize for with Data Mesh
Decentralization
Data as a product
Self-serve data platforms
Data governance
Understanding Data Mesh
Adopting Data Mesh
Resources on implementing Data Mesh

Links:

Free 30-day code from O'Reilly: https://learning.oreilly.com/get-learning/?code=DATATALKS22
Data Mesh book: https://learning.oreilly.com/library/view/data-mesh/9781492092384/
LinkedIn: https://www.linkedin.com/in/zhamak-dehghani

ML Zoomcamp: https://github.com/alexeygrigorev/mlbookcamp-code/tree/master/course-zoomcamp

Join DataTalks.Club: https://datatalks.club/slack.html

Our events: https://datatalks.club/events.html

Day 1 Afternoon Keynote |  Data + AI Summit 2022

Supercharging our data architecture at Coinbase using Databricks Lakehouse | Eric Sun | Keynote
Partner Connect & Ecosystem Strategy | Zaheera Valani
What are ELT and CDC, and why are all the cool kids doing it? | George Fraser
Analytics without Compromise | Francois Ajenstat
Fireside Chat with Zhamak Dehghani and Arsalan Tavakoli

Connect with us:
Website: https://databricks.com
Facebook: https://www.facebook.com/databricksinc
Twitter: https://twitter.com/databricks
LinkedIn: https://www.linkedin.com/company/data...
Instagram: https://www.instagram.com/databricksinc/

Fireside Chat with Zhamak Dehghani and Arsalan Tavakoli | Keynote Data + AI Summit 2022

Join Zhamak Dehghani, creator of Data Mesh, and Arsalan Tavakoli, co-founder and SVP of Field Engineering at Databricks.


Data Mesh

We're at an inflection point in data, where our data management solutions no longer match the complexity of organizations, the proliferation of data sources, and the scope of our aspirations to get value from data with AI and analytics. In this practical book, author Zhamak Dehghani introduces data mesh, a decentralized sociotechnical paradigm drawn from modern distributed architecture that provides a new approach to sourcing, sharing, accessing, and managing analytical data at scale. Dehghani guides practitioners, architects, technical leaders, and decision makers on their journey from traditional big data architecture to a distributed and multidimensional approach to analytical data management. Data mesh treats data as a product, considers domains as a primary concern, applies platform thinking to create self-serve data infrastructure, and introduces a federated computational model of data governance.

● Get a complete introduction to data mesh principles and its constituents
● Design a data mesh architecture
● Guide a data mesh strategy and execution
● Navigate organizational design to a decentralized data ownership model
● Move beyond traditional data warehouses and lakes to a distributed data mesh
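The four principles the book blurb names — domain ownership, data as a product, a self-serve platform, and federated computational governance — can be sketched in a few lines. The names and the single governance rule below are illustrative assumptions, not anything defined in the book:

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical sketch: each domain owns and serves its data as a product,
# while governance rules are agreed mesh-wide and applied computationally.

def has_owner(product: "DataProduct") -> bool:
    """A federated, mesh-wide rule: every product must declare an owner."""
    return bool(product.owner)

GLOBAL_POLICIES: list[Callable] = [has_owner]

@dataclass
class DataProduct:
    name: str
    domain: str       # domain ownership: the business domain is accountable
    owner: str        # data as a product: a named owner serves consumers
    output_port: dict = field(default_factory=dict)  # the served dataset

    def is_compliant(self) -> bool:
        # Federated computational governance: global rules checked per product,
        # rather than by a central review board.
        return all(policy(self) for policy in GLOBAL_POLICIES)

orders = DataProduct(name="orders", domain="sales", owner="sales-team",
                     output_port={"format": "parquet"})
print(orders.is_compliant())  # a product with a declared owner passes governance
```

In a real mesh the self-serve platform would supply the scaffolding (storage, serving, policy evaluation) so that domain teams only declare what the sketch hard-codes.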

Summary: The data mesh is a thesis that was presented to address the technical and organizational challenges that businesses face in managing their analytical workflows at scale. Zhamak Dehghani introduced the concepts behind this architectural pattern in 2019, and since then it has been gaining popularity, with many companies adopting some version of it in their systems. In this episode Zhamak re-joins the show to discuss the real-world benefits that have been seen, the lessons that she has learned while working with her clients and the community, and her vision for the future of the data mesh.

Announcements

● Hello and welcome to the Data Engineering Podcast, the show about modern data management
● When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
● Atlan is a collaborative workspace for data-driven teams, like GitHub for engineering or Figma for design teams. By acting as a virtual hub for data assets ranging from tables and dashboards to SQL snippets & code, Atlan enables teams to create a single source of truth for all their data assets, and collaborate across the modern data stack through deep integrations with tools like Snowflake, Slack, Looker and more. Go to dataengineeringpodcast.com/atlan today and sign up for a free trial. If you’re a data engineering podcast listener, you get credits worth $3,000 on an annual subscription.
● Modern data teams are dealing with a lot of complexity in their data pipelines and analytical code. Monitoring data quality, tracing incidents, and testing changes can be daunting and often takes hours to days. Datafold helps data teams gain visibility and confidence in the quality of their analytical data through data profiling, column-level lineage, and intelligent anomaly detection. Datafold also helps automate regression testing of ETL code with its Data Diff feature, which instantly shows how a change in ETL or BI code affects the produced data, both on a statistical level and down to individual rows and values. Datafold integrates with all major data warehouses as well as frameworks such as Airflow & dbt, and seamlessly plugs into CI workflows. Go to dataengineeringpodcast.com/datafold today to start a 30-day trial of Datafold.
● Your host is Tobias Macey and today I’m welcoming back Zhamak Dehghani to talk about her work on the data mesh book and the lessons learned over the past two years

Interview

● Introduction
● How did you get involved in the area of data management?
● Can you start by giving a brief recap of the principles of the data mesh and the story behind it?
● How has your view of the principles of the data mesh changed since our conversation in July of 2019?
● What are some of the ways that your work on the data mesh book influenced your thinking on the practical elements of implementing a data mesh?
● What do you view as the as-yet-unknown elements of the technical and social design constructs that are needed for a sustainable data mesh implementation?
● In the opening of your book you state that "Data Mesh is a new approach in sourcing, managing, and accessing data for analytical use cases at scale". As with everything, scale is subjective, but what are some of the heuristics that you rely on for determining when a data mesh is an appropriate solution?
● What are some of the ways that data mesh concepts manifest at the boundaries of organizations?
● While the idea of federated access to data product quanta reduces the amount of coordination necessary at the organizational level, it raises the spectre of more complex logic required for consumers of multiple quanta. How can data mesh implementations mitigate the impact of this problem?
● What are some of the technical components that you have found to be best suited to the implementation of data elements within a mesh?
● What are the technological components that are still missing for a mesh-native data platform?
● How should an organization that wishes to implement a mesh style architecture think about the roles and skills that they will need on staff?

How can vendors factor into the solution?

● What is the role of application developers in a data mesh ecosystem and how do they need to change their thinking around the interfaces that they provide in their products?
● What are the most interesting, innovative, or unexpected ways that you have seen data mesh principles used?
● What are the most interesting, unexpected, or challenging lessons that you have learned while working on data mesh implementations?
● When is a data mesh the wrong approach?
● What do you think the future of the data mesh will look like?

Contact Info

LinkedIn
@zhamakd on Twitter

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Links

Data Engineering Podcast
Data Mesh Interview
Data Mesh Book
Thoughtworks
Expert Systems
OpenLineage

Podcast Episode

Data Mesh Learning

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Support Data Engineering Podcast

Summary: The current trend in data management is to centralize the responsibilities of storing and curating the organization’s information to a data engineering team. This organizational pattern is reinforced by the architectural pattern of data lakes as a solution for managing storage and access. In this episode Zhamak Dehghani shares an alternative approach in the form of a data mesh. Rather than connecting all of your data flows to one destination, empower your individual business units to create data products that can be consumed by other teams. This was an interesting exploration of a different way to think about the relationship between how your data is produced, how it is used, and how to build a technical platform that supports the organizational needs of your business.
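The contrast in that summary — one central lake curated by a single team versus domain-owned products consumed directly by other teams — can be sketched minimally. The names below are hypothetical, chosen only to make the two patterns concrete:

```python
# Centralized pattern: every domain ships records into one lake, and a
# single data engineering team curates that shared store for everyone.
data_lake: list = []

def ingest(domain: str, records: list) -> None:
    """Central ingestion: all flows converge on one destination."""
    data_lake.extend({"domain": domain, **r} for r in records)

# Mesh pattern: each business unit publishes its own data product, and
# other teams consume it directly from the owning domain.
class DomainDataProduct:
    def __init__(self, domain: str):
        self.domain = domain
        self._records: list = []

    def publish(self, records: list) -> None:
        """The owning domain serves its own data."""
        self._records.extend(records)

    def consume(self) -> list:
        # Consumers read from the domain's product, not a central lake.
        return list(self._records)

sales = DomainDataProduct("sales")
sales.publish([{"order_id": 1, "amount": 42}])
# A marketing team consumes the sales product without a central intermediary.
print(sales.consume())  # → [{'order_id': 1, 'amount': 42}]
```

The structural difference is where accountability sits: in the lake sketch the central team owns everything after `ingest`, while in the mesh sketch the sales domain remains responsible for what `consume` returns.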

Announcements

● Hello and welcome to the Data Engineering Podcast, the show about modern data management
● When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With 200Gbit private networking, scalable shared block storage, and a 40Gbit public network, you’ve got everything you need to run a fast, reliable, and bullet-proof data platform. If you need global distribution, they’ve got that covered too with world-wide datacenters including new ones in Toronto and Mumbai. And for your machine learning workloads, they just announced dedicated CPU instances. Go to dataengineeringpodcast.com/linode today to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show!
● To grow your professional network and find opportunities with the startups that are changing the world, AngelList is the place to go. Go to dataengineeringpodcast.com/angel to sign up today.
● You listen to this show to learn and stay up to date with what’s happening in databases, streaming platforms, big data, and everything else you need to know about modern data management. For even more opportunities to meet, listen, and learn from your peers you don’t want to miss out on this year’s conference season. We have partnered with organizations such as O’Reilly Media, Dataversity, and the Open Data Science Conference. Upcoming events include the O’Reilly AI Conference, the Strata Data Conference, and the combined events of the Data Architecture Summit and Graphorum. Go to dataengineeringpodcast.com/conferences to learn more and take advantage of our partner discounts when you register.
● Go to dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, read the show notes, and get in touch.
● To help other people find the show please leave a review on iTunes and tell your friends and co-workers
● Join the community in the new Zulip chat workspace at dataengineeringpodcast.com/chat
● Your host is Tobias Macey and today I’m interviewing Zhamak Dehghani about building a distributed data mesh for a domain oriented approach to data management

Interview

● Introduction
● How did you get involved in the area of data management?
● Can you start by providing your definition of a "data lake" and discussing some of the problems and challenges that they pose?

What are some of the organizational and industry trends that tend to lead to this solution?

You have written a detailed post outlining the concept of a "data mesh" as an alternative to data lakes. Can you give a summary of what you mean by that phrase?

In a domain oriented data model, what are some useful methods for determining appropriate boundaries for the various data products?

What are some of the challenges that arise in this data mesh approach and how do they compare to those of a data lake?

One of the primary complications of any data platform, whether distributed or monolithic, is that of discoverability. How do you approach that in a data mesh scenario?

A corollary to the issue of discovery is that of access
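One way to picture discoverability in a mesh — a hypothetical sketch, not a reference implementation from the episode — is a self-registration model: each data product publishes its own descriptor to a mesh-wide registry, and consumers discover products by querying descriptor fields rather than browsing a centrally curated catalog:

```python
# Hypothetical sketch: products self-register their descriptors, so
# discovery is a query over the mesh rather than a manually maintained list.
class MeshRegistry:
    def __init__(self):
        self._descriptors: dict = {}

    def register(self, name: str, descriptor: dict) -> None:
        """Each data product publishes its own descriptor on deployment."""
        self._descriptors[name] = descriptor

    def find(self, **criteria) -> list:
        """Consumers discover products by matching descriptor fields."""
        return [name for name, d in self._descriptors.items()
                if all(d.get(k) == v for k, v in criteria.items())]

registry = MeshRegistry()
registry.register("orders", {"domain": "sales", "freshness": "daily"})
registry.register("shipments", {"domain": "logistics", "freshness": "hourly"})
print(registry.find(domain="sales"))  # → ['orders']
```

Because registration happens at the product's side, the registry stays current without a central team curating metadata, which is the property the discoverability question is probing.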