
Topic: Data Modelling

Tags: data_governance · data_quality · metadata_management

355 tagged activities · Newest first

[Activity trend chart: peak of 18 activities per quarter, 2020-Q1 through 2026-Q1]

Summary In this episode of the Data Engineering Podcast Vijay Subramanian, founder and CEO of Trace, talks about metric trees - a new approach to data modeling that directly captures a company's business model. Vijay shares insights from his decade-long experience building data practices at Rent the Runway and explains how the modern data stack has led to a proliferation of dashboards without a coherent way for business consumers to reason about cause, effect, and action. He explores how metric trees differ from and interoperate with other data modeling approaches and serve as a backend for analytical workflows, offering concrete examples such as modeling Uber's revenue drivers and customer journeys. Vijay also discusses the potential of AI agents operating on metric trees to execute workflows, organizational patterns for defining inputs and outputs with business teams, and a vision for analytics that becomes invisible infrastructure embedded in everyday decisions.
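
The episode describes metric trees conceptually rather than prescribing an implementation, but a minimal sketch can make the idea concrete. The decomposition below (revenue into trips and fares, loosely echoing the Uber example above) and all names are illustrative, not Trace's actual model:

```python
# Illustrative sketch of a metric tree; not Trace's implementation.
# The metric names and decomposition are hypothetical, loosely based
# on the Uber revenue example mentioned in the episode summary.
from dataclasses import dataclass, field


@dataclass
class Metric:
    name: str
    value: float = 0.0
    # child metrics that drive this one (e.g. revenue = trips * avg_fare)
    drivers: list["Metric"] = field(default_factory=list)
    # how the drivers combine; here just "product" or "sum"
    op: str = "product"

    def compute(self) -> float:
        if not self.drivers:
            return self.value
        result = 1.0 if self.op == "product" else 0.0
        for d in self.drivers:
            v = d.compute()
            result = result * v if self.op == "product" else result + v
        return result


# revenue = completed_trips * average_fare
# completed_trips = active_riders * trips_per_rider
tree = Metric("revenue", op="product", drivers=[
    Metric("completed_trips", op="product", drivers=[
        Metric("active_riders", value=50_000),
        Metric("trips_per_rider", value=4.2),
    ]),
    Metric("average_fare", value=18.50),
])
print(tree.compute())  # walks the tree from leaf inputs to the output metric
```

Because each output metric is tied to the inputs that drive it, a consumer (or an AI agent) can trace a movement in revenue down to the leaf that changed, which is the cause-and-effect reasoning the episode argues dashboards lack.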

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

Data teams everywhere face the same problem: they're forcing ML models, streaming data, and real-time processing through orchestration tools built for simple ETL. The result? Inflexible infrastructure that can't adapt to different workloads. That's why Cash App and Cisco rely on Prefect. Cash App's fraud detection team got what they needed - flexible compute options, isolated environments for custom packages, and seamless data exchange between workflows. Each model runs on the right infrastructure, whether that's high-memory machines or distributed compute. Orchestration is the foundation that determines whether your data team ships or struggles. ETL, ML model training, AI engineering, streaming - Prefect runs it all from ingestion to activation in one platform. Whoop and 1Password also trust Prefect for their data operations. If these industry leaders use Prefect for critical workflows, see what it can do for you at dataengineeringpodcast.com/prefect.

Data migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.

Your host is Tobias Macey and today I'm interviewing Vijay Subramanian about metric trees and how they empower more effective and adaptive analytics.

Interview

- Introduction
- How did you get involved in the area of data management?
- Can you describe what metric trees are and their purpose?
- How do metric trees relate to metric/semantic layers?
- What are the shortcomings of existing data modeling frameworks that prevent effective use of those assets?
- How do metric trees build on top of existing investments in dimensional data models?
- What are some strategies for engaging with the business to identify metrics and their relationships?
- What are your recommendations for storage, representation, and retrieval of metric trees?
- How do metric trees fit into the overall lifecycle of organizational data workflows?
- Creating any new data asset introduces overhead of maintenance, monitoring, and evolution. How do metric trees fit into the existing testing and validation frameworks that teams rely on for dimensional modeling?
- What are some of the key differences in useful evaluation/testing that teams need to develop for metric trees?
- How do metric trees assist in context engineering for AI-powered self-serve access to organizational data?
- What are the most interesting, innovative, or unexpected ways that you have seen metric trees used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on metric trees and operationalizing them at Trace?
- When is a metric tree the wrong abstraction?
- What do you have planned for the future of Trace and applications of metric trees?

Contact Info

- LinkedIn

Parting Question

- From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

- Thank you for listening! Don't forget to check out our other shows.
Podcast.init covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.

Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.

If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.

Links

- Metric Tree
- Trace
- Modern Data Stack
- Hadoop
- Vertica
- Luigi
- dbt
- Ralph Kimball
- Bill Inmon
- Metric Layer
- Dimensional Data Warehouse
- Master Data Management
- Data Governance
- Financial P&L (Profit and Loss)
- EBITDA == Earnings before interest, taxes, depreciation and amortization

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Energy flexibility is playing an increasingly fundamental role in the UK energy market. With the adoption of renewable energy sources such as EVs, solar panels, and domestic and commercial batteries, the number of flexible assets is soaring - making aggregation and flexibility trading vastly more complex and requiring large amounts of data modelling and forecasting. To address this challenge, Flexitricity adopted MLOps best practices to meet the needs of scaling energy demand in the UK.

The session will cover:

- The complex technical challenge of energy flexibility in 2025.

- The critical requirement to invest in technology and skillsets.

- A real-life view of how machine learning operations (MLOps) scaled Flexitricity’s data science model development.

- How innovations in technology can support and optimise delivering on energy flexibility. 

The audience will gain insight into:

- The challenge of building data science models to keep up with scaling demand.

- How MLOps best practices can be adopted to drive efficiency and scale to 10,000+ data science experiments per year (see the sketch after this list).

- Lessons learned from adopting MLOps pipelines.
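
The abstract doesn't name Flexitricity's tooling, but experiment tracking is a core MLOps practice for running experiments at that scale. Below is a minimal, hedged sketch using MLflow; the experiment, parameter, and metric names are invented for illustration:

```python
# Minimal experiment-tracking sketch with MLflow (pip install mlflow).
# Flexitricity's actual stack isn't named in the abstract; the names
# and values here are invented placeholders.
import mlflow

mlflow.set_experiment("demand-forecast")

for window in (24, 48, 168):  # candidate look-back windows, in hours
    with mlflow.start_run():
        mlflow.log_param("lookback_hours", window)
        # ... train a forecasting model for this window here ...
        mlflow.log_metric("mape", 0.08)  # placeholder accuracy score
```

Recording every run's parameters and scores in one place is what makes thousands of experiments per year comparable rather than chaotic.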

Join Sami Hero and Tammie Coles as they share how Ellie is reinventing data modeling with AI-native tools that empower both technical and non-technical users. With CData Embedded Cloud, Ellie brings live metadata and data models from systems like Snowflake, Databricks, and Oracle Financials into a unified modeling workspace. Their platform translates legacy structures into human-readable insights, letting users interact with a copilot-style assistant to discover, refine, and maintain data models faster—with less reliance on analysts.

You’ll see how Ellie uses generative AI to recommend new entities, reconcile differences between models and live systems, and continuously document evolving data environments. Learn how corporations are using Ellie and CData together to scale high-quality data modeling across teams, reducing rework, accelerating delivery of analytics-ready models, and making enterprise architecture accessible to the business.

For years, data engineering was a story of predictable pipelines: move data from point A to point B. But AI just hit the reset button on our entire field. Now, we're all staring into the void, wondering what's next. The fundamentals haven't changed - the traditional areas of data governance, data management, and data modeling remain as challenging as ever - but everything else is up for grabs.

This talk will cut through the noise and explore the future of data engineering in an AI-driven world. We'll examine how team structures will evolve, why agentic workflows and real-time systems are becoming non-negotiable, and how our focus must shift from building dashboards and analytics to architecting for automated action. The reset button has been pushed. It's time for us to invent the future of our industry.

Analytical data product success is traditionally measured with classic reliability metrics. If we were ambitious, we might track user engagement through dashboard views or self-serve activity, but these are blunt, woolly indicators at best. The real goal was always to enable better decisions, yet we often struggle to measure whether our data products actually help. Conversational BI changes this equation. Now we can see the exact questions users are asking, what follow-ups they need, and where the data model delights or frustrates them. This creates a richer feedback loop than ever before, but it also puts our data model front and centre, exposed directly to business users in a way that makes design quality impossible to hide.

This session will recap the foundations of good data product design, then dive into what conversational BI means for analytics teams. How do we design models that give the best foundation? How can we capture and interpret this new stream of usage feedback? What does success look like? We'll answer all of these questions and more.

Face To Face
by Shachar Meir, Guy Fighel (Hetz Ventures), Rob Hulme, Sarah Levy (Euno), Harry Gollop (Cognify Search), Joe Reis (DeepLearning.AI)

Practicing analytics well takes more than just tools and tech. It requires data modeling practices that unify and empower all teams within analytics, from engineers to analysts. This is especially true as AI becomes a part of analytics. Without a governed data model that provides consistent data interpretation, AI tools are left to guess. Join panelists Joe Reis, Sarah Levy, Harry Gollop, Rob Hulme, Shachar Meir, and Guy Fighel, as they share battle-tested advice on overcoming conflicting definitions and accurately mapping business intent to data, reports and dashboards at scale. This panel is for data & analytics engineers seeking a clear framework to capture business logic across layers, and for data leaders focused on building a reliable foundation for Gen AI.

Despite claims to the contrary, dimensional modelling and star schemas are alive and well in the modern data world. But whilst developers might have great technical skills and understand how to build a star schema, they may lack the business domain knowledge to ensure that what they deliver is fit for use by analysts and self-service users. On the flip side, these end users often know what they want and need from a data platform, but struggle to explain it in a way that makes it easy for developers to implement.

How can we improve the requirements gathering process to make sure we avoid the tensions that can arise from this?

This session will cover a data modelling requirements approach that looks to bridge the gap between business and IT, using an end-to-end process for working with business users to collaboratively design a dimensional model - making sure you build super star schemas and turn yourself into a data modelling superstar.
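
To make the target of that collaboration concrete, here is a minimal, illustrative star schema (one fact table joined to dimension tables); the table and column names are invented, and SQLite is used only so the sketch runs anywhere:

```python
# A minimal, illustrative star schema: one fact table joined to two
# dimensions. Table and column names are invented for this example.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT, region TEXT);
CREATE TABLE dim_date     (date_key INTEGER PRIMARY KEY, iso_date TEXT, year INTEGER);
CREATE TABLE fact_sales   (
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    date_key     INTEGER REFERENCES dim_date(date_key),
    amount       REAL
);
INSERT INTO dim_customer VALUES (1, 'Acme', 'North'), (2, 'Globex', 'South');
INSERT INTO dim_date     VALUES (20240101, '2024-01-01', 2024);
INSERT INTO fact_sales   VALUES (1, 20240101, 120.0), (2, 20240101, 80.0);
""")

# The analyst-facing question the model is designed for: sales by region
for row in con.execute("""
    SELECT c.region, SUM(f.amount)
    FROM fact_sales f JOIN dim_customer c USING (customer_key)
    GROUP BY c.region
"""):
    print(row)
```

The requirements conversation the session describes is essentially about agreeing on which facts, dimensions, and grain belong in a model like this before any of it is built.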

About Johnny: Johnny currently works as a data and analytics consultant. He’s been working with Business Intelligence software since 2007, specialising in full-stack data platform development since 2016. He’s a self-confessed Business Intelligence geek and in his spare time runs Greyskull Analytics, a website, Substack, and YouTube channel where he likes to nerd out about all things data.

The Big Book of Data Science. Part I: Data Processing

There are already excellent books on software programming for data processing and data transformation, Wes McKinney’s for instance. This book, reflecting on my own industrial and teaching experience, tries to flatten the steep learning curve that newcomers to the field must climb before they are ready to tackle real data science and AI challenges. In this regard, this book differs from other books in that:

It assumes zero software programming knowledge. This instructional design is intentional given the book’s aim to open the practice of data science to anyone interested in data exploration and analysis irrespective of their previous background.

It follows an incremental approach to facilitate the assimilation of sometimes arcane software techniques for manipulating data.

It is practice oriented to ensure readers can apply what they learn in their daily practices.

It illustrates how to use generative AI to help you become a more productive data scientist and AI engineer.

By reading and working through the labs included in this book, you will develop the software programming skills required to successfully contribute to the data understanding and data preparation stages involved in any data-related project. You will become proficient at manipulating and transforming datasets in industrial contexts and at producing clean, reliable datasets that can drive accurate analysis and informed decision-making. Moreover, you will be prepared to develop and deploy dashboards and visualizations supporting the insights and conclusions in the deployment stage.
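
As a flavour of the data-preparation skills described (not an excerpt from the book), here is a short pandas sketch over invented data: drop incomplete records, coerce messy strings to numbers, and aggregate:

```python
# A small data-preparation sketch in the spirit described above; the
# data and column names are invented, and this is not from the book.
import pandas as pd

raw = pd.DataFrame({
    "customer": ["Ann", "Ann", "Bob", None],
    "spend": ["12.5", "n/a", "40", "7"],
})

clean = (
    raw.dropna(subset=["customer"])  # drop rows with no customer
       .assign(spend=lambda d: pd.to_numeric(d["spend"], errors="coerce"))
       .dropna(subset=["spend"])     # discard unparseable amounts
       .groupby("customer", as_index=False)["spend"].sum()
)
print(clean)  # one clean, analysis-ready row per customer
```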

Data modelling and evaluation are not covered in this book. We are working on a second installment of the book series illustrating the application of statistical and machine learning techniques to derive data insights.

In this discussion, I sit down with data veterans Remco Broekmans and Marco Wobben to explore why so many data projects fail. They argue that the problem isn't the technology, but a fundamental misunderstanding of communication, culture, and long-term strategy.

The conversation goes deep into the critical shift from being a "hardcore techie" to focusing on translating business needs into data models. They use the classic "involved party" data modeling pattern as a prime example of how abstract IT jargon creates a massive disconnect with the business.

Marco shares a fascinating (and surprising) case study of the Dutch Railroad organization, which has been engaged in an 18-year information modeling "program" - not a project - to manage its immense complexity. This sparks a deep dive into the cultural and work-ethic differences between the US and Europe, contrasting the American short-term, ROI-driven "project" mindset with the European capacity for long-term, foundational "programs".

Finally, they tackle the role of AI. Is it a silver bullet or just the latest shiny object? They conclude that AI's best use is as an "intern" or "assistant", a tool to brainstorm, ask questions, and handle initial prototyping, but never as a replacement for the deep, human-centric work of understanding a business.

Timestamps:

- 00:00 - Introduction
- 01:09 - Marco Wobben introduces his 25-year journey in information modeling.
- 01:56 - Remco Broekmans reintroduces himself and his focus on the communication aspect of data.
- 03:22 - The progression from hardcore techie to focusing on communication over technology.
- 08:16 - Why is communication in data and IT projects so difficult?
- 09:49 - The "Involved Party" problem: a perfect example of where IT communication goes wrong with the business.
- 13:35 - The essence of IT is automating the communication that happens on the business side.
- 18:39 - Discussing a client with 20,000 distinct business terms in their information model.
- 21:55 - The story of the Dutch Railroad's 18-year information modeling program that reduced incident response from 4 hours to 2 seconds.
- 27:25 - Project vs. program: a key mindset difference between the US and Europe.
- 34:18 - The danger of chasing shiny new tools like AI without getting the fundamentals right first.
- 39:55 - Where does AI fit into the world of data modeling?
- 43:34 - Why you can't trust AI to be the expert, especially with specialized business jargon.
- 47:18 - The role of risk in trusting AI, using a self-driving car analogy.
- 53:27 - Cultural differences in work pressure and ethics between the US and the Netherlands.
- 59:29 - Why personality and communication skills are more important than a PhD for data modelers.
- 01:03:38 - What is the purpose of an AI-run company with no human benefit?
- 01:11:21 - Using AI as an instructive tool to improve your own skills, not just to get an answer.
- 01:14:12 - How AI can be used as a "sidekick" to ask dumb questions and help you think.
- 01:18:00 - Where to find Marco and Remco online

MongoDB Essentials

Get started fast with MongoDB architecture, core operations, and AI-powered tools for building intelligent applications. Free with your book: DRM-free PDF version + access to Packt's next-gen Reader.

Key Features

- Quickly grasp the MongoDB architecture and distributed design principles
- Learn practical data modeling, CRUD operations, and aggregation techniques
- Explore AI-enabled tools for building intelligent applications with MongoDB
- Purchase of the print or Kindle book includes a free PDF eBook

Book Description

Modern applications demand flexibility, speed, and intelligence, and MongoDB delivers all three. This mini guide wastes no time, offering a concise, practical introduction to handling data flexibly and efficiently with MongoDB. MongoDB Essentials helps developers, architects, database administrators, and decision makers get started quickly and confidently. The book introduces MongoDB’s core principles, from the document data model to its distributed architecture, including replica sets and sharding. It then helps you build hands-on skills such as installing MongoDB, designing effective data schemas, performing CRUD operations, and working with the aggregation pipeline. You’ll discover performance tips along the way and learn how AI-enhanced tools like Atlas Search and Atlas Vector Search power intelligent application development. With clear explanations and a practical approach, this book gives you the foundation and skills you need to start working with MongoDB right away. Email sign-up and proof of purchase required.

What you will learn

- Understand MongoDB's document model and architecture
- Set up local MongoDB deployments quickly
- Design schemas tailored to application access patterns
- Perform CRUD and aggregation operations efficiently
- Use tools to optimize query performance and scalability
- Explore AI-powered features such as Atlas Search and Atlas Vector Search

Who this book is for

This book is for anyone looking to explore MongoDB, including students, developers, system architects, managers, database administrators, and decision makers who want to familiarize themselves with what a modern database can offer. Whether you're building your first application or exploring what MongoDB can do for you, this book is the ideal starting point for your MongoDB journey.
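
To give a taste of the CRUD and aggregation operations the book covers, here is a minimal sketch using the PyMongo driver; it assumes a local mongod on the default port, and the collection and field names are invented:

```python
# CRUD and an aggregation pipeline in miniature, assuming a local
# mongod on the default port and `pip install pymongo`; the database,
# collection, and field names are invented for illustration.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
orders = client["shop"]["orders"]

orders.insert_one({"sku": "tea", "qty": 3, "price": 4.5})    # Create
doc = orders.find_one({"sku": "tea"})                        # Read
orders.update_one({"sku": "tea"}, {"$set": {"qty": 4}})      # Update
orders.delete_many({"qty": {"$lt": 1}})                      # Delete

# Aggregation pipeline: revenue per SKU
pipeline = [
    {"$group": {"_id": "$sku",
                "revenue": {"$sum": {"$multiply": ["$qty", "$price"]}}}}
]
for row in orders.aggregate(pipeline):
    print(row)
```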

The Official MongoDB Guide

The official guide to MongoDB architecture, tools, and cloud features, written by leading MongoDB subject matter experts to help you build secure, scalable, high-performance applications.

Key Features

- Design resilient, secure solutions with high performance and scalability
- Streamline development with modern tooling, indexing, and AI-powered workflows
- Deploy and optimize in the cloud using advanced MongoDB Atlas features
- Purchase of the print or Kindle book includes a free PDF eBook

Book Description

Delivering secure, scalable, and high-performance applications is never easy, especially when systems must handle growth, protect sensitive data, and perform reliably under pressure. The Official MongoDB Guide addresses these challenges with guidance from MongoDB’s top subject matter experts, so you learn proven best practices directly from those who know the technology inside out. This book takes you from core concepts and architecture through to advanced techniques for data modeling, indexing, and query optimization, supported by real-world patterns that improve performance and resilience. It offers practical coverage of developer tooling, IDE integrations, and AI-assisted workflows that will help you work faster and more effectively. Security-focused chapters walk you through authentication, authorization, encryption, and compliance, while chapters dedicated to MongoDB Atlas showcase its robust security features and demonstrate how to deploy, scale, and leverage platform-native capabilities such as Atlas Search and Atlas Vector Search. By the end of this book, you’ll be able to design, build, and manage MongoDB applications with the confidence that comes from learning directly from the experts shaping the technology.

What you will learn

- Build secure, scalable, and high-performance applications
- Design efficient data models and indexes for real workloads
- Write powerful queries to sort, filter, and project data
- Protect applications with authentication and encryption
- Accelerate coding with AI-powered and IDE-based tools
- Launch, scale, and manage MongoDB Atlas with confidence
- Unlock advanced features like Atlas Search and Atlas Vector Search
- Apply proven techniques from MongoDB's own engineering leaders

Who this book is for

This book is for developers, database professionals, architects, and platform teams who want to get the most out of MongoDB. Whether you’re building web apps, APIs, mobile services, or backend systems, the concepts covered here will help you structure data, improve performance, and deliver value to your users. No prior experience with MongoDB is required, but familiarity with databases and programming will be helpful.
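
As a small illustration of the indexing and query-optimization topics mentioned above (not the book's own example), a hedged PyMongo sketch, again assuming a local mongod and invented names, that creates a compound index and inspects the query plan:

```python
# Index-driven query tuning in miniature, assuming a local mongod and
# `pip install pymongo`; collection and field names are invented.
from pymongo import ASCENDING, MongoClient

client = MongoClient("mongodb://localhost:27017")
orders = client["shop"]["orders"]

# A compound index matching the query's filter and sort keys
orders.create_index([("sku", ASCENDING), ("created_at", ASCENDING)])

# explain() reports whether the planner used the index (an IXSCAN
# stage) instead of scanning the whole collection (COLLSCAN)
plan = orders.find({"sku": "tea"}).sort("created_at", ASCENDING).explain()
print(plan["queryPlanner"]["winningPlan"])
```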

Data Modeling with Snowflake - Second Edition

Data Modeling with Snowflake provides a clear and practical guide to mastering data modeling tailored to the Snowflake Data Cloud. By integrating foundational principles of database modeling with Snowflake's unique features and functionality, this book empowers you to create scalable, cost-effective, and high-performing data solutions.

What this Book will help me do

- Apply universal data modeling concepts within the Snowflake platform effectively.
- Leverage Snowflake's features such as Time Travel and Zero-Copy Cloning for optimized data solutions.
- Understand and utilize advanced techniques like Data Vault and Data Mesh for scalable data architecture.
- Master handling semi-structured data in Snowflake using practical recipes and examples.
- Achieve cost efficiency and resource optimization by aligning modeling principles with Snowflake's architecture.

Author(s)

Serge Gershkovich is an accomplished data engineer and seasoned professional in data architecture and modeling. With a passion for simplifying complex concepts, Serge's work leverages his years of hands-on experience to guide readers in mastering both foundational and advanced data management practices. His clear and practical approach ensures accessibility for all levels.

Who is it for?

This book is ideal for data developers and engineers seeking practical modeling guidance within Snowflake. It's suitable for data analysts looking to broaden their database design expertise, and for database beginners aiming to get a head start in structuring data. Professionals new to Snowflake will also find its clear explanations of key features aligned with modeling techniques invaluable.
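
For a flavour of the Snowflake features the book covers, here is a hedged sketch of Zero-Copy Cloning and Time Travel issued from Python via the Snowflake connector; the credentials are placeholders and the table name is invented:

```python
# A hedged sketch of Zero-Copy Cloning and Time Travel, assuming
# snowflake-connector-python and valid credentials; the table name
# and connection placeholders are invented for illustration.
import snowflake.connector

conn = snowflake.connector.connect(account="...", user="...", password="...")
cur = conn.cursor()

# Zero-Copy Cloning: a writable copy that shares storage with the source
cur.execute("CREATE TABLE orders_dev CLONE orders")

# Time Travel: read the table as it existed one hour ago
cur.execute("SELECT COUNT(*) FROM orders AT(OFFSET => -3600)")
print(cur.fetchone())
```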

Summary In this episode of the Data Engineering Podcast Serge Gershkovich, head of product at SQL DBM, talks about the socio-technical aspects of data modeling. Serge shares his background in data modeling and highlights its importance as a collaborative process between business stakeholders and data teams. He debunks common misconceptions that data modeling is optional or secondary, emphasizing its crucial role in ensuring alignment between business requirements and data structures. The conversation covers challenges in complex environments, the impact of technical decisions on data strategy, and the evolving role of AI in data management. Serge stresses the need for business stakeholders' involvement in data initiatives and a systematic approach to data modeling, warning against relying solely on technical expertise without considering business alignment.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

Data migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.

Enterprises today face an enormous challenge: they’re investing billions into Snowflake and Databricks, but without strong foundations, those investments risk becoming fragmented, expensive, and hard to govern. And that’s especially evident in large, complex enterprise data environments. That’s why companies like DirecTV and Pfizer rely on SqlDBM. Data modeling may be one of the most traditional practices in IT, but it remains the backbone of enterprise data strategy. In today’s cloud era, that backbone needs a modern approach built natively for the cloud, with direct connections to the very platforms driving your business forward. Without strong modeling, data management becomes chaotic, analytics lose trust, and AI initiatives fail to scale. SqlDBM ensures enterprises don’t just move to the cloud—they maximize their ROI by creating governed, scalable, and business-aligned data environments. If global enterprises are using SqlDBM to tackle the biggest challenges in data management, analytics, and AI, isn’t it worth exploring what it can do for yours? Visit dataengineeringpodcast.com/sqldbm to learn more.

Your host is Tobias Macey and today I'm interviewing Serge Gershkovich about how and why data modeling is a sociotechnical endeavor.

Interview

- Introduction
- How did you get involved in the area of data management?
- Can you start by describing the activities that you think of when someone says the term "data modeling"?
- What are the main groupings of incomplete or inaccurate definitions that you typically encounter in conversation on the topic?
- How do those conceptions of the problem lead to challenges and bottlenecks in execution?
- Data modeling is often associated with data warehouse design, but it also extends to source systems and unstructured/semi-structured assets. How does the inclusion of other data localities help in the overall success of a data/domain modeling effort?
- Another aspect of data modeling that often consumes a substantial amount of debate is which pattern to adhere to (star/snowflake, data vault, one big table, anchor modeling, etc.). What are some of the ways that you have found effective to remove that as a stumbling block when first developing an organizational domain representation?
- While the overall purpose of data modeling is to provide a digital representation of the business processes, there are inevitable technical decisions to be made. What are the most significant ways that the underlying technical systems can help or hinder the goals of building a digital twin of the business?
- What impact (positive and negative) are you seeing from the introduction of LLMs into the workflow of data modeling?
- How does tool use (e.g. MCP connection to warehouse/lakehouse) help when developing the transformation logic for achieving a given domain representation?
- What are the most interesting, innovative, or unexpected ways that you have seen organizations address the data modeling lifecycle?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working with organizations implementing a data modeling effort?
- What are the overall trends in the ecosystem that you are monitoring related to data modeling practices?

Contact Info

- LinkedIn

Parting Question

- From your perspective, what is the biggest gap in the tooling or technology for data management today?

Links

- sqlDBM
- SAP
- Joe Reis
- ERD == Entity Relation Diagram
- Master Data Management
- dbt
- Data Contracts
- Data Modeling With Snowflake book by Serge (affiliate link)
- Type 2 Dimension
- Data Vault
- Star Schema
- Anchor Modeling
- Ralph Kimball
- Bill Inmon
- Sixth Normal Form
- MCP == Model Context Protocol

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Learning Tableau 2025 - Sixth Edition

"Learning Tableau 2025" provides a comprehensive guide to mastering Tableau's latest features, including advanced AI capabilities like Tableau Pulse and Agent. This book, authored by Tableau expert Joshua N. Milligan, will equip you with the tools to transform complex data into actionable insights and interactive dashboards. What this Book will help me do Learn to use Tableau's advanced AI features, including Tableau Agent and Pulse, to streamline data analysis and automate insights. Develop skills to create and customize dynamic dashboards tailored to interactive data storytelling. Understand and utilize new geospatial functions within Tableau for advanced mapping and analytics. Master Tableau Prep's enhanced data preparation capabilities for efficient data modeling and structuring. Learn to effectively integrate and analyze data from multiple sources, enhancing your ability to extract meaningful insights. Author(s) Joshua N. Milligan, a Tableau Zen Master and Visionary, has years of experience in the field of data visualization and analytics. With a hands-on approach, Joshua combines his expertise and passion for Tableau to make complex topics accessible and engaging. His teaching method ensures that readers gain practical, actionable knowledge. Who is it for? This book is ideal for aspiring business intelligence developers, data analysts, data scientists, and professionals seeking to enhance their data visualization skills. It's suitable for both beginners looking to get started with Tableau and experienced users eager to explore its new features. A Tableau license or access to a 14-day trial is recommended.

We illustrate the power and flexibility of a new extension point in Xarray's data model: "custom indexes" that allow Xarray users to neatly handle complex grids and enable at least one new data model (vector data cubes). We present a whirlwind tour of specific examples to illustrate the power of this feature, and aim to stimulate experimentation during the sprints.
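
As a taste of this extension point, here is a minimal sketch (assuming a recent Xarray, roughly 2022.11 or later, where the index API is still experimental): attaching the built-in PandasIndex to a non-dimension coordinate via set_xindex, the same mechanism through which fully custom Index subclasses plug in:

```python
# A minimal sketch of Xarray's index extension point, assuming a
# recent Xarray where set_xindex and xarray.indexes are available
# (the API is experimental). Data and names are invented.
import numpy as np
import xarray as xr
from xarray.indexes import PandasIndex

ds = xr.Dataset(
    {"temp": ("cell", np.random.rand(4))},
    coords={"lon": ("cell", [0.0, 1.5, 3.0, 4.5])},
)

# Attach an index to the non-dimension coordinate "lon" so it becomes
# selectable; custom Index subclasses are registered the same way.
ds = ds.set_xindex("lon", PandasIndex)
print(ds.sel(lon=3.0))
```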

Xarray has enormous potential as a data model and toolkit for labeled N-D arrays in biology. Originally developed within the geosciences community, it is seeing increased usage in biology, with applications ranging from genomics to image analysis and beyond. However, it has not yet been widely adopted. This presentation will investigate what the blockers have been to wider adoption, showcase the power of Xarray in biology through existing use cases, and present a roadmap for the future of Xarray in biological workflows through recent and upcoming improvements in Xarray.