talk-data.com

Topic: Data Management

Tags: data_governance, data_quality, metadata_management

Activity Trend: 88 peak/qtr (2020-Q1 to 2026-Q1)

Activities: 1097 · Newest first

Urgent investments in data, analytics and AI use cases have put the spotlight once more on strong data management foundations. "Is our data even ready for upcoming AI, analytics and data sharing initiatives?" is now top of mind for heads of data, CDAOs and their counterparts. Data fabrics have emerged as a long-term, foundational data management architecture that you should now pursue for sustained D&A success. This session will:
1. Help you understand what data fabrics are and what they mean for your data strategy and architecture
2. Help you decide how to build and where to buy
3. Navigate the vendor landscape to assist in tech procurement decisions and aid your fabric journey

In today's data-centric environment, developing a strong data strategy is crucial for aligning with business goals and driving success. Engage with peers to delve into the complexities of creating effective data strategies that support business objectives. Join to share insights, learn from each other, and uncover practical strategies to enhance your data management practices. Peer Meetups are networking sessions that allow you to connect and share with a small group of your peers without Gartner facilitation. Please make every effort to attend your peer meetup, as other attendees look forward to meeting with you.

Summary In this episode of the Data Engineering Podcast we welcome back Nick Schrock, CTO and founder of Dagster Labs, to discuss the evolving landscape of data engineering in the age of AI. As AI begins to impact data platforms and the role of data engineers, Nick shares his insights on how it will ultimately enhance productivity and expand software engineering's scope. He delves into the current state of AI adoption, the importance of maintaining core data engineering principles, and the need for human oversight when leveraging AI tools effectively. Nick also introduces Dagster's new components feature, designed to modularize and standardize data transformation processes, making it easier for teams to collaborate and integrate AI into their workflows. Join in to explore the future of data engineering, the potential for AI to abstract away complexity, and the importance of open standards in preventing walled gardens in the tech industry.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

This episode is brought to you by Coresignal, your go-to source for high-quality public web data to power best-in-class AI products. Instead of spending time collecting, cleaning, and enriching data in-house, use ready-made multi-source B2B data that can be smoothly integrated into your systems via APIs or as datasets. With over 3 billion data records from 15+ online sources, Coresignal delivers high-quality data on companies, employees, and jobs. It is powering decision-making for more than 700 companies across AI, investment, HR tech, sales tech, and market intelligence industries. A founding member of the Ethical Web Data Collection Initiative, Coresignal stands out not only for its data quality but also for its commitment to responsible data collection practices. Recognized as the top data provider by Datarade for two consecutive years, Coresignal is the go-to partner for those who need fresh, accurate, and ethically sourced B2B data at scale. Discover how Coresignal's data can enhance your AI platforms. Visit dataengineeringpodcast.com/coresignal to start your free 14-day trial.

Data migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.

This is a pharmaceutical ad for Soda Data Quality. Do you suffer from chronic dashboard distrust? Are broken pipelines and silent schema changes wreaking havoc on your analytics? You may be experiencing symptoms of Undiagnosed Data Quality Syndrome — also known as UDQS. Ask your data team about Soda. With Soda Metrics Observability, you can track the health of your KPIs and metrics across the business — automatically detecting anomalies before your CEO does. It’s 70% more accurate than industry benchmarks, and the fastest in the category, analyzing 1.1 billion rows in just 64 seconds. And with Collaborative Data Contracts, engineers and business can finally agree on what “done” looks like — so you can stop fighting over column names, and start trusting your data again. Whether you’re a data engineer, analytics lead, or just someone who cries when a dashboard flatlines, Soda may be right for you. Side effects of implementing Soda may include: increased trust in your metrics, reduced late-night Slack emergencies, spontaneous high-fives across departments, fewer meetings and less back-and-forth with business stakeholders, and in rare cases, a newfound love of data. Sign up today to get a chance to win a $1000+ custom mechanical keyboard. Visit dataengineeringpodcast.com/soda to sign up and follow Soda’s launch week. It starts June 9th.

Your host is Tobias Macey and today I'm interviewing Nick Schrock about lowering the barrier to entry for data platform consumers.

Interview

- Introduction
- How did you get involved in the area of data management?
- Can you start by giving your summary of the impact that the tidal wave of AI has had on data platforms and data teams?
- For anyone who hasn't heard of Dagster, can you give a quick summary of the project?
- What are the notable changes in the Dagster project in the past year?
- What are the ecosystem pressures that have shaped the ways that you think about the features and trajectory of Dagster as a project/product/community?
- In your recent release you introduced "components", which is a substantial change in how you enable teams to collaborate on data problems. What was the motivating factor in that work and how does it change the ways that organizations engage with their data?
  - tension between being flexible and extensible vs. opinionated and constrained
  - increased dependency on orchestration with LLM use cases
  - reducing the barrier to contribution for data platform/pipelines
  - bringing application engineers into the mix
  - challenges of meeting users/teams where they are (languages, platform investments, etc.)
- What are the most interesting, innovative, or unexpected ways that you have seen teams applying the Components pattern?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on the latest iterations of Dagster?
- When is Dagster the wrong choice?
- What do you have planned for the future of Dagster?

Contact Info

- LinkedIn

Parting Question

- From your perspective, what is the biggest gap in the tooling or technology for data management today?

Links

- Dagster+ Episode
- Dagster Components Slide Deck
- The Rise Of Medium Code
- Lakehouse Architecture
- Iceberg
- Dagster Components
- Pydantic Models
- Kubernetes
- Dagster Pipes
- Ruby on Rails
- dbt
- Sling
- Fivetran
- Temporal
- MCP == Model Context Protocol

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Three out of four companies are betting big on AI – but most are digging on shifting ground. In this $100 billion gold rush, none of these investments will pay off without data quality and strong governance – and that remains a challenge for many organizations. Not every enterprise has a solid data governance practice and maturity models vary widely. As a result, investments in innovation initiatives are at risk of failure. What are the most important data management issues to prioritize? See how your organization measures up and get ahead of the curve with Actian.

Join us for an exclusive roundtable discussion featuring industry leaders and experts as we delve into the transformative power of the SAP and Databricks partnership. This session is designed to provide actionable insights and foster a collaborative dialogue on the ways this collaboration is reshaping the landscape of data management, AI, and business strategy.

This will be a dynamic, interactive roundtable where participants can share their viewpoints, explore real-world use cases, and address challenges and opportunities. The session is designed to encourage open discussion and provide valuable insights for navigating the evolving data and AI landscape.

Discover how Data Mesh is transforming data management by decentralizing delivery and empowering business-driven D&A initiatives. You will find out what data mesh is, its benefits, and the most common challenges. We will provide a successful path based on the experience of early adopters, allowing you to avoid the most common pitfalls and adopt data mesh successfully.

With the growing focus on AI in organisations, delivering AI-ready data has become the number one investment priority of data management leaders. This session will define AI-ready data, explain how it differs from traditional data management, and discuss AI-ready data practices and technologies.

Legacy data tools weren’t built for the AI era. Agentic Data Management replaces static rules and siloed platforms with intelligent agents that monitor, reason, and act—automating quality, governance, and lineage at scale. Discover how data leaders are shifting from manual firefighting to autonomous control, powering faster, trusted, and scalable data for AI and analytics.
- See a live demo of an agentic system in action
- Learn how probabilistic and deterministic approaches work in concert
- Explore how to build intelligent data products using the MCP protocol
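To make "probabilistic and deterministic approaches working in concert" concrete, here is a minimal, self-contained sketch (all rule names, fields, and thresholds are illustrative, not any vendor's API): a hard validation rule catches outright violations, while a z-score test flags statistical outliers among the remaining values.

```python
import statistics

def deterministic_check(rows):
    """Hard rule: amount must be present and non-negative."""
    return [r for r in rows if r.get("amount") is None or r["amount"] < 0]

def probabilistic_check(values, z_threshold=2.0):
    """Soft rule: flag values far from the mean in standard-deviation units."""
    mean = statistics.fmean(values)
    stdev = statistics.stdev(values)
    if stdev == 0:
        return []
    return [v for v in values if abs(v - mean) / stdev > z_threshold]

rows = [{"amount": a} for a in
        [10.0, 12.0, 11.0, 10.5, 9.5, 11.5, 10.2, 9.8, 12.2, 10.8]]
rows += [{"amount": -5.0}, {"amount": 500.0}]  # rule violation; outlier

violations = deterministic_check(rows)            # catches the -5.0 row
clean = [r["amount"] for r in rows if r not in violations]
outliers = probabilistic_check(clean)             # flags the 500.0 value
```

The design point the session title gestures at: the deterministic layer gives unambiguous pass/fail contracts, while the probabilistic layer catches problems no one wrote a rule for.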

Iceberg Geo Type: Transforming Geospatial Data Management at Scale

The Apache Iceberg™ community is introducing native geospatial type support, addressing key challenges in managing geospatial data at scale, including fragmented formats and inefficiencies in storing large spatial datasets. This talk will delve into the origins of the Iceberg geo type, its specification design, and its future goals. We will examine the impact on both the geospatial and Iceberg communities: introducing a standard data warehouse storage layer to the geospatial community, and enabling optimized geospatial analytics for Iceberg users. We will also present a live demonstration of the Iceberg geo data type with Apache Sedona™ and Apache Spark™, showcasing how it simplifies and accelerates geospatial analytics workflows and queries. Finally, we will provide an in-depth look at its current capabilities, outline the roadmap for future developments, and offer a perspective on its role in advancing geospatial data management in the industry.
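For context on what a native geo type stores: geometry values are typically serialized as Well-Known Binary (WKB). As a rough, stdlib-only illustration (a simplified sketch of standard WKB, not the full Iceberg specification), here is how a 2-D point round-trips through that encoding:

```python
import struct

# WKB layout for a 2-D point: 1 byte byte-order flag (1 = little-endian),
# uint32 geometry type (1 = Point), then two float64 coordinates.
def encode_wkb_point(x: float, y: float) -> bytes:
    return struct.pack("<BIdd", 1, 1, x, y)

def decode_wkb_point(buf: bytes) -> tuple:
    byte_order, geom_type, x, y = struct.unpack("<BIdd", buf)
    assert byte_order == 1 and geom_type == 1, "expected little-endian WKB Point"
    return (x, y)

wkb = encode_wkb_point(-122.4194, 37.7749)   # an example lon/lat pair
assert decode_wkb_point(wkb) == (-122.4194, 37.7749)
```

A shared byte-level standard like this is what lets engines such as Spark and Sedona exchange geometries without per-engine format conversion.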

In this episode of Hub & Spoken, Jason Foster, CEO & Founder of Cynozure, speaks with Lisa Allen, Director of Data at The Pensions Regulator (TPR), about the role of data in protecting savers and shaping a more resilient pensions industry. Lisa shares the story behind TPR's new data strategy and how it's helping to modernise an ecosystem that oversees more than £2 trillion in savings across 38 million members. Drawing on her experience at organisations including the Ordnance Survey and the Open Data Institute, she explains why strong data foundations, industry collaboration, and adaptive thinking are essential to success. The conversation explores how the regulator is building a data marketplace, adopting open standards, and applying AI to enable risk-based regulation, while reducing unnecessary burdens on the industry. Lisa also discusses the value of working transparently, co-designing with stakeholders, and staying agile in the face of rapid change. This episode is a must-listen for business leaders, regulators, and data professionals thinking about strategy, innovation, and sector-wide impact.

Cynozure is a leading data, analytics and AI company that helps organisations to reach their data potential. It works with clients on data and AI strategy, data management, data architecture and engineering, analytics and AI, data culture and literacy, and data leadership. The company was named one of The Sunday Times' fastest-growing private companies in both 2022 and 2023 and recognised as The Best Place to Work in Data by DataIQ in 2023 and 2024. Cynozure is a certified B Corporation.

Get the Most Out of Your Delta Lake

Unlock the full potential of Delta Lake, the open-source storage framework for Apache Spark, with this session focused on its latest and most impactful features. Discover how capabilities like Time Travel, Column Mapping, Deletion Vectors, Liquid Clustering, UniForm interoperability, and Change Data Feed (CDF) can transform your data architecture. Learn not just what these features do, but when and how to use them to maximize performance, simplify data management, and enable advanced analytics across your lakehouse environment.
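Conceptually, Time Travel falls out of the fact that every commit creates a new immutable table version. A toy stdlib model of that idea (an illustration of the versioning concept only, not Delta Lake's actual API):

```python
class VersionedTable:
    """Toy model of Delta-style Time Travel: every commit stores an immutable
    snapshot, and reads can target any earlier version number."""

    def __init__(self):
        self._versions = []             # one immutable snapshot per commit

    def commit(self, rows):
        self._versions.append(tuple(dict(r) for r in rows))
        return len(self._versions) - 1  # the new version number

    def read(self, version=None):
        """version=None reads the latest snapshot, like a plain SELECT."""
        snap = self._versions[-1] if version is None else self._versions[version]
        return [dict(r) for r in snap]

table = VersionedTable()
v0 = table.commit([{"id": 1, "qty": 5}])
table.commit([{"id": 1, "qty": 7}])          # an update creates a new version
assert table.read(version=v0) == [{"id": 1, "qty": 5}]   # "time travel" read
assert table.read() == [{"id": 1, "qty": 7}]
```

In Delta Lake itself the same idea surfaces as version- and timestamp-based queries over the transaction log, and Change Data Feed builds on that same version history.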

Sponsored by: Acceldata | Agentic Data Management: Trusted Data for Enterprise AI on Databricks

An intelligent, action-driven approach to bridge Data Engineering and AI/ML workflows, delivering continuous data trust through comprehensive monitoring, validation, and remediation across the entire Databricks data lifecycle. Learn how Acceldata’s Agentic Data Management (ADM) platform:
- Ensures end-to-end data reliability across Databricks, from ingestion and transformation to feature engineering and model deployment
- Bridges data engineering and AI teams by providing unified insights across Databricks jobs, notebooks and pipelines, with proactive data insights and actions
- Accelerates the delivery of trustworthy enterprise AI outcomes by detecting multi-variate anomalies, monitoring feature drift, and maintaining lineage within Databricks-native environments

Sponsored by: West Monroe | Disruptive Forces: LLMs and the New Age of Data Engineering

Large Language Models are unleashing a seismic shift on data engineering, challenging traditional workflows. LLMs obliterate inefficiencies and redefine productivity: these AI powerhouses automate complex tasks like documentation, code translation, and data model development with unprecedented speed and precision. Integrating LLMs into tools promises to reduce offshore dependency, fostering agile onshore innovation. Harnessing LLMs' full potential involves challenges, requiring deep dives into domain-specific data and strategic business alignment. This session will address deploying LLMs effectively, overcoming data management hurdles, and fostering collaboration between engineers and stakeholders. Join us to explore a future where LLMs redefine possibilities, inviting you to embrace AI-driven innovation and position your organization as a leader in data engineering.

Retail data is expanding at an unprecedented rate, demanding a scalable, cost-efficient, and near real-time architecture. At Unilever, we transformed our data management approach by leveraging Databricks Lakeflow Declarative Pipelines, achieving approximately $500K in cost savings while accelerating computation speeds by 200–500%. By adopting a streaming-driven architecture, we built a system where data flows continuously across processing layers, enabling real-time updates with minimal latency. Lakeflow Declarative Pipelines' serverless simplicity replaced complex dependency management, reducing maintenance overhead and improving pipeline reliability. Lakeflow Declarative Pipelines Direct Publishing further enhanced data segmentation, concurrency, and governance, ensuring efficient and scalable data operations while simplifying workflows. This transformation empowers Unilever to manage data with greater efficiency, scalability, and reduced costs, creating a future-ready infrastructure that evolves with the needs of our retail partners and customers.

Sponsored by: KPMG | Enhancing Regulatory Compliance through Data Quality and Traceability

In highly regulated industries like financial services, maintaining data quality is an ongoing challenge. Reactive measures often fail to prevent regulatory penalties, causing inaccuracies in reporting and inefficiencies due to poor data visibility. Regulators closely examine the origins and accuracy of reporting calculations to ensure compliance. A robust system for data quality and lineage is crucial. Organizations are utilizing Databricks to proactively improve data quality through rules-based and AI/ML-driven methods. This fosters complete visibility across IT, data management, and business operations, facilitating rapid issue resolution and continuous data quality enhancement. The outcome is quicker, more accurate, and more transparent financial reporting. We will detail a framework for data observability and offer practical examples of implementing quality checks throughout the data lifecycle, specifically focusing on creating data pipelines for regulatory reporting.
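A rules-based quality check of the kind described can be as simple as evaluating each record against a set of named predicates before it enters a reporting pipeline. A minimal stdlib sketch (the field names and rules below are hypothetical, not KPMG's or Databricks' framework):

```python
# Each rule is a (name, predicate) pair applied to every record; failures are
# collected with enough context to trace the issue back to its source row.
RULES = [
    ("notional_present", lambda r: r.get("notional") is not None),
    ("notional_positive", lambda r: r.get("notional") is None or r["notional"] > 0),
    ("currency_iso", lambda r: r.get("currency") in {"USD", "EUR", "GBP"}),
]

def run_quality_checks(records):
    failures = []
    for i, rec in enumerate(records):
        for name, predicate in RULES:
            if not predicate(rec):
                failures.append({"row": i, "rule": name, "record": rec})
    return failures

records = [
    {"notional": 1_000_000, "currency": "USD"},
    {"notional": -50, "currency": "EUR"},       # fails notional_positive
    {"notional": 250_000, "currency": "XXX"},   # fails currency_iso
]
failures = run_quality_checks(records)
```

Recording the row index and rule name alongside each failure is what makes the traceability the abstract emphasizes possible: a regulator's question about a reported figure can be answered by replaying exactly which rule a source row violated.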

Empowering the Warfighter With AI

The new Budget Execution Validation process has transformed how the Navy reviews unspent funds. Powered by Databricks Workflows, MLflow, Delta Lake and Apache Spark™, this data-driven model predicts which financial transactions are most likely to have errors, streamlining reviews and increasing accuracy. In FY24, it helped review $40 billion, freeing $1.1 billion for other priorities, including $260 million from active projects. By reducing reviews by 80%, cutting job runtime by over 50% and lowering costs by 60%, it saved 218,000 work hours and $6.7 million in labor costs. With automated workflows and robust data management, this system exemplifies how advanced tools can improve financial decision-making, save resources and ensure efficient use of taxpayer dollars.

Sponsored by: Prophecy | Ready for GenAI? Survey Says Governed Self-Service Is the New Playbook for Data Teams

Are data teams ready for AI? Prophecy’s exclusive survey, “The Impact of GenAI on Data Teams”, gives the clearest picture yet of GenAI’s potential in data management, and what’s standing in the way. The top two obstacles? Poor governance and slow access to high-quality data. The message is clear: modernizing your data platform with Databricks is essential. But it’s only the beginning. To unlock the power of AI and analytics, organizations must deliver governed, self-service access to clean, trusted data. Traditional data prep tools introduce risks around security, quality, and cost. It’s no wonder data leaders cited data transformation as the area where GenAI will make the biggest impact. To deliver what’s needed, teams must shift to governed self-service, where data analysts and scientists move fast while staying within IT’s guardrails. Join us to learn more details from the survey and how leading organizations are ahead of the curve, using GenAI to reshape how data gets done.

Sponsored by: Boomi, LP | From Pipelines to Agents: Manage Data and AI on One Platform for Maximum ROI

In the age of agentic AI, competitive advantage lies not only in AI models, but in the quality of the data agents reason on and the agility of the tools that feed them. To fully realize the ROI of agentic AI, organizations need a platform that enables high-quality data pipelines and provides scalable, enterprise-grade tools. In this session, discover how a unified platform for integration, data management, MCP server management, API management, and agent orchestration can help you to bring cohesion and control to how data and agents are used across your organization.

Unleashing Data Governance at iFood: Harnessing System Tables and Lineage for Dynamic Tag Propagation

With regulations like LGPD (Brazil's General Data Protection Law) and GDPR, managing sensitive data access is critical. This session demonstrates how to leverage Databricks Unity Catalog system tables and data lineage to dynamically propagate classification tags, empowering organizations to monitor governance and ensure compliance. The presentation covers practical steps, including system table usage, data normalization, ingestion with Lakeflow Declarative Pipelines and classification tag propagation to downstream tables. It also explores permission monitoring with alerts to proactively address governance risks. Designed for advanced audiences, this session offers actionable strategies to strengthen data governance, prevent breaches and avoid regulatory fines while building scalable frameworks for sensitive data management.
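The core of dynamic tag propagation can be illustrated with a small sketch: given lineage edges (source table → target table) and seed classification tags, push each tag to every downstream table. This is a conceptual model only, not Unity Catalog's API; all table names are hypothetical.

```python
from collections import defaultdict, deque

def propagate_tags(lineage_edges, seed_tags):
    """Push each table's tags to every table downstream of it in the lineage graph."""
    downstream = defaultdict(list)
    for source, target in lineage_edges:
        downstream[source].append(target)

    tags = {table: set(tag_set) for table, tag_set in seed_tags.items()}
    queue = deque(seed_tags)
    while queue:
        table = queue.popleft()
        for child in downstream[table]:
            # Only re-visit a child when it gains a tag it did not already have,
            # which also keeps the walk terminating on cyclic lineage.
            if not tags.get(table, set()) <= tags.setdefault(child, set()):
                tags[child] |= tags[table]
                queue.append(child)
    return tags

edges = [("bronze.users", "silver.users"), ("silver.users", "gold.revenue")]
tags = propagate_tags(edges, {"bronze.users": {"pii"}})
# "pii" reaches both downstream tables via the lineage graph
```

In the session's setting, the lineage edges and existing tags would come from Unity Catalog's system tables rather than in-memory lists, but the propagation logic is the same graph walk.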

Sponsored by: Informatica | Modernize analytics and empower AI in Databricks with trusted data using Informatica

As enterprises continue their journey to the cloud, data warehouse and data management modernization is essential to optimize analytics and drive business outcomes. Minimizing modernization timelines is important for reducing risk and shortening time to value – and ensuring enterprise data is clean, curated and governed is imperative to enable analytics and AI initiatives. In this session, learn how Informatica's Intelligent Data Management Cloud (IDMC) empowers analytics and AI on Databricks by helping data teams:
- Develop no-code/low-code data pipelines that ingest, transform and clean data at enterprise scale
- Improve data quality and extend enterprise governance with Informatica Cloud Data Governance and Catalog (CDGC) and Unity Catalog
- Accelerate pilot-to-production with Mosaic AI