Data Governance

Managing Databricks at Scale

2025-06-11 · Data + AI Summit 2025

talk

by Vikas Ranjan (T-Mobile)

AI/ML Databricks Delta

T-Mobile’s leadership in 5G innovation and its rapid growth in the fixed wireless business have led to an exponential increase in data, now reaching hundreds of terabytes daily. This session explores how T-Mobile uses Databricks to manage this data efficiently, focusing on a scalable architecture with Delta Lake, auto-scaling clusters, performance optimization through data partitioning and caching, and comprehensive data governance with Unity Catalog. It also covers cost management, collaboration tools and AI-driven productivity tools, highlighting how these strategies help T-Mobile innovate, streamline operations and maximize the impact of its data across network optimization, community support, energy management and more.

Sponsored by: Informatica | Modernize analytics and empower AI in Databricks with trusted data using Informatica

Accelerating Data Transformation: Best Practices for Governance, Agility and Innovation

2025-06-11 · Data + AI Summit 2025

lightning talk

by Kevin Wilson (NCS Australia)

Analytics Data Lakehouse Data Quality Databricks dbt ETL/ELT SQL

In this session, we will share NCS’s approach to implementing a Databricks Lakehouse architecture, focusing on key lessons learned and best practices from our recent implementations. By integrating Databricks SQL Warehouse, the dbt transformation framework and our test automation framework, we’ve optimized performance and scalability while ensuring data quality. We’ll dive into how Unity Catalog enabled robust data governance, empowering business units with self-serve analytical workspaces to create insights while maintaining control. Through solution accelerators, rapid environment deployment and pattern-driven ELT frameworks, we’ve shortened time-to-value and fostered a culture of innovation. Attendees will gain practical insights into accelerating data transformation, governance and scaling analytics with Databricks.

Sponsored by: Atlan | Domain-driven Data Governance in the AI Era: A Conversation with General Motors and Atlan

Agent Bricks: Building Multi-Agent Systems for Structured and Unstructured Information

2025-06-11 · Data + AI Summit 2025

talk

by Elise Gonzales (Databricks)

AI/ML Cyber Security

Learn how to build systems that enable natural language interactions with both your structured databases and your unstructured document collections. This session explores advanced techniques for creating unified, governed AI systems that can interpret questions, retrieve relevant information and generate accurate answers across your entire data ecosystem. Key takeaways include: strategies for combining vector search over unstructured documents with retrieval from structured databases; techniques for optimizing unstructured data processing through effective parsing, metadata enrichment and intelligent chunking; methods for integrating different retrieval mechanisms while ensuring consistent data governance and security; and practical approaches for evaluating and improving knowledge-base question answering (KBQA) system quality through automated and human feedback.
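
The abstract mentions metadata enrichment and intelligent chunking for unstructured documents but does not describe the session's method. As a rough illustration only, the Python sketch below packs whole paragraphs into size-bounded chunks and attaches source and position metadata that a retriever could filter on or cite. The `chunk_document` function, the `max_chars` limit and the metadata fields are hypothetical, and this paragraph-packing approach is a simple stand-in rather than the technique presented in the talk.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Chunk:
    text: str
    metadata: dict = field(default_factory=dict)


def chunk_document(text: str, source: str, max_chars: int = 500) -> List[Chunk]:
    """Pack whole paragraphs into chunks of at most max_chars characters.

    A paragraph longer than max_chars is cut into fixed-size slices so that
    no chunk exceeds the limit. Each chunk records its source and position.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")

    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    pieces: List[str] = []
    for p in paragraphs:
        if len(p) <= max_chars:
            pieces.append(p)
        else:
            pieces.extend(p[i:i + max_chars] for i in range(0, len(p), max_chars))

    chunks: List[str] = []
    current = ""
    for piece in pieces:
        candidate = f"{current}\n\n{piece}" if current else piece
        if len(candidate) <= max_chars:
            current = candidate  # still fits: keep growing this chunk
        else:
            chunks.append(current)  # flush and start a new chunk
            current = piece
    if current:
        chunks.append(current)

    # Metadata enrichment: source name plus position within the document.
    return [
        Chunk(text=c, metadata={"source": source, "chunk_index": i, "n_chunks": len(chunks)})
        for i, c in enumerate(chunks)
    ]
```

Keeping paragraphs intact where possible preserves local context in each chunk; the fixed-size fallback only applies to paragraphs that exceed the limit on their own.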

Franchise IP and Data Governance at Krafton: Driving Cost Efficiency and Scalability

2025-06-11 · Data + AI Summit 2025

lightning talk

by Hwaeium Yeom (KRAFTON)

Analytics Databricks Delta S3 Data Streaming

Join us as we explore how KRAFTON optimized data governance for the PUBG IP, improving cost efficiency and scalability. KRAFTON operates a massive data ecosystem that processes tens of terabytes daily. As demand for real-time analytics grew, traditional batch-based processing ran into scalability limits. To address this, we redesigned our data pipelines and governance models, improving performance while reducing costs. Key changes included: moving from batch to real-time streaming pipelines; optimizing workload management by reducing all-purpose clusters and increasing use of Jobs; cutting costs by tens of thousands per month (up to 75%); improving data storage efficiency with Delta tables and lower S3 costs; and improving pipeline stability with the Medallion Architecture. Gain insights into how KRAFTON scaled its data operations, leveraging real-time analytics and cost optimization for high-traffic games. Learn more: https://www.databricks.com/customers/krafton

Hands-on Learning: AI-Powered Data Engineering with Lakeflow: Techniques for Modern Data Professionals

2025-06-11 · Data + AI Summit 2025

talk

by Frank Munz (Databricks)

AI/ML Data Engineering Databricks GenAI GitHub SQL Data Streaming

This introductory workshop is aimed at data engineers seeking hands-on experience and data architects looking to deepen their knowledge. It is structured to provide a solid understanding of the following data engineering and streaming concepts: an introduction to Lakeflow and the Data Intelligence Platform; getting started with Lakeflow Declarative Pipelines to build declarative data pipelines in SQL using Streaming Tables and Materialized Views; mastering Databricks Workflows with advanced control flow and triggers; understanding serverless compute; data governance and lineage with Unity Catalog; and generative AI for data engineers with Genie and Databricks Assistant. We believe you can only become an expert by working on real problems and gaining hands-on experience. Therefore, we will equip you with your own lab environment and guide you through practical exercises such as using GitHub, ingesting data from various sources, and creating batch and streaming data pipelines.

Advanced Data Access Control for the Exabyte Era: Scaling with Purpose

2025-06-11 · Data + AI Summit 2025

talk

by Arpan Ghosh (Databricks), Shuting Zhang (Databricks)

Databricks Delta Cyber Security

As data-driven companies scale from small startups to global enterprises, managing secure data access becomes increasingly complex. Traditional access control models fall short at enterprise scale, where dynamic, purpose-driven access is essential. In this talk, we explore how our “Just-in-Time” Purpose-Based Access Control (JIT PBAC) platform addresses the evolving challenges of data privacy and compliance, maintaining least privilege while ensuring productivity. Built on features such as Unity Catalog, Delta Sharing and Databricks Apps, the platform delivers real-time, context-aware data governance. JIT PBAC keeps your data secure, your engineers productive, your legal and security teams happy, and your organization future-proof in an ever-evolving compliance landscape.
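
The talk does not describe how its JIT PBAC platform is implemented. The Python sketch below only illustrates the general idea as assumed here: access is denied by default and allowed only while an unexpired grant matches the requesting principal, the dataset and a declared purpose. The `Grant` and `PurposeBasedAccess` classes, the purpose strings and the time-to-live parameter are hypothetical; a production system would add approval workflows and audit logging, and would presumably enforce grants in the platform itself (for example through Unity Catalog permissions) rather than in an in-memory list.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List


@dataclass(frozen=True)
class Grant:
    principal: str
    dataset: str
    purpose: str
    expires_at: datetime


class PurposeBasedAccess:
    """Hypothetical in-memory store of just-in-time, purpose-scoped grants."""

    def __init__(self) -> None:
        self._grants: List[Grant] = []

    def grant(self, principal: str, dataset: str, purpose: str,
              ttl: timedelta, now: datetime) -> Grant:
        # A real platform would run an approval step before issuing the grant.
        g = Grant(principal, dataset, purpose, now + ttl)
        self._grants.append(g)
        return g

    def is_allowed(self, principal: str, dataset: str, purpose: str,
                   now: datetime) -> bool:
        # Least privilege: deny unless an unexpired grant matches all three fields.
        return any(
            g.principal == principal
            and g.dataset == dataset
            and g.purpose == purpose
            and now < g.expires_at
            for g in self._grants
        )

    def purge_expired(self, now: datetime) -> int:
        """Drop expired grants and return how many were removed."""
        before = len(self._grants)
        self._grants = [g for g in self._grants if now < g.expires_at]
        return before - len(self._grants)


if __name__ == "__main__":
    t0 = datetime(2025, 6, 11, 9, 0, tzinfo=timezone.utc)
    acl = PurposeBasedAccess()
    acl.grant("alice", "sales.orders", "fraud_investigation", timedelta(hours=4), t0)
    print(acl.is_allowed("alice", "sales.orders", "fraud_investigation", t0 + timedelta(hours=1)))  # True
    print(acl.is_allowed("alice", "sales.orders", "marketing", t0 + timedelta(hours=1)))  # False
    print(acl.is_allowed("alice", "sales.orders", "fraud_investigation", t0 + timedelta(hours=5)))  # False
```

Passing `now` explicitly keeps the expiry logic deterministic and easy to test; the grant simply stops matching once its time-to-live has elapsed, without anyone having to revoke it.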

Transforming Data Governance for Multimodal Data at Amgen With Databricks

2025-06-11 · Data + AI Summit 2025

talk

by Jaison Dominic (Amgen), Jinesh Kunjumon (Amgen)

Databricks GenAI Fabric

Amgen is advancing its Enterprise Data Fabric to securely manage sensitive multimodal data, such as imaging and research data, across formats. Databricks is already the de facto standard for governance of structured data, and Amgen seeks to extend it to unstructured multimodal data as well. This approach will also allow Amgen to standardize its GenAI projects on Databricks. Key priorities include: centralized data access (a unified, secure access control system); enhanced traceability (detailed processes for transparency and accountability); consistent access standards (a uniform experience for data access privileges); user support (flexible access for diverse stakeholders); and comprehensive auditing (thorough permission audits and data usage tracking). Learn strategies for implementing a comprehensive multimodal data governance framework using Databricks as we share our experience standardizing data governance for GenAI use cases.

Doordash Customer 360 Data Store and its Evolution to Become an Entity Management Framework

2025-06-10 · Data + AI Summit 2025

lightning talk

by Gowri Shankar (DoorDash), Chao Wang (DoorDash)

Analytics Delta Marketing

The "Doordash Customer 360 Data Store" represents a foundational step in centralizing and managing customer profiles on Delta Lake to enable targeting and personalized customer experiences. This presentation will explore the initial goals and architecture of the Customer 360 Data Store, its journey to becoming a robust entity management framework, and the challenges and opportunities encountered along the way. We will discuss how this evolution addressed scalability, data governance and integration needs, enabling the system to support dynamic and diverse use cases, including customer lifecycle analytics and segmentation-based marketing campaign targeting. Attendees will gain insight into the key design principles, technical innovations and strategic decisions that transformed the system into a flexible platform for entity management, positioning it as a critical enabler of data-driven growth at DoorDash. Audio for this session is delivered in the conference mobile app; you must bring your own headphones to listen.

Managing the Governed Cloud

2025-06-10 · Data + AI Summit 2025

talk

by Sherri Adame (General Motors), Johnathan Powell (General Motors)

AI/ML Analytics Cloud Computing Databricks Cyber Security

As organizations increasingly adopt Databricks as a unified platform for analytics and AI, robust data governance becomes critical for compliance, security, and operational efficiency. This presentation will explore an end-to-end framework for governing the Databricks cloud, covering key use cases, foundational governance principles, and scalable automation strategies. We will discuss best practices for metadata, data access, cataloging, classification, quality, and lineage, and show how automation can streamline enforcement. Attendees will come away with real-world approaches to building a governed data cloud that balances innovation with control.

Streaming Meets Governance: Building AI-Ready Tables With Confluent Tableflow and Unity Catalog

2025-06-10 · Data + AI Summit 2025

talk

by Kasun Indrasiri Gamage (Confluent), Victoria Bukta (Shopify)

AI/ML Analytics Databricks Delta Kafka Data Streaming

Learn how Databricks and Confluent are simplifying the path from real-time data to governed, analytics- and AI-ready tables. This session will cover how Confluent Tableflow automatically materializes Kafka topics into Delta tables and registers them with Unity Catalog — eliminating the need for custom streaming pipelines. We’ll walk through how this integration helps data engineers reduce ingestion complexity, enforce data governance and make real-time data immediately usable for analytics and AI.

IQVIA's Analytics for Patient Support Services: Transforming Scalability, Performance and Governance

2025-06-10 · Data + AI Summit 2025

talk

by Dmytro Kobryn (Customertimes), Sudha Ragothaman (IQVIA)

Analytics Databricks Cyber Security

This presentation will explore the transformation of IQVIA's decade-old patient support platform through the implementation of the Databricks Data Intelligence Platform. Facing scalability challenges, performance bottlenecks and rising costs, the existing platform required a significant redesign to handle growing data volumes and complex analytics. Key issues included static metrics that limited workflow optimization, fragmented data governance, and heightened compliance and security demands. By partnering with Customertimes (a Databricks partner) and adopting Databricks' centralized, scalable analytics solution with enhanced self-service capabilities, IQVIA achieved improved query performance, cost efficiency and robust governance, ensuring operational effectiveness and regulatory compliance in an increasingly complex environment.

Unifying Human-Curated Data Ingestion and Real-Time Updates with Databricks Lakeflow Declarative Pipelines, Protobuf and BSR

2025-06-10 · Data + AI Summit 2025

talk

by Dwight Whitlock (Clinician Nexus)

Databricks Kafka Protobuf Data Streaming

Red Stapler is a streaming-native system on Databricks that merges file-based ingestion and real-time user edits into a single Lakeflow Declarative Pipelines pipeline for near real-time feedback. Protobuf definitions, managed in the Buf Schema Registry (BSR), govern schema and data-quality rules while ensuring backward compatibility. All records, valid or not, are stored in an SCD Type 2 table, capturing every version for full history and enabling immediate quarantine views of invalid data. This unified approach strengthens data governance, simplifies auditing and streamlines error fixes. Running on serverless Lakeflow Declarative Pipelines and the Kafka-compatible Bufstream keeps costs low by scaling down to zero when idle. Red Stapler’s configuration-driven Protobuf logic adapts easily to evolving survey definitions without putting production at risk. The result is consistent validation, quick updates and a complete audit trail, all of which are critical for trustworthy, flexible data pipelines.
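
The storage pattern described for Red Stapler, where every record is versioned in an SCD Type 2 table and invalid records are kept and flagged rather than dropped, can be sketched in plain Python. This is an illustration under assumptions, not Red Stapler's code: validation here is a hypothetical Python function standing in for the Protobuf rules managed in BSR, timestamps are plain integers, and the `apply_change`, `current_view`, `quarantine_view` and `validate_response` names and the `hours_per_week` field are invented for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Version:
    key: str
    payload: dict
    valid_from: int
    valid_to: Optional[int] = None  # None marks the current version of a key
    errors: List[str] = field(default_factory=list)

    @property
    def is_valid(self) -> bool:
        return not self.errors


def apply_change(history: List[Version], key: str, payload: dict, ts: int,
                 validate: Callable[[dict], List[str]]) -> Version:
    """Close the current version of `key` (if any) and append a new one.

    Invalid payloads are stored together with their validation errors
    instead of being rejected, so the full history is preserved.
    """
    for v in history:
        if v.key == key and v.valid_to is None:
            v.valid_to = ts
    new = Version(key, payload, valid_from=ts, errors=validate(payload))
    history.append(new)
    return new


def current_view(history: List[Version]) -> List[Version]:
    """Latest version of each key, only if it passed validation."""
    return [v for v in history if v.valid_to is None and v.is_valid]


def quarantine_view(history: List[Version]) -> List[Version]:
    """Latest version of each key that failed validation."""
    return [v for v in history if v.valid_to is None and not v.is_valid]


def validate_response(payload: dict) -> List[str]:
    """Hypothetical rule set: `hours_per_week` must be a non-negative integer."""
    value = payload.get("hours_per_week")
    if not isinstance(value, int) or value < 0:
        return ["hours_per_week must be a non-negative integer"]
    return []
```

In this simplified version, a key whose latest version is invalid drops out of the current view until a corrected record arrives; a real pipeline might instead keep serving the last valid version while the fix is pending.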

Dealing With Sensitive Data on Databricks at Natura

2025-06-10 · Data + AI Summit 2025

lightning talk

by Daniel Shimura (Natura)

Databricks

Protecting sensitive data in a Databricks environment requires robust mechanisms that prevent unauthorized access, even by highly privileged roles such as Databricks administrators (Account Console Admins, Workspace Admins and Unity Catalog Admins). To address this, a comprehensive data governance and access control strategy can be implemented, combining encryption, secret scopes, column masks, fine-grained table access controls and auditing capabilities.
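
The column-mask idea mentioned above can be illustrated with a small Python function. In Unity Catalog, masking logic is written as a user-defined function and attached to a column with `ALTER TABLE ... ALTER COLUMN ... SET MASK`; the sketch below only simulates that read-time decision in Python under stated assumptions: the group name `pii_readers`, the email-masking format and the `read_table` helper are hypothetical, and membership in that group, rather than any admin role, is what reveals the raw value. It does not reproduce Natura's implementation, which the abstract does not describe.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical group whose members may see raw values.
PII_READER_GROUP = "pii_readers"


def mask_email(value: str, user_groups: Set[str]) -> str:
    """Return the raw email only for members of PII_READER_GROUP.

    Admin groups are deliberately not consulted, so holding an admin role
    does not by itself reveal the value.
    """
    if PII_READER_GROUP in user_groups:
        return value
    local, _, domain = value.partition("@")
    # Keep the first character and the domain; hide everything else.
    return f"{local[:1]}***@{domain}" if local and domain else "***"


def read_table(rows: Iterable[Dict[str, str]], masked_column: str,
               user_groups: Set[str]) -> List[Dict[str, str]]:
    """Apply the mask to one column at read time; stored rows are not modified."""
    return [
        {**row, masked_column: mask_email(row[masked_column], user_groups)}
        for row in rows
    ]
```

Masking at read time, as column masks do, means the stored data stays intact and the same table can serve both privileged and unprivileged readers.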

Sponsored by: Informatica | Extending Unity Catalog to Govern the Data Estate With Informatica Cloud Data Governance & Catalog

Trust You Can Measure: Data Quality Standards in The Lakehouse

2025-06-10 · Data + AI Summit 2025

talk

by Amit Pahwa (Databricks), Sergiy Kanyshchev (Databricks)

Data Lakehouse Data Quality Databricks

Do you trust your data? If you’ve ever struggled to figure out which datasets are reliable, well-governed, or safe to use, you’re not alone. At Databricks, our own internal lakehouse faced the same challenge—hundreds of thousands of tables, but no easy way to tell which data met quality standards. In this talk, the Databricks Data Platform team shares how we tackled this problem by building the Data Governance Score—a way to systematically measure and surface trust signals across the entire lakehouse. You’ll learn how we leverage Unity Catalog, governed tags, and enforcement to drive better data decisions at scale. Whether you're a data engineer, platform owner, or business leader, you’ll leave with practical ideas on how to raise the bar for data quality and trust in your own data ecosystem.
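
The abstract does not list the signals behind the Data Governance Score. Purely to illustrate the idea of measuring trust signals per table, the Python sketch below awards points for a few hypothetical checks (a named owner, a description, a classification tag and a quality monitor) and sums them into a score from 0 to 100. The checks, point values, field names and helper functions are assumptions, not Databricks' actual scoring model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class TableMetadata:
    name: str
    owner: Optional[str] = None
    description: str = ""
    tags: Dict[str, str] = field(default_factory=dict)
    has_quality_monitor: bool = False


# Hypothetical checks and point values; the points sum to 100.
CHECKS: Dict[str, Tuple[int, Callable[[TableMetadata], bool]]] = {
    "owner": (30, lambda t: bool(t.owner)),
    "description": (20, lambda t: bool(t.description.strip())),
    "classification_tag": (30, lambda t: "classification" in t.tags),
    "quality_monitor": (20, lambda t: t.has_quality_monitor),
}


def governance_score(table: TableMetadata) -> int:
    """Sum the points of every check the table passes (0 to 100)."""
    return sum(points for points, check in CHECKS.values() if check(table))


def failed_checks(table: TableMetadata) -> List[str]:
    """Names of the checks the table fails, in definition order."""
    return [name for name, (_, check) in CHECKS.items() if not check(table)]
```

Returning the failed checks alongside the score makes the number actionable: a table owner can see exactly which signal is missing rather than just a low score.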

Scaling Data Governance: How Unity Catalog is Empowering Picpay's Data Governance Strategy

2025-06-10 · Data + AI Summit 2025

talk

by Lucas Morelato (PicPay), Gustavo Tadao Okida (PicPay)

With massive data volumes and complexity, scaling data governance became a significant challenge. Centralizing metadata management, ensuring regulatory compliance and controlling data access across multiple platforms became critical to maintaining efficiency and trust.

Sponsored by: Deloitte | Accelerating Biopharmaceutical Breakthroughs with an Innovative Enterprise Data Strategy

Sponsored by: Hexaware | Global Data at Scale: Powering Front Office Transformation with Databricks
