talk-data.com

Topic: Data Governance

Tags: data_management, compliance, data_quality

417 tagged

Activity Trend: peak of 90 activities per quarter, 2020-Q1 to 2026-Q1

Activities

417 activities · Newest first

Managing Databricks at Scale

T-Mobile’s leadership in 5G innovation and its rapid growth in the fixed wireless business have led to an exponential increase in data, reaching hundreds of terabytes daily. This session explores how T-Mobile uses Databricks to manage this data efficiently, focusing on scalable architecture with Delta Lake, auto-scaling clusters, performance optimization through data partitioning and caching, and comprehensive data governance with Unity Catalog. Additionally, it covers cost management, collaborative tools and AI-driven productivity tools, highlighting how these strategies empower T-Mobile to innovate, streamline operations and maximize data impact across network optimization, community support, energy management and more.

Sponsored by: Informatica | Modernize analytics and empower AI in Databricks with trusted data using Informatica

As enterprises continue their journey to the cloud, data warehouse and data management modernization is essential to optimize analytics and drive business outcomes. Minimizing modernization timelines is important for reducing risk and shortening time to value, and ensuring enterprise data is clean, curated and governed is imperative to enable analytics and AI initiatives. In this session, learn how Informatica's Intelligent Data Management Cloud (IDMC) empowers analytics and AI on Databricks by helping data teams:
· Develop no-code/low-code data pipelines that ingest, transform and clean data at enterprise scale
· Improve data quality and extend enterprise governance with Informatica Cloud Data Governance and Catalog (CDGC) and Unity Catalog
· Accelerate pilot-to-production with Mosaic AI

Accelerating Data Transformation: Best Practices for Governance, Agility and Innovation

In this session, we will share NCS’s approach to implementing a Databricks Lakehouse architecture, focusing on key lessons learned and best practices from our recent implementations. By integrating Databricks SQL Warehouse, the DBT Transform framework and our innovative test automation framework, we’ve optimized performance and scalability, while ensuring data quality. We’ll dive into how Unity Catalog enabled robust data governance, empowering business units with self-serve analytical workspaces to create insights while maintaining control. Through the use of solution accelerators, rapid environment deployment and pattern-driven ELT frameworks, we’ve fast-tracked time-to-value and fostered a culture of innovation. Attendees will gain valuable insights into accelerating data transformation, governance and scaling analytics with Databricks.

Sponsored by: Atlan | Domain-driven Data Governance in the AI Era: A Conversation with General Motors and Atlan

Now the largest automaker in the United States, selling more than 2.7 million vehicles in 2024, General Motors is setting a bold vision for its future, with Software-defined vehicles and AI as a driving force. With data as a crucial asset, a transformation of this scale calls for a modern approach to Data Governance. Join Sherri Adame, Enterprise Data Governance Leader at General Motors, to learn about GM’s novel governance approach, supported by technologies like Atlan and Databricks. Hear how Sherri and her team are shifting governance to the left with automation, implementing data contracts, and accelerating data product discovery across domains, creating a cultural shift that emphasizes data as a competitive advantage.

Sponsored by: Hexaware | Global Data at Scale: Powering Front Office Transformation with Databricks

Join KPMG for an engaging session on how we transformed our data platform and built a cutting-edge Global Data Store (GDS), a game-changing data hub for our Front Office Transformation (FOT). Discover how we seamlessly unified data from various member firms, turning it into a dynamic engine that lets our business leverage our Front Office ecosystem for smarter analytics and decision-making. Learn about our unique approach that rapidly integrates diverse datasets into the GDS and our hub-and-spoke model, which connects member firms’ data lakes and enables secure, high-speed collaboration via Delta Sharing. Hear how we are leveraging Unity Catalog to help ensure data governance, compliance and straightforward data lineage. We’ll share strategies for risk management, security (fine-grained access, encryption) and scaling a cloud-based data ecosystem.

Agent Bricks: Building Multi-Agent Systems for Structured and Unstructured Information

Learn how to build sophisticated systems that enable natural language interactions with both your structured databases and unstructured document collections. This session explores advanced techniques for creating unified and governed AI systems that can seamlessly interpret questions, retrieve relevant information and generate accurate answers across your entire data ecosystem. Key takeaways include:
· Strategies for combining vector search over unstructured documents with retrieval from structured databases
· Techniques for optimizing unstructured data processing through effective parsing, metadata enrichment and intelligent chunking
· Methods for integrating different retrieval mechanisms while ensuring consistent data governance and security
· Practical approaches for evaluating and improving KBQA system quality through automated and human feedback

Franchise IP and Data Governance at Krafton: Driving Cost Efficiency and Scalability

Join us as we explore how KRAFTON optimized data governance for PUBG IP, enhancing cost efficiency and scalability. KRAFTON operates a massive data ecosystem, processing tens of terabytes daily. As real-time analytics demands increased, traditional batch-based processing faced scalability challenges. To address this, we redesigned data pipelines and governance models, improving performance while reducing costs:
· Transitioned to real-time pipelines (batch to streaming)
· Optimized workload management (reducing all-purpose clusters, increasing Jobs usage)
· Cut costs by tens of thousands monthly (up to 75%)
· Enhanced data storage efficiency (lower S3 costs, Delta Tables)
· Improved pipeline stability (Medallion Architecture)
Gain insights into how KRAFTON scaled data operations, leveraging real-time analytics and cost optimization for high-traffic games. Learn more: https://www.databricks.com/customers/krafton

This introductory workshop caters to data engineers seeking hands-on experience and data architects looking to deepen their knowledge. The workshop is structured to provide a solid understanding of the following data engineering and streaming concepts:
· Introduction to Lakeflow and the Data Intelligence Platform
· Getting started with Lakeflow Declarative Pipelines for declarative data pipelines in SQL using Streaming Tables and Materialized Views
· Mastering Databricks Workflows with advanced control flow and triggers
· Understanding serverless compute
· Data governance and lineage with Unity Catalog
· Generative AI for Data Engineers: Genie and Databricks Assistant
We believe you can only become an expert if you work on real problems and gain hands-on experience. Therefore, we will equip you with your own lab environment in this workshop and guide you through practical exercises like using GitHub, ingesting data from various sources, creating batch and streaming data pipelines, and more.

Advanced Data Access Control for the Exabyte Era: Scaling with Purpose

As data-driven companies scale from small startups to global enterprises, managing secure data access becomes increasingly complex. Traditional access control models fall short at enterprise scale, where dynamic, purpose-driven access is essential. In this talk, we explore how our “Just-in-Time” Purpose-Based Access Control (PBAC) platform addresses the evolving challenges of data privacy and compliance, maintaining least privilege while ensuring productivity. Using features like Unity Catalog, Delta Sharing & Databricks Apps, the platform delivers real-time, context-aware data governance. Leveraging JIT PBAC keeps your data secure, your engineers productive, your legal & security teams happy and your organization future-proof in the ever-evolving compliance landscape.

Transforming Data Governance for Multimodal Data at Amgen With Databricks

Amgen is advancing its Enterprise Data Fabric to securely manage sensitive multimodal data, such as imaging and research data, across formats. Databricks is already the de facto standard for governance on structured data, and Amgen seeks to extend it to unstructured multimodal data too. This approach will also allow Amgen to standardize its GenAI projects on Databricks. Key priorities include:
· Centralized data access: establishing a unified, secure access control system
· Enhanced traceability: implementing detailed processes for transparency and accountability
· Consistent access standards: ensuring a uniform data access privilege experience
· User support: providing flexible access for diverse stakeholders
· Comprehensive auditing: enabling thorough permission audits and data usage tracking
Learn strategies for implementing a comprehensive multimodal data governance framework using Databricks, as we share our experience of standardizing data governance for GenAI use cases.

Doordash Customer 360 Data Store and its Evolution to Become an Entity Management Framework

The "Doordash Customer 360 Data Store" represents a foundational step in centralizing and managing customer profiles, built on Delta Lake, to enable targeting and personalized customer experiences. This presentation will explore the initial goals and architecture of the Customer 360 Data Store, its journey to becoming a robust entity management framework, and the challenges and opportunities encountered along the way. We will discuss how the evolution addressed scalability, data governance and integration needs, enabling the system to support dynamic and diverse use cases, including customer lifecycle analytics and segmentation-based marketing campaign targeting. Attendees will gain insight into key design principles, technical innovations and strategic decisions that transformed the system into a flexible platform for entity management, positioning it as a critical enabler of data-driven growth at Doordash. Audio for this session is delivered in the conference mobile app; you must bring your own headphones to listen.

Managing the Governed Cloud

As organizations increasingly adopt Databricks as a unified platform for analytics and AI, ensuring robust data governance becomes critical for compliance, security, and operational efficiency. This presentation will explore the end-to-end framework for governing the Databricks cloud, covering key use cases, foundational governance principles, and scalable automation strategies. We will discuss best practices for metadata, data access, catalog, classification, quality, and lineage, while leveraging automation to streamline enforcement. Attendees will gain insights into best practices and real-world approaches to building a governed data cloud that balances innovation with control.

Streaming Meets Governance: Building AI-Ready Tables With Confluent Tableflow and Unity Catalog

Learn how Databricks and Confluent are simplifying the path from real-time data to governed, analytics- and AI-ready tables. This session will cover how Confluent Tableflow automatically materializes Kafka topics into Delta tables and registers them with Unity Catalog — eliminating the need for custom streaming pipelines. We’ll walk through how this integration helps data engineers reduce ingestion complexity, enforce data governance and make real-time data immediately usable for analytics and AI.

IQVIA's Analytics for Patient Support Services: Transforming Scalability, Performance and Governance

This presentation will explore the transformation of IQVIA's decade-old patient support platform through the implementation of the Databricks Data Intelligence Platform. Facing scalability challenges, performance bottlenecks and rising costs, the existing platform required significant redesign to handle growing data volumes and complex analytics. Key issues included static metrics limiting workflow optimization, fragmented data governance and heightened compliance and security demands. By partnering with Customertimes (a Databricks Partner) and adopting Databricks' centralized, scalable analytics solution with enhanced self-service capabilities, IQVIA achieved improved query performance, cost efficiency and robust governance, ensuring operational effectiveness and regulatory compliance in an increasingly complex environment.

Red Stapler is a streaming-native system on Databricks that merges file-based ingestion and real-time user edits into a single Lakeflow Declarative Pipelines pipeline for near real-time feedback. Protobuf definitions, managed in the Buf Schema Registry (BSR), govern schema and data-quality rules, ensuring backward compatibility. All records, valid or not, are stored in an SCD Type 2 table, capturing every version for full history and immediate quarantine views of invalid data. This unified approach boosts data governance, simplifies auditing and streamlines error fixes. Running on Lakeflow Declarative Pipelines Serverless and the Kafka-compatible Bufstream keeps costs low by scaling down to zero when idle. Red Stapler’s configuration-driven Protobuf logic adapts easily to evolving survey definitions without risking production. The result is consistent validation, quick updates and a complete audit trail, all critical for trustworthy, flexible data pipelines.
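The idea of keeping every record version, valid or not, in an SCD Type 2 table and deriving a quarantine view from it can be sketched in a few lines of plain Python. This is only an illustrative sketch and not Red Stapler's implementation: the `Version` class, the `upsert` and `quarantine_view` helpers, and the `score` validation rule are all hypothetical stand-ins for what the abstract says is driven by Protobuf definitions.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Version:
    """One row of a (hypothetical) SCD Type 2 history table."""
    key: str
    payload: dict
    is_valid: bool
    valid_from: datetime
    valid_to: Optional[datetime] = None  # None marks the current version


def is_valid_record(payload: dict) -> bool:
    # Hypothetical rule standing in for schema-driven validation.
    score = payload.get("score")
    return isinstance(score, int) and 0 <= score <= 10


def upsert(history: list, key: str, payload: dict, now: datetime) -> None:
    """Close the current version of `key` (if any) and append a new one."""
    for version in history:
        if version.key == key and version.valid_to is None:
            if version.payload == payload:
                return  # nothing changed, keep the current version open
            version.valid_to = now
    # Invalid records are stored too; they are only flagged, never dropped.
    history.append(Version(key, payload, is_valid_record(payload), now))


def quarantine_view(history: list) -> list:
    """Current versions that failed validation."""
    return [v for v in history if v.valid_to is None and not v.is_valid]


history = []
t0 = datetime(2025, 1, 1, tzinfo=timezone.utc)
t1 = datetime(2025, 1, 2, tzinfo=timezone.utc)
upsert(history, "survey-1", {"score": 42}, t0)  # invalid, but still stored
quarantined_before_fix = quarantine_view(history)
upsert(history, "survey-1", {"score": 7}, t1)   # a later edit corrects it
```

After the second call the invalid version remains in the history with a closed validity interval, while the quarantine view, which only looks at current versions, is empty again.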

Dealing With Sensitive Data on Databricks at Natura

Ensuring the protection of sensitive data within a Databricks environment requires robust mechanisms to prevent unauthorized access, even by high-privileged roles such as Databricks administrators: Account Console Admins, Workspace Admins and Unity Catalog Admins. To address this, a comprehensive data governance and access control strategy can be implemented, leveraging encryption, secret scopes, column masks, fine-grained access on tables and auditing capabilities.
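As a rough illustration of the column-mask idea, the sketch below shows masking logic that depends on membership in a dedicated reader group rather than on administrative privilege, so an administrator outside that group still sees only a masked value. It is plain Python for illustration only: in Unity Catalog a column mask is typically defined as a SQL function attached to a column, and the `mask_column` function and the `pii_readers` and `workspace_admins` group names here are hypothetical.

```python
import hashlib


def mask_column(value: str, caller_groups: set,
                allowed_group: str = "pii_readers") -> str:
    """Return the clear value only to members of the allowed group."""
    if allowed_group in caller_groups:
        return value
    # Everyone else, including admins, gets a stable, non-reversible token.
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest()
    return "masked:" + digest[:12]


# Hypothetical callers: an admin without the reader group, and a reader.
seen_by_admin = mask_column("123-45-6789", {"workspace_admins"})
seen_by_reader = mask_column("123-45-6789", {"pii_readers"})
```

Hashing rather than blanking the value keeps the masked column usable for joins and distinct counts, at the cost of being guessable for low-entropy inputs; a keyed hash or full redaction may be a better fit depending on the data.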

Sponsored by: Informatica | Extending Unity Catalog to Govern the Data Estate With Informatica Cloud Data Governance & Catalog

Join this 20-minute session to learn how Informatica CDGC integrates with and leverages Unity Catalog metadata to provide end-to-end governance and security across an enterprise data landscape. Topics covered will include:
· Comprehensive data lineage that provides complete data transformation visibility across multicloud and hybrid environments
· Broad data source support to facilitate holistic cataloging and a centralized governance framework
· Centralized access policy management and data stewardship to enable compliance with regulatory standards
· Rich data quality to ensure data is cleansed, validated and trusted for analytics and AI

Trust You Can Measure: Data Quality Standards in The Lakehouse

Do you trust your data? If you’ve ever struggled to figure out which datasets are reliable, well-governed, or safe to use, you’re not alone. At Databricks, our own internal lakehouse faced the same challenge—hundreds of thousands of tables, but no easy way to tell which data met quality standards. In this talk, the Databricks Data Platform team shares how we tackled this problem by building the Data Governance Score—a way to systematically measure and surface trust signals across the entire lakehouse. You’ll learn how we leverage Unity Catalog, governed tags, and enforcement to drive better data decisions at scale. Whether you're a data engineer, platform owner, or business leader, you’ll leave with practical ideas on how to raise the bar for data quality and trust in your own data ecosystem.

Sponsored by: Deloitte | Accelerating Biopharmaceutical Breakthroughs with an Innovative Enterprise Data Strategy

In the rapidly evolving life sciences and healthcare industry, leveraging data-as-a-product is crucial for driving innovation and achieving business objectives. Join us to explore how Deloitte is revolutionizing data strategy solutions by overcoming challenges such as data silos, poor data quality, and lack of real-time insights with the Databricks Data Intelligence Platform. Learn how effective data governance, seamless data integration, and scalable architectures support personalized medicine, regulatory compliance, and operational efficiency. This session will highlight how these strategies enable biopharma companies to transform data into actionable insights, accelerate breakthroughs and enhance life sciences outcomes.