talk-data.com

Topic: Delta Lake

Tags: data_lake · acid_transactions · time_travel · file_format · storage

347 activities tagged

Activity trend: 117 peak per quarter (2020-Q1 to 2026-Q1)

Activities

347 activities · Newest first

Managing Databricks at Scale

T-Mobile’s leadership in 5G innovation and its rapid growth in the fixed wireless business have led to an exponential increase in data, reaching hundreds of terabytes daily. This session explores how T-Mobile uses Databricks to manage this data efficiently, focusing on scalable architecture with Delta Lake, auto-scaling clusters, performance optimization through data partitioning and caching, and comprehensive data governance with Unity Catalog. It also covers cost management, collaboration features and AI-driven productivity tools, highlighting how these strategies empower T-Mobile to innovate, streamline operations and maximize data impact across network optimization, community support, energy management and more.

Multi-Format, Multi-Table, Multi-Statement Transactions on Unity Catalog

Get a first look at multi-statement transactions in Databricks. In this session, we will dive into their capabilities, exploring how multi-statement transactions enable atomic updates across multiple tables in your data pipelines, ensuring data consistency and integrity for complex operations. We will also share how we are enabling unified transactions across Delta Lake and Iceberg with Unity Catalog — powering our vision for an open and interoperable lakehouse.

Sponsored by: Hexaware | Global Data at Scale: Powering Front Office Transformation with Databricks

Join KPMG for an engaging session on how we transformed our data platform and built a cutting-edge Global Data Store (GDS) — a game-changing data hub for our Front Office Transformation (FOT). Discover how we seamlessly unified data from various member firms, turning it into a dynamic engine that lets our business leverage the Front Office ecosystem for smarter analytics and decision-making. Learn about our unique approach that rapidly integrates diverse datasets into the GDS, and about our hub-and-spoke model connecting member firms’ data lakes to enable secure, high-speed collaboration via Delta Sharing. Hear how we are leveraging Unity Catalog to help ensure data governance, compliance and straightforward data lineage. We’ll share strategies for risk management, security (fine-grained access, encryption) and scaling a cloud-based data ecosystem.

Franchise IP and Data Governance at Krafton: Driving Cost Efficiency and Scalability

Join us as we explore how KRAFTON optimized data governance for the PUBG IP, enhancing cost efficiency and scalability. KRAFTON operates a massive data ecosystem, processing tens of terabytes daily. As real-time analytics demands increased, traditional batch-based processing faced scalability challenges. To address this, we redesigned data pipelines and governance models, improving performance while reducing costs:
- Transitioned to real-time pipelines (batch to streaming)
- Optimized workload management (reducing all-purpose clusters, increasing Jobs usage)
- Cut costs by tens of thousands monthly (up to 75%)
- Enhanced data storage efficiency (lower S3 costs, Delta Tables)
- Improved pipeline stability (Medallion Architecture)
Gain insights into how KRAFTON scaled data operations, leveraging real-time analytics and cost optimization for high-traffic games. Learn more: https://www.databricks.com/customers/krafton
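For readers unfamiliar with the batch-to-streaming transition described above, here is a minimal PySpark sketch of that general pattern on Delta tables. The table names and checkpoint path are illustrative placeholders, not KRAFTON's actual pipeline code.

    # Minimal sketch of converting a batch Delta job to Structured Streaming.
    # Table names ("bronze_events", "silver_events") and the checkpoint path
    # are placeholders, not the pipeline described in the session.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.getOrCreate()

    # Batch version: reprocesses the whole table on every run.
    # df = spark.table("bronze_events").filter(F.col("event_date") == "2024-01-01")

    # Streaming version: incrementally processes only rows newly appended to the Delta table.
    events = (
        spark.readStream
             .table("bronze_events")                      # read the Delta table as a stream
             .withColumn("ingested_at", F.current_timestamp())
    )

    query = (
        events.writeStream
              .option("checkpointLocation", "/tmp/checkpoints/silver_events")  # tracks progress
              .trigger(availableNow=True)                  # process what is available, then stop
              .toTable("silver_events")                    # append into the silver Delta table
    )
    query.awaitTermination()

Running with trigger availableNow keeps the job schedulable as a standard Databricks Job while still getting incremental, streaming semantics.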

Selectively Overwrite Data With Delta Lake’s Dynamic Insert Overwrite

Dynamic Insert Overwrite is an important Delta Lake feature that allows fine-grained updates by selectively overwriting specific rows, eliminating the need for full-table rewrites. For example, this capability is essential for:
- dbt-Databricks incremental models/workloads, enabling efficient data transformations by processing only new or updated records
- ETL for Slowly Changing Dimension (SCD) Type 2
In this lightning talk, we will:
- Introduce Dynamic Insert Overwrite: understand its functionality and how it works
- Explore key use cases: learn how it optimizes performance and reduces costs
- Share best practices: discover practical tips for leveraging this feature on Databricks, including on the cutting-edge Serverless SQL Warehouses
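As a rough illustration of the behavior described above, the following PySpark sketch shows the two selective-overwrite options documented for Delta Lake: dynamic partition overwrite and replaceWhere. Table and column names are placeholders, and the exact options follow the public Delta Lake documentation rather than the talk itself.

    # Minimal sketch of selective overwrites with Delta Lake; names are placeholders.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    updates = spark.table("staging.daily_sales")   # new/updated records for a few dates

    # Option 1: dynamic partition overwrite -- only partitions present in `updates`
    # are rewritten; all other partitions of the target table are left untouched.
    (updates.write
            .format("delta")
            .mode("overwrite")
            .option("partitionOverwriteMode", "dynamic")
            .saveAsTable("prod.daily_sales"))

    # Option 2: replaceWhere -- overwrite only rows matching an explicit predicate,
    # useful for date-bounded or SCD-style rewrites.
    (updates.write
            .format("delta")
            .mode("overwrite")
            .option("replaceWhere", "sale_date >= '2024-01-01' AND sale_date < '2024-02-01'")
            .saveAsTable("prod.daily_sales"))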

Sponsored by: AWS | Ripple: Well-Architected Data & AI Platforms - AWS and Databricks in Harmony

Join us as we explore the well-architected framework for modern data lakehouse architecture, where AWS's comprehensive data, AI, and infrastructure capabilities align with Databricks' unified platform approach. Building upon core principles of Operational Excellence, Security, Reliability, Performance, and Cost Optimization, we'll demonstrate how Data and AI Governance alongside Interoperability and Usability enable organizations to build robust, scalable platforms. Learn how Ripple modernized its data infrastructure by migrating from a legacy Hadoop system to a scalable, real-time analytics platform using Databricks on AWS. This session covers the challenges of high operational costs, latency, and peak-time bottlenecks—and how Ripple achieved 80% cost savings and 55% performance improvements with Photon, Graviton, Delta Lake, and Structured Streaming.

Sponsored by: Firebolt | 10ms Queries on Iceberg: Turbocharging Your Lakehouse for Interactive Experiences with Firebolt

Open table formats such as Apache Iceberg or Delta Lake have transformed the data landscape. For the first time, we’re seeing a real open storage ecosystem emerging across database vendors. So far, however, open table formats have found little adoption powering low-latency, high-concurrency analytics use cases. Data stored in open formats often gets transformed and ingested into closed systems for serving. The reason is simple: most modern query engines don’t properly support these workloads. In this talk, we take a look under the hood of Firebolt and dive into the work we’re doing to support low latency and high concurrency on Iceberg: caching of data and metadata, adaptive object storage reads, subresult reuse and multi-dimensional scaling. After this session, you will know how to build low-latency data applications on top of Iceberg. You’ll also have a deep understanding of what it takes for modern high-performance query engines to do well on these workloads.

Sponsored by: SAP | SAP Business Data Cloud: Fuel AI with SAP data products across ERP and lines-of-business

Unlock the power of your SAP data with SAP Business Data Cloud—a fully managed SaaS solution that unifies and governs all SAP data while seamlessly connecting it with third-party data. As part of SAP Business Data Cloud, SAP Databricks brings together trusted, semantically rich business data with industry-leading capabilities in AI, machine learning, and data engineering. Discover how to access curated SAP data products across critical business processes, enrich and harmonize your data without data copies using Delta Sharing, and leverage the results across your business data fabric. See it all in action with a demonstration.

Using Clean Rooms for Privacy-Centric Data Collaboration

Databricks Clean Rooms make privacy-safe collaboration possible for data, analytics, and AI — across clouds and platforms. Built on Delta Sharing, Clean Rooms enable organizations to securely share and analyze data together in a governed, isolated environment — without ever exposing raw data. In this session, you’ll learn how to get started with Databricks Clean Rooms and unlock advanced use cases, including:
- Cross-platform collaboration and joint analytics
- Training machine learning and AI models
- Enforcing custom privacy policies
- Analyzing unstructured data
- Incorporating proprietary libraries in Python and SQL notebooks
- Auditing clean room activity for compliance
Whether you’re a data scientist, engineer or data leader, this session will equip you to drive high-value collaboration while maintaining full control over data privacy and governance.

What’s new with Collaboration: Delta Sharing, Clean Room, Marketplace and the Ecosystem

Databricks continues to redefine how organizations securely and openly collaborate on data. With new innovations like Clean Rooms for multi-party collaboration, Sharing for Lakehouse Federation, cross-platform view sharing and Databricks Apps in the Marketplace, teams can now share and access data more easily, cost-effectively and across platforms — whether or not they’re using Databricks. In this session, we’ll deliver live demos of key capabilities that power this transformation:
- Delta Sharing: the industry’s only open protocol for seamless cross-platform data sharing
- Databricks Marketplace: a central hub for discovering and monetizing data and AI assets
- Clean Rooms: a privacy-preserving solution for secure, multi-party data collaboration
Join us to see how these tools enable trusted data sharing, accelerate insights and drive innovation across your ecosystem. Bring your questions and walk away with practical ways to put these capabilities into action today.
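To ground the cross-platform claim, here is a small sketch using the open-source delta-sharing Python connector, which recipients can use without a Databricks account. The profile file path and the share/schema/table coordinates are placeholder values.

    # Minimal sketch of consuming a Delta Share with the open-source `delta-sharing`
    # Python connector; the profile file and table coordinates are placeholders.
    import delta_sharing

    # The provider gives recipients a profile file containing the sharing server
    # endpoint and a bearer token.
    profile = "/path/to/config.share"

    # Discover what has been shared with this recipient.
    client = delta_sharing.SharingClient(profile)
    for table in client.list_all_tables():
        print(table.share, table.schema, table.name)

    # Load one shared table directly into pandas.
    df = delta_sharing.load_as_pandas(f"{profile}#my_share.my_schema.my_table")
    print(df.head())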

Advanced Data Access Control for the Exabyte Era: Scaling with Purpose

As data-driven companies scale from small startups to global enterprises, managing secure data access becomes increasingly complex. Traditional access control models fall short at enterprise scale, where dynamic, purpose-driven access is essential. In this talk, we explore how our “Just-in-Time” Purpose-Based Access Control (PBAC) platform addresses the evolving challenges of data privacy and compliance, maintaining least privilege while ensuring productivity. Using features like Unity Catalog, Delta Sharing & Databricks Apps, the platform delivers real-time, context-aware data governance. Leveraging JIT PBAC keeps your data secure, your engineers productive, your legal & security teams happy and your organization future-proof in the ever-evolving compliance landscape.

Delta and Databricks as a Performant Exabyte-Scale Application Backend

The Delta Lake architecture promises to provide a single, highly functional, high-scale copy of data that can be leveraged by a variety of tools to satisfy a broad range of use cases. To date, most use cases have focused on interactive data warehousing, ETL, model training and streaming. Real-time access is generally delegated to costly and sometimes difficult-to-scale NoSQL, indexed storage and domain-specific specialty solutions, which provide limited functionality compared to Spark on Delta Lake. In this session, we will explore the Delta data-skipping and optimization model and discuss how Capital One leveraged it, along with Databricks Photon and Spark Connect, to implement a real-time web application backend. We’ll share how we built a highly functional and performant security information and event management user experience (SIEM UX) that is cost-effective.
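As background for the data-skipping model mentioned above, the sketch below shows the standard Delta/Databricks layout optimization that makes per-file statistics selective. The table and columns are illustrative, not Capital One's actual SIEM schema.

    # Minimal sketch of layout optimization that enables Delta data skipping;
    # the table and columns are illustrative placeholders.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Co-locate rows with similar values so per-file min/max statistics become selective.
    spark.sql("OPTIMIZE security.events ZORDER BY (event_time, source_ip)")

    # A point-lookup-style query can now skip most files using those statistics.
    hits = spark.sql("""
        SELECT *
        FROM security.events
        WHERE source_ip = '10.0.0.42'
          AND event_time >= '2024-06-01'
    """)
    hits.show()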

Simplified Delta Sharing With Network Security

Delta Sharing enables cross-domain sharing of data assets for collaboration. A practical concern providers and recipients face in doing so is the need to manually configure network and storage firewalls. This is particularly challenging for large-scale providers and recipients with strict compliance requirements. In this talk, we will describe our solution to fully eliminate these complexities. This enhances user experience, scalability and security, facilitating seamless data collaboration across diverse environments and cloud platforms.

Doordash Customer 360 Data Store and its Evolution to Become an Entity Management Framework

The "Doordash Customer 360 Data Store" represents a foundational step in centralizing and managing customer profile to enable targeting and personalized customer experiences built on Delta Lake. This presentation will explore the initial goals and architecture of the Customer 360 Data Store, its journey to becoming a robust entity management framework, and the challenges and opportunities encountered along the way. We will discuss how the evolution addressed scalability, data governance and integration needs, enabling the system to support dynamic and diverse use cases, including customer lifecycle analytics, marketing campaign targeting using segmentation. Attendees will gain insight into key design principles, technical innovations and strategic decisions that transformed the system into a flexible platform for entity management, positioning it as a critical enabler of data-driven growth at Doordash. Audio for this session is delivered in the conference mobile app, you must bring your own headphones to listen.

No Time for the Dad Bod: Automating Life with AI and Databricks

Life as a father, tech leader, and fitness enthusiast demands efficiency. To reclaim my time, I’ve built AI-driven solutions that automate everyday tasks—from research agents that prep for podcasts to multi-agent systems that plan meals—all powered by real-time data and automation. This session dives into the technical foundations of these solutions, focusing on event-driven agent design and scalable patterns for robust AI systems. You’ll discover how Databricks technologies like Delta Lake, for reliable and scalable data management, and DSPy, for streamlining the development of generative AI workflows, empower seamless decision-making and deliver actionable insights. Through detailed architecture diagrams and a live demo, I’ll showcase how to design systems that process data in motion to tackle complex, real-world problems. Whether you’re an engineer, architect, or data scientist, you’ll leave with practical strategies to integrate AI-driven automation into your workflows.
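For context on the DSPy piece mentioned above, here is a tiny sketch of a declarative generative step in the spirit of the meal-planning example. It assumes a recent DSPy release and a configured LLM API key; the signature, field names and model string are illustrative assumptions, not the speaker's actual code.

    # Tiny DSPy sketch of a declarative generative step; the signature and
    # model name are illustrative assumptions.
    import dspy

    # Configure a language model (any provider supported by DSPy works here;
    # this assumes an OpenAI API key is available in the environment).
    dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

    # Declare the step as a typed signature instead of a hand-written prompt.
    plan_meals = dspy.Predict("pantry_items, dietary_goal -> three_day_meal_plan")

    result = plan_meals(
        pantry_items="chicken, rice, spinach, eggs, oats",
        dietary_goal="high protein, ~2200 kcal/day",
    )
    print(result.three_day_meal_plan)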

Introduction to Modern Open Table Formats and Catalogs

In this session, learn why modern open table formats like Delta and Iceberg are a big deal and how they work with catalogs. Learn what motivated their creation, how they work, and what benefits they can bring to your data and AI platform. Hear how these formats are becoming increasingly interoperable and what our vision is for their future.
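As a concrete reference point for what a table format adds over raw Parquet files, the sketch below shows transactional writes and time travel with Delta Lake on a Spark session that has Delta enabled (as on Databricks). The table name and sample data are placeholders.

    # Minimal sketch of what an open table format adds over raw files:
    # atomic, logged commits and time travel. Names and data are placeholders.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    spark.sql("CREATE SCHEMA IF NOT EXISTS demo")

    # Version 0: create the table (an atomic commit in the transaction log).
    spark.createDataFrame([(1, "bronze"), (2, "silver")], ["id", "tier"]) \
         .write.format("delta").saveAsTable("demo.customers")

    # Version 1: an update is another atomic commit.
    spark.sql("UPDATE demo.customers SET tier = 'gold' WHERE id = 2")

    # Time travel: read the table as of the earlier version.
    spark.sql("SELECT * FROM demo.customers VERSION AS OF 0").show()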

Streaming Meets Governance: Building AI-Ready Tables With Confluent Tableflow and Unity Catalog

Learn how Databricks and Confluent are simplifying the path from real-time data to governed, analytics- and AI-ready tables. This session will cover how Confluent Tableflow automatically materializes Kafka topics into Delta tables and registers them with Unity Catalog — eliminating the need for custom streaming pipelines. We’ll walk through how this integration helps data engineers reduce ingestion complexity, enforce data governance and make real-time data immediately usable for analytics and AI.

Data Intelligence for Cybersecurity Forum: Insights From SAP, Anvilogic, Capital One, and Wiz

Join cybersecurity leaders from SAP, Anvilogic, Capital One, Wiz, and Databricks to explore how modern data intelligence is transforming security operations. Discover how SAP adopted a modular, AI-powered detection engineering lifecycle using Anvilogic on Databricks. Learn how Capital One built a detection and correlation engine leveraging Delta Lake, Apache Spark Streaming, and Databricks to process millions of cybersecurity events per second. Finally, see how Wiz and Databricks’ partnership enhances cloud security with seamless threat visibility. Through expert insights and live demos, gain strategies to build scalable, efficient cybersecurity powered by data and AI.

Kernel, Catalog, Action! Reimagining our Delta-Spark Connector with DSv2

Delta Lake is redesigning its Spark connector through the combination of three key technologies: First, we're updating our Spark APIs to DSv2 to achieve deeper catalog integration and improved integration with the Spark optimizer. Second, we're fully integrating on top of Delta Kernel to take advantage of its simplified abstraction of Delta protocol complexities, accelerating feature adoption and improving maintainability. Third, we are transforming Delta to become a catalog-aware lakehouse format with Catalog Commits, enabling more efficient metadata management, governance and query performance. Join us to explore how we're advancing Delta Lake's architecture, pushing the boundaries of metadata management and creating a more intelligent, performant data lakehouse platform.

Scaling Modern MDM With Databricks, Delta Sharing and Dun & Bradstreet

Master Data Management (MDM) is the foundation of a successful enterprise data strategy — delivering consistency, accuracy and trust across all systems that depend on reliable data. But how can organizations integrate trusted third-party data to enhance their MDM frameworks? How can they ensure that this master data is securely and efficiently shared across internal platforms and external ecosystems? This session explores how Dun & Bradstreet’s pre-mastered data serves as a single source of truth for customers, suppliers and vendors — reducing duplication and driving alignment across enterprise systems. With Delta Sharing, organizations can natively ingest Dun & Bradstreet data into their Databricks environment and establish a scalable, interoperable MDM framework. Delta Sharing also enables secure, real-time distribution of master data across the enterprise, ensuring that every system operates from a consistent and trusted foundation.