SQL

Crypto at Scale: Building a High-Performance Platform for Real-Time Blockchain Data

2025-06-11 · Data + AI Summit 2025 Watch

talk

by Matthew Moorcroft (Databricks), Ferran Cabezas Castellvi (Elliptic)

Analytics Blockchain Databricks Delta Data Streaming

In today’s fast-evolving crypto landscape, organizations require fast, reliable intelligence to manage risk, investigate financial crime, and stay ahead of evolving threats. In this session we will show how Elliptic built a scalable, high-performance Data Intelligence Platform that delivers real-time, actionable blockchain insights to its customers. We’ll walk you through some of the key components of the Elliptic Platform, including the Elliptic Entity Graph and our User-Facing Analytics. We will focus on the evolution of our User-Facing Analytics capabilities, and specifically on how components from the Databricks ecosystem such as Structured Streaming, Delta Lake, and SQL Warehouse have played a vital role. We’ll also share some of the optimizations we’ve made to our streaming jobs to maximize performance and ensure data completeness. Whether you’re looking to enhance your streaming capabilities, expand your knowledge of how crypto analytics works, or simply discover novel approaches to data processing at scale, this session will provide concrete strategies and valuable lessons learned.

How to Migrate From Oracle to Databricks SQL

2025-06-11 · Data + AI Summit 2025 Watch

talk

by Laurent Léturgez (Databricks)

CSV Databricks DWH Oracle PySpark

Migrating your legacy Oracle data warehouse to the Databricks Data Intelligence Platform can accelerate your data modernization journey. In this session, learn the top strategies for completing this data migration. We will cover data type conversion, basic to complex code conversions, and validation and reconciliation best practices. Discover the pros and cons of loading CSV exports with PySpark versus using pipelines that write to Databricks tables. See before-and-after architectures of customers who have migrated, and learn about the benefits they realized.
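The validation and reconciliation step mentioned in this abstract can be illustrated with a minimal sketch. It is not taken from the session: the helper names `table_fingerprint` and `reconcile` are hypothetical, and the rows are assumed to have been fetched already from the source and target systems as plain tuples. The idea is to compare a row count and an order-independent checksum rather than the rows themselves.

```python
import hashlib

MASK_64 = (1 << 64) - 1


def table_fingerprint(rows):
    """Return (row_count, checksum) for an iterable of row tuples.

    The checksum is the sum (mod 2**64) of a per-row hash, so it does not
    depend on row order but still changes when a row is duplicated.
    """
    count = 0
    checksum = 0
    for row in rows:
        count += 1
        # repr() keeps None, '' and 'None' distinct. Assumption: both sides
        # have already been normalized to the same Python types.
        encoded = "|".join(repr(value) for value in row).encode("utf-8")
        row_hash = int.from_bytes(hashlib.sha256(encoded).digest()[:8], "big")
        checksum = (checksum + row_hash) & MASK_64
    return count, checksum


def reconcile(source_rows, target_rows):
    """Compare two row sets and report whether counts and checksums agree."""
    source_count, source_sum = table_fingerprint(source_rows)
    target_count, target_sum = table_fingerprint(target_rows)
    return {
        "source_count": source_count,
        "target_count": target_count,
        "row_count_match": source_count == target_count,
        "checksum_match": source_sum == target_sum,
    }


if __name__ == "__main__":
    legacy = [(1, "alice", None), (2, "bob", 10.5)]
    migrated = [(2, "bob", 10.5), (1, "alice", None)]  # same rows, new order
    print(reconcile(legacy, migrated))
```

In practice the per-row hash would usually be computed inside each engine so that only counts and checksums cross the network; the in-memory version here only shows the comparison logic.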

Summit Live: Spark Talk - Everything Spark, Lakeflow Declarative Pipelines, and Open Source

2025-06-11 · Data + AI Summit 2025 Watch

talk

by Michael Armbrust (Databricks)

Databricks Delta Spark

Databricks co-founders created Spark, the wildly popular open source foundation of Databricks, way back in 2009. Learn from Michael Armbrust, creator of Spark SQL and leader of Databricks Delta, about the latest happenings in Spark, Lakeflow Declarative Pipelines, and open source.

Multi-Statement Transactions: How to Improve Data Consistency and Performance

2025-06-11 · Data + AI Summit 2025 Watch

lightning_talk

by Franco Patano (Databricks)

Data Lakehouse Databricks DWH

Multi-statement transactions bring the atomicity and reliability of traditional databases to modern data warehousing on the lakehouse. In this session, we’ll explore real-world patterns enabled by multi-statement transactions, including multi-table updates, deduplication pipelines and audit logging, and show how Databricks ensures atomicity and consistency across complex workflows. We’ll also dive into demos and share tips for getting started and for migrating workloads with this feature in Databricks SQL.
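The multi-table update plus audit logging pattern named in this abstract can be sketched as follows. This is only an illustration of the atomicity guarantee, using Python's built-in `sqlite3` module as a stand-in; it does not show Databricks SQL syntax, and the table names, `make_db` and `transfer_with_audit` are hypothetical.

```python
import sqlite3


def make_db():
    """Create an in-memory database with two tables that must stay consistent."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)"
    )
    conn.execute("CREATE TABLE audit_log (src INTEGER, dst INTEGER, amount INTEGER)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 0)])
    conn.commit()
    return conn


def transfer_with_audit(conn, src, dst, amount):
    """Update two account rows and write an audit row as one atomic unit.

    Returns True if the transaction committed, False if it was rolled back.
    """
    try:
        # The connection context manager commits on success and rolls back
        # every statement in the block if an exception is raised.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute(
                "INSERT INTO audit_log (src, dst, amount) VALUES (?, ?, ?)",
                (src, dst, amount),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE id = ?", (src,)
            ).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False


if __name__ == "__main__":
    db = make_db()
    print(transfer_with_audit(db, 1, 2, 30))
    print(transfer_with_audit(db, 1, 2, 500))
    print(db.execute("SELECT id, balance FROM accounts ORDER BY id").fetchall())
```

The failed transfer leaves neither a partial balance change nor an orphaned audit row, which is the property the session describes for multi-table workflows.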

Sponsored by: Insight Enterprises | Unity Catalog Agent Assistant

Data Triggers and Advanced Control Flow With Lakeflow Jobs

2025-06-11 · Data + AI Summit 2025 Watch

talk

by Anthony Podgorsak (Databricks), Prashanth Babu Velanati Venkata (Databricks)

AI/ML BI Data Lakehouse Delta Power BI

Lakeflow Jobs is the production-ready, fully managed orchestrator for the entire Lakehouse, with 99.95% uptime. Join us for a dive into how you can orchestrate your enterprise data operations, from triggering your jobs only when your data is ready to advanced control flow with conditionals, looping and job modularity, with demos! Attendees will gain practical insights into optimizing their data operations by orchestrating with Lakeflow Jobs:
- New task types: Publish AI/BI Dashboards, push to Power BI or ingest with Lakeflow Connect
- Advanced execution control: Reference SQL Task outputs, run partial DAGs and perform targeted backfills
- Repair runs: Re-run failed pipelines with surgical precision using task-level repair
- Control flow upgrades: Native for-each loops and conditional logic make DAGs more dynamic and expressive
- Smarter triggers: Kick off jobs based on file arrival or Delta table changes, enabling responsive workflows
- Code-first approach to pipeline orchestration
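The idea behind task-level repair can be shown with a toy scheduler. This sketch is not the Lakeflow Jobs API: `run_dag`, the task names and the status strings are hypothetical, and the standard-library `graphlib` module supplies the dependency ordering. A repair run receives the statuses from the earlier run and re-executes only the tasks that did not succeed.

```python
from graphlib import TopologicalSorter


def run_dag(tasks, deps, previous=None):
    """Run tasks in dependency order and return a dict of task statuses.

    tasks:    dict mapping task name to a zero-argument callable
    deps:     dict mapping task name to the set of its upstream task names
    previous: statuses from an earlier run; tasks marked "success" there are
              not executed again (this is the repair behaviour)
    """
    previous = previous or {}
    status = {}
    for name in TopologicalSorter(deps).static_order():
        if previous.get(name) == "success":
            status[name] = "success"  # reuse the earlier result
            continue
        if any(status[upstream] != "success" for upstream in deps.get(name, ())):
            status[name] = "skipped"  # an upstream task failed or was skipped
            continue
        try:
            tasks[name]()
            status[name] = "success"
        except Exception:
            status[name] = "failed"
    return status


if __name__ == "__main__":
    broken = {"transform": True}

    def transform():
        if broken["transform"]:
            raise RuntimeError("bad input file")

    tasks = {"ingest": lambda: None, "transform": transform, "publish": lambda: None}
    deps = {"ingest": set(), "transform": {"ingest"}, "publish": {"transform"}}

    first = run_dag(tasks, deps)
    print(first)
    broken["transform"] = False  # the underlying problem is fixed
    print(run_dag(tasks, deps, previous=first))
```

The second call leaves `ingest` alone and executes only `transform` and `publish`, which is the saving a repair run offers over restarting the whole job.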

Hands-on Learning: Databricks SQL in Action: Intelligent Data Warehousing, Analytics and BI Workshop

2025-06-11 · Data + AI Summit 2025

workshop

by Pearl Ubaru (Databricks)

AI/ML Analytics BI Cloud Computing Data Lake Data Lakehouse Databricks DWH

Most organizations run complex cloud data architectures that silo applications, users and data. Join this interactive hands-on workshop to learn how Databricks SQL allows you to operate a multi-cloud lakehouse architecture that delivers data warehouse performance at data lake economics, with up to 12x better price/performance than traditional cloud data warehouses. Here’s what we’ll cover:
- How Databricks SQL fits in the Data Intelligence Platform, enabling you to operate a multi-cloud lakehouse architecture that delivers data warehouse performance at data lake economics
- How to manage and monitor compute resources, data access and users across your lakehouse infrastructure
- How to query directly on your data lake using your tools of choice or the built-in SQL editor and visualizations
- How to use AI to increase productivity when querying, completing code or building dashboards
Ask your questions during this hands-on lab, and the Databricks experts will guide you.

Revolutionizing PepsiCo BI Capabilities: From Traditional BI to Next-Gen Analytics Powerhouse

2025-06-11 · Data + AI Summit 2025 Watch

talk

by John Abraham (PepsiCo), Joshua Sayah Lee (PepsiCo Inc.)

AI/ML Analytics BI Data Analytics Databricks

This session will provide an in-depth overview of how PepsiCo, a global leader in food and beverage, transformed its outdated data platform into a modern, unified and centralized data and AI-enabled platform using the Databricks SQL serverless environment. Through three distinct implementations that transpired at PepsiCo in 2024, we will demonstrate how the PepsiCo Data Analytics & AI Group unlocked pivotal capabilities that facilitated the delivery of diverse data-driven insights to the business, reduced operational expenses and enhanced overall performance through the newly implemented platform.

Summit Live: Best Practices for Data Warehouse Migrations

2025-06-11 · Data + AI Summit 2025 Watch

talk

by Laurent Léturgez (Databricks)

AI/ML Databricks DWH

Databricks SQL is the fastest-growing data warehouse on the market, used by over 10k organizations thanks to its price performance and AI innovations. See the best practices and common architectural challenges of migrating your legacy DW, including reference architectures. Learn how to migrate easily with the recently acquired Lakebridge migration tool, and through our partners.

Sponsored by: Domo, Inc | Enabling AI-Powered Business Solutions w/Databricks & Domo

GenAI for SQL & ETL: Build Multimodal AI Workflows at Scale

2025-06-11 · Data + AI Summit 2025 Watch

talk

by Ahmed Bilal (Databricks), Colton Peltier (Databricks)

AI/ML Databricks ETL/ELT GenAI LLM

Enterprises generate massive amounts of unstructured data, from support tickets and PDFs to emails and product images. But extracting insight from that data requires brittle pipelines and complex tools. Databricks AI Functions make this simpler. In this session, you’ll learn how to apply powerful language and vision models directly within your SQL and ETL workflows: no endpoints, no infrastructure, no rewrites. We’ll explore practical use cases and best practices for analyzing complex documents, classifying issues, translating content, and inspecting images, all in a way that’s scalable, declarative, and secure. What you’ll learn:
- How to run state-of-the-art LLMs like GPT-4, Claude Sonnet 4, and Llama 4 on your data
- How to build scalable, multimodal ETL workflows for text and images
- Best practices for prompts, cost, and error handling in production
- Real-world examples of GenAI use cases powered by AI Functions

How to Migrate from Teradata to Databricks SQL

2025-06-11 · Data + AI Summit 2025 Watch

talk

by Fabien Contaminard (Databricks), Mehran Golestaneh (Databricks)

Databricks DWH LLM Teradata

Storage and processing costs of your legacy Teradata data warehouses impact your ability to deliver. Migrating your legacy Teradata data warehouse to the Databricks Data Intelligence Platform can accelerate your data modernization journey. In this session, learn the top strategies for completing this data migration. We will cover data type conversion, basic to complex code conversions, and validation and reconciliation best practices. We will also show how to use LLMs hosted natively on Databricks to assist with migration activities. See before-and-after architectures of customers who have migrated, and learn about the benefits they realized.

How We Turned 200+ Business Users Into Analysts With AI/BI Genie

2025-06-11 · Data + AI Summit 2025 Watch

talk

by Thomas Russell (Databricks)

AI/ML Analytics BI Databricks Marketing

AI/BI Genie has transformed self-service analytics for the Databricks Marketing team. This user-friendly conversational AI tool empowers marketers to perform advanced data analysis using natural language — no SQL required. By reducing reliance on data teams, Genie increases productivity and enables faster, data-driven decisions across the organization. But realizing Genie’s full potential takes more than just turning it on. In this session, we’ll share the end-to-end journey of implementing Genie for over 200 marketing users, including lessons learned, best practices and the real business impact of this Databricks-on-Databricks solution. Learn how Genie democratizes data access, enhances insight generation and streamlines decision-making at scale.

Unity Catalog Lakeguard: Secure and Efficient Compute for Your Enterprise

2025-06-11 · Data + AI Summit 2025 Watch

talk

by Jakob Mund (Databricks), Scott Van Woudenberg (Databricks)

Cloud Computing Databricks Cloud Functions Cyber Security Spark

Modern data workloads span multiple sources: data lakes, databases, apps like Salesforce and services like cloud functions. But as teams scale, secure data access and governance across shared compute become critical. In this session, learn how to confidently integrate external data and services into your workloads using Spark and Unity Catalog on Databricks. We'll explore compute options like serverless, clusters, workflows and SQL warehouses, and show how Unity Catalog’s Lakeguard enforces fine-grained governance, even when multiple users share compute concurrently. Walk away ready to choose the right compute model for your team’s needs without sacrificing security or efficiency.

What’s New with Databricks Assistant: From Exploration to Production

2025-06-11 · Data + AI Summit 2025 Watch

talk

by Gal Oshri (Amazon SageMaker AWS), Samantha Banchik (Databricks)

Databricks

Databricks Assistant helps you get from initial exploration all the way to production faster and easier than ever. In this session, we'll show you how Assistant simplifies and accelerates common workflows, boosting your productivity across notebooks and the SQL editor. You'll get practical tips, see end-to-end examples in action, and hear about the latest capabilities we're excited about. We'll also discuss how we're continually improving Assistant to make your development experience faster, more contextual and more customizable. Join us to discover how to get the most out of Databricks Assistant and empower your team to build better and faster.

Accelerating Data Transformation: Best Practices for Governance, Agility and Innovation

2025-06-11 · Data + AI Summit 2025 Watch

lightning_talk

by Kevin Wilson (NCS Australia)

Analytics Data Governance Data Lakehouse Data Quality Databricks dbt ETL/ELT

In this session, we will share NCS’s approach to implementing a Databricks Lakehouse architecture, focusing on key lessons learned and best practices from our recent implementations. By integrating Databricks SQL Warehouse, the DBT Transform framework and our innovative test automation framework, we’ve optimized performance and scalability, while ensuring data quality. We’ll dive into how Unity Catalog enabled robust data governance, empowering business units with self-serve analytical workspaces to create insights while maintaining control. Through the use of solution accelerators, rapid environment deployment and pattern-driven ELT frameworks, we’ve fast-tracked time-to-value and fostered a culture of innovation. Attendees will gain valuable insights into accelerating data transformation, governance and scaling analytics with Databricks.

Bridging Big Data and AI: Empowering PySpark With Lance Format for Multi-Modal AI Data Pipelines

2025-06-11 · Data + AI Summit 2025 Watch

lightning_talk

by LU QIU (LanceDB), Allison Wang (Databricks)

AI/ML Analytics API Big Data Data Analytics Lance PySpark Python Spark

PySpark has long been a cornerstone of big data processing, excelling in data preparation, analytics and machine learning tasks within traditional data lakes. However, the rise of multimodal AI and vector search introduces challenges beyond its capabilities. Spark’s new Python data source API enables integration with emerging AI data lakes built on the multimodal Lance format. Lance delivers unparalleled value with its zero-copy schema evolution capability and robust support for large record-size data (e.g., images, tensors and embeddings), simplifying multimodal data storage. Its advanced indexing for semantic and full-text search, combined with rapid random access, enables high-performance AI data analytics at the SQL level. By unifying PySpark's robust processing capabilities with Lance's AI-optimized storage, data engineers and scientists can efficiently manage and analyze the diverse data types required for cutting-edge AI applications within a familiar big data framework.

Hands-on Learning: AI-Powered Data Engineering with Lakeflow: Techniques for Modern Data Professionals

2025-06-11 · Data + AI Summit 2025

talk

by Frank Munz (Databricks)

AI/ML Data Engineering Data Governance Databricks GenAI GitHub Data Streaming

This introductory workshop caters to data engineers seeking hands-on experience and data architects looking to deepen their knowledge. The workshop is structured to provide a solid understanding of the following data engineering and streaming concepts:
- Introduction to Lakeflow and the Data Intelligence Platform
- Getting started with Lakeflow Declarative Pipelines for building data pipelines in SQL using Streaming Tables and Materialized Views
- Mastering Databricks Workflows with advanced control flow and triggers
- Understanding serverless compute
- Data governance and lineage with Unity Catalog
- Generative AI for Data Engineers: Genie and Databricks Assistant
We believe you can only become an expert if you work on real problems and gain hands-on experience. Therefore, we will equip you with your own lab environment in this workshop and guide you through practical exercises like using GitHub, ingesting data from various sources, creating batch and streaming data pipelines, and more.

Hands-On Learning: Build Custom Data Intelligence Apps on Databricks

2025-06-11 · Data + AI Summit 2025

talk

by Justin DeBrabant (Databricks), Giran Moodley (Databricks), Ivan Trusov (Databricks)

AI/ML BI Databricks

Want to learn how to build your own custom data intelligence applications directly in Databricks? In this workshop, we’ll guide you through a hands-on tutorial for building a Streamlit web app that leverages many of the key products at Databricks as building blocks. You’ll integrate a live DB SQL warehouse, use Genie to ask questions in natural language, and embed AI/BI dashboards for interactive visualizations. In addition, we’ll discuss key concepts and best practices for building production-ready apps, including logging and observability, scalability, different authorization models, and deployment. By the end, you'll have a working AI app—and the skills to build more.

Lakeflow Connect: Easy, Efficient Ingestion From Databases

2025-06-11 · Data + AI Summit 2025 Watch

talk

by Peter Pogorski (Databricks), Bret Grantham (Databricks)

Cyber Security postgresql

Lakeflow Connect streamlines the ingestion of incremental data from popular databases like SQL Server and PostgreSQL. In this session, we’ll review best practices for networking, security, minimizing database load, monitoring and more — tailored to common industry scenarios. Join us to gain practical insights into Lakeflow Connect's functionality so that you’re ready to build your own pipelines. Whether you're looking to optimize data ingestion or enhance your database integrations, this session will provide you with a deep understanding of how Lakeflow Connect works with databases.

talk-data.com
