SQL

Sponsored by: RowZero | Spreadsheets in the modern data stack: security, governance, AI, and self-serve analytics

Turn Genie Into an Agent Using Conversation APIs

2025-06-11 · Data + AI Summit 2025

talk

by Prithvi Kannan (Databricks), Hanlin Sun (Databricks)

AI/ML Analytics API BI

Transform your AI/BI Genie into a text-to-SQL powerhouse using the Genie Conversation APIs. This session explores how Genie functions as an intelligent agent, translating natural language queries into SQL to accelerate insights and enhance self-service analytics. You'll learn practical techniques for configuring agents, optimizing queries and handling errors — ensuring Genie delivers accurate, relevant responses in real time. A must-attend for teams looking to level up their AI/BI capabilities and deliver smarter analytics experiences.
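As a rough illustration of the error handling this abstract mentions, the sketch below polls a question-answering message until it finishes or fails. It is not the Genie Conversation API itself: `get_message` is a hypothetical callable standing in for whatever HTTP call fetches the message, and the status names (`COMPLETED`, `FAILED`, `CANCELLED`) are assumptions that should be checked against the official API reference.

```python
import time
from typing import Callable, Dict

# Assumed status names, for illustration only; verify against the real API.
TERMINAL_OK = {"COMPLETED"}
TERMINAL_ERROR = {"FAILED", "CANCELLED"}


def wait_for_answer(
    get_message: Callable[[], Dict],
    poll_seconds: float = 1.0,
    max_polls: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Poll a (hypothetical) message-status callable until a terminal state.

    Returns the final message on success, raises RuntimeError if the message
    ends in an error state, and TimeoutError if it never finishes.
    """
    for _ in range(max_polls):
        message = get_message()
        status = message.get("status")
        if status in TERMINAL_OK:
            return message
        if status in TERMINAL_ERROR:
            raise RuntimeError(
                f"message ended with status {status}: {message.get('error')}"
            )
        sleep(poll_seconds)  # still running: wait before asking again
    raise TimeoutError(f"no terminal status after {max_polls} polls")


if __name__ == "__main__":
    # Fake responses so the sketch runs without any network access.
    fake = iter([{"status": "EXECUTING_QUERY"},
                 {"status": "COMPLETED", "sql": "SELECT 1"}])
    print(wait_for_answer(lambda: next(fake), sleep=lambda s: None))
```

Injecting `get_message` and `sleep` keeps the retry logic separate from the transport, so the same loop can be exercised in tests with canned responses.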

AI Meets SQL: Leverage GenAI at Scale to Enrich Your Data

2025-06-10 · Data + AI Summit 2025

talk

by Sid Taneja (Databricks), Youngbin Kim (Databricks)

AI/ML Databricks GenAI LLM NLP RAG

This session is repeated. Integrating AI into existing data workflows can be challenging, often requiring specialized knowledge and complex infrastructure. In this session, we'll share how SQL users can leverage AI/ML to access large language models (LLMs) and traditional machine learning directly from within SQL, simplifying the process of incorporating AI into data workflows. We will demonstrate how to use Databricks SQL for natural language processing, traditional machine learning, retrieval augmented generation and more. You'll learn about best practices and see examples of solving common use cases such as opinion mining, sentiment analysis, forecasting and other common AI/ML tasks.

Real-Time Mode Technical Deep Dive: How We Built Sub-300 Millisecond Streaming Into Apache Spark™

2025-06-10 · Data + AI Summit 2025

talk

by Siying Dong (Databricks), Jerry Peng (Databricks)

AI/ML Databricks Spark Data Streaming

Real-time mode is a new low-latency execution mode for Apache Spark™ Structured Streaming. It can consistently provide p99 latencies less than 300 milliseconds for a broad set of stateless and stateful streaming queries. Our talk focuses on the technical aspects of making this possible in Spark. We’ll dive into the core architecture that enables these dramatic latency improvements, including a concurrent stage scheduler and a non-blocking shuffle. We’ll explore how we maintained Spark’s fault-tolerance guarantees, and we’ll also share specific optimizations we made to our streaming SQL operators. These architectural improvements have already enabled Databricks customers to build workloads with latencies up to 10x lower than before. Early adopters in our Private Preview have successfully implemented real-time enrichment pipelines and feature engineering for machine learning — use cases that were previously impossible at these latencies.

Geospatial Insights With Databricks SQL: Techniques and Applications

2025-06-10 · Data + AI Summit 2025

lightning_talk

by Michael Johns (Databricks), Kent Marten (Databricks)

Databricks

Spatial data is increasingly important, but working with it can be complex. In this session, we’ll explore how Databricks SQL supports spatial analysis and helps analysts and engineers get more value from location-based data. We’ll cover what’s coming in the Public Preview of Spatial SQL, when and how to use the new Geometry and Geography data types, and practical use cases for H3. You’ll also learn about common challenges with spatial data and how we're addressing them, along with a look at the near-term roadmap.

From Days to Seconds — Reducing Query Times on Large Geospatial Datasets by 99%

2025-06-10 · Data + AI Summit 2025

talk

by Chris Crawford (Databricks), Hobson Bryan (Global Water Security Center)

CI/CD Data Lakehouse Databricks Git Cyber Security Spark Data Streaming

The Global Water Security Center translates environmental science into actionable insights for the U.S. Department of Defense. Prior to incorporating Databricks, responding to these requests required querying approximately five hundred thousand raster files representing over five hundred billion points. By leveraging lakehouse architecture, Databricks Auto Loader, Spark Streaming, Databricks Spatial SQL, H3 geospatial indexing and Databricks Liquid Clustering, we were able to drastically reduce our “time to analysis” from multiple business days to a matter of seconds. Now, our data scientists execute queries on pre-computed tables in Databricks, resulting in a “time to analysis” that is 99% faster, giving our teams more time for deeper analysis of the data. Additionally, we’ve incorporated Databricks Workflows, Databricks Asset Bundles, Git and Git Actions to support CI/CD across workspaces. We completed this work in close partnership with Databricks.
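The idea of answering queries from pre-computed tables keyed by a spatial index can be sketched as follows. This is not the speakers' pipeline and does not use H3: a plain fixed-size latitude/longitude grid is swapped in as a stand-in for H3 cells, and the function names and sample points are made up for illustration.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Cell = Tuple[int, int]
Point = Tuple[float, float, float]  # (latitude, longitude, measured value)


def cell_id(lat: float, lon: float, cell_deg: float = 1.0) -> Cell:
    """Map a coordinate to a square grid cell (a simple stand-in for an H3 index)."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def precompute_cell_means(points: Iterable[Point], cell_deg: float = 1.0) -> Dict[Cell, float]:
    """Aggregate raw points once, so later lookups read one row per cell."""
    sums: Dict[Cell, float] = defaultdict(float)
    counts: Dict[Cell, int] = defaultdict(int)
    for lat, lon, value in points:
        cell = cell_id(lat, lon, cell_deg)
        sums[cell] += value
        counts[cell] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}


if __name__ == "__main__":
    # Hypothetical sample points; two fall in the same one-degree cell.
    raw = [(10.2, 20.7, 4.0), (10.9, 20.1, 6.0), (-0.5, 3.3, 1.0)]
    table = precompute_cell_means(raw)
    # A query for a location becomes a dictionary lookup instead of a scan.
    print(table[cell_id(10.5, 20.5)])
```

The trade-off is the usual one for pre-aggregation: the expensive pass over the raw data happens once at load time, and each later query touches only the cells it needs.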

Geo-Powering Insights: The Art of Spatial Data Integration and Visualization

2025-06-10 · Data + AI Summit 2025

talk

by Mathieu Pelletier (Databricks)

BI Data Lakehouse Databricks Power BI Tableau

In this presentation, we will explore how to leverage Databricks' SQL engine to efficiently ingest and transform geospatial data. We'll demonstrate the seamless process of connecting to external systems such as ArcGIS to retrieve datasets, showcasing the platform's versatility in handling diverse data sources. We'll then delve into the power of Databricks Apps, illustrating how you can create custom geospatial dashboards using various frameworks like Streamlit and Flask, or any framework of your choice. This flexibility allows you to tailor your visualizations to your specific needs and preferences. Furthermore, we'll highlight the Databricks Lakehouse's integration capabilities with popular dashboarding tools such as Tableau and Power BI. This integration enables you to combine the robust data processing power of Databricks with the advanced visualization features of these specialized tools.

SQL-First ETL: Building Easy, Efficient Data Pipelines With Lakeflow Declarative Pipelines

2025-06-10 · Data + AI Summit 2025

talk

by Paul Lappas (Databricks), Ritwik Yadav (Databricks), Meixian Li (Databricks)

Databricks dbt ETL/ELT Data Streaming

This session explores how SQL-based ETL can accelerate development, simplify maintenance and make data transformation more accessible to both engineers and analysts. We'll walk through how Databricks Lakeflow Declarative Pipelines and Databricks SQL warehouse support building production-grade pipelines using familiar SQL constructs. Topics include using streaming tables for real-time ingestion and processing; leveraging materialized views to deliver fast, pre-computed datasets; and integrating with tools like dbt to manage batch and streaming workflows at scale. By the end of the session, you’ll understand how SQL-first approaches can streamline ETL development and support both operational and analytical use cases.

Unifying Data Delivery: Using Databricks as Your Enterprise Serving Layer

2025-06-10 · Data + AI Summit 2025

talk

by Ivan Spiriev (The World Bank), Ivan Donev (The World Bank)

Analytics Databricks Delta

This session will take you on our journey of integrating Databricks as the core serving layer in a large enterprise, demonstrating how you can build a unified data platform that meets diverse business needs. We will walk through the steps for constructing a central serving layer by leveraging Databricks’ SQL Warehouse to efficiently deliver data to analytics tools and downstream applications. To tackle low latency requirements, we’ll show you how to incorporate an interim scalable relational database layer that delivers sub-second performance for hot data scenarios. Additionally, we’ll explore how Delta Sharing enables secure and cost-effective data distribution beyond your organization, eliminating silos and unnecessary duplication for a truly end-to-end centralized solution. This session is perfect for data architects, engineers and decision-makers looking to unlock the full potential of Databricks as a centralized serving hub.

Using Catalogs for a Well-Governed and Efficient Data Ecosystem

2025-06-10 · Data + AI Summit 2025

talk

by Kajal Woods (Capital One Financial), Jim Lebonitte (Capital One)

Data Management Databricks

The ability to enforce data management controls at scale and reduce the effort required to manage data pipelines is critical to operating efficiently. Capital One has scaled its data management capabilities and invested in platforms to help address this need. In the past couple of years, the role of “the catalog” in a data platform architecture has transitioned from just providing SQL to providing a full suite of capabilities that can help solve this problem at scale. This talk will give insight into how Capital One is thinking about leveraging Databricks Unity Catalog to help tackle these challenges.

Improving User Experience and Efficiency Using DBSQL

2025-06-10 · Data + AI Summit 2025

lightning_talk

by Renato Suarez (PicPay), Gustavo Tadao Okida (PicPay)

Dashboard Databricks

To scale Databricks SQL to 2,000 users efficiently and cost-effectively, we adopted serverless, ensuring dynamic scalability and resource optimization. During peak times, resources scale up automatically; during low demand, they scale down, preventing waste. Additionally, we implemented a strong content governance model. We created continuous monitoring to assess query and dashboard performance, notifying users about adjustments and ensuring only relevant content remains active. If a query exceeds time or impact limits, access is reviewed and, if necessary, deactivated. This approach brought greater efficiency, cost reduction and an improved user experience, keeping the platform well-organized and high-performing.

De-Risking Investment Decisions: QCG's Smarter Deal Evaluation Process Leveraging Databricks

2025-06-10 · Data + AI Summit 2025

lightning_talk

by Ian Brown (Quantum Capital Group)

Analytics Databricks Delta Spark

Quantum Capital Group (QCG) screens hundreds of deals across the global Sustainable Energy Ecosystem, requiring deep technical due diligence. With over 1.5 billion records sourced from public, premium and proprietary datasets, their challenge was how to efficiently curate, analyze and share this data to drive smarter investment decisions. QCG partnered with Databricks & Tiger Analytics to modernize its data landscape. Using Delta tables, Spark SQL, and Unity Catalog, the team built a golden dataset that powers proprietary evaluation models and automates complex workflows. Data is now seamlessly curated, enriched and distributed — both internally and to external stakeholders — in a secure, governed and scalable way. This session explores how QCG’s investment in data intelligence has turned an overwhelming volume of information into a competitive advantage, transforming deal evaluation into a faster, more strategic process.

Pushing the Limits of What Your Warehouse Can Do Using Python and Databricks

2025-06-10 · Data + AI Summit 2025

lightning_talk

by Jakob Mund (Databricks)

Cloud Computing Databricks Python

SQL warehouses in Databricks can run more than just SQL. Join this session to learn how to get more out of your SQL warehouses and any tools built on top of them by leveraging Python. After attending this session, you will be familiar with Python user-defined functions and know how to bring in custom dependencies from PyPI or as a custom wheel, and even how to securely invoke cloud services with performance at scale.

ViewShift: Dynamic Policy Enforcement With Spark and SQL Views

2025-06-10 · Data + AI Summit 2025

lightning_talk

by Khai Tran (LinkedIn), Walaa Moustafa (LinkedIn)

Data Lake Spark

Dynamic policy enforcement is increasingly critical in today's landscape, where data compliance is a top priority for companies, individuals and regulators alike. In this talk, Walaa explores how LinkedIn has implemented a robust dynamic policy enforcement engine, ViewShift, and integrated it within its data lake. He will demystify LinkedIn's query engine stack by demonstrating how catalogs can automatically route table resolutions to compliance-enforcing SQL views. These SQL views possess several noteworthy properties. They are auto-generated: created automatically from declarative data annotations. They are user-centric: they honor user-level consent and preferences. They are context-aware: they apply different transformations tailored to specific use cases. They are portable: despite the SQL logic being implemented in a single dialect, it remains accessible across all engines. Join this session to learn how ViewShift helps ensure that compliance is seamlessly integrated into data processing workflows.
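To make the idea of views generated from declarative annotations concrete, here is a minimal sketch. It is not ViewShift's implementation: the annotation vocabulary (`public`, `mask`, `drop`), the function name and the table and column names are all hypothetical, and the generated SQL is deliberately generic.

```python
from typing import Dict


def build_view_sql(table: str, view: str, columns: Dict[str, str], consent_column: str) -> str:
    """Generate a compliance-enforcing view from per-column policy annotations.

    Hypothetical policies:
      public -> column passes through unchanged
      mask   -> column is returned only for rows where the consent flag is true
      drop   -> column is left out of the view entirely
    """
    select_items = []
    for name, policy in columns.items():
        if policy == "public":
            select_items.append(name)
        elif policy == "mask":
            select_items.append(
                f"CASE WHEN {consent_column} THEN {name} ELSE NULL END AS {name}"
            )
        elif policy == "drop":
            continue
        else:
            raise ValueError(f"unknown policy {policy!r} for column {name!r}")
    if not select_items:
        raise ValueError("every column was dropped; the view would be empty")
    column_list = ",\n  ".join(select_items)
    return f"CREATE VIEW {view} AS\nSELECT\n  {column_list}\nFROM {table}"


if __name__ == "__main__":
    # Hypothetical annotations for an illustrative table.
    annotations = {"id": "public", "email": "mask", "ssn": "drop"}
    print(build_view_sql("members", "members_compliant", annotations, "has_consent"))
```

A catalog that resolves `members` to `members_compliant` would then apply the policy without any change to the query a user writes, which is the routing behavior the abstract describes.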

From Datavault to Delta Lake: Streamlining Data Sync with Lakeflow Connect

2025-06-10 · Data + AI Summit 2025

talk

by Olivia Ren (Databricks), Andrew Clarke (Australian Red Cross Lifeblood)

Analytics Azure Data Lakehouse Data Vault Databricks Delta DWH

In this session, we will explore the Australian Red Cross Lifeblood's approach to synchronizing an Azure SQL Datavault 2.0 (DV2.0) implementation with Unity Catalog (UC) using Lakeflow Connect. Lifeblood's DV2.0 data warehouse, which includes raw vault (RV) and business vault (BV) tables, as well as information marts defined as views, required a multi-step process to achieve data/business logic sync with UC. This involved using Lakeflow Connect to ingest RV and BV data, followed by a custom process utilizing JDBC to ingest view definitions, and the automated/manual conversion of T-SQL to Databricks SQL views, with Lakehouse Monitoring for validation. In this talk, we will share our journey, the design decisions we made, and how the resulting solution now supports analytics workloads, analysts, and data scientists at Lifeblood.

Analyst Roadmap to Databricks: From SQL to End-to-End BI

2025-06-10 · Data + AI Summit 2025

lightning_talk

by Jake Duckers (Spencer Gifts)

AI/ML Analytics BI Databricks

Analysts often begin their Databricks journey by running familiar SQL queries in the SQL Editor, but that’s just the start. In this session, I’ll share the roadmap I followed to expand beyond ad-hoc querying into SQL Editor/notebook-driven development to scheduled data pipelines producing interactive dashboards — all powered by Databricks SQL and Unity Catalog. You’ll learn how to organize tables with primary-key/foreign-key relationships along with creating table and column comments to form the semantic model, utilizing DBSQL features like RELY constraints. I’ll also show how parameterized dashboards can be set up to empower self-service analytics and feed into Genie Spaces. Attendees will walk away with best practices for starting out with building a robust BI platform on Databricks, including tips for table design and metadata enrichment. Whether you’re a data analyst or BI developer, this talk will help you unlock powerful, AI-enhanced analytics workflows.

Comprehensive Data Warehouse Migrations to Databricks SQL

2025-06-10 · Data + AI Summit 2025

talk

by Simon Eligulashvili (Databricks), Sundar Shankar (Databricks)

Databricks DWH

This session is repeated. Databricks has a free, comprehensive solution for migrating legacy data warehouses from a wide range of source systems. See how we accelerate migrations from legacy data warehouses to Databricks SQL, achieving 50% faster migration than traditional methods. We'll cover the four stages of the tool’s automated migration process: discovery (source system profiling), assessment (legacy code analysis), conversion (advanced code transpilation) and reconciliation (data validation). This comprehensive approach increases the predictability of migration projects, allowing businesses to plan and execute migrations with greater confidence.
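The reconciliation stage can be pictured with a small sketch like the one below. It does not reflect how the Databricks migration tool validates data; it is a generic row-count and checksum comparison, with hypothetical function names, run on in-memory rows that stand in for query results from the source and target systems.

```python
import hashlib
from typing import Dict, Iterable, Tuple

Row = Tuple


def table_fingerprint(rows: Iterable[Row]) -> Tuple[int, str]:
    """Return (row_count, checksum) where the checksum ignores row order.

    Each row is hashed on its own and the hashes are summed modulo 2**256,
    so two tables with the same rows in a different order still match.
    Assumes both sides yield the same Python types for equal values.
    """
    count = 0
    total = 0
    for row in rows:
        digest = hashlib.sha256(repr(row).encode("utf-8")).digest()
        total = (total + int.from_bytes(digest, "big")) % (1 << 256)
        count += 1
    return count, f"{total:064x}"


def reconcile(source: Iterable[Row], target: Iterable[Row]) -> Dict[str, object]:
    """Compare a source and a target table by row count and checksum."""
    source_count, source_sum = table_fingerprint(source)
    target_count, target_sum = table_fingerprint(target)
    return {
        "source_rows": source_count,
        "target_rows": target_count,
        "row_count_match": source_count == target_count,
        "checksum_match": source_sum == target_sum,
    }


if __name__ == "__main__":
    legacy = [(1, "a"), (2, "b")]
    migrated = [(2, "b"), (1, "a")]  # same rows, different order
    print(reconcile(legacy, migrated))
```

Checking the count and the checksum separately makes failures easier to read: a count mismatch points to missing or duplicated rows, while a checksum mismatch with equal counts points to changed values.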

Petabyte-Scale On-Chain Insights: Real-Time Intelligence for the Next-Gen Financial Backbone

2025-06-10 · Data + AI Summit 2025

lightning_talk

by Leo Liang (CipherOwl Inc)

AI/ML Analytics Arrow Blockchain Data Lakehouse Delta Kafka Spark

We’ll explore how CipherOwl Inc. constructed a near real-time, multi-chain data lakehouse to power anti-money laundering (AML) monitoring at a petabyte scale. We will walk through the end-to-end architecture, which integrates cutting-edge open-source technologies and AI-driven analytics to handle massive on-chain data volumes seamlessly. Off-chain intelligence complements this to meet rigorous AML requirements. At the core of our solution is ChainStorage, an open-source project started by Coinbase that provides robust blockchain data ingestion and block-level serving. We enhanced it with Apache Spark™ and Arrow™ for high-throughput processing and efficient data serialization, backed by Delta Lake and Kafka. For the serving layer, we employ StarRocks to deliver lightning-fast SQL analytics over vast datasets. Finally, our system incorporates machine learning and AI agents for continuous data curation and near real-time insights, which are crucial for tackling on-chain AML challenges.

AI and Genie: Analyzing Healthcare Improvement Opportunities

2025-06-10 · Data + AI Summit 2025

talk

by Jay Sharma (Premier Inc), Tim Riddle (Premier Inc)

AI/ML BI Data Lakehouse

This session is repeated. Improving healthcare impacts us all. We highlight how Premier Inc. took risk-adjusted patient data from more than 1,300 member hospitals across America and applied a natural language interface using AI/BI Genie, allowing our users to discover new insights. The stakes are high: each new insight surfaced represents a potential care improvement and lives positively impacted. Using Genie and our AI-ready data in Unity Catalog, our team was able to stand up a Genie instance in three short days, bypassing the cost and time of custom modeling and application development. Additionally, Genie allowed our internal teams to generate complex SQL as much as 10 times faster than writing it by hand. As Genie and lakehouse apps continue to advance rapidly, we are excited to leverage these features by introducing Genie to as many as 20,000 users across hundreds of hospitals. This will support our members’ ongoing mission to enhance the care they provide to the communities they serve.

Bayada’s Snowflake-to-Databricks Migration: Transforming Data for Speed & Efficiency

2025-06-10 · Data + AI Summit 2025

talk

by Venkatesh Guruprasad (BAYADA Home Health Care), PradeepKumar jain Vimalraj (Tredence Inc), Elaine O'Neill (BAYADA Home Health Care)

AI/ML Analytics Databricks Matillion Snowflake SSIS

Bayada is transforming its data ecosystem by consolidating Matillion+Snowflake and SSIS+SQL Server into a unified Enterprise Data Platform powered by Databricks. Using Databricks' Medallion architecture, this platform enables seamless data integration, advanced analytics and machine learning across critical domains like general ledger, recruitment and activity-based costing. Databricks was selected for its scalability, real-time analytics and ability to handle both structured and unstructured data, positioning Bayada for future growth. The migration aims to reduce data processing times by 35%, improve reporting accuracy and cut reconciliation efforts by 40%. Operational costs are projected to decrease by 20%, while real-time analytics is expected to boost efficiency by 15%. Join this session to learn how Bayada is leveraging Databricks to build a high-performance data platform that accelerates insights, drives efficiency and fosters innovation organization-wide.
