Join us for an insightful Ask Me Anything (AMA) session on Declarative Pipelines — a powerful approach to simplify and optimize data workflows. Learn how to define data transformations using high-level, SQL-like semantics, reducing boilerplate code while improving performance and maintainability. Whether you're building ETL processes, feature engineering pipelines, or analytical workflows, this session will cover best practices, real-world use cases and how Declarative Pipelines can streamline your data applications. Bring your questions and discover how to make your data processing more intuitive and efficient!
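To make the declarative idea concrete, here is a minimal toy sketch in plain Python. It is not the Lakeflow Declarative Pipelines API: the `dataset` decorator, the `_DATASETS` registry, and the sample data are all hypothetical names invented for illustration. Each dataset is declared along with what it depends on, and a small runner works out the execution order, which is roughly the division of labor a declarative framework offers.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical registry: dataset name -> (upstream dataset names, transform function).
_DATASETS = {}


def dataset(name, depends_on=()):
    """Register a function as the definition of a named dataset."""
    def register(fn):
        _DATASETS[name] = (tuple(depends_on), fn)
        return fn
    return register


@dataset("raw_orders")
def raw_orders():
    # Invented sample rows standing in for an ingested source.
    return [
        {"order_id": 1, "amount": 20.0, "status": "paid"},
        {"order_id": 2, "amount": 35.5, "status": "cancelled"},
        {"order_id": 3, "amount": 12.0, "status": "paid"},
    ]


@dataset("paid_orders", depends_on=["raw_orders"])
def paid_orders(raw_orders):
    # Declares what the dataset contains, not when it is computed.
    return [row for row in raw_orders if row["status"] == "paid"]


@dataset("revenue", depends_on=["paid_orders"])
def revenue(paid_orders):
    return sum(row["amount"] for row in paid_orders)


def run_pipeline():
    """Evaluate every registered dataset, upstream datasets first."""
    graph = {name: set(deps) for name, (deps, _) in _DATASETS.items()}
    results = {}
    for name in TopologicalSorter(graph).static_order():
        deps, fn = _DATASETS[name]
        results[name] = fn(*(results[dep] for dep in deps))
    return results


if __name__ == "__main__":
    print(run_pipeline()["revenue"])
```

The definitions can be registered in any order; the runner derives the order from the declared dependencies, so the author never writes orchestration code by hand.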
talk-data.com topic: Structured Query Language (SQL)
Data warehousing in enterprise and mission-critical environments needs special consideration for price/performance. This session will explain how Databricks SQL addresses the most challenging requirements for high-concurrency, low-latency performance at scale. We will also cover the latest advancements in resource-based scheduling, autoscaling and caching enhancements that allow for seamless performance and workload management.
Databricks SQL has added significant features in the last year at a fast pace. This session will share the most impactful features and the customer use cases that inspired them. We will highlight the new SQL editor, SQL coding features, streaming tables and materialized views, BI integrations, cost management features, system tables and observability features, and more. We will also share AI-powered performance optimizations.
In this session we’ll dive into the SQL kitchen and use a combination of SQL staples and nouvelle cuisine such as recursive queries, temporary tables, and stored procedures. We’ll leave you with well-scripted recipes to execute immediately or store for later consumption in your Unity Catalog. Think of this session as building your go-to cookbook of SQL techniques. Bon appétit!
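As a small taste of one ingredient mentioned above, the following sketch shows a recursive query that walks a reporting chain. It uses SQLite through Python's standard library purely as a stand-in so that it runs anywhere; the `employees` table, its rows, and the helper names are invented for illustration, and the exact recursive syntax accepted by Databricks SQL may differ.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a tiny, invented org chart."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER)"
    )
    conn.executemany(
        "INSERT INTO employees VALUES (?, ?, ?)",
        [(1, "Ada", None), (2, "Grace", 1), (3, "Linus", 2)],
    )
    conn.commit()
    return conn


def reporting_chain(conn, employee_id):
    """Return the names from the given employee up to the top of the hierarchy."""
    query = """
    WITH RECURSIVE chain(id, name, manager_id, depth) AS (
        -- Anchor member: the starting employee.
        SELECT id, name, manager_id, 0 FROM employees WHERE id = ?
        UNION ALL
        -- Recursive member: step to the manager of the previous row.
        SELECT e.id, e.name, e.manager_id, c.depth + 1
        FROM employees AS e
        JOIN chain AS c ON e.id = c.manager_id
    )
    SELECT name FROM chain ORDER BY depth
    """
    return [row[0] for row in conn.execute(query, (employee_id,))]


if __name__ == "__main__":
    print(reporting_chain(build_demo_db(), 3))
```

The recursion stops on its own once a row has no manager, because the join then produces no further rows.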
This session is repeated. This introductory workshop caters to data engineers seeking hands-on experience and data architects looking to deepen their knowledge. The workshop is structured to provide a solid understanding of the following data engineering and streaming concepts:
- Introduction to Lakeflow and the Data Intelligence Platform
- Getting started with Lakeflow Declarative Pipelines for declarative data pipelines in SQL using Streaming Tables and Materialized Views
- Mastering Databricks Workflows with advanced control flow and triggers
- Understanding serverless compute
- Data governance and lineage with Unity Catalog
- Generative AI for Data Engineers: Genie and Databricks Assistant
We believe you can only become an expert if you work on real problems and gain hands-on experience. Therefore, we will equip you with your own lab environment in this workshop and guide you through practical exercises like using GitHub, ingesting data from various sources, creating batch and streaming data pipelines, and more.
Want to learn how to build your own custom data intelligence applications directly in Databricks? In this workshop, we’ll guide you through a hands-on tutorial for building a Streamlit web app that leverages many of the key products at Databricks as building blocks. You’ll integrate a live DB SQL warehouse, use Genie to ask questions in natural language, and embed AI/BI dashboards for interactive visualizations. In addition, we’ll discuss key concepts and best practices for building production-ready apps, including logging and observability, scalability, different authorization models, and deployment. By the end, you'll have a working AI app—and the skills to build more.
Join this session for a concise tour of Apache Spark™ 4.0’s most notable enhancements:
- SQL features: ANSI by default, scripting, SQL pipe syntax, SQL UDF, session variable, view schema evolution, etc.
- Data type: VARIANT type, string collation
- Python features: Python data source, plotting API, etc.
- Streaming improvements: State store data source, state store checkpoint v2, arbitrary state v2, etc.
- Spark Connect improvements: More API coverage, thin client, unified Scala interface, etc.
- Infrastructure: Better error message, structured logging, new Java/Scala version support, etc.
Whether you’re a seasoned Spark user or new to the ecosystem, this talk will prepare you to leverage Spark 4.0’s latest innovations for modern data and AI pipelines.
Writing SQL is a core part of any data analyst’s workflow, but small inefficiencies can add up, slowing down analysis and making it harder to iterate quickly. In this session, we’ll explore our powerful features in the Databricks SQL editor and notebook that help you to be more productive when writing SQL on Databricks. We’ll demo the new features and the customer use cases that inspired them.
How do you transform a data pipeline from sluggish 10-hour batch processing into a real-time powerhouse that delivers insights in just 10 minutes? This was the challenge we tackled at one of France's largest manufacturing companies, where data integration and analytics were mission-critical for supply chain optimization. Power BI dashboards needed to refresh every 15 minutes, but our team struggled with legacy Azure Data Factory batch pipelines. These outdated processes couldn’t keep up, delaying insights and generating up to three daily incident tickets. We identified Lakeflow Declarative Pipelines and Databricks SQL as the game-changing solution to modernize our workflow, implement quality checks, and reduce processing times. In this session, we’ll dive into the key factors behind our success:
- Pipeline modernization with Lakeflow Declarative Pipelines: improving scalability
- Data quality enforcement: clean, reliable datasets
- Seamless BI integration: using Databricks SQL to power fast, efficient queries in Power BI
The direct integration between Redox and Databricks can streamline your interoperability workflows, from responding in record time to preauthorization requests to letting attending physicians know, in near real time from ADTs, about a change in risk for sepsis and readmission. Data engineers will learn how to create fully streaming ETL pipelines for ingesting, parsing and acting on insights from Redox FHIR bundles delivered directly to Unity Catalog volumes. Once the data is available in the Lakehouse, AI/BI Dashboards and Agentic Frameworks help write FHIR messages back to Redox for direct push down to EMR systems. Parsing FHIR bundle resources has never been easier with SQL combined with the new VARIANT data type in Delta and streaming table creation against Serverless DBSQL Warehouses. We'll also use the Databricks accelerators dbignite and redoxwrite for writing and posting FHIR bundles back to Redox-integrated EMRs, and we'll extend AI/BI with Unity Catalog SQL UDFs and the Redox API for use in Genie.
Migrating your Snowflake data warehouse to the Databricks Data Intelligence Platform can accelerate your data modernization journey. Though a cloud platform-to-cloud platform migration should be relatively easy, the breadth of the Databricks Platform provides flexibility and hence requires careful planning and execution. In this session, we present the migration methodology, technical approaches, automation tools, product/feature mapping, a technical demo and best practices using real-world case studies for migrating data, ELT pipelines and warehouses from Snowflake to Databricks.
Looking for a practical workshop on building an AI Agent on Databricks? Well, we have just the thing for you. This hands-on workshop takes you through the process of creating intelligent agents that can reason their way to useful outcomes. You'll start by building your own toolkit of SQL and Python functions that give your agent practical capabilities. Then we'll explore how to select the right foundation model for your needs, connect your custom tools, and watch as your agent tackles complex challenges through visible reasoning paths. The workshop doesn't stop at building: you'll dive into evaluation techniques using evaluation datasets to identify where your agent shines and where it needs improvement. After implementing and measuring your changes, we'll explore deployment strategies, including a feedback collection interface that enables continuous improvement and governance mechanisms to ensure responsible AI usage in production environments.
Most organizations run complex cloud data architectures that silo applications, users and data. Join this interactive hands-on workshop to learn how Databricks SQL allows you to operate a multi-cloud lakehouse architecture that delivers data warehouse performance at data lake economics, with up to 12x better price/performance than traditional cloud data warehouses. Here’s what we’ll cover:
- How Databricks SQL fits in the Data Intelligence Platform, enabling you to operate a multi-cloud lakehouse architecture that delivers data warehouse performance at data lake economics
- How to manage and monitor compute resources, data access and users across your lakehouse infrastructure
- How to query directly on your data lake using your tools of choice or the built-in SQL editor and visualizations
- How to use AI to increase productivity when querying, completing code or building dashboards
Ask your questions during this hands-on lab, and the Databricks experts will guide you.
HP Print's data platform team took on a migration from a monolithic, shared AWS Redshift resource to a modular and scalable data ecosystem on the Databricks lakehouse. The result was 30–40% cost savings, scalable and isolated resources for different data consumers and ETL workloads, and performance optimization for a variety of query types. The migration brought technical challenges and learnings relating to ETL migrations with dbt, new Databricks features like Liquid Clustering, predictive optimization, Photon, and SQL serverless warehouses, managing multiple teams on Unity Catalog, and more. This presentation dives into both the business and technical sides of this migration. Come along as we share our key takeaways from this journey.
In today’s fast-evolving crypto landscape, organizations require fast, reliable intelligence to manage risk, investigate financial crime, and stay ahead of evolving threats. In this session we will discover how Elliptic built a scalable, high-performance Data Intelligence Platform that delivers real-time, actionable blockchain insights to their customers. We’ll walk you through some of the key components of the Elliptic Platform, including the Elliptic Entity Graph and our User-Facing Analytics. We will focus on the evolution of our User-Facing Analytics capabilities, and specifically on how components from the Databricks ecosystem such as Structured Streaming, Delta Lake, and SQL Warehouse have played a vital role. We’ll also share some of the optimizations we’ve made to our streaming jobs to maximize performance and ensure data completeness. Whether you’re looking to enhance your streaming capabilities, expand your knowledge of how crypto analytics works or simply discover novel approaches to data processing at scale, this session will provide concrete strategies and valuable lessons learned.
Migrating your legacy Oracle data warehouse to the Databricks Data Intelligence Platform can accelerate your data modernization journey. In this session, learn the top strategies for completing this data migration. We will cover data type conversion, basic to complex code conversions, validation and reconciliation best practices. Discover the pros and cons of using CSV files to PySpark or using pipelines to Databricks tables. See before-and-after architectures of customers who have migrated, and learn about the benefits they realized.
Databricks co-founders created Spark, the wildly popular open source foundation of Databricks, way back in 2009. Learn from Michael Armbrust, creator of Spark SQL and leader of Databricks Delta, about the latest happenings in Spark, Lakeflow Declarative Pipelines, and open source.
Multi-statement transactions bring the atomicity and reliability of traditional databases to modern data warehousing on the lakehouse. In this session, we’ll explore real-world patterns enabled by multi-statement transactions (including multi-table updates, deduplication pipelines and audit logging) and show how Databricks ensures atomicity and consistency across complex workflows. We’ll also dive into demos and share tips for getting started with this feature in Databricks SQL and for migrating existing workloads to it.
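The all-or-nothing behavior behind the multi-table update and audit logging patterns can be sketched as follows. The example uses SQLite through Python's standard library only as a stand-in so that it is runnable; it does not show Databricks SQL syntax, and the `inventory` and `audit_log` tables, the `CHECK` constraint, and the function names are assumptions made for illustration.

```python
import sqlite3


def build_demo_db():
    """Create two invented tables that must always change together."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE inventory (sku TEXT PRIMARY KEY, "
        "on_hand INTEGER NOT NULL CHECK (on_hand >= 0))"
    )
    conn.execute("CREATE TABLE audit_log (sku TEXT, delta INTEGER)")
    conn.execute("INSERT INTO inventory VALUES ('widget', 5)")
    conn.commit()
    return conn


def record_shipment(conn, sku, qty):
    """Write the audit row and update the stock level in one transaction.

    Returns True if both statements committed, False if both were rolled back.
    """
    try:
        with conn:  # commits on success, rolls back if any statement raises
            conn.execute(
                "INSERT INTO audit_log (sku, delta) VALUES (?, ?)", (sku, -qty)
            )
            # Fails the CHECK constraint if stock would go negative.
            conn.execute(
                "UPDATE inventory SET on_hand = on_hand - ? WHERE sku = ?",
                (qty, sku),
            )
        return True
    except sqlite3.IntegrityError:
        return False


if __name__ == "__main__":
    db = build_demo_db()
    print(record_shipment(db, "widget", 3))   # enough stock
    print(record_shipment(db, "widget", 10))  # would go negative
    print(db.execute("SELECT COUNT(*) FROM audit_log").fetchone()[0])
```

In the failing call, the audit row is inserted first and then undone when the stock update is rejected, so the two tables never disagree about what was shipped.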
Insight will explore a multi-agent system built with LangGraph designed to alleviate the challenges faced by data analysts inundated with requests from business users. This innovative solution empowers users who lack SQL skills to easily access insights from specific Unity Catalog datasets. Discover how the Unity Catalog Agent Assistant streamlines data requests, enhances collaboration, and ultimately drives better decision-making across your organization.
Lakeflow Jobs is the production-ready, fully managed orchestrator for the entire Lakehouse with 99.95% uptime. Join us for a dive into how you can orchestrate your enterprise data operations, from triggering your jobs only when your data is ready to advanced control flow with conditionals, looping and job modularity, with demos! Attendees will gain practical insights into optimizing their data operations by orchestrating with Lakeflow Jobs:
- New task types: Publish AI/BI Dashboards, push to Power BI or ingest with Lakeflow Connect
- Advanced execution control: Reference SQL Task outputs, run partial DAGs and perform targeted backfills
- Repair runs: Re-run failed pipelines with surgical precision using task-level repair
- Control flow upgrades: Native for-each loops and conditional logic make DAGs more dynamic and expressive
- Smarter triggers: Kick off jobs based on file arrival or Delta table changes, enabling responsive workflows
- Code-first approach to pipeline orchestration