talk-data.com

Topic: Databricks
Tags: big_data, analytics, spark

Activity trend: 515 peak activities/quarter, 2020-Q1 to 2026-Q1

Activities: 1286 · Newest first

Sponsored by: Actian | Beyond the Lakehouse: Unlocking Enterprise-Wide AI-Ready Data with Unified Metadata Intelligence

As organizations scale AI initiatives on platforms like Databricks, one challenge remains: bridging the gap between the data in the lakehouse and the vast, distributed data that lives elsewhere. Turning massive volumes of technical metadata into trusted, business-ready insight requires more than cataloging what's inside the lakehouse; it demands true enterprise-wide intelligence. Actian CTO Emma McGrattan will explore how combining Databricks Unity Catalog with the Actian Data Platform extends visibility, governance, and trust beyond the lakehouse. Learn how leading enterprises are:
- Integrating metadata across all enterprise data assets for complete visibility
- Enriching Unity Catalog metadata with business context for broader usability
- Empowering non-technical users to discover, trust, and act on AI-ready data
- Building a foundation for scalable data productization with governance by design

Sponsored by: Alation | Better Together: Enterprise Catalog with Databricks & Alation at American Airlines

In the era of data-driven enterprises, true democratization requires more than just access: it demands context, trust, and governance at scale. In this session, discover how to seamlessly integrate Databricks Unity Catalog with Alation's Enterprise Data Catalog to deliver:
- End-to-End Lineage Storytelling: Unify technical and business views into a single, cohesive narrative that resonates with both technical engineers and non-technical stakeholders across business domains
- Accelerated and Democratized Insights: Automate metadata stitching to reduce time-to-insight, enabling analysts to answer critical business questions faster and drive multi-domain collaboration
- Empowered, Trustworthy Discovery: Equip business users with a unified platform, populated with rich documentation and usage signals, so they can find, understand, and confidently use trusted data assets

Sponsored by: Fivetran | Raw Data to Real-Time Insights: How Dropbox Revolutionized Data Ingestion

Dropbox, a leading cloud storage platform, is on a mission to accelerate data insights to better understand customers' needs and elevate the overall customer experience. By leveraging Fivetran's data movement platform, Dropbox gained real-time visibility into customer sentiment, marketing ROI, and ad performance, empowering teams to optimize spend, improve operational efficiency, and deliver greater business outcomes. Join this session to learn how Dropbox:
- Cut data pipeline time from 8 weeks to 30 minutes by automating ingestion and streamlining reporting workflows
- Enabled real-time, reliable data movement across tools like Zendesk Chat, Google Ads, MySQL, and more, at global operations scale
- Unified fragmented data sources into the Databricks Data Intelligence Platform to reduce redundancy, improve accessibility, and support scalable analytics

Sponsored by: Slalom | Nasdaq's Journey from Fragmented Customer Data to AI-Ready Insights

Nasdaq’s rapid growth through acquisitions led to fragmented client data across multiple Salesforce instances, limiting cross-sell potential and sales insights. To solve this, Nasdaq partnered with Slalom to build a unified Client Data Hub on the Databricks Lakehouse Platform. This cloud-based solution merges CRM, product usage, and financial data into a consistent, 360° client view accessible across all Salesforce orgs with bi-directional integration. It enables personalized engagement, targeted campaigns, and stronger cross-sell opportunities across all business units. By delivering this 360° view directly in Salesforce, Nasdaq is improving sales visibility, client satisfaction, and revenue growth. The platform also enables advanced analytics like segmentation, churn prediction, and revenue optimization. With centralized data in Databricks, Nasdaq is now positioned to deploy next-gen Agentic AI and chatbots to drive efficiency and enhance sales and marketing experiences.

Traditional MDM Is Dead. How Next-Generation Data Products Are Winning the Enterprise

Organizations continue to struggle under the weight of data that still exists across multiple siloed sources, leaving data teams caught between their crumbling legacy data foundations and the race to build new AI and data-driven applications. Modern enterprises are quickly pivoting to data products that simplify and improve reusable data pipelines by joining data at massive scale and publishing it for internal users and the applications that drive business outcomes. Learn how Quantexa with Databricks enables an internal data marketplace to deliver the value that traditional data platforms never could.

Unlocking the Power of Iceberg: Our Journey to a Unified Lakehouse on Databricks

This session showcases our journey of adopting Apache Iceberg™ to build a modern lakehouse architecture and leveraging Databricks' advanced Iceberg support to take it to the next level. We'll dive into the key design principles behind our lakehouse, the operational challenges we tackled, and how Databricks enabled us to unlock enhanced performance, scalability and streamlined data workflows. Whether you're exploring Apache Iceberg™ or building a lakehouse on Databricks, this session offers actionable insights, lessons learned and best practices for modern data engineering.
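As a concrete illustration of the pattern (a minimal sketch, not the speakers' actual setup): Databricks can expose a Delta table to Iceberg clients via Delta UniForm, assuming a Unity Catalog-enabled workspace. The catalog, schema and table names below are hypothetical placeholders.

```python
# Minimal sketch: one table, readable by both Delta and Iceberg clients via
# Delta UniForm. Names like `main.lakehouse.events` are hypothetical.
spark.sql("""
    CREATE TABLE IF NOT EXISTS main.lakehouse.events (
        event_id STRING,
        event_ts TIMESTAMP,
        payload  STRING
    )
    TBLPROPERTIES (
        'delta.enableIcebergCompatV2' = 'true',
        'delta.universalFormat.enabledFormats' = 'iceberg'
    )
""")
# External engines can then read the same table as Iceberg (e.g., through an
# Iceberg REST catalog endpoint), so one copy of the data serves both formats.
```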

Accelerate End-to-End Multi-Agents on Databricks and DSPy

A production-ready GenAI application is more than the framework itself. Like ML, you need a unified platform to create an end-to-end workflow for production-quality applications. Below is an example of how this works on Databricks:
- Data ETL with Lakeflow Declarative Pipelines and jobs
- Data storage for governance and access with Unity Catalog
- Code development with Notebooks
- Agent versioning and metric tracking with MLflow and Unity Catalog
- Evaluation and optimization with Mosaic AI Agent Framework and DSPy
- Hosting infrastructure and monitoring with Model Serving and AI Gateway
- Front-end apps using Databricks Apps
In this session, learn how to build agents that access all your data and models through function calling. Then, learn how DSPy enables agents to interact with each other to ensure the question is answered correctly. We will demonstrate a chatbot, powered by multiple agents, that can answer and reason about questions the base LLM does not know, including very specialized topics.
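To make the multi-agent idea concrete, here is a minimal sketch using DSPy's agent modules (assuming a recent DSPy release with the dspy.LM interface); the model endpoint and the lookup tool are hypothetical placeholders, not the session's actual code.

```python
import dspy

# Point DSPy at a serving endpoint (hypothetical model name; any endpoint
# reachable through DSPy's LiteLLM-style model strings works here).
lm = dspy.LM("databricks/databricks-meta-llama-3-3-70b-instruct")
dspy.configure(lm=lm)

def lookup_order(order_id: str) -> str:
    """Hypothetical tool: in practice this would query a governed table."""
    return f"Order {order_id}: shipped"

# Agent 1: a ReAct agent that answers questions via function calling.
support_agent = dspy.ReAct("question -> answer", tools=[lookup_order])

# Agent 2: a reviewer that checks the draft before it is returned, so the
# agents cooperate to ensure the question is answered correctly.
reviewer = dspy.ChainOfThought("question, draft_answer -> verified_answer")

def answer(question: str) -> str:
    draft = support_agent(question=question).answer
    return reviewer(question=question, draft_answer=draft).verified_answer

print(answer("Where is order 42?"))
```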

AI Meets SQL: Leverage GenAI at Scale to Enrich Your Data

Integrating AI into existing data workflows can be challenging, often requiring specialized knowledge and complex infrastructure. In this session, we'll share how SQL users can access large language models (LLMs) and traditional machine learning directly from within SQL, simplifying the process of incorporating AI into data workflows. We will demonstrate how to use Databricks SQL for natural language processing, traditional machine learning, retrieval-augmented generation and more. You'll learn about best practices and see examples of solving common use cases such as opinion mining, sentiment analysis, forecasting and other common AI/ML tasks.
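For flavor, here is a minimal sketch of the pattern, assuming the built-in Databricks SQL AI functions (ai_analyze_sentiment, ai_query) are enabled in your workspace; the table and endpoint names are hypothetical placeholders.

```python
# Sketch: enriching rows with LLM output directly from SQL.
# `main.reviews.product_reviews` and the endpoint name are hypothetical.
df = spark.sql("""
    SELECT
        review_text,
        ai_analyze_sentiment(review_text) AS sentiment,
        ai_query(
            'databricks-meta-llama-3-3-70b-instruct',
            CONCAT('Summarize this review in one sentence: ', review_text)
        ) AS summary
    FROM main.reviews.product_reviews
    LIMIT 100
""")
df.show(truncate=False)
```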

A Prescription for Success: Leveraging DABs for Faster Deployment and Better Patient Outcomes

Health Catalyst (HCAT) transformed its CI/CD strategy by replacing a rigid, internal deployment tool with Databricks Asset Bundles (DABs), unlocking greater agility and efficiency. This shift streamlined deployments across both customer workspaces and HCAT's core platform, accelerating time to insights and driving continuous innovation. By adopting DABs, HCAT ensures feature parity, standardizes metric stores across clients, and rapidly delivers tailored analytics solutions. Attendees will gain practical insights into modernizing CI/CD pipelines for healthcare analytics, leveraging Databricks to scale data-driven improvements. HCAT's next-generation platform, Health Catalyst Ignite™, integrates healthcare-specific data models, self-service analytics, and domain expertise—powering faster, smarter decision-making.
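For context, a Databricks Asset Bundle is declared in a databricks.yml file kept in source control; below is a minimal sketch of that shape (hypothetical names, not HCAT's actual configuration).

```yaml
# Minimal databricks.yml sketch (hypothetical names, not HCAT's real bundle).
bundle:
  name: patient-outcomes-analytics

targets:
  dev:
    mode: development
    workspace:
      host: https://<your-workspace>.cloud.databricks.com
  prod:
    mode: production
    workspace:
      host: https://<your-workspace>.cloud.databricks.com

resources:
  jobs:
    refresh_metrics:
      name: refresh-metric-store
      tasks:
        - task_key: refresh
          notebook_task:
            notebook_path: ./notebooks/refresh_metrics.py
```

Deploying to a target is then a single command (`databricks bundle deploy -t prod`), which is what makes the same bundle reproducible across many customer workspaces.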

Barclays Post Trade's real-time trade monitoring platform was historically built on a complex set of legacy technologies including Java, Solace, and custom microservices. This session will demonstrate how Lakeflow Declarative Pipelines' new real-time mode, in conjunction with the foreach_batch_sink, can enable simple, cost-effective streaming pipelines that load high volumes of data into Databricks' new serverless OLTP database with very low latency. Once in the OLTP database, this data can be used to update real-time trading dashboards, securely hosted in Databricks Apps, with the latest stock trades, enabling better, more responsive decision-making and alerting. The session will walk through the architecture and demonstrate how simple it is to create and manage the pipelines and apps within the Databricks environment.
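As a rough illustration of the foreachBatch pattern the session describes (a sketch, not Barclays' implementation; the Kafka topic, JDBC connection details, and table names are hypothetical):

```python
# Sketch: land streaming micro-batches in an OLTP store via foreachBatch.
from pyspark.sql import functions as F

trades = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "stock-trades")
    .load()
    .select(F.col("key").cast("string"),
            F.col("value").cast("string"),
            "timestamp"))

def write_to_oltp(batch_df, batch_id):
    # Each micro-batch is appended to a Postgres-compatible OLTP table,
    # which the real-time dashboard then queries.
    (batch_df.write
        .format("jdbc")
        .option("url", "jdbc:postgresql://oltp-host:5432/trading")
        .option("dbtable", "latest_trades")
        .option("user", "app_user")
        .option("password", "***")
        .mode("append")
        .save())

(trades.writeStream
    .foreachBatch(write_to_oltp)
    .option("checkpointLocation", "/Volumes/main/trading/checkpoints/latest_trades")
    .start())
```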

Databricks on Databricks: Powering Marketing Insights with Lakehouse

This presentation outlines the evolution of our marketing data strategy, focusing on how we’ve built a strong foundation using the Databricks Lakehouse. We will explore key advancements across data ingestion, strategy, and insights, highlighting the transition from legacy systems to a more scalable and intelligent infrastructure. Through real-world applications, we will showcase how unified Customer 360 insights drive personalization, predictive analytics enhance campaign effectiveness, and GenAI optimizes content creation and marketing execution. Looking ahead, we will demonstrate the next phase of our CDP, the shift toward an end-user-first analytics model powered by AI/BI, Genie and Matik, and the growing importance of clean rooms for secure data collaboration. This is just the beginning, and we are poised to unlock even greater capabilities in the future.

Managing the Governed Cloud

As organizations increasingly adopt Databricks as a unified platform for analytics and AI, ensuring robust data governance becomes critical for compliance, security, and operational efficiency. This presentation will explore an end-to-end framework for governing the Databricks cloud, covering key use cases, foundational governance principles, and scalable automation strategies. We will discuss best practices for metadata, data access, catalog, classification, quality, and lineage, while leveraging automation to streamline enforcement. Attendees will gain insights into real-world approaches to building a governed data cloud that balances innovation with control.
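To give a flavor of governance-as-code on Databricks, here is a minimal sketch using standard Unity Catalog SQL (the catalog, schema, table, group, and tag names are hypothetical placeholders):

```python
# Sketch: applying access grants and classification tags in code so that
# governance policy is versioned and reproducible. Names are hypothetical.
spark.sql("GRANT USE CATALOG ON CATALOG finance TO `data-analysts`")
spark.sql("GRANT SELECT ON SCHEMA finance.reporting TO `data-analysts`")
spark.sql("""
    ALTER TABLE finance.reporting.invoices
    SET TAGS ('classification' = 'confidential')
""")
```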

Real-Time Mode Technical Deep Dive: How We Built Sub-300 Millisecond Streaming Into Apache Spark™

Real-time mode is a new low-latency execution mode for Apache Spark™ Structured Streaming. It can consistently provide p99 latencies less than 300 milliseconds for a broad set of stateless and stateful streaming queries. Our talk focuses on the technical aspects of making this possible in Spark. We’ll dive into the core architecture that enables these dramatic latency improvements, including a concurrent stage scheduler and a non-blocking shuffle. We’ll explore how we maintained Spark’s fault-tolerance guarantees, and we’ll also share specific optimizations we made to our streaming SQL operators. These architectural improvements have already enabled Databricks customers to build workloads with latencies up to 10x lower than before. Early adopters in our Private Preview have successfully implemented real-time enrichment pipelines and feature engineering for machine learning — use cases that were previously impossible at these latencies.
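For orientation, this is the shape of a stateful query that real-time mode targets; the sketch below is generic Structured Streaming, since the mechanism for enabling real-time mode is Databricks-specific and in Private Preview, and the topic and paths are hypothetical.

```python
# A stateful streaming aggregation of the kind real-time mode accelerates.
# (How real-time mode itself is enabled is preview-only and not shown.)
from pyspark.sql import functions as F

events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "clickstream")
    .load())

counts = (
    events
    .withWatermark("timestamp", "10 seconds")
    .groupBy(F.window("timestamp", "1 second"), F.col("key"))
    .count())

(counts.writeStream
    .outputMode("update")
    .format("console")
    .option("checkpointLocation", "/tmp/checkpoints/clickstream")
    .start())
```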

Revolutionizing Nuclear AI With HiVE and Bertha on Databricks Architecture

In this session we will explore the revolutionary advancements in nuclear AI capabilities with HiVE and Bertha on Databricks architecture. HiVE, developed by Westinghouse, leverages over a century of proprietary data to deliver unparalleled AI capabilities. At its core is Bertha, a generative AI model designed to tackle the unique challenges of the nuclear industry. This session will delve into the technical architecture of HiVE and Bertha, showcasing how Databricks' scalable environment enhances their performance. We will discuss the secure data infrastructure supporting HiVE, ensuring data integrity and compliance. Real-world applications and use cases will demonstrate the impact of HiVE and Bertha on improving efficiency, innovation and safety in nuclear operations. Discover how the fusion of HiVE and Bertha with Databricks architecture is transforming the nuclear AI landscape and driving the future of nuclear technology.

Serverless Compute for Notebooks, Jobs and Lakeflow Declarative Pipelines

Discover how Databricks serverless compute revolutionizes data workflows by eliminating infrastructure management, enabling rapid scaling and optimizing costs for Notebooks, Jobs and Lakeflow Declarative Pipelines. This session will delve into the serverless architecture, highlighting its ability to dynamically allocate resources, reduce idle costs and simplify development cycles. Learn about recent advancements, including cost savings and practical strategies for migration and optimization. Tailored for Data Engineers and Architects, this talk will also explore use cases, features, limitations and future roadmap, empowering you to make informed infrastructure decisions while unlocking the full potential of Databricks’ serverless capabilities.

Shifting Left — Setting up Your GenAI Ecosystem to Work for Business Analysts

At Data + AI Summit in 2022, Databricks pioneered the term "shift left" to describe how AI workloads would enable people without deep data science backgrounds to create their own apps. In 2025, we take a look at how Experian is doing on that journey. This session highlights the Databricks services that support the shift-left paradigm for generative AI, including how AI/BI Genie helps with generative analytics and how Agent Studio helps with synthetic generation of test cases to validate model performance.

Understanding customer engagement and retention isn't optional; it's mission-critical. Join us for a live demo to see how you can build a scalable, governed customer health scoring model by transforming raw signals into actionable insights. Discover how Coalesce's low-code development platform works seamlessly with Databricks' lakehouse architecture to unify and operationalize customer data at scale. With built-in governance, automation, and metadata intelligence, you'll deliver trusted scores that support proactive decision-making across the business. Why attend?
- Accelerate time-to-insight with automated, low-code transformations
- Build repeatable, enterprise-grade scoring models with full data lineage
- Ensure governance, transparency, and compliance at every step

Sponsored by: ThoughtSpot | How Chevron Fuels Cloud Data Modernization

Learn how Chevron transitioned their central finance and procurement analytics into the cloud using Databricks and ThoughtSpot's Agentic Analytics Platform. Explore how Chevron leverages ThoughtSpot to unlock actionable insights, enhance their semantic layer with user-driven understanding, and ultimately drive more impactful strategies for customer engagement and business growth. In this session, Chevron explains the dos, don'ts, and best practices of migrating from outdated legacy business intelligence to real-time, AI-powered insights.

Streaming Meets Governance: Building AI-Ready Tables With Confluent Tableflow and Unity Catalog

Learn how Databricks and Confluent are simplifying the path from real-time data to governed, analytics- and AI-ready tables. This session will cover how Confluent Tableflow automatically materializes Kafka topics into Delta tables and registers them with Unity Catalog — eliminating the need for custom streaming pipelines. We’ll walk through how this integration helps data engineers reduce ingestion complexity, enforce data governance and make real-time data immediately usable for analytics and AI.
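Once Tableflow has materialized a topic, consuming it looks like querying any other governed table (a minimal sketch; the catalog, schema, and table names below are hypothetical placeholders):

```python
# Sketch: a Kafka topic materialized by Tableflow appears as a Delta table
# registered in Unity Catalog; names are hypothetical placeholders.
orders = spark.table("main.kafka_tableflow.orders")

daily = (
    orders
    .groupBy("order_date")
    .count()
    .orderBy("order_date"))

daily.show()
```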