talk-data.com

Topic: Databricks
Tags: big_data, analytics, spark
509 activities tagged

Activity Trend: peak of 515 activities/quarter (2020-Q1 to 2026-Q1)

Activities

Showing results filtered by: Data + AI Summit 2025
What’s new with Collaboration: Delta Sharing, Clean Room, Marketplace and the Ecosystem

Databricks continues to redefine how organizations securely and openly collaborate on data. With new innovations like Clean Rooms for multi-party collaboration, Sharing for Lakehouse Federation, cross-platform view sharing and Databricks Apps in the Marketplace, teams can now share and access data more easily, cost-effectively and across platforms, whether or not they're using Databricks. In this session, we'll deliver live demos of the key capabilities that power this transformation:

- Delta Sharing: the industry's only open protocol for seamless cross-platform data sharing
- Databricks Marketplace: a central hub for discovering and monetizing data and AI assets
- Clean Rooms: a privacy-preserving solution for secure, multi-party data collaboration

Join us to see how these tools enable trusted data sharing, accelerate insights and drive innovation across your ecosystem. Bring your questions and walk away with practical ways to put these capabilities into action today.
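
For a flavor of what the provider side of Delta Sharing looks like in practice, here is a minimal sketch using standard Unity Catalog SQL from a Databricks notebook (the share, table and recipient names are hypothetical):

```python
# Minimal provider-side Delta Sharing setup, run from a Databricks notebook
# where a `spark` session is already available. All names are hypothetical.

# Create a share and add a table to it.
spark.sql("CREATE SHARE IF NOT EXISTS retail_share")
spark.sql("ALTER SHARE retail_share ADD TABLE main.sales.daily_orders")

# Create an open (token-based) recipient; Databricks issues an activation
# link through which the recipient downloads a credential file.
spark.sql("CREATE RECIPIENT IF NOT EXISTS partner_co")

# Grant the recipient read access to everything in the share.
spark.sql("GRANT SELECT ON SHARE retail_share TO RECIPIENT partner_co")
```

Because the protocol is open, the recipient can read the share from Spark, pandas, Power BI or another Databricks workspace without copying the data.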

Your Wish is AI Command — Get to Grips With Databricks Genie

Picture the scene: you're exploring a deep, dark cave looking for insights to unearth when, in a burst of smoke, Genie appears and offers you not three but unlimited data wishes. This isn't a folk tale; it's the growing wave of Generative BI that is going to be part of analytics platforms. Databricks Genie is a tool powered by a SQL-writing LLM that redefines how we interact with data. We'll look at the basics of creating a new Genie room, scoping its data tables and asking questions. We'll help it out with some complex pre-defined questions and ensure it has the best chance of success. We'll give the tool a personality, set some behavioural guidelines and prepare some hidden Easter eggs for our users to discover. Generative BI is going to be a fundamental part of the analytics toolset used across businesses. If you're using Databricks, you should be aware of Genie; if you're not, you should be planning your Generative BI roadmap. Either way, this session will answer your wishes.
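
Beyond the UI workflow the session walks through, a Genie space can also be queried programmatically. The sketch below uses the Genie Conversation API as we understand it from the public preview documentation; the endpoint path and payload should be verified against the current REST reference, and the host, token and space ID are placeholders:

```python
import requests

HOST = "https://<your-workspace>.cloud.databricks.com"  # placeholder
TOKEN = "<personal-access-token>"                        # placeholder
SPACE_ID = "<genie-space-id>"                            # placeholder

# Start a conversation in a Genie space with a natural-language question.
# Endpoint per the Genie Conversation API (public preview); treat as
# illustrative and confirm against the REST API docs before relying on it.
resp = requests.post(
    f"{HOST}/api/2.0/genie/spaces/{SPACE_ID}/start-conversation",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"content": "What were our top five products by revenue last month?"},
)
resp.raise_for_status()
print(resp.json())  # returns conversation/message IDs to poll for the answer
```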

keynote
by Jamie Dimon (JPMorgan Chase), Kasey Uhlenhuth (Databricks), Justin DeBrabant (Databricks), Greg Ulrich (Mastercard), Richard Masters (Virgin Atlantic Airways), Ali Ghodsi (Databricks), Reynold Xin (Databricks), Nikita Shamgunov (Neon), Dario Amodei (Anthropic), Holly Smith (Databricks), Hanlin Tang (Databricks)

Be the first to witness the latest breakthroughs from Databricks and share in the success of innovative data and AI companies.

lightning_talk
by Nick Karpov (Databricks), Holly Smith (Databricks)

Join a live recording of the Over Architected Databricks podcast with Nick and Holly as they take the hottest features for the coming week and try to shoehorn them into one architecture. Audio for this session is delivered through the conference mobile app; you must bring your own headphones to listen.

Advanced Data Access Control for the Exabyte Era: Scaling with Purpose

As data-driven companies scale from small startups to global enterprises, managing secure data access becomes increasingly complex. Traditional access control models fall short at enterprise scale, where dynamic, purpose-driven access is essential. In this talk, we explore how our “Just-in-Time” Purpose-Based Access Control (PBAC) platform addresses the evolving challenges of data privacy and compliance, maintaining least privilege while ensuring productivity. Using features like Unity Catalog, Delta Sharing & Databricks Apps, the platform delivers real-time, context-aware data governance. Leveraging JIT PBAC keeps your data secure, your engineers productive, your legal & security teams happy and your organization future-proof in the ever-evolving compliance landscape.
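
The JIT PBAC platform itself is the speakers' own, but the Unity Catalog primitive underneath purpose-based filtering is easy to sketch: a row-filter function that only exposes rows when the caller belongs to a group representing an approved purpose. A minimal sketch (the group, catalog and table names below are hypothetical):

```python
# Hypothetical purpose-based row filter built on Unity Catalog primitives.
# Group and table names are invented for illustration only.

# Only members of the 'fraud_investigation' purpose group may see EU rows;
# everyone else sees only non-EU rows.
spark.sql("""
    CREATE OR REPLACE FUNCTION main.governance.purpose_filter(region STRING)
    RETURN region <> 'EU' OR is_account_group_member('fraud_investigation')
""")

# Attach the filter so it is enforced on every query against the table,
# regardless of which tool issues the query.
spark.sql("""
    ALTER TABLE main.core.transactions
    SET ROW FILTER main.governance.purpose_filter ON (region)
""")
```

A JIT layer like the one described can then grant and revoke the purpose group membership on demand, keeping standing privileges at zero.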

A Practitioner’s Guide to Databricks Serverless

This session is repeated. Databricks Serverless revolutionizes data engineering and analytics by eliminating the complexities of infrastructure management. This talk will provide an overview of this powerful serverless compute option, highlighting how it enables practitioners to focus solely on building robust data pipelines. We'll explore the core benefits, including automatic scaling, cost optimization and seamless integration with the Databricks ecosystem. Learn how serverless workflows simplify the orchestration of various data tasks, from ingestion to dashboards, ultimately accelerating time-to-insight and boosting productivity. This session is ideal for data engineers, data scientists and analysts looking to leverage the agility and efficiency of serverless computing in their data workflows.
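
As a rough illustration of how little infrastructure definition a serverless job needs, here is a sketch using the Databricks Python SDK; note that omitting any cluster specification makes the task eligible for serverless job compute in workspaces where it is enabled (the job name and notebook path are hypothetical):

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()  # picks up credentials from the environment

# No cluster spec anywhere: in a serverless-enabled workspace the task runs
# on serverless job compute, so there is nothing to size, tune or patch.
job = w.jobs.create(
    name="nightly-ingest",  # hypothetical
    tasks=[
        jobs.Task(
            task_key="ingest",
            notebook_task=jobs.NotebookTask(
                notebook_path="/Workspace/pipelines/ingest"  # hypothetical
            ),
        )
    ],
)
print(f"Created job {job.job_id}")
```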

Databricks as the Backbone of MLOps: From Orchestration to Inference

As machine learning (ML) models scale in complexity and impact, organizations must establish a robust MLOps foundation to ensure seamless model deployment, monitoring and retraining. In this session, we’ll share how we leverage Databricks as the backbone of our MLOps ecosystem — handling everything from workflow orchestration to large-scale inference. We’ll walk through our journey of transitioning from fragmented workflows to an integrated, scalable system powered by Databricks Workflows. You’ll learn how we built an automated pipeline that streamlines model development, inference and monitoring while ensuring reliability in production. We’ll also discuss key challenges we faced, lessons learned and best practices for organizations looking to operationalize ML with Databricks.
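
The speakers' pipeline is their own, but the model-registration step such a system pivots on typically looks something like this minimal MLflow sketch targeting Unity Catalog (the model and catalog names are hypothetical):

```python
import mlflow
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Register models in Unity Catalog rather than the legacy workspace registry.
mlflow.set_registry_uri("databricks-uc")

# Toy training data stands in for a real feature pipeline.
X, y = make_classification(n_samples=200, n_features=5, random_state=42)
model = LogisticRegression().fit(X, y)

# Log the run and register the model in one step; downstream Workflows tasks
# and serving endpoints can then reference the model by version or alias.
with mlflow.start_run():
    mlflow.sklearn.log_model(
        model,
        artifact_path="model",
        registered_model_name="main.ml.churn_model",  # hypothetical UC name
    )
```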

Delta and Databricks as a Performant Exabyte-Scale Application Backend

The Delta Lake architecture promises to provide a single, highly functional, high-scale copy of data that can be leveraged by a variety of tools to satisfy a broad range of use cases. To date, most use cases have focused on interactive data warehousing, ETL, model training and streaming. Real-time access is generally delegated to costly and sometimes difficult-to-scale NoSQL, indexed storage and domain-specific specialty solutions, which provide limited functionality compared to Spark on Delta Lake. In this session, we will explore the Delta data-skipping and optimization model and discuss how Capital One leveraged it, along with Databricks Photon and Spark Connect, to implement a real-time web application backend. We'll share how we built a highly functional, performant and cost-effective security information and event management user experience (SIEM UX).
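
To make the data-skipping idea concrete: Delta keeps per-file min/max statistics, so clustering files on the columns your queries filter by lets the engine prune most files without reading them. A minimal sketch (the table and column names are hypothetical):

```python
# Cluster data files on the columns the SIEM UX filters by, so Delta's
# per-file min/max statistics can prune files at query planning time.
spark.sql("OPTIMIZE main.siem.events ZORDER BY (event_time, source_ip)")

# A selective predicate now touches only the few files whose min/max ranges
# overlap it, instead of scanning the whole table.
hits = (
    spark.table("main.siem.events")
    .where("event_time >= '2025-06-01' AND source_ip = '10.0.0.42'")
)
hits.show()
```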

Empowering Business Users With Databricks — Integrating AI/BI Genie With Microsoft Teams

In this session, we'll explore how Rooms To Go enhances organizational collaboration by integrating AI/BI Genie with Microsoft Teams. Genie enables warehouse employees and members of the sales team to interact with data using natural language, simplifying data exploration and analysis. By connecting Genie to Microsoft Teams, we bring real-time data insights directly to a user's phone. We'll provide a comprehensive overview of setting up this integration, as well as a demo of how the team uses it daily. Attendees will gain practical knowledge to implement this integration, empowering their teams to access and interact with data seamlessly within Microsoft Teams.
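
One lightweight way to close the delivery loop the session describes is to push a Genie answer into a Teams channel via an incoming webhook. In this sketch the webhook URL is a placeholder and the answer text is hard-coded where a real integration would use a Genie API response:

```python
import requests

# Placeholder: an incoming-webhook URL created for a Teams channel.
TEAMS_WEBHOOK_URL = "https://example.webhook.office.com/webhookb2/..."

# In the real integration this text would come from Genie; it is hard-coded
# here only to keep the sketch self-contained.
answer = "Yesterday's warehouse throughput: 12,480 units across 3 sites."

resp = requests.post(TEAMS_WEBHOOK_URL, json={"text": answer})
resp.raise_for_status()  # Teams returns HTTP 200 on success
```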

Enterprise Cost Management for Data Warehousing with Databricks SQL

This session shows you how to gain visibility into your Databricks SQL spend and ensure cost efficiency. Learn about the latest features to gain detailed insights into Databricks SQL expenses so you can easily monitor and control your costs. Find out how you can enable attribution to internal projects, understand the Total Cost of Ownership, set up proactive controls and find ways to continually optimize your spend.
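
Much of this visibility comes from the billing system tables. A sketch of the kind of query involved is below; the column names follow the system-tables documentation, so verify them in your workspace before depending on the numbers:

```python
# Estimate Databricks SQL spend per day by joining usage with list prices.
# Schema per the system-tables docs; double-check columns in your workspace.
daily_sql_spend = spark.sql("""
    SELECT
        u.usage_date,
        u.sku_name,
        SUM(u.usage_quantity * p.pricing.default) AS est_list_cost
    FROM system.billing.usage AS u
    JOIN system.billing.list_prices AS p
      ON u.sku_name = p.sku_name
     AND u.cloud = p.cloud
     AND u.usage_end_time >= p.price_start_time
     AND (p.price_end_time IS NULL OR u.usage_end_time < p.price_end_time)
    WHERE u.billing_origin_product = 'SQL'
    GROUP BY u.usage_date, u.sku_name
    ORDER BY u.usage_date
""")
daily_sql_spend.show()
```

Tagging warehouses and reading the tags back from this table is what makes per-project cost attribution possible.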

Enterprise Financial Crime Detection: A Lakehouse Framework for FATF, Basel III, and BSA Compliance

We will present a framework for FinCrime detection leveraging the Databricks lakehouse architecture, specifically how institutions can achieve both the data flexibility and the ACID transaction guarantees essential for FinCrime monitoring. The framework incorporates advanced ML models for anomaly detection, pattern recognition and predictive analytics, while maintaining the clear data lineage and audit trails required by regulatory bodies. We will also discuss specific improvements in false-positive reduction, detection speed and regulatory reporting turnaround, and delve into how the architecture addresses specific FATF recommendations, Basel III risk management requirements and BSA compliance obligations, particularly in transaction monitoring and SAR filing. The ability to handle structured and unstructured data while maintaining data quality and governance makes it particularly valuable for large financial institutions dealing with complex, multi-jurisdictional compliance requirements.

From Imperative to Declarative Paradigm: Rebuilding a CI/CD Infrastructure Using Hatch and DABs

Building and deploying PySpark pipelines to Databricks should be effortless. However, our team at FreeWheel has, for the longest time, struggled with a convoluted and hard-to-maintain CI/CD infrastructure. It followed an imperative paradigm, demanding that every project implement custom scripts to build artifacts and deploy resources, and resulting in redundant boilerplate code and awkward interactions with the Databricks REST API. We set our minds on rebuilding it from scratch, following a declarative paradigm instead. We will share how we were able to eliminate thousands of lines of code from our repository, create a fully configuration-driven infrastructure where projects can be easily onboarded, and improve the quality of our codebase using Hatch and Databricks Asset Bundles (DABs) as our tools of choice. In particular, DABs have made deploying across our three environments a breeze and have allowed us to quickly adopt new features as soon as they are released by Databricks.

Maximize Retail Data Insights in Genie with Delta Sharing via Crisp's Collaborative Commerce Platform

Crisp streamlines a brand's data ingestion across 60+ retail sources to build a foundation of sales and inventory intelligence on Databricks. Data is normalized and analysis-ready, and integrates seamlessly with AI tools such as Databricks' Genie and Blueprints. This session will provide an overview of the Crisp retail data platform and how our semantic layer and normalized, harmonized data sets can help drive powerful insights for supply chain, BI/analytics and data science teams.

Real-Time Analytics Pipeline for IoT Device Monitoring and Reporting

This session will show how we implemented a solution to support high-frequency data ingestion from smart meters. We implemented a robust API endpoint that interfaces directly with IoT devices. This API processes messages in real time from millions of distributed IoT devices and meters across the network. The architecture leverages cloud storage as a landing zone for the raw data, followed by a streaming pipeline built on Lakeflow Declarative Pipelines. This pipeline implements a multi-layer medallion architecture to progressively clean, transform and enrich the data. The pipeline operates continuously to maintain near real-time data freshness in our gold layer tables. These datasets connect directly to Databricks Dashboards, providing stakeholders with immediate insights into their operational metrics. This solution demonstrates how modern data architecture can handle high-volume IoT data streams while maintaining data quality and providing accessible real-time analytics for business users.
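
A compressed sketch of the bronze-to-silver portion of such a pipeline, using the Lakeflow Declarative Pipelines Python API (the landing path, column names and expectation below are hypothetical):

```python
import dlt
from pyspark.sql.functions import col, to_timestamp

# Bronze: land raw smart-meter JSON incrementally with Auto Loader.
@dlt.table(comment="Raw smart-meter messages (bronze)")
def meter_readings_bronze():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("s3://landing-zone/meters/")  # hypothetical landing path
    )

# Silver: typed, validated readings; rows failing the expectation are dropped.
@dlt.table(comment="Cleaned, validated readings (silver)")
@dlt.expect_or_drop("non_negative_reading", "reading_kwh >= 0")
def meter_readings_silver():
    return (
        dlt.read_stream("meter_readings_bronze")
        .select(
            col("device_id"),
            to_timestamp(col("event_ts")).alias("event_time"),
            col("reading_kwh").cast("double").alias("reading_kwh"),
        )
    )
```

Gold aggregates follow the same pattern, and running the pipeline in continuous mode is what keeps the dashboard tables fresh in near real time.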

Scaling Trust in BI: How Bolt Manages Thousands of Metrics Across Databricks, dbt, and Looker

Managing metrics across teams can feel like everyone’s speaking a different language, which often leads to loss of trust in numbers. Based on a real-world use case, we’ll show you how to establish a governed source of truth for metrics that works at scale and builds a solid foundation for AI integration. You’ll explore how Bolt.eu’s data team governs consistent metrics for different data users and leverages Euno’s automations to navigate the overlap between Looker and dbt. We’ll cover best practices for deciding where your metrics belong and how to optimize engineering and maintenance workflows across Databricks, dbt and Looker. For curious analytics engineers, we’ll dive into thinking in dimensions & measures vs. tables & columns and determining when pre-aggregations make sense. The goal is to help you contribute to a self-serve experience with consistent metric definitions, so business teams and AI agents can access the right data at the right time without endless back-and-forth.

Securing Databricks Using Databricks as SIEM

This session showcases our approach to leveraging Databricks product capabilities to prevent and mitigate security risks for Databricks itself. It demonstrates how Databricks can serve as a powerful Security Information and Event Management (SIEM) platform, offering advanced capabilities for data collection and threat detection, and explores data collection from diverse data sources alongside real-time threat detection.
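
As one small example of the kind of detection this enables, the audit system table can be scanned for suspicious patterns. The threshold and window below are arbitrary illustration values, and the schema follows the audit-log documentation (verify it in your workspace):

```python
# Flag principals with bursts of denied actions in the Databricks audit log.
# The 403 filter, one-hour window and count threshold are arbitrary examples.
suspects = spark.sql("""
    SELECT
        user_identity.email AS principal,
        COUNT(*)            AS denied_actions
    FROM system.access.audit
    WHERE response.status_code = 403
      AND event_time >= current_timestamp() - INTERVAL 1 HOUR
    GROUP BY user_identity.email
    HAVING COUNT(*) > 20
""")
suspects.show()
```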

Simplify Data Ingest and Egress with the New Python Data Source API

Data engineering teams are frequently tasked with building bespoke ingest and/or egress solutions for myriad custom, proprietary, or industry-specific data sources or sinks. Many teams find this work cumbersome and time-consuming. Recognizing these challenges, Databricks interviewed numerous companies across different industries to better understand their diverse data integration needs. This comprehensive feedback led us to develop the Python Data Source API for Apache Spark™.
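
To show the shape of the API the session covers, here is a toy batch source built on the documented pyspark.sql.datasource classes; the source name, schema and yielded rows are fabricated, and a real implementation would call out to the custom system inside read():

```python
from pyspark.sql.datasource import DataSource, DataSourceReader

class FakeFeedReader(DataSourceReader):
    """Reader that yields rows matching the declared schema."""

    def __init__(self, schema, options):
        self.options = options

    def read(self, partition):
        # A real source would fetch from the proprietary system here.
        yield ("device-1", 42.0)
        yield ("device-2", 17.5)

class FakeFeedDataSource(DataSource):
    """A toy custom batch data source named 'fakefeed'."""

    @classmethod
    def name(cls):
        return "fakefeed"

    def schema(self):
        return "device STRING, value DOUBLE"

    def reader(self, schema):
        return FakeFeedReader(schema, self.options)

# Register once per session, then use it like any built-in format.
spark.dataSource.register(FakeFeedDataSource)
df = spark.read.format("fakefeed").load()
df.show()
```

A matching writer (and streaming reader) follows the same pattern, which is what makes bespoke egress as uniform as ingest.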

Sponsored by: Google Cloud | Unleash the power of Gemini for Databricks

Elevate your AI initiatives on Databricks by harnessing the latest advancements in Google Cloud's Gemini models. Learn how to integrate Gemini's built-in reasoning and powerful development tools to build more dynamic and intelligent applications within your existing Databricks platform. We'll explore concrete ideas for agentic AI solutions, showcasing how Gemini can help you unlock new value from your data in Databricks.
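
A minimal sketch of calling a Gemini model from a Databricks notebook with the google-genai SDK; the model name is illustrative, the API key would typically come from a Databricks secret scope, and routing through Databricks Model Serving external models is another common pattern:

```python
# pip install google-genai
from google import genai

# Assumes GOOGLE_API_KEY is set in the environment
# (e.g., injected from a Databricks secret scope).
client = genai.Client()

response = client.models.generate_content(
    model="gemini-2.0-flash",  # illustrative model name
    contents="Summarize the key risk factors in this quarter's sales notes.",
)
print(response.text)
```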

Sponsored by: Hightouch | Unleashing AI at PetSmart: Using AI Decisioning Agents to Drive Revenue

With 75M+ Treats Rewards members, PetSmart knows how to build loyalty with pet parents. But recently, traditional email testing and personalization strategies weren't delivering the engagement and growth they wanted, especially in the Salon business. This year, they replaced their email calendar and A/B testing with AI Decisioning, achieving a +22% incremental lift in bookings. Join Bradley Breuer, VP of Marketing (Loyalty, Personalization, CRM and Customer Analytics), to learn how his team reimagined CRM using AI to personalize campaigns and dynamically optimize creative, offers and timing for every unique pet parent. You'll learn:

- How PetSmart blends human insight and creativity with AI to deliver campaigns that engage and convert.
- How they moved beyond batch-and-blast calendars with AI Decisioning Agents to optimize sends, while keeping control over brand, messaging and frequency.
- How using Databricks as their source of truth led to surprising learnings and better outcomes.