talk-data.com

Topic: Databricks

Tags: big_data, analytics, spark

Activity Trend: 515 peak/qtr (2020-Q1 to 2026-Q1)

Activities: 1286 · Newest first

Lakeflow in Production: CI/CD, Testing and Monitoring at Scale

Building robust, production-grade data pipelines goes beyond writing transformation logic — it requires rigorous testing, version control, automated CI/CD workflows and a clear separation between development and production. In this talk, we’ll demonstrate how Lakeflow, paired with Databricks Asset Bundles (DABs), enables Git-based workflows, automated deployments and comprehensive testing for data engineering projects. We’ll share best practices for unit testing, CI/CD automation, data quality monitoring and environment-specific configurations. Additionally, we’ll explore observability techniques and performance tuning to ensure your pipelines are scalable, maintainable and production-ready.
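As a sketch of the pattern the talk describes, a Databricks Asset Bundle is driven by a databricks.yml file that separates development and production targets; the workspace hosts, job name and notebook path below are placeholders, not taken from the talk:

```yaml
bundle:
  name: lakeflow_pipeline

targets:
  dev:
    mode: development            # prefixes resources per developer
    workspace:
      host: https://dev-workspace.cloud.databricks.com
  prod:
    mode: production
    workspace:
      host: https://prod-workspace.cloud.databricks.com

resources:
  jobs:
    nightly_etl:
      name: nightly-etl
      tasks:
        - task_key: transform
          notebook_task:
            notebook_path: ../src/transform.ipynb
```

A CI pipeline would then typically run `databricks bundle validate -t dev` on pull requests and `databricks bundle deploy -t prod` on merges to the main branch.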

Leveling Up Gaming Analytics: How Supercell Evolved Player Experiences With Snowplow and Databricks

In the competitive gaming industry, understanding player behavior is key to delivering engaging experiences. Supercell, creators of Clash of Clans and Brawl Stars, faced challenges with fragmented data and limited visibility into user journeys. To address this, they partnered with Snowplow and Databricks to build a scalable, privacy-compliant data platform for real-time insights. By leveraging Snowplow’s behavioral data collection and Databricks’ Lakehouse architecture, Supercell achieved:
- Cross-platform data unification: a unified view of player actions across web, mobile and in-game
- Real-time analytics: streaming event data into Delta Lake for dynamic game balancing and engagement
- Scalable infrastructure: supporting terabytes of data during launches and live events
- AI & ML use cases: churn prediction and personalized in-game recommendations
This session explores Supercell’s data journey and AI-driven player engagement strategies.

Marketing Runs on Your Data: Why IT Holds the Keys to Customer Growth

Marketing owns the outcomes, but IT owns the infrastructure that makes those outcomes possible. In today’s data-driven landscape, the success of customer engagement and personalization strategies depends on a tight partnership between marketing and IT. This session explores how leading brands are using Databricks and Epsilon to unlock the full value of first-party data — transforming raw data into rich customer profiles, real-time engagement and measurable marketing ROI. Join Epsilon to see how a unified data foundation powers marketing to drive outcomes — with IT as the enabler of scale, governance and innovation. Key takeaways:
- How to unify first-party data and resolve identities to build rich customer profiles with Databricks and Epsilon
- Why a collaborative approach between marketing and IT accelerates data-driven decisions and drives greater returns
- How to activate personalized campaigns with precision and speed across channels, from insights to execution

Measure What Matters: Quality-Focused Monitoring for Production AI Agents

Ensuring the operational excellence of AI agents in production requires robust monitoring capabilities that span both performance metrics and quality evaluation. This session explores Databricks' comprehensive Mosaic Agent Monitoring solution, designed to provide visibility into deployed AI agents through an intuitive dashboard that tracks critical operational metrics and quality indicators. We'll demonstrate how to use the Agent Monitoring solution to iteratively improve a production agent so that it delivers a better customer support experience while decreasing the cost of delivering that support. We will show how to:
- Identify and proactively fix a quality problem with the GenAI agent’s responses before it becomes a major issue
- Understand users’ usage patterns and implement and test a feature improvement to the GenAI agent
Key session takeaways include:
- Techniques for monitoring essential operational metrics, including request volume, latency, errors and cost efficiency across your AI agent deployments
- Strategies for implementing continuous quality evaluation using AI judges that assess correctness, guideline adherence and safety without requiring ground truth labels
- Best practices for setting up effective monitoring dashboards that enable dimension-based analysis across time periods, user feedback and topic categories
- Methods for collecting and integrating end-user feedback to create a closed-loop system that drives iterative improvement of your AI agents

Optimizing Smart Meter IIoT Data in Databricks for At-Scale Interactive Electrical Load Analytics

Octave is a Plotly Dash application used daily by about 1,000 Hydro-Québec technicians and engineers to analyze smart meter load and voltage data from 4.5M meters across the province. As adoption grew, Octave’s back end was migrated to Databricks to address increasingly massive scale (>1T data points), governance and security requirements. This talk will summarize how Databricks was optimized to support performant, at-scale interactive Dash application experiences while in parallel managing complex back-end ETL processes. The talk will also outline further optimizations targeting query latency and user concurrency, along with plans to increase data update frequency. Non-technology success factors reviewed will include the value of subject matter expertise, operational autonomy, code quality for long-term maintainability and proactive vendor technical support.

Powering Secure and Scalable Data Governance at PepsiCo With Unity Catalog Open APIs

PepsiCo, given its scale, has numerous teams leveraging different tools and engines to access data and perform analytics and AI. To streamline governance across this diverse ecosystem, PepsiCo unifies its data and AI assets under an open and enterprise-grade governance framework with Unity Catalog. In this session, we'll explore real-world examples of how PepsiCo extends Unity Catalog’s governance to all its data and AI assets, enabling secure collaboration even for teams outside Databricks. Learn how PepsiCo architects permissions using service principals and service accounts to authenticate with Unity Catalog, building a multi-engine architecture with seamless and open governance. Attendees will gain practical insights into designing a scalable, flexible data platform that unifies governance across all teams while embracing openness and interoperability.

Real-Time Botnet Defense at CVS: AI-Driven Detection and Mitigation on Databricks

Botnet attacks mobilize digital armies of compromised devices that continuously evolve, challenging traditional security frameworks with their high-speed, high-volume nature. In this session, we will reveal our advanced system — developed on the Databricks platform — that leverages cutting-edge AI/ML capabilities to detect and mitigate bot attacks in near-real time. We will dive into the system’s robust architecture, including scalable data ingestion, feature engineering, MLOps strategies & production deployment of the system. We will address the unique challenges of processing bulk HTTP traffic data, time-series anomaly detection and attack signature identification. We will demonstrate key business values through downtime minimization and threat response automation. With sectors like healthcare facing heightened risks, ensuring data integrity and service continuity is vital. Join us to uncover lessons learned while building an enterprise-grade solution that stays ahead of adversaries.

Sponsored by: Confluent | Turn SAP Data into AI-Powered Insights with Databricks

Learn how Confluent simplifies real-time streaming of your SAP data into AI-ready Delta tables on Databricks. In this session, you'll see how Confluent’s fully managed data streaming platform—with unified Apache Kafka® and Apache Flink®—connects data from SAP S/4HANA, ECC, and 120+ other sources to enable easy development of trusted, real-time data products that fuel highly contextualized AI and analytics. With Tableflow, you can represent Kafka topics as Delta tables in just a few clicks—eliminating brittle batch jobs and custom pipelines. You’ll see a product demo showcasing how Confluent unites your SAP and Databricks environments to unlock ERP-fueled AI, all while reducing the total cost of ownership (TCO) for data streaming by up to 60%.

Sponsored by: Datafold | Breaking Free: How Evri is Modernizing SAP HANA Workflows to Databricks with AI and Datafold

With expensive contracts up for renewal, Evri faced the challenge of migrating 1,000 SAP HANA assets and 200+ Talend jobs to Databricks. This talk will cover how we transformed SAP HANA and Talend workflows into modern Databricks pipelines through AI-powered translation and validation, without months of manual coding. We'll cover:
- Techniques for handling SAP HANA's proprietary formats
- Approaches for refactoring incremental pipelines while ensuring dashboard stability
- The technology enabling automated translation of complex business logic
- Validation strategies that guarantee migration accuracy
We'll share real examples of SAP HANA stored procedures transformed into Databricks code and demonstrate how we maintained 100% uptime of critical dashboards during the transition. Join us to discover how AI is revolutionizing what's possible in enterprise migrations from GUI-based legacy systems to modern, code-first data platforms.

Sponsored by: Dataiku | Agility Meets Governance: How Morgan Stanley Scales ML in a Regulated World

In regulated industries like finance, agility can't come at the cost of compliance. Morgan Stanley found the answer in combining Dataiku and Databricks to create a governed, collaborative ecosystem for machine learning and predictive analytics. This session explores how the firm accelerated model development and decision-making, reducing time-to-insight by 50% while maintaining full audit readiness. Learn how no-code workflows empowered business users, while scalable infrastructure powered terabyte-scale ML. Discover best practices for unified data governance, risk automation, and cross-functional collaboration that unlock innovation without compromising security. Ideal for data leaders and ML practitioners in regulated industries looking to harmonize speed, control, and value.

Sponsored by: Impetus Technologies | Future-Ready Data at Scale: How Shutterfly Modernized for GenAI-Driven Personalization

As a leading personalized product retailer, Shutterfly needed a modern, secure, and performant data foundation to power GenAI-driven customer experiences. However, their existing stack was creating roadblocks in performance, governance, and machine learning scalability. In partnership with Impetus, Shutterfly embarked on a multi-phase migration to Databricks Unity Catalog. This transformation not only accelerated Shutterfly’s ability to provide AI-driven personalization at scale but also improved governance, reduced operational overhead, and laid a scalable foundation for GenAI innovation. Join experts from Databricks, Impetus, and Shutterfly to discover how this collaboration enabled faster data-driven decision-making, simplified compliance, and unlocked the agility needed to meet evolving customer demands in the GenAI era. Learn from their journey and take away best practices for your own modernization efforts.

Sponsored by: Promethium | Delivering Self-Service Data for AI Scale on Databricks

AI initiatives often stall when data teams can’t keep up with business demand for ad hoc, self-service data. Whether it’s AI agents, BI tools, or business users—everyone needs data immediately, but the pipeline-centric modern data stack is not built for this scale of agility. Promethium enables the data teams to generate instant, contextual data products called Data Answers based on rapid, exploratory questions from the business. Data Answers empower data teams for AI-scale collaboration with the business. We will demo Promethium’s new agent capability to build data answers on Databricks for self-service data. The Promethium agent leverages and extends Genie with context from other enterprise data and applications to ensure accuracy and relevance.

Sponsored by: Salesforce | From Data to Action: A Unified and Trusted Approach

Empower AI and agents with trusted data and metadata from an end-to-end unified system. Discover how Salesforce Data Cloud, Agentforce, and Databricks work together to fuel automation, AI, and analytics through a unified data strategy—driving real-time intelligence, enabling zero-copy data sharing, and unlocking scalable activation across the enterprise.

Sponsored by: Securiti | Safely Curating Data to Enable Enterprise AI with Databricks

This session will explore how developers can easily select, extract, filter and control data pre-ingestion to accelerate safe AI. Learn how the Securiti and Databricks partnership empowers Databricks users by providing the critical foundation for unlocking scalability and accelerating trustworthy AI development and adoption. Key takeaways:
- Understand how to leverage data intelligence to establish a foundation for frameworks like the OWASP Top 10 for LLMs, NIST AI RMF and Gartner's AI TRiSM
- Learn how automated data curation and syncing address specific risks while accelerating AI development in Databricks
- Discover how leading organizations apply robust access controls across vast swaths of mostly unstructured data
- Learn how to maintain data provenance and control as data is moved and transformed through complex pipelines in the Databricks platform

Techcombank's Multi-Million Dollar Transformation Leveraging Cloud and Databricks

The migration to the Databricks Data Intelligence Platform has enabled Techcombank to more efficiently unify data from over 50 systems, improve governance, streamline daily operational analytics pipelines and use advanced analytics tools and AI to create more meaningful and personalized experiences for customers. With Databricks, Techcombank has also introduced key solutions that are reshaping its digital banking services:
- AI-driven lead management: Techcombank's internally developed AI program, the Lead Allocation Curated Engine (LACE), optimizes lead management and provides relationship managers with enriched insights for smarter lead allocation to drive business growth.
- AI-powered digital banking inclusion of small businesses: GeoSense, an AI-powered program, assists frontline workers with analytics-driven insights about which small businesses and merchants to engage in the bank's digital ecosystem.
Further examples will be presented in the session.

Unlocking Cross-Organizational Collaboration to Protect the Environment With Databricks at DEFRA

Join us to learn how the UK's Department for Environment, Food & Rural Affairs (DEFRA) transformed data use with Databricks’ Unity Catalog, enabling nationwide projects through secure, scalable analytics. DEFRA safeguards the UK's natural environment. Historical fragmentation of data, talent and tools across siloed platforms and organizations made it difficult to fully exploit the department’s rich data. DEFRA launched its Data Analytics & Science Hub (DASH), powered by the Databricks Data Intelligence Platform, to unify its data ecosystem. DASH enables hundreds of users to access and share datasets securely. A flagship example demonstrates its power: using Databricks to process aerial photography and satellite data to identify peatlands in need of restoration — a complex task made possible through unified data governance, scalable compute and AI. Attendees will hear about DEFRA’s journey and learn valuable lessons about building a platform that crosses organizational boundaries.

Using Delta-rs and Delta-Kernel-rs to Serve CDC Feeds

Change data feeds are a common tool for synchronizing changes between tables and performing data processing in a scalable fashion. Serverless architectures offer a compelling solution for organizations looking to avoid the complexity of managing infrastructure. But how can you bring CDFs into a serverless environment? In this session, we'll explore how to integrate Change Data Feeds into serverless architectures using Delta-rs and Delta-kernel-rs—open-source projects that allow you to read Delta tables and their change data feeds in Rust or Python. We’ll demonstrate how to use these tools with Lakestore’s serverless platform to easily stream and process changes. You’ll learn how to:
- Leverage Delta tables and CDFs in serverless environments
- Utilize Databricks and Unity Catalog without needing Apache Spark
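To make the idea concrete, here is a minimal pure-Python sketch of folding a batch of change-feed rows into a keyed snapshot. The `_change_type` and `_commit_version` column names follow Delta's CDF schema; in a real serverless function the rows would come from delta-rs (e.g. the `deltalake` Python package) rather than the hand-written list below:

```python
# Sketch: applying an ordered Delta Change Data Feed batch to a key -> row view.
def apply_cdf(state: dict, changes: list, key: str = "id") -> dict:
    """Fold CDF rows into a snapshot, oldest commit first."""
    for row in sorted(changes, key=lambda r: r["_commit_version"]):
        ctype = row["_change_type"]
        # Strip CDF metadata columns, keeping only the payload.
        payload = {k: v for k, v in row.items() if not k.startswith("_")}
        if ctype in ("insert", "update_postimage"):
            state[payload[key]] = payload      # upsert the new image
        elif ctype == "delete":
            state.pop(payload[key], None)      # drop the deleted key
        # update_preimage rows are informational; nothing to apply
    return state

feed = [
    {"id": 1, "qty": 5, "_change_type": "insert", "_commit_version": 1},
    {"id": 1, "qty": 5, "_change_type": "update_preimage", "_commit_version": 2},
    {"id": 1, "qty": 9, "_change_type": "update_postimage", "_commit_version": 2},
    {"id": 2, "qty": 3, "_change_type": "insert", "_commit_version": 2},
    {"id": 2, "qty": 3, "_change_type": "delete", "_commit_version": 3},
]
snapshot = apply_cdf({}, feed)
print(snapshot)  # {1: {'id': 1, 'qty': 9}}
```

The same fold works whether the rows arrive from a Spark stream or from a serverless reader; only the source of `feed` changes.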

Pacers Sports and Entertainment and Databricks
Talk by Ari Kaplan (Databricks), Jared Chavez (Pacers Sports & Entertainment) and Rick Schultz (Databricks)

The Pacers Sports Group has had an amazing year: the Indianapolis Pacers reached the NBA Finals for the first time in 25 years, and the Fever are setting attendance and viewership records with WNBA star Caitlin Clark. Hear how they have transformed their data and AI capabilities for marketing, fan behavior insights, season ticket propensity models and data democratization for non-technical personas, and how switching to Databricks delivered a 12,000x cost reduction, down to just $8 a year.

Creating a Custom PySpark Stream Reader with PySpark 4.0

PySpark supports many data sources out of the box, such as Apache Kafka, JDBC, ODBC and Delta Lake. However, some older systems, such as those that use the JMS protocol, are not supported by default and require considerable extra work for developers to read from. One such example is ActiveMQ for streaming. Traditionally, ActiveMQ users have needed a middleman to read the stream with Spark (for example, writing to a MySQL database with Java code and reading that table via Spark JDBC). With PySpark 4.0’s custom data sources (supported in DBR 15.3+), we can cut out the middleman and consume the queues directly from PySpark, in batch or with Spark Streaming. This saves developers considerable time and complexity in getting source data into Delta Lake, governed by Unity Catalog and orchestrated with Databricks Workflows.
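As a hedged, non-runnable sketch of the API this talk builds on (class and method names follow PySpark 4.0's `pyspark.sql.datasource` module; `QueueClient` is a hypothetical stand-in for a real JMS/STOMP client such as stomp.py), a minimal batch reader for a queue might look like:

```python
# Sketch of a PySpark 4.0 custom data source for an ActiveMQ-style queue.
from pyspark.sql.datasource import DataSource, DataSourceReader

class ActiveMQDataSource(DataSource):
    @classmethod
    def name(cls):
        return "activemq"              # format name used in spark.read.format(...)

    def schema(self):
        return "message STRING, queue STRING"

    def reader(self, schema):
        return ActiveMQReader(self.options)

class ActiveMQReader(DataSourceReader):
    def __init__(self, options):
        self.queue = options.get("queue", "events")

    def read(self, partition):
        # Drain the queue, yielding rows as tuples matching the schema.
        client = QueueClient(self.queue)   # hypothetical client, not a real library
        for msg in client.poll():
            yield (msg.body, self.queue)

# Registration and use inside a Spark 4.0 / DBR 15.3+ session:
# spark.dataSource.register(ActiveMQDataSource)
# df = spark.read.format("activemq").option("queue", "orders").load()
```

A streaming variant would implement `streamReader` with a `DataSourceStreamReader` that tracks offsets, but the registration and `format("activemq")` usage stay the same.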

Disney's Foundational Medallion: A Journey Into Next-Generation Data Architecture

Step into the world of Disney Streaming as we unveil the creation of our Foundational Medallion, a cornerstone in our architecture that redefines how we manage data at scale. In this session, we'll explore how we tackled the multi-faceted challenges of building a consistent, self-service surrogate key architecture — a foundational dataset for every ingested stream powering Disney Streaming's data-driven decisions. Learn how we streamlined our architecture and unlocked new efficiencies by leveraging cutting-edge Databricks features such as liquid clustering, Photon with dynamic file pruning, Delta's identity column, Unity Catalog and more — transforming our implementation into a simpler, more scalable solution. Join us on this thrilling journey as we navigate the twists and turns of designing and implementing a new Medallion at scale — the very heartbeat of our streaming business!