talk-data.com

Topic: Databricks
Tags: big_data · analytics · spark
Tagged activities: 1286

Activity Trend: 515 peak/qtr (2020-Q1 to 2026-Q1)

Activities

1286 activities · Newest first

AI Powering Epsilon's Identity Strategy: Unified Marketing Platform on Databricks

Join us to hear how Epsilon Data Management migrated its unique, AI-powered marketing identity solution from multi-petabyte on-prem Hadoop and data warehouse systems to a unified Databricks Lakehouse platform. This transition enabled Epsilon to further scale its Decision Sciences solution and to add new cloud-based AI research capabilities on time and within budget, without being bottlenecked by the resource constraints of on-prem systems. Learn how Delta Lake, Unity Catalog, MLflow and LLM endpoints handled massive data volumes, reduced data duplication, improved lineage visibility, accelerated data science and AI, and made new data immediately available for consumption by the entire Epsilon platform in a privacy-safe way. Using the Databricks platform as the base for AI and data science at global internet scale, Epsilon deploys marketing solutions across multiple cloud providers and multiple regions for many customers.

As first-party data becomes increasingly valuable to organizations, Walmart Data Ventures is dedicated to bringing to life new applications of Walmart’s first-party data to better serve its customers. Through Scintilla, its integrated insights ecosystem, Walmart Data Ventures continues to expand its offerings to deliver insights and analytics that drive collaboration between merchants, suppliers, and operators. Scintilla users can now access Walmart data using Cloud Feeds, built on Databricks Delta Sharing technology. In the past, Walmart used API-based data sharing models, which required users to have technical skills and resources that weren’t always available. Now, with Cloud Feeds, Scintilla users can more easily access data without a dedicated technical team behind the scenes making it happen. Attendees will learn how Walmart built its data sharing architecture and take away strategies for designing scalable, collaborative data sharing architectures in their own organizations.
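
For readers unfamiliar with Delta Sharing, the open-source delta-sharing Python client gives a feel for how a recipient outside Databricks can consume a feed such as Cloud Feeds. This is a generic, hedged sketch rather than Walmart's implementation; the profile file, share, schema and table names are hypothetical.

import delta_sharing

# The recipient receives a credential "profile" file from the data provider.
profile_path = "/path/to/scintilla.share"  # hypothetical profile file

# List the tables the provider has shared with this recipient.
client = delta_sharing.SharingClient(profile_path)
for table in client.list_all_tables():
    print(table)

# Load one shared table into a pandas DataFrame.
# URL format: <profile-path>#<share>.<schema>.<table>
url = profile_path + "#retail_share.insights.store_sales"  # hypothetical names
df = delta_sharing.load_as_pandas(url)
print(df.head())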

Demystifying Upgrading to Unity Catalog — Challenges, Design and Execution

Databricks Unity Catalog (UC) is the industry’s only unified and open governance solution for data and AI, built into the Databricks Data Intelligence Platform. UC provides a single source of truth for an organization’s data and AI, with open connectivity to any data source and any format, lineage, monitoring, and support for open sharing and collaboration. In this session we will discuss the challenges of upgrading to UC from an existing non-UC Databricks setup. We will walk through a few customer use cases, how we overcame difficulties, and how we created a repeatable pattern and reusable assets to replicate the success of upgrading to UC across some of the largest Databricks customers. This session is co-presented with our partner Celebal Technologies.

Dusting off the Cobwebs — Moving off a 26-year-old Heritage Platform to Databricks [Teradata]

Join us to hear about how National Australia Bank (NAB) successfully completed a significant milestone in its data strategy by decommissioning its 26-year-old Teradata environment and migrating to a new strategic data platform called 'Ada'. This transition marks a pivotal shift from legacy systems to a modern, cloud-based data and AI platform powered by Databricks. The migration process, which spanned two years, involved ingesting 16 data sources, transferring 456 use cases, and collaborating with hundreds of users across 12 business units. This strategic move positions NAB to leverage the full potential of cloud-native data analytics, enabling more agile and data-driven decision-making across the organization. The successful migration to Ada represents a significant step forward in NAB's ongoing efforts to modernize its data infrastructure and capitalize on emerging technologies in the rapidly evolving financial services landscape.

Empowering Progress: Building a Personalized Training Goal Ecosystem with Databricks

Tonal is the ultimate strength training system, giving you the expertise of a personal trainer and a full gym in your home. Through user interviews and social media feedback, we identified a consistent challenge: members found it difficult to measure their progress in their fitness journey. To address this, we developed the Training Goal (TG) ecosystem, a four-part solution that introduced new preference options to capture users' fitness aspirations, implemented weekly metrics that accumulate as members complete workouts, defined personalized weekly targets to guide progress, and enhanced workout details to show how each session contributes toward individual goals. We present how we leveraged Spark, MLflow, and Workflows within the Databricks ecosystem to compute TG metrics, manage model development, and orchestrate data pipelines. These tools allowed us to launch the TG system on schedule, supporting scalability, reliability, and a more meaningful, personalized way for members to track their progress.
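
As a rough illustration of the pipeline pattern described above, the hedged sketch below computes weekly per-user metrics with PySpark and records the run with MLflow. The table and column names (workout_events, strength_volume and so on) are hypothetical, since the abstract does not describe Tonal's actual schema.

from pyspark.sql import SparkSession, functions as F
import mlflow

spark = SparkSession.builder.getOrCreate()

events = spark.table("workout_events")  # hypothetical source table

# Accumulate metrics per user per week as members complete workouts.
weekly = (
    events
    .withColumn("week", F.date_trunc("week", F.col("completed_at")))
    .groupBy("user_id", "week")
    .agg(
        F.sum("strength_volume").alias("weekly_volume"),
        F.countDistinct("workout_id").alias("workouts_completed"),
    )
)

# Track the metric-computation job as an MLflow run so pipeline runs and
# model development share one experiment history.
with mlflow.start_run(run_name="weekly_training_goal_metrics"):
    mlflow.log_metric("rows_written", weekly.count())
    weekly.write.mode("overwrite").saveAsTable("training_goal_weekly_metrics")

In practice a job like this would be scheduled with Databricks Workflows, as the abstract notes.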

From Days to Seconds — Reducing Query Times on Large Geospatial Datasets by 99%

The Global Water Security Center translates environmental science into actionable insights for the U.S. Department of Defense. Prior to incorporating Databricks, responding to these requests required querying approximately five hundred thousand raster files representing over five hundred billion points. By leveraging lakehouse architecture, Databricks Auto Loader, Spark Streaming, Databricks Spatial SQL, H3 geospatial indexing and Databricks Liquid Clustering, we were able to drastically reduce our “time to analysis” from multiple business days to a matter of seconds. Now, our data scientists execute queries on pre-computed tables in Databricks, resulting in a “time to analysis” that is 99% faster, giving our teams more time for deeper analysis of the data. Additionally, we’ve incorporated Databricks Workflows, Databricks Asset Bundles, Git and Git Actions to support CI/CD across workspaces. We completed this work in close partnership with Databricks.
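
As a hedged sketch of the ingestion-and-indexing pattern named above (Auto Loader, H3 indexing via Databricks Spatial SQL, and Liquid Clustering on a pre-computed table), the snippet below shows one way the pieces fit together. Paths, table names and the H3 resolution are illustrative assumptions, not the Global Water Security Center's actual configuration.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Incrementally ingest point data with Auto Loader (cloudFiles).
points = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "parquet")
    .load("/Volumes/gwsc/raw/raster_points")  # hypothetical path
)

# Databricks Spatial SQL exposes h3_longlatash3(lon, lat, resolution);
# tagging each point with its H3 cell lets later queries prune by cell.
indexed = points.withColumn(
    "h3_cell", F.expr("h3_longlatash3(longitude, latitude, 7)")
)

(indexed.writeStream
    .option("checkpointLocation", "/Volumes/gwsc/chk/raster_points")
    .trigger(availableNow=True)
    .toTable("gwsc.analytics.points_h3"))

# Liquid Clustering on the H3 cell keeps the pre-computed table organized
# for fast cell-level lookups.
spark.sql("ALTER TABLE gwsc.analytics.points_h3 CLUSTER BY (h3_cell)")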

GenAI Observability in Customer Care

Customer support is going through the GenAI revolution, but how can we use AI to foster deeper empathy with our end users? To enable this, Earnin has built its GenAI observability platform on Databricks, leveraging Lakeflow Declarative Pipelines, Kafka and Databricks AI/BI. This session covers how we use Lakeflow Declarative Pipelines to monitor our customer care chatbot in near real-time and how we leverage Databricks to better anticipate our customers' needs.
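
As a rough sketch of this pattern, the snippet below uses the Python interface for declarative pipelines to ingest chatbot events from Kafka and roll them up for near real-time monitoring. The broker address, topic and table names are hypothetical, and the code assumes it runs inside a pipeline where the dlt module and the spark handle are provided.

import dlt
from pyspark.sql import functions as F

@dlt.table(comment="Raw customer-care chatbot events from Kafka")
def chatbot_events_raw():
    return (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical broker
        .option("subscribe", "care-chatbot-events")        # hypothetical topic
        .load()
        .select(
            F.col("value").cast("string").alias("payload"),
            F.col("timestamp"),
        )
    )

@dlt.table(comment="Per-minute conversation volume for near real-time dashboards")
def chatbot_events_by_minute():
    return (
        dlt.read_stream("chatbot_events_raw")
        .groupBy(F.window("timestamp", "1 minute"))
        .count()
    )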

In this session, we will explore how Genie, an AI-driven platform, transformed HVAC operational insights by leveraging Databricks offerings such as Apache Spark, Delta Lake and the Databricks Data Intelligence Platform. Key contributions: Real-time data processing: Lakeflow Declarative Pipelines and Apache Spark™ for efficient data ingestion and real-time analysis. Workflow orchestration: the Databricks Data Intelligence Platform to orchestrate complex workflows and integrate various data sources and analytical tools. Field data integration: incorporating real-time field data into design and algorithm development, enabling engineers to make informed adjustments and optimize performance. By analyzing real-time data from HVAC installations, Genie identified discrepancies between design specs and field performance, allowing engineers to optimize algorithms, reduce inefficiencies and improve customer satisfaction. Discover how Genie revolutionized HVAC management and how you can apply the approach to your own projects.

Geo-Powering Insights: The Art of Spatial Data Integration and Visualization

In this presentation, we will explore how to leverage Databricks' SQL engine to efficiently ingest and transform geospatial data. We'll demonstrate the seamless process of connecting to external systems such as ArcGIS to retrieve datasets, showcasing the platform's versatility in handling diverse data sources. We'll then delve into the power of Databricks Apps, illustrating how you can create custom geospatial dashboards using various frameworks like Streamlit and Flask, or any framework of your choice. This flexibility allows you to tailor your visualizations to your specific needs and preferences. Furthermore, we'll highlight the Databricks Lakehouse's integration capabilities with popular dashboarding tools such as Tableau and Power BI. This integration enables you to combine the robust data processing power of Databricks with the advanced visualization features of these specialized tools.
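
To make the Databricks Apps idea concrete, here is a minimal hedged sketch of a Streamlit page that queries a SQL warehouse through the databricks-sql-connector and plots the result on a map. The environment variables and the geo.gold.facilities table are placeholders, not part of the original session.

import os
import pandas as pd
import streamlit as st
from databricks import sql

st.title("Facility locations")

with sql.connect(
    server_hostname=os.environ["DATABRICKS_HOST"],   # hypothetical config
    http_path=os.environ["DATABRICKS_HTTP_PATH"],
    access_token=os.environ["DATABRICKS_TOKEN"],
) as conn:
    with conn.cursor() as cur:
        cur.execute(
            "SELECT latitude AS lat, longitude AS lon FROM geo.gold.facilities"
        )
        rows = cur.fetchall()

# st.map expects latitude/longitude columns named lat/lon.
df = pd.DataFrame(rows, columns=["lat", "lon"])
st.map(df)

The same query could just as easily feed a Flask view or a Tableau/Power BI connection, which is the flexibility the session highlights.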

High-Throughput ML: Mastering Efficient Model Serving at Enterprise Scale

Ever wondered how industry leaders handle thousands of ML predictions per second? This session reveals the architecture behind high-performance model serving systems on Databricks. We'll explore how to build inference pipelines that efficiently scale to handle massive request volumes while maintaining low latency. You'll learn how to leverage Feature Store for consistent, low-latency feature lookups and implement auto-scaling strategies that optimize both performance and cost. Key takeaways: determining optimal compute capacity using the QPS × model execution time formula; configuring Feature Store for high-throughput, low-latency feature retrieval; managing cold starts and scaling strategies for latency-sensitive applications; and implementing monitoring systems that provide visibility into inference performance. Whether you're serving recommender systems or real-time fraud detection models, you'll gain practical strategies for building enterprise-grade ML serving systems.
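
The capacity formula mentioned in the takeaways is simple arithmetic; the sketch below shows one hedged way to apply it, with illustrative numbers that are not from the session.

import math

def required_concurrency(qps: float, exec_time_s: float, headroom: float = 0.2) -> int:
    # Little's-law style estimate: requests in flight = arrival rate x latency,
    # padded with headroom for bursts.
    return math.ceil(qps * exec_time_s * (1 + headroom))

# Example: 2,000 predictions per second at 15 ms model execution time.
print(required_concurrency(qps=2000, exec_time_s=0.015))  # -> 36 concurrent requests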

How HP Is Optimizing the 3D Printing Supply Chain Using Delta Sharing

HP’s 3D Print division empowers manufacturers with telemetry data to optimize operations and streamline maintenance. Using Delta Sharing, Unity Catalog and AI/BI dashboards, HP provides a secure, scalable solution for data sharing and analytics. Delta Sharing D2O enables seamless data access, even for customers not on Databricks. Apigee masks private URLs, and Unity Catalog enhances security by managing data assets. Predictive maintenance with Mosaic AI boosts uptime by identifying issues early and alerting support teams. Custom dashboards and sample code let customers run analytics using any supported client, while Apigee simplifies access by abstracting complexity. Insights from AI/BI dashboards help HP refine its data strategy, aligning solutions with customer needs despite the complexity of diverse technologies, fragmented systems and customer-specific requirements. This fosters trust, drives innovation, and strengthens HP as a trusted partner for scalable, secure data solutions.

IQVIA's Analytics for Patient Support Services: Transforming Scalability, Performance and Governance

This presentation will explore the transformation of IQVIA's decade-old patient support platform through the implementation of the Databricks Data Intelligence Platform. Facing scalability challenges, performance bottlenecks and rising costs, the existing platform required significant redesign to handle growing data volumes and complex analytics. Key issues included static metrics limiting workflow optimization, fragmented data governance, and heightened compliance and security demands. By partnering with Customertimes (a Databricks Partner) and adopting Databricks' centralized, scalable analytics solution with enhanced self-service capabilities, IQVIA achieved improved query performance, cost efficiency and robust governance, ensuring operational effectiveness and regulatory compliance in an increasingly complex environment.

Managing Data and AI Security Risks With DASF 2.0 — and a Customer Story

The Databricks Security team led a broad working group that evolved the Databricks AI Security Framework (DASF) from its first release to version 2.0, collaborating closely with top cybersecurity researchers at industry organizations such as OWASP, Gartner, NIST, HITRUST and the FAIR Institute, as well as several Fortune 100 companies, to address the evolving risks of AI systems in enterprises and the controls that mitigate them. Join us to learn how the CLEVER GenAI pipeline, an AI-driven innovation in healthcare, processes over 1.5 million clinical notes daily to classify social determinants impacting veteran care while adhering to robust security measures such as NIST 800-53 controls and leveraging the Databricks AI Security Framework. We will discuss AI security guidelines to help data and AI teams understand how to deploy their AI applications securely. This session will provide a security framework for security teams, AI practitioners, data engineers and governance teams.

Real-Time Market Insights — Powering Optiver’s Live Trading Dashboard with Databricks Apps and Dash

In the fast-paced world of trading, real-time insights are critical for making informed decisions. This presentation explores how Optiver, a leading high-frequency trading firm, harnesses Databricks Apps to power its live trading dashboards. The technology enables traders to analyze market data, detect patterns and respond instantly. In this talk, we will showcase how our system leverages Databricks’ scalable infrastructure, such as Structured Streaming, to efficiently handle vast streams of financial data while ensuring low-latency performance. In addition, we will show how the integration of Databricks Apps with Dash has empowered traders to rapidly develop and deploy custom dashboards, minimizing dependency on developers. Attendees will gain insights into our architecture, data processing techniques and lessons learned in integrating Databricks Apps with Dash to drive rapid, data-driven trading decisions.
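
As an illustration of the Dash side of this integration, the hedged sketch below wires an interval-refreshed chart to a data-fetch helper. The fetch_latest_prices function is a hypothetical stand-in for a query against a table kept fresh by Structured Streaming; none of this is Optiver's actual code.

import pandas as pd
import plotly.express as px
from dash import Dash, dcc, html, Input, Output

def fetch_latest_prices() -> pd.DataFrame:
    # Hypothetical: in practice this would query a Databricks table that
    # Structured Streaming keeps up to date.
    return pd.DataFrame({"instrument": ["A", "B"], "mid_price": [101.2, 99.7]})

app = Dash(__name__)
app.layout = html.Div([
    dcc.Graph(id="prices"),
    dcc.Interval(id="tick", interval=1_000),  # refresh every second
])

@app.callback(Output("prices", "figure"), Input("tick", "n_intervals"))
def update(_):
    return px.bar(fetch_latest_prices(), x="instrument", y="mid_price")

if __name__ == "__main__":
    app.run(debug=True)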

ServiceNow ‘Walks the Talk’ With Databricks: Revolutionizing Go-To-Market With AI

At ServiceNow, we’re not just talking about AI innovation — we’re delivering it. By harnessing the power of Databricks, we’re reimagining Go-To-Market (GTM) strategies, seamlessly integrating AI at every stage of the deal journey — from identifying high-value leads to generating hyper-personalized outreach and pitch materials. In this session, learn how we’ve slashed data processing times by over 90%, reducing workflows from an entire day to just 30 minutes with Databricks. This unprecedented speed enables us to deploy AI-driven GTM initiatives faster, empowering our sellers with real-time insights that accelerate deal velocity and drive business growth. As Agentic AI becomes a game-changer in enterprise GTM, ServiceNow and Databricks are leading the charge — paving the way for a smarter, more efficient future in AI-powered sales.

Sponsored by: Deloitte | Advancing AI in Cybersecurity with Databricks & Deloitte: Data Management & Analytics

Deloitte is observing a growing trend among cybersecurity organizations to develop big data management and analytics solutions beyond traditional Security Information and Event Management (SIEM) systems. Leveraging Databricks to extend these SIEM capabilities, Deloitte can help clients lower the cost of cyber data management while enabling scalable, cloud-native architectures. Deloitte helps clients design and implement cybersecurity data meshes, using Databricks as a foundational data lake platform to unify and govern security data at scale. Additionally, Deloitte extends clients’ cybersecurity capabilities by integrating advanced AI and machine learning solutions on Databricks, driving more proactive and automated cybersecurity solutions. Attendees will gain insight into how Deloitte is utilizing Databricks to manage enterprise cyber risks and deliver performant and innovative analytics and AI insights that traditional security tools and data platforms aren’t able to deliver.

SQL-First ETL: Building Easy, Efficient Data Pipelines With Lakeflow Declarative Pipelines

This session explores how SQL-based ETL can accelerate development, simplify maintenance and make data transformation more accessible to both engineers and analysts. We'll walk through how Databricks Lakeflow Declarative Pipelines and Databricks SQL warehouses support building production-grade pipelines using familiar SQL constructs. Topics include: using streaming tables for real-time ingestion and processing; leveraging materialized views to deliver fast, pre-computed datasets; and integrating with tools like dbt to manage batch and streaming workflows at scale. By the end of the session, you’ll understand how SQL-first approaches can streamline ETL development and support both operational and analytical use cases.
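
To ground the constructs named above, here is a hedged sketch that expresses a streaming table and a materialized view as SQL, wrapped in spark.sql calls for consistency with the other examples in this listing. The source path and table names are hypothetical, and in a real workspace these statements are typically issued from Databricks SQL or from within a declarative pipeline.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Streaming table: incremental ingestion from files with plain SQL.
spark.sql("""
CREATE OR REFRESH STREAMING TABLE orders_bronze
AS SELECT * FROM STREAM read_files('/Volumes/sales/raw/orders', format => 'json')
""")

# Materialized view: a fast, pre-computed dataset over the streaming table.
spark.sql("""
CREATE OR REPLACE MATERIALIZED VIEW daily_revenue
AS SELECT order_date, SUM(amount) AS revenue
FROM orders_bronze
GROUP BY order_date
""")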

Unifying Data Delivery: Using Databricks as Your Enterprise Serving Layer

This session will take you on our journey of integrating Databricks as the core serving layer in a large enterprise, demonstrating how you can build a unified data platform that meets diverse business needs. We will walk through the steps for constructing a central serving layer by leveraging Databricks’ SQL Warehouse to efficiently deliver data to analytics tools and downstream applications. To tackle low latency requirements, we’ll show you how to incorporate an interim scalable relational database layer that delivers sub-second performance for hot data scenarios. Additionally, we’ll explore how Delta Sharing enables secure and cost-effective data distribution beyond your organization, eliminating silos and unnecessary duplication for a truly end-to-end centralized solution. This session is perfect for data architects, engineers and decision-makers looking to unlock the full potential of Databricks as a centralized serving hub.
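
One hedged way to picture the serving pattern described above is a two-tier read path: try a low-latency relational cache for hot data and fall back to the Databricks SQL warehouse for everything else. The cache technology (Postgres here), environment variables and table names are illustrative assumptions, not the presenters' stack.

import os
import psycopg2
from databricks import sql as dbsql

def read_order(order_id: int):
    # Hot path: recent rows are mirrored into a relational cache for
    # sub-second lookups.
    with psycopg2.connect(os.environ["CACHE_DSN"]) as conn:
        with conn.cursor() as cur:
            cur.execute("SELECT * FROM hot_orders WHERE order_id = %s", (order_id,))
            row = cur.fetchone()
            if row is not None:
                return row

    # Cold path: everything else is served straight from the lakehouse via
    # a SQL warehouse.
    with dbsql.connect(
        server_hostname=os.environ["DATABRICKS_HOST"],
        http_path=os.environ["DATABRICKS_HTTP_PATH"],
        access_token=os.environ["DATABRICKS_TOKEN"],
    ) as conn:
        with conn.cursor() as cur:
            cur.execute(
                "SELECT * FROM serving.gold.orders WHERE order_id = :oid",
                {"oid": order_id},
            )
            return cur.fetchone()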

Red Stapler is a streaming-native system on Databricks that merges file-based ingestion and real-time user edits into a single Lakeflow Declarative Pipelines pipeline for near real-time feedback. Protobuf definitions, managed in the Buf Schema Registry (BSR), govern schema and data-quality rules, ensuring backward compatibility. All records — valid or not — are stored in an SCD Type 2 table, capturing every version for full history and immediate quarantine views of invalid data. This unified approach boosts data governance, simplifies auditing and streamlines error fixes. Running on Lakeflow Declarative Pipelines Serverless and the Kafka-compatible Bufstream keeps costs low by scaling down to zero when idle. Red Stapler’s configuration-driven Protobuf logic adapts easily to evolving survey definitions without risking production. The result is consistent validation, quick updates and a complete audit trail — all critical for trustworthy, flexible data pipelines.
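
As a hedged sketch of the SCD Type 2 write pattern described above, the snippet below uses the Delta Lake Python API to close out the current version of changed records and append every incoming record as a new version. Table and column names (record_id, effective_from, is_current and so on) are hypothetical.

from delta.tables import DeltaTable
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

updates = spark.table("survey_updates_batch")       # hypothetical incoming batch
target = DeltaTable.forName(spark, "surveys_scd2")  # hypothetical SCD2 table

# Step 1: close out the currently-active version of any record that changed.
(target.alias("t")
 .merge(updates.alias("u"), "t.record_id = u.record_id AND t.is_current = true")
 .whenMatchedUpdate(set={
     "is_current": F.lit(False),
     "effective_to": F.current_timestamp(),
 })
 .execute())

# Step 2: append every incoming record as the new current version, so even
# invalid rows are retained and can surface in a quarantine view.
(updates
 .withColumn("effective_from", F.current_timestamp())
 .withColumn("effective_to", F.lit(None).cast("timestamp"))
 .withColumn("is_current", F.lit(True))
 .write.format("delta").mode("append").saveAsTable("surveys_scd2"))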

Unity Catalog Upgrades Made Easy: Step-by-Step Guide for Databricks Labs UCX

The Databricks Labs project UCX aims to optimize the Unity Catalog (UC) upgrade process, ensuring a seamless transition for businesses. This session will delve into various aspects of the UCX project, including the installation and configuration of UCX, the use of the UCX Assessment Dashboard to reduce upgrade risks and prepare effectively for a UC upgrade, and the automation of key components such as group, table and code migration. Attendees will gain comprehensive insights into leveraging UCX and Lakehouse Federation for a streamlined and efficient upgrade process. This session is aimed at customers new to UCX as well as veterans.