
Topic: Databricks

Tags: big_data, analytics, spark

509 tagged activities

Activity Trend: peak 515/qtr (2020-Q1 to 2026-Q1)

Activities

Filtering by: Data + AI Summit 2025
Iceberg Table Format Adoption and Unified Metadata Catalog Implementation in Lakehouse Platform

The DoorDash Data organization is actively adopting the lakehouse paradigm. This presentation describes a methodology for migrating classic data warehouse and data lake platforms to a unified lakehouse solution. The objectives of this effort include:
- Elimination of excessive data movement
- Seamless integration and consolidation of the query engine layers, including Snowflake, Databricks, EMR and Trino
- Query performance optimization
- Abstracting away the complexity of underlying storage layers and table formats
- A strategic and justified decision on the unified metadata catalog used across various compute platforms
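
To make the "unified metadata catalog" idea concrete, here is a minimal, hedged sketch of pointing Spark at a shared Iceberg REST catalog; the catalog name, endpoint URI and table are hypothetical, not DoorDash's actual setup. Any engine configured against the same catalog resolves the same table metadata, which is what eliminates per-engine copies.

from pyspark.sql import SparkSession

# Register a shared Iceberg REST catalog: table metadata lives behind the
# catalog service, not inside any single engine.
spark = (
    SparkSession.builder
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.unified", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.unified.type", "rest")
    .config("spark.sql.catalog.unified.uri", "https://catalog.internal/api")  # hypothetical
    .getOrCreate()
)

# Queries resolve through the shared catalog, so Trino, Snowflake or EMR
# pointed at the same endpoint would see the same table.
spark.sql("SELECT COUNT(*) FROM unified.deliveries.orders").show()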

Optimizing Analytics Infrastructure: Lessons from Migrating Snowflake to Databricks

This session explores the strategic migration from Snowflake to Databricks, focusing on the journey of transforming a data lake to leverage Databricks’ advanced capabilities. It outlines the assessment of key architectural differences, performance benchmarks, and cost implications driving the decision. Attendees will gain insights into planning and execution, including data ingestion pipelines, schema conversion and metadata migration. Challenges such as maintaining data quality, optimizing compute resources and minimizing downtime are discussed, alongside solutions implemented to ensure a seamless transition. The session highlights the benefits of unified analytics and enhanced scalability achieved through Databricks, delivering actionable takeaways for similar migrations.

Scaling Identity Graph Ingestion to 1M Events/Sec with Spark Streaming & Delta Lake

Adobe’s Real-Time Customer Data Platform relies on the identity graph to connect over 70 billion identities and deliver personalized experiences. This session will showcase how the platform leverages Databricks, Spark Streaming and Delta Lake, along with 25+ Databricks deployments across multiple regions and clouds — Azure & AWS — to process terabytes of data daily and handle over a million records per second. The talk will highlight the platform’s ability to scale, demonstrating a 10x increase in ingestion pipeline capacity to accommodate peak traffic during events like the Super Bowl. Attendees will learn about the technical strategies employed, including migrating from Flink to Spark Streaming, optimizing data deduplication, and implementing robust monitoring and anomaly detection. Discover how these optimizations enable Adobe to deliver real-time identity resolution at scale while ensuring compliance and privacy.
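
The abstract calls out deduplication in the Spark Streaming path; below is a minimal sketch of that pattern, bounding state with a watermark before dropping duplicates and writing to Delta. The broker, topic, columns and paths are illustrative assumptions, not Adobe's pipeline.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Ingest identity events from Kafka (hypothetical broker and topic).
events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "identity-events")
    .load()
)

# Bound state with a watermark, then drop duplicate event keys within it;
# unbounded dedup state is what usually breaks at 1M events/sec.
deduped = (
    events
    .withWatermark("timestamp", "10 minutes")
    .dropDuplicates(["key"])
)

(deduped.writeStream
    .format("delta")
    .option("checkpointLocation", "/checkpoints/identity")  # hypothetical path
    .toTable("identity_graph.events"))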

Semiconductor AI Success: Marvell’s Data + AI Governance

Marvell’s AI-driven solutions, powered by Databricks’ Data Intelligence Platform, provide a robust framework for secure, compliant and transparent Data and AI workflows leveraging Data & AI Governance through Unity Catalog. Marvell ensures centralized management of data and AI assets with quality, security, lineage and governance guardrails. With Databricks Unity Catalog, Marvell achieves comprehensive oversight of structured and unstructured data, AI models and notebooks. Automated governance policies, fine-grained access controls and lineage tracking help enforce regulatory compliance while streamlining AI development. This governance framework enhances trust and reliability in AI-powered decision-making, enabling Marvell to scale AI innovation efficiently while minimizing risks. By integrating data security, auditability and compliance standards, Marvell is driving the future of responsible AI adoption with Databricks.
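
As a rough sketch of the Unity Catalog guardrails described above (a group-level grant plus a fine-grained column mask), with hypothetical catalog, table and group names; spark is the session predefined in Databricks notebooks.

# Coarse guardrail: group-level access to a governed table.
spark.sql("GRANT SELECT ON TABLE semis.yield.wafer_tests TO `design-analysts`")

# Fine-grained guardrail: mask lot IDs for everyone outside process engineering.
spark.sql("""
    CREATE OR REPLACE FUNCTION semis.yield.mask_lot(lot STRING)
    RETURNS STRING
    RETURN CASE WHEN is_account_group_member('process-engineers')
                THEN lot ELSE '***' END
""")
spark.sql("ALTER TABLE semis.yield.wafer_tests "
          "ALTER COLUMN lot_id SET MASK semis.yield.mask_lot")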

Supercharge Your Enterprise BI: A Practitioner’s Guide for Migrating to AI/BI

Are you striving to build a data-driven culture while managing costs and reducing reporting latency? Are your BI operations bogged down by complex data movements rather than delivering insights? Databricks IT faced these challenges in 2024 and embarked on an ambitious journey to make Databricks AI/BI our enterprise-wide reporting platform. In just two quarters, we migrated 2,000 dashboards from a traditional BI tool — without disrupting business operations. We’ll share how we executed this large-scale transition cost-effectively, ensuring seamless change management and empowering non-technical users to leverage AI/BI. You’ll gain insights into:
- Key migration strategies that minimized disruption and optimized efficiency
- Best practices for user adoption and training to drive self-service analytics
- Measuring success with clear adoption metrics and business impact

Join us to learn how your organization can achieve the same transformation with AI-powered enterprise reporting.

Transforming Customer Processes and Gaining Productivity With Lakeflow Declarative Pipelines

Bradesco Bank is one of the largest private banks in Latin America, with over 75 million customers and over 80 years of presence in FSI. In the digital business, the velocity with which you react to customer interactions is crucial to success. In the legacy landscape, acquiring data points on interactions across digital and marketing channels was complex, costly and lacking in integrity due to the typical fragmentation of tools. With the new in-house Customer Data Platform powered by the Databricks Data Intelligence Platform, it was possible to completely transform the data strategy around customer data. Using key components such as UniForm and Lakeflow Declarative Pipelines, it was possible to increase data integrity, reduce latency and processing time and, most importantly, boost personal productivity and business agility. Months of reprocessing, weeks of human labor, and cumbersome, complex data integrations were dramatically simplified, achieving significant operational efficiency.
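
For readers new to Lakeflow Declarative Pipelines, a minimal sketch of the declarative style follows; the source path, table names and expectation are invented for illustration, not Bradesco's pipeline. You declare tables and integrity expectations, and the platform manages orchestration and incremental processing.

import dlt
from pyspark.sql.functions import col

@dlt.table(comment="Raw digital-channel interactions, ingested incrementally")
def raw_interactions():
    return (
        spark.readStream.format("cloudFiles")   # Auto Loader; spark is predefined
        .option("cloudFiles.format", "json")
        .load("/landing/interactions")          # hypothetical landing path
    )

@dlt.table(comment="Validated interactions; integrity enforced declaratively")
@dlt.expect_or_drop("has_customer", "customer_id IS NOT NULL")
def clean_interactions():
    return dlt.read_stream("raw_interactions").where(col("channel").isNotNull())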

Unifying GTM Analytics: The Strategic Shift to Native Analytics and AI/BI Dashboards at Databricks

The GTM team at Databricks recently launched the GTM Analytics Hub—a native AI/BI platform designed to centralize reporting, streamline insights, and deliver personalized dashboards based on user roles and business needs. Databricks Apps also played a crucial role in this integration by embedding AI/BI Dashboards directly into internal tools and applications, streamlining access to insights without disrupting workflows. This seamless embedding capability allows users to interact with dashboards within their existing platforms, enhancing productivity and collaboration. Furthermore, AI/BI Dashboards leverage Databricks' unified data and governance framework. Join us to learn how we’re using Databricks to build for Databricks—transforming GTM analytics with AI/BI Dashboards, and what it takes to drive scalable, user-centric analytics adoption across the business.

Unlock the Potential of Your Enterprise Data With Zero-Copy Data Sharing, featuring SAP and Salesforce

Tired of data silos and the constant need to move copies of your data across different systems? Imagine a world where all your enterprise data is readily available in Databricks without the cost and complexity of duplication and ingestion. Our vision is to break down these silos by enabling seamless, zero-copy data sharing across platforms, clouds, and regions. This unlocks the true potential of your data for analytics and AI, empowering you to make faster, more informed decisions leveraging your most important enterprise data sets. In this session you will hear from Databricks, SAP, and Salesforce product leaders on how zero-copy data sharing can unlock the value of enterprise data. Explore how Delta Sharing makes this vision a reality, providing secure, zero-copy data access for enterprises:
- SAP Business Data Cloud: see Delta Sharing in action to unlock operational reporting, supply chain optimization, and financial planning.
- Salesforce Data Cloud: enable customer analytics, churn prediction, and personalized marketing.
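
On the consumer side, the open source delta-sharing Python client shows how little is involved; the profile path and share/schema/table names below are hypothetical.

import delta_sharing

# Profile file issued by the data provider; it carries the sharing server URL
# and an access token.
profile = "/path/to/provider.share"

client = delta_sharing.SharingClient(profile)
print(client.list_all_tables())   # discover shared tables

# Read a shared table directly into pandas; nothing is copied into your platform.
table_url = profile + "#sap_share.finance.planning_data"
df = delta_sharing.load_as_pandas(table_url)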

Master Schema Translations in the Era of Open Data Lake

Unity Catalog puts a variety of schemas into a centralized repository; now the developer community wants more productivity and automation for schema inference, translation, evolution and optimization, especially in ingestion and reverse-ETL scenarios with more code generation. The Coinbase Data Platform attempts to pave a path with "Schemaster," which interacts with the data catalog through a (proposed) metadata model to make schema translation and evolution more manageable across popular systems such as Delta, Iceberg, Snowflake, Kafka, MongoDB, DynamoDB, Postgres... This Lightning Talk covers four areas:
- The complexity and caveats of schema differences among these systems
- The proposed field-level metadata model, and two translation patterns: point-to-point vs. hub-and-spoke
- Why data profiling should be augmented to enhance schema understanding and translation
- Integrating it with ingestion and reverse-ETL in a Databricks-oriented ecosystem

Takeaway: standardize schema lineage and translation.
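
To illustrate the two translation patterns named above, here is a small, hypothetical sketch (none of these names are Schemaster's actual API): each engine maps into a neutral field-level model (the hub) and out again, so N systems need 2N mappings instead of the N*(N-1) a point-to-point mesh requires.

from dataclasses import dataclass

@dataclass
class Field:
    name: str
    logical_type: str        # engine-neutral type: "string", "int64", "timestamp"
    nullable: bool = True

# Spokes: per-engine type maps into and out of the hub model.
DELTA_TO_HUB = {"STRING": "string", "BIGINT": "int64", "TIMESTAMP": "timestamp"}
HUB_TO_POSTGRES = {"string": "text", "int64": "bigint", "timestamp": "timestamptz"}

def delta_to_hub(name, delta_type, nullable=True):
    return Field(name, DELTA_TO_HUB[delta_type], nullable)

def hub_to_postgres_ddl(table, fields):
    cols = ", ".join(
        f"{f.name} {HUB_TO_POSTGRES[f.logical_type]}"
        + ("" if f.nullable else " NOT NULL")
        for f in fields
    )
    return f"CREATE TABLE {table} ({cols})"

fields = [delta_to_hub("user_id", "BIGINT", nullable=False),
          delta_to_hub("created_at", "TIMESTAMP")]
print(hub_to_postgres_ddl("users", fields))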

Multi-Statement Transactions: How to Improve Data Consistency and Performance

Multi-statement transactions bring the atomicity and reliability of traditional databases to modern data warehousing on the lakehouse. In this session, we’ll explore real-world patterns enabled by multi-statement transactions — including multi-table updates, deduplication pipelines and audit logging — and show how Databricks ensures atomicity and consistency across complex workflows. We’ll also dive into demos and share tips for getting started and migrating with this feature in Databricks SQL.
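
As a hedged sketch of the pattern, assuming the BEGIN TRANSACTION / COMMIT syntax of the preview and using the databricks-sql-connector; hostnames, paths and table names are placeholders.

from databricks import sql  # pip install databricks-sql-connector

with sql.connect(server_hostname="<workspace-host>",
                 http_path="<warehouse-http-path>",
                 access_token="<token>") as conn:
    with conn.cursor() as cur:
        cur.execute("BEGIN TRANSACTION")
        try:
            # Both updates and the audit row commit atomically, or not at all.
            cur.execute("UPDATE accounts SET balance = balance - 100 WHERE id = 1")
            cur.execute("UPDATE accounts SET balance = balance + 100 WHERE id = 2")
            cur.execute("INSERT INTO audit_log SELECT 1, 2, 100, current_timestamp()")
            cur.execute("COMMIT")
        except Exception:
            cur.execute("ROLLBACK")
            raise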

Sponsored by: Dataiku | Engineering Trustworthy AI Agents with LLM Mesh + Mosaic AI

AI agent systems hold immense promise for automating complex tasks and driving intelligent decision‑making, but only when they are engineered to be both resilient and transparent. In this session we will explore how Dataiku’s LLM Mesh pairs with Databricks Mosaic AI to streamline the entire lifecycle: ingesting and preparing data in the Lakehouse, prompt engineering LLMs hosted on Mosaic AI Model Serving Endpoints, visually orchestrating multi‑step chains, and monitoring them in real time. We’ll walk through a live demo of a Dataiku flow that connects to a Databricks hosted model, adds automated validation, lineage, and human‑in‑the‑loop review, then exposes the agent via Dataiku's Agent Connect interface. You’ll leave with actionable patterns for setting guardrails, logging decisions, and surfacing explanations—so your organization can deploy trustworthy domain‑specific agents faster and more safely.
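
For context, the raw call a flow like this ultimately wraps is a POST to a Mosaic AI Model Serving endpoint; the workspace URL, endpoint name and token below are placeholders.

import requests

token = "<databricks-token>"   # placeholder credential
url = ("https://<workspace>.cloud.databricks.com"
       "/serving-endpoints/claims-agent/invocations")   # hypothetical endpoint

resp = requests.post(
    url,
    headers={"Authorization": f"Bearer {token}"},
    json={"messages": [{"role": "user", "content": "Summarize this claim file."}]},
    timeout=60,
)
resp.raise_for_status()
print(resp.json())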

Sponsored by: Hightouch | Earning $66 Million with AI: How XP Overhauled Customer Acquisition with Databricks and Hightouch

XP is one of the largest financial institutions in Brazil, and they didn’t reach that scale by moving slowly. After getting inspired at Data + AI Summit in 2024, they sprang into action to overhaul their advertising strategy using their first-party data in Databricks. Just a year later, they’ve achieved remarkable success: they’ve unlocked $66 million in incremental revenue from advertising, with the same budget as before. In this session, XP will share the tactical steps they took to bring a first-party data and AI strategy to customer acquisition, including how they built predictive models for customer quality and connected Databricks to their ecosystem through Hightouch’s Composable CDP. If you’re supporting an advertising team or looking for real strategies to take home from this conference that can transform your business, this session is for you.

Streamlining AI Application Development With Databricks Apps

Think Databricks is just for data and models? Think again. In this session, you’ll see how to build and scale a full-stack AI app capable of handling thousands of queries per second entirely on Databricks. No extra cloud platforms, no patchwork infrastructure. Just one unified platform with native hosting, LLM integration, secure access, and built-in CI/CD. Learn how Databricks Apps, along with services like Model Serving, Jobs, and Gateways, streamline your architecture, eliminate boilerplate, and accelerate development, from prototype to production.
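
As a rough sketch of how little scaffolding a Databricks App needs (a Streamlit flavor using the databricks-sdk, which is assumed to pick up credentials from the Apps runtime; the job ID is hypothetical):

import streamlit as st
from databricks.sdk import WorkspaceClient

# Inside a Databricks App, the SDK authenticates from the runtime environment,
# so there is no secrets plumbing or separate hosting infrastructure.
w = WorkspaceClient()

st.title("Pipeline control panel")
if st.button("Refresh features"):
    w.jobs.run_now(job_id=123)   # trigger a Databricks Job (hypothetical ID)
    st.success("Job run triggered")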

Unlock Agentic AI for Insurance With Deloitte & Databricks

In an era where insights-driven decision-making is paramount, the insurance industry stands at the cusp of a major technological revolution. This session will delve into how Agentic AI — AI agents that act autonomously to achieve critical goals — can be leveraged to transform insurance operations (underwriting, claims, services), enhance customer experiences and drive strategic growth.

SAP and Databricks: Building Your Lakehouse Reference Architecture

SAP is the world's 3rd-largest publicly traded software company by revenue, and recently launched the joint SAP Databricks "Business Data Cloud". See how it all works from a practitioner's perspective, including reference architecture, a demo, and example customers. See firsthand how the powerful suite of SAP applications benefits from a joint Databricks solution, with data being more easily governed, discovered, shared, and used for AI/ML.

Sponsored by: Prophecy | Ready for GenAI? Survey Says Governed Self-Service Is the New Playbook for Data Teams

Are data teams ready for AI? Prophecy’s exclusive survey, “The Impact of GenAI on Data Teams”, gives the clearest picture yet of GenAI’s potential in data management, and of what’s standing in the way. The top two obstacles? Poor governance and slow access to high-quality data. The message is clear: modernizing your data platform with Databricks is essential, but it’s only the beginning. To unlock the power of AI and analytics, organizations must deliver governed, self-service access to clean, trusted data. Traditional data prep tools introduce risks around security, quality, and cost. It’s no wonder data leaders cited data transformation as the area where GenAI will make the biggest impact. To deliver what’s needed, teams must shift to governed self-service, where data analysts and scientists move fast while staying within IT’s guardrails. Join us to learn more details from the survey and how leading organizations are ahead of the curve, using GenAI to reshape how data gets done.

Join us to see how the powerful combination of ThoughtSpot's agentic analytics platform and the Databricks Data Intelligence Platform is changing the game for data-driven organizations. We'll demonstrate how DataSpot breaks down technical barriers to insight. You'll learn how to get trusted, real-time answers thanks to the seamless integration between ThoughtSpot's semantic layer and Databricks Unity Catalog. This session is for anyone looking to leverage data more effectively, whether you're a business leader seeking AI-driven insights, a data scientist building models in Python, or a product owner creating intelligent applications.