talk-data.com

Topic: Data Lakehouse

Tags: data_architecture, data_warehouse, data_lake (489 tagged activities)

Activity Trend: peak of 118 activities per quarter, 2020-Q1 through 2026-Q1

Activities: 489 activities · Newest first

This course introduces learners to deploying, operationalizing, and monitoring generative artificial intelligence (AI) applications. First, learners will develop knowledge and skills in deploying generative AI applications using tools like Model Serving. Next, the course will discuss operationalizing generative AI applications following modern LLMOps best practices and recommended architectures. Finally, learners will be introduced to the idea of monitoring generative AI applications and their components using Lakehouse Monitoring.
Pre-requisites: Familiarity with prompt engineering and retrieval-augmented generation (RAG) techniques, including data preparation, embeddings, vectors, and vector databases. A foundational knowledge of Databricks Data Intelligence Platform tools for evaluation and governance (particularly Unity Catalog).
Labs: Yes
Certification Path: Databricks Certified Generative AI Engineer Associate
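For context, the deployment piece this course describes comes down to exposing a model behind a Model Serving endpoint and calling it over REST. The sketch below is illustrative only: the workspace URL, endpoint name, and token are placeholders, and the payload shape assumes a chat-style (OpenAI-compatible) endpoint.

```python
# Minimal sketch: querying a Databricks Model Serving endpoint over REST.
# Workspace URL, endpoint name, and token are placeholders; the payload
# assumes a chat-style (OpenAI-compatible) endpoint.
import requests

WORKSPACE_URL = "https://<your-workspace>.cloud.databricks.com"  # placeholder
ENDPOINT_NAME = "my-rag-endpoint"                                # hypothetical name
TOKEN = "<personal-access-token>"                                # placeholder

response = requests.post(
    f"{WORKSPACE_URL}/serving-endpoints/{ENDPOINT_NAME}/invocations",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"messages": [{"role": "user", "content": "Summarize our returns policy."}]},
    timeout=60,
)
response.raise_for_status()
print(response.json())
```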

This course will guide participants through a comprehensive exploration of machine learning model operations, focusing on MLOps and model lifecycle management. The initial segment covers essential MLOps components and best practices, giving participants a strong foundation for effectively operationalizing machine learning models. In the latter part of the course, we will delve into the basics of the model lifecycle, demonstrating how to navigate it seamlessly using the Model Registry in conjunction with Unity Catalog for efficient model management. By the course's conclusion, participants will have gained practical insights and a well-rounded understanding of MLOps principles, along with the skills needed to navigate the intricate landscape of machine learning model operations.
Pre-requisites: Familiarity with the Databricks workspace and notebooks; familiarity with Delta Lake and the lakehouse; intermediate-level knowledge of Python; and an understanding of basic MLOps concepts and practices, including infrastructure and the importance of monitoring MLOps solutions.
Labs: Yes
Certification Path: Databricks Certified Machine Learning Associate
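As a rough illustration of the Model Registry plus Unity Catalog workflow the course covers, the sketch below registers a scikit-learn model under a three-level Unity Catalog name; the catalog, schema, and model names are invented for the example.

```python
# Minimal sketch: registering a model in the Model Registry backed by Unity Catalog.
# The catalog/schema/model names are illustrative, not from the course.
import mlflow
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

mlflow.set_registry_uri("databricks-uc")  # use Unity Catalog as the model registry

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=200).fit(X, y)

with mlflow.start_run():
    mlflow.sklearn.log_model(
        model,
        artifact_path="model",
        registered_model_name="main.ml_models.iris_classifier",  # <catalog>.<schema>.<model>
    )
```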

Site to Insight: Powering Construction Analytics Through Delta Sharing

At Procore, we're transforming the construction industry through innovative data solutions. This session unveils how we've supercharged our analytics offerings using a unified lakehouse architecture and Delta Sharing, delivering game-changing results for our customers and our business, and how data professionals can unlock the full potential of their data assets and drive meaningful business outcomes. Key highlights:
- Learn how we've implemented seamless, secure sharing of large datasets across various BI tools and programming languages, dramatically accelerating time-to-insights for our customers
- Discover our approach to sharing dynamically filtered subsets of data across our numerous customers with cross-platform view sharing
- We'll demonstrate how our architecture has eliminated the need for data replication, fostering a more efficient, collaborative data ecosystem
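To give a sense of what sharing across BI tools and programming languages looks like on the recipient side, here is a minimal sketch using the open delta-sharing Python connector; the profile file and share/schema/table names are hypothetical, not Procore's.

```python
# Minimal sketch: reading a Delta Sharing table from the recipient's side.
# The profile path and share/schema/table names are illustrative.
import delta_sharing

profile = "/path/to/config.share"  # credential file provided by the data provider
table_url = f"{profile}#construction_share.analytics.project_costs"  # hypothetical names

df = delta_sharing.load_as_pandas(table_url)
print(df.head())
```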

Bridging Ontologies & Lakehouses: Palantir AIP + Databricks for Secure Autonomous AI

AI is moving from pilots to production, but many organizations still struggle to connect boardroom ambitions with operational reality. Palantir’s Artificial Intelligence Platform (AIP) and the Databricks Data Intelligence Platform now form a single, open architecture that closes this gap by pairing Palantir’s operational, decision-empowering Ontology with Databricks’ industry-leading scale, governance and Lakehouse economics. The result: real-time, AI-powered, autonomous workflows that are already powering mission-critical outcomes for the U.S. Department of Defense, bp and other joint customers across the public and private sectors. In this technically grounded but business-focused session you will see the new reference architecture in action. We will walk through how Unity Catalog and Palantir Virtual Tables provide governed, zero-copy access to Lakehouse data and back mission-critical operational workflows on top of Palantir’s semantic ontology and agentic AI capabilities. We will also explore how Palantir’s no-code and pro-code tooling integrates with Databricks compute to orchestrate builds and write tables to Unity Catalog. Come hear from customers currently using this architecture to drive critical business outcomes seamlessly across Databricks and Palantir.

From Datavault to Delta Lake: Streamlining Data Sync with Lakeflow Connect

In this session, we will explore the Australian Red Cross Lifeblood's approach to synchronizing an Azure SQL Datavault 2.0 (DV2.0) implementation with Unity Catalog (UC) using Lakeflow Connect. Lifeblood's DV2.0 data warehouse, which includes raw vault (RV) and business vault (BV) tables, as well as information marts defined as views, required a multi-step process to achieve data/business logic sync with UC. This involved using Lakeflow Connect to ingest RV and BV data, followed by a custom process utilizing JDBC to ingest view definitions, and the automated/manual conversion of T-SQL to Databricks SQL views, with Lakehouse Monitoring for validation. In this talk, we will share our journey, the design decisions we made, and how the resulting solution now supports analytics workloads, analysts, and data scientists at Lifeblood.
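The "custom process utilizing JDBC to ingest view definitions" step can be pictured roughly as below: a Spark JDBC read of INFORMATION_SCHEMA.VIEWS from Azure SQL inside a Databricks notebook (where spark and dbutils are predefined). The connection details and secret scope names are placeholders, not Lifeblood's actual setup.

```python
# Rough sketch: pulling T-SQL view definitions from Azure SQL over JDBC so they
# can be converted into Databricks SQL views. Runs in a Databricks notebook,
# where `spark` and `dbutils` are provided. All connection details are placeholders.
jdbc_url = "jdbc:sqlserver://<server>.database.windows.net:1433;database=<db>"

views = (
    spark.read.format("jdbc")
    .option("url", jdbc_url)
    .option(
        "query",
        "SELECT TABLE_SCHEMA, TABLE_NAME, VIEW_DEFINITION "
        "FROM INFORMATION_SCHEMA.VIEWS",
    )
    .option("user", dbutils.secrets.get("sql-scope", "sql-user"))          # placeholder scope/key
    .option("password", dbutils.secrets.get("sql-scope", "sql-password"))  # placeholder scope/key
    .load()
)
views.show(truncate=False)
```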

Lakeflow Declarative Pipelines Integrations and Interoperability: Get Data From — and to — Anywhere

This session is repeated. In this session, you will learn how to integrate Lakeflow Declarative Pipelines with external systems in order to ingest and send data virtually anywhere. Lakeflow Declarative Pipelines is most often used for ingestion and ETL into the Lakehouse. New capabilities like the Lakeflow Declarative Pipelines Sinks API and added support for Python Data Source and ForEachBatch have opened up Lakeflow Declarative Pipelines to support almost any integration. This includes popular Apache Spark™ integrations like JDBC, Kafka, external and managed Delta tables, Azure CosmosDB, MongoDB and more.
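To make the Sinks API concrete, here is a minimal sketch of writing a pipeline's streaming output to Kafka with dlt.create_sink and an append flow, based on the publicly documented Python interface; the broker, topic, and table names are illustrative.

```python
# Minimal sketch of the Sinks API pattern: stream rows from a table out to Kafka.
# Runs inside a Lakeflow Declarative Pipelines (dlt) pipeline, where `spark` is provided.
# Broker, topic, and source table names are placeholders.
import dlt
from pyspark.sql.functions import struct, to_json

dlt.create_sink(
    name="orders_kafka_sink",
    format="kafka",
    options={
        "kafka.bootstrap.servers": "broker-1:9092",  # placeholder
        "topic": "orders_out",                        # placeholder
    },
)

@dlt.append_flow(name="orders_to_kafka", target="orders_kafka_sink")
def orders_to_kafka():
    # Kafka sinks expect a `value` column; serialize each row as JSON.
    return (
        spark.readStream.table("main.sales.orders")   # hypothetical source table
        .select(to_json(struct("*")).alias("value"))
    )
```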

Swimming at Our Own Lakehouse: How Databricks Uses Databricks

This session is repeated. Peek behind the curtain to learn how Databricks processes hundreds of petabytes of data across every region and cloud where we operate. Learn how Databricks leverages Data and AI to scale and optimize every aspect of the company, from facilities and legal to sales and marketing and, of course, product research and development. This session is a high-level tour inside Databricks to see how Data and AI enable us to be a better company. We will go into the architecture behind how Databricks is used for internal use cases like business analytics and SIEM, as well as customer-facing features like system tables and Assistant. We will cover how we run production for our data flows and how we maintain security and privacy while operating a large multi-cloud, multi-region environment.

Trust You Can Measure: Data Quality Standards in The Lakehouse

Do you trust your data? If you’ve ever struggled to figure out which datasets are reliable, well-governed, or safe to use, you’re not alone. At Databricks, our own internal lakehouse faced the same challenge—hundreds of thousands of tables, but no easy way to tell which data met quality standards. In this talk, the Databricks Data Platform team shares how we tackled this problem by building the Data Governance Score—a way to systematically measure and surface trust signals across the entire lakehouse. You’ll learn how we leverage Unity Catalog, governed tags, and enforcement to drive better data decisions at scale. Whether you're a data engineer, platform owner, or business leader, you’ll leave with practical ideas on how to raise the bar for data quality and trust in your own data ecosystem.
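As a taste of the governed-tags mechanics mentioned above, the sketch below tags a table and then reads the tag back from information_schema; the tag keys, values, and table names are made up for illustration and are not the Databricks Data Governance Score itself.

```python
# Minimal sketch: apply a governance tag to a Unity Catalog table, then query
# tag metadata back from information_schema. Names and tag values are illustrative.
spark.sql("""
  ALTER TABLE main.finance.revenue_daily
  SET TAGS ('quality_tier' = 'gold', 'data_owner' = 'finance-platform')
""")

tagged = spark.sql("""
  SELECT catalog_name, schema_name, table_name, tag_name, tag_value
  FROM system.information_schema.table_tags
  WHERE tag_name = 'quality_tier'
""")
tagged.show(truncate=False)
```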

The JLL Training and Upskill Program for Our Warehouse Migration to Databricks

Databricks Odyssey is JLL’s bespoke training program designed to upskill and prepare data professionals for a new world of data lakehouse. Based on the concepts of learn, practice and certify, participants earn points, moving through five levels by completing activities with business application of Databricks key features. Databricks Odyssey facilitates cloud data warehousing migration by providing best practice frameworks, ensuring efficient use of pay-per-compute platforms. JLL/T Insights and Data fosters a data culture through learning programs that develop in-house talent and create career pathways. Databricks Odyssey offers:
- JLL-specific hands-on learning
- Gamified 'level up' approach
- Practical, applicable skills
Benefits include:
- Improved platform efficiency
- Enhanced data accuracy and client insights
- Ongoing professional development
- Potential cost savings through better utilization

Accelerating Model Development and Fine-Tuning on Databricks with TwelveLabs

Scaling large language models (LLMs) and multimodal architectures requires efficient data management and computational power. NVIDIA NeMo Framework Megatron-LM on Databricks is an open source solution that integrates GPU acceleration and advanced parallelism with Databricks Delta Lakehouse, streamlining workflows for pre-training and fine-tuning models at scale. This session highlights context parallelism, a unique NeMo capability for parallelizing over sequence lengths, making it ideal for video datasets with large embeddings. Through the case study of TwelveLabs’ Pegasus-1 model, learn how NeMo empowers scalable multimodal AI development, from text to video processing, setting a new standard for LLM workflows.

Petabyte-Scale On-Chain Insights: Real-Time Intelligence for the Next-Gen Financial Backbone

We’ll explore how CipherOwl Inc. constructed a near real-time, multi-chain data lakehouse to power anti-money laundering (AML) monitoring at petabyte scale. We will walk through the end-to-end architecture, which integrates cutting-edge open-source technologies and AI-driven analytics to handle massive on-chain data volumes seamlessly, complemented by off-chain intelligence to meet rigorous AML requirements. At the core of our solution is ChainStorage, an open-source project started by Coinbase that provides robust blockchain data ingestion and block-level serving. We enhanced it with Apache Spark™ coupled with Apache Arrow for high-throughput processing and efficient data serialization, backed by Delta Lake and Kafka. For the serving layer, we employ StarRocks to deliver lightning-fast SQL analytics over vast datasets. Finally, our system incorporates machine learning and AI agents for continuous data curation and near real-time insights, which are crucial for tackling on-chain AML challenges.

AI and Genie: Analyzing Healthcare Improvement Opportunities

This session is repeated. Improving healthcare impacts us all. We highlight how Premier Inc. took risk-adjusted patient data from more than 1,300 member hospitals across America and applied a natural language interface using AI/BI Genie, allowing our users to discover new insights. The stakes are high: each new insight surfaced represents potential care improvements and lives positively impacted. Using Genie and our AI-ready data in Unity Catalog, our team was able to stand up a Genie instance in three short days, bypassing the cost and time of custom modeling and application development. Additionally, Genie allowed our internal teams to generate complex SQL as much as 10 times faster than writing it by hand. As Genie and lakehouse apps continue to advance rapidly, we are excited to leverage these features by introducing Genie to as many as 20,000 users across hundreds of hospitals. This will support our members’ ongoing mission to enhance the care they provide to the communities they serve.

Apache Iceberg with Unity Catalog at HelloFresh

Table formats like Delta Lake and Iceberg have been game changers for pushing lakehouse architecture into modern enterprises. The acquisition of Tabular added Iceberg to the Databricks ecosystem, an open format that was already well supported by processing engines across the industry. At HelloFresh we are building a lakehouse architecture that integrates many touchpoints and technologies all across the organization. As such, we chose Iceberg as the table format to bridge the gaps in our decentrally managed tech landscape. We are leveraging Unity Catalog as the Iceberg REST catalog of choice for storing metadata and managing tables. In this talk we will outline our architectural setup between Databricks, Spark, Flink and Snowflake and will explain the native Unity Catalog Iceberg REST catalog, as well as catalog federation towards connected engines. We will highlight the impact on our business and discuss the advantages and lessons learned from our early adopter experience.
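For readers new to the pattern, the sketch below shows roughly how an external Spark engine can read Unity Catalog–managed Iceberg tables through an Iceberg REST catalog interface. The endpoint path, credentials, and table names are assumptions to be checked against your workspace's documentation; this is not HelloFresh's configuration.

```python
# Rough sketch: an external (non-Databricks) Spark session reading an Iceberg
# table via Unity Catalog's Iceberg REST interface. Endpoint path, token, and
# names below are assumptions / placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.jars.packages", "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.0")
    .config("spark.sql.catalog.uc", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.uc.type", "rest")
    .config("spark.sql.catalog.uc.uri",
            "https://<workspace-url>/api/2.1/unity-catalog/iceberg")  # assumed path; verify in docs
    .config("spark.sql.catalog.uc.token", "<token>")                  # placeholder credential
    .config("spark.sql.catalog.uc.warehouse", "<catalog-name>")       # Unity Catalog catalog name
    .getOrCreate()
)

spark.sql("SELECT * FROM uc.analytics.orders LIMIT 10").show()  # hypothetical table
```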

Crafting Business Brilliance: Leveraging Databricks SQL for Next-Gen Applications

At Haleon, we've leveraged Databricks APIs and serverless compute to develop customer-facing applications for our business. This innovative solution enables us to efficiently deliver SAP invoice and order management data through front-end applications developed and served via our API Gateway. The Databricks lakehouse architecture has been instrumental in eliminating the friction associated with directly accessing SAP data from operational systems, while enhancing our performance capabilities. Our system achieved response times of less than 3 seconds per API call, with ongoing efforts to optimise this performance further. This architecture not only streamlines our data and application ecosystem but also paves the way for integrating GenAI capabilities with robust governance measures into our future infrastructure. The implementation of this solution has yielded significant benefits, including a 15% reduction in customer service costs and a 28% increase in productivity for our customer support team.
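A minimal sketch of this serving pattern, using the databricks-sql-connector against a serverless SQL warehouse; the hostname, HTTP path, token, and table names are placeholders rather than Haleon's implementation.

```python
# Minimal sketch: an application-tier query against a Databricks SQL warehouse
# using the databricks-sql-connector. All connection details and table names
# are placeholders.
from databricks import sql

with sql.connect(
    server_hostname="<workspace-host>",               # e.g. adb-xxxx.azuredatabricks.net
    http_path="/sql/1.0/warehouses/<warehouse-id>",   # placeholder warehouse ID
    access_token="<token>",
) as connection:
    with connection.cursor() as cursor:
        cursor.execute(
            "SELECT invoice_id, status, amount "
            "FROM gold.sap.invoices "                  # hypothetical curated table
            "LIMIT 10"
        )
        for row in cursor.fetchall():
            print(row)
```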

Reimagining Data Governance and Access at Atlassian

Atlassian is rebuilding its central lakehouse from the ground up to deliver a more secure, flexible and scalable data environment. In this session, we’ll share how we leverage Unity Catalog for fine-grained governance and supplement it with Immuta for dynamic policy management, enabling row and column level security at scale. By shifting away from broad, monolithic access controls toward a modern, agile solution, we’re empowering teams to securely collaborate on sensitive data without sacrificing performance or usability. Join us for an inside look at our end-to-end policy architecture, from how data owners declare metadata and author policies to the seamless application of access rules across the platform. We’ll also discuss lessons learned on streamlining data governance, ensuring compliance, and improving user adoption. Whether you’re a data architect, engineer or leader, walk away with actionable strategies to simplify and strengthen your own governance and access practices.
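For orientation, row- and column-level policies of the kind described here ultimately land on Unity Catalog primitives such as row filters and column masks. The sketch below shows those primitives directly; the function, group, and table names are invented, and this is not Atlassian's or Immuta's implementation.

```python
# Minimal sketch: a Unity Catalog row filter and column mask, applied with SQL.
# Function, group, and table names are illustrative.
spark.sql("""
  CREATE OR REPLACE FUNCTION main.gov.region_filter(region STRING)
  RETURN is_account_group_member('global_analysts') OR region = 'APAC'
""")
spark.sql("""
  ALTER TABLE main.sales.orders
  SET ROW FILTER main.gov.region_filter ON (region)
""")

spark.sql("""
  CREATE OR REPLACE FUNCTION main.gov.mask_email(email STRING)
  RETURN CASE WHEN is_account_group_member('pii_readers') THEN email ELSE '***' END
""")
spark.sql("""
  ALTER TABLE main.sales.orders
  ALTER COLUMN customer_email SET MASK main.gov.mask_email
""")
```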

Revolutionizing Cybersecurity: SCB's Journey to a Self-Managed SIEM

Join us to explore how Standard Chartered Bank's (SCB) groundbreaking strategy is reshaping the future of the cybersecurity landscape by replacing traditional SIEM with a cutting-edge Databricks solution, achieving remarkable business outcomes:
- 80% reduction in time to detect incidents
- 92% faster threat investigation
- 35% cost reduction
- 60% better detection accuracy
- Significant enhancements in threat detection and response metrics
- Substantial increase in ML-driven use cases
This session unveils SCB's journey to a distributed, multi-cloud lakehouse architecture that unlocks unprecedented performance and commercial optimization. Explore why a unified data and AI platform is becoming the cornerstone of next-generation, self-managed SIEM solutions for forward-thinking organizations in this era of AI-powered banking transformation.

Sponsored by: Accenture & Avanade | Enterprise Data Journey for The Standard Insurance Leveraging Databricks on Azure and AI Innovation

Modern insurers require agile, integrated data systems to harness AI. This framework for a global insurer uses Azure Databricks to unify legacy systems into a governed lakehouse medallion architecture (bronze/silver/gold layers), eliminating silos and enabling real-time analytics. The solution employs:
- Medallion architecture for incremental data quality improvement
- Unity Catalog for centralized governance, row/column security, and audit compliance
- Azure encryption/confidential computing for data mesh security
- Automated ingestion/semantic/DevOps pipelines for scalability
By combining Databricks’ distributed infrastructure with Azure’s security, the insurer achieves regulatory compliance while enabling AI-driven innovation (e.g., underwriting, claims). The framework establishes a future-proof foundation for mergers/acquisitions (M&A) and cross-functional data products, balancing governance with agility.

Sponsored by: Domo | Behind the Brand: How Sol de Janeiro Powers Amazon Ops with Databricks + DOMO

How does one of the world’s fastest-growing beauty brands stay ahead of Amazon’s complexity and scale retail with precision? At Sol de Janeiro, we built a real-time Amazon Operations Hub—powered by Databricks and DOMO—that drives decisions across inventory, profitability, and marketing ROI. See how the Databricks Lakehouse and DOMO dashboards work together to simplify workflows, surface actionable insights, and enable smarter decisions across the business—from frontline operators to the executive suite. In this session, you’ll get a behind-the-scenes look at how we unified trillions of rows from NetSuite, Amazon, Shopify, and carrier systems into a single source of truth. We’ll show how this hub streamlined cross-functional workflows, eliminated manual reporting, and laid the foundation for AI-powered forecasting and automation.

Data Modeling 101 for Data Lakehouse Demystified

This session is repeated. In today’s data-driven world, the Data Lakehouse has emerged as a powerful architectural paradigm that unifies the flexibility of data lakes with the reliability and structure of traditional data warehouses. However, organizations must adopt the right data modeling techniques to unlock its full potential and ensure scalability, maintainability and efficiency. This session is designed for beginners looking to demystify the complexities of data modeling for the lakehouse and make informed design decisions. We’ll break down the Medallion Architecture, explore key data modeling techniques and walk through the maturity stages of a successful data platform, transitioning from raw, unstructured data to well-organized, query-efficient models.
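To ground the Medallion discussion, here is a minimal bronze-to-silver sketch using Auto Loader and streaming writes to Delta tables; it runs in a Databricks notebook (where spark is provided), and all paths and table names are illustrative.

```python
# Minimal sketch of the bronze -> silver hop in a medallion layout.
# Paths and table names are placeholders.
from pyspark.sql.functions import col, current_timestamp

# Bronze: land raw files as-is, plus ingestion metadata, using Auto Loader.
(spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .load("/Volumes/main/raw/orders/")                               # hypothetical landing path
    .withColumn("_ingested_at", current_timestamp())
    .writeStream
    .option("checkpointLocation", "/Volumes/main/chk/orders_bronze")
    .toTable("main.bronze.orders"))

# Silver: clean, cast, and deduplicate into a query-friendly model.
# (Streaming dropDuplicates keeps state; fine for a sketch.)
(spark.readStream.table("main.bronze.orders")
    .where(col("order_id").isNotNull())
    .withColumn("amount", col("amount").cast("decimal(18,2)"))
    .dropDuplicates(["order_id"])
    .writeStream
    .option("checkpointLocation", "/Volumes/main/chk/orders_silver")
    .toTable("main.silver.orders"))
```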

Delta Lake and the Data Mesh

Delta Lake has proven to be an excellent storage format. Coupled with the Databricks platform, the storage format has shone as a component of a distributed system on the lakehouse. The pairing of Delta and Spark provides an excellent platform, but users often struggle to perform comparable work outside of the Spark ecosystem. Tools such as delta-rs, Polars and DuckDB have brought access to users outside of Spark, but they are only building blocks of a larger system. In this 40-minute talk we will demonstrate how users can use data products on the Nextdata OS data mesh to interact with the Databricks platform to drive Delta Lake workflows. Additionally, we will show how users can build autonomous data products that interact with their Delta tables both inside and outside of the lakehouse platform. Attendees will learn how to integrate the Nextdata OS data mesh with the Databricks platform as both an external and an integral component.
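For a sense of the "outside of Spark" building blocks mentioned above, the sketch below reads the same Delta table with delta-rs, Polars, and DuckDB; the table path is a placeholder, and this is only the raw access layer, not the Nextdata OS data-product integration the talk covers.

```python
# Minimal sketch: reading one Delta table from outside Spark with three tools.
# The table path is a placeholder.
import duckdb
import polars as pl
from deltalake import DeltaTable

path = "/data/delta/events"  # placeholder Delta table location

# delta-rs: load the table and hand it to PyArrow
dt = DeltaTable(path)
arrow_table = dt.to_pyarrow_table()
print(arrow_table.num_rows)

# Polars: scan the Delta table lazily and collect a sample
print(pl.scan_delta(path).head(5).collect())

# DuckDB: query the table through the delta extension
duckdb.sql("INSTALL delta")
duckdb.sql("LOAD delta")
print(duckdb.sql(f"SELECT count(*) FROM delta_scan('{path}')"))
```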