talk-data.com

Topic: Databricks
Tags: big_data · analytics · spark
1286 tagged activities

Activity Trend: peak of 515 activities/quarter (2020-Q1 to 2026-Q1)

Activities (1286 · Newest first)

Deploying Databricks Asset Bundles (DABs) at Scale

This session is repeated.

Managing data and AI workloads in Databricks can be complex. Databricks Asset Bundles (DABs) simplify this by enabling declarative, Git-driven deployment workflows for notebooks, jobs, Lakeflow Declarative Pipelines, dashboards, ML models and more. Join the DABs team for a deep dive and learn about:

- The basics: understanding Databricks Asset Bundles; declare, define and deploy assets, follow best practices, use templates and manage dependencies
- CI/CD & governance: automate deployments with GitHub Actions or Azure DevOps, manage dev vs. prod differences, and ensure reproducibility
- What's new and what's coming up: AI/BI Dashboard support, Databricks Apps support, a Pythonic interface and workspace-based deployment

If you're a data engineer, ML practitioner or platform architect, this talk will provide practical insights to improve reliability, efficiency and compliance in your Databricks workflows.
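For a concrete feel of the declarative workflow, here is a minimal sketch of a databricks.yml bundle definition; the bundle name, notebook path and workspace hosts are hypothetical placeholders, not taken from the session.

```yaml
# Minimal sketch of a Databricks Asset Bundle (databricks.yml).
# Names, paths and hosts are illustrative examples only.
bundle:
  name: nightly_etl_bundle

resources:
  jobs:
    nightly_etl:
      name: nightly-etl
      tasks:
        - task_key: ingest
          notebook_task:
            notebook_path: ./notebooks/ingest

targets:
  dev:
    mode: development
    default: true
    workspace:
      host: https://adb-dev.example.azuredatabricks.net
  prod:
    mode: production
    workspace:
      host: https://adb-prod.example.azuredatabricks.net
```

With a file like this in a Git repo, `databricks bundle validate` and `databricks bundle deploy -t prod` are the CLI commands that drive the Git-to-workspace deployment flow the session covers.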

Empowering Fundraising With AI: A Journey With Databricks Mosaic AI

Artificial Intelligence (AI) is more than a corporate tool; it’s a force for good. At Doctors Without Borders/Médecins Sans Frontières (MSF), we use AI to optimize fundraising, ensuring that every dollar raised directly supports life-saving medical aid worldwide. With Databricks, Mosaic AI and Unity Catalog, we analyze donor behavior, predict giving patterns and personalize outreach, increasing contributions while upholding ethical AI principles. This session will showcase how AI maximizes fundraising impact, enabling faster crisis response and resource allocation. We’ll explore predictive modeling for donor engagement, secure AI governance with Unity Catalog and our vision for generative AI in fundraising, leveraging AI-assisted storytelling to deepen donor connections. AI is not just about efficiency; it’s about saving lives. Join us to see how AI-driven fundraising is transforming humanitarian aid on a global scale.
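As a generic illustration of the donor-propensity idea (not MSF's actual model, which the abstract does not detail), here is a hedged sketch using scikit-learn with synthetic features:

```python
# Toy sketch of donor-propensity scoring; features, labels and the 0.7
# threshold are illustrative, not MSF's production approach.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
# Columns: months_since_last_gift, lifetime_gift_count, avg_gift_usd
X = rng.normal(size=(500, 3))
y = (X[:, 1] + rng.normal(scale=0.5, size=500) > 0).astype(int)  # toy label

model = GradientBoostingClassifier().fit(X, y)
scores = model.predict_proba(X)[:, 1]          # probability of giving again
print(int((scores > 0.7).sum()), "donors flagged for personalized outreach")
```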

Federated Data Analytics Platform

Are you struggling to keep up with rapid business changes that demand constant updates to your data pipelines? Is your data engineering team growing rapidly just to manage this complexity? Databricks was not immune to this challenge either. Managing our BI with contributions from hundreds of Product Engineering Teams across the company while maintaining central oversight and quality posed significant hurdles. Join us to learn how we developed a config-driven data pipeline framework using Metric Store and UC Metrics that helped us reduce engineering effort — achieving the work of 100 classical data engineers with just two platform engineers.
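To make "config-driven" concrete, here is a minimal sketch under assumed conventions; the session's actual Metric Store / UC Metrics schema is not described in the abstract, so the metric definitions and table names below are hypothetical.

```python
# Sketch of a config-driven pipeline: metrics are declared as data, and one
# generic engine renders and runs the SQL for all of them. Illustrative only.
from pyspark.sql import SparkSession

# Declarative metric definitions (in practice these could live in YAML or UC).
METRICS = [
    {"name": "weekly_active_users", "source": "main.analytics.events",
     "agg": "COUNT(DISTINCT user_id)", "grain": "date_trunc('week', ts)"},
    {"name": "weekly_revenue", "source": "main.sales.orders",
     "agg": "SUM(amount)", "grain": "date_trunc('week', ts)"},
]

def build_metric(spark: SparkSession, m: dict) -> None:
    """Materialize one metric table from its declarative definition."""
    spark.sql(f"""
        CREATE OR REPLACE TABLE main.metrics.{m['name']} AS
        SELECT {m['grain']} AS period, {m['agg']} AS value
        FROM {m['source']}
        GROUP BY {m['grain']}
    """)

spark = SparkSession.builder.getOrCreate()
for metric in METRICS:
    build_metric(spark, metric)
```

Adding a metric becomes a one-line config change rather than a new pipeline, which is the leverage the two-platform-engineers claim rests on.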

Getting Data AI Ready: Testimonial of Good Governance Practices Constructing Accurate Genie Spaces

Genie Rooms have played an integral role in democratizing important datasets like Cell Tower and Lease Information. However, to ensure that this exciting new release from Databricks was configured as optimally as possible from development to deployment, we needed additional scaffolding around governance. In this talk we will describe the four main components we used in conjunction with the Genie Room to build a successful product, and we will share generalizable lessons to help others get the most out of this object. At the core is a declarative, metadata-driven approach to creating UC tables, deployed on a robust framework. Second, a platform that efficiently crowdsources targeted feedback from different user groups. Third, a tool that balances the LLM's creativity with human wisdom. And finally, a platform that enforces our principle of separating storage from compute to manage access to the room at a fine-grained level and enables a whole host of interesting use cases.
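One hedged illustration of the metadata-first component: rich table and column comments in Unity Catalog are what give Genie context to ground its answers, so declaring them alongside the schema pays off. The catalog, schema and column names below are hypothetical, not the team's actual tables.

```python
# Sketch: declare a UC table with descriptive metadata that a Genie space
# can draw on. Names are illustrative only.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
spark.sql("""
    CREATE TABLE IF NOT EXISTS main.network.cell_towers (
        tower_id    STRING COMMENT 'Unique identifier of the cell tower',
        latitude    DOUBLE COMMENT 'Tower latitude in decimal degrees (WGS84)',
        longitude   DOUBLE COMMENT 'Tower longitude in decimal degrees (WGS84)',
        lease_start DATE   COMMENT 'Start date of the current site lease',
        lease_end   DATE   COMMENT 'End date of the current site lease'
    )
    COMMENT 'Cell tower locations and lease terms, refreshed daily'
""")
```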

Graph-Powered Observability Data Analysis in Databricks With Credential Vending

Observability data (logs, metrics and traces) captures the complex interactions within modern distributed systems. A graph query engine on top of Databricks enables complex traversal of massive observability data, helping users trace service dependencies, analyze upstream/downstream impacts and uncover recurring error patterns, making it easier to diagnose issues and optimize system performance. A critical challenge in handling observability data is managing dynamic RBAC for sensitive system telemetry. This session explains how Coinbase leverages credential vending, a method for issuing short-lived credentials to enable fine-grained, secure access to observability data stored in Databricks without long-lived secrets.

Key takeaways:

- Querying Databricks tables as graph structures without ETLing data out
- Secure access management with credential vending
- A practical graph-based incident analysis solution at Coinbase, with insights on how PuppyGraph enables this application
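For readers unfamiliar with the mechanism, a hedged sketch of credential vending follows, using the Unity Catalog temporary-table-credentials REST endpoint; the workspace host, token and table UUID are placeholders, and the exact response fields vary by cloud.

```python
# Sketch: exchange a Unity Catalog table ID for short-lived cloud storage
# credentials instead of distributing long-lived secrets. Host/token/IDs
# are hypothetical examples.
import requests

HOST = "https://dbc-example.cloud.databricks.com"
TOKEN = "<service-principal-token>"
TABLE_ID = "11111111-2222-3333-4444-555555555555"

resp = requests.post(
    f"{HOST}/api/2.1/unity-catalog/temporary-table-credentials",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"table_id": TABLE_ID, "operation": "READ"},
    timeout=30,
)
resp.raise_for_status()
creds = resp.json()
# `creds` holds expiring storage credentials plus the table's storage URL,
# which an external engine (e.g. a graph query engine) can use to scan the
# Delta files directly; access lapses automatically at expiration time.
print(creds.get("expiration_time"), creds.get("url"))
```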

Harnessing Databricks for Advanced LLM Time-Series Models in Healthcare Forecasting

This research introduces a groundbreaking method for healthcare time-series forecasting using a Large Language Model (LLM) foundation model. By leveraging a comprehensive dataset of over 50 million IQVIA time-series trends, which includes data on procedure demands, sales and prescriptions (TRx), alongside publicly available data spanning two decades, the model aims to significantly enhance predictive accuracy in various healthcare applications. The model's transformer-based architecture incorporates self-attention mechanisms to effectively capture complex temporal dependencies within historical time-series trends, offering a sophisticated approach to understanding patterns, trends and cyclical variations.
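For orientation, a generic, minimal sketch of self-attention applied to forecasting follows (standard PyTorch; this is not the IQVIA foundation model, and all dimensions are arbitrary).

```python
# Tiny transformer encoder for time-series forecasting, to illustrate
# self-attention over temporal steps. Illustrative only.
import torch
import torch.nn as nn

class TinyTSTransformer(nn.Module):
    def __init__(self, d_model: int = 64, n_heads: int = 4, horizon: int = 1):
        super().__init__()
        self.input_proj = nn.Linear(1, d_model)   # scalar series -> d_model
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, horizon)   # forecast next step(s)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, 1) historical values
        h = self.encoder(self.input_proj(x))      # attention across time steps
        return self.head(h[:, -1])                # predict from final position

model = TinyTSTransformer()
history = torch.randn(8, 52, 1)                   # e.g. 52 weeks of TRx counts
print(model(history).shape)                       # torch.Size([8, 1])
```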

Harnessing Real-Time Data and AI for Retail Innovation

This talk explores using advanced data processing and generative AI techniques to revolutionize the retail industry. Using Databricks, we will discuss how cutting-edge technologies enable real-time data analysis and machine learning applications, creating a powerful ecosystem for large-scale, data-driven retail solutions. Attendees will gain insights into architecting scalable data pipelines for retail operations and implementing advanced analytics on streaming customer data. Discover how these integrated technologies drive innovation in retail, enhancing customer experiences, streamlining operations and enabling data-driven decision-making. Learn how retailers can leverage these tools to gain a competitive edge in the rapidly evolving digital marketplace, ultimately driving growth and adaptability in the face of changing consumer behaviors and market dynamics.
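As a hedged sketch of the kind of real-time pipeline described, the snippet below streams customer events from Kafka into a Delta table with Spark Structured Streaming; the broker, topic, schema and table names are hypothetical.

```python
# Sketch: streaming ingest of customer events into a bronze Delta table.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StringType, DoubleType, TimestampType

spark = SparkSession.builder.getOrCreate()

schema = (StructType()
          .add("customer_id", StringType())
          .add("event", StringType())
          .add("amount", DoubleType())
          .add("ts", TimestampType()))

events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker.example.com:9092")
          .option("subscribe", "customer-events")
          .load()
          .select(from_json(col("value").cast("string"), schema).alias("e"))
          .select("e.*"))

(events.writeStream
 .option("checkpointLocation", "/tmp/checkpoints/customer_events")
 .toTable("main.retail.customer_events_bronze"))
```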

How Danone Enhanced Global Data Sharing with Delta Sharing

Learn how Danone, a global leader in the food industry, improved its data-sharing processes using Delta Sharing, an open protocol developed by Databricks. This session will explore how Danone migrated from a traditional hub-and-spoke model to a more efficient and scalable data-sharing approach that works seamlessly across regions and platforms. We’ll discuss practical concepts such as in-region and cross-region data sharing, fine-grained access control, data discovery, and the implementation of data contracts. You’ll also hear about the strategies Danone uses to deliver governed data efficiently while maintaining compliance with global regulations. Additionally, we’ll discuss a cost comparison between direct data access and replication. Finally, we’ll share insights into the challenges faced by global organizations in managing data sharing at scale and how Danone addressed these issues. Attendees will gain practical knowledge on building a reliable and secure data-sharing framework for international collaboration.
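On the consumer side, the open-source `delta-sharing` Python client shows how little a recipient needs; the profile path and share/schema/table names below are hypothetical.

```python
# Sketch of consuming a Delta Share (pip install delta-sharing).
import delta_sharing

# The provider issues a profile file holding the endpoint and bearer token.
profile = "/path/to/share_profile.json"

# Discover what this recipient has been granted.
client = delta_sharing.SharingClient(profile)
print(client.list_all_tables())

# Load one shared table straight into pandas, with no replication.
df = delta_sharing.load_as_pandas(f"{profile}#sales_share.emea.daily_orders")
print(df.head())
```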

Implementing GreenOps in Databricks: A Practical Guide for Regulated Environments

Join us on a technical journey into GreenOps at ABN AMRO Bank using Databricks system tables. We'll explore security, implementation challenges and best-practice verification, with practical examples and actionable reports. Discover how to optimize resource usage, ensure compliance and maintain agility. We'll discuss best practices, potential pitfalls and the nuanced 'it depends' scenarios, offering a comprehensive guide for intermediate to advanced practitioners.
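As a taste of what system-table reporting looks like, a hedged sketch follows; `system.billing.usage` is the documented billing system table, and the query simply aggregates recent DBU consumption by SKU.

```python
# Sketch: 30-day DBU usage by SKU from Databricks system tables.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
usage_by_sku = spark.sql("""
    SELECT usage_date,
           sku_name,
           SUM(usage_quantity) AS dbus
    FROM system.billing.usage
    WHERE usage_date >= current_date() - INTERVAL 30 DAYS
    GROUP BY usage_date, sku_name
    ORDER BY usage_date
""")
usage_by_sku.show()
```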

Italgas’ AI Factory and the Future of Gas Distribution

At Italgas, Europe's leading gas distributor by both network size and number of customers, we are spearheading digital transformation through a state-of-the-art, fully fledged Databricks Data Intelligence Platform. Highlights:

- Achieved a 50% cost reduction and a 20% performance boost migrating from Azure Synapse to Databricks SQL
- Deployed 41 ML/GenAI models in production, with 100% of workloads governed by Unity Catalog
- Empowered 80% of employees with self-service BI through Genie Dashboards
- Enabled natural language queries for control-room operators analyzing network status

The future of gas distribution is data-driven: predictive maintenance, automated operations and real-time decision making are now realities. Our AI Factory isn't just digitizing infrastructure; it's creating a more responsive, efficient and sustainable gas network that anticipates needs before they arise.

Kafka Forwarder: Simplifying Kafka Consumption at OpenAI

At OpenAI, Kafka fuels real-time data streaming at massive scale, but traditional consumers struggle under the burden of partition management, offset tracking, error handling, retries, Dead Letter Queues (DLQ), and dynamic scaling — all while racing to maintain ultra-high throughput. As deployments scale, complexity multiplies. Enter Kafka Forwarder — a game-changing Kafka Consumer Proxy that flips the script on traditional Kafka consumption. By offloading client-side complexity and pushing messages to consumers, it ensures at-least-once delivery, automated retries, and seamless DLQ management via Databricks. The result? Scalable, reliable and effortless Kafka consumption that lets teams focus on what truly matters. Curious how OpenAI simplified self-service, high-scale Kafka consumption? Join us as we walk through the motivation, architecture and challenges behind Kafka Forwarder, and share how we structured the pipeline to seamlessly route DLQ data into Databricks for analysis.
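OpenAI has not published the Forwarder itself; as a hedged sketch of the push model it describes, the proxy below owns partition and offset management, pushes each record to a subscriber over HTTP, retries on failure and dead-letters what cannot be delivered (confluent-kafka client; brokers, topics and URLs are hypothetical).

```python
# Sketch of a Kafka consumer proxy in the spirit of Kafka Forwarder: it owns
# offset management and pushes records to subscribers, retrying and
# dead-lettering on failure. All endpoints are illustrative.
import requests
from confluent_kafka import Consumer, Producer

consumer = Consumer({
    "bootstrap.servers": "broker.example.com:9092",
    "group.id": "forwarder",
    "enable.auto.commit": False,          # commit only after handling
})
producer = Producer({"bootstrap.servers": "broker.example.com:9092"})
consumer.subscribe(["events"])

def push(payload: bytes, retries: int = 3) -> bool:
    """Push a record to the subscriber; True on success."""
    for _ in range(retries):
        try:
            requests.post("https://consumer.example.com/handle",
                          data=payload, timeout=5).raise_for_status()
            return True
        except requests.RequestException:
            continue
    return False

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    if not push(msg.value()):
        producer.produce("events.dlq", msg.value())   # dead letter queue
        producer.flush()
    consumer.commit(msg)      # at-least-once: commit only after handling
```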

Optimize Cost and User Value Through Model Routing AI Agent

Each LLM has unique strengths and weaknesses, and there is no one-size-fits-all solution. Companies strive to balance cost reduction with maximizing the value of their use cases by considering factors such as latency, multi-modality, API costs, user need and prompt complexity. Model routing helps optimize performance and cost while enhancing scalability and user satisfaction. This session gives an overview of training cost-effective routing models using AI Gateway logs, user feedback, prompts and model features to design an intelligent model-routing AI agent. It covers different strategies for model routing, deployment in Mosaic AI, retraining, and evaluation through A/B testing and end-to-end Databricks Workflows. Additionally, it will delve into the details of training data collection, feature engineering, prompt formatting, custom loss functions, architectural modifications, addressing cold-start problems, query embedding generation and clustering through a vector database, and RL policy-based exploration.
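To ground the idea, here is a deliberately simple, rule-based routing sketch; the model names, prices and complexity heuristic are invented, and the session's learned policy (embeddings, feedback, RL exploration) would replace the heuristic.

```python
# Sketch of model routing: pick the cheapest model that meets the estimated
# need of the prompt. Everything here is illustrative.
from dataclasses import dataclass

@dataclass
class ModelOption:
    name: str
    cost_per_1k_tokens: float
    quality: int            # rough capability tier, higher is better

MODELS = [
    ModelOption("small-fast-llm", 0.0002, 1),
    ModelOption("mid-llm", 0.002, 2),
    ModelOption("frontier-llm", 0.02, 3),
]

def required_tier(prompt: str) -> int:
    """Toy complexity estimate: long or code/math-heavy prompts need more."""
    if len(prompt) > 2000 or any(k in prompt.lower() for k in ("prove", "refactor")):
        return 3
    return 2 if len(prompt) > 300 else 1

def route(prompt: str) -> ModelOption:
    eligible = [m for m in MODELS if m.quality >= required_tier(prompt)]
    return min(eligible, key=lambda m: m.cost_per_1k_tokens)

print(route("What is our refund policy?").name)        # -> small-fast-llm
print(route("Refactor this 3k-line service ...").name) # -> frontier-llm
```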

Revolutionizing Cybersecurity: SCB's Journey to a Self-Managed SIEM

Join us to explore how Standard Chartered Bank's (SCB) groundbreaking strategy is reshaping the future of the cybersecurity landscape by replacing a traditional SIEM with a cutting-edge Databricks solution, achieving remarkable business outcomes:

- 80% reduction in time to detect incidents
- 92% faster threat investigation
- 35% cost reduction
- 60% better detection accuracy
- Significant enhancements in threat detection and response metrics
- Substantial increase in ML-driven use cases

This session unveils SCB's journey to a distributed, multi-cloud lakehouse architecture that unlocks unprecedented performance and commercial optimization. Explore why a unified data and AI platform is becoming the cornerstone of next-generation, self-managed SIEM solutions for forward-thinking organizations in this era of AI-powered banking transformation.

Scaling Data Intelligence at NAB: Balancing Innovation with Enterprise-Grade Governance

In this session, discover how National Australia Bank (NAB) is reshaping its data and AI strategy by positioning data as a strategic enabler. Driven by a vision to unlock data like electricity—continuous and reliable—NAB has established a scalable foundation for data intelligence that balances agility with enterprise-grade control. We'll delve into the key architectural, security, and governance capabilities underpinning this transformation, including Unity Catalog, Serverless, Lakeflow and GenAI. The session will highlight NAB's adoption of Databricks Serverless, platform security controls like private link, and persona-based data access patterns. Attendees will walk away with practical insights into building secure, scalable, and cost-efficient data platforms that fuel innovation while meeting the demands of compliance in highly regulated environments.

Sponsored by: Accenture & Avanade | Enterprise Data Journey for The Standard Insurance Leveraging Databricks on Azure and AI Innovation

Modern insurers require agile, integrated data systems to harness AI. This framework for a global insurer uses Azure Databricks to unify legacy systems into a governed lakehouse medallion architecture (bronze/silver/gold layers), eliminating silos and enabling real-time analytics. The solution employs:

- Medallion architecture for incremental data quality improvement
- Unity Catalog for centralized governance, row/column security and audit compliance
- Azure encryption and confidential computing for data mesh security
- Automated ingestion, semantic and DevOps pipelines for scalability

By combining Databricks' distributed infrastructure with Azure's security, the insurer achieves regulatory compliance while enabling AI-driven innovation (e.g., underwriting, claims). The framework establishes a future-proof foundation for mergers and acquisitions (M&A) and cross-functional data products, balancing governance with agility.
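For illustration, one medallion hop might look like the hedged sketch below: bronze holds raw ingested claims, and silver applies deduplication and a basic quality gate. Table and column names are hypothetical.

```python
# Sketch of a bronze-to-silver medallion step for insurance claims data.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.getOrCreate()

bronze = spark.read.table("insurance.bronze.claims_raw")
silver = (bronze
          .dropDuplicates(["claim_id"])            # one row per claim
          .filter(col("claim_amount") > 0))        # basic quality gate
silver.write.mode("overwrite").saveAsTable("insurance.silver.claims")
```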

Industrial organizations are unlocking new possibilities through the partnership between AVEVA and Databricks. The seamless, no-code, zero-copy solution—powered by Delta Sharing and CONNECT—enables companies to combine IT and OT data effortlessly. By bridging the gap between operational and enterprise data, businesses can harness the power of AI, data science, and business intelligence at an unprecedented scale to drive innovation. In this session, explore real-world applications of this integration, including how industry leaders are using CONNECT and Databricks to boost efficiency, reduce costs, and advance sustainability—all without fragmented point solutions. You’ll also see a live demo of the integration, showcasing how secure, scalable access to trusted industrial data is enabling new levels of industrial intelligence across sectors like mining, manufacturing, power, and oil and gas.

Sponsored by: Domo | Behind the Brand: How Sol de Janeiro Powers Amazon Ops with Databricks + DOMO

How does one of the world’s fastest-growing beauty brands stay ahead of Amazon’s complexity and scale retail with precision? At Sol de Janeiro, we built a real-time Amazon Operations Hub—powered by Databricks and DOMO—that drives decisions across inventory, profitability, and marketing ROI. See how the Databricks Lakehouse and DOMO dashboards work together to simplify workflows, surface actionable insights, and enable smarter decisions across the business—from frontline operators to the executive suite. In this session, you’ll get a behind-the-scenes look at how we unified trillions of rows from NetSuite, Amazon, Shopify, and carrier systems into a single source of truth. We’ll show how this hub streamlined cross-functional workflows, eliminated manual reporting, and laid the foundation for AI-powered forecasting and automation.

Transforming Title Insurance With Databricks Batch Inference

Join us as we explore how First American Data & Analytics, a leading property-centric information provider, revolutionized its data extraction processes using batch inference on the Databricks Platform. Discover how it overcame the challenges of extracting data from millions of historical title policy images and reduced project timelines by 75%. Learn how First American optimized its data processing capabilities, reduced costs by 70% and enhanced the efficiency of its title insurance processes, ultimately improving the home-buying experience for buyers, sellers and lenders. This session will delve into the strategic integration of AI technologies, highlighting the power of collaboration and innovation in transforming complex data challenges into scalable solutions.
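The abstract does not include code, but batch inference on Databricks is commonly expressed with the `ai_query()` SQL function against a model serving endpoint; the endpoint and table names in this sketch are hypothetical.

```python
# Sketch: apply an LLM endpoint to every row of a table with ai_query().
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
extracted = spark.sql("""
    SELECT policy_id,
           ai_query(
               'doc-extraction-endpoint',
               CONCAT('Extract the insured name and parcel number as JSON: ',
                      ocr_text)
           ) AS extracted_fields
    FROM title.silver.policy_ocr
""")
extracted.write.mode("overwrite").saveAsTable("title.gold.policy_fields")
```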

AI/BI Dashboards and AI/BI Genie: Dashboards and Last-Mile Analytics Made Simple

Databricks announced two new features in 2024: AI/BI Dashboards and AI/BI Genie. Dashboards is a redesigned dashboarding experience for your regular reporting needs, while Genie provides a natural-language experience for your last-mile analytics. In this session, Databricks Solutions Architect and content creator Youssef Mrini will present alongside Databricks MVP and content creator Josue A. Bogran on how you can get the most value from these tools for your organization. Content covered includes:

- Setup required, including Unity Catalog, permissions and compute
- Building out a dashboard with AI/BI Dashboards
- Creating and training an AI/BI Genie space to reliably deliver answers
- When to use Dashboards, when to use Genie, and when to use other tools such as Power BI, Tableau, Sigma, ChatGPT, etc.

Fluff-free, full of practical tips, and geared to help you deliver immediate impact with these new Databricks capabilities.

Best Practices to Mitigate AI Security Risks

This session is repeated.

AI is transforming industries, enhancing customer experiences and automating decisions. As organizations integrate AI into core operations, robust security is essential. The Databricks Security team collaborated with top cybersecurity researchers from OWASP, Gartner, NIST, HITRUST and Fortune 100 companies to evolve the Databricks AI Security Framework (DASF) to version 2.0. In this session, we'll cover an AI security architecture using Unity Catalog, MLflow, egress controls and AI Gateway. Learn how security teams, AI practitioners and data engineers can secure AI applications on Databricks. Walk away with:

- A reference architecture for securing AI applications
- A worksheet with AI risks and controls mapped to industry standards like MITRE, OWASP, NIST and HITRUST
- A DASF AI assistant tool to test your AI security