talk-data.com

Topic

AI/ML (Artificial Intelligence/Machine Learning)

Tags: data_science, algorithms, predictive_analytics

9014 activities tagged

Activity Trend

Peak of 1,532 activities per quarter (2020-Q1 to 2026-Q1)

Activities

9014 activities · Newest first

At Zillow, we have increased both the volume and the quality of our dashboards by adopting a modern SDLC with version control and CI/CD. In the past three months, we have released 32 production-grade dashboards and shared them securely across the organization while cutting error rates in half over that span. In this session, we will provide an overview of how we use Databricks Asset Bundles and GitLab CI/CD to create performant dashboards that can be confidently used for mission-critical operations. As a concrete example, we'll then explore how Zillow's Data Platform team used this approach to automate our on-call support analysis, combining our dashboard development strategy with Databricks LLM offerings to create a comprehensive view that provides actionable performance metrics alongside AI-generated insights and action items drawn from the hundreds of requests that make up our support workload.

How Databricks Powers Real-Time Threat Detection at Barracuda XDR

As cybersecurity threats grow in volume and complexity, organizations must efficiently process security telemetry for best-in-class detection and mitigation. Barracuda’s XDR platform is redefining security operations by layering advanced detection methodologies over a broad range of supported technologies. Our vision is to deliver unparalleled protection through automation, machine learning and scalable detection frameworks, ensuring threats are identified and mitigated quickly. To achieve this, we have adopted Databricks as the foundation of our security analytics platform, providing greater control and flexibility while decoupling from traditional SIEM tools. By leveraging Lakeflow Declarative Pipelines, Spark Structured Streaming and detection-as-code CI/CD pipelines, we have built a real-time detection engine that enhances scalability, accuracy and cost efficiency. This session explores how Databricks is shaping the future of XDR through real-time analytics and cloud-native security.
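The abstract names the detection-as-code pattern without showing it, so here is a minimal sketch in PySpark Structured Streaming. Everything below is illustrative, not Barracuda's actual rules: the auth_events table, the threshold, and the alert sink are invented.

```python
from pyspark.sql import DataFrame, SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

def failed_login_burst(events: DataFrame) -> DataFrame:
    """Detection rule: more than 5 failed logins per user in a 5-minute window."""
    return (
        events
        .where(F.col("outcome") == "failure")
        .withWatermark("event_time", "10 minutes")
        .groupBy(F.window("event_time", "5 minutes"), "user")
        .count()
        .where(F.col("count") > 5)
    )

# Because the rule is a plain function over DataFrames, CI can unit-test it
# against static fixtures before the same code runs on the live stream.
alerts = failed_login_burst(spark.readStream.table("auth_events"))
(alerts.writeStream
    .format("delta")
    .option("checkpointLocation", "/tmp/chk/failed_login_burst")
    .outputMode("update")
    .toTable("security.alerts.failed_login_burst"))
```

Expressing each detection as a testable function is what lets a CI/CD pipeline gate rule changes the way it would gate any other code.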

Scaling Demand Forecasting at Nikon: Automating Camera Accessories Sales Planning with Databricks

At Nikon, camera accessories are essential in meeting the diverse needs of professional photographers worldwide, making their timely availability a priority. Forecasting accessories, however, presents unique challenges, including dependencies on parent products, sparse demand patterns, and managing predictions for thousands of items across global subsidiaries. To address this, we leveraged Databricks' unified data and AI platform to develop and deploy an automated, scalable solution for accessory sales planning. Our solution employs a hybrid approach that auto-selects the best algorithm from a suite of ML and time-series models, incorporating anomaly detection and methods to handle sparse and low-demand scenarios. MLflow is used to automate model logging and versioning, enabling efficient management and scalable deployment. The framework includes data preparation, model selection and training, performance tracking, prediction generation, and output processing for downstream systems.
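As a rough illustration of the MLflow logging-and-versioning step described above (not Nikon's actual pipeline; the data, names and model choice are invented):

```python
import mlflow
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic stand-in for one accessory's demand history.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = X @ rng.normal(size=4) + rng.normal(size=200)
X_train, X_val, y_train, y_val = X[:160], X[160:], y[:160], y[160:]

with mlflow.start_run(run_name="accessory_forecast_sku_1234"):
    model = GradientBoostingRegressor().fit(X_train, y_train)
    mae = float(np.abs(model.predict(X_val) - y_val).mean())
    mlflow.log_param("algorithm", "gbrt")
    mlflow.log_metric("val_mae", mae)
    # Registering under a per-SKU name lets a selector job compare candidate
    # runs and promote the best-performing version for deployment.
    mlflow.sklearn.log_model(model, "model",
                             registered_model_name="accessory_forecast_sku_1234")
```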

Scaling XGBoost With Spark Connect ML on Grace Blackwell

XGBoost is one of the most widely used off-the-shelf gradient-boosting algorithms for analyzing tabular datasets. Unlike deep learning models, gradient-boosted decision trees require the entire dataset to be in memory for efficient model training. To overcome this limitation, XGBoost features a distributed out-of-core implementation that fetches data in batches, which benefits significantly from the latest NVIDIA GPUs and NVLink-C2C's ultra-high bandwidth. In this talk, we will share our work on optimizing XGBoost for the Grace Blackwell superchip. The fast chip-to-chip link between the CPU and the GPU enables XGBoost to scale up without compromising performance. Our work has effectively increased XGBoost's training capacity to over 1.2TB on a single node. The approach is scalable to GPU clusters using Spark, enabling XGBoost to handle terabytes of data efficiently. We will demonstrate combining XGBoost's out-of-core algorithms with the new Spark Connect ML in Spark 4.0 for large model-training workflows.
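For reference, distributed GPU training with the xgboost.spark estimator looks roughly like this; the dataset path and cluster sizing are placeholders, and the Grace Blackwell out-of-core tuning covered in the talk goes beyond this minimal sketch.

```python
from pyspark.ml.feature import VectorAssembler
from pyspark.sql import SparkSession
from xgboost.spark import SparkXGBRegressor

spark = SparkSession.builder.getOrCreate()
df = spark.read.parquet("/data/tabular")  # hypothetical dataset with a "label" column

feature_cols = [c for c in df.columns if c != "label"]
assembled = VectorAssembler(inputCols=feature_cols, outputCol="features").transform(df)

# num_workers controls how many distributed XGBoost workers Spark launches;
# device="cuda" moves tree construction onto the GPUs.
reg = SparkXGBRegressor(
    features_col="features",
    label_col="label",
    num_workers=4,
    device="cuda",
)
model = reg.fit(assembled)
```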

Spark 4.0 and Delta 4.0 For Streaming Data

Real-time data is one of the most important datasets for any data and AI platform, across any industry. Spark 4.0 and Delta 4.0 include new features that make ingestion and querying of real-time data better than ever before, such as:

- Python custom data sources, for simple ingestion of streaming and batch time-series data sources
- Spark Variant types, for managing the variable data types and JSON payloads that are common in the real-time domain (see the sketch below)
- Delta liquid clustering, for simple data clustering without the overhead or complexity of partitioning

In this presentation you will learn how data teams can leverage these latest features to build industry-leading, real-time data products using Spark and Delta, with real-world examples and metrics of the improvements they bring to performance and processing in the real-time space.
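As a small illustration of the Variant functions mentioned in the list above (the payloads are invented; parse_json and variant_get are the Spark 4.0 APIs):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()
raw = spark.createDataFrame(
    [('{"sensor": "a1", "temp": 21.5}',), ('{"sensor": "a2", "rpm": 900}',)],
    ["payload"],
)

# parse_json stores each payload as a VARIANT, so rows with different shapes
# coexist in one column; variant_get extracts typed fields by path on read.
events = raw.select(F.parse_json("payload").alias("v"))
events.select(
    F.variant_get("v", "$.sensor", "string").alias("sensor"),
    F.variant_get("v", "$.temp", "double").alias("temp"),
).show()
```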

AI Agents in Action: Structuring Unstructured Data on Demand With Databricks and Unstructured

LLM agents aren’t just answering questions — they’re running entire workflows. In this talk, we’ll show how agents can autonomously ingest, process and structure unstructured data using Unstructured, with outputs flowing directly into Databricks. Powered by the Model Context Protocol (MCP), agents can interface with Unstructured’s full suite of capabilities — discovering documents across sources, building ephemeral workflows and exporting structured insights into Delta tables. We’ll walk through a demo where an agent responds to a natural language request, dynamically pulls relevant documents, transforms them into usable data and surfaces insights — fast. Join us for a sneak peek into the future of AI-native data workflows, where LLMs don’t just assist — they operate.

Cracking Complex Documents with Databricks Mosaic AI

In this session, we will share how we are transforming the way organizations process unstructured and non-standard documents using Mosaic AI and agentic patterns within the Databricks ecosystem. We have developed a scalable pipeline that turns complex legal and regulatory content into structured, tabular data. We will walk through the full architecture, which includes Unity Catalog for secure and governed data access, Databricks Vector Search for intelligent indexing and retrieval, and Databricks Apps to deliver clear insights to business users. The solution supports multiple languages and formats, making it suitable for teams working across different regions. We will also discuss some of the key technical challenges we addressed, including handling parsing inconsistencies, grounding model responses and ensuring traceability across the entire process. If you are exploring how to apply GenAI and large language models, this session is for you. Audio for this session is delivered in the conference mobile app; you must bring your own headphones to listen.

Learn to Program, Not Write Prompts, with DSPy

Writing prompts for our GenAI applications is long, tedious, and unmaintainable work. A proper software development lifecycle requires proper testing and maintenance, something incredibly difficult to do on a block of text. Our current prompt-engineering best practices have largely been manual trial and error, testing which of our prompts work well in certain situations. This process worsens as our prompts become more complex, cramming multiple tasks and pieces of functionality into one long, singular prompt. Enter DSPy, your PROGRAMMATIC way of building GenAI applications. Learn how DSPy allows you to modularize your prompts into modules and enforce typing through signatures. Then, utilize state-of-the-art algorithms to optimize the prompts and weights against your evaluation datasets, just like machine learning! We will compare DSPy to a restaurant to help illustrate and demo DSPy's capabilities. It's time to start programming, rather than prompting, again!
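To make the "program, don't prompt" idea concrete, here is a minimal DSPy sketch; the task, field names and model choice are ours, not the speakers'.

```python
import dspy

# Assumes credentials for the chosen model are configured in the environment.
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

class TriageTicket(dspy.Signature):
    """Classify a support ticket and draft a one-line summary."""
    ticket: str = dspy.InputField()
    category: str = dspy.OutputField(desc="one of: billing, bug, how-to")
    summary: str = dspy.OutputField()

# The module, not a hand-written prompt, is the unit you test and maintain.
triage = dspy.ChainOfThought(TriageTicket)
result = triage(ticket="I was charged twice for my May invoice.")
print(result.category, "|", result.summary)
```

From there, optimizers such as dspy.MIPROv2 can tune the underlying prompts against an evaluation dataset, which is the "just like machine learning" step the abstract refers to.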

Petabyte-Scale On-Chain Insights: Real-Time Intelligence for the Next-Gen Financial Backbone

We’ll explore how CipherOwl Inc. constructed a near real-time, multi-chain data lakehouse to power anti-money laundering (AML) monitoring at petabyte scale. We will walk through the end-to-end architecture, which integrates cutting-edge open-source technologies and AI-driven analytics to handle massive on-chain data volumes seamlessly; off-chain intelligence complements this to meet rigorous AML requirements. At the core of our solution is ChainStorage, an open-source project started by Coinbase that provides robust blockchain data ingestion and block-level serving. We enhanced it with Apache Spark™ and Apache Arrow™ for high-throughput processing and efficient data serialization, backed by Delta Lake and Kafka. For the serving layer, we employ StarRocks to deliver lightning-fast SQL analytics over vast datasets. Finally, our system incorporates machine learning and AI agents for continuous data curation and near real-time insights, which are crucial for tackling on-chain AML challenges.

Sponsored by: Neo4j | Get Your Data AI-Ready: Knowledge Graphs & GraphRAG for GenAI Success

Enterprise-grade GenAI needs a unified data strategy for accurate, reliable results. Learn how knowledge graphs make structured and unstructured data AI-ready while enabling governance and transparency, and see how GraphRAG (retrieval-augmented generation with knowledge graphs) drives real success: companies like Klarna have deployed GenAI chatbots grounded in knowledge graphs, improving productivity and trust, while a major gaming company achieved 10x faster insights. We'll share real examples and practical steps for successful GenAI deployment.
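A bare-bones GraphRAG retrieval step might look like the following; the URI, credentials, index name and graph schema are hypothetical, and the question embedding is assumed to come from whatever embedding model you use.

```python
from neo4j import GraphDatabase

driver = GraphDatabase.driver("neo4j://localhost:7687", auth=("neo4j", "password"))

# Vector-search passages, then hop to linked entities so the LLM sees
# graph context rather than isolated text chunks.
CYPHER = """
CALL db.index.vector.queryNodes('doc_embeddings', 5, $qvec)
YIELD node, score
MATCH (node)-[:MENTIONS]->(e:Entity)
RETURN node.text AS passage, collect(e.name) AS entities, score
"""

def graph_retrieve(question_embedding: list[float]) -> list[dict]:
    """Return passages plus surrounding graph context for prompt construction."""
    with driver.session() as session:
        return session.run(CYPHER, qvec=question_embedding).data()
```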

Sponsored by: SAP | SAP and Databricks Open a Bold New Era of Data and AI

SAP and Databricks have formed a landmark partnership that brings together SAP's deep expertise in mission-critical business processes and semantically rich data with Databricks' industry-leading capabilities in AI, machine learning, and advanced data engineering. From curated, SAP-managed data products to zero-copy Delta Sharing integration, discover how SAP Business Data Cloud empowers data and AI professionals to build AI solutions that unlock unparalleled business insights using trusted business data.
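The Delta Sharing side of this amounts to a small amount of client code; the profile file and table coordinates below are placeholders for whatever the SAP Business Data Cloud share actually exposes.

```python
import delta_sharing

# The .share profile file holds the endpoint and bearer token issued by the
# data provider; the fragment names a share, schema and table.
profile = "sap_bdc.share"
table_url = profile + "#finance_share.gl.line_items"

pdf = delta_sharing.load_as_pandas(table_url)  # pull a small table locally
sdf = delta_sharing.load_as_spark(table_url)   # or read in Spark without copying data out of the share
```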

AI and Genie: Analyzing Healthcare Improvement Opportunities

This session is repeated. Improving healthcare impacts us all. We highlight how Premier Inc. took risk-adjusted patient data from more than 1,300 member hospitals across America and applied a natural language interface using AI/BI Genie, allowing our users to discover new insights. The stakes are high: each new insight surfaced represents potential care improvement and lives positively impacted. Using Genie and our AI-ready data in Unity Catalog, our team was able to stand up a Genie instance in three short days, bypassing the cost and time of custom modeling and application development. Additionally, Genie allowed our internal teams to generate complex SQL as much as 10 times faster than writing it by hand. As Genie and lakehouse apps continue to advance rapidly, we are excited to leverage these features by introducing Genie to as many as 20,000 users across hundreds of hospitals. This will support our members’ ongoing mission to enhance the care they provide to the communities they serve.

Bayada’s Snowflake-to-Databricks Migration: Transforming Data for Speed & Efficiency

Bayada is transforming its data ecosystem by consolidating Matillion+Snowflake and SSIS+SQL Server into a unified Enterprise Data Platform powered by Databricks. Using Databricks' Medallion architecture, this platform enables seamless data integration, advanced analytics and machine learning across critical domains like general ledger, recruitment and activity-based costing. Databricks was selected for its scalability, real-time analytics and ability to handle both structured and unstructured data, positioning Bayada for future growth. The migration aims to reduce data processing times by 35%, improve reporting accuracy and cut reconciliation efforts by 40%. Operational costs are projected to decrease by 20%, while real-time analytics is expected to boost efficiency by 15%. Join this session to learn how Bayada is leveraging Databricks to build a high-performance data platform that accelerates insights, drives efficiency and fosters innovation organization-wide.

Beyond Simple RAG: Unlocking Quality, Scale and Cost-Efficient Retrieval With Mosaic AI Vector Search

This session is repeated. Mosaic AI Vector Search is powering high-accuracy retrieval systems in production across a wide range of use cases — including RAG applications, entity resolution, recommendation systems and search. Fully integrated with the Databricks Data Intelligence Platform, it eliminates pipeline maintenance by automatically syncing data from source to index. Over the past year, customers have asked for greater scale, better quality out-of-the-box and cost-efficient performance. This session delivers on those needs — showcasing best practices for implementing high-quality retrieval systems and revealing major product advancements that improve scalability, efficiency and relevance. What you’ll learn:

- How to optimize Vector Search with hybrid retrieval and reranking for better out-of-the-box results (a query sketch follows below)
- Best practices for managing vector indexes with minimal operational overhead
- Real-world examples of how organizations have scaled and improved their search and recommendation systems
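For orientation, a hybrid query against Mosaic AI Vector Search looks roughly like this; the endpoint and index names are placeholders.

```python
from databricks.vector_search.client import VectorSearchClient

client = VectorSearchClient()  # picks up Databricks auth from the environment
index = client.get_index(
    endpoint_name="vs_endpoint",
    index_name="main.docs.chunks_index",
)

# query_type="HYBRID" blends vector similarity with keyword matching,
# one of the out-of-the-box quality levers the session covers.
results = index.similarity_search(
    query_text="How do I rotate service credentials?",
    columns=["doc_id", "chunk"],
    num_results=10,
    query_type="HYBRID",
)
```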

Deploying Databricks Asset Bundles (DABs) at Scale

This session is repeated. Managing data and AI workloads in Databricks can be complex. Databricks Asset Bundles (DABs) simplify this by enabling declarative, Git-driven deployment workflows for notebooks, jobs, Lakeflow Declarative Pipelines, dashboards, ML models and more. Join the DABs team for a deep dive and learn about:

- The basics: understanding Databricks Asset Bundles
- Declare, define and deploy assets; follow best practices; use templates and manage dependencies
- CI/CD & governance: automate deployments with GitHub Actions/Azure DevOps, manage dev vs. prod differences, and ensure reproducibility
- What's new and what's coming up: AI/BI Dashboard support, Databricks Apps support, a Pythonic interface and workspace-based deployment

If you're a data engineer, ML practitioner or platform architect, this talk will provide practical insights to improve reliability, efficiency and compliance in your Databricks workflows.

Empowering Fundraising With AI: A Journey With Databricks Mosaic AI

Artificial Intelligence (AI) is more than a corporate tool; it’s a force for good. At Doctors Without Borders/Médecins Sans Frontières (MSF), we use AI to optimize fundraising, ensuring that every dollar raised directly supports life-saving medical aid worldwide. With Databricks, Mosaic AI and Unity Catalog, we analyze donor behavior, predict giving patterns and personalize outreach, increasing contributions while upholding ethical AI principles. This session will showcase how AI maximizes fundraising impact, enabling faster crisis response and resource allocation. We’ll explore predictive modeling for donor engagement, secure AI governance with Unity Catalog and our vision for generative AI in fundraising, leveraging AI-assisted storytelling to deepen donor connections. AI is not just about efficiency; it’s about saving lives. Join us to see how AI-driven fundraising is transforming humanitarian aid on a global scale.
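As a toy illustration of the donor-engagement propensity modeling mentioned above (the features and data are invented, not MSF's):

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Tiny synthetic stand-in for donor engagement features.
donors = pd.DataFrame({
    "gifts_last_year":        [0, 2, 5, 1, 8, 0, 3, 6],
    "months_since_last_gift": [24, 3, 1, 12, 2, 36, 5, 1],
    "email_opens_90d":        [1, 6, 14, 3, 20, 0, 7, 18],
    "gave_again":             [0, 1, 1, 0, 1, 0, 1, 1],
})
X, y = donors.drop(columns="gave_again"), donors["gave_again"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          stratify=y, random_state=0)

# Predicted probabilities rank donors for personalized outreach.
model = LogisticRegression().fit(X_tr, y_tr)
print("AUC:", roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))
```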

Getting Data AI Ready: Testimonial of Good Governance Practices Constructing Accurate Genie Spaces

Genie Rooms have played an integral role in democratizing important datasets like Cell Tower and Lease Information. However, to ensure that this exciting new release from Databricks was configured as optimally as possible from development to deployment, we needed additional scaffolding around governance. In this talk we will describe the four main components we used in conjunction with the Genie Room to build a successful product, and we will share generalizable lessons to help others get the most out of it. At the core is a declarative, metadata-driven approach to creating UC tables, deployed on a robust framework. Second, a platform that efficiently crowdsourced targeted feedback from different user groups. Third, a tool that balances the LLM’s creativity with human wisdom. And finally, a platform that enforces our principle of separating storage from compute to manage access to the room at a fine-grained level, enabling a whole host of interesting use cases.
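To ground the first component, here is one way a declarative metadata record can drive UC table creation, keeping the table and column comments that a Genie Room reads as context in lockstep with the schema. The names and schema are invented.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical metadata record; in practice these would live in versioned config.
table = {
    "name": "main.network.cell_towers",
    "comment": "One row per cell tower; refreshed nightly.",
    "columns": [
        ("tower_id", "STRING", "Unique tower identifier."),
        ("lat", "DOUBLE", "Latitude in decimal degrees (WGS84)."),
        ("lease_expiry", "DATE", "End date of the current ground lease."),
    ],
}

# Render DDL from the metadata so every column carries its documentation.
cols = ",\n  ".join(f"{n} {t} COMMENT '{c}'" for n, t, c in table["columns"])
spark.sql(
    f"CREATE TABLE IF NOT EXISTS {table['name']} (\n  {cols}\n) "
    f"COMMENT '{table['comment']}'"
)
```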

Harnessing Real-Time Data and AI for Retail Innovation

This talk explores using advanced data processing and generative AI techniques to revolutionize the retail industry. Using Databricks, we will discuss how cutting-edge technologies enable real-time data analysis and machine learning applications, creating a powerful ecosystem for large-scale, data-driven retail solutions. Attendees will gain insights into architecting scalable data pipelines for retail operations and implementing advanced analytics on streaming customer data. Discover how these integrated technologies drive innovation in retail, enhancing customer experiences, streamlining operations and enabling data-driven decision-making. Learn how retailers can leverage these tools to gain a competitive edge in the rapidly evolving digital marketplace, ultimately driving growth and adaptability in the face of changing consumer behaviors and market dynamics.
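A representative slice of such a streaming retail pipeline, sketched with an invented topic, schema and table names:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import DoubleType, StringType, StructType, TimestampType

spark = SparkSession.builder.getOrCreate()

schema = (StructType()
          .add("customer_id", StringType())
          .add("amount", DoubleType())
          .add("ts", TimestampType()))

# Ingest purchase events from Kafka and unpack the JSON payload.
purchases = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "purchases")
    .load()
    .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
    .select("e.*")
)

# Rolling 15-minute revenue per customer, landed in Delta for ML and BI.
revenue = (
    purchases.withWatermark("ts", "30 minutes")
    .groupBy(F.window("ts", "15 minutes"), "customer_id")
    .agg(F.sum("amount").alias("revenue"))
)
(revenue.writeStream
    .format("delta")
    .option("checkpointLocation", "/tmp/chk/revenue")
    .outputMode("append")
    .toTable("retail.gold.customer_revenue_15m"))
```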

Italgas’ AI Factory and the Future of Gas Distribution

At Italgas, Europe’s leading gas distributor by both network size and number of customers, we are spearheading digital transformation through a state-of-the-art, fully fledged Databricks Data Intelligence Platform:

- Achieved 50% cost reduction and 20% performance boost migrating from Azure Synapse to Databricks SQL
- Deployed 41 ML/GenAI models in production, with 100% of workloads governed by Unity Catalog
- Empowered 80% of employees with self-service BI through Genie Dashboards
- Enabled natural language queries for control-room operators analyzing network status

The future of gas distribution is data-driven: predictive maintenance, automated operations, and real-time decision making are now realities. Our AI Factory isn't just digitizing infrastructure; it's creating a more responsive, efficient, and sustainable gas network that anticipates needs before they arise.

LanceDB: A Complete Search and Analytical Store for Serving Production-scale AI Applications

If you're building AI applications, chances are you're solving a retrieval problem somewhere along the way. This is why vector databases are popular today. But if we zoom out from vector search alone, serving AI applications also requires handling KV workloads like a traditional feature store, as well as analytical workloads to explore and visualize data. This means that building an AI application often requires multiple data stores, which means multiple data copies, manual syncing, and extra infrastructure expenses. LanceDB is the first system to support all of these workloads in one place. Powered by the Lance columnar format, LanceDB breaks open the impossible triangle of performance, scalability, and cost for AI serving. Serving AI applications is different from previous waves of technology, and a new paradigm demands new tools.
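A taste of what "one system for all three workloads" means in practice, using the LanceDB Python client and invented sample data:

```python
import lancedb

db = lancedb.connect("./lance_demo")  # embedded and file-based; no extra services

# One table backs vector search, point lookups and analytical scans alike.
tbl = db.create_table("products", data=[
    {"id": "p1", "vector": [0.1, 0.9], "price": 12.0, "title": "red mug"},
    {"id": "p2", "vector": [0.8, 0.2], "price": 30.0, "title": "blue lamp"},
])

nearest = tbl.search([0.15, 0.85]).limit(1).to_pandas()           # vector retrieval
cheap = tbl.search([0.15, 0.85]).where("price < 20").to_pandas()  # filtered search
catalog = tbl.to_pandas()                                         # analytical scan
```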