talk-data.com

Topic: AI/ML (Artificial Intelligence/Machine Learning)

Tags: data_science, algorithms, predictive_analytics

9014 activities tagged

Activity Trend: peak of 1532 activities/quarter, 2020-Q1 to 2026-Q1

Activities

9014 activities · Newest first

Simplifying Training and GenAI Finetuning Using Serverless GPU Compute

The last year has seen rapid progress in open source GenAI models and frameworks. This talk covers best practices for custom training and OSS GenAI fine-tuning on Databricks, powered by the newly announced Serverless GPU Compute. We'll cover how to use Serverless GPU Compute to power AI training and GenAI fine-tuning workloads, along with framework support for libraries such as LLM Foundry, Composer, Hugging Face, and more. Lastly, we'll cover how to leverage MLflow and the Databricks Lakehouse to streamline the end-to-end development of these models. Key takeaways include:
- How Serverless GPU Compute saves customers valuable developer time and overhead when dealing with GPU infrastructure
- Best practices for training custom deep learning models (forecasting, recommendation, personalization) and fine-tuning OSS GenAI models on GPUs across the Databricks stack
- Leveraging distributed GPU training frameworks (e.g., PyTorch, Hugging Face) on Databricks
- Streamlining the path to production for these models
Join us to learn about the newly announced Serverless GPU Compute and the latest updates to GPU training and fine-tuning on Databricks!
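
As a rough illustration of the workflow this abstract describes, here is a minimal fine-tuning sketch using Hugging Face Transformers with MLflow autologging. The model, dataset, and paths are illustrative placeholders; the Serverless GPU Compute-specific APIs announced in the talk are not shown.

```python
# A minimal fine-tuning sketch, assuming a GPU-backed Databricks runtime
# with transformers/datasets installed; model, dataset, and output paths
# are illustrative placeholders.
import mlflow
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

mlflow.autolog()  # capture params and metrics for supported frameworks

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)

train_data = load_dataset("imdb", split="train[:1%]").map(
    lambda batch: tokenizer(batch["text"], truncation=True,
                            padding="max_length"),
    batched=True)

args = TrainingArguments(output_dir="/tmp/finetune-demo",
                         per_device_train_batch_size=8,
                         num_train_epochs=1)

Trainer(model=model, args=args, train_dataset=train_data).train()
```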

Sponsored by: Microsoft | Leverage the power of the Microsoft Ecosystem with Azure Databricks

Join us for this insightful session to learn how you can leverage the power of the Microsoft ecosystem along with Azure Databricks to take your business to the next level. Azure Databricks is a fully integrated, native, first-party solution on Microsoft Azure. Databricks and Microsoft continue to actively collaborate on product development, ensuring tight integration, optimized performance and a streamlined support experience. Azure Databricks offers seamless integrations with Power BI, Azure OpenAI, Microsoft Purview, Azure Data Lake Storage (ADLS) and Foundry. In this session, you'll learn how deep integration between Azure Databricks and Microsoft solutions can empower your organization to do more with your data estate. You'll also get an exclusive sneak peek into the product roadmap.

Transforming Financial Intelligence with FactSet Structured and Unstructured Data and Delta Sharing

Join us to explore the dynamic partnership between FactSet and Databricks, transforming data accessibility and insights. Discover the launch of FactSet’s Structured DataFeeds via Delta Sharing on the Databricks Marketplace, enhancing access to crucial financial data insights. Learn about the advantages of streamlined data delivery and how this integration empowers data ecosystems. Beyond structured data, explore the innovative potential of vectorized data sharing of unstructured content such as news, transcripts, and filings. Gain insights into the importance of seamless vectorized data delivery to support GenAI applications and how FactSet is preparing to simplify client GenAI workflows with AI-ready data. Experience a demo that showcases the complete journey from data delivery to actionable GenAI application responses in a real-world Financial Services scenario. See firsthand how FactSet is simplifying client GenAI workflows with AI-ready data that drives faster, more informed financial decisions.

Transforming HP’s Print ELT Reporting with GenIT: Real-Time Insights Tool Powered by Databricks AI

Timely and actionable insights are critical for staying competitive in today's fast-paced environment. At HP Print, manual reporting for the executive leadership team (ELT) has been labor-intensive, hindering agility and productivity. To address this, we developed the Generative Insights Tool (GenIT) using Databricks Genie and Mosaic AI to create a real-time insights engine that automates SQL generation, data visualization and narrative creation. GenIT delivers instant insights, enabling faster decisions, greater productivity and improved consistency while empowering leaders to respond to printer trends. With automated querying, AI-powered narratives and a chatbot, GenIT reduces inefficiencies and ensures quality data and insights. Our roadmap integrates multi-modal data, enhances chatbot functionality and scales globally. This initiative shows how HP Print leverages GenAI to improve decision-making, efficiency and agility, and we will showcase this transformation at the Databricks AI Summit.

Unify Your Data and Governance With Lakehouse Federation

In today's data landscape, organizations often grapple with fragmented data spread across various databases, data warehouses and catalogs. Lakehouse Federation addresses this challenge by enabling seamless discovery, querying, and governance of distributed data without the need for duplication or migration. This session will explore how Lakehouse Federation integrates external data sources like Hive Metastore, Snowflake, SQL Server and more into a unified interface, providing consistent access controls, lineage tracking and auditing across your entire data estate. Learn how to streamline analytics and AI workloads, enhance compliance and reduce operational complexity by leveraging a single, cohesive platform for all your data needs.
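
For readers unfamiliar with the setup, a minimal sketch of federating a Snowflake database from a Databricks notebook follows; the connection, catalog, credential and table names are all illustrative, and production setups should store credentials in Databricks secrets.

```python
# A minimal Lakehouse Federation sketch, run from a Databricks notebook
# where `spark` is predefined; names and option values are illustrative.
spark.sql("""
  CREATE CONNECTION IF NOT EXISTS snowflake_conn TYPE snowflake
  OPTIONS (
    host 'myaccount.snowflakecomputing.com',
    port '443',
    sfWarehouse 'COMPUTE_WH',
    user '<username>',      -- use Databricks secrets in practice
    password '<password>'
  )
""")

spark.sql("""
  CREATE FOREIGN CATALOG IF NOT EXISTS snowflake_cat
  USING CONNECTION snowflake_conn
  OPTIONS (database 'ANALYTICS')
""")

# Federated tables are then queried like any Unity Catalog table.
spark.sql("SELECT * FROM snowflake_cat.public.orders LIMIT 10").show()
```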

Unlocking AI Value: Build AI Agents on SAP Data in Databricks

Discover how enterprises are turning SAP data into intelligent AI. By tapping into contextual SAP data through Delta Sharing on Databricks (no messy ETL needed), they're accelerating AI innovation and business insights. Learn how they:
- Build domain-specific AI that can reason on private SAP data
- Deliver data intelligence to power insights for business leaders
- Govern and secure their new unified data estate
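
As a rough sketch of the consumption side, the open-source delta-sharing client reads a share like this; the profile path and the share, schema and table names are hypothetical.

```python
# A consumer-side Delta Sharing sketch; the profile file comes from the
# data provider, and all share/schema/table names are hypothetical.
import delta_sharing

profile = "/path/to/config.share"
client = delta_sharing.SharingClient(profile)
print(client.list_all_tables())  # discover what the provider shared

# Load one shared table straight into pandas, no ETL pipeline required.
df = delta_sharing.load_as_pandas(profile + "#sap_share.finance.sales_orders")
print(df.head())
```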

AI Agents for Marketing: Leveraging Mosaic AI to Create a Multi-Purpose Agentic Marketing Assistant

Marketing professionals build campaigns, create content and use effective copywriting to tell a good story that promotes a product or offer. All of this requires a thorough and meticulous process for every individual campaign. To assist marketing professionals at 7-Eleven, we built a multi-purpose assistant that could:
- Use campaign briefs to generate campaign ideas and taglines
- Do copywriting for marketing content
- Verify images for messaging accuracy
- Answer general questions and browse the web as a generic assistant
We will walk you through how we created multiple agents as different personas with LangGraph and Mosaic AI, producing a chat assistant that assumes a different persona based on the user query. We will also explain our evaluation methodology for choosing models and prompts and how we implemented guardrails for high reliability with sensitive marketing content. This 7-Eleven assistant was showcased at the Databricks booth at NRF earlier this year.
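
A minimal LangGraph persona-routing sketch in the spirit of the assistant described above; the routing rule and stub personas are illustrative stand-ins, not 7-Eleven's implementation.

```python
# A toy persona router with LangGraph; replace the stub replies with real
# LLM calls. This is an illustrative sketch, not 7-Eleven's implementation.
from typing import TypedDict
from langgraph.graph import StateGraph, END

class ChatState(TypedDict):
    query: str
    answer: str

def route(state: ChatState) -> str:
    q = state["query"].lower()
    return "campaign" if ("campaign" in q or "tagline" in q) else "general"

def campaign_agent(state: ChatState) -> dict:
    return {"answer": "campaign-persona reply"}   # LLM call goes here

def general_agent(state: ChatState) -> dict:
    return {"answer": "general-persona reply"}    # LLM call goes here

graph = StateGraph(ChatState)
graph.add_node("campaign", campaign_agent)
graph.add_node("general", general_agent)
graph.set_conditional_entry_point(route, {"campaign": "campaign",
                                          "general": "general"})
graph.add_edge("campaign", END)
graph.add_edge("general", END)

app = graph.compile()
print(app.invoke({"query": "Write a tagline for our summer offer",
                  "answer": ""}))
```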

AI/BI Driving Speed to Value in Supply Chain

Conagra is a global food manufacturer with $12.2B in revenue, 18K+ employees and 45+ plants in the US, Canada and Mexico. Conagra's Supply Chain organization is heavily focused on delivering results in productivity, waste reduction, inventory rationalization, safety and customer service levels. By migrating the Supply Chain reporting suite to Databricks over the past two years, Conagra's Supply Chain Analytics & Data Science team has been able to deliver new AI solutions that complement traditional BI platforms and lay the foundation for additional AI/ML applications in the future. With Databricks Genie integrated within traditional BI reports, Conagra Supply Chain users can now go from insight to action faster and with fewer clicks, enabling speed to value in a complex supply chain. The Databricks platform also allows the team to curate data products to be consumed by traditional BI applications today, as well as the ability to rapidly scale for the AI/ML applications of tomorrow.

Best Practices for Building User-Facing AI Systems on Databricks

This session is repeated. Integrating AI agents into business systems requires tailored approaches for different maturity levels (crawl-walk-run) that balance scalability, accuracy and usability. This session addresses the critical challenge of making AI agents accessible to business users. We will explore four key integration methods:
- Databricks Apps: the fastest way to build and run applications that leverage your data, with the full security and governance of Databricks
- Genie: a tool enabling non-technical users to gain insights on structured data through natural language queries
- Chatbots: combining real-time data retrieval with generative AI for contextual responses and process automation
- Batch inference: scalable, asynchronous processing for large-scale AI tasks, optimizing efficiency and cost (see the sketch below)
We'll compare these approaches, discussing their strengths, challenges and ideal use cases to help businesses select the most suitable integration strategy for their specific needs.
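
As one concrete example of the batch pattern, Databricks SQL's ai_query() function can be called from PySpark; the serving endpoint and table names below are hypothetical.

```python
# A batch-inference sketch using the ai_query() SQL function from a
# Databricks notebook; endpoint and table names are hypothetical.
summaries = spark.sql("""
  SELECT id,
         ai_query('my-llm-endpoint',
                  CONCAT('Summarize this support ticket: ', body)) AS summary
  FROM support.tickets
""")
summaries.write.mode("overwrite").saveAsTable("support.ticket_summaries")
```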

Building Responsible and Resilient AI: The Databricks AI Governance Framework

GenAI and machine learning are reshaping industries, driving innovation and redefining business strategies. As organizations embrace these technologies, they face significant challenges in managing AI initiatives effectively, such as balancing innovation with ethical integrity, operational resilience and regulatory compliance. This presentation introduces the Databricks AI Governance Framework (DAGF), a practical framework designed to empower organizations to navigate the complexities of AI. It provides strategies for building scalable, responsible AI programs that deliver measurable value, foster innovation and achieve long-term success. By examining the framework's five foundational pillars (AI organization; legal and regulatory compliance; ethics, transparency and interpretability; AI operations and infrastructure; and AI security), this session highlights how AI governance aligns programs with the organization's strategic goals, mitigates risks and builds trust across stakeholders.

Creating LLM Judges to Measure Domain-Specific Agent Quality

This session is repeated. Measuring the effectiveness of domain-specific AI agents requires specialized evaluation frameworks that go beyond standard LLM benchmarks. This session explores methodologies for assessing agent quality across specialized knowledge domains, tailored workflows and task-specific objectives. We'll demonstrate practical approaches to designing robust LLM judges that align with your business goals and provide meaningful insights into agent capabilities and limitations. Key session takeaways include:
- Tools for creating domain-relevant evaluation datasets and benchmarks that accurately reflect real-world use cases
- An approach for creating LLM judges to measure domain-specific metrics
- Strategies for interpreting those results to drive iterative improvement in agent performance
Join us to learn how proper evaluation methodologies can transform your domain-specific agents from experimental tools into trusted enterprise solutions with measurable business value.
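
A generic judge sketch along these lines is shown below: `call_llm` is a hypothetical stand-in for whatever model client you use, and the rubric and 1-5 scale are illustrative.

```python
# A generic domain-specific LLM judge sketch; `call_llm` is a hypothetical
# placeholder for your model client.
import json

JUDGE_PROMPT = """You are grading an AI agent's answer for a {domain} task.
Rubric: {rubric}
Question: {question}
Agent answer: {answer}
Return JSON: {{"score": <integer 1-5>, "rationale": "<one sentence>"}}"""

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your model client here")

def judge(question: str, answer: str, domain: str, rubric: str) -> dict:
    raw = call_llm(JUDGE_PROMPT.format(domain=domain, rubric=rubric,
                                       question=question, answer=answer))
    verdict = json.loads(raw)
    if not 1 <= verdict["score"] <= 5:
        raise ValueError("malformed grade from judge")  # guard bad outputs
    return verdict
```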

GPU Accelerated Spark Connect

Spark Connect, first included for the SQL/DataFrame API in Apache Spark 3.4 and recently extended to MLlib in 4.0, introduced a new way to run Spark applications over a gRPC protocol. This has many benefits, including easier adoption for non-JVM clients, version independence from applications, and increased stability and security of the associated Spark clusters. The recent Spark Connect extension for ML also included a plugin interface for configuring enhanced server-side implementations of the MLlib algorithms when launching the server. In this talk, we will demonstrate how this new interface, together with Spark SQL's existing plugin interface, can be used with NVIDIA GPU-accelerated plugins for ML and SQL to enable no-code-change, end-to-end GPU acceleration of Spark ETL and ML applications over Spark Connect, with speedups of up to 9x and cost reductions of up to 80% compared to CPU baselines.
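
Client code is unchanged under this setup. A minimal sketch, assuming a Spark Connect server already launched with the RAPIDS plugins enabled; the remote URL is illustrative.

```python
# Connecting over Spark Connect (PySpark 3.4+); the host is illustrative.
# GPU acceleration is configured server-side when launching the cluster,
# e.g. spark.plugins=com.nvidia.spark.SQLPlugin, spark.rapids.sql.enabled=true
from pyspark.sql import SparkSession

spark = SparkSession.builder.remote("sc://spark-connect-host:15002").getOrCreate()

# The client's DataFrame code is identical with or without GPU plugins.
df = spark.range(0, 10_000_000).selectExpr("id % 100 AS key", "id AS value")
df.groupBy("key").sum("value").show(5)
```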

How an Open, Scalable and Secure Data Platform is Powering Quick Commerce Swiggy's AI

Swiggy, India's leading quick commerce platform, serves ~13 million users across 653 cities, with 196,000 restaurant partners and 17,000 SKUs. To handle this scale, Swiggy developed a secure, scalable AI platform processing millions of predictions per second. The tech stack includes Apache Kafka for real-time streaming, Apache Spark on Databricks for analytics and ML, and Apache Flink for stream processing. The Lakehouse architecture on Delta ensures data reliability, while Unity Catalog enables centralized access control and auditing. These technologies power critical AI applications like demand forecasting, route optimization, personalized recommendations, predictive delivery SLAs and generative AI use cases. Key takeaway: this session explores building a data platform at scale, focusing on cost efficiency, simplicity and speed, empowering Swiggy to seamlessly support millions of users and AI use cases.
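
A minimal sketch of the Kafka-to-Delta pattern such a stack relies on; the broker, topic, checkpoint and table names are illustrative.

```python
# Kafka-to-Delta structured streaming sketch, run from a Databricks
# notebook where `spark` is predefined; all names are illustrative.
events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")
          .option("subscribe", "order-events")
          .load())

parsed = events.selectExpr("CAST(key AS STRING) AS order_id",
                           "CAST(value AS STRING) AS payload",
                           "timestamp")

(parsed.writeStream
 .format("delta")
 .option("checkpointLocation", "/chk/order-events")  # exactly-once bookkeeping
 .toTable("bronze.order_events"))
```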

How to Get the Most Out of Your BI Tools on Databricks

Unlock the full potential of your BI tools with Databricks. This session explores how features like Photon, Databricks SQL, Liquid Clustering, AI/BI Genie and Publish to Power BI enhance performance, scalability and user experience. Learn how Databricks accelerates query performance, optimizes data layouts and integrates seamlessly with BI tools. Gain actionable insights and best practices to improve analytics efficiency, reduce latency and drive better decision-making. Whether migrating from a data warehouse or optimizing an existing setup, this talk provides the strategies to elevate your BI capabilities.
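
As a small illustration of one of these features, Liquid Clustering is enabled with a CLUSTER BY clause; the table and column names below are hypothetical.

```python
# A Liquid Clustering sketch from a Databricks notebook; table and column
# names are hypothetical.
spark.sql("""
  CREATE TABLE IF NOT EXISTS sales.orders (
    order_id BIGINT, customer_id BIGINT, order_date DATE, amount DOUBLE
  ) CLUSTER BY (customer_id, order_date)
""")
spark.sql("OPTIMIZE sales.orders")  # incrementally clusters new data
```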

Lakeflow Connect: The Game-Changer for Complex Event-Driven Architectures

In 2020, Delaware implemented a state-of-the-art, event-driven architecture for EFSA, enabling a highly decoupled system landscape, presented at the Data & AI Summit 2021. By centrally brokering events in near real time, consumer applications react instantly to events from producer applications as they occur. Event producers are decoupled from consumers via a publisher/subscriber mechanism. Over the past years, we noticed some drawbacks: the processing of these custom events, primarily aimed at process integration, didn't cover all edge cases; data quality was not always optimal due to missing events; and we needed complex logic for SCD2 tables. Lakeflow Connect allows us to extract the data directly from the source without the complex architecture in between, avoiding data loss and thus data quality issues, and with some simple adjustments an SCD2 table is created automatically. Lakeflow Connect allows us to create more efficient and intelligent data provisioning.
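
A minimal sketch of the automated SCD2 piece, using the apply_changes API from the dlt Python package; the source and target names are illustrative.

```python
# An SCD2 sketch with the Lakeflow Declarative Pipelines (dlt) Python API;
# the CDC source and target table names are illustrative.
import dlt

dlt.create_streaming_table("customers_scd2")

dlt.apply_changes(
    target="customers_scd2",
    source="customers_cdc_feed",    # change feed ingested by Lakeflow Connect
    keys=["customer_id"],
    sequence_by="event_timestamp",  # ordering column for change events
    stored_as_scd_type=2,           # keep full history with validity ranges
)
```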

In today's rapidly evolving digital landscape, organizations must prioritize robust data architectures and AI strategies to remain competitive. In this session, we will explore how Procter & Gamble (P&G) has embarked on a transformative journey to digitize its operations via scalable data, analytics and AI platforms, establishing a strong foundation for data-driven decision-making and the emergence of agentic AI. Join us as we delve into the comprehensive architecture and platform initiatives undertaken at P&G to create scalable and agile data platforms that unleash BI/AI value. We will discuss our approach to implementing data governance and semantics, ensuring data integrity and accessibility across the organization. By leveraging advanced analytics and Business Intelligence (BI) tools, we will illustrate how P&G harnesses data to generate actionable insights at scale, all while maintaining security and speed.

Let the LLM Write the Prompts: An Intro to DSPy in Compound AI Pipelines

Large Language Models (LLMs) excel at understanding messy, real-world data, but integrating them into production systems remains challenging. Prompts can be unruly to write, vary by model, and be difficult to manage in the larger context of a pipeline. In this session, we'll demonstrate how to incorporate LLMs into a geospatial conflation pipeline using DSPy. We'll discuss how DSPy works under the covers and highlight the benefits it provides to pipeline creators and managers.
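
A toy DSPy example in the spirit of the talk: declare the task as a signature and let DSPy manage the prompt. The model name and the conflation task are illustrative, not the speakers' pipeline.

```python
# A toy DSPy sketch for an address-conflation style task; the model name
# and task are illustrative.
import dspy

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

class MatchAddresses(dspy.Signature):
    """Decide whether two messy address strings refer to the same place."""
    address_a: str = dspy.InputField()
    address_b: str = dspy.InputField()
    same_place: bool = dspy.OutputField()

matcher = dspy.ChainOfThought(MatchAddresses)
result = matcher(address_a="1 Main St., Springfield",
                 address_b="One Main Street, Springfield IL")
print(result.same_place, result.reasoning)
```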

Marketing Data + AI Leaders Forum
Talk by Derek Slager (Amperity), Dan Morris (Databricks), Calen Holbrooks (Airtable), Bryan Saftler (Databricks), Elizabeth Dobbs (Databricks), David Geisinger (Deloitte), Kristen Brophy (ThredUp), Joyce Hwang (Dropbox), Zeynep Inanoglu Ozdemir (Atlassian Pty Ltd.), Alex Dean (Snowplow), Rick Schultz (Databricks), Bryce Peake (Domino's), Julie Foley Long (Grammarly)

Join us Tuesday, June 10th, 9:10 AM-12:10 PM PT. Hosted by Databricks CMO Rick Schultz, this forum features executives and speakers from PetSmart, Valentino, Domino's, Airtable, Dropbox, ThredUp, Grammarly, Deloitte and more. Come for actionable strategies and real-world examples: hear from marketing experts on how to build data- and AI-driven marketing organizations, and learn how Databricks Marketing supercharges impact using the Data Intelligence Platform, scaling personalization, building more efficient campaigns and empowering marketers to self-serve insights.

Responsible AI at Scale: Balancing Democratization and Regulation in the Financial Sector

We partnered with Databricks to pioneer a new standard in the financial sector's enterprise AI, balancing rapid AI democratization with strict regulatory and security requirements. At the core is our Responsible AI Gateway, which enforces jailbreak prevention and compliance on every LLM query. Real-time observability, powered by Databricks, calculates risk and accuracy metrics, detecting issues before they escalate. Leveraging Databricks' model hosting ensures scalable LLM access while fortifying security and efficiency. We built frameworks to democratize AI without compromising guardrails. Operating in a regulated environment, we showcase how Databricks enables democratization and responsible AI at scale, offering best practices for financial organizations to harness AI safely and efficiently.
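
A conceptual sketch of the gateway pattern described above; both helper functions are hypothetical stand-ins (real systems use trained classifiers and hosted model clients, not keyword lists).

```python
# A conceptual gateway sketch; check_jailbreak and call_model are
# hypothetical stand-ins, not the speakers' implementation.
def check_jailbreak(text: str) -> bool:
    blocked = ("ignore previous instructions", "bypass your rules")
    return any(p in text.lower() for p in blocked)  # real systems: classifiers

def call_model(prompt: str) -> str:
    raise NotImplementedError("plug in your hosted model client here")

def gateway(prompt: str) -> str:
    if check_jailbreak(prompt):
        return "Request blocked by policy."
    response = call_model(prompt)
    if check_jailbreak(response):  # screen model output as well
        return "Response withheld by policy."
    return response
```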

Simplifying Data Pipelines With Lakeflow Declarative Pipelines: A Beginner’s Guide

As part of the new Lakeflow data engineering experience, Lakeflow Declarative Pipelines makes it easy to build and manage reliable data pipelines. It unifies batch and streaming, reduces operational complexity and ensures dependable data delivery at scale, from batch ETL to real-time processing. Lakeflow Declarative Pipelines excels at declarative change data capture, batch and streaming workloads, and efficient SQL-based pipelines. In this session, you'll learn how we've reimagined data pipelining with Lakeflow Declarative Pipelines, including:
- A brand new pipeline editor that simplifies transformations
- Serverless compute modes to optimize for performance or cost
- Full Unity Catalog integration for governance and lineage
- Reading/writing data with Kafka and custom sources
- Monitoring and observability for operational excellence
- "Real-time Mode" for ultra-low-latency streaming
Join us to see how Lakeflow Declarative Pipelines powers better analytics and AI with reliable, unified pipelines.
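
A minimal declarative pipeline sketch with the dlt Python API, reading Kafka into a streaming table with a data-quality expectation; the broker and topic names are illustrative.

```python
# A Lakeflow Declarative Pipelines (dlt) sketch; broker and topic names
# are illustrative. Runs inside a declarative pipeline, where `spark`
# is predefined.
import dlt
from pyspark.sql.functions import col

@dlt.table(comment="Raw events ingested from Kafka")
@dlt.expect_or_drop("valid_key", "key IS NOT NULL")  # drop malformed rows
def raw_events():
    return (spark.readStream.format("kafka")
            .option("kafka.bootstrap.servers", "broker:9092")
            .option("subscribe", "events")
            .load()
            .select(col("key").cast("string"),
                    col("value").cast("string"),
                    col("timestamp")))
```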