talk-data.com

Topic

Databricks

big_data analytics spark

1286 tagged

Activity Trend: 515 peak/qtr (2020-Q1 to 2026-Q1)

Activities

1286 activities · Newest first

Building Real-Time Sport Model Insights with Spark Structured Streaming

In the dynamic world of sports betting, precision and adaptability are key. Sports traders must navigate risk management, limitations of data feeds, and much more to prevent small model miscalculations from causing significant losses. To ensure accurate real-time pricing of hundreds of interdependent markets, traders provide key inputs such as player skill-level adjustments, whilst maintaining precise correlations. Black-box models aren't enough: constant feedback loops drive informed, accurate decisions. Join DraftKings as we showcase how we expose real-time metrics from our simulation engine to empower traders with deeper insights into how their inputs shape the model. Using Spark Structured Streaming, Kafka, and Databricks dashboards, we transform raw simulation outputs into actionable data. This transparency into our engines enables fine-grained control over pricing, leading to more accurate odds, a more efficient sportsbook, and an elevated customer experience.
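
As a rough illustration of the pattern described above (not DraftKings' actual pipeline), a Structured Streaming job on Databricks can read simulation metrics from Kafka, parse them, and append them to a Delta table that a dashboard queries. Broker, topic, schema and table names here are hypothetical, and `spark` is the ambient session in a Databricks notebook:

    from pyspark.sql import functions as F
    from pyspark.sql.types import (StructType, StructField, StringType,
                                   DoubleType, TimestampType)

    # Schema of the simulation metric messages (hypothetical)
    metric_schema = StructType([
        StructField("market_id", StringType()),
        StructField("metric", StringType()),
        StructField("value", DoubleType()),
        StructField("event_time", TimestampType()),
    ])

    # Read the raw Kafka stream (broker and topic are placeholders)
    raw = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", "broker:9092")
           .option("subscribe", "simulation-metrics")
           .load())

    # Parse the JSON payload into typed columns
    parsed = (raw
              .select(F.from_json(F.col("value").cast("string"), metric_schema).alias("m"))
              .select("m.*"))

    # Append continuously to a Delta table that a dashboard can query
    (parsed.writeStream
     .format("delta")
     .option("checkpointLocation", "/checkpoints/simulation_metrics")
     .outputMode("append")
     .toTable("analytics.simulation_metrics"))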

Comprehensive Data Management and Governance With Azure Data Lake Storage

Given that data is the new oil, it must be treated as such. Organizations that pursue greater insight into their businesses and their customers must manage, govern, protect and observe the use of the data that drives these insights in an efficient, cost-effective, compliant and auditable manner without degrading access to that data. Azure Data Lake Storage offers many features which allow customers to apply such controls and protections to their critical data assets. Understanding how these features behave, including their granularity, cost and scale implications and the degree of control or protection they apply, is essential to implementing a data lake that reflects the value it contains. In this session, the various data protection, governance and management capabilities available now and upcoming in ADLS will be discussed, including how deep integration with Azure Databricks can provide more comprehensive, end-to-end coverage for these concerns, yielding a highly efficient and effective data governance solution.

Delta Lake and the Data Mesh

Delta Lake has proven to be an excellent storage format. Coupled with the Databricks platform, the storage format has shined as a component of a distributed system on the lakehouse. The pairing of Delta and Spark provides an excellent platform, but users often struggle to perform comparable work outside of the Spark ecosystem. Tools such as delta-rs, Polars and DuckDB have brought access to users outside of Spark, but they are only building blocks of a larger system. In this 40-minute talk we will demonstrate how users can use data products on the Nextdata OS data mesh to interact with the Databricks platform to drive Delta Lake workflows. Additionally, we will show how users can build autonomous data products that interact with their Delta tables both inside and outside of the lakehouse platform. Attendees will learn how to integrate the Nextdata OS data mesh with the Databricks platform as both an external and integral component.
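
As a small illustration of the non-Spark access mentioned above (this is not the Nextdata OS integration itself, and the table path is hypothetical), the delta-rs Python bindings, Polars and DuckDB can each read the same Delta table:

    from deltalake import DeltaTable
    import polars as pl
    import duckdb

    path = "/data/delta/orders"  # hypothetical Delta table location

    # delta-rs: load the table and materialize it as a pandas DataFrame
    df_pandas = DeltaTable(path).to_pandas()

    # Polars: scan the same table lazily and collect
    df_polars = pl.scan_delta(path).collect()

    # DuckDB: query the table with SQL through its delta extension
    con = duckdb.connect()
    con.execute("INSTALL delta")
    con.execute("LOAD delta")
    df_duck = con.sql(f"SELECT count(*) AS n FROM delta_scan('{path}')").df()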

Media enterprises generate vast amounts of visual content, but unlocking its full potential requires multimodal AI at scale. Coactive AI and NBCUniversal’s Corporate Decision Sciences team are transforming how enterprises discover and understand visual content. We explore how Coactive AI and Databricks — from Delta Share to Genie — can revolutionize media content search, tagging and enrichment, enabling new levels of collaboration. Attendees will see how this AI-powered approach fuels AI workflows, enhances BI insights and drives new applications — from automating cut sheet generation to improving content compliance and recommendations. By structuring and sharing enriched media metadata, Coactive AI and NBCU are unlocking deeper intelligence and laying the groundwork for agentic AI systems that retrieve, interpret and act on visual content. This session will showcase real-world examples of these AI agents and how they can reshape future content discovery and media workflows.

How Corning Harnesses Unity Catalog for Enhanced FinOps Maturity and Cost Optimization

We will explore how leveraging Databricks' Unity Catalog has accelerated our FinOps maturity, enabling us to optimize platform utilization and achieve significant cost reductions. By implementing Unity Catalog, we've gained comprehensive visibility and governance over our data assets, leading to more informed decision-making and efficient resource allocation. Learn how Corning uncovered actionable insights, the best practices it applies to Unity Catalog to streamline data management and enhance financial operations, and how these approaches can drive substantial savings in your own organization.
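
One concrete building block for this kind of FinOps visibility (a generic sketch, not Corning's implementation) is querying the Unity Catalog system tables that record billable usage, for example aggregating DBUs by SKU per month:

    # Aggregate billable usage by month and SKU from Databricks system tables
    # (generic sketch; system.billing.usage columns simplified for illustration)
    monthly_usage = spark.sql("""
        SELECT date_trunc('month', usage_date) AS month,
               sku_name,
               SUM(usage_quantity)             AS dbus
        FROM system.billing.usage
        GROUP BY 1, 2
        ORDER BY month, dbus DESC
    """)
    monthly_usage.show(truncate=False)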

Migrating Legacy SAS Code to Databricks Lakehouse: What We Learned Along the Way

At PacificSource Health Plans, a US health insurance company, we are on a successful multi-year journey to migrate our entire data and analytics ecosystem to the Databricks Enterprise Data Warehouse (lakehouse). A particular obstacle on this journey was a reporting data mart that relied on copious amounts of legacy SAS code applying sophisticated business logic transformations for membership, claims, premiums and reserves. This core data mart was driving many of our critical reports and analytics. In this session we will share the unique and somewhat unexpected challenges and complexities we encountered in migrating this legacy SAS code, how our partner (T1A) leveraged automation technology (Alchemist) and some unique approaches to reverse engineer (analyze), instrument, translate, migrate, validate and reconcile these jobs, and the lessons we learned and carried forward from this migration effort.

Orchestration With Lakeflow Jobs

This session is repeated. Curious about orchestrating data pipelines on Databricks? Join us for an introduction to Lakeflow Jobs (formerly Databricks Workflows) — an easy-to-use orchestration service built into the Databricks Data Intelligence Platform. Lakeflow Jobs simplifies automating your data and AI workflows, from ETL pipelines to machine learning model training. In this beginner-friendly session, you'll learn how to:
- Build and manage pipelines using a visual approach
- Monitor workflows and rerun failures with repair runs
- Automate tasks like publishing dashboards or ingesting data using Lakeflow Connect
- Add smart triggers that respond to new files or table updates
- Use built-in loops and conditions to reduce manual work and make workflows more dynamic
We'll walk through common use cases, share demos and offer tips to help you get started quickly. If you're new to orchestration or just getting started with Databricks, this session is for you.
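
For readers who prefer a programmatic sketch of the trigger idea mentioned above (outside the visual approach the session focuses on), a job with a file-arrival trigger can be created through the Jobs 2.1 REST API. The workspace URL, token, notebook path and volume path below are placeholders, and compute settings are omitted for brevity:

    import requests

    workspace = "https://<workspace-host>"          # placeholder
    headers = {"Authorization": "Bearer <token>"}   # placeholder personal access token

    job_spec = {
        "name": "ingest-new-files",
        "tasks": [
            {
                "task_key": "ingest",
                # Compute settings omitted; notebook path is hypothetical
                "notebook_task": {"notebook_path": "/Workspace/pipelines/ingest"},
            }
        ],
        # File-arrival trigger: run whenever new files land in this (hypothetical) volume
        "trigger": {"file_arrival": {"url": "/Volumes/main/raw/landing/"}},
    }

    resp = requests.post(f"{workspace}/api/2.1/jobs/create", headers=headers, json=job_spec)
    resp.raise_for_status()
    print("Created job", resp.json()["job_id"])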

Revolutionizing Data Insights and the Buyer Experience at GM Financial with Cloud Data Modernization

Deloitte and GM (General Motors) Financial have collaborated to design and implement a cutting-edge cloud analytics platform, leveraging Databricks. In this session, we will explore how we overcame challenges including dispersed and limited data capabilities, high-cost hardware and outdated software, with a strategic and comprehensive approach. With the help of Deloitte and Databricks, we were able to develop a unified Customer360 view, integrate advanced AI-driven analytics, and establish robust data governance and cybersecurity measures. Attendees will gain valuable insights into the benefits realized, such as cost savings, enhanced customer experiences, and broad employee upskilling opportunities. Unlock the impact of cloud data modernization and advanced analytics in the automotive finance industry and beyond with Deloitte and Databricks.

Securing Data Collaboration: A Deep Dive Into Security, Frameworks, and Use Cases

This session will focus on the security aspects of Databricks Delta Sharing, Databricks Cleanrooms and Databricks Marketplace, providing an exploration of how these solutions enable secure and scalable data collaboration while prioritizing privacy. Highlights:
- Use cases — Understand how Delta Sharing facilitates governed, real-time data exchange across platforms and how Cleanrooms support multi-party analytics without exposing sensitive information
- Security internals — Dive into Delta Sharing's security frameworks
- Dynamic views — Learn about fine-grained security controls
- Privacy-first Cleanrooms — Explore how Cleanrooms enable secure analytics while maintaining strict data privacy standards
- Private exchanges — Explore the role of private exchanges using Databricks Marketplace in securely sharing custom datasets and AI models with specific partners or subsidiaries
- Network security & compliance — Review best practices for network configurations and compliance measures
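
For context on the dynamic views highlighted above, a view can reveal a column only to callers in a privileged group. This is a minimal sketch using the documented is_account_group_member() predicate; the table, view and group names are hypothetical:

    # A dynamic view that masks an email column for users outside a privileged group
    spark.sql("""
        CREATE OR REPLACE VIEW sales.customers_masked AS
        SELECT
          CASE WHEN is_account_group_member('pii_readers') THEN customer_email
               ELSE '***REDACTED***'
          END AS customer_email,
          region,
          lifetime_value
        FROM sales.customers
    """)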

Simplifying Training and GenAI Finetuning Using Serverless GPU Compute

The last year has seen rapid progress in open source GenAI models and frameworks. This talk covers best practices for custom training and OSS GenAI finetuning on Databricks, powered by the newly announced Serverless GPU Compute. We'll cover how to use Serverless GPU Compute to power AI training and GenAI finetuning workloads, and framework support for libraries like LLM Foundry, Composer, Hugging Face, and more. Lastly, we'll cover how to leverage MLflow and the Databricks Lakehouse to streamline the end-to-end development of these models. Key takeaways include:
- How Serverless GPU Compute saves customers valuable developer time and overhead when dealing with GPU infrastructure
- Best practices for training custom deep learning models (forecasting, recommendation, personalization) and finetuning OSS GenAI models on GPUs across the Databricks stack
- Leveraging distributed GPU training frameworks (e.g., PyTorch, Hugging Face) on Databricks
- Streamlining the path to production for these models
Join us to learn about the newly announced Serverless GPU Compute and the latest updates to GPU training and finetuning on Databricks!
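
As a hedged sketch of the finetuning-with-MLflow-tracking pattern (not a Databricks reference implementation; the base model, dataset and hyperparameters are illustrative), a Hugging Face Trainer run can log its metrics and final model to MLflow:

    import mlflow
    from datasets import load_dataset
    from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                              Trainer, TrainingArguments)

    model_name = "distilbert-base-uncased"          # illustrative base model
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

    # Tiny slice of a public dataset, just to keep the sketch runnable
    dataset = load_dataset("imdb", split="train[:1%]")
    tokenized = dataset.map(
        lambda batch: tokenizer(batch["text"], truncation=True, padding="max_length"),
        batched=True,
    )

    args = TrainingArguments(output_dir="/tmp/finetune",
                             per_device_train_batch_size=8,
                             num_train_epochs=1,
                             report_to="mlflow")     # send metrics to the active MLflow run

    with mlflow.start_run():
        Trainer(model=model, args=args, train_dataset=tokenized).train()
        # Log the finetuned model so it can be registered and served later
        mlflow.transformers.log_model(
            transformers_model={"model": model, "tokenizer": tokenizer},
            artifact_path="model",
            task="text-classification",
        )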

Sponsored by: Amperity | Transforming Guest Experiences: GoTo Foods’ Data Journey with Amperity & Databricks

GoTo Foods, the platform company behind brands like Auntie Anne's, Cinnabon, Jamba, and more, set out to turn a fragmented data landscape into a high-performance customer intelligence engine. In this session, CTO Manuel Valdes and Director of Marketing Technology Brett Newcome share how they unified data using Databricks Delta Sharing and Amperity's Customer Data Cloud to speed up time to market. As part of GoTo's broader strategy to support its brands with shared enterprise tools, the team:
- Unified loyalty, catering, and retail data into one customer view
- Cut campaign lead times from weeks to hours
- Activated audiences in real time without straining engineering
- Unlocked new revenue through smarter segmentation and personalization

Toyota, the world's largest automaker, sought to accelerate time-to-data and empower business users with secure data collaboration for faster insights. Partnering with Cognizant, they established a Unified Data Lake, integrating SOX principles and Databricks Unity Catalog to ensure compliance and security. Additionally, they developed a Data Scanner solution to automatically detect non-sensitive data and accelerate data ingestion. Join this dynamic session to discover how they achieved it.

Sponsored by: Microsoft | Leverage the power of the Microsoft Ecosystem with Azure Databricks

Join us for this insightful session to learn how you can leverage the power of the Microsoft ecosystem along with Azure Databricks to take your business to the next level. Azure Databricks is a fully integrated, native, first-party solution on Microsoft Azure. Databricks and Microsoft continue to actively collaborate on product development, ensuring tight integration, optimized performance, and a streamlined support experience. Azure Databricks offers seamless integrations with Power BI, Azure OpenAI, Microsoft Purview, Azure Data Lake Storage (ADLS) and Foundry. In this session, you'll learn how you can leverage deep integration between Azure Databricks and Microsoft solutions to empower your organization to do more with your data estate. You'll also get an exclusive sneak peek into the product roadmap.

Sponsored by: Sigma | Moving from On-premises to Unified Business Intelligence with Databricks & Sigma

Faced with the limitations of a legacy, on-prem data stack and scalability bottlenecks in MicroStrategy, Saddle Creek Logistics Services needed a modern solution to handle massive data volumes and accelerate insight delivery. By migrating to a cloud-native architecture powered by Sigma and Databricks, the team achieved significant performance gains and operational efficiency. In this session, Saddle Creek will walk through how they leveraged Databricks’ cloud-native processing engine alongside a unified governance layer through Unity Catalog to streamline and secure downstream analytics in Sigma. Learn how embedded dashboards and near real-time reporting—cutting latency from 9 minutes to just 3 seconds—have empowered data-driven collaboration with external partners and driven a major effort to consolidate over 30,000 reports and objects to under 1,000.

SQL-Based ETL: Options for SQL-Only Databricks Development

Using SQL for data transformation is a powerful way for an analytics team to create their own data pipelines. However, relying on SQL often comes with tradeoffs such as limited functionality, hard-to-maintain stored procedures or skipping best practices like version control and data tests. Databricks supports building high-performing SQL ETL workloads. Attend this session to hear how Databricks supports SQL for data transformation jobs as a core part of your Data Intelligence Platform. In this session we will cover four options for using Databricks with SQL syntax to create Delta tables:
- Lakeflow Declarative Pipelines: a declarative ETL option to simplify batch and streaming pipelines
- dbt: an open-source framework to apply engineering best practices to SQL-based data transformations
- SQLMesh: an open-core product to easily build high-quality and high-performance data pipelines
- SQL notebook jobs: a combination of Databricks Workflows and parameterized SQL notebooks
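
As a small illustration of the fourth option above (a parameterized SQL notebook driven by a job; the widget name, date and table names are hypothetical), a notebook cell can take a run date as a parameter and rebuild a Delta table from it:

    # A notebook widget supplies the run date; a job task can override it at run time
    dbutils.widgets.text("run_date", "2024-01-01")
    run_date = dbutils.widgets.get("run_date")

    # Rebuild a Delta table for that date (source and target tables are hypothetical)
    spark.sql(f"""
        CREATE OR REPLACE TABLE analytics.daily_orders AS
        SELECT order_id, customer_id, amount
        FROM raw.orders
        WHERE order_date = DATE'{run_date}'
    """)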

Transforming Financial Intelligence with FactSet Structured and Unstructured Data and Delta Sharing

Join us to explore the dynamic partnership between FactSet and Databricks, transforming data accessibility and insights. Discover the launch of FactSet’s Structured DataFeeds via Delta Sharing on the Databricks Marketplace, enhancing access to crucial financial data insights. Learn about the advantages of streamlined data delivery and how this integration empowers data ecosystems. Beyond structured data, explore the innovative potential of vectorized data sharing of unstructured content such as news, transcripts, and filings. Gain insights into the importance of seamless vectorized data delivery to support GenAI applications and how FactSet is preparing to simplify client GenAI workflows with AI-ready data. Experience a demo that showcases the complete journey from data delivery to actionable GenAI application responses in a real-world Financial Services scenario. See firsthand how FactSet is simplifying client GenAI workflows with AI-ready data that drives faster, more informed financial decisions.
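
For context on the consumption side of Delta Sharing (a generic open-sharing sketch, not FactSet's specific feed; the profile path and share, schema and table names are placeholders), a recipient with a sharing profile file can load a shared table directly into pandas:

    import delta_sharing

    # Recipient profile file plus share#schema.table coordinates (placeholders)
    profile = "/dbfs/FileStore/config.share"
    table_url = profile + "#finance_share.market_data.daily_prices"

    # Load the shared table into pandas for local analysis
    df = delta_sharing.load_as_pandas(table_url)
    print(df.head())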

Transforming HP’s Print ELT Reporting with GenIT: Real-Time Insights Tool Powered by Databricks AI

Timely and actionable insights are critical for staying competitive in today’s fast-paced environment. At HP Print, manual reporting for executive leadership (ELT) has been labor-intensive, hindering agility and productivity. To address this, we developed the Generative Insights Tool (GenIT) using Databricks Genie and Mosaic AI to create a real-time insights engine automating SQL generation, data visualization, and narrative creation. GenIT delivers instant insights, enabling faster decisions, greater productivity, and improved consistency while empowering leaders to respond to printer trends. With automated querying, AI-powered narratives, and a chatbot, GenIT reduces inefficiencies and ensures quality data and insights. Our roadmap integrates multi-modal data, enhances chatbot functionality, and scales globally. This initiative shows how HP Print leverages GenAI to improve decision-making, efficiency, and agility, and we will showcase this transformation at the Databricks AI Summit.

Unlocking AI Value: Build AI Agents on SAP Data in Databricks

Discover how enterprises are turning SAP data into intelligent AI. By tapping into contextual SAP data through Delta Sharing on Databricks, with no messy ETL needed, they're accelerating AI innovation and business insights. Learn how they:
- Build domain-specific AI that can reason on private SAP data
- Deliver data intelligence to power insights for business leaders
- Govern and secure their new unified data estate

Using Databricks to Power News Sentiment, a Capital IQ Pro Application

The News Sentiment application enhances the discoverability of news content through our flagship platform, Capital IQ Pro. We processed news articles for 10,000+ public companies through entity recognition, along with a series of proprietary financial sentiment models, to assess whether the news was positive or negative, as well as its significance and relevance to the company. We built a database containing over 1.5 million signals and operationalized the end-to-end ETL as a daily Workflow on Databricks. The development process included model training and selection. We utilized training data from our internal financial analysts to fine-tune Google's Flan-T5, creating our proprietary sentiment model and two additional models. Our models are deployed on Databricks Model Serving as serverless endpoints that can be queried on demand. The last phase of the project was to develop a UI, in which we utilized Databricks serverless SQL warehouses to surface this data in real time.
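
The on-demand querying described above follows the standard Model Serving invocation pattern. A rough sketch looks like the following; the workspace URL, token, endpoint name and payload fields are hypothetical, not the actual News Sentiment schema:

    import requests

    url = "https://<workspace-host>/serving-endpoints/news-sentiment/invocations"  # placeholder
    headers = {"Authorization": "Bearer <personal-access-token>"}                  # placeholder

    payload = {"dataframe_records": [
        {"headline": "Company X beats quarterly earnings estimates"}
    ]}

    resp = requests.post(url, headers=headers, json=payload)
    resp.raise_for_status()
    print(resp.json())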

Advanced RAG Overview — Thawing Your Frozen RAG Pipeline

The most common RAG systems rely on a frozen RAG system — one where there’s a single embedding model and single vector index. We’ve achieved a modicum of success with that, but when it comes to increasing accuracy for production systems there is only so much this approach solves. In this session we will explore how to move from the frozen systems to adaptive RAG systems which produce more tailored outputs with higher accuracy. Databricks services: Lakehouse, Unity Catalog, Mosaic, Sweeps, Vector Search, Agent Evaluation, Managed Evaluation, Inference Tables
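
One piece of such a pipeline, retrieval against a Databricks Vector Search index, might look roughly like this sketch; the endpoint and index names, columns and query are hypothetical, and the adaptive routing logic discussed in the session is not shown:

    from databricks.vector_search.client import VectorSearchClient

    client = VectorSearchClient()
    index = client.get_index(endpoint_name="rag-endpoint",       # placeholder
                             index_name="main.rag.docs_index")   # placeholder

    # Retrieve the top matching chunks for a user question
    results = index.similarity_search(
        query_text="How do I configure checkpoint locations?",
        columns=["doc_id", "chunk_text"],
        num_results=5,
    )
    print(results)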