talk-data.com

Topic: Analytics
Tags: data_analysis, insights, metrics · 4552 tagged activities

Activity Trend: 398 peak/qtr (2020-Q1 to 2026-Q1)

Activities: 4552 activities · Newest first

From Prediction to Prevention: Transforming Risk Management in Insurance

Protecting insurers against emerging threats is critical. This session reveals how leading companies use Databricks’ Data Intelligence Platform to transform risk management, enhance fraud detection, and ensure compliance. Learn how advanced analytics, AI, and machine learning process vast amounts of data in real time to identify risks and mitigate threats. Industry leaders will share strategies for building resilient operations that protect against financial losses and reputational harm. Key takeaways:
• AI-powered fraud prevention using anomaly detection and predictive analytics
• Real-time risk assessment models integrating IoT, behavioral, and external data
• Strategies for robust compliance and governance with operational efficiency
Discover how data intelligence is revolutionizing insurance risk management and safeguarding the industry’s future.
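To make the anomaly-detection takeaway concrete, here is a minimal sketch (not from the session; the synthetic claims features and thresholds are invented) using scikit-learn's IsolationForest to surface unusual claims for review:

```python
# Minimal anomaly-based fraud-scoring sketch. Illustrative only: the
# features and numbers are synthetic, not taken from the session.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
# Synthetic claims: [claim_amount, days_since_policy_start, prior_claims]
normal = rng.normal(loc=[2_000, 400, 1], scale=[500, 150, 1], size=(1_000, 3))
suspicious = rng.normal(loc=[25_000, 10, 6], scale=[5_000, 5, 2], size=(10, 3))
claims = np.vstack([normal, suspicious])

# Fit an isolation forest; lower decision scores mean "more anomalous".
model = IsolationForest(contamination=0.01, random_state=0).fit(claims)
scores = model.decision_function(claims)
flagged = np.argsort(scores)[:10]  # the ten most anomalous claims
print("Claims to route to investigators:", flagged)
```

In a production pipeline the scores would feed a case-management queue rather than a print statement, but the shape of the approach is the same.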

Lakebase: Fully Managed Postgres for the Lakehouse

Lakebase is a new Postgres-compatible OLTP database designed to support intelligent applications. Lakebase eliminates custom ETL pipelines with built-in lakehouse table synchronization, supports sub-10ms latency for high-throughput workloads, and offers full Postgres compatibility, so you can build applications more quickly. In this session, you’ll learn how Lakebase enables faster development, production-level concurrency, and simpler operations for data engineers and application developers building modern, data-driven applications. We'll walk through key capabilities, example use cases, and how Lakebase simplifies infrastructure while unlocking new possibilities for AI and analytics.
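Since Lakebase advertises full Postgres compatibility, standard Postgres drivers should work against it unchanged. A minimal sketch with psycopg2 follows; the endpoint, credentials, and orders table are placeholders, not real Lakebase details:

```python
# Sketch: talking to a Postgres-compatible database (such as Lakebase)
# with a stock driver. Host, credentials, and schema are placeholders.
import psycopg2

conn = psycopg2.connect(
    host="your-lakebase-instance.example.com",  # placeholder endpoint
    port=5432,
    dbname="appdb",
    user="app_user",
    password="***",
)
with conn, conn.cursor() as cur:
    # The kind of OLTP point lookup a low-latency application would issue.
    cur.execute(
        "SELECT status, updated_at FROM orders WHERE order_id = %s",
        (12345,),
    )
    print(cur.fetchone())
conn.close()
```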

Lakeflow Connect enables you to easily and efficiently ingest data from enterprise applications like Salesforce, ServiceNow, Google Analytics, SharePoint, NetSuite, Dynamics 365, and more. In this session, we’ll dive deep into using connectors for the most popular SaaS applications, reviewing common use cases such as analyzing consumer behavior, predicting churn, and centralizing HR analytics. You'll also hear from an early customer about how Lakeflow Connect helped unify their customer data to drive an improved automotive experience. We’ll wrap up with a Q&A so you have the opportunity to learn from our experts.

Modernizing Critical Infrastructure: AI and Data-Driven Solutions in Nuclear and Utility Operations

This session showcases how both Westinghouse Electric and Alabama Power Company are leveraging cloud-based tools, advanced analytics, and machine learning to transform operational resilience across the energy sector. In the first segment, we'll explore AI's crucial role in enhancing safety, efficiency, and compliance in nuclear operations through technologies like HiVE and Bertha, focusing on the unique reliability and credibility requirements of the regulated nuclear industry. We’ll then highlight how Alabama Power Company has modernized its grid management and storm preparedness by using Databricks to develop SPEAR and RAMP—applications that combine real-time data and predictive analytics to improve reliability, efficiency, and customer service.

Retail Genie: No-Code AI Apps for Empowering BI Users to be Self-Sufficient

Explore how Databricks AI/BI Genie revolutionizes retail analytics, empowering business users to become self-reliant data explorers. This session highlights no-code AI apps that create a conversational interface for retail data analysis. Genie spaces harness NLP and generative AI to convert business questions into actionable insights, bypassing complex SQL queries. We'll showcase retail teams effortlessly analyzing sales trends, inventory and customer behavior through Genie's intuitive interface. Witness real-world examples of AI/BI Genie's adaptive learning, enhancing accuracy and relevance over time. Learn how this technology democratizes data access while maintaining governance via Unity Catalog integration. Discover Retail Genie's impact on decision-making, accelerating insights and cultivating a data-driven retail culture. Join us to see the future of accessible, intelligent retail analytics in action.

Revolutionizing Banking Data, Analytics and AI: Building an Enterprise Data Hub With Databricks

Explore the transformative journey of a regional bank as it modernizes its enterprise data infrastructure amidst the challenges of legacy systems and past mergers and acquisitions. The bank is creating an Enterprise Data Hub using Deloitte's industry experience and the Databricks Data Intelligence Platform to drive growth, efficiency, and Large Financial Institution (LFI) readiness. This session will showcase how the new data hub will be a one-stop shop for LOB and enterprise needs while unlocking advanced analytics and GenAI possibilities. Discover how this initiative is empowering the regional bank's ambition to realize its “big bank muscle, small bank hustle.”

Self-Service Assortment and Space Analytics at Walmart Scale

Assortment and space analytics optimizes product selection and shelf allocation to boost sales, improve inventory management, and enhance customer experience. However, challenges like evolving demand, data accuracy, and operational alignment hinder success. Older approaches struggled due to siloed tools, slow performance, and poor governance. Databricks' unified platform resolved these issues, enabling seamless data integration, high-performance analytics, and governed sharing. The innovative AI/BI Genie interface empowered self-service analytics, driving adoption among non-technical users. This solution helped Walmart cut time to value by 90% and save $5.6M annually in FTE hours, leading to increased productivity. Looking ahead, AI agents will let store managers and merchants execute decisions via conversational interfaces, streamlining operations and enhancing accessibility. This transformation positions retailers to thrive in a competitive, customer-centric market.

Sponsored by: AWS | Ripple: Well-Architected Data & AI Platforms - AWS and Databricks in Harmony

Join us as we explore the well-architected framework for modern data lakehouse architecture, where AWS's comprehensive data, AI, and infrastructure capabilities align with Databricks' unified platform approach. Building upon core principles of Operational Excellence, Security, Reliability, Performance, and Cost Optimization, we'll demonstrate how Data and AI Governance alongside Interoperability and Usability enable organizations to build robust, scalable platforms. Learn how Ripple modernized its data infrastructure by migrating from a legacy Hadoop system to a scalable, real-time analytics platform using Databricks on AWS. This session covers the challenges of high operational costs, latency, and peak-time bottlenecks—and how Ripple achieved 80% cost savings and 55% performance improvements with Photon, Graviton, Delta Lake, and Structured Streaming.
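As context for the stack named above, here is a minimal PySpark Structured Streaming sketch of a Delta-to-Delta streaming aggregation; the paths, the event_time column, and the checkpoint location are placeholders, not Ripple's actual pipeline:

```python
# Sketch: Delta-to-Delta Structured Streaming aggregation. Paths and the
# event_time column are placeholders, not Ripple's actual pipeline.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("streaming-metrics-sketch").getOrCreate()

events = spark.readStream.format("delta").load("/data/bronze/events")

# Count events per one-minute window as a simple real-time metric,
# with a watermark to bound state for late-arriving data.
counts = (
    events
    .withWatermark("event_time", "10 minutes")
    .groupBy(F.window("event_time", "1 minute"))
    .count()
)

(
    counts.writeStream
    .format("delta")
    .outputMode("append")
    .option("checkpointLocation", "/chk/events_per_minute")
    .start("/data/silver/events_per_minute")
    .awaitTermination()
)
```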

Discover how SAP Business Data Cloud and Databricks can transform your business by unifying SAP and non-SAP data for advanced analytics and AI. In this session, we’ll highlight optimizing cash flow with AI: integrating diverse data sources with AI algorithms enables accurate cash flow forecasting that helps businesses identify trends, prevent bottlenecks, and improve liquidity. You’ll also learn about the importance of high-quality, well-governed data as the foundation for reliable AI models and actionable reporting. Key takeaways:
• How to integrate and leverage SAP and external data in Databricks
• Using AI for predictive analytics and better decision-making
• Building a trusted data foundation to drive business performance
Leave this session with actionable strategies to optimize your data, enhance efficiency, and unlock new growth opportunities.
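As a toy illustration of the forecasting idea (synthetic numbers, unrelated to the actual SAP/Databricks models), a lag-based regression on historical cash flows might look like this:

```python
# Toy cash flow forecast using lagged features. Illustrative only.
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic monthly net cash flow (in $K) with trend and seasonality.
months = np.arange(48)
cash_flow = 500 + 5 * months + 80 * np.sin(2 * np.pi * months / 12)

# Use the previous three months to predict the next one.
X = np.column_stack([cash_flow[i : i + 45] for i in range(3)])
y = cash_flow[3:]
model = LinearRegression().fit(X, y)

next_month = model.predict(cash_flow[-3:].reshape(1, -1))[0]
print(f"Forecast for next month's net cash flow: {next_month:.0f}K")
```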

Sponsored by: Firebolt | 10ms Queries on Iceberg: Turbocharging Your Lakehouse for Interactive Experiences with Firebolt

Open table formats such as Apache Iceberg and Delta Lake have transformed the data landscape. For the first time, we’re seeing a real open storage ecosystem emerging across database vendors. So far, though, open table formats have found little adoption powering low-latency, high-concurrency analytics use cases. Data stored in open formats often gets transformed and ingested into closed systems for serving. The reason for this is simple: most modern query engines don’t properly support these workloads. In this talk, we take a look under the hood of Firebolt and dive into the work we’re doing to support low latency and high concurrency on Iceberg: caching of data and metadata, adaptive object storage reads, subresult reuse, and multi-dimensional scaling. After this session, you will know how you can build low-latency data applications on top of Iceberg. You’ll also have a deep understanding of what it takes for modern high-performance query engines to do well on these workloads.

Sponsored by: Informatica | Power Analytics and AI on Databricks With Master (Golden) Record Data

Supercharge advanced analytics and AI insights on Databricks with accurate and consistent master data. This session explores how Informatica’s Master Data Management (MDM) integrates with Databricks to provide high-quality, integrated golden record data, such as customer, supplier, and product 360 views or reference data, to support downstream analytics, generative AI, and agentic AI. Enterprises can accelerate and de-risk the process of creating a golden record via a no-code/low-code interface, allowing data teams to quickly integrate siloed data and create a complete, consistent record that improves decision-making speed and accuracy.
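As a deliberately tiny illustration of the match-and-merge idea behind a golden record (real MDM matching is far richer; the records and rules below are invented):

```python
# Toy match-and-merge for a "golden record". Real MDM products such as
# Informatica use far richer matching; records and rules are invented.
from difflib import SequenceMatcher

records = [
    {"source": "crm",     "name": "Acme Corporation", "phone": "555-0100", "city": "Austin"},
    {"source": "billing", "name": "ACME Corp.",       "phone": None,       "city": "Austin"},
]

def similar(a: str, b: str) -> bool:
    # Fuzzy string match on normalized names.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() > 0.6

# Match on fuzzy name plus exact city, then merge, preferring non-null values.
a, b = records
if similar(a["name"], b["name"]) and a["city"] == b["city"]:
    golden = {k: a[k] or b[k] for k in ("name", "phone", "city")}
    print("Golden record:", golden)
```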

Using Clean Rooms for Privacy-Centric Data Collaboration

Databricks Clean Rooms make privacy-safe collaboration possible for data, analytics, and AI — across clouds and platforms. Built on Delta Sharing, Clean Rooms enable organizations to securely share and analyze data together in a governed, isolated environment — without ever exposing raw data. In this session, you’ll learn how to get started with Databricks Clean Rooms and unlock advanced use cases, including:
• Cross-platform collaboration and joint analytics
• Training machine learning and AI models
• Enforcing custom privacy policies
• Analyzing unstructured data
• Incorporating proprietary libraries in Python and SQL notebooks
• Auditing clean room activity for compliance
Whether you're a data scientist, engineer, or data leader, this session will equip you to drive high-value collaboration while maintaining full control over data privacy and governance.

What Does It Take to Optimize Every Drop of Milk Across a 150-Year-Old Global Dairy Cooperative?

In this session, Joëlle van der Bijl, Chief Data & Analytics Officer at FrieslandCampina, shares the bold journey of replacing legacy data systems with a single, unified data, analytics, and AI platform built on Databricks. Rather than evolving gradually, the company took a leap: transforming its entire data foundation in one go. Today, this data-centric vision is delivering high-value impact: from optimizing milk demand and supply to enabling commercial AI prediction models and scaling responsible AI across the business. Learn how FrieslandCampina is using Databricks to blend tradition with innovation, and unlock a smarter, more sustainable future for dairy.

Your Wish is AI Command — Get to Grips With Databricks Genie

Picture the scene — you're exploring a deep, dark cave looking for insights to unearth when, in a burst of smoke, Genie appears and offers you not three but unlimited data wishes. This isn't a folk tale; it's the growing wave of Generative BI that is going to be part of analytics platforms. Databricks Genie is a tool powered by a SQL-writing LLM that redefines how we interact with data. We'll look at the basics of creating a new Genie room, scoping its data tables, and asking questions. We'll help it out with some complex pre-defined questions and ensure it has the best chance of success. We'll give the tool a personality, set some behavioural guidelines, and prepare some hidden easter eggs for our users to discover. Generative BI is going to be a fundamental part of the analytics toolset used across businesses. If you're using Databricks, you should be aware of Genie; if you're not, you should be planning your Generative BI roadmap. Either way, this session will answer your wishes.

Supported by Our Partners
• Sonar — Code quality and code security for ALL code.
• Statsig — The unified platform for flags, analytics, experiments, and more.
• Augment Code — AI coding assistant that pro engineering teams love.

Kent Beck is one of the most influential figures in modern software development. Creator of Extreme Programming (XP), co-author of The Agile Manifesto, and a pioneer of Test-Driven Development (TDD), he’s shaped how teams write, test, and think about code. Now, with over five decades of programming experience, Kent is still pushing boundaries—this time with AI coding tools. In this episode of Pragmatic Engineer, I sit down with him to talk about what’s changed, what hasn’t, and why he’s more excited than ever to code.

In our conversation, we cover:
• Why Kent calls AI tools an “unpredictable genie”—and how he’s using them
• Why Kent no longer has an emotional attachment to any specific programming language
• The backstory of The Agile Manifesto—and why Kent resisted the word “agile”
• An overview of XP (Extreme Programming) and how Grady Booch played a role in the name
• Tape-to-tape experiments in Kent’s childhood that laid the groundwork for TDD
• Kent’s time at Facebook and how he adapted to its culture and use of feature flags
• And much more!

Timestamps:
(00:00) Intro
(02:27) What Kent has been up to since writing Tidy First
(06:05) Why AI tools are making coding more fun for Kent, and why he compares them to a genie
(13:41) Why Kent says languages don’t matter anymore
(16:56) Kent’s current project building a Smalltalk server
(17:51) How Kent got involved with The Agile Manifesto
(23:46) Gergely’s time at JP Morgan, and why Kent didn’t like the word “agile”
(26:25) An overview of “extreme programming” (XP)
(35:41) Kent’s childhood tape-to-tape experiments that inspired TDD
(42:11) Kent’s response to Ousterhout’s criticism of TDD
(50:05) Why Kent still uses TDD with his AI stack
(54:26) How Facebook operated in 2011
(1:04:10) Facebook in 2011 vs. 2017
(1:12:24) Rapid fire round

See the transcript and other references from the episode at https://newsletter.pragmaticengineer.com/podcast

Production and marketing by https://penname.co/. For inquiries about sponsoring the podcast, email [email protected].

Get full access to The Pragmatic Engineer at newsletter.pragmaticengineer.com/subscribe

Summary In this episode of the Data Engineering Podcast Alex Albu, tech lead for AI initiatives at Starburst, talks about integrating AI workloads with the lakehouse architecture. From his software engineering roots to leading data engineering efforts, Alex shares insights on enhancing Starburst's platform to support AI applications, including an AI agent for data exploration and using AI for metadata enrichment and workload optimization. He discusses the challenges of integrating AI with data systems, innovations like SQL functions for AI tasks and vector databases, and the limitations of traditional architectures in handling AI workloads. Alex also shares his vision for the future of Starburst, including support for new data formats and AI-driven data exploration tools.
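To picture the "SQL functions for AI tasks" idea from this conversation, here is a sketch using the Trino Python client. The connection details are placeholders, and the ai_analyze_sentiment call is modeled on Trino's AI functions as I understand them, so treat it as an assumption rather than confirmed Starburst API:

```python
# Sketch: invoking an AI SQL function through the Trino Python client.
# Connection details are placeholders; ai_analyze_sentiment is modeled on
# Trino's AI functions and should be treated as an assumption here.
import trino

conn = trino.dbapi.connect(
    host="starburst.example.com",  # placeholder endpoint
    port=443,
    user="analyst",
    catalog="lakehouse",
    schema="reviews",
    http_scheme="https",
)
cur = conn.cursor()
cur.execute(
    "SELECT review_id, ai_analyze_sentiment(review_text) AS sentiment "
    "FROM product_reviews LIMIT 10"
)
for row in cur.fetchall():
    print(row)
```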

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

Data migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.

This is a pharmaceutical ad for Soda Data Quality. Do you suffer from chronic dashboard distrust? Are broken pipelines and silent schema changes wreaking havoc on your analytics? You may be experiencing symptoms of Undiagnosed Data Quality Syndrome — also known as UDQS. Ask your data team about Soda. With Soda Metrics Observability, you can track the health of your KPIs and metrics across the business — automatically detecting anomalies before your CEO does. It’s 70% more accurate than industry benchmarks, and the fastest in the category, analyzing 1.1 billion rows in just 64 seconds. And with Collaborative Data Contracts, engineers and business can finally agree on what “done” looks like — so you can stop fighting over column names, and start trusting your data again. Whether you’re a data engineer, analytics lead, or just someone who cries when a dashboard flatlines, Soda may be right for you. Side effects of implementing Soda may include: increased trust in your metrics, reduced late-night Slack emergencies, spontaneous high-fives across departments, fewer meetings and less back-and-forth with business stakeholders, and in rare cases, a newfound love of data. Sign up today to get a chance to win a $1000+ custom mechanical keyboard. Visit dataengineeringpodcast.com/soda to sign up and follow Soda’s launch week. It starts June 9th.

This episode is brought to you by Coresignal, your go-to source for high-quality public web data to power best-in-class AI products. Instead of spending time collecting, cleaning, and enriching data in-house, use ready-made multi-source B2B data that can be smoothly integrated into your systems via APIs or as datasets. With over 3 billion data records from 15+ online sources, Coresignal delivers high-quality data on companies, employees, and jobs. It is powering decision-making for more than 700 companies across AI, investment, HR tech, sales tech, and market intelligence industries. A founding member of the Ethical Web Data Collection Initiative, Coresignal stands out not only for its data quality but also for its commitment to responsible data collection practices. Recognized as the top data provider by Datarade for two consecutive years, Coresignal is the go-to partner for those who need fresh, accurate, and ethically sourced B2B data at scale. Discover how Coresignal's data can enhance your AI platforms. Visit dataengineeringpodcast.com/coresignal to start your free 14-day trial.

Your host is Tobias Macey, and today I'm interviewing Alex Albu about how Starburst is extending the lakehouse to support AI workloads.

Interview
• Introduction
• How did you get involved in the area of data management?
• Can you start by outlining the interaction points of AI with the types of data workflows that you are supporting with Starburst?
• What are some of the limitations of warehouse and lakehouse systems when it comes to supporting AI systems?
• What are the points of friction for engineers who are trying to employ LLMs in the work of maintaining a lakehouse environment?
• Methods such as tool use (exemplified by MCP) are a means of bolting on AI models to systems like Trino. What are some of the ways that is insufficient or cumbersome?
• Can you describe the technical implementation of the AI-oriented features that you have incorporated into the Starburst platform?
• What are the foundational architectural modifications that you had to make to enable those capabilities?
• For the vector storage and indexing, what modifications did you have to make to Iceberg?
• What was your reasoning for not using a format like Lance?
• For teams who are using Starburst and your new AI features, what are some examples of the workflows that they can expect?
• What new capabilities are enabled by virtue of embedding AI features into the interface to the lakehouse?
• What are the most interesting, innovative, or unexpected ways that you have seen Starburst AI features used?
• What are the most interesting, unexpected, or challenging lessons that you have learned while working on AI features for Starburst?
• When is Starburst/lakehouse the wrong choice for a given AI use case?
• What do you have planned for the future of AI on Starburst?

Contact Info
• LinkedIn

Parting Question
• From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements
• Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.
• Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
• If you've learned something or tried out a project from the show, then tell us about it! Email [email protected] with your story.

Links
• Starburst
• Podcast Episode
• AWS Athena
• MCP == Model Context Protocol
• LLM Tool Use
• Vector Embeddings
• RAG == Retrieval Augmented Generation
• AI Engineering Podcast Episode
• Starburst Data Products
• Lance
• LanceDB
• Parquet
• ORC
• pgvector
• Starburst Icehouse

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

A Practitioner’s Guide to Databricks Serverless

This session is repeated. Databricks Serverless revolutionizes data engineering and analytics by eliminating the complexities of infrastructure management. This talk will provide an overview of this powerful serverless compute option, highlighting how it enables practitioners to focus solely on building robust data pipelines. We'll explore the core benefits, including automatic scaling, cost optimization and seamless integration with the Databricks ecosystem. Learn how serverless workflows simplify the orchestration of various data tasks, from ingestion to dashboards, ultimately accelerating time-to-insight and boosting productivity. This session is ideal for data engineers, data scientists and analysts looking to leverage the agility and efficiency of serverless computing in their data workflows.
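To picture what "no infrastructure management" looks like in practice, here is a sketch using the Databricks Python SDK to define a job with no cluster configuration at all. The job name and notebook path are invented, and the idea that omitting compute settings opts the task into serverless compute is an assumption that depends on workspace enablement:

```python
# Sketch: a job definition with no cluster spec via the Databricks SDK.
# Assumption: on serverless-enabled workspaces, omitting compute settings
# runs the task on serverless compute. Names and paths are placeholders.
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()  # picks up credentials from the environment

job = w.jobs.create(
    name="nightly-ingest-serverless",
    tasks=[
        jobs.Task(
            task_key="ingest",
            notebook_task=jobs.NotebookTask(
                notebook_path="/Workspace/pipelines/ingest"
            ),
            # Note: no new_cluster or job_cluster_key here; that omission
            # is what selects serverless compute (assumption, see above).
        )
    ],
)
print(f"Created job {job.job_id}")
```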

Enterprise Financial Crime Detection: A Lakehouse Framework for FATF, Basel III, and BSA Compliance

We will present a framework for FinCrime detection leveraging the Databricks lakehouse architecture, specifically how institutions can achieve both the data flexibility and the ACID transaction guarantees essential for FinCrime monitoring. The framework incorporates advanced ML models for anomaly detection, pattern recognition, and predictive analytics, while maintaining the clear data lineage and audit trails required by regulatory bodies. We will also discuss specific improvements in false-positive reduction, detection speed, and regulatory reporting turnaround, and delve into how the architecture addresses specific FATF recommendations, Basel III risk management requirements, and BSA compliance obligations, particularly in transaction monitoring and SAR filing. The ability to handle structured and unstructured data while maintaining data quality and governance makes the framework particularly valuable for large financial institutions dealing with complex, multi-jurisdictional compliance requirements.
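To make the transaction-monitoring angle concrete, here is a toy pandas sketch of one classic BSA pattern, structuring (repeated cash deposits just under the $10,000 CTR threshold). The window and thresholds are illustrative, not rules from the framework:

```python
# Toy structuring detector: flag accounts with several deposits just under
# the $10,000 CTR threshold within a week. Illustrative thresholds only.
import pandas as pd

txns = pd.DataFrame({
    "account": ["A", "A", "A", "B"],
    "ts": pd.to_datetime(["2024-01-01", "2024-01-03", "2024-01-05", "2024-01-02"]),
    "amount": [9_500, 9_800, 9_700, 4_000],
})

near_threshold = txns[txns["amount"].between(9_000, 9_999)].sort_values("ts")

flagged = []
for account, g in near_threshold.groupby("account"):
    # Max number of near-threshold deposits in any rolling 7-day window.
    max_in_window = g.rolling("7D", on="ts")["amount"].count().max()
    if max_in_window >= 3:
        flagged.append(account)

print("Accounts to review for possible structuring:", flagged)  # ['A']
```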

Future of Anti-Cheat With Riot Games

As online gaming evolves, so do cheating methods that exploit client-server vulnerabilities. Traditional anti-cheat, such as kernel-level drivers and runtime detections, has long been the primary defense. However, advanced cheats like Direct Memory Access (DMA) exploits and AI-powered Computer Vision (CV) hacks increasingly render client-side detection ineffective. This presentation examines the escalating arms race between cheat creators and developers, highlighting client-side limitations. With CV cheats mimicking human behavior, anti-cheat must shift toward server-side, data-driven detection. By leveraging AI, machine learning, and behavioral analytics to analyze player patterns, input anomalies, and decision inconsistencies, future solutions can move beyond static detection to adaptive security models, ensuring fair play at scale. The session will also include real-life examples from Riot Games’ anti-cheat efforts, specifically insights and case studies from the development and operation of Riot Vanguard, to illustrate how these strategies are applied in practice.
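As a toy illustration of the behavioral signals described above (invented numbers, not Riot Vanguard's logic): human reaction times vary from action to action, while many CV- or DMA-driven cheats respond with near-constant latency, so low variance itself becomes a signal.

```python
# Toy behavioral check: humans show variable reaction times; many cheats
# react with machine-constant latency. Numbers are invented and this is
# not Riot's actual detection logic.
import statistics

def reaction_profile(reaction_times_ms: list[float]) -> str:
    mean = statistics.mean(reaction_times_ms)
    stdev = statistics.stdev(reaction_times_ms)
    # Sub-100ms means with near-zero variance are implausible for humans.
    if mean < 100 and stdev < 5:
        return "suspicious: superhuman and machine-consistent"
    return "plausible"

human = [210, 265, 190, 240, 280, 225]
aimbot = [82, 80, 83, 81, 82, 80]
print("player1:", reaction_profile(human))   # plausible
print("player2:", reaction_profile(aimbot))  # suspicious
```

Production systems would combine many such features (input paths, target-acquisition angles, decision consistency) in ML models rather than applying a single threshold.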

Maximize Retail Data Insights in Genie with DeltaSharing via Crisp’s Collaborative Commerce Platform

Crisp streamlines a brand’s data ingestion across 60+ retail sources to build a foundation of sales and inventory intelligence on Databricks. Data is normalized, analysis-ready, and integrates seamlessly with AI tools such as Databricks’ Genie and Blueprints. This session will provide an overview of the Crisp retail data platform and how our semantic layer and normalized, harmonized data sets can help drive powerful insights for supply chain, BI/analytics, and data science teams.
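Tables shared this way can be consumed with the open-source Delta Sharing client; here is a minimal sketch in which the profile file and the share, schema, and table names are placeholders rather than Crisp's actual coordinates:

```python
# Sketch: reading a Delta Sharing table with the open-source client.
# The .share profile and table coordinates below are placeholders.
import delta_sharing

# A .share profile file holds the sharing server endpoint and token.
profile = "config.share"
table_url = f"{profile}#retail_share.sales.daily_store_sales"

df = delta_sharing.load_as_pandas(table_url)  # load into a pandas DataFrame
print(df.head())
```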