talk-data.com

Topic

Databricks

big_data analytics spark

1286 tagged

Activity Trend

515 peak/qtr (2020-Q1 to 2026-Q1)

Activities

1286 activities · Newest first

Large Language Models (LLMs) are transformative, but static knowledge and hallucinations limit their direct enterprise use. Retrieval-Augmented Generation (RAG) is the standard solution, yet moving from prototype to production is fraught with challenges in data quality, scalability, and evaluation.

This talk argues the future of intelligent retrieval lies not in better models, but in a unified, data-first platform. We'll demonstrate how the Databricks Data Intelligence Platform, built on a Lakehouse architecture with integrated tools like Mosaic AI Vector Search, provides the foundation for production-grade RAG.

Looking ahead, we'll explore the evolution beyond standard RAG to advanced architectures like GraphRAG, which enable deeper reasoning within Compound AI Systems. Finally, we'll show how the end-to-end Mosaic AI Agent Framework provides the tools to build, govern, and evaluate the intelligent agents of the future, capable of reasoning across the entire enterprise.
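
The retrieval step at the heart of RAG can be sketched in a few lines. The toy example below uses hard-coded embedding vectors and a brute-force cosine-similarity scan purely for illustration; in a production setup, an embedding model and a managed index such as Mosaic AI Vector Search would replace both. The document texts and vectors here are invented.

```python
import math

# Toy corpus with made-up 3-d "embeddings"; a real system would compute
# these with an embedding model and index them in a vector store.
DOCS = [
    ("Databricks runs Apache Spark workloads.", [0.9, 0.1, 0.0]),
    ("The Lakehouse unifies warehouses and lakes.", [0.1, 0.9, 0.1]),
    ("GraphRAG adds graph reasoning to retrieval.", [0.0, 0.2, 0.9]),
]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retrieve(query_vec, k=1):
    """Return the k document texts most similar to the query embedding."""
    ranked = sorted(DOCS, key=lambda d: cosine(query_vec, d[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

def build_prompt(question, query_vec):
    """Ground the LLM prompt in retrieved context -- the core RAG step."""
    context = "\n".join(retrieve(query_vec, k=2))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

prompt = build_prompt("What is a Lakehouse?", [0.1, 0.8, 0.2])
```

The essential point is that the prompt sent to the model is grounded in retrieved context rather than the model's static knowledge, which is what curbs hallucination on enterprise data.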

Face To Face
by Gavi Regunath (Advancing Analytics), Simon Whiteley (Advancing Analytics), Holly Smith (Databricks)

We’re excited to be back at Big Data LDN this year—huge thanks to the organisers for hosting Databricks London once more!

Join us for an evening of insights, networking, and community with the Databricks Team and Advancing Analytics!

🎤 Agenda:

6:00 PM – 6:10 PM | Kickoff & Warm Welcome

Grab a drink, say hi, and get the lowdown on what’s coming up. We’ll set the scene for an evening of learning and laughs.

6:10 PM – 6:50 PM | The Metadata Marathon: How three projects are racing forward – Holly Smith (Staff Developer Advocate, Databricks)

With the enormous amount of discussion about open storage formats between nerds and even not-nerds, it can be hard to keep track of who’s doing what and how any of it actually affects day-to-day data projects.

Holly will take a closer look at the three big projects in this space: Delta, Hudi and Iceberg. They’re all trying to solve similar data problems and have tackled the various challenges in different ways. Her talk will start with the very basics of how we got here before diving deep into the underlying tech, their roadmaps, and their impact on the data landscape as a whole.
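
One thing Delta, Hudi and Iceberg share is that table state lives in a metadata/transaction log rather than in directory listings. The sketch below replays a drastically simplified Delta-style log (just add/remove file actions; real logs also carry schema, statistics and protocol versions) to show how a reader reconstructs the current snapshot. The file names and log shape are illustrative only.

```python
# Simplified Delta-style transaction log: each commit is a list of actions.
# A real _delta_log holds JSON files with many more action types; this toy
# replay only tracks which data files are currently "live" in the table.
commits = [
    [{"add": {"path": "part-000.parquet"}},
     {"add": {"path": "part-001.parquet"}}],
    [{"remove": {"path": "part-000.parquet"}},   # e.g. compaction rewrote it
     {"add": {"path": "part-002.parquet"}}],
]

def replay(log):
    """Replay commits in order to compute the current table snapshot."""
    live = set()
    for commit in log:
        for action in commit:
            if "add" in action:
                live.add(action["add"]["path"])
            elif "remove" in action:
                live.discard(action["remove"]["path"])
    return live

snapshot = replay(commits)
```

This log-replay idea is why all three formats can offer ACID commits and time travel on top of plain object storage: readers agree on table state by agreeing on the log, not on what files happen to exist.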

6:50 PM – 7:10 PM | What’s New in Databricks & Databricks AI – Simon Whiteley & Gavi Regunath

Hot off the press! Simon and Gavi will walk you through the latest and greatest from Databricks, including shiny new AI features and platform updates you’ll want to try ASAP.

7:10 PM onwards | Q&A Panel + Networking

Your chance to ask the experts anything—then stick around for drinks, snacks, and some good old-fashioned data geekery.

Face To Face
by Rajlakshmi Purkayastha (Esure), Naz Ghader-Pour (NTT Data), Paul Davies (Domestic and General), Karishma Jaitly (Domestic and General), Robin Sutara (Databricks)

Forecasting is no longer just about historical trends and spreadsheets. AI is redefining how organisations anticipate demand, manage risk and make faster, smarter decisions.

In this expert-led panel, Women in Data® senior leaders from esure, Domestic & General and Databricks, moderated by a leading voice from NTT DATA, will explore how AI-enabled forecasting is transforming planning across industries. They will take a candid look at the current landscape, how to realign goals and priorities, and how to forge a business that is dynamic, data-rich and future-ready.

Powered by: Women in Data®

AI agents need seamless access to enterprise data to deliver real value. DataHub's new MCP server creates the universal bridge that connects any AI agent to your entire data infrastructure through a single interface.

This session demonstrates how organizations are breaking down data silos by enabling AI agents to intelligently discover and interact with data across Snowflake, Databricks, BigQuery, and other platforms. See live examples of AI-powered data discovery, real-time incident response, and automated impact analysis.
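
MCP is JSON-RPC 2.0 under the hood: clients discover tools and invoke them with `tools/call` requests. The toy dispatcher below mimics that request/response shape with an in-memory catalog and a made-up `search_datasets` tool; a real DataHub MCP server would resolve such calls against DataHub's metadata graph, and the dataset names here are invented.

```python
import json

# Hypothetical catalog the "server" exposes; a real DataHub MCP server
# would query DataHub's metadata graph instead of this in-memory dict.
DATASETS = {
    "snowflake.sales.orders": {"platform": "snowflake", "owner": "finance"},
    "databricks.ml.features": {"platform": "databricks", "owner": "ml-team"},
}

def search_datasets(query: str):
    """Tool implementation: substring match over dataset names."""
    return [name for name in DATASETS if query in name]

TOOLS = {"search_datasets": search_datasets}

def handle(request_json: str) -> str:
    """Dispatch a JSON-RPC 2.0 'tools/call' request, MCP-style."""
    req = json.loads(request_json)
    tool = TOOLS[req["params"]["name"]]
    result = tool(**req["params"]["arguments"])
    return json.dumps({"jsonrpc": "2.0", "id": req["id"], "result": result})

response = handle(json.dumps({
    "jsonrpc": "2.0", "id": 1, "method": "tools/call",
    "params": {"name": "search_datasets", "arguments": {"query": "sales"}},
}))
```

Because every backend is reached through the same tool-call shape, the agent never needs platform-specific connectors, which is the "universal bridge" idea the session describes.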

Learn how forward-thinking data leaders are positioning their organizations at the center of the AI revolution by implementing universal data access strategies that scale across their entire ecosystem.

The entertainment industry is sitting on a huge natural resource: decades of creativity and craftsmanship from talented professionals. Koobrik is an advanced language model designed by, and for, the creative industries. As a Warner Brothers accelerator company, the model is already used by HBO, A24, DC Comics and many more to harness ethical artificial intelligence.

Join Koobrik’s CEO and Founder, Orlando Wood, as he shares insights into:

- Building an ethical AI model for the entertainment industry

- The unique challenges of creative data as an asset class

- The AWS and Databricks tech stack powering Koobrik

- Real-world applications, from comic books to screenplays

Over the last four years, ASDA has been through an incredible period of transformation. From data strategy, governance, platforms, culture, and skills to everything in between, Alex Meakin (Senior Director of Data Delivery & Strategy) and his team have embraced it all and turned initial aspirations into real, measurable impact.

Brought to you by Women in Data®, Alex will be joined by moderator Robin Sutara and key partners from PwC and Databricks as he shares his honest reflections, key milestones, and practical lessons on building data maturity, driving adoption, and sustaining momentum.  

Whether you're navigating the complexities of your own transformation, still building on those early wins, or reigniting interest, this session offers practical insights, candid lessons, and a fresh perspective from those who have been through it, end to end.

Powered by: Women in Data®

In today’s landscape, data truly is the new currency. But unlocking its full value requires overcoming silos, ensuring trust and quality, and then applying the right AI and analytics capabilities to create real business impact. In this session, we’ll explore how Oakbrook Finance is tackling these challenges head-on — and the role that Fivetran and Databricks play in enabling that journey.

Oakbrook Finance is a UK-based consumer lender transforming how people access credit. By combining advanced data science with a customer-first approach, Oakbrook delivers fair, transparent, and flexible credit solutions — proving that lending can be both innovative and human-centred.

So you’ve heard of Databricks, but you’re still not sure what all the fuss is about. Yes, you’ve heard it’s Spark, but then there’s this Delta thing that’s both a data lake and a data warehouse (isn’t that what Iceberg is?). And then there’s Unity Catalog, which isn’t just a catalog: it also handles access management, and even surprising things like optimising your data and giving programmatic access to lineage and billing. But then serverless came out, and now you don’t even have to learn Spark? And of course there’s a bunch of AI stuff to use or create yourself. So why not spend 30 mins learning the details of what Databricks does, and how it can turn you into a rockstar Data Engineer.

In today’s fragmented data landscape, organisations are under pressure to unify their data estates while maintaining agility, governance, and performance. This session explores how Microsoft Fabric, OneLake, and Azure Databricks come together to deliver a powerful, open, and integrated platform for centralised data orchestration—without compromise. From ingestion to insight, this session will showcase how “no excuses” becomes a reality when your data is truly unified, with a real-time demonstration highlighting the platform’s capabilities in action.

In this session, we will explore how organisations can leverage ArcGIS to analyse spatial data within their data platforms, such as Databricks and Microsoft Fabric. We will discuss the importance of spatial data and its impact on decision-making processes. The session will cover various aspects, including the ingestion of streaming data using ArcGIS Velocity, the processing and management of large volumes of spatial data with ArcGIS GeoAnalytics for Microsoft Fabric, and the use of ArcGIS for visualisation and advanced analytics with GeoAI. Join us to discover how these tools can provide actionable insights and enhance operational efficiency.

In today’s fast-paced financial landscape, data-driven decision-making is no longer optional; it’s essential. This session explores how Databricks empowers finance teams to accelerate intelligence. We’ll dive into how unified data platforms and lakehouse architecture streamline data ingestion and processing, enabling faster insights and smarter automation.

As Europe’s top B2B used-goods auction platform, TBAuctions is entering the AI era. Roberto Bonilla, Lead Data Engineer, shows how Databricks, Azure, Terraform, MLflow and LangGraph come together to simplify complex AI workflows. Bas Lucieer, Head of Data, details the strategy and change management that bring a sales-driven organisation along, ensuring adoption and lasting value. Together they show how tech plus strategy delivers a marketplace edge.

Many organisations invest heavily in modern data platforms, yet see adoption lag behind. In this session you’ll learn how, using our 7-step model for data-driven working (step 2, ‘Make a plan’, and step 6, ‘Share the knowledge’), you can make platform choices that the business actually stands behind. Including tips for landing Azure, Databricks or Fabric not just technically, but organisationally too.

How do you make 100 million sensor readings per day usable for engineers and analysts? In this session we show how Heerema, with a small data team, built a scalable self-service data platform with Databricks and dbt, turning raw measurements into reliable data models for a range of analyses and teams.

Summary

In this episode of the Data Engineering Podcast Serge Gershkovich, head of product at SqlDBM, talks about the socio-technical aspects of data modeling. Serge shares his background in data modeling and highlights its importance as a collaborative process between business stakeholders and data teams. He debunks common misconceptions that data modeling is optional or secondary, emphasizing its crucial role in ensuring alignment between business requirements and data structures. The conversation covers challenges in complex environments, the impact of technical decisions on data strategy, and the evolving role of AI in data management. Serge stresses the need for business stakeholders' involvement in data initiatives and a systematic approach to data modeling, warning against relying solely on technical expertise without considering business alignment.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

Data migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.

Enterprises today face an enormous challenge: they’re investing billions into Snowflake and Databricks, but without strong foundations, those investments risk becoming fragmented, expensive, and hard to govern. And that’s especially evident in large, complex enterprise data environments. That’s why companies like DirecTV and Pfizer rely on SqlDBM. Data modeling may be one of the most traditional practices in IT, but it remains the backbone of enterprise data strategy. In today’s cloud era, that backbone needs a modern approach built natively for the cloud, with direct connections to the very platforms driving your business forward. Without strong modeling, data management becomes chaotic, analytics lose trust, and AI initiatives fail to scale. SqlDBM ensures enterprises don’t just move to the cloud—they maximize their ROI by creating governed, scalable, and business-aligned data environments. If global enterprises are using SqlDBM to tackle the biggest challenges in data management, analytics, and AI, isn’t it worth exploring what it can do for yours? Visit dataengineeringpodcast.com/sqldbm to learn more.

Your host is Tobias Macey and today I'm interviewing Serge Gershkovich about how and why data modeling is a sociotechnical endeavor.

Interview

- Introduction
- How did you get involved in the area of data management?
- Can you start by describing the activities that you think of when someone says the term "data modeling"?
- What are the main groupings of incomplete or inaccurate definitions that you typically encounter in conversation on the topic?
- How do those conceptions of the problem lead to challenges and bottlenecks in execution?
- Data modeling is often associated with data warehouse design, but it also extends to source systems and unstructured/semi-structured assets. How does the inclusion of other data localities help in the overall success of a data/domain modeling effort?
- Another aspect of data modeling that often consumes a substantial amount of debate is which pattern to adhere to (star/snowflake, data vault, one big table, anchor modeling, etc.). What are some of the ways that you have found effective to remove that as a stumbling block when first developing an organizational domain representation?
- While the overall purpose of data modeling is to provide a digital representation of the business processes, there are inevitable technical decisions to be made. What are the most significant ways that the underlying technical systems can help or hinder the goals of building a digital twin of the business?
- What impact (positive and negative) are you seeing from the introduction of LLMs into the workflow of data modeling?
- How does tool use (e.g. MCP connection to warehouse/lakehouse) help when developing the transformation logic for achieving a given domain representation?
- What are the most interesting, innovative, or unexpected ways that you have seen organizations address the data modeling lifecycle?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working with organizations implementing a data modeling effort?
- What are the overall trends in the ecosystem that you are monitoring related to data modeling practices?

Contact Info

- LinkedIn

Parting Question

- From your perspective, what is the biggest gap in the tooling or technology for data management today?

Links

- SqlDBM
- SAP
- Joe Reis
- ERD == Entity Relation Diagram
- Master Data Management
- dbt
- Data Contracts
- Data Modeling With Snowflake book by Serge (affiliate link)
- Type 2 Dimension
- Data Vault
- Star Schema
- Anchor Modeling
- Ralph Kimball
- Bill Inmon
- Sixth Normal Form
- MCP == Model Context Protocol

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Jumpstart Snowflake: A Step-by-Step Guide to Modern Cloud Analytics

This book is your guide to the modern market of data analytics platforms and the benefits of using Snowflake, the data warehouse built for the cloud. As organizations increasingly rely on modern cloud data platforms, the core of any analytics framework—the data warehouse—is more important than ever. This updated 2nd edition ensures you are ready to make the most of the industry’s leading data warehouse. This book will onboard you to Snowflake and present best practices for deploying and using the Snowflake data warehouse. The book also covers modern analytics architecture, integration with leading analytics software such as Matillion ETL, Tableau, and Databricks, and migration scenarios for on-premises legacy data warehouses. This new edition includes expanded coverage of Snowpark for developing complex data applications, an introduction to managing large datasets with Apache Iceberg tables, and instructions for creating interactive data applications using Streamlit, ensuring readers are equipped with the latest advancements in Snowflake's capabilities.

What You Will Learn

- Master key functionalities of Snowflake
- Set up security and access with cluster
- Bulk load data into Snowflake using the COPY command
- Migrate from a legacy data warehouse to Snowflake
- Integrate the Snowflake data platform with modern business intelligence (BI) and data integration tools
- Manage large datasets with Apache Iceberg tables
- Implement continuous data loading with Snowpipe and Dynamic Tables

Who This Book Is For

Data professionals, business analysts, IT administrators, and existing or potential Snowflake users