talk-data.com

Topic

AI/ML

Artificial Intelligence/Machine Learning

data_science algorithms predictive_analytics

9014 tagged

Activity Trend: 1532 peak/qtr, 2020-Q1 to 2026-Q1

Activities

9014 activities · Newest first

Large-Scale Video Intelligence

The explosion of video data demands search beyond simple metadata. How do we find specific visual moments, actions, or faces within petabytes of footage? This talk dives into architecting a robust, scalable multi-modal video search system. We will explore an architecture combining efficient batch preprocessing for feature extraction (including person detection, face/CLIP-style embeddings) with optimized vector database indexing. Attendees will learn practical strategies for managing massive datasets, optimizing ML inference (e.g., lightweight models, specialized runtimes), and bridging pre-computed indexes with real-time analysis for deeper insights. This session is for data scientists, ML engineers, and architects looking to build sophisticated video understanding capabilities.

Audience: Data Scientists, Machine Learning Engineers, Data Engineers, System Architects.

Takeaway: Attendees will learn architectural patterns and practical techniques for building scalable multi-modal video search systems, including feature extraction, vector database utilization, and ML pipeline optimization.

Background Knowledge: Familiarity with Python, core machine learning concepts (e.g., embeddings, classification), and general data processing pipelines is beneficial. Experience with video processing or computer vision is a plus but not strictly required.
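As a rough illustration of the "pre-computed embeddings plus vector index" pattern the abstract describes, here is a minimal sketch in plain Python. It is not the speakers' system: the `FrameIndex` class, the frame IDs, and the three-dimensional vectors are all hypothetical, and the brute-force scan stands in for a real vector database, which would use an approximate nearest-neighbour index instead.

```python
import math
from typing import List, Tuple


def normalize(vec: List[float]) -> List[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


class FrameIndex:
    """Toy in-memory stand-in for a vector database (hypothetical, brute force)."""

    def __init__(self) -> None:
        self._items: List[Tuple[str, List[float]]] = []

    def add(self, frame_id: str, embedding: List[float]) -> None:
        # Batch preprocessing would call this once per extracted frame embedding.
        self._items.append((frame_id, normalize(embedding)))

    def search(self, query: List[float], k: int = 3) -> List[Tuple[str, float]]:
        # Score every stored frame against the query and keep the k best.
        q = normalize(query)
        scored = [
            (frame_id, sum(a * b for a, b in zip(q, emb)))
            for frame_id, emb in self._items
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


# Made-up embeddings; real ones would come from a face or CLIP-style model.
index = FrameIndex()
index.add("video1/frame0010", [0.9, 0.1, 0.0])
index.add("video1/frame0450", [0.0, 1.0, 0.2])
index.add("video2/frame0033", [0.7, 0.3, 0.1])

results = index.search([1.0, 0.0, 0.0], k=2)
for frame_id, score in results:
    print(f"{frame_id}: {score:.3f}")
```

Normalizing at insert time keeps each query to a single dot product per frame, which is the same trade-off a production index makes when it stores unit-length vectors.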

In this episode, we're joined by Sam Debruyn and Dorian Van den Heede, who reflect on their talks at SQLBits 2025 and dive into the technical content they presented. Sam walks through how dbt integrates with Microsoft Fabric, explaining how it improves lakehouse and warehouse workflows by adding modularity, testing, and documentation to SQL development. He also touches on Fusion’s SQL optimization features and how it compares to tools like SQLMesh. Dorian shares his MLOps demo, which simulates beating football bookmakers using historical data, showing how to build a full pipeline with Azure ML, from feature engineering to model deployment. They discuss the role of Python modeling in dbt, orchestration with Azure ML, and the practical challenges of implementing MLOps in real-world scenarios. Toward the end, they explore how AI tools like Copilot are changing the way engineers learn and debug code, raising questions about explainability, skill development, and the future of junior roles in tech. It’s a rich conversation covering dbt, MLOps, Python, Azure ML, and the evolving role of AI in engineering.

Understanding ETL (Updated Edition)

"Extract, transform, load" (ETL) is at the center of every application of data, from business intelligence to AI. Constant shifts in the data landscape, including the implementations of lakehouse architectures and the importance of high-scale real-time data, mean that today's data practitioners must approach ETL a bit differently. This updated technical guide offers data engineers, engineering managers, and architects an overview of the modern ETL process, along with the challenges you're likely to face and the strategic patterns that will help you overcome them. You'll come away equipped to make informed decisions when implementing ETL and confident about choosing the technology stack that will help you succeed.

• Discover what ETL looks like in the new world of data lakehouses

• Learn how to deal with real-time data

• Explore low-code ETL tools

• Understand how to best achieve scale, performance, and observability

Face To Face
by Gavi Regunath (Advancing Analytics), Simon Whiteley (Advancing Analytics), Holly Smith (Databricks)

We’re excited to be back at Big Data LDN this year—huge thanks to the organisers for hosting Databricks London once more!

Join us for an evening of insights, networking, and community with the Databricks Team and Advancing Analytics!

🎤 Agenda:

6:00 PM – 6:10 PM | Kickoff & Warm Welcome

Grab a drink, say hi, and get the lowdown on what’s coming up. We’ll set the scene for an evening of learning and laughs.

6:10 PM – 6:50 PM | The Metadata Marathon: How three projects are racing forward – Holly Smith (Staff Developer Advocate, Databricks)

With the enormous amount of discussion about open storage formats among nerds and even non-nerds, it can be hard to keep track of who’s doing what and how any of this actually affects day-to-day data projects.

Holly will take a closer look at the three big projects in this space: Delta, Hudi and Iceberg. They’re all trying to solve similar data problems and have tackled the various challenges in different ways. Her talk will start with the very basics of how we got here and the history behind it, before diving deep into the underlying tech, their roadmaps, and their impacts on the data landscape as a whole.

6:50 PM – 7:10 PM | What’s New in Databricks & Databricks AI – Simon Whiteley & Gavi Regunath

Hot off the press! Simon and Gavi will walk you through the latest and greatest from Databricks, including shiny new AI features and platform updates you’ll want to try ASAP.

7:10 PM onwards | Q&A Panel + Networking

Your chance to ask the experts anything—then stick around for drinks, snacks, and some good old-fashioned data geekery.

Face To Face
by Hugo Lu, Jon Cooke (Dataception), Jai Parmar, Chris Freestone, David Richardson, Paul Rankin (Paul Rankin IT), Jesse Anderson (Big Data Institute), Taylor McGrath (Boomi), Karl Ivo Sokolov, Nick White, Chris Tabb (LEIT DATA), Kelsey Hammock, Jean-Georges Perrin (Actian), Mehdi Ouazza (MotherDuck), Adi Polak (Treeverse), Eevamaija Virtanen

https://www.bigdataldn.com/en-gb/conference/session-details.4500.251781.the-high-performance-data-and-ai-debate.html

Join us for an unmissable evening of insight, discussion, and lively debate at The High Performance Data and AI Debate, hosted by Chris Tabb — a unique Big Data London special running from 6:00–8:00 PM. This fast-paced, interactive event brings together some of the brightest minds in data and AI to tackle the most pressing questions shaping the future of teams, architecture, and products in an AI-first world.

The evening kicks off at 6:00 PM with a welcome and free drinks. Then, across three rapid-fire 20-minute debates, our expert panels will explore:

AI & Data – Teams (Chair: Eevamaija Virtanen)

Mehdi Ouazza, Paul Rankin, Jesse Anderson, Hugo Lu

AI & Data – Architecture (Chair: Adi Polak)

Chris Freestone, David Richardson, Nick White, Karl Ivo Sokolov

AI & Data – Products (Chair: Jai Parmar)

Kelsey Hammock, Jean-Georges (jgp) Perrin, Taylor McGrath, Jon Cooke

Refuel with free pizza at 6:50 PM, then stay for the Town Hall Debate, where all speakers return to the stage for an open-floor Q&A — your chance to challenge their ideas, share perspectives, and shape the conversation.

Expect fresh perspectives, healthy disagreement, and practical takeaways you can bring back to your organisation. Whether you’re leading a data team, designing cutting-edge architectures, or building AI-powered products, this is your space to engage with the people shaping what’s next.

This talk presents a technical case study of applying agentic AI systems to automate community operations at PyCon DE & PyData, treated as an open-source testbed. The key lesson is simple: AI only works when put on a leash. Reliable results required good architecture, a clear plan, and structured data models — from YAML and Pydantic schemas to reproducible pipelines with GitHub Actions. With that foundation, LLM agents supported logistics, FAQs, video processing, and scheduling; without it, they failed. By contrasting successes and failure modes across different coding agents, the talk demonstrates that robust design, validation, and controlled context are prerequisites for making agentic AI usable in real-world workflows.
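To make the "structured data models" point concrete, the sketch below shows one way an agent's output could be rejected unless it matches a fixed schema. It is an assumption-laden illustration, not the talk's code: the `TalkSlot` fields and `parse_talk_slot` helper are hypothetical, and the standard-library `dataclasses` and `json` modules stand in for the Pydantic and YAML tooling the abstract mentions.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class TalkSlot:
    """Hypothetical schema for one schedule entry produced by an LLM agent."""
    title: str
    room: str
    start: str         # 24-hour "HH:MM"
    duration_min: int


# Field names and the types they must have.
EXPECTED_FIELDS = {"title": str, "room": str, "start": str, "duration_min": int}


def parse_talk_slot(raw: str) -> TalkSlot:
    """Parse agent output, raising ValueError on anything outside the schema."""
    data = json.loads(raw)  # malformed JSON raises JSONDecodeError, a ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if set(data) != set(EXPECTED_FIELDS):
        diff = sorted(set(data) ^ set(EXPECTED_FIELDS))
        raise ValueError(f"missing or unexpected fields: {diff}")
    for name, expected_type in EXPECTED_FIELDS.items():
        value = data[name]
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected_type):
            raise ValueError(f"field {name!r} must be {expected_type.__name__}")
    hours, sep, minutes = data["start"].partition(":")
    if not (sep and hours.isdigit() and minutes.isdigit()
            and int(hours) < 24 and int(minutes) < 60):
        raise ValueError("start must be a valid HH:MM time")
    if data["duration_min"] <= 0:
        raise ValueError("duration_min must be positive")
    return TalkSlot(**data)


good = parse_talk_slot(
    '{"title": "Keynote", "room": "A1", "start": "09:30", "duration_min": 45}'
)

try:
    parse_talk_slot(
        '{"title": "Keynote", "room": "A1", "start": "25:00", "duration_min": 45}'
    )
    rejected = False
except ValueError:
    rejected = True
```

Failing loudly at this boundary means a downstream pipeline step only ever sees data that already fits the schema, which is one reading of keeping the agent "on a leash".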

In this session, we’ll take a closer look at the security risks that come with integrating LLMs into applications. LLMs can be powerful allies in cybersecurity — helping with detection, testing, and protection — but they can just as easily be exploited for attacks. We’ll explore key threats such as prompt injection, jailbreaking, and agent-specific vulnerabilities, and discuss why they are currently seen as the most pressing risks. Finally, we’ll look at defense strategies, from prompt-level safeguards to system-wide controls, and show how best practices can make a real difference in securing AI systems.

The path to AI enablement runs through governance. High-quality data, model transparency, and ethical oversight aren’t barriers; they are accelerators. In this talk, we’ll connect the dots between Data Governance and AI Governance and show how unified governance helps embed new requirements into existing processes while fostering innovation. We will discuss actionable steps to build AI-ready organisations that innovate with proper guardrails.

As AI adoption accelerates across industries, many organisations are realising that building a model is only the beginning. Real-world deployment of AI demands robust infrastructure, clean and connected data, and secure, scalable MLOps pipelines. In this panel, experts from across the AI ecosystem share lessons from the frontlines of operationalising AI at scale.

We’ll dig into the tough questions:

• What are the biggest blockers to AI adoption in large enterprises — and how can we overcome them?

• Why does bad data still derail even the most advanced models, and how can we fix the data quality gap?

• Where does synthetic data fit into real-world AI pipelines — and how do we define “real” data?

• Is Agentic AI the next evolution, or just noise — and how should MLOps prepare?

• What does a modern, secure AI stack look like when using external partners and APIs?

Expect sharp perspectives on data integration, model lifecycle management, and the cyber-physical infrastructure needed to make AI more than just a POC.

Face To Face
by Jeremiah Stone (SnapLogic), Dr Mary Osbourne (SAS), Mike Ferguson (Big Data LDN), David Kalmuk (IBM Core Software), Chris Aberger (Alation), Vivienne Wei (Salesforce)

In this, the 10th year of Big Data LDN, in its flagship Great Data Debate keynote panel, conference chair and leading industry analyst Mike Ferguson welcomes executives from leading software vendors to discuss key topics in data management and analytics. Panellists will debate the challenges and success factors in building an agentic enterprise, the importance of unified data and AI governance, the implications of key industry trends in data management, how best to deal with real-world customer challenges, how to build a modern data and analytics (D&A) architecture, and issues on the horizon that companies should be planning for today.

Attendees will learn best practices for data and analytics implementation in a modern data- and AI-driven enterprise from seasoned executives and an experienced industry analyst in a packed, unscripted, candid discussion.

Brought to You By:

• Statsig – The unified platform for flags, analytics, experiments, and more. Statsig built a complete set of data tools that allow engineering teams to measure the impact of their work. This toolkit is SO valuable to so many teams, that OpenAI - who was a huge user of Statsig - decided to acquire the company, the news announced last week. Talk about validation! Check out Statsig.
• Linear – The system for modern product development. Here’s an interesting story: OpenAI switched to Linear as a way to establish a shared vocabulary between teams. Every project now follows the same lifecycle, uses the same labels, and moves through the same states. Try Linear for yourself.

What does it take to do well at a hyper-growth company? In this episode of The Pragmatic Engineer, I sit down with Charles-Axel Dein, one of the first engineers at Uber, who later hired me there. Since then, he’s gone on to work at CloudKitchens. He’s also been maintaining the popular Professional programming reading list GitHub repo for 15 years, where he collects articles that made him a better programmer.

In our conversation, we dig into what it’s really like to work inside companies that grow rapidly in scale and headcount. Charles shares what he’s learned about personal productivity, project management, incidents, interviewing, plus how to build flexible skills that hold up in fast-moving environments.

Jump to interesting parts:

• 10:41 – the reality of working inside a hyperscale company
• 41:10 – the traits of high-performing engineers
• 1:03:31 – Charles’ advice for getting hired in today’s job market

We also discuss:

• How to spot the signs of hypergrowth (and when it’s slowing down)
• What sets high-performing engineers apart beyond shipping
• Charles’s personal productivity tips, favorite reads, and how he uses reading to uplevel his skills
• Strategic tips for building your resume and interviewing
• How imposter syndrome is normal, and how leaning into it helps you grow
• And much more!

If you’re at a fast-growing company, considering joining one, or looking to land your next role, you won’t want to miss this practical advice on hiring, interviewing, productivity, leadership, and career growth.

Timestamps

(00:00) Intro
(04:04) Early days at Uber as engineer #20
(08:12) CloudKitchens’ similarities with Uber
(10:41) The reality of working at a hyperscale company
(19:05) Tenancies and how Uber deployed new features
(22:14) How CloudKitchens handles incidents
(26:57) Hiring during fast-growth
(34:09) Avoiding burnout
(38:55) The popular Professional programming reading list repo
(41:10) The traits of high-performing engineers
(53:22) Project management tactics
(1:03:31) How to get hired as a software engineer
(1:12:26) How AI is changing hiring
(1:19:26) Unexpected ways to thrive in fast-paced environments
(1:20:45) Dealing with imposter syndrome
(1:22:48) Book recommendations
(1:27:26) The problem with survival bias
(1:32:44) AI’s impact on software development
(1:42:28) Rapid fire round

The Pragmatic Engineer deepdives relevant for this episode:

• Software engineers leading projects
• The Platform and Program split at Uber
• Inside Uber’s move to the Cloud
• How Uber built its observability platform
• From Software Engineer to AI Engineer – with Janvi Kalra

Production and marketing by https://penname.co/.
For inquiries about sponsoring the podcast, email [email protected].

Get full access to The Pragmatic Engineer at newsletter.pragmaticengineer.com/subscribe

The rapid evolution of AI, fueled by powerful Large Language Models (LLMs) and autonomous agents, is reshaping how we build, deploy, and manage AI systems. This presentation explores the critical intersection of MLOps and AI architecture, highlighting the paradigm shifts required to integrate LLMs and agents into production. We will address key architectural challenges, including scalability, observability, and security, while examining emerging MLOps practices such as robust data pipelines, model monitoring, and continuous optimization. Attendees will gain practical insights and actionable strategies to navigate the complexities of modern AI deployments, unlocking the full potential of LLMs and agents while ensuring operational excellence.

As AI evolves with powerful Large Language Models (LLMs) and autonomous agents, deploying and managing these systems requires new approaches. This presentation explores the crucial intersection of MLOps and AI architecture, highlighting the shift toward scalable, observable, and secure AI deployments. We’ll examine key architectural considerations for integrating LLMs and agents into production, alongside evolving MLOps practices such as robust data pipelines, model monitoring, and continuous optimization.

Face To Face
by Sam Khalil (ekona.ai), Kshitij Kumar (Data-Hat AI), David Reed (DataIQ), Jane Smith (ThoughtSpot), Dr. Joe Perez (NC Dept of Health & Human Services), Anusha Adige (EY)
LLM

As AI agents become embedded in everyday workflows — from healthcare diagnostics to financial services chatbots — the line between human and machine continues to blur. This panel brings together industry leaders to tackle the tough questions:

• How do we trust AI agents in high-risk environments?

• What are the new rules of ownership and accountability when autonomous systems act on data?

• Is AI replacing or enhancing the human workforce — and how do we keep the balance right?

We'll unpack how AI agents are evolving across sectors, debate whether the current LLM paradigm is enough, and explore the new guardrails needed to futureproof agentic AI — without losing control.