talk-data.com

Topic: Large Language Models (LLM)

Tags: nlp, ai, machine_learning

27 tagged activities

Activity Trend: peak of 158 activities per quarter (2020-Q1 to 2026-Q1)

Activities

Showing results filtered by: Big Data LDN 2025

In this talk, you will learn about some of the common challenges that you might encounter while developing meaningful applications with large language models.

Using non-deterministic systems as the basis for important applications is certainly an 'interesting' new frontier for software development, but hope is not lost. In this session, we will explore some of the well-known (and less well-known) issues in building applications on the APIs provided by LLM providers, and on 'open' LLMs, such as Mistral, Llama, or DeepSeek.

We will also (of course) dive into some of the approaches that you can take to address these challenges, and mitigate some of the inherent behaviors that are present within LLMs, enabling you to build more reliable and robust systems on top of LLMs, unlocking the potential of this new development paradigm.
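One common mitigation the abstract alludes to is validating model output against an expected schema and retrying on failure. The sketch below illustrates the pattern with a stubbed `call_llm` function standing in for a real API call; the function name and JSON schema are illustrative assumptions, not from any specific provider.

```python
import json

def call_llm(prompt: str, attempt: int) -> str:
    # Stand-in for a real LLM API call; returns malformed JSON on the
    # first attempt to simulate non-deterministic output.
    if attempt == 0:
        return "Sure! Here is the JSON: {'sentiment': positive}"
    return '{"sentiment": "positive", "confidence": 0.9}'

def query_with_validation(prompt: str, max_retries: int = 3) -> dict:
    """Retry until the model returns JSON matching the expected schema."""
    for attempt in range(max_retries):
        raw = call_llm(prompt, attempt)
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed output: ask again
        if isinstance(data, dict) and "sentiment" in data:
            return data  # passes the schema check
    raise RuntimeError("no valid response after retries")

result = query_with_validation("Classify the sentiment of: 'great talk!'")
print(result["sentiment"])
```

The same loop generalises to any structured-output contract: the validation step, not the prompt, is what makes the system's behaviour dependable.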

This presentation provides an overview of how NVIDIA RAPIDS accelerates data science and data engineering workflows end-to-end. Key topics include leveraging RAPIDS for machine learning, large-scale graph analytics, real-time inference, hyperparameter optimization, and ETL processes. Case studies demonstrate significant performance improvements and cost savings across various industries using RAPIDS for Apache Spark, XGBoost, cuML, and other GPU-accelerated tools. The talk emphasizes the impact of accelerated computing on modern enterprise applications, including LLMs, recommenders, and complex data processing pipelines.

With the pace of change in AI across the industry and the constant bombardment of contradictory advice, it is easy to become overwhelmed and not know where to start.

The promise of LLMs has been undermined by vendor and journalistic hype, and by an inability to rely on quantitative answers being accurate. After all, what good would a colleague be (artificial or not) if you already needed to know the answer to validate any question you ask of them?

The promise of neuro-symbolic AI, which combines two well-established technologies (semantic knowledge graphs and machine learning), is that it enables more accurate LLM-powered analytics and, most importantly, a faster path to greater data value.

In this practical, engaging and fun talk, Ben will equip you with the principles and fundamentals that never change but often go under-utilised, as well as discussing and demonstrating the latest techniques, platforms and tools so that you can get started with confidence.

Ben will show that far from taking months, data products can take minutes or hours to prepare, publish and start gaining value from, all in a sustainable and maintainable manner.

LLMs and MCP have fundamentally changed my work as a solo data practitioner. I spend less time writing code, but throughput and durability have improved dramatically. MCP servers now act as core tooling for LLMs in my workflows, but have also started to become data products in their own right. 

This is a practical talk that will focus on my experience of shifting to this way of thinking and working, including what has worked for me and the realities of where there are still rough edges.

Site Reliability Engineers spend countless hours diagnosing complex production issues and optimizing system performance. This talk explores how Large Language Models can augment SRE workflows with real-world examples.

We'll also demonstrate live tooling that leverages AI to optimize database configurations - automatically recommending ordering keys, generating materialized views, and suggesting performance improvements based on query patterns and system metrics.

Rather than theoretical possibilities, this talk focuses on practical implementations with measurable results. Attendees will learn actionable strategies for integrating LLMs into their SRE workflows, understand the current boundaries of AI effectiveness in operations, and see concrete examples of tooling already delivering value in production environments.
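The tooling described above recommends ordering keys from query patterns. A minimal sketch of the signal such an advisor starts from, assuming a toy query log and a crude WHERE-clause parser (the table and column names are invented for illustration and this is not the speakers' actual tool):

```python
import re
from collections import Counter

QUERY_LOG = [
    "SELECT * FROM events WHERE tenant_id = 7 AND ts > '2025-01-01'",
    "SELECT count() FROM events WHERE tenant_id = 9",
    "SELECT * FROM events WHERE ts > '2025-06-01'",
]

def filtered_columns(sql: str) -> list:
    """Crude extraction of columns used in WHERE clauses."""
    match = re.search(r"WHERE (.+)$", sql, re.IGNORECASE)
    if not match:
        return []
    # A column name immediately followed by a comparison operator.
    return re.findall(r"(\w+)\s*[=><]", match.group(1))

def suggest_ordering_key(queries: list) -> list:
    """Rank columns by how often queries filter on them -- the same
    statistic an LLM-based advisor would be prompted with."""
    counts = Counter(c for q in queries for c in filtered_columns(q))
    return [col for col, _ in counts.most_common()]

print(suggest_ordering_key(QUERY_LOG))
```

In a real system the ranked columns, together with cardinality and system metrics, would form the context handed to the model for a final recommendation.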

This talk will explore how NVIDIA Blueprints are accelerating AI development and deployment across various industries, with a focus on building intelligent video analytics agents. Powered by generative AI, vision-language models (VLMs), large language models (LLMs), and NVIDIA NIM Microservices, these agents can be directed through natural language to perform tasks such as video summarization, visual question answering, and real-time alerts. This talk will show how the Video Search and Summarization (VSS) blueprint accelerates insight from video, helping industries transform footage into accurate, actionable intelligence.

For years, data governance has been about guiding people and their interpretations. We build glossaries, descriptions and documentation to keep analysts and business users aligned. But what happens when your primary “user” isn’t human? As agentic workflows, LLMs, and AI-driven decision systems become mainstream, the way we govern data must evolve. The controls that once relied on human interpretation now need to be machine-readable, unambiguous, and able to support near-real-time reasoning. The stakes are high: a governance model designed for people may look perfectly clear to us but lead an AI straight into hallucinations, bias, or costly automation errors.

This session explores what it really means to make governance “AI-ready.” We’ll look at the shift from human-centric to agent-centric governance, practical strategies for structuring metadata so that agents can reliably understand and act on it, and what new risks emerge when AI is the primary consumer of your data catalog. We'll discuss patterns, emerging practices, and how to transition to a new governance operating model. Whether you’re a data leader, platform engineer, or AI practitioner, you’ll leave with an appreciation of governance approaches for a world where your first stakeholder might not even be human.
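What "machine-readable governance" can look like in practice: a policy record an agent can evaluate without human interpretation. The field names and classification values below are illustrative assumptions, not from any specific catalog product.

```python
from dataclasses import dataclass

@dataclass
class ColumnPolicy:
    # Hypothetical machine-readable policy record.
    name: str
    classification: str   # e.g. "public", "pii", "restricted"
    allowed_uses: tuple   # purposes an agent may use this column for
    definition: str       # unambiguous business definition

def agent_may_use(policy: ColumnPolicy, purpose: str) -> bool:
    """An agent checks the policy before touching the column --
    no human interpretation required."""
    if policy.classification == "restricted":
        return False
    return purpose in policy.allowed_uses

email = ColumnPolicy(
    name="customer_email",
    classification="pii",
    allowed_uses=("account_notification",),
    definition="Verified contact email of the account owner.",
)

print(agent_may_use(email, "marketing_segmentation"))  # expect False
print(agent_may_use(email, "account_notification"))    # expect True
```

The point is the contrast with a prose glossary entry: the same rule, expressed as data, supports the near-real-time reasoning the abstract calls for.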

Learn how to transform your data warehouse for AI/LLM readiness while making advanced analytics accessible to all team members, regardless of technical expertise. 

We'll share practical approaches to adapting data infrastructure and building user-friendly AI tools that lower the barrier to entry for sophisticated analysis. 

Key takeaways include implementation best practices, challenges encountered, and strategies for balancing technical requirements with user accessibility. Ideal for data teams looking to democratize AI-powered analytics in their organization.

Data teams know the pain of moving from proof-of-concepts to production. We’ve all seen brittle scripts, one-off notebooks, and manual fixes turn into hidden risks. With large language models, the same story is playing out, unless we borrow the lessons of modern data engineering.

This talk introduces a declarative approach to LLM engineering using DSPy and Dagster. DSPy treats prompts, retrieval strategies, and evaluation metrics as first-class, composable building blocks. Instead of tweaking text by hand, you declare the behavior you want, and DSPy optimizes and tunes the pipeline for you. Dagster is built on a similar premise; with Dagster Components, you can build modular and declarative pipelines.

This approach means:

- Trust & auditability: Every LLM output can be traced back through a reproducible graph.

- Safety in production: Automated evaluation loops catch drift and regressions before they matter.

- Scalable experimentation: The same declarative spec can power quick tests or robust, HIPAA/GxP-grade pipelines.

By treating LLM workflows like data pipelines (declarative, observable, and orchestrated), we can avoid the prompt-spaghetti trap and build AI systems that meet the same reliability bar as the rest of the stack.
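The declarative idea above can be reduced to a toy sketch: declare the behaviour and a metric threshold, and let an optimizer pick the pipeline configuration. This is a pure-Python illustration of the pattern, not DSPy's or Dagster's actual API; the model is stubbed and the "optimization" is a trivial search over prompt variants.

```python
from dataclasses import dataclass, field

@dataclass
class PipelineSpec:
    """Declare *what* the pipeline should achieve; the runner decides how."""
    task: str
    metric_threshold: float
    prompt_variants: list = field(default_factory=list)

def evaluate(prompt: str, examples: list) -> float:
    # Stand-in metric: fraction of (input, expected) pairs the stubbed
    # model gets right when run with this prompt.
    def stub_model(prompt, text):
        return text.upper() if "UPPERCASE" in prompt else text
    hits = sum(stub_model(prompt, x) == y for x, y in examples)
    return hits / len(examples)

def optimize(spec: PipelineSpec, examples: list) -> str:
    """Pick the best variant automatically -- the tuning that
    DSPy-style tools perform, reduced to a toy search."""
    scored = [(evaluate(p, examples), p) for p in spec.prompt_variants]
    best_score, best_prompt = max(scored)
    if best_score < spec.metric_threshold:
        raise ValueError("no variant meets the declared threshold")
    return best_prompt

spec = PipelineSpec(
    task="shout",
    metric_threshold=0.9,
    prompt_variants=["Rewrite politely", "Rewrite in UPPERCASE"],
)
examples = [("hi", "HI"), ("ok", "OK")]
print(optimize(spec, examples))
```

Because the spec, metric, and examples are explicit objects rather than hand-tweaked text, the same evaluation loop doubles as the drift-and-regression check described in the bullets above.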

The Generative AI revolution is here, but so is the operational headache. For years, teams have matured their MLOps practices for traditional models, but the rapid adoption of LLMs has introduced a parallel, often chaotic, world of LLMOps. This results in fragmented toolchains, duplicated effort, and a state of "Ops Overload" that slows down innovation.

This session directly confronts this challenge. We will demonstrate how a unified platform like Google Cloud's Vertex AI can tame this complexity by providing a single control plane for the entire AI lifecycle.

Data is one of the most valuable assets in any organisation, but accessing and analysing it has been limited to technical experts. Business users often rely on predefined dashboards and data teams to extract insights, creating bottlenecks and slowing decision-making.

This is changing with the rise of Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG). These technologies are redefining how organisations interact with data, allowing users to ask complex questions in natural language and receive accurate, real-time insights without needing deep technical expertise.

In this session, I’ll explore how LLMs and RAG are driving true data democratisation by making analytics accessible to everyone, enabling real-time insights with AI-powered search and retrieval and overcoming traditional barriers like SQL, BI tool complexity, and rigid reporting structures.
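The retrieval step behind RAG can be shown in a few lines. This minimal sketch uses a bag-of-words cosine similarity in place of a real embedding model, and returns the retrieved context instead of calling a generator; the documents and question are invented for illustration.

```python
from collections import Counter
import math

DOCS = [
    "Q3 revenue grew 12 percent year over year",
    "the churn rate fell to 4 percent in Q3",
    "headcount increased in the engineering org",
]

def vectorize(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question: str, k: int = 1) -> list:
    q = vectorize(question)
    ranked = sorted(DOCS, key=lambda d: cosine(q, vectorize(d)), reverse=True)
    return ranked[:k]

def answer(question: str) -> str:
    context = retrieve(question)
    # A real system would send `context` plus `question` to an LLM here;
    # we return the grounded context to show the retrieval step.
    return f"Based on: {context[0]}"

print(answer("What was the churn rate in Q3?"))
```

Swapping the toy vectorizer for a learned embedding and the final line for a generation call is what turns this into the natural-language analytics loop the session describes.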

Large Language Models (LLMs) are transformative, but static knowledge and hallucinations limit their direct enterprise use. Retrieval-Augmented Generation (RAG) is the standard solution, yet moving from prototype to production is fraught with challenges in data quality, scalability, and evaluation.

This talk argues the future of intelligent retrieval lies not in better models, but in a unified, data-first platform. We'll demonstrate how the Databricks Data Intelligence Platform, built on a Lakehouse architecture with integrated tools like Mosaic AI Vector Search, provides the foundation for production-grade RAG.

Looking ahead, we'll explore the evolution beyond standard RAG to advanced architectures like GraphRAG, which enable deeper reasoning within Compound AI Systems. Finally, we'll show how the end-to-end Mosaic AI Agent Framework provides the tools to build, govern, and evaluate the intelligent agents of the future, capable of reasoning across the entire enterprise.

It sounds simple: “Hey AI, refresh my Salesforce data.” But what really happens when that request travels through your stack?

Using Airbyte’s architecture as a model, this talk explores the complexity behind natural language data triggers - from spinning up connectors and handling credentials, to enforcing access controls and orchestrating safe, purpose-driven movement. We’ll introduce a unified framework for thinking about all types of data movement, from bulk ingestion to fine-grained activation - a model we’ve developed to bring clarity to a space crowded with overlapping terms and toolchains.

We’ll also explore how this foundation—and any modern data movement platform—must evolve for an AI-native world, where speed, locality, and security are non-negotiable. That includes new risks: leaking credentials into LLMs, or triggering unintended downstream effects from a single prompt.

We’ll close with a live demo: spinning up a local data plane and moving data via Airbyte—simply by chatting with a bot.

The rapid evolution of AI, fueled by powerful Large Language Models (LLMs) and autonomous agents, is reshaping how we build, deploy, and manage AI systems. This presentation explores the critical intersection of MLOps and AI architecture, highlighting the paradigm shifts required to integrate LLMs and agents into production. We will address key architectural challenges, including scalability, observability, and security, while examining emerging MLOps practices such as robust data pipelines, model monitoring, and continuous optimization. Attendees will gain practical insights and actionable strategies to navigate the complexities of modern AI deployments, unlocking the full potential of LLMs and agents while ensuring operational excellence.

As AI evolves with powerful Large Language Models (LLMs) and autonomous agents, deploying and managing these systems requires new approaches. This presentation explores the crucial intersection of MLOps and AI architecture, highlighting the shift toward scalable, observable, and secure AI deployments. We’ll examine key architectural considerations for integrating LLMs and agents into production, alongside evolving MLOps practices such as robust data pipelines, model monitoring, and continuous optimization.

Face To Face
by Sam Khalil (ekona.ai), Kshitij Kumar (Data-Hat AI), David Reed (DataIQ), Jane Smith (ThoughtSpot), Dr. Joe Perez (NC Dept of Health & Human Services), Anusha Adige (EY)

As AI agents become embedded in everyday workflows — from healthcare diagnostics to financial services chatbots — the line between human and machine continues to blur. This panel brings together industry leaders to tackle the tough questions:

• How do we trust AI agents in high-risk environments?

• What are the new rules of ownership and accountability when autonomous systems act on data?

• Is AI replacing or enhancing the human workforce — and how do we keep the balance right?

We'll unpack how AI agents are evolving across sectors, debate whether the current LLM paradigm is enough, and explore the new guardrails needed to futureproof agentic AI — without losing control.

Face To Face
by Guy Fighel (Hetz Ventures), Gal Peretz (Carbyne), Lee Twito (Lemonade)

The data engineer’s role is shifting in the AI era. With LLMs and agents as new consumers, the challenge moves from SQL and schemas to semantics, context engineering, and making databases LLM-friendly. This session explores how data engineers can design semantic layers, document relationships, and expose data through MCPs and AI interfaces. We’ll highlight new skills required, illustrate pipelines that combine offline and online LLM processing, and show how data can serve business users, developers, and AI agents alike.
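One concrete form of the semantic layer described above: business terms mapped to vetted SQL, so an agent composes queries from unambiguous definitions instead of guessing at raw schemas. The metric names, table names, and SQL fragments below are invented for illustration.

```python
# Hypothetical semantic layer: governed metric definitions an LLM
# can select from, rather than free-form SQL over raw tables.
SEMANTIC_LAYER = {
    "active_customers": {
        "sql": "SELECT COUNT(*) FROM customers "
               "WHERE last_order > now() - interval '90 days'",
        "description": "Customers with at least one order in the last 90 days",
    },
    "monthly_revenue": {
        "sql": "SELECT date_trunc('month', ordered_at) AS month, SUM(total) "
               "FROM orders GROUP BY 1",
        "description": "Gross revenue summed per calendar month",
    },
}

def describe_for_llm() -> str:
    """Context an agent receives: metric names plus plain definitions."""
    return "\n".join(
        f"- {name}: {meta['description']}"
        for name, meta in SEMANTIC_LAYER.items()
    )

def resolve(term: str) -> str:
    """The agent picks a term; the layer returns governed SQL."""
    if term not in SEMANTIC_LAYER:
        raise KeyError(f"unknown metric: {term}")
    return SEMANTIC_LAYER[term]["sql"]

print(describe_for_llm())
print(resolve("active_customers"))
```

Exposed through an MCP server, `describe_for_llm` becomes the tool listing and `resolve` the tool call, which is the interface shift the session argues data engineers now own.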

Face To Face
by Maximilien Tirard (Wolfram Research)

While there has been much excitement about the potential of large language models (LLMs) to automate tasks that previously required human intelligence or creativity, many early projects have failed because of LLMs’ innate willingness to lie. This presentation explores these “hallucination” issues and proposes a solution.

By combining generative AI with more traditional symbolic computation, reliability can be maintained, explainability improved, and private knowledge and data injected. This talk will show simple examples of combining language-based thinking with computational thinking to generate solutions that neither could achieve on its own.
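The generative-plus-symbolic division of labour can be sketched simply: the language model proposes, and an exact symbolic check verifies before anything is trusted. The `llm_propose` stub below stands in for a real model call, and exact rational arithmetic plays the role of the symbolic engine.

```python
from fractions import Fraction

def llm_propose(question: str) -> str:
    # Stand-in for a generative model's free-text answer,
    # which may be confidently wrong.
    return "1/3 + 1/6 = 1/2"

def symbolic_check(claim: str) -> bool:
    """Verify the arithmetic exactly instead of trusting the model."""
    lhs, rhs = claim.split("=")
    terms = [Fraction(t.strip()) for t in lhs.split("+")]
    return sum(terms) == Fraction(rhs.strip())

claim = llm_propose("What is 1/3 + 1/6?")
print(claim, "->", "verified" if symbolic_check(claim) else "rejected")
```

A full system such as the one the talk describes would route richer claims to a symbolic computation engine, but the architecture is the same: generation supplies candidates, computation supplies the guarantee.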

An example application of an AI scientific research assistant will be shown that brings together the ideas presented in a most demanding real-world task, where false information is not acceptable. This is a fast-evolving space with enormous potential—and we’re just getting started.

This session will explore the evolving role of data engineers. Data engineering is currently a bottleneck due to overwhelming requests and complex knowledge work. Maia acts as a "digital data engineer" or a "virtual data team" that amplifies productivity by 100x. It enables users, from skilled engineers to citizen data analysts, to author pipelines in natural business language. The session will demonstrate Maia's ability to accelerate mundane and advanced tasks, troubleshoot and debug pipelines in real time, and generate high-quality, auditable pipelines using Matillion's proprietary, human-readable Data Pipeline Language (DPL), which avoids the "spaghetti code" common with generic LLMs.

Are you ready to build the next generation of data-driven applications? This session demystifies the world of Autonomous Agents, explaining what they are and why they are the future of AI. We’ll dive into Google Cloud's comprehensive platform for creating and deploying these agents, from our multimodal data handling to the seamless integration of Gemini models. You will learn the principles behind building your own custom data agents and understand why Google Cloud provides the definitive platform for this innovation. Join us to gain the knowledge and tools needed to architect and deploy intelligent, self-sufficient data solutions.