talk-data.com

Topic: Large Language Models (LLM)

Tags: nlp, ai, machine_learning

1405 tagged activities

Activity Trend

158 peak/qtr (2020-Q1 to 2026-Q1)

Activities

1405 activities · Newest first

In this episode, we talk with Micheal Lanham, an AI and software innovator with over two decades of experience spanning game development, fintech, oil and gas, and agricultural tech. Micheal shares his journey from building neural network-based games and evolutionary algorithms to writing influential books on AI agents and deep learning. He offers insights into the evolving AI landscape, practical uses of AI agents, and the future of generative AI in gaming and beyond.

TIMECODES
00:00 Micheal Lanham’s career journey and AI agent books
05:45 Publishing journey: AR, Pokémon Go, sound design, and reinforcement learning
10:00 Evolution of AI: evolutionary algorithms, deep learning, and agents
13:33 Evolutionary algorithms in prompt engineering and LLMs
18:13 AI agent books second edition and practical applications
20:57 AI agent workflows: minimalism, task breakdown, and collaboration
26:25 Collaboration and orchestration among AI agents
31:24 Tools and reasoning servers for agent communication
35:17 AI agents in game development and generative AI impact
38:57 Future of generative AI in gaming and immersive content
41:42 Coding agents, new LLMs, and local deployment
45:40 AI model trends and data scientist career advice
53:36 Cognitive testing, evaluation, and monitoring in AI
58:50 Publishing details and closing remarks

Connect with Micheal: LinkedIn - https://www.linkedin.com/in/micheal-lanham-189693123/

Connect with DataTalks.Club:
Join the community - https://datatalks.club/slack.html
Subscribe to our Google calendar to have all our events in your calendar - https://calendar.google.com/calendar/...
Check other upcoming events - https://lu.ma/dtc-events
GitHub: https://github.com/DataTalksClub
LinkedIn - / datatalks-club
Twitter - / datatalksclub
Website - https://datatalks.club/

At PyData Berlin, community members and industry voices highlighted how AI and data tooling are evolving across knowledge graphs, MLOps, small-model fine-tuning, explainability, and developer advocacy.

  • Igor Kvachenok (Leuphana University / ProKube) combined knowledge graphs with LLMs for structured data extraction in the polymer industry, and noted how MLOps is shifting toward LLM-focused workflows.
  • Selim Nowicki (Distill Labs) introduced a platform that uses knowledge distillation to fine-tune smaller models efficiently, making model specialization faster and more accessible.
  • Gülsah Durmaz (Architect & Developer) shared her transition from architecture to coding, creating Python tools for design automation and volunteering with PyData through PyLadies.
  • Yashasvi Misra (Pure Storage) spoke on explainable AI, stressing accountability and compliance, and shared her perspective as both a data engineer and active Python community organizer.
  • Mehdi Ouazza (MotherDuck) reflected on developer advocacy through video, workshops, and branding, showing how creative communication boosts adoption of open-source tools like DuckDB.

Igor Kvachenok Master’s student in Data Science at Leuphana University of Lüneburg, writing a thesis on LLM-enhanced data extraction for the polymer industry. Builds RDF knowledge graphs from semi-structured documents and works at ProKube on MLOps platforms powered by Kubeflow and Kubernetes.

Connect: https://www.linkedin.com/in/igor-kvachenok/

Selim Nowicki Founder of Distill Labs, a startup making small-model fine-tuning simple and fast with knowledge distillation. Previously led data teams at Berlin startups like Delivery Hero, Trade Republic, and Tier Mobility. Sees parallels between today’s ML tooling and dbt’s impact on analytics.

Connect: https://www.linkedin.com/in/selim-nowicki/

Gülsah Durmaz Architect turned developer, creating Python-based tools for architectural design automation with Rhino and Grasshopper. Active in PyLadies and a volunteer at PyData Berlin, she values the community for networking and learning, and aims to bring ML into architecture workflows.

Connect: https://www.linkedin.com/in/gulsah-durmaz/

Yashasvi (Yashi) Misra Data Engineer at Pure Storage, community organizer with PyLadies India, PyCon India, and Women Techmakers. Advocates for inclusive spaces in tech and speaks on explainable AI, bridging her day-to-day in data engineering with her passion for ethical ML.

Connect: https://www.linkedin.com/in/misrayashasvi/

Mehdi Ouazza Developer Advocate at MotherDuck, formerly a data engineer, now focused on building community and education around DuckDB. Runs popular YouTube channels ("mehdio DataTV" and "MotherDuck") and delivered a hands-on workshop at PyData Berlin. Blends technical clarity with creative storytelling.

Connect: https://www.linkedin.com/in/mehd-io/

Techie vs Comic: The sequel

A data scientist by day and a standup comedian by night. This was how Arda described himself prior to his critically acclaimed performance about his two identities during PyData 2024, where they merged.

Now he doesn't even know.

After another year of stage performances, awkward LinkedIn interactions, and mysterious cloud errors, Arda is back for another tale of absurdity. In this closing talk, he will illustrate the hilarity of his life as a data scientist in the age of LLMs and his non-existent comfort zone, proving that good sequels can exist.

Real-Time Context Engineering for LLMs

Context engineering has replaced prompt engineering as the main challenge in building agents and LLM applications. Context engineering involves providing LLMs with relevant and timely context data from various data sources, allowing them to make context-aware decisions. The context data provided to the LLM must be produced in real time so that it can react intelligently at human-perceivable latencies (a second or two at most); if the application takes longer to react, humans perceive it as laggy and unintelligent. In this talk, we will introduce context engineering and motivate the need for real-time context engineering in interactive applications. We will also demonstrate how to integrate real-time context data from applications inside Python agents using the Hopsworks feature store and corresponding application IDs. Application IDs are the key that unlocks application context data for agents and LLMs. We will walk through an example of an interactive application (a TikTok clone) that we make AI-enabled with Hopsworks.
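The core idea can be sketched as follows. This is a minimal stand-in, assuming a plain in-process dictionary in place of the Hopsworks feature store; the store contents, `get_context`, and `build_prompt` are all hypothetical, invented for illustration:

```python
import time

# Hypothetical in-process context store standing in for a feature store.
# Context rows are precomputed upstream; at request time the agent only
# pays a key lookup, keeping latency within the interactive budget.
CONTEXT_STORE = {
    ("tiktok_clone", "user_42"): {
        "recent_views": ["cooking", "travel"],
        "session_length_s": 310,
    },
}

def get_context(app_id: str, entity_id: str) -> dict:
    """Fetch precomputed context for one (application ID, entity) pair."""
    return CONTEXT_STORE.get((app_id, entity_id), {})

def build_prompt(question: str, app_id: str, entity_id: str) -> str:
    """Assemble an LLM prompt from real-time context plus the question."""
    start = time.perf_counter()
    ctx = get_context(app_id, entity_id)
    latency_ms = (time.perf_counter() - start) * 1000
    # The talk's point: the fetch must stay well under a second or two.
    assert latency_ms < 1000, "context fetch exceeded the latency budget"
    return f"Context: {ctx}\nQuestion: {question}"
```

The application ID acts as the join key between the running application and the precomputed context, which is what makes the lookup cheap at inference time.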

Sieves: Plug-and-Play NLP Pipelines With Zero-Shot Models

Generative models are dominating the spotlight lately - and rightly so. Their flexibility and zero-shot capabilities make it incredibly fast to prototype NLP applications. However, one-shotting complex NLP problems often isn't the best long-term strategy. Decomposing problems into modular, pipelined tasks leads to better debuggability, greater interpretability, and more reliable performance.

This modular pipeline approach pairs naturally with zero- and few-shot (ZFS) models, enabling rapid yet robust prototyping without requiring large datasets or fine-tuning. Crucially, many real-world applications need structured data outputs—not free-form text. Generative models often struggle to consistently produce structured results, which is why enforcing structured outputs is now a core feature across contemporary NLP tools (like Outlines, DSPy, LangChain, Ollama, vLLM, and others).
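As a hedged illustration of why structured outputs need enforcement, the sketch below validates model output against a required key set and retries on failure. `flaky_model` is an invented stub standing in for a real LLM call, not the API of any tool named above:

```python
import json

REQUIRED_KEYS = {"label", "confidence"}

def flaky_model(prompt: str, attempt: int) -> str:
    """Stub for an LLM call: the first reply is free-form text,
    the retry returns valid JSON."""
    if attempt == 0:
        return "Sure! The label is positive."
    return json.dumps({"label": "positive", "confidence": 0.9})

def generate_structured(prompt: str, max_attempts: int = 3) -> dict:
    """Retry until the output parses as JSON with the required keys."""
    for attempt in range(max_attempts):
        raw = flaky_model(prompt, attempt)
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # free-form text: discard and retry
        if REQUIRED_KEYS <= data.keys():
            return data
    raise ValueError("no structured output after retries")
```

Tools like Outlines or vLLM push this enforcement into decoding itself, which is more reliable than validate-and-retry, but the contract being enforced is the same.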

For engineers building NLP pipelines today, the landscape is fragmented. There’s no single standard for structured generation yet, and switching between tools can be costly and frustrating. The NLP tooling landscape lacks a flexible, model-agnostic solution that minimizes setup overhead, supports structured outputs, and accelerates iteration.

Introducing Sieves: a modular toolkit for building robust NLP document processing pipelines using ZFS models.
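Sieves' actual API is not shown in this abstract. As a rough sketch of the decomposition idea only (every name below is hypothetical), a pipeline can be modeled as a chain of task callables that each read and enrich a shared document dict:

```python
# Hypothetical modular pipeline: each task enriches a shared document
# dict, so stages can be inspected, debugged, and swapped independently.
def classify(doc: dict) -> dict:
    # stand-in for a zero-shot classifier
    doc["topic"] = "sports" if "match" in doc["text"] else "other"
    return doc

def extract_entities(doc: dict) -> dict:
    # stand-in for a zero-shot entity extractor
    doc["entities"] = [w for w in doc["text"].split() if w.istitle()]
    return doc

def run_pipeline(text: str, tasks) -> dict:
    """Run each task in order over a fresh document."""
    doc = {"text": text}
    for task in tasks:
        doc = task(doc)
    return doc
```

Because every intermediate result lives on the document, a failing stage can be examined in isolation, which is the debuggability argument made above.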

Is Prompt Engineering Dead? How Auto-Optimization is Changing the Game

The rise of LLMs has elevated prompt engineering as a critical skill in the AI industry, but manual prompt tuning is often inefficient and model-specific. This talk explores various automatic prompt optimization approaches, ranging from simple ones like bootstrapped few-shot to more complex techniques such as MIPRO and TextGrad, and showcases their practical applications through frameworks like DSPy and AdalFlow. By exploring the benefits, challenges, and trade-offs of these approaches, the attendees will be able to answer the question: is prompt engineering dead, or has it just evolved?
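A toy illustration of the simplest technique mentioned, bootstrapped few-shot selection (this is not DSPy's or AdalFlow's actual API; `stub_model` and the dev set are invented): greedily keep candidate demonstrations that improve accuracy on a small dev set.

```python
# Tiny labeled dev set used to score candidate demonstrations.
DEV_SET = [("2+2", "4"), ("3+3", "6"), ("2+3", "5")]

def stub_model(question: str, demos: list) -> str:
    """Stub LLM that only answers questions it has seen as demos."""
    answers = {q: a for q, a in demos}
    return answers.get(question, "?")

def accuracy(demos: list) -> float:
    hits = sum(stub_model(q, demos) == a for q, a in DEV_SET)
    return hits / len(DEV_SET)

def bootstrap_demos(candidates: list) -> list:
    """Greedily keep each candidate demo that raises dev-set accuracy."""
    demos, best = [], accuracy([])
    for cand in candidates:
        score = accuracy(demos + [cand])
        if score > best:
            demos.append(cand)
            best = score
    return demos
```

MIPRO and TextGrad replace this greedy loop with smarter search over instructions and demonstrations, but the outer shape (propose, score on a dev set, keep what helps) is the same.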

How to Keep Your LLM Chatbots Real: A Metrics Survival Guide

In this brave new world of vibe coding and YOLO-to-prod mentality, let’s take a step back and keep things grounded (pun intended). None of us would ever deploy a classical ML model to production without clearly defined metrics and proper evaluation, so let's talk about methodologies for measuring the performance of LLM-powered chatbots. Think of retriever recall, answer relevancy, correctness, faithfulness, and hallucination rates. With the wild west of metric standards still in full swing, I’ll guide you through the challenges of curating a synthetic test set and selecting suitable metrics and open-source packages that help you evaluate your use case. Everything is possible, from simple LLM-as-a-judge approaches like those built into many packages, such as MLflow, up to complex multi-step quantification approaches with Ragas. If you work in the GenAI space or with LLM-powered chatbots, this session is for you! Prior or background knowledge is an advantage, but not required.
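Of the metrics listed, retriever recall is the most mechanical to compute once a test set exists. A small sketch with invented helper names:

```python
def retriever_recall(retrieved: list, relevant: set, k: int = 5) -> float:
    """Fraction of relevant document IDs found in the top-k retrieved."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)

def mean_recall(test_set: list, k: int = 5) -> float:
    """Average recall@k over a (synthetic) test set of
    (retrieved_ids, relevant_ids) pairs."""
    scores = [retriever_recall(r, rel, k) for r, rel in test_set]
    return sum(scores) / len(scores)
```

The hard part, as the abstract notes, is not this arithmetic but curating the (question, relevant documents) pairs in the first place.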

Recently, the integration of Generative AI (GenAI) technologies into both our personal and professional lives has surged. In most organizations, the deployment of GenAI applications is on the rise, and this trend is expected to continue for the foreseeable future. Evaluating GenAI systems presents unique challenges not present in traditional ML. The main peculiarity is the absence of ground truth for textual metrics such as text clarity, location-extraction accuracy, and factual accuracy. Nevertheless, the non-negligible model-serving cost demands an even more thorough evaluation of any system to be deployed in production.

Defining the metric ground truth is a costly and time-consuming process requiring human annotation. To address this, we will present how to evaluate LLM-based applications by leveraging LLMs themselves as evaluators. Moreover, we will outline the complexities of, and evaluation methods for, LLM-based agents, which operate with autonomy and present further evaluation challenges. Lastly, we will explore the critical role of evaluation in the GenAI lifecycle and outline the steps taken to integrate these processes seamlessly.
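A hedged sketch of the LLM-as-evaluator idea: score answers against a rubric with no ground truth available. The rubric checks below are keyword stubs invented for illustration; in practice each criterion would be a second LLM call returning a score:

```python
# Each rubric criterion maps to a check returning 0 or 1. In a real
# system the lambdas would be judge-LLM calls, not keyword tests.
RUBRIC = {
    "clarity": lambda a: int(len(a.split()) >= 5),
    "grounded": lambda a: int("according to" in a.lower()),
}

def judge(answer: str) -> dict:
    """Score one answer on every rubric criterion."""
    return {criterion: check(answer) for criterion, check in RUBRIC.items()}

def judge_score(answer: str) -> float:
    """Collapse per-criterion scores into a single mean score."""
    scores = judge(answer)
    return sum(scores.values()) / len(scores)
```

Keeping per-criterion scores (rather than only the mean) is what makes judge output actionable when a deployment regresses.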

Whether you are an AI practitioner, user or enthusiast, join us to gain insights into the future of GenAI evaluation and its impact on enhancing application performance.

Model Context Protocol: Principles and Practice

Large‑language‑model agents are only as useful as the context and tools they can reach.

Anthropic’s Model Context Protocol (MCP) proposes a universal, bidirectional interface that turns every external system—SQL databases, Slack, Git, web browsers, even your local file‑system—into first‑class “context providers.”

In just 30 minutes we’ll step from high‑level buzzwords to hands‑on engineering details:

  • How MCP’s JSON‑RPC message format, streaming channels, and version‑negotiation work under the hood.
  • Why per‑tool sandboxing via isolated client processes hardens security (and what happens when an LLM tries rm -rf /).
  • Techniques for hierarchical context retrieval that stretch a model’s effective window beyond token limits.
  • Real‑world patterns for accessing multiple tools—Postgres, Slack, GitHub—and plugging MCP into GenAI applications.
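As a taste of the wire format, here is a minimal sketch of an MCP-style JSON-RPC 2.0 tool invocation. The helper functions and the example tool name are invented; consult the MCP specification for the full message schema:

```python
import json

def tool_call_request(req_id: int, tool: str, arguments: dict) -> str:
    """Build a JSON-RPC 2.0 request in the shape MCP uses for tool
    invocation: method "tools/call" with a tool name plus arguments."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": req_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })

def parse_response(raw: str):
    """Return the result of a JSON-RPC response, raising on an error
    object (which carries a code and message instead of a result)."""
    msg = json.loads(raw)
    if "error" in msg:
        raise RuntimeError(msg["error"].get("message", "unknown error"))
    return msg["result"]
```

Because every tool speaks this one envelope, a client written once can drive Postgres, Slack, and GitHub servers alike, which is how MCP avoids the N×M integration problem.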

Expect code snippets and lessons from early adoption.

You’ll leave ready to wire your own services into any MCP‑aware model and level‑up your GenAI applications—without the N×M integration nightmare.

Untitled13.ipynb

For well over a decade, Python notebooks revolutionized our field. They gave us so much creative freedom and dramatically lowered the entry barrier for newcomers. Yet despite all this ... it has been a decade! And the notebook is still in roughly the same form factor.

So what if we allow ourselves to rethink notebooks ... really rethink it! What features might we come up with? Can we make the notebook understand datasources? What about LLMs? Can we generate widgets on the fly? What if we make changes to Python itself?

This presentation will be a stream of demos that help paint a picture of what the future might hold. I will share my latest work in the anywidget/marimo ecosystem as well as some new hardware integrations.

The main theme that I will work towards: if you want better notebooks, reactive Python might very well be the future.
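To make "reactive Python" concrete, here is a toy dependency graph, entirely hypothetical and not marimo's implementation, in which setting a value re-runs every cell that depends on it, directly or transitively:

```python
class ReactiveGraph:
    """Toy reactive notebook: cells declare their inputs, and setting a
    value re-runs every dependent cell (direct or transitive)."""

    def __init__(self):
        self.values = {}   # name -> current value
        self.cells = {}    # name -> (input names, compute function)

    def define(self, name, inputs, fn):
        """Register a cell and run it once."""
        self.cells[name] = (inputs, fn)
        self._run(name)

    def set(self, name, value):
        """Change a plain value; re-run dependents in definition order."""
        self.values[name] = value
        for cell in self.cells:
            if self._depends_on(cell, name):
                self._run(cell)

    def _depends_on(self, cell, name):
        inputs, _ = self.cells[cell]
        return any(
            i == name or (i in self.cells and self._depends_on(i, name))
            for i in inputs
        )

    def _run(self, name):
        inputs, fn = self.cells[name]
        self.values[name] = fn(*(self.values[i] for i in inputs))
```

This sketch assumes cells are defined in topological order; a real implementation would sort the dependency graph before re-running.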

In this talk, you will learn about some of the common challenges that you might encounter while developing meaningful applications with large language models.

Using non-deterministic systems as the basis for important applications is certainly an 'interesting' new frontier for software development, but hope is not lost. In this session, we will explore some of the well known (and less well known) issues in building applications on the APIs provided by LLM providers, and on 'open' LLMs, such as Mistral, Llama, or DeepSeek.

We will also (of course) dive into some of the approaches that you can take to address these challenges, and mitigate some of the inherent behaviors that are present within LLMs, enabling you to build more reliable and robust systems on top of LLMs, unlocking the potential of this new development paradigm.
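One mitigation in this family is self-consistency: sample the model several times and take the majority answer. A sketch with a stubbed non-deterministic sampler (the stub and its seed behavior are invented for illustration):

```python
from collections import Counter

def sample_model(question: str, seed: int) -> str:
    """Stub for a non-deterministic LLM call: most samples agree,
    an occasional one diverges."""
    return "Paris" if seed % 3 else "Lyon"

def majority_answer(question: str, n_samples: int = 5) -> str:
    """Self-consistency: sample several times, return the modal answer."""
    votes = Counter(sample_model(question, s) for s in range(n_samples))
    return votes.most_common(1)[0][0]
```

The trade-off is cost: n samples mean n API calls, so voting is typically reserved for decisions where a single wrong answer is expensive.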

This presentation provides an overview of how NVIDIA RAPIDS accelerates data science and data engineering workflows end-to-end. Key topics include leveraging RAPIDS for machine learning, large-scale graph analytics, real-time inference, hyperparameter optimization, and ETL processes. Case studies demonstrate significant performance improvements and cost savings across various industries using RAPIDS for Apache Spark, XGBoost, cuML, and other GPU-accelerated tools. The talk emphasizes the impact of accelerated computing on modern enterprise applications, including LLMs, recommenders, and complex data processing pipelines.

With the pace of change in AI being experienced across the industry and the constant bombardment of contradictory advice, it is easy to become overwhelmed and not know where to start.

The promise of LLMs has been undermined by vendor and journalistic hype and by an inability to rely on quantitative answers being accurate. After all, what good would a colleague be (artificial or not) if you already needed to know the answer in order to validate any question you ask of them?

The promise of neuro-symbolic AI, which combines two well-established technologies (semantic knowledge graphs and machine learning), is that it enables you to get more accurate LLM-powered analytics and, most importantly, faster time to greater data value.

In this practical, engaging and fun talk, Ben will equip you with the principles and fundamentals that never change but often go under-utilised, as well as discussing and demonstrating the latest techniques, platforms and tools so that you can get started with confidence.

Ben will show that far from taking months, data products can take minutes or hours to prepare, publish and start gaining value from, all in a sustainable and maintainable manner.

LLMs and MCP have fundamentally changed my work as a solo data practitioner. I spend less time writing code, but throughput and durability have improved dramatically. MCP servers now act as core tooling for LLMs in my workflows, but have also started to become data products in their own right. 

This is a practical talk that will focus on my experience of shifting to this way of thinking and working, including what has worked for me and the realities of where there are still rough edges.

Site Reliability Engineers spend countless hours diagnosing complex production issues and optimizing system performance. This talk explores how Large Language Models can augment SRE workflows with real-world examples.

We'll also demonstrate live tooling that leverages AI to optimize database configurations - automatically recommending ordering keys, generating materialized views, and suggesting performance improvements based on query patterns and system metrics.
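The live tooling itself is not shown here; as a flavor of the approach, a toy heuristic might mine WHERE-clause columns from a query log and propose the most frequently filtered ones as ordering-key candidates. The regex and function below are illustrative only:

```python
import re
from collections import Counter

def suggest_ordering_key(query_log: list, top_n: int = 2) -> list:
    """Toy heuristic: count columns appearing in WHERE equality filters
    and propose the most frequent ones as ordering-key candidates."""
    counts = Counter()
    for sql in query_log:
        # Only catches the first equality after WHERE; a real tool
        # would parse the SQL properly and weigh query cost too.
        for col in re.findall(r"WHERE\s+(\w+)\s*=", sql, flags=re.IGNORECASE):
            counts[col] += 1
    return [col for col, _ in counts.most_common(top_n)]
```

A production tool would combine such frequency signals with system metrics (scan sizes, query latencies) before recommending a change.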

Rather than theoretical possibilities, this talk focuses on practical implementations with measurable results. Attendees will learn actionable strategies for integrating LLMs into their SRE workflows, understand the current boundaries of AI effectiveness in operations, and see concrete examples of tooling already delivering value in production environments.

This talk will explore how NVIDIA Blueprints are accelerating AI development and deployment across various industries, with a focus on building intelligent video analytics agents. Powered by generative AI, vision-language models (VLMs), large language models (LLMs), and NVIDIA NIM Microservices, these agents can be directed through natural language to perform tasks such as video summarization, visual question answering, and real-time alerts. This talk will show how VSS accelerates insight from video, helping industries transform footage into accurate, actionable intelligence.

For years, data governance has been about guiding people and their interpretations. We build glossaries, descriptions and documentation to keep analysts and business users aligned. But what happens when your primary “user” isn’t human? As agentic workflows, LLMs, and AI-driven decision systems become mainstream, the way we govern data must evolve. The controls that once relied on human interpretation now need to be machine-readable, unambiguous, and able to support near-real-time reasoning. The stakes are high: a governance model designed for people may look perfectly clear to us but lead an AI straight into hallucinations, bias, or costly automation errors.

This session explores what it really means to make governance “AI-ready.” We’ll look at the shift from human-centric to agent-centric governance, practical strategies for structuring metadata so that agents can reliably understand and act on it, and the new risks that emerge when AI is the primary consumer of your data catalog. We’ll discuss patterns, emerging practices, and how to transition to a new governance operating model. Whether you’re a data leader, platform engineer, or AI practitioner, you’ll leave with an appreciation of governance approaches for a world where your first stakeholder might not even be human.
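To make "machine-readable governance" concrete, a hedged sketch: catalog entries carry structured constraint fields that an agent can check mechanically instead of interpreting prose. The schema and field names here are invented for illustration:

```python
# Hypothetical agent-readable catalog entry: policy lives in structured
# fields, not in a free-text glossary description.
CATALOG = {
    "customers.email": {
        "description": "Customer contact email",
        "pii": True,
        "allowed_uses": ["support", "billing"],
        "freshness_sla_minutes": 60,
    },
}

def can_use(column: str, purpose: str) -> bool:
    """An agent checks policy mechanically before acting on a column;
    unknown columns are denied by default."""
    meta = CATALOG.get(column)
    return bool(meta) and purpose in meta["allowed_uses"]
```

The deny-by-default behavior for uncataloged columns is the kind of unambiguous rule that a human-oriented glossary leaves implicit but an agent needs stated.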

Learn how to transform your data warehouse for AI/LLM readiness while making advanced analytics accessible to all team members, regardless of technical expertise. 

We'll share practical approaches to adapting data infrastructure and building user-friendly AI tools that lower the barrier to entry for sophisticated analysis. 

Key takeaways include implementation best practices, challenges encountered, and strategies for balancing technical requirements with user accessibility. Ideal for data teams looking to democratize AI-powered analytics in their organization.