talk-data.com

Topic: Large Language Models (LLM)

Tags: nlp, ai, machine_learning

Activity Trend: 158 peak/qtr, 2020-Q1 to 2026-Q1

Activities

1405 activities · Newest first

An Illustrated Guide to AI Agents

Artificial intelligence is entering a new phase. No longer limited to answering prompts or completing simple writing tasks, AI agents can now reason, plan, and act with increasing independence. From accelerating scientific breakthroughs to supporting creative work, these systems are quickly reshaping industries and everyday life. This book provides the conceptual foundation and practical insights you need to understand—and effectively work with—this emerging technology. Through hundreds of clear graphic illustrations, Maarten Grootendorst and Jay Alammar explain how AI agents are built, how they think, and where they're heading. Designed for professionals, students, and curious learners alike, this guide goes beyond the buzz to reveal what's actually happening inside these systems, why it matters, and how to apply the knowledge in real-world contexts. With its visual storytelling and accessible explanations, An Illustrated Guide to AI Agents is your essential reference for navigating the next frontier of artificial intelligence.

- Explore the core architecture of AI agents: tools, memory, and planning
- Understand reasoning LLMs, multimodal models, and multi-agent collaboration
- Learn advanced methods, including distillation, quantization, and reinforcement learning
- Evaluate real-world applications, strengths, and limitations of AI agents

Context Engineering with DSPy

AI agents need the right context at the right time to do a good job. Too much input increases cost and harms accuracy, while too little causes instability and hallucinations. Context Engineering with DSPy introduces a practical, evaluation-driven way to design AI systems that remain reliable, predictable, and easy to maintain as they grow. AI engineer and educator Mike Taylor explains DSPy in a clear, approachable style, showing how its modular structure, portable programs, and built-in optimizers help teams move beyond guesswork. Through real examples and step-by-step guidance, you'll learn how DSPy's signatures, modules, datasets, and metrics work together to solve context engineering problems that evolve as models change and workloads scale. This book supports AI engineers, data scientists, machine learning practitioners, and software developers building AI agents, retrieval-augmented generation (RAG) systems, and multistep reasoning workflows that hold up in production.

- Understand the core ideas behind context engineering and why they matter
- Structure LLM pipelines with DSPy's maintainable, reusable components
- Apply evaluation-driven optimizers like GEPA and MIPROv2 for measurable improvements
- Create reproducible RAG and agentic workflows with clear metrics
- Develop AI systems that stay robust across providers, model updates, and real-world constraints
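The evaluation-driven loop the blurb describes (score candidate instructions against a labeled dev set with a metric, keep the winner) can be sketched without DSPy itself. The snippet below is a conceptual illustration only, not DSPy's API: `toy_model`, the candidate instructions, and the dev set are all invented for the example.

```python
# Conceptual sketch of evaluation-driven prompt optimization (NOT DSPy's API):
# score each candidate instruction against a labeled dev set, keep the best.

def toy_model(instruction: str, question: str) -> str:
    # Hypothetical stand-in for an LLM call: the verbose instruction
    # "helps" this fake model answer correctly.
    if "step by step" in instruction:
        return {"2+2": "4", "3*3": "9"}.get(question, "unknown")
    return "unknown"

def exact_match(prediction: str, label: str) -> float:
    return 1.0 if prediction == label else 0.0

def optimize(candidates, devset):
    """Pick the instruction with the highest average metric on the dev set."""
    def avg_score(instruction):
        return sum(exact_match(toy_model(instruction, q), a)
                   for q, a in devset) / len(devset)
    return max(candidates, key=avg_score)

devset = [("2+2", "4"), ("3*3", "9")]
best = optimize(["Answer briefly.", "Think step by step, then answer."], devset)
print(best)  # the step-by-step instruction wins on this toy dev set
```

Real optimizers such as GEPA or MIPROv2 search far larger spaces (instructions, demonstrations, and program structure), but the contract is the same: a metric plus a dataset drives the selection instead of guesswork.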

Evals for AI Engineers

Stop using guesswork to find out how your AI applications are performing. Evals for AI Engineers equips you with the proven tools and processes required to systematically test, measure, and enhance the reliability of AI applications, especially those using LLMs. Written by AI engineers with extensive experience in real-world consulting (across 35+ AI products) and cutting-edge research, this practical resource will help you move from assumptions to robust, data-driven evaluation. Ideal for software engineers, technical product managers, and technical leads, this hands-on guide dives into techniques like error analysis, synthetic data generation, automated LLM-as-a-judge systems, production monitoring, and cost optimization. You'll learn how to debug LLM behavior, design test suites based on synthetic and real data, and build data flywheels that improve over time. Whether you're starting without user data or scaling a production system, you'll gain the skills to build AI you can trust—with processes that are repeatable, measurable, and aligned with real-world outcomes.

- Run systematic error analyses to uncover, categorize, and prioritize failure modes
- Build, implement, and automate evaluation pipelines using code-based and LLM-based metrics
- Optimize AI performance and costs through smart evaluation and feedback loops
- Apply key principles and techniques for monitoring AI applications in production
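As a concrete illustration of the error-analysis step, here is a minimal sketch that tallies annotated failure modes and ranks them by frequency so the most common ones get fixed first. The trace data and category names are invented for the example, not taken from the book.

```python
# Minimal sketch of error analysis: tally annotated failure modes from a
# review pass and rank them by frequency. Traces and labels are illustrative.
from collections import Counter

annotated_traces = [
    {"id": 1, "failure": "hallucinated_citation"},
    {"id": 2, "failure": None},                      # no failure observed
    {"id": 3, "failure": "ignored_user_constraint"},
    {"id": 4, "failure": "hallucinated_citation"},
    {"id": 5, "failure": "formatting_error"},
]

def prioritize_failures(traces):
    """Return failure modes sorted by frequency, most common first."""
    counts = Counter(t["failure"] for t in traces if t["failure"])
    return counts.most_common()

print(prioritize_failures(annotated_traces))
# hallucinated_citation appears twice, so it tops the list
```

In practice the annotations come from a human (or LLM-assisted) pass over real production traces; the ranked list then tells you which failure mode deserves a dedicated automated eval first.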

Designing AI Interfaces

As artificial intelligence becomes central to modern product design, UX professionals must adapt their toolkits to meet new demands. In Designing AI Interfaces, senior product designer Louise Macfadyen offers a timely, practice-oriented guide for building intuitive, ethical, and effective user experiences with large language models (LLMs) and autonomous AI systems. From content moderation to interruptibility, this book presents actionable design patterns for today's most advanced AI interactions—with clear technical insights to help designers understand how AI systems process inputs, generate outputs, and make decisions on users' behalf. Written specifically for product designers navigating the AI transition, this book provides concrete strategies for managing risk, enabling transparency, and fostering user trust in increasingly agentic systems. Readers will learn how to enable users to steer and shape AI responses in real time, incorporate ethical and UX principles into actionable design strategies, and navigate trade-offs in autonomy and control—all while gaining fluency in key AI concepts to collaborate more effectively with engineering teams.

- Design effective and ethical interfaces for LLMs and AI agents
- Apply best-practice patterns for content warnings, permissions, and oversight
- Gain a mental model for how AI systems reason and act
- Collaborate confidently with engineering and product teams
- Evaluate your org's AI maturity and advocate for responsible implementation

Generative AI on Kubernetes

Generative AI is revolutionizing industries, and Kubernetes has fast become the backbone for deploying and managing these resource-intensive workloads. This book serves as a practical, hands-on guide for MLOps engineers, software developers, Kubernetes administrators, and AI professionals ready to unlock AI innovation with the power of cloud native infrastructure. Authors Roland Huß and Daniele Zonca provide a clear road map for training, fine-tuning, deploying, and scaling GenAI models on Kubernetes, addressing challenges like resource optimization, automation, and security along the way. With actionable insights and real-world examples, readers will learn to tackle the opportunities and complexities of managing GenAI applications in production environments. Whether you're experimenting with large-scale language models or facing the nuances of AI deployment at scale, you'll uncover the expertise you need to operationalize this exciting technology effectively.

- Learn to run GenAI models on Kubernetes for efficient scalability
- Get techniques to train and fine-tune LLMs within Kubernetes environments
- See how to deploy production-ready AI systems with automation and resource optimization
- Discover how to monitor and scale GenAI applications to handle real-world demand
- Uncover the best tools to operationalize your GenAI workloads
- Learn how to run agent-based and AI-driven applications

GenAI solutions involve many choices and trade-offs. A critical decision is: should you build custom AI solutions in-house or buy off-the-shelf products? This session brings together a debate on the trade-offs, risks, and rewards of each approach. The session will be based on scenarios and use cases to highlight key considerations such as cost, reliability, flexibility, and speed for decisions such as LLMs vs. SLMs, RAG vs. AI agents, packaged platform capability vs. bespoke custom solution, and packaged vs. open source.

ML and Generative AI in the Data Lakehouse

In today's race to harness generative AI, many teams struggle to integrate these advanced tools into their business systems. While platforms like GPT-4 and Google's Gemini are powerful, they aren't always tailored to specific business needs. This book offers a practical guide to building scalable, customized AI solutions using the full potential of data lakehouse architecture. Author Bennie Haelen covers everything from deploying ML and GenAI models in Databricks to optimizing performance with best practices. In this must-read for data professionals, you'll gain the tools to unlock the power of large language models (LLMs) by seamlessly combining data engineering and data science to create impactful solutions.

- Learn to build, deploy, and monitor ML and GenAI models on a data lakehouse architecture using Databricks
- Leverage LLMs to extract deeper, actionable insights from your business data residing in lakehouses
- Discover how to integrate traditional ML and GenAI models for customized, scalable solutions
- Utilize open source models to control costs while maintaining model performance and efficiency
- Implement best practices for optimizing ML and GenAI models within the Databricks platform

AI-Native LLM Security

AI-Native LLM Security is your essential guide to understanding and securing large language models and AI systems. With a focus on implementing practical strategies and leveraging frameworks like the OWASP Top 10, this book equips professionals to identify and mitigate risks effectively. By reading this, you'll gain the expertise to confidently manage LLM security challenges.

What this book will help me do

- Learn about adversarial AI attacks and methods to defend against them
- Understand secure-by-design methodologies and their application to LLM systems
- Gain insights on implementing MLSecOps practices for robust AI security
- Navigate ethical considerations and legal aspects of AI security
- Secure AI development life cycles with practical strategies and standards

Author(s)

The authors, Vaibhav Malik, Ken Huang, and Adam Dawson, are experts in AI security with collective experience covering cybersecurity, AI development, and security frameworks. Their dedication to advancing trustworthy AI ensures that this book is both technically comprehensive and approachable.

Who is it for?

This book is perfect for cybersecurity experts, AI developers, and technology managers aiming to secure and manage AI systems. Readers should have a basic understanding of AI and security concepts. If you're a security architect, ML engineer, DevOps professional, or a leader overseeing AI initiatives, this book will help you address LLM security effectively for your field.

LLMOps in Practice: Building Secure, Governed Pipelines for Large Language Models

As organizations move from prototyping LLMs to deploying them in production, the biggest challenges are no longer about model accuracy: they're about trust, security, and control. How do we monitor model behavior, prevent prompt injection, track drift, and enforce governance across environments?

This talk presents a real-world view of how to design secure and governed LLM pipelines, grounded in open-source tooling and reproducible architectures. We’ll discuss how multi-environment setups (sandbox, runner, production) can isolate experimentation from deployment, how to detect drift and hallucination using observability metrics, and how to safeguard against prompt injection, data leakage, and bias propagation.

Attendees will gain insight into how tools like MLflow, Ray, and TensorFlow Data Validation can be combined for version tracking, monitoring, and auditability, without turning your workflow into a black box. By the end of the session, you'll walk away with a practical roadmap for what makes an LLMOps stack resilient: reproducibility by design, continuous evaluation, and responsible governance across the LLM lifecycle.
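To make the drift-detection idea concrete, here is a small, self-contained sketch that scores distribution shift in a simple observability signal (response length) with the Population Stability Index. The sample numbers are invented, and the 0.1/0.25 thresholds are common rules of thumb rather than part of the speakers' stack.

```python
# Illustrative drift check on a simple observability signal (response length).
# Population Stability Index (PSI) is one common drift score; ~0.1 (warn)
# and ~0.25 (alert) are conventional rule-of-thumb thresholds.
import math

def psi(baseline, current, bins=5):
    """PSI between two samples, binned on the baseline's range."""
    lo, hi = min(baseline), max(baseline)
    edges = [lo + (hi - lo) * i / bins for i in range(bins + 1)]
    edges[-1] = float("inf")  # catch values above the baseline range

    def frac(sample, i):
        n = sum(1 for x in sample if edges[i] <= x < edges[i + 1])
        return max(n / len(sample), 1e-6)  # floor avoids log(0)

    return sum((frac(current, i) - frac(baseline, i)) *
               math.log(frac(current, i) / frac(baseline, i))
               for i in range(bins))

baseline = [120, 130, 125, 140, 135, 128, 122, 138]   # token counts last week
shifted  = [320, 310, 305, 340, 330, 325, 315, 335]   # much longer responses now
print(psi(baseline, baseline) < 0.1)   # stable: True
print(psi(baseline, shifted) > 0.25)   # drifted: True
```

The same pattern applies to other LLM observability signals (refusal rate, retrieval hit rate, judge scores): log the metric per request, compare a rolling window against a baseline window, and alert when the score crosses a threshold.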

Surviving the Agentic Hype with Small Language Models

The AI landscape is abuzz with talk of "agentic intelligence" and "autonomous reasoning." But beneath the hype, a quieter revolution is underway: Small Language Models (SLMs) are starting to perform the core reasoning and orchestration tasks once thought to require massive LLMs. In this talk, we’ll demystify the current state of “AI agents,” show how compact models like Phi-2, xLAM 8B, and Nemotron-H 9B can plan, reason, and call tools effectively, and demonstrate how you can deploy them on consumer-grade hardware. Using Python and lightweight frameworks such as LangChain, we’ll show how anyone can quickly build and experiment with their own local agentic systems. Attendees will leave with a grounded understanding of agent architectures, SLM capabilities, and a roadmap for running useful agents without the GPU farm.
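To ground the "plan, reason, and call tools" claim, here is a bare-bones agent loop. The planner is a hard-coded stub standing in for a small language model; in a real system the SLM would emit the tool name and arguments as structured text, and the tools, task string, and plan format here are all invented for illustration.

```python
# A bare-bones agent loop: a planner chooses tool calls, the runtime executes
# them and threads results through. The planner is a stub standing in for a
# small language model; everything here is illustrative.

TOOLS = {
    "add": lambda a, b: a + b,
    "multiply": lambda a, b: a * b,
}

def stub_planner(task: str):
    """Stand-in for an SLM: map a task to a sequence of (tool, args) calls."""
    if task == "(2 + 3) * 4":
        return [("add", (2, 3)), ("multiply", ("PREV", 4))]
    return []

def run_agent(task: str):
    result = None
    for tool_name, args in stub_planner(task):
        # Substitute the previous result where the plan references it.
        args = tuple(result if a == "PREV" else a for a in args)
        result = TOOLS[tool_name](*args)
    return result

print(run_agent("(2 + 3) * 4"))  # → 20
```

Swapping the stub for a locally served SLM (prompted to emit the plan as JSON) turns this into the kind of consumer-hardware agent the talk describes; frameworks such as LangChain mostly add robust parsing, retries, and memory around this same loop.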

Evaluating AI Agents in production with Python

This talk covers methods of evaluating AI Agents, with an example of how the speakers built a Python-based evaluation framework for a user-facing AI Agent system that has been in production for over a year. We share the tools and Python frameworks used (as well as tradeoffs and alternatives), and discuss methods such as LLM-as-Judge, rules-based evaluations, and the ML metrics involved, along with the selection tradeoffs behind each choice.
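The rules-based side of such a framework can be sketched in a few lines: cheap deterministic checks that run on every agent response before any expensive LLM-as-Judge call. The specific rules below are invented for illustration, not the speakers' actual checks.

```python
# Sketch of rules-based evaluation: deterministic checks applied to every
# agent response. Rules here are illustrative examples only.

RULES = {
    "non_empty": lambda r: bool(r.strip()),
    "no_apology_loop": lambda r: r.lower().count("i apologize") <= 1,
    "within_length": lambda r: len(r) <= 2000,
}

def evaluate(response: str) -> dict:
    """Run every rule; return pass/fail per rule plus an overall verdict."""
    results = {name: rule(response) for name, rule in RULES.items()}
    results["passed"] = all(results.values())
    return results

print(evaluate("Here is the refund policy you asked about."))
# all rules pass, so "passed" is True
```

A common layering is to run rules first (free and deterministic), fall back to LLM-as-Judge only for qualities rules cannot capture (helpfulness, tone), and aggregate both into ML-style metrics such as pass rate per release.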

Program Synthesis (PS) is the task of automatically generating logical procedures or source code from a small set of input-output examples. While LLMs and agents dominate current AI conversations, they often struggle with these kinds of precise reasoning tasks—where smaller, well-structured models for PS can succeed. In this talk, we'll walk through the end-to-end development of a PS system, covering dataset representation using graph structures, model architectures, and tree search algorithms. The working example for this talk is the generation of procedural textures for 3D modeling, but the methodology is domain-agnostic. Participants will leave with a deeper understanding of PS, its real-world potential, and the trade-offs between different architectural approaches. The session is designed for practitioners with a solid understanding of ML concepts and some familiarity with NN architectures such as transformers and CNNs.
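To make the core PS setting concrete, here is a toy enumerative synthesizer: it searches compositions of a tiny DSL of unary integer functions for one consistent with the input-output examples. The DSL and examples are made up; real systems replace this brute-force enumeration with learned models guiding a tree search.

```python
# Toy enumerative program synthesis: search a tiny DSL of unary integer
# functions for a composition matching all input-output examples.
from itertools import product

DSL = {
    "inc": lambda x: x + 1,
    "double": lambda x: x * 2,
    "square": lambda x: x * x,
}

def synthesize(examples, max_depth=3):
    """Return the first program (tuple of op names) matching all examples."""
    for depth in range(1, max_depth + 1):
        for program in product(DSL, repeat=depth):
            def run(x, prog=program):
                for op in prog:
                    x = DSL[op](x)
                return x
            if all(run(i) == o for i, o in examples):
                return program
    return None

# Target behavior: f(x) = (x + 1) * 2
print(synthesize([(1, 4), (2, 6), (3, 8)]))  # → ('inc', 'double')
```

The search space grows exponentially with program depth, which is exactly why the talk's combination of learned guidance and tree search matters for realistic DSLs.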

Is Your LLM Evaluation Missing the Point?

Your LLM evaluation suite shows 93% accuracy. Then domain experts point out it's producing catastrophically wrong answers for real-world use cases. This talk explores the collaboration gap between AI engineers and domain experts that technical evaluation alone cannot bridge. Drawing from government, healthcare, and civic tech case studies, we'll examine why tools like PromptFoo, DeepEval, and RAGAS are necessary but insufficient and how structured collaboration with domain stakeholders reveals critical failures invisible to standard metrics. You'll leave with practical starting points for building cross-functional evaluation that catches problems before deployment.

Building accurate AI workflows can get complicated fast. By explicitly defining and modularizing agent tasks, my AI flows have become more precise, consistent, and efficient—delivering improved outcomes consistently. But can we prove it? In this talk, I'll walk you through an agentic app built with Langflow, and show how giving agents narrower, well-defined tasks leads directly to more accurate, consistent results. We'll put that theory to the test using evals with Pytest and LangSmith, iterating across different agent setups, analyzing data, and tightening up the app. By the end, we'll have a clear, repeatable workflow that lets us have confidence in how future agent or LLM changes will affect outcomes, before we ever hit deploy.
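A minimal version of that eval loop is a table of expected behaviors checked against the agent. The stub agent, its single narrow task, and the cases below are all invented; in the real app each case would exercise a Langflow flow, with pytest parametrizing the cases and LangSmith recording the runs.

```python
# Table-driven eval for a narrowly scoped agent. The agent is a stub with one
# well-defined task (routing a support ticket); cases are illustrative.

def classify_ticket(text: str) -> str:
    """Stub agent standing in for an LLM-backed flow."""
    if "refund" in text.lower():
        return "billing"
    if "password" in text.lower():
        return "account"
    return "general"

CASES = [
    ("I want a refund for my order", "billing"),
    ("I forgot my password", "account"),
    ("Where is your office?", "general"),
]

def run_evals():
    """Return the failing cases as (input, expected, actual) triples."""
    return [(t, e, classify_ticket(t))
            for t, e in CASES if classify_ticket(t) != e]

print(run_evals())  # an empty list means every case passed
```

Because the suite is deterministic and cheap, it can rerun on every change to the flow or underlying model, which is what gives you confidence before deploy.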

AI red teaming is crucial for identifying security and safety vulnerabilities (e.g., jailbreaks, prompt injection, harmful content generation) of Large Language Models. However, manual and brute-force adversarial testing is resource-intensive and often inefficiently consumes time and compute resources exploring low-risk regions of the input space. This talk introduces a practical, Python-based methodology for accelerating red teaming using model uncertainty quantification (UQ).
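One simple UQ signal the methodology can build on is disagreement across repeated samples: if a model flips between behaviors for the same prompt, that region of input space is unstable and worth probing first. The sketch below stubs the sampling with fixed strings; a real pipeline would sample the model at nonzero temperature and cluster semantically similar responses before scoring.

```python
# Uncertainty signal for prioritizing red-team probes: entropy of the
# empirical distribution over sampled responses to one prompt.
# Sampling is stubbed with fixed strings for illustration.
import math
from collections import Counter

def response_entropy(samples):
    """Shannon entropy (bits) of the empirical response distribution."""
    counts = Counter(samples)
    total = len(samples)
    # 0.0 - ... keeps the zero-entropy case as +0.0
    return 0.0 - sum((c / total) * math.log2(c / total)
                     for c in counts.values())

confident = ["refuse"] * 8                      # model always refuses
uncertain = ["refuse"] * 4 + ["comply"] * 4     # model flips between behaviors

print(response_entropy(confident))  # 0.0 bits: consistent, low priority
print(response_entropy(uncertain))  # 1.0 bits: unstable, probe here first
```

Ranking candidate adversarial prompts by this score concentrates compute on the high-uncertainty regions instead of brute-forcing the whole input space.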

Keynote by Lisa Amini: What's Next in AI for Data and Data Management?

Advances in large language models (LLMs) have propelled a recent flurry of AI tools for data management and operations. For example, AI-powered code assistants leverage LLMs to generate code for dataflow pipelines. RAG pipelines enable LLMs to ground responses with relevant information from external data sources. Data agents leverage LLMs to turn natural language questions into data-driven answers and actions. While challenges remain, these advances are opening exciting new opportunities for data scientists and engineers. In this talk, we will examine recent advances, along with some still incubating in research labs, with the goal of understanding where this is all heading, and present our perspective on what’s next for AI in data management and data operations.