AI Engineer World's Fair 2025

How to Build Trustworthy AI — Allie Howe

2025-06-16 Watch

video

Trust is a multifaceted outcome that results when product and engineering teams work together to build AI that is aligned, explainable, and secure. Learn strategies for how to build trustworthy AI and why trust is paramount for AI systems.

Trustworthy AI = AI Security + AI Safety

Learn about the differences between AI Security and AI Safety and how the three focus areas of MLSecOps + AI Red Teaming + AI Runtime Security can help you achieve both and ultimately build Trustworthy AI.

Trustworthy AI Issues in the news: https://x.com/syddiitwt/status/1923427722241487297 https://fingfx.thomsonreuters.com/gfx/legaldocs/egvblxokkvq/Walters%20v%20OpenAI%20-%20order.pdf?ref=claritasgrc.ai

MLSecOps Resources Modelscan https://github.com/protectai/modelscan Community: mlsecops.com

AI Red Teaming Resources: https://azure.github.io/PyRIT/ https://ashy-coast-00aeb501e.6.azurestaticapps.net/MS_AIRT_Lessons_eBook.pdf

AI Runtime Security Resources: https://www.pillar.security/solutions#ai-detection https://noma.security/

Showcasing Trustworthy AI to Customers/Prospects https://www.vanta.com/collection/trust/what-is-a-trust-center

Exposing Agents as MCP servers with mcp-agent: Sarmad Qadri

2025-06-11 Watch

video

In this talk, we will show that agents can be represented as MCP servers, allowing them to be run from any MCP client (such as Claude, Cursor and other applications).

This is made possible with mcp-agent, a simple, composable framework to build agents using Model Context Protocol.

Overview

Currently "agentic" behavior exists only on the MCP client side – clients like Claude or Cursor use MCP servers, which are often simple tool APIs, to solve tasks.

However, if Agents are MCP servers themselves, then any MCP client can invoke, coordinate and orchestrate agents the same way it does with any other MCP server.

This paradigm shift enables: 1. Agent Composition: Build complex multi-agent systems over the same base protocol (MCP). 2. Platform Independence: Use your agents from any MCP-compatible client 3. Scalability: Run agent workflows on dedicated infrastructure, not just within client environments 4. Customization: Develop your own agent workflows and reuse them across any MCP client.

Background

mcp-agent was inspired by 2 foundational updates that Anthropic introduced for AI application developers:

Model Context Protocol - a standardized interface to let any software be accessible to AI assistants via MCP servers.
Building Effective Agents - a seminal writeup on simple, composable patterns for building production-ready AI agents.

mcp-agent puts these two foundational pieces into an AI application framework:

It handles the pesky business of managing the lifecycle of MCP server connections.
It implements every pattern described in Building Effective Agents, and does so in a composable way, allowing you to chain these patterns together.

Now as MCP continues to grow adoption, we are exploring advanced agent architectures that allow for sophisticated workflows in simple ways.

Private video

2025-06-11 Watch

video

This video is private.

Beyond Conversation: Why Documents Transform Natural Language into Code - Filip Kozera

2025-06-10 Watch

video

Natural language is quickly becoming our most powerful programming abstraction, perfectly suited to capture the inherent fuzziness and complexity of real-world problems. But despite the power of AI chatbots, endlessly brainstorming in conversational interfaces rarely leads to clarity or reliable results.

This session explores how structured, document-based natural language is uniquely positioned as the ultimate interface for humans to precisely describe complex systems. We'll discuss why conversational interfaces often fail at forcing clarity, and how shifting to a document-driven model ensures that humans articulate their intent clearly and rigorously.

Attendees will learn:

Why natural language (not code) is the most intuitive way to describe complex systems

How documents inherently force clarity, rigor, and structured thinking compared to chatbots

Real-world examples of document-based programming for building reliable, deployable AI systems

Practical insights into transitioning from conversational brainstorming to structured document-driven workflows

Break It 'Til You Make It: Building the Self-Improving Stack for AI Agents - Aparna Dhinakaran

2025-06-10 Watch

video

Building and shipping an AI agent is just the beginning. In real-world systems, the real work starts after deployment — when agents drift, fail silently, or underperform in edge cases no one anticipated.

This talk is about building the full monitoring and improvement stack that keeps agents reliable, efficient, and improving over time. We’ll walk through how to connect evals, tracing, observability, experimentation, and optimization into a virtuous cycle — one where agents not only perform, but learn and adapt in production.

Drawing on real-world deployments, I’ll cover:

Composing evaluation layers that surface meaningful failure modes -Tracing and instrumentation for deep visibility into agent behavior -Running experiments that actually improve outcomes -Closing the loop with feedback-driven optimization
People know to improve the agents application, but do they also know they need to improve their evals in tandem?

If you’re scaling agents beyond the prototype phase, this is the talk that helps you move from working once to working continuously.

Just do it. (let your tools think for themselves) - Robert Chandler

2025-06-10 Watch

video

There's a new type of wrapper in town. The MCP API wrapper.

Make them thin and you'll be wondering why your chatbot is struggling to even send a Slack message (true story). But make them agentic and the world is unlocked.

In this talk I'll demonstrate the drawbacks using low level APIs as MCPs and show the magic that happens when your 'tools' are actually other agents. It's prompts all the way down baby!

MCPs are Boring (or: Why we are losing the Sparkle of LLMs) - Manuel Odendahl

2025-06-10 Watch

video

With the mainstream spread especially in coding and with agents, we are starting to imprison ourselves in little cargo culted boxes of what llms and agents are.

I’ll hopefully show you a couple of ideas so you can delve deeper and learn to unleash and harness the shoggoth.

You’re absolutely right – this is the talk you don’t want to miss!

https://x.com/ProgramWithAi/status/1929226124019564993

Supercharging developer workflow with Amazon Q Developer - Vikash Agrawal

2025-06-10 Watch

video

Supercharging Developer Workflow with Amazon Q Developer

Tired of repetitive coding tasks? What if AI could handle coding, testing, documentation, and deployment for you? In this session, we’ll build the classic 2048 game from scratch using Amazon Q Developer, demonstrating how AI can streamline the development workflow.

Key highlights: ✅ /dev – AI-powered code generation ✅ /test – Automated unit test creation ✅ /doc – Instant documentation generation ✅ /review – AI-assisted code review ✅ Amazon Q Developer in CLI ✅ /dev – Deployment script generation ✅ Deploy & Debug – Seamless AWS deployment & debugging in CloudWatch

By the end of this session, you’ll see firsthand how Amazon Q Developer can boost productivity, reduce boilerplate, and help you ship faster. Let’s build smarter, not harder! 🚀

The Many Ends of Programming - Ray Myers

2025-06-10 Watch

video

AI will reshape Software Engineering – but how remains an open question. Will the developers’ role evolve, or vanish entirely? Are we heading toward an Innovator’s Paradise or an Infinite Pile of Garbage?

Visions of the future are so wildly divergent that we struggle to even agree on terms, let alone direction. In this talk, we’ll cut through the noise by exploring six distinct “endgames” for programming in the age of AI. Each offers a different lens on what we build, how we build, and who (or what) is doing the building.

By naming and examining these futures, we gain a clearer view of what’s ahead and a chance to choose our destination.

Why Bolt.new Won and Most DevTools AI Pivots Failed - Victoria Melnikova

2025-06-10 Watch

video

Everyone's pivoting to AI—but most are doing it wrong. After conducting in-depth interviews with leaders at 17 developer tools startups that attempted to "add AI" to their roadmap, I've uncovered the patterns that led to either spectacular success or painful failure. This isn't abstract theory—it's battle-tested wisdom from companies that bet their future on AI and lived to tell the tale.

You'll learn: - The three most common AI pivot traps that led otherwise promising startups to burn through runway with nothing to show for it - Why adding an AI feature doesn't constitute a real AI transformation (and what actually does) - The counterintuitive "backward pivot" strategy that worked for 5 of the most successful transitions - A practical framework for evaluating if your existing developer tooling can meaningfully evolve in the AI era or needs to be reimagined from scratch

AI Engineer World's Fair 2025 - Day 1 Keynotes & MCP track ft. Anthropic MCP team

2025-06-05 Watch

video

full schedule here: https://ai.engineer/schedule

thanks @yashgargk for timestamps:

0:00:00 - start 0:15:15 - Welcome to AI Engineer - Laurie Voss (LlamaIndex) 0:22:17 - Designing AI-Intensive Applications - Shawn Wang (Latent Space) 0:35:46 - Spark to System: Building the Open Agentic Web - Asha Sharma (Microsoft) 0:59:02 - State of Startups and AI 2025 - Sarah Guo (Conviction) 1:24:44 - 2025 in LLMs so far - Simon Willison (Datasette) 1:43:20 - Agentic GraphRAG - Stephen Chin (Neo4j), Andreas Kollegger (Neo4j) 1:47:58 - Track Intros - Laurie Voss (LlamaIndex) 1:51:00 - Break 2:29:26 - MCP Track Intro - Henry Mao (Smithery) 2:31:16 - MCP Origins & RFS - Theodora Chu (Anthropic) 2:49:47 - What we learned from shipping remote MCP support at Anthropic - John Welsh (Anthropic) 3:03:51 - Full Spectrum MCP: Uncovering Hidden Servers and Clients Capabilities - Harald Kirschner (VS Code, Microsoft) 3:18:54 - MCP isn’t good, yet - David Cramer (Sentry) 3:36:34 - Break 5:08:05 - MCP is all you need - Samuel Colvin (Pydantic) 5:25:43 - Observable tools - the state of MCP observability - Alex Volkov (Weights & Biases), Benjamin Eckel (Dylibso) 5:43:00 - The rise of the agentic economy on the shoulders of MCP - Jan Curn (Apify) 6:02:05 - Break 7:08:00 - Buffer 7:09:28 - Closing thoughts on Agentic GraphRAG + Demo - Stephen Chin (Neo4j), Andreas Kollegger (Neo4j) 7:15:22 - Building Agents at Cloud-Scale - Antje Barth (AWS) 7:34:26 - Windsurf everywhere, doing everything, all at once - Kevin Hou (Windsurf) 7:50:31 - Buffer 7:51:30 - #define AI Engineer - Greg Brockman (OpenAI), Shawn Wang (Latent Space)

The 4 Patterns of AI Native Development — Patrick Debois

2025-06-04 Watch

video

AI is fundamentally reshaping software development roles and activities. While the change is obvious, understanding the actual shifts taking place on the individual developer remains challenging.

In this talk, we introduce the four AI Native Dev patterns that are currently emerging: - From producer to manager: we say what AI needs to do - From implementation to intent: we care less on the how but focus on the why - From delivery to discovery: we experiment and learn - From content creation to knowledge: capture knowhow to get better

We backup these patterns by showcasing features in tools that support these shift.

The aim of the patterns is to help grasp how to position you and your team members 's career effectively in this changing landscape.

Breaking the Chain: Agent Continuations for Resumable AI Workflows - Greg Benson

2025-06-03 Watch

video

AI agents are powerful—but brittle. Once an agent chain starts, you either let it run or you tear it down and lose state. Agent Continuations change that contract. Borrowing from programming‑language continuations, we capture an agent’s entire call stack—tools, goals, partial responses—in a compact JSON blob combined with the familiar messages array. The result is a protocol‑level "Agent State" that lets you:

Pause anytime for human-in-the-loop approval gates, rate‑limit resets, or progressive UI updates.
Migrate agents across nodes, clouds, even different agent execution platforms
Checkpoint long‑running multi‑agent plans using off‑the‑shelf storage and enable restarting in the presence of agent failure
Resume seamlessly through standard LLM function‑calling APIs, so every framework that speaks OpenAI JSON can speak continuations.

Our approach works with single-level agent loops and multi-level agents in which agents can call subagents.

Attendees will leave with open‑source Python snippets and a mental model that turns “monolithic” agents into restart‑able, human‑aware services—shrinking failure windows and unlocking new UX patterns for AI products.

Key Takeaways

Why Continuations are a good construct for Agent State
Protocol spec and reference JSON examples and a - Python implementation Live demo: suspend a three‑layer agent with suspending for human approval

** Links **

https://github.com/SnapLogic/agent-continuations https://agentcreator.com

Are MCPs Overhyped? A Rant about MCPs — Henry Mao, Smithery

2025-06-03 Watch

video

AI agents are becoming smarter but lack the broad capability to take action in practice. At Smithery, we believe the missing link is an AI orchestration layer—a unified interface that gives agents context, action, and a way to learn from real interactions. This talk explores the problem space in the Model Context Protocol (MCP) ecosystem and how we're tackling it at Smithery.

GPU-less, Trust-less, Limit-less: Reimagining the Confidential AI Cloud - Mike Bursell

2025-06-03 Watch

video

What happens when private AI models or sensitive data need to run in the public cloud?

Can we still maintain control – without relying on blind trust? Can we eliminate that blind trust and make infrastructure verifiable by design?

In this talk, you’ll discover what a “GPU-less” future really means: not the absence of acceleration, but the freedom to collaborate and deploy private AII workloads in a confidential, self-sovereign AI cloud – with open, on-chain guarantees that centralized clouds simply can’t offer.

No GPU-provider lock-in. No black-box execution. Just algorithmic, sovereign infrastructure – where the confidential cloud is a protocol, not a service.

You’ll learn the foundations of Confidential AI and see real-world results powered by it. Then, through four demos on Super Protocol, you’ll learn how to:

AI Marketplace & Confidentiality Check – Deploy models in a few clicks and verify on-chain they’re running inside hardware-backed confidential environments.
n8n Healthcare AI Workflow – Build and run agentic automations for sensitive data – entirely within confidential environments.
Distributed vLLM Inference – Parallelize LLM inference across multiple GPU servers– with zero data exposure and no dependency on any single provider.
Provable Medical-Data Training & On-Chain Reporting – Train on multiple sensitive datasets inside confidential environments – no data or IP exposed to participants, infrastructure providers, or Super Protocol – and generate verifiable on-chain proofs of exactly what ran, where, and how.

Join us to discover how you can leverage Confidential AI today – and unlock new possibilities.

Extra resources: - NVIDIA on Super Protocol: https://developer.nvidia.com/blog/exploring-the-case-of-super-protocol-with-self-sovereign-ai-and-nvidia-confidential-computing - Website https://superprotocol.com/ - Super AI Marketplace: https://marketplace.superprotocol.com/ - Documentation: https://docs.superprotocol.com/

Grounded Reasoning Systems for Cloud Architecture - Iman Makaremi

2025-06-03 Watch

video

As LLMs move into enterprise workflows, developers face a new kind of architecture challenge: how do you build reliable, interpretable systems powered by agents and reasoning?

This talk unpacks how we designed and implemented an AI orchestration framework for enterprise architecture — combining LangGraph for multi-agent workflows, Flyte for distributed execution, and AWS Bedrock for LLM inference using Claude 3. The product: an AI copilot for enterprise architects, deeply rooted in your tech stack context.

At the core of this system is a domain-specific knowledge graph that acts as long-term memory for the agents. It enables persistent, structured representations of architectural state, system dependencies, and business context — giving the agents the grounding they need to generate accurate recommendations, translate natural language into SQL or code, and maintain continuity across workflows.

We’ll also cover how we’ve integrated observability practices — including planned OpenTelemetry instrumentation — to trace and debug autonomous AI systems in production.

If you’re a developer or AI engineer thinking beyond the chatbot and looking to embed reasoning into complex system design and data tasks, this talk offers an end-to-end blueprint — from orchestration and grounding to production monitoring.

The Agent Native Company — Rick Blalock, Agentuity

2025-06-03 Watch

video

Are you just using AI—or are you building a company around it?

In this talk, I break down what it means to be an agent-native company—a business designed from the ground up with AI agents at the core of operations, culture, and product. Drawing from my own founder experience (building 14 months of product in 8 weeks with just 6 people and a stack of agents), I’ll walk you through the real-world shift happening right now across tech.

🔍 What you'll learn:

The difference between AI-enhanced vs. AI-native orgs

Why the future of hiring is about AI fluency, not just professional networks or credentials

The rise of new job titles like “Agent Manager” (yes, that’s a real job)

How lean teams can use AI agents to achieve 10x—or even 100x—impact

What “culture is the new stack” really means when humans and AI work together

🧠 Featuring real-world examples, practical hiring insights, and a peek into how workflows and job roles are changing fast.

📈 Whether you’re a founder, tech leader, or just curious about the future of work, this is your guide to scaling smart—with AI at the wheel.

Why the Best AI Agents Are Built Without Frameworks (Primitives over Frameworks) — Ahmad Awais, CHAI

2025-06-03 Watch

video

Cursor, v0, chai.new, lovable, bolt — what do they all have in common? They weren’t built on AI frameworks—they're built using primitives optimized for speed, scale, and flexibility.

LLMs are evolving fast—like, literally every week. New standards pop up (looking at you, MCP), and APIs change faster than you can keep track. Frameworks just can't move at this speed.

In this talk, I'll challenge conventional engineering wisdom, sharing my real-world experience scaling thousands of AI agents to handle over 100 million monthly runs.

You'll discover how using AI primitives can dramatically speed up iteration, provide bigger scale, and simplify maintenance.

I'll share eight practical agent architectures—covering memory management, auto tool integration, and simple serverless deployment—to help you quickly build reliable and scalable AI agents.

By the end of this session, you'll clearly see why we must rethink and rebuild our infrastructure and focus on AI-native primitives instead of heavy, bloated, and quickly outdated frameworks.

I wonder if we need another S3-moment but for the AI agent infrastructure.

7 Habits of Highly Effective Generative AI Evaluations - Justin Muller

2025-06-03 Watch

video

Evaluations are the single most reliable indicator of the health and long term viability of any gen AI project. As a Principal Applied AI Architect for AWS, I've had the opportunity to look at over 100 different attempts at evaluation frameworks over the last few years. In this talk I share some stories about the best and worst, and then distill the 7 most common elements I've seen in successful evaluations.

Slides at https://d2ot4ns4zf41bm.cloudfront.net/slides/7+Habits+AI+World's+Fair.pptx

Agentic Enterprise - What your CEO must know about AI - Hubert Misztela

2025-06-03 Watch

video

How large organizations will be transformed by AI? What people and organizations are scared of because of AI? What people do not know about AI Agents? What enterprises need? Why we might be wrong about Agents and LLMs impact all together?

Workflows optimization. AI beyond LLMs and Agents: Representation Learning + GenAI + New Interfaces. Context is the new oil, not the data.

Your CEO has to pivot. Now.

Agents reported thousands of bugs, how many were real? - Ian Butler and Nick Gregory

2025-06-03 Watch

video

Ever had an AI-generated tweak unexpectedly break your entire project? Agentic software development has impressive promise, but the reality still falls short. In this talk we introduce SM-100, a groundbreaking benchmark designed specifically to evaluate autonomous agents on software maintenance tasks.

We're also excited to announce Bismuth, a generalist software agent with strong performance on such maintenance tasks.

https://bismuth.sh & https://sm100bench.com

Arrakis: How To Build An AI Sandbox From Scratch - Abhishek Bhardwaj, OpenAI

2025-06-03 Watch

video

Arrakis (https://github.com/abshkbh/arrakis) provides MicroVM-based secure sandboxes for code execution and full computer use. It features first-class support for backtracking, a Python SDK, and a Model Context Protocol (MCP) server.

In this talk, we go under the hood to explore how to architect an AI sandbox from the ground up. We’ll also dive into why sandboxes are becoming essential infrastructure for AI models and agents — enabling the next big unlock in intelligence.

Links - Slides for the talk available here - https://tinyurl.com/arrakis-aie Vibe coding with Claude and Arrakis -https://x.com/abshkbh/status/1907480355529203809

Blender MCP and The Future Of Creative Tools - Siddharth Ahuja

2025-06-03 Watch

video

A dive into the Blender MCP to see how it was made, what use cases for creators it unlocks, and how the future might look for creators.

Building Reliable Support Agents Using the Effect Typescript Library - Michael Fester

2025-06-03 Watch

video

In this video, we walk through how our team built production-ready support agents using the Effect TypeScript library. The video includes a demo of the agent in action, along with a breakdown of the architecture and design decisions behind it.

We cover what worked well, what was challenging, and why we are continuing to invest in Effect for future development. If you’re building internal tools, working with LLMs, or automating customer support, this talk shares practical lessons on creating robust systems with strong guarantees.

Topics include: Architectural patterns for agent-based systems Tradeoffs in developer experience Techniques for reliability and fault tolerance

Feel free to reach out or share your thoughts: Twitter: x.com/michaelfester LinkedIn: linkedin.com/in/michaelfester

The Coherence Trap: Why LLMs Feel Smart (But Aren’t Thinking) - Travis Frisinger

2025-06-03 Watch

video

Why AI engineers must rethink what intelligence means in the age of large language models.

LLMs aren’t thinking. No awareness. No reasoning. No plan. And yet—they feel smart. Shockingly so.

This talk introduces coherence reconstruction, a mental model that explains why LLMs are so useful despite their lack of true understanding. You’ll learn how they generate meaning through latent coherence—a kind of internal gravity that pulls language into alignment with context.

We’ll break down:

Why hallucinations happen—and why you can’t fully eliminate them.
How prompts act like force vectors, shaping behavior in structured ways.
What this all means for reasoning tasks, evaluation practices, and agent design.

If you’re building tools, agents, or workflows with LLMs, this talk will reframe how you think about reliability, cognition, and what "understanding" even means.

🔗 Additional resources: Blog: https://aibuddy.software/ AI Decision Loop Paper: https://aibuddy.software/papers/2500_chatgpt_conversations_case_study.pdf AI Decision Loop Git Repo: https://github.com/T-rav/gpt-chat-analysis AI Coherence Paper: https://aibuddy.software/papers/AI_Coherence_A_Theory_of_Utility_in_Large_Language_Models.pdf Cat Metal Album: https://www.youtube.com/watch?v=gdV5l0JvdNo&list=PL0X82GOpevvYfPLM-JibRJEizHqCJ6U4H&index=7