Agents are powerful, but without feedback, they're flying blind. In this talk, we'll walk through how to build self-improving agents by closing the loop with evaluation, experimentation, tracing, and prompt optimization. You'll learn how to capture the right telemetry, run meaningful tests, and apply insights in a way that actually improves performance over time. Whether you're building copilots, chatbots, or autonomous workflows, this session will give you the practical tools and architecture patterns you need to make your agents smarter, automatically.
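A minimal sketch of the closed loop this abstract describes: run the agent, capture traces, evaluate them, and feed the failures into prompt optimization. Everything here is a hypothetical stand-in, assuming a Python stack; `run_agent`, `evaluate`, and `optimize_prompt` are placeholders rather than any particular framework's API.

```python
from dataclasses import dataclass

@dataclass
class Trace:
    """One captured episode of agent telemetry."""
    task: str
    output: str
    score: float

def run_agent(prompt: str, task: str) -> str:
    """Hypothetical stand-in for an LLM-backed agent call."""
    return f"draft answer for {task!r}"

def evaluate(task: str, output: str) -> float:
    """Hypothetical evaluator; in practice an LLM judge, unit test, or metric."""
    return 1.0 if "answer" in output else 0.0

def optimize_prompt(prompt: str, failures: list[Trace]) -> str:
    """Hypothetical optimizer; in practice an LLM rewrites the prompt
    using the failing traces as feedback."""
    return prompt + "\nAvoid the failure modes seen on: " + ", ".join(
        f.task for f in failures
    )

def improvement_loop(prompt: str, tasks: list[str], rounds: int = 3) -> str:
    """Close the loop: run, trace, evaluate, then optimize the prompt."""
    for _ in range(rounds):
        traces = []
        for task in tasks:
            output = run_agent(prompt, task)           # captured telemetry
            traces.append(Trace(task, output, evaluate(task, output)))
        failures = [t for t in traces if t.score < 1.0]
        if not failures:                               # all evals pass
            break
        prompt = optimize_prompt(prompt, failures)     # apply insights
    return prompt

# Demo with the stubs above; a real system would persist traces and evals.
print(improvement_loop("You are a helpful agent.", ["summarize a document"]))
```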
Large language models (LLMs) have achieved impressive performance in many domains, including code generation and reasoning. However, for challenging tasks, generating a correct solution in a single attempt is often out of reach. In this talk, I will first discuss our work on self-debugging, which instructs LLMs to debug their own predicted programs. In particular, we demonstrate that self-debugging can teach LLMs to perform rubber duck debugging: without any human feedback on code correctness or error messages, the model identifies its mistakes by investigating the execution results and explaining the generated code in natural language. Self-debugging notably improves both model performance and sample efficiency, matching or outperforming baselines that generate more than 10× as many candidate programs. In the second part, I will demonstrate that LLMs can also act as optimizers, improving their own prompts to achieve better performance.
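A minimal sketch of a self-debugging loop in the spirit of the first part of this talk, assuming Python throughout: the model's candidate program is executed against a test, and on failure the model sees only the execution result plus its own natural-language explanation of the code (the rubber duck step) before producing a fix. `llm` is a hypothetical placeholder for a real model call, not an API from the paper.

```python
import traceback

def llm(prompt: str) -> str:
    """Hypothetical stand-in for a code-generation model call."""
    return "def solve(x):\n    return x * 2\n"

def execute(program: str, test: str) -> tuple[bool, str]:
    """Run the candidate program plus its test, capturing any error message."""
    try:
        exec(program + "\n" + test, {})
        return True, "all tests passed"
    except Exception:
        return False, traceback.format_exc(limit=1)

def self_debug(task: str, test: str, max_turns: int = 3) -> str:
    """Generate, execute, and iteratively repair a program."""
    program = llm(f"Write a Python function for: {task}")
    for _ in range(max_turns):
        ok, feedback = execute(program, test)
        if ok:
            break
        # Rubber duck step: no human feedback, only the execution result and
        # the model's own explanation of what its code does.
        explanation = llm(f"Explain line by line what this code does:\n{program}")
        program = llm(
            f"Task: {task}\nCode:\n{program}\n"
            f"Execution result: {feedback}\nYour explanation: {explanation}\n"
            "Fix the code."
        )
    return program

# Demo with the stub model; the prompt-optimization idea from the second part
# has the same shape, with the prompt as the artifact and an eval as the test.
print(self_debug("double a number", "assert solve(2) == 4"))
```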