NEO: Unlocking Scalable LLM Inference with Smart CPU Offloading

2025-12-09 · AI/ML Conversations Meetup: Smart CPU Offloading for Scalable LLM Inference

talk

Using Traditional AI and LLMs to Automate Complex and Critical Documents in Healthcare

2025-12-09 · PyData Boston 2025

talk

by Aman Bhandari, Lily Xu

AI/ML GenAI

Informed Consent Forms (ICFs) are critical documents in clinical trials. They are the first, and often most crucial, touchpoint between a patient and a clinical trial study. Yet the process of developing them is laborious, high-stakes, and heavily regulated. Each form must be tailored to jurisdictional requirements and local ethics boards, reviewed by cross-functional teams, and written in plain language that patients can understand. Producing them at scale across countries and disease areas demands manual effort and creates major operational bottlenecks. We used a combination of traditional AI and large language models to autodraft ICFs at scale across clinical trial types, countries, and disease areas. Building, testing, iterating on, and deploying the system yielded both technical and non-technical lessons for applying generative AI to complex documents at scale and with meaningful impact.

Generative Programming with Mellea: from Agentic Soup to Robust Software

2025-12-08 · PyData Boston 2025

talk

by Jake Lorocco, Nathan Fulton

AI/ML Python

Agentic frameworks make it easy to build and deploy compelling demos. But building robust systems that use LLMs is difficult because of inherent environmental non-determinism. Each user is different, each request is different; the very flexibility that makes LLMs feel magical in-the-small also makes agents difficult to wrangle in-the-large.

Developers who have built large agentic-like systems know the pain. Exceptional cases multiply, prompt libraries grow, instructions are co-mingled with user input. After a few iterations, an elegant agent evolves into a big ball of mud.

This hands-on tutorial introduces participants to Mellea, an open-source Python library for writing structured generative programs. Mellea puts the developer back in control by providing the building blocks needed to circumscribe, control, and mediate essential non-determinism.

Going multi-modal: How to leverage the latest multi-modal LLMs and deep learning models on real world applications

2025-12-08 · PyData Boston 2025

talk

by Isaac Godfried

Multimodal deep learning models continue improving rapidly, but creating real-world applications that effectively leverage multiple data types remains challenging. This hands-on tutorial covers model selection, embedding storage, fine-tuning, and production deployment through two practical examples: a historical manuscript search system and flood forecasting with satellite imagery and time series data.

"Save your API Keys for someone else" -- Using the HuggingFace and Ollama ecosystems to run good-enough LLMs on your laptop

2025-12-08 · PyData Boston 2025

talk

by Ian Stokes-Rees

Analytics API GenAI Python

In this 90-minute tutorial we'll get anyone with some basic Python and command-line skills up and running with their own 100% laptop-based set of LLMs, and explain some successful patterns for leveraging LLMs in a data analysis environment. We'll also highlight pitfalls waiting to catch you out and, by demonstrating the limits of LLMs for data analysis tasks, show that your pre-GenAI analytics skills are still relevant today and likely will be for the foreseeable future.

Building LLM Agents Made Simple

2025-12-08 · PyData Boston 2025

talk

by Eric Ma

API GitHub Python

Learn to build practical LLM agents using LlamaBot and Marimo notebooks. This hands-on tutorial teaches the most important lesson in agent development: start with workflows, not technology.

We'll build a complete back-office automation system through three agents: a receipt processor that extracts data from PDFs, an invoice writer that generates documents, and a coordinator that orchestrates both. This demonstrates the fundamental pattern for agent systems—map your boring workflows first, build focused agents for specific tasks, then compose them so agents can use other agents as tools.

By the end, you'll understand how to identify workflows worth automating, build agents with decision-making loops, compose agents into larger systems, and integrate them into your own work. You'll leave with working code and confidence to automate repetitive tasks.

Prerequisites: Intermediate Python, familiarity with APIs, basic LLM understanding. Participants should have Ollama and models installed beforehand (setup instructions provided).

Materials: GitHub repository with Marimo notebooks. Setup uses Pixi for dependency management.
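
The composition pattern in this abstract (focused agents, plus a coordinator that uses them as tools) can be illustrated with a minimal sketch. This is not LlamaBot's API: the function names, the `Coordinator` class, and the `vendor: ...; total: ...` receipt format are all hypothetical, and plain functions stand in for LLM-backed agents. The coordinator here runs a fixed two-step workflow rather than a decision-making loop.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


def receipt_processor(text: str) -> dict:
    """Hypothetical agent: parse 'vendor: X; total: Y' into a record."""
    fields = dict(part.strip().split(": ", 1) for part in text.split(";"))
    return {"vendor": fields["vendor"], "total": float(fields["total"])}


def invoice_writer(record: dict) -> str:
    """Hypothetical agent: render a one-line invoice from a record."""
    return f"INVOICE to {record['vendor']}: ${record['total']:.2f}"


@dataclass
class Coordinator:
    """Holds other agents as named tools and chains them."""
    tools: Dict[str, Callable] = field(default_factory=dict)

    def register(self, name: str, tool: Callable) -> None:
        self.tools[name] = tool

    def run(self, receipt_text: str) -> str:
        # Fixed workflow: extract data first, then generate the document.
        record = self.tools["process_receipt"](receipt_text)
        return self.tools["write_invoice"](record)


coordinator = Coordinator()
coordinator.register("process_receipt", receipt_processor)
coordinator.register("write_invoice", invoice_writer)
result = coordinator.run("vendor: Acme; total: 12.5")
print(result)
```

Registering agents by name keeps each one focused and testable on its own; a real coordinator would let a model choose which tool to call next instead of hard-coding the order.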

Create your Health Research Agent

2025-12-08 · PyData Boston 2025

talk

by Leonardo Ferreira

AI/ML Docker Linux Python

PubMed is a free search interface for biomedical literature, including citations and abstracts from many life science scientific journals. It is maintained by the National Library of Medicine at the NIH. Yet, most users only interact with it through simple keyword searches. In this hands-on tutorial, we will introduce PubMed as a data source for intelligent biomedical research assistants — and build a Health Research AI Agent using modern agentic AI frameworks such as LangChain, LangGraph, and Model Context Protocol (MCP) with minimum hardware requirements and no key tokens. To ensure compatibility, the agent will run in a Docker container which will host all necessary elements.

Participants will learn how to connect language models to structured biomedical knowledge, design context-aware queries, and containerize the entire system using Docker for maximum portability. By the end, attendees will have a working prototype that can read and reason over PubMed abstracts, summarize findings according to a semantic similarity engine, and assist with literature exploration — all running locally on modest hardware.

Expected Audience: Enthusiasts, researchers, and data scientists interested in AI agents, biomedical text mining, or practical LLM integration.
Prior Knowledge: Python and Docker familiarity; no biomedical background required.
Minimum Hardware Requirements: 8GB RAM (16GB+ recommended), 30GB disk space, Docker pre-installed; macOS, Windows, or Linux.
Key Takeaway: How to build a lightweight, reproducible research agent that combines open biomedical data with modern agentic AI frameworks.

Hands-On with LLM-Powered Recommenders: Hybrid Architectures for Next-Gen Personalization

2025-12-08 · PyData Boston 2025

talk

by Sheetal Borar, Astha Puri

Data Streaming

Recommender systems power everything from e-commerce to media streaming, but most pipelines still rely on collaborative filtering or neural models that focus narrowly on user–item interactions. Large language models (LLMs), by contrast, excel at reasoning across unstructured text, contextual information, and explanations. This tutorial bridges the two worlds. Participants will build a hybrid recommender system that uses structured embeddings for retrieval and integrates an LLM layer for personalization and natural-language explanations. We’ll also discuss practical engineering constraints: scaling, latency, caching, distillation/quantization, and fairness. By the end, attendees will leave with a working hybrid recommender they can extend for their own data, along with a playbook for when and how to bring LLMs into recommender workflows responsibly.
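
As a rough illustration of the hybrid architecture described above, the sketch below retrieves candidates by cosine similarity over embeddings and then passes them to an explanation layer. It is not the tutorial's code: the item catalogue, the three-dimensional vectors, and the `retrieve` and `explain` helpers are invented for illustration, and the LLM is represented by an optional callable with a template fallback.

```python
import math

# Hypothetical item embeddings; a real system would learn or load these.
ITEM_EMBEDDINGS = {
    "hiking boots": [0.9, 0.1, 0.0],
    "trail map": [0.8, 0.2, 0.1],
    "espresso maker": [0.0, 0.1, 0.9],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(user_vec, k=2):
    """Retrieval stage: top-k items by similarity to the user vector."""
    ranked = sorted(
        ITEM_EMBEDDINGS.items(),
        key=lambda kv: cosine(user_vec, kv[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


def explain(candidates, user_context, llm=None):
    """LLM stage: `llm` is any callable mapping a prompt string to text."""
    items = ", ".join(candidates)
    if llm is None:
        # Template fallback so the sketch runs without a model.
        return f"Recommended because you like {user_context}: {items}"
    return llm(f"User context: {user_context}. Explain why these fit: {items}")


candidates = retrieve([1.0, 0.0, 0.0])
message = explain(candidates, "outdoor trips")
print(candidates)
print(message)
```

Keeping retrieval cheap and restricting the LLM to a handful of candidates is one way to address the latency and cost constraints the abstract mentions.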

Building Agentic AI: Workflows, Fine-Tuning, Optimization, and Deployment

2025-12-08 · O'Reilly AI & ML Books

book

by Sinan Ozdemir (LoopGenius)

AI/ML RAG Vector DB ai-ml artificial-intelligence-ai data generative-ai

Transform Your Business with Intelligent AI to Drive Outcomes

Building reactive AI applications and chatbots is no longer enough. The competitive advantage belongs to those who can build AI that can respond, reason, plan, and execute. Building Agentic AI: Workflows, Fine-Tuning, Optimization, and Deployment takes you beyond basic chatbots to create fully functional, autonomous agents that automate real workflows, enhance human decision-making, and drive measurable business outcomes across high-impact domains like customer support, finance, and research.

Whether you're a developer deploying your first model, a data scientist exploring multi-agent systems and distilled LLMs, or a product manager integrating AI workflows and embedding models, this practical handbook provides tried and tested blueprints for building production-ready systems. Harness the power of reasoning models for applications like computer use, multimodal systems to work with all kinds of data, and fine-tuning techniques to get the most out of AI. Learn to test, monitor, and optimize agentic systems to keep them reliable and cost-effective at enterprise scale.

Master the complete agentic AI pipeline:
Design adaptive AI agents with memory, tool use, and collaborative reasoning capabilities
Build robust RAG workflows using embeddings, vector databases, and LangGraph state management
Implement comprehensive evaluation frameworks beyond accuracy, including precision, recall, and latency metrics
Deploy multimodal AI systems that seamlessly integrate text, vision, audio, and code generation
Optimize models for production through fine-tuning, quantization, and speculative decoding techniques
Navigate the bleeding edge of reasoning LLMs and computer-use capabilities
Balance cost, speed, accuracy, and privacy in real-world deployment scenarios
Create hybrid architectures that combine multiple agents for complex enterprise applications

Register your book for convenient access to downloads, updates, and/or corrections as they become available. See inside book for details.

AWS re:Invent 2025 - Build production AI agents with the Strands Agents SDK for TypeScript (AIM3331)

2025-12-07 · AWS re:Invent 2025

video

Agile/Scrum AI/ML AWS Cloud Computing JavaScript Python TypeScript

Discover how to build enterprise-ready AI agents using the newly launched Strands Agents SDK for TypeScript. This session introduces developers to a simple model-driven framework for building agents that run on any cloud, support multiple LLM providers, and integrate with the tools you already have. Learn how TypeScript developers can now leverage the same production-ready agent framework that Python teams have been using, with full type safety and seamless integration into modern JavaScript ecosystems. We'll cover key features, demonstrate multi-agent patterns, and explore deployment options from Amazon EKS to Amazon Bedrock AgentCore with live coding examples.

Learn More: More AWS events: https://go.aws/3kss9CP

Subscribe: More AWS videos: http://bit.ly/2O3zS75 More AWS events videos: http://bit.ly/316g9t4

ABOUT AWS: Amazon Web Services (AWS) hosts events, both online and in-person, bringing the cloud computing community together to connect, collaborate, and learn from AWS experts. AWS is the world's most comprehensive and broadly adopted cloud platform, offering over 200 fully featured services from data centers globally. Millions of customers—including the fastest-growing startups, largest enterprises, and leading government agencies—are using AWS to lower costs, become more agile, and innovate faster.

#AWSreInvent #AWSreInvent2025 #AWS

AWS re:Invent 2025 - Customize AI models & accelerate time to production with Amazon SageMaker AI

2025-12-07 · AWS re:Invent 2025

video

Agile/Scrum AI/ML AWS Cloud Computing Data Quality Amazon SageMaker

Customizing models often requires lengthy iteration cycles. Now with Amazon SageMaker AI, you can accelerate the model customization process from months to days. With an easy-to-use interface, you can quickly get started and customize popular models with your own data, including Amazon Nova, Llama, Qwen, DeepSeek, and GPT-OSS, with the latest customization techniques such as reinforcement learning and direct preference optimization. In addition, with the AI agent-guided workflow (in preview), you can use natural language to generate synthetic data, analyze data quality, and handle model training and evaluation—all entirely serverless. Join us to learn how you can accelerate your model customization journey.

AWS re:Invent 2025 - Symbolic AI in the age of LLMs (DAT443)

2025-12-07 · AWS re:Invent 2025

video

Agile/Scrum AI/ML AWS Cloud Computing GenAI

While generative AI captures headlines, the untapped potential lies in combining it with decades-proven symbolic AI techniques. This session explores how organizations can leverage symbolic AI techniques, such as ontologies and logic-based reasoning, to build knowledge graphs and enhance their AI capabilities. Dive deep into these topics to understand what ontologies look like, where semantics comes from, and what it means to build working knowledge graphs. Learn concrete strategies for defining and adopting ontologies and how to use reasoning for your benefit.
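
To make "ontologies and logic-based reasoning" concrete, here is a toy sketch of one common inference: deriving every class an individual belongs to by following subclass links. It is an illustration only, not material from the session; the class names, the `SUBCLASS_OF` and `INSTANCE_OF` tables, and the helper functions are hypothetical, and the hierarchy is assumed to be acyclic with a single parent per class.

```python
# Hypothetical mini-ontology: each class maps to its direct superclass.
SUBCLASS_OF = {"Sedan": "Car", "Car": "Vehicle", "Truck": "Vehicle"}

# Hypothetical assertion: each individual maps to its asserted class.
INSTANCE_OF = {"my_sedan": "Sedan"}


def ancestors(cls):
    """Walk the subclass chain upward (assumes no cycles)."""
    chain = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        chain.append(cls)
    return chain


def infer_types(individual):
    """Asserted class first, then every class implied by the hierarchy."""
    direct = INSTANCE_OF[individual]
    return [direct] + ancestors(direct)


types = infer_types("my_sedan")
print(types)
```

Only "Sedan" was asserted for `my_sedan`; "Car" and "Vehicle" are inferred, which is the kind of implicit knowledge a reasoner adds to a knowledge graph.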

AWS re:Invent 2025 - Integrate any agent framework with Amazon Bedrock AgentCore (AIM396)

2025-12-06 · AWS re:Invent 2025

video

Agile/Scrum API AWS Cloud Computing Cyber Security

Bring existing agents to AWS with Amazon Bedrock AgentCore. Whether you've built custom agents or use frameworks like LangChain, LangGraph, CrewAI, or LlamaIndex, this session shows how to run them on secure, scalable AWS infrastructure without rewriting your logic. Discover how AgentCore's API-based orchestration integrates external agent frameworks and services, centralizing tooling, observability, memory, and security. Learn how Cohere Health successfully integrated their healthcare workflow agents with AgentCore to enable secure, scalable processing of healthcare prior authorization decisions, leveraging AgentCore's services like Runtime and Memory to meet strict healthcare compliance requirements while maintaining the flexibility to use their chosen frameworks.

The Truth About AI Agents, Hardware Wars, and Mixed Model Arts. Freestyle Fridays w/ Matt Housley

2025-12-05 · The Joe Reis Show

podcast_episode

by Matt Housley (Halfpipe Systems), Joe Reis (DeepLearning.AI)

AI/ML AWS

It's Friday! Matt Housley and I catch up to discuss the aftermath of AWS re:Invent and why the industry’s obsession with AI Agents might be premature. We also dive deep into the hardware wars between Google and NVIDIA, the "brain-damaged" nature of current LLMs, and the growing "enshittification" of the internet and platforms like LinkedIn. Plus, I reveal some details about my upcoming "Mixed Model Arts" project.

AWS re:Invent 2025 - Accelerate analytics and AI w/ an open and secure lakehouse architecture-ANT309

2025-12-05 · AWS re:Invent 2025

video

Agile/Scrum AI/ML Analytics AWS Cloud Computing Data Lakehouse Iceberg Amazon SageMaker Cyber Security

Data lakes, data warehouses, or both? Join this session to explore how to build a unified, open, and secure data lakehouse architecture, fully compatible with Apache Iceberg, in Amazon SageMaker. Learn how the lakehouse breaks down data silos and opens your data estate, offering flexibility to use your preferred query engines and tools that accelerate time to insights. Learn about recent launches that improve data interoperability and performance, and enable large language models (LLMs) and AI agents to interact with your data. Discover robust security features, including consistent fine-grained access controls, attribute-based access control, and tag-based access control, that help democratize data without compromises.

AWS re:Invent 2025 - Beyond web browsers: HITL and tool integration for Nova Agents (AIM3334)

2025-12-05 · AWS re:Invent 2025

video

Agile/Scrum AI/ML API AWS Cloud Computing

Building on our advancements unveiled at the New York Summit, this session explores the evolution of Nova Agents beyond conventional web browsing, human-in-the-loop (HITL) oversight, and standard tool use. We will dive into hybrid approaches, innovative LLM-tool interactions, and API-driven strategies that boost efficiency, reliability, and autonomy in agentic AI systems. Additionally, we'll highlight how supervisors can utilize HITL to approve, refine, and assume control of agent workflows, enabling more robust and flexible AI operations.

AWS re:Invent 2025 - Coding an MCP server for Amazon Aurora (DAT429)

2025-12-04 · AWS re:Invent 2025

video

Agile/Scrum AI/ML AWS Aurora Cloud Computing

Do you want to build an MCP server for your application running on Amazon Aurora, one where you control access and AI agents simply call it, without handing full control to an LLM? In this code talk, learn about MCP servers, how they can help you, and key considerations for safely and predictably enabling LLMs to call them. See live coding firsthand and learn the code required to build an MCP server designed to meet your requirements. Discover how you can integrate AI agents with MCP servers while ensuring secure database access.

The Real Superpower of AI Isn't Technical — It's Contextual

2025-12-04 · #1 - London - Data & Agentic AI in Financial Services - Brainstation

talk

AI/ML Cloud Computing

We admire LLMs for speed and quality, but technology alone doesn’t differentiate. Context does. Generic tools solve generic problems; your cloud infrastructure, cost models, and optimization workflows are specific to your organisation. This talk shows how AI, tuned to your context, delivers value that used to require bespoke development. Unlike static tools or dedicated teams, AI evolves with your environment in real time, both a blessing and a curse for the needs of Financial Services.

AWS re:Invent 2025 - Performance engineering on Neuron: How to optimize your LLM with NKI (AIM414)

2025-12-04 · AWS re:Invent 2025

video

Agile/Scrum AWS Cloud Computing

Trying to eke out every ounce of performance from your LLM? Trying to speed up inference or understand what is going on inside a language model? In this session, you will learn how to profile a model on AWS purpose-built accelerators and build a custom kernel to achieve better performance.

AWS re:Invent 2025 - Scaling foundation model inference on Amazon SageMaker AI (AIM424)

2025-12-03 · AWS re:Invent 2025

video

Agile/Scrum AI/ML AWS Cloud Computing Amazon SageMaker Data Streaming

Learn how to optimize and deploy popular open-source models like Qwen3, GPT-OSS, and Llama4 using advanced inference engines such as vLLM on SageMaker. We'll explore key features including bidirectional streaming for audio and text applications, and share proven optimization techniques for inference. Through live demos, learn to boost performance with KV caching, intelligent routing, and autoscaling to maintain stability under varying loads. We'll demonstrate solutions for building agentic workflows with SageMaker AI, LangChain, and Amazon Bedrock AgentCore integration, and share best practices to help you confidently move from prototype to trusted AI experiences that delight users.
