talk-data.com


Activities & events

Bennie Haelen – author

In today's race to harness generative AI, many teams struggle to integrate these advanced tools into their business systems. While platforms like GPT-4 and Google's Gemini are powerful, they aren't always tailored to specific business needs. This book offers a practical guide to building scalable, customized AI solutions using the full potential of data lakehouse architecture. Author Bennie Haelen covers everything from deploying ML and GenAI models in Databricks to optimizing performance with best practices. In this must-read for data professionals, you'll gain the tools to unlock the power of large language models (LLMs) by seamlessly combining data engineering and data science to create impactful solutions.

  • Learn to build, deploy, and monitor ML and GenAI models on a data lakehouse architecture using Databricks
  • Leverage LLMs to extract deeper, actionable insights from your business data residing in lakehouses
  • Discover how to integrate traditional ML and GenAI models for customized, scalable solutions
  • Utilize open source models to control costs while maintaining model performance and efficiency
  • Implement best practices for optimizing ML and GenAI models within the Databricks platform

data ai-ml artificial-intelligence-ai generative-ai AI/ML Data Engineering Data Lakehouse Data Science Databricks GenAI LLM
O'Reilly AI & ML Books

I missed my parents, so I built an AI that talks like them. This isn’t about replacing people—it’s about remembering the voices that make us feel safe. In this 90-minute episode of Data & AI with Mukundan, we explore what happens when technology stops chasing efficiency and starts chasing empathy. Mukundan shares the story behind “What Would Mom & Dad Say?”, a Streamlit + GPT-4 experiment that generates comforting messages in the voice of loved ones. You’ll hear:

  • The emotional spark that inspired the project
  • The plain-English prompts anyone can use to teach AI empathy
  • Boundaries & ethics of emotional AI
  • How this project reframed loneliness, creativity, and connection

Takeaway: AI can’t love you—but it can remind you of the people who do.

🔗 Try the free reflection prompts below. THE ONE-PROMPT VERSION: “What Would Mom & Dad Say?”
“You are speaking to me as one of my parents. Choose the tone I mention: either Mom (warm and reflective) or Dad (practical and encouraging). First, notice the emotion in what I tell you—fear, stress, guilt, joy, or confusion—and name it back to me so I feel heard. Then reply in 3 parts:

  1. Start by validating what I’m feeling, in a caring way.
  2. Share a short story, lesson, or perspective that fits the situation.
  3. End with one hopeful or guiding question that helps me think forward.

Keep your words gentle, honest, and simple. No technical language. Speak like someone who loves me and wants me to feel calm and capable again.”
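If you want to try the prompt outside ChatGPT, here is a minimal, hypothetical sketch of how the episode's Streamlit + GPT-4 idea could be wired to the OpenAI Python client. The function name, the `parent` toggle, and the model id are placeholders for illustration, not the author's actual code.

```python
# Minimal sketch (not the project's source): send the "What Would Mom & Dad Say?"
# prompt to a chat model and print the reply.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

PARENT_PROMPT = (
    "You are speaking to me as one of my parents. Choose the tone I mention: "
    "either Mom (warm and reflective) or Dad (practical and encouraging). "
    "First, name the emotion you hear back to me so I feel heard. Then reply "
    "in 3 parts: validate the feeling, share a short story or lesson, and end "
    "with one hopeful question. Keep it gentle, honest, and simple."
)

def comforting_message(user_message: str, parent: str = "Mom") -> str:
    """Generate a comforting reply in the chosen parent's voice."""
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder; any capable chat model works
        messages=[
            {"role": "system", "content": PARENT_PROMPT},
            {"role": "user", "content": f"Speak as {parent}. {user_message}"},
        ],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(comforting_message("I'm nervous about starting my new job.", parent="Dad"))
```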

Join the Discussion (comments hub): https://mukundansankar.substack.com/notes

Tools I use for my Podcast and Affiliate Partners
  • Recording Partner: Riverside → Sign up here (affiliate)
  • Host Your Podcast: RSS.com (affiliate)
  • Research Tools: Sider.ai (affiliate)
  • Sourcetable AI: Join Here (affiliate)

🔗 Connect with Me:
  • Free Email Newsletter
  • Website: Data & AI with Mukundan
  • GitHub: https://github.com/mukund14
  • Twitter/X: @sankarmukund475
  • LinkedIn: Mukundan Sankar
  • YouTube: Subscribe

AI/ML GitHub LLM Spark

What if your job hunt could run like a data system? In this episode, I share the story of how I used three AI agents — Researcher, Writer, and Reviewer — to rebuild my job search from the ground up. These agents read job descriptions, tailor resumes, and even critique tone and clarity — saving hours every week. But this episode isn’t just about automation. It’s about agency. I’ll talk about rejection, burnout, and the mindset shift that changed everything: treating every rejection as a data point, not a defeat. Whether you’re in tech, analytics, or just tired of the job search grind — this one’s for you.

  🔹 Learn how I automated resume tailoring with GPT-4
  🔹 Understand how to design AI systems that protect your mental energy
  🔹 Discover why “efficiency” means doing less of what drains you
  🔹 Hear the emotional story behind building these agents from scratch

Join the Discussion (comments hub): https://mukundansankar.substack.com/notes

Tools I use for my Podcast and Affiliate Partners
  • Recording Partner: Riverside → Sign up here (affiliate)
  • Host Your Podcast: RSS.com (affiliate)
  • Research Tools: Sider.ai (affiliate)
  • Sourcetable AI: Join Here (affiliate)

🔗 Connect with Me:
  • Free Email Newsletter
  • Website: Data & AI with Mukundan
  • GitHub: https://github.com/mukund14
  • Twitter/X: @sankarmukund475
  • LinkedIn: Mukundan Sankar
  • YouTube: Subscribe
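The episode does not ship code, but the Researcher → Writer → Reviewer pattern it describes can be sketched as three prompts chained over one LLM client. Everything below (function names, prompts, model id) is a hypothetical illustration of the pattern, not the author's implementation.

```python
# Hypothetical sketch of a Researcher -> Writer -> Reviewer agent chain.
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o"  # placeholder model id

def ask(system: str, user: str) -> str:
    """One LLM call with a role-specific system prompt."""
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "system", "content": system},
                  {"role": "user", "content": user}],
    )
    return resp.choices[0].message.content

def tailor_resume(job_description: str, resume: str) -> str:
    # Researcher: pull out what the posting actually asks for.
    requirements = ask("Extract the key skills and requirements as bullet points.",
                       job_description)
    # Writer: rewrite the resume summary against those requirements.
    draft = ask("Rewrite this resume summary to address the listed requirements.",
                f"Requirements:\n{requirements}\n\nResume:\n{resume}")
    # Reviewer: critique tone and clarity, then return an improved pass.
    return ask("Review the draft for tone and clarity and return an improved version.",
               draft)
```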

AI/ML Analytics GitHub LLM
Small & Smart AI Models 2025-10-31 · 18:50

The AI landscape is evolving beyond gigantic models like GPT-4 towards a new generation of small, smart, and specialised models that can run privately, securely and efficiently on everyday devices. In this talk, Mehmood explores how these compact models, trained on domain-specific data, deliver powerful performance while reducing energy costs, improving privacy, and removing the need for constant cloud access. From customer service chatbots that understand regional dialects to intelligent on-device assistants in healthcare and retail, discover how small AI is making intelligence more sustainable, secure, and accessible for businesses of all sizes.
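As a rough illustration of the "small and local" idea, the snippet below runs a compact instruction-tuned model with the Hugging Face transformers pipeline entirely on the local machine, with no cloud calls. The model id is only a placeholder for whichever small, domain-tuned checkpoint you choose; it is not taken from the talk.

```python
# Illustrative only: run a small instruction-tuned model locally.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="Qwen/Qwen2.5-0.5B-Instruct",  # placeholder: any small local checkpoint
    device_map="auto",                    # CPU or a single consumer GPU
)

prompt = "Reply to this customer in a friendly Yorkshire tone: 'Where's my parcel?'"
print(generator(prompt, max_new_tokens=80)[0]["generated_text"])
```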

AI/ML Cloud Computing LLM
PyData Bradford - October Meetup

What happens when an AI starts asking better questions than you? In this 60-minute episode, I share the real story behind “The AI That Thinks Like an Analyst” — a Streamlit + GPT-4 project that changed the way I see data, curiosity, and creativity. This isn’t a technical tutorial. It’s a journey into the mind of a data professional learning to think deeper — and how building this AI taught me the most human lesson of all: how to stay curious.

We’ll explore:
  • Why the hardest part of analysis isn’t code — it’s curiosity.
  • How I built a privacy-first Streamlit app that generates questions instead of answers.
  • What AI can teach us about slowing down, observing, and thinking like explorers.
  • The moment I realized data analysis and self-reflection are the same skill.

If you’ve ever felt stuck staring at your data, unsure what to ask next — this episode is for you.

📖 Read the full story: https://mukundansankar.substack.com/p/the-no-upload-ai-analyst-v4-secure

Join the Discussion (comments hub): https://mukundansankar.substack.com/notes

Tools I use for my Podcast and Affiliate Partners
  • Recording Partner: Riverside → Sign up here (affiliate)
  • Host Your Podcast: RSS.com (affiliate)
  • Research Tools: Sider.ai (affiliate)
  • Sourcetable AI: Join Here (affiliate)

🔗 Connect with Me:
  • Free Email Newsletter
  • Website: Data & AI with Mukundan
  • GitHub: https://github.com/mukund14
  • Twitter/X: @sankarmukund475
  • LinkedIn: Mukundan Sankar
  • YouTube: Subscribe
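For readers curious how a "questions, not answers" analyst might be built in a privacy-first way, here is a small, hypothetical sketch: only column names and summary statistics leave the machine, never the raw rows. Function names, the prompt, and the model id are assumptions for illustration, not the project's actual code.

```python
# Hypothetical sketch: ask an LLM for analysis questions using only a data profile.
import pandas as pd
from openai import OpenAI

client = OpenAI()

def suggest_questions(df: pd.DataFrame, n: int = 5) -> str:
    profile = df.describe(include="all").to_string()                 # summary stats only
    schema = ", ".join(f"{c} ({t})" for c, t in df.dtypes.astype(str).items())
    prompt = (
        f"Columns: {schema}\n\nSummary statistics:\n{profile}\n\n"
        f"Suggest {n} curious, specific questions an analyst should ask of this data."
    )
    resp = client.chat.completions.create(
        model="gpt-4o",  # placeholder
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

# Example: print(suggest_questions(pd.read_csv("sales.csv")))
```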

AI/ML GitHub LLM
Data & AI with Mukundan | Learn AI by Building

The rapid growth of generative AI, driven by models like OpenAI's GPT-4.1, GPT-4.5, o3, and DeepSeek’s R1, has captured the attention of consumers, businesses, and executives worldwide. These powerful language models rely heavily on the quality of input prompts, making prompt engineering a vital skill for unlocking their full potential. In this interactive, demo-driven session, participants will explore essential and advanced techniques in prompt design, including:

  • What is Prompt Engineering?
  • Advanced Prompting Techniques
  • Few-shot Prompting (guiding responses with examples)
  • Chain-of-Thought (CoT) Prompting (step-by-step reasoning)
  • Instruction Fine-tuning (enforcing specific constraints)
  • Persona-based Prompting (customizing for roles)
  • Multi-step Prompting (iterative output refinement)
  • Debugging & Refining AI Responses
  • Leveraging reasoning models like o3
  • Prompt Engineering Best Practices

Attendees will depart with a clear framework and practical suggestions for crafting effective prompts and maximizing the value of AI tools.
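To make two of the listed techniques concrete, here is a small, hypothetical example that combines few-shot prompting with a chain-of-thought instruction. The tickets, labels, and model id are invented for illustration and are not material from the session.

```python
# Illustrative few-shot + chain-of-thought prompt (not session material).
from openai import OpenAI

client = OpenAI()

FEW_SHOT = """Classify the support ticket and explain your reasoning step by step.

Ticket: "I was charged twice for my subscription."
Reasoning: Mentions a duplicate charge, so this is a billing issue.
Label: billing

Ticket: "The app crashes when I open settings."
Reasoning: Describes a crash in the product, so this is a technical defect.
Label: bug
"""

ticket = "I can't find the invoice for last month."
resp = client.chat.completions.create(
    model="gpt-4o",  # placeholder
    messages=[{"role": "user",
               "content": FEW_SHOT + f'\nTicket: "{ticket}"\nReasoning:'}],
)
print(resp.choices[0].message.content)
```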

AI/ML GenAI LLM
Talk to AI Better: Advanced Prompt Engineering in Action

Hands-On: Prompt Engineering Online Bootcamp

Date: 06 September 2025, 9 AM to 12.30 PM Eastern Time
Level: Beginners/Intermediate
Registration Link: https://www.eventbrite.com/e/hands-on-prompt-engineering-bootcamp-tickets-1443819937299?aff=oddtdtcreator

Who Should Attend?

This hands-on workshop is for developers, senior software engineers, IT pros, architects, IT managers, citizen developers, technology product managers, IT leaders, enterprise architects, chief analytics officers, chief information officers, chief technology officers, and decision-makers who want to learn how prompt engineering can help bring AI into next-gen apps and agents. Experience with C#, Python, or JavaScript is helpful but not required. No prior knowledge of AI is necessary. While this isn’t a data & analytics-focused session, data scientists, data stewards, and tech-savvy data protection officers will also find it super valuable.

Description:

The rise of generative AI, led by ChatGPT and other large language models, has sparked serious interest across industries and leadership teams. Models like OpenAI’s GPT-3.5, GPT-4o, and GPT-4.5 rely heavily on the quality of the prompts they're given. The better the prompt, the better the output. That makes prompt engineering a must-have skill. In this half-day virtual hands-on workshop, Microsoft AI and Business Applications MVP and Microsoft Certified Trainer, Prashant G Bhoyar, will cover these topics in detail:

  • What is Prompt Engineering? Learn how prompts work, why they matter, and what affects their performance.
  • Advanced Prompting Techniques:
  • Few-shot prompting (use examples to get better answers)
  • Chain-of-thought prompting (step-by-step reasoning)
  • Instruction fine-tuning (force the model to follow your rules)
  • Persona-based prompting (get responses in specific roles)
  • Multi-step prompting (refine results through iteration)
  • Debugging & Refining AI Responses
  • How to fine-tune prompts for ChatGPT, Claude, Google Gemini, and other tools
  • Automating prompts with APIs and workflows (see the sketch after this list)
  • Working with reasoning models like o3
  • Best practices for real-world use
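Below is a small, hypothetical sketch of the "automating prompts" and multi-step prompting topics: a first draft is generated through the API, then fed back to the model with a critique instruction for two refinement passes. Function names and the model id are assumptions, not workshop material.

```python
# Hypothetical multi-step prompting loop: draft, critique, revise.
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o"  # placeholder

def complete(prompt: str) -> str:
    resp = client.chat.completions.create(
        model=MODEL, messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content

draft = complete("Write a 3-sentence product announcement for a new budgeting app.")
for _ in range(2):  # two refinement passes
    critique = complete(f"List the two weakest points of this announcement:\n{draft}")
    draft = complete(f"Rewrite the announcement fixing these points:\n{critique}\n\n{draft}")
print(draft)
```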

By the end of this bootcamp, you'll be ready to create clear, powerful prompts that unlock the full potential of generative AI in your projects. The labs will feature a mix of Python, C#, and low-code/no-code UI tools, so even if you don't want to write code, you're covered.

Workshop Resources: You’ll get access to Microsoft Copilot, Azure, and Azure OpenAI services (worth $500) for hands-on labs. If you already have your own Microsoft Copilot or Azure subscription, you’re welcome to use that too.

Attendee Workstation Requirements: You must have your own computer (Windows or Mac) with:

  • Camera, speakers, microphone, and a reliable internet connection. Tablets will not work for this workshop.
  • A modern browser (Microsoft Edge, Google Chrome, Firefox, or Safari).
  • Access to www.azure.com and https://copilotstudio.microsoft.com.
  • Nice to have: the ability to run C# 10 or Python code, using Visual Studio 2022, VSCode 1.66+, Visual Studio for Mac, Rider, or a similar IDE.
Hands-On: Prompt Engineering Online Bootcamp

Description: Join Mukundan Sankar as he explores the challenges of delivering an effective elevator pitch and how AI can assist in crafting one. Mukundan shares personal anecdotes and demonstrates AI-generated pitches tailored for different career stages.

Key Takeaways:
  • The importance of a well-crafted elevator pitch
  • How AI can personalize pitches for different roles
  • Real-life examples of AI-generated pitches

Resources:

1] Elevator Pitch AI Code
Mukundan's Blog Post: https://substack.com/home/post/p-170400977

2] Thinking about starting a podcast but worried it’ll take forever to grow? Here’s the thing — you don’t need a huge audience to get started or to earn money. I run my show on RSS.com, and it’s the simplest way to get your podcast live on Spotify, Apple, Amazon, YouTube, iHeartRadio, Deezer, and more — all in one step. Their analytics tell me exactly where my listeners are tuning in from, so I know what’s working. And here’s the best part — with their paid plan, you can start earning revenue through ads with as little as 10 downloads a month. That’s right — you don’t need to wait for thousands of listeners to start monetizing. Start your podcast for free today at RSS.com. (Affiliate link — I may earn a commission at no extra cost to you.)

3] 💡 Sider.ai – Your AI Copilot for Productivity: Sider.ai is the all-in-one AI assistant that works inside your browser, letting you research, write, summarize, and brainstorm without switching tabs. Whether you’re prepping for an interview, drafting your next pitch, or refining your business plan, Sider.ai can supercharge your productivity. It’s like having GPT-4 on standby, ready to help you think faster and write better. Try Sider.ai today and see how much more you can accomplish in less time. (Affiliate link — I may earn a commission if you sign up.)

AI/ML Analytics LLM
Data & AI with Mukundan | Learn AI by Building

To participate, please complete your free registration here

Can every AI output be trusted?

During this event you'll explore methods for evaluating AI applications using three tools designed to protect you from generative mistakes.

1. Testing AI Applications with DeepEval

DeepEval is an open-source evaluation framework designed for structured and automated testing of AI outputs. It allows you to define custom metrics, set expectations, and benchmark responses from LLMs. In this session, we'll explore how QA engineers and developers can use DeepEval to test the quality, accuracy, and reliability of AI-generated responses across different use cases like chatbots, summarization, and code generation.
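By way of illustration, a minimal DeepEval-style test might look like the sketch below. The class and metric names follow deepeval's published documentation, but the test content and threshold are made up, and exact APIs can vary between releases, so treat this as a sketch rather than session material.

```python
# Minimal DeepEval-style test (pip install deepeval; an LLM judge such as an
# OpenAI key is needed for metric scoring).
from deepeval import assert_test
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

def test_refund_answer():
    case = LLMTestCase(
        input="What is the refund window for online orders?",
        actual_output="You can request a refund within 30 days of delivery.",
        retrieval_context=["Refunds are accepted within 30 days of delivery."],
    )
    # Fails the test if the answer's relevancy score drops below the threshold.
    assert_test(case, [AnswerRelevancyMetric(threshold=0.7)])
```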

2. Testing AI Applications with LLM as Judge

LLM-as-a-Judge is a powerful technique where an AI model evaluates the outputs of another model. Instead of relying solely on manual review or static metrics, we'll learn how to use trusted LLMs (like GPT-4) to provide qualitative assessments, grading correctness, coherence, tone, or factuality. This method enables scalable and human-like evaluation in real-time AI testing pipelines.
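A hypothetical judge can be as simple as a second model call with a grading rubric, as in the sketch below. The rubric wording, JSON keys, and model id are illustrative assumptions, not content from the event.

```python
# Hypothetical LLM-as-a-judge grader: one model scores another model's answer.
from openai import OpenAI

client = OpenAI()

def judge(question: str, answer: str) -> str:
    rubric = (
        "You are a strict evaluator. Grade the answer to the question on "
        "correctness, coherence, and tone from 1-5 each, then give a one-line "
        "verdict. Respond as JSON with keys correctness, coherence, tone, verdict."
    )
    resp = client.chat.completions.create(
        model="gpt-4o",  # placeholder judge model
        messages=[{"role": "system", "content": rubric},
                  {"role": "user", "content": f"Question: {question}\nAnswer: {answer}"}],
    )
    return resp.choices[0].message.content

print(judge("What does HTTP 404 mean?", "The server cannot find the requested resource."))
```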

3. Evaluating LLMs with Hugging Face Evaluate

Hugging Face's evaluate library offers a robust suite of prebuilt metrics and tools to measure the performance of LLMs and NLP models. This topic will cover how to integrate and use evaluate in your testing workflows to assess text generation, classification, translation, and more, using standardized metrics like BLEU, ROUGE, and accuracy, alongside custom metrics for GenAI applications.
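For instance, scoring generated text against a reference with the evaluate library takes only a few lines; the sentences below are invented purely to show the call pattern.

```python
# Score generated text against reference text with Hugging Face's evaluate library.
import evaluate

rouge = evaluate.load("rouge")
bleu = evaluate.load("bleu")

predictions = ["The model compresses medical images into compact latents."]
references = ["The model downsizes medical images into compact latent representations."]

print(rouge.compute(predictions=predictions, references=references))
# BLEU expects a list of reference lists per prediction.
print(bleu.compute(predictions=predictions, references=[[r] for r in references]))
```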

A Q&A session with Karthik K.K. will follow. Prepare your questions!

AI QA Test Engineering: Testing AI Applications with the Power of AI

Join us for the third of several virtual events focused on the latest research, datasets and models at the intersection of visual AI and healthcare.

When

June 27 at 9 AM Pacific

Where

Online. Register for the Zoom!

MedVAE: Efficient Automated Interpretation of Medical Images with Large-Scale Generalizable Autoencoders

We present MedVAE, a family of six generalizable 2D and 3D variational autoencoders trained on over one million images from 19 open-source medical imaging datasets using a novel two-stage training strategy. MedVAE downsizes high-dimensional medical images into compact latent representations, reducing storage by up to 512× and accelerating downstream tasks by up to 70× while preserving clinically relevant features. We demonstrate across 20 evaluation tasks that these latent representations can replace high-resolution images in computer-aided diagnosis pipelines without compromising performance. MedVAE is open-source with a streamlined finetuning pipeline and inference engine, enabling scalable model development in resource-constrained medical imaging settings.
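To make the compression claim concrete, the toy sketch below (generic PyTorch, not the MedVAE codebase) downsamples a 3D volume by 8× along each spatial axis with strided convolutions and reports the resulting size ratio; 8 × 8 × 8 = 512, which matches the reduction figure quoted in the abstract. All layer sizes are illustrative assumptions.

```python
# Toy illustration only: strided 3D conv encoder shrinking a volume 8x per axis.
import torch
import torch.nn as nn

encoder = nn.Sequential(              # three stride-2 blocks => 8x reduction per axis
    nn.Conv3d(1, 8, kernel_size=3, stride=2, padding=1), nn.ReLU(),
    nn.Conv3d(8, 4, kernel_size=3, stride=2, padding=1), nn.ReLU(),
    nn.Conv3d(4, 1, kernel_size=3, stride=2, padding=1),
)

volume = torch.randn(1, 1, 128, 128, 128)   # e.g. a CT sub-volume
latent = encoder(volume)                     # -> shape (1, 1, 16, 16, 16)
print(volume.numel() / latent.numel())       # 512.0x fewer values to store
```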

About the Speakers

Ashwin Kumar is a PhD Candidate in Biomedical Physics at Stanford University, advised by Akshay Chaudhari and Greg Zaharchuk. He focuses on developing deep learning methodologies to advance medical image acquisition and analysis.

Maya Varma is a PhD student in computer science at Stanford University. Her research focuses on the development of artificial intelligence methods for addressing healthcare challenges, with a particular focus on medical imaging applications.

Leveraging Foundation Models for Pathology: Progress and Pitfalls

How do you train ML models on pathology slides that are thousands of times larger than standard images? Foundation models offer a breakthrough approach to these gigapixel-scale challenges. This talk explores how self-supervised foundation models trained on broad histopathology datasets are transforming computational pathology. We’ll examine their progress in handling weakly-supervised learning, managing tissue preparation variations, and enabling rapid prototyping with minimal labeled examples. However, significant challenges remain: increasing computational demands, the potential for bias, and questions about generalizability across diverse populations. This talk will offer a balanced perspective to help separate foundation model hype from genuine clinical value.

About the Speaker

Heather D. Couture is a consultant and founder of Pixel Scientia Labs, where she partners with mission-driven founders and R&D teams to support applications of computer vision for people and planetary health. She has a PhD in Computer Science and has published in top-tier computer vision and medical imaging venues. She hosts the Impact AI Podcast and writes regularly on LinkedIn, for her newsletter Computer Vision Insights, and for a variety of other publications.

LesionLocator: Zero-Shot Universal Tumor Segmentation and Tracking in 3D Whole-Body Imaging

Recent advances in promptable segmentation have transformed medical imaging workflows, yet most existing models are constrained to static 2D or 3D applications. This talk presents LesionLocator, the first end-to-end framework for universal 4D lesion segmentation and tracking using dense spatial prompts. The system enables zero-shot tumor analysis across whole-body 3D scans and multiple timepoints, propagating a single user prompt through longitudinal follow-ups to segment and track lesion progression. Trained on over 23,000 annotated scans and supplemented with a synthetic time-series dataset, LesionLocator achieves human-level performance in segmentation and outperforms state-of-the-art baselines in longitudinal tracking tasks. The presentation also highlights advances in 3D interactive segmentation, including our open-set tool nnInteractive, showing how spatial prompting can scale from user-guided interaction to clinical-grade automation.

About the Speaker

Maximilian Rokussis is a PhD scholar at the German Cancer Research Center (DKFZ), working in the Division of Medical Image Computing under Klaus Maier-Hein. He focuses on 3D multimodal and multi-timepoint segmentation with spatial and text prompts. With several MICCAI challenge wins and first-author publications at CVPR and MICCAI, he co-leads the Helmholtz Medical Foundation Model initiative and develops AI solutions at the interface of research and clinical radiology.

LLMs for Smarter Diagnosis: Unlocking the Future of AI in Healthcare

Large Language Models are rapidly transforming the healthcare landscape. In this talk, I will explore how LLMs like GPT-4 and DeepSeek-R1 are being used to support disease diagnosis, predict chronic conditions, and assist medical professionals without relying on sensitive patient data. Drawing from my published research and real-world applications, I’ll discuss the technical challenges, ethical considerations, and the future potential of integrating LLMs in clinical settings. The talk will offer valuable insights for developers, researchers, and healthcare innovators interested in applying AI responsibly and effectively.

About the Speaker

Gaurav K Gupta graduated from Youngstown State University with a Bachelor’s in Computer Science and Mathematics.

June 27 - Visual AI in Healthcare

Join us for the first of several virtual events focused on the latest research, datasets and models at the intersection of visual AI and healthcare.

June 25 at 9 AM Pacific

Register for the Zoom

Vision-Driven Behavior Analysis in Autism: Challenges and Opportunities

Understanding and classifying human behaviors is a long-standing goal at the intersection of computer science and behavioral science. Video-based monitoring provides a non-intrusive and scalable framework for analyzing complex behavioral patterns in real-world environments. This talk explores key challenges and emerging opportunities in AI-driven behavior analysis for individuals with autism spectrum disorder (ASD), with an emphasis on the role of computer vision in building clinically meaningful and interpretable tools.

About the Speaker

Somaieh Amraee is a postdoctoral research fellow at Northeastern University’s Institute for Experiential AI. She earned her Ph.D. in Computer Engineering and her research focuses on advancing computer vision techniques to support health and medical applications, particularly in children’s health and development.

PRISM: High-Resolution & Precise Counterfactual Medical Image Generation using Language-guided Stable Diffusion

PRISM is an explainability framework that leverages language-guided Stable Diffusion to generate high-resolution (512×512) counterfactual medical images with unprecedented precision, answering the question: “What would this patient image look like if a specific attribute is changed?” PRISM enables fine-grained control over image edits, allowing us to selectively add or remove disease-related image features as well as complex medical support devices (such as pacemakers) while preserving the rest of the image. Beyond generating high-quality images, we demonstrate that PRISM’s class counterfactuals can enhance downstream model performance by isolating disease-specific features from spurious ones — a significant advancement toward robust and trustworthy AI in healthcare.

About the Speaker

Amar Kumar is a PhD Candidate at McGill University | MILA Quebec AI Institute in the Probabilistic Vision Group (PVG). His research primarily focuses on generative AI and medical imaging, with the main objective of tackling real-world challenges like bias mitigation in deep learning models.

Building Your Medical Digital Twin — How Accurate Are LLMs Today?

We all hear about the dream of a digital twin: AI systems combining your blood tests, MRI scans, smartwatch data, and genetics to track health and plan care. But how accurate are today’s top tools like GPT-4o, Gemini, MedLLaMA, or OpenBioLLM — and what can you realistically feed them?

In this talk, we’ll explore where these models deliver, where they fall short, and what I learned testing them on my own health records.

About the Speaker

Ekaterina Kondrateva is a senior computer vision engineer with 8 years of experience in AI for healthcare, author of 20+ scientific papers, and finalist in three international MRI analysis competitions. Former head of AI research for medical imaging at HealthTech startup LightBC.

Deep Dive: Google’s MedGemma, NVIDIA’s VISTA-3D and MedSAM-2 Medical Imaging Models

In this talk, we’ll explore three medical imaging models. First, we’ll look at Google’s MedGemma open models for medical text and image comprehension, built on Gemma 3. Next, we’ll dive into NVIDIA’s Versatile Imaging SegmenTation and Annotation (VISTA) model, which combines semantic segmentation with interactivity, offering high accuracy and adaptability across diverse anatomical areas for medical imaging. Finally, we’ll explore MedSAM-2, an advanced segmentation model that utilizes Meta’s SAM 2 framework to address both 2D and 3D medical image segmentation tasks.

About the Speaker

Daniel Gural is a seasoned Machine Learning Engineer at Voxel51 with a strong passion for empowering Data Scientists and ML Engineers to unlock the full potential of their data.

June 25 - Visual AI in Healthcare

Join us for the first of several virtual events focused on the latest research, datasets and models at the intersection of visual AI and healthcare.

June 25 at 9 AM Pacific

Register for the Zoom

Vision-Driven Behavior Analysis in Autism: Challenges and Opportunities

Understanding and classifying human behaviors is a long-standing goal at the intersection of computer science and behavioral science. Video-based monitoring provides a non-intrusive and scalable framework for analyzing complex behavioral patterns in real-world environments. This talk explores key challenges and emerging opportunities in AI-driven behavior analysis for individuals with autism spectrum disorder (ASD), with an emphasis on the role of computer vision in building clinically meaningful and interpretable tools.

About the Speaker

Somaieh Amraee is a postdoctoral research fellow at Northeastern University’s Institute for Experiential AI. She earned her Ph.D. in Computer Engineering, and her research focuses on advancing computer vision techniques to support health and medical applications, particularly in children’s health and development.
