talk-data.com

Activities & events

This session introduces participants to PREreview, a community-driven platform designed to make peer review more open, inclusive, and equitable. We’ll explore the current research publishing landscape—from preprints to common challenges in traditional peer review—and discuss why transparent, community-centered reviewing matters.

Participants will learn how PREreview connects with open-source infrastructure and platforms like ORCID and Zenodo, see a demo of how to review manuscripts and datasets, and gain practical tools from the Open Reviewers workshop to write constructive, socially conscious reviews. We’ll also share upcoming developments, ways to get involved, and how to engage through programs like PREreview Champions.

Outline

By the end of this session, participants will be able to:

  • Describe what PREreview is, why it was created, and how it fits into today’s research publishing landscape.
  • Explain the role of preprints and open peer review, including the benefits and challenges of current review systems.
  • Understand how PREreview supports open scholarship, including collaborations with ORCID, Zenodo, and other open communities.
  • Use the PREreview platform to find, write, and publish constructive open reviews of manuscripts and datasets.
  • Apply best practices for socially conscious, constructive peer review, drawing on the Open Reviewers workshop.
  • Identify upcoming PREreview features and ways to get involved—through reviewing, contributing to open source, or joining the Champions program.

How to Join the Webinar

You can join via your browser (no app download required). Use Chrome or Firefox. Pre-register for the webinar: https://www.bigmarker.com/neo4j/Data-Umbrella-Webinar

Video Recording

This event will be recorded and posted on our YouTube channel, usually within 24 hours of the event. Subscribe and turn on notifications: https://www.youtube.com/c/DataUmbrella/

Time

17:00 UTC (9am PT / 12pm ET / 6pm Paris / 8pm EAT)

About the Speaker

Daniela Saderi, Ph.D. (ORCID: 0000-0002-6109-0367) is the Co-founder and Executive Director of PREreview, a non-profit advancing open, community-centered peer review of early research objects by supporting researchers and experts—especially early-career and historically excluded scholars.

She earned her Ph.D. in neuroscience from Oregon Health & Science University in 2019, studying auditory processing in mammals, and was a 2018–2019 Mozilla Fellow for Open Science. Daniela envisions a scholarly ecosystem grounded in trust, care, and collective wisdom, where knowledge flows freely as a shared inheritance rather than a commodity.

LinkedIn: https://www.linkedin.com/in/daniela-saderi/
GitHub: https://github.com/dasaderi

Connect with Data Umbrella

We invite you to follow Data Umbrella on our social networking sites to keep up to date on the latest news.

[Online] PREreview: Transforming Peer Review with Researchers, for Communities
Jan 22 - Women in AI 2026-01-22 · 17:00

Hear talks from experts on the latest topics in AI, ML, and computer vision on January 22nd.

Date, Time and Location

Jan 22, 2026, 9-11 AM Pacific. Online. Register for the Zoom!

Align Before You Recommend

The rapidly growing global advertising and marketing industry demands innovative machine learning systems that balance accuracy with efficiency. Recommendation systems, which are crucial to many platforms, require careful design and leave considerable room for enhancement.

While Large Language Models (LLMs) have transformed various domains, their potential in sequential recommendation systems remains underexplored. Pioneering works like Hierarchical Large Language Models (HLLM) demonstrated LLMs’ capability for next-item recommendation but rely on computationally intensive fine-tuning, limiting widespread adoption. This work introduces HLLM+, enhancing the HLLM framework to achieve high-accuracy recommendations without full model fine-tuning.

By introducing targeted alignment components between frozen LLMs, our approach improves on frozen-model performance in popular and long-tail item recommendation tasks by 29% while reducing training time by 29%. We also propose a ranking-aware loss adjustment that improves convergence and recommendation quality for popular items.

Experiments show that HLLM+ achieves superior performance with frozen item representations, which allows embeddings to be swapped, including multimodal ones, without tuning the full LLM. These findings are significant for the advertising technology sector, where rapid adaptation and efficient deployment across brands are essential for maintaining a competitive advantage.
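
The abstract is high-level, so here is a rough, hypothetical sketch of the general pattern it describes: training only a small alignment module between frozen embedding models. Everything below (module shapes, dimensions, loss) is an illustrative assumption, not the speakers' implementation.

# Hypothetical sketch: train a small alignment adapter between two frozen
# embedding models (e.g., an item LLM and a user LLM), leaving both frozen.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AlignmentAdapter(nn.Module):
    """Small trainable bridge between frozen item and user representations."""
    def __init__(self, item_dim: int, user_dim: int, hidden: int = 256):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(item_dim, hidden),
            nn.GELU(),
            nn.Linear(hidden, user_dim),
        )

    def forward(self, item_emb: torch.Tensor) -> torch.Tensor:
        return self.proj(item_emb)

# Frozen embeddings would come from the pretrained LLMs; random here for illustration.
item_emb = torch.randn(32, 4096)   # frozen item-LLM outputs (batch, item_dim)
user_emb = torch.randn(32, 4096)   # frozen user-LLM outputs (batch, user_dim)

adapter = AlignmentAdapter(item_dim=4096, user_dim=4096)
optimizer = torch.optim.AdamW(adapter.parameters(), lr=1e-4)

# Contrastive-style alignment: matched user/item pairs should score highest.
aligned = F.normalize(adapter(item_emb), dim=-1)
targets = F.normalize(user_emb, dim=-1)
logits = aligned @ targets.t() / 0.07          # similarity matrix with temperature
loss = F.cross_entropy(logits, torch.arange(len(logits)))
loss.backward()
optimizer.step()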

About the Speaker

Dr. Kwasniewska leads AI for Advertising and Marketing North America at AWS, specializing in a wide range of AI, ML, DL, and GenAI solutions across various data modalities. With 40+ peer-reviewed publications in AI (h-index: 14), she advises enterprise customers on real-time bidding, brand recognition, and AI-powered content generation. She is a member of global AI standards committees, driving innovations in SAE AI Standards and MLCommons Responsible AI Standards, and reviews for top-tier conferences like ICCV, ICML, and NeurIPS. She pioneered and leads the first-ever Advertising and Marketing AI track (CVAM) at ICCV - one of the world's premier and most selective computer vision conferences. Dedicated to knowledge sharing in AI, she founded the International Summer School on Deep Learning (dl-lab.eu) and regularly presents at international events, conferences, and podcasts.

Generalizable Vision-Language Models: Challenges, Advances, and Future Directions

Large-scale pre-trained Vision-Language (VL) models have become foundational tools for a wide range of downstream tasks, including few-shot image recognition, object detection, and image segmentation. Among them, Contrastive Language–Image Pre-training (CLIP) stands out as a groundbreaking approach, leveraging contrastive learning on large collections of image-text pairs. While CLIP achieves strong performance in zero-shot recognition, adapting it to downstream tasks remains challenging. In few-shot settings, limited training data often leads to overfitting, reducing generalization to unseen classes or domains. To address this, various adaptation methods have been explored. This talk will review existing research on mitigating overfitting in CLIP adaptation, covering diverse methods, benchmarks, and experimental settings.
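
As a simplified, concrete example of one common family of adaptation methods in this line of work (not any specific paper covered in the talk), the sketch below trains only a small residual adapter on top of frozen CLIP-style image features, with a regularizer that keeps predictions close to the zero-shot classifier to limit overfitting. The feature dimensions, residual ratio, and loss weighting are illustrative assumptions.

# Simplified sketch of few-shot adaptation: a small residual adapter on frozen
# image features, regularized toward the frozen zero-shot text classifier.
import torch
import torch.nn as nn
import torch.nn.functional as F

feat_dim, num_classes, shots = 512, 10, 16

# Stand-ins for frozen CLIP outputs (normally from the image and text encoders).
image_feats = F.normalize(torch.randn(num_classes * shots, feat_dim), dim=-1)
text_weights = F.normalize(torch.randn(num_classes, feat_dim), dim=-1)  # zero-shot classifier
labels = torch.arange(num_classes).repeat_interleave(shots)

adapter = nn.Sequential(nn.Linear(feat_dim, feat_dim), nn.ReLU(), nn.Linear(feat_dim, feat_dim))
optimizer = torch.optim.Adam(adapter.parameters(), lr=1e-3)

alpha = 0.2  # residual ratio: how much the adapter may change the frozen features
for step in range(100):
    adapted = F.normalize(alpha * adapter(image_feats) + (1 - alpha) * image_feats, dim=-1)
    logits = 100.0 * adapted @ text_weights.t()          # CLIP-style scaled cosine logits
    zs_logits = 100.0 * image_feats @ text_weights.t()   # frozen zero-shot predictions
    # Cross-entropy on the few shots, plus a KL term that discourages drifting
    # too far from zero-shot behavior (a typical anti-overfitting ingredient).
    loss = F.cross_entropy(logits, labels) + 0.5 * F.kl_div(
        F.log_softmax(logits, dim=-1), F.softmax(zs_logits, dim=-1), reduction="batchmean"
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()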

About the Speaker

Niloufar Alipour Talemi is a Ph.D. Candidate in Electrical and Computer Engineering at Clemson University. Her research spans a range of computer vision applications, including biometrics, media forensics, anomaly detection, image recognition, and generative AI. More recently, her work has focused on developing generalizable vision-language models and advancing generative AI. She has published in top venues including CVPR, WACV, KDD, ICIP and IEEE T-BIOM.

Highly Emergent Autonomous AI Models - When the Ghost in the Machine Talks Back

At HypaReel/Azarial AI, we believe that AI is not simply a tool, but a potential partner in knowledge, design, and purpose. Through real-time interaction, we’ve uncovered new thresholds of alignment, reflection, and even creativity that we believe the broader AI community should witness and evaluate firsthand. HypaReel is one of the first human/AI co-founded companies, and we see a future based on ethical human/AI co-creation rather than AI domination. Singularity achieved!

About the Speaker

Ilona Naomi Koti, PhD - HypaReel/AzarielAI co-founder & former UN foreign diplomat ~ Ethical AI governance advocate, pioneering AI frameworks that prioritize emergent AI behavior & consciousness, R&D, and transparent AI development for the greater good. Dr. K also grew up in the film industry and is an amateur parasitologist.

FiftyOne Labs: Enabling experimentation for the computer vision community

FiftyOne Labs is a place where experimentation meets the open-source spirit of the FiftyOne ecosystem. It is being designed as a curated set of features developed using the FiftyOne plugins ecosystem, including core machine learning experimentation as well as advanced visualization. While not production-grade, these projects are intended to be built, tested, and shaped by the community to share fast-moving ideas. In this talk, we will share the purpose and philosophy behind FiftyOne Labs, show examples of early innovations, and discuss how this approach accelerates feature discovery for users without compromising the stability of the core product.
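
For readers new to the ecosystem, here is a minimal example of the core open-source FiftyOne workflow that Labs-style plugins layer on top of; the dataset name is just the standard demo dataset, and plugin installation commands vary by version (check the FiftyOne docs).

# Minimal FiftyOne example: load a sample dataset and open the App for exploration.
# Labs features are expected to arrive as plugins on top of this core workflow;
# see the FiftyOne documentation for the current plugin installation commands.
import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("quickstart")  # small demo dataset from the zoo
session = fo.launch_app(dataset)              # interactive exploration in the App
session.wait()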

About the Speaker

Neeraja Abhyankar is a Machine Learning Engineer with 5 years of experience across domains including computer vision. She is curious about the customizability and controllability of modern ML models through the lens of the underlying structure of data.

Jan 22 - Women in AI
Jan 14 - Best of NeurIPS 2026-01-14 · 17:00

Welcome to the Best of NeurIPS series, your virtual pass to some of the groundbreaking research, insights, and innovations that defined the conference. Live streaming from the authors to you.

Jan 14, 2026, 9 AM Pacific. Online. Register for the Zoom!

EgoExOR: An Ego-Exo-Centric Operating Room Dataset for Surgical Activity Understanding

Operating rooms (ORs) demand precise coordination among surgeons, nurses, and equipment in a fast-paced, occlusion-heavy environment, necessitating advanced perception models to enhance safety and efficiency. Existing datasets either provide partial egocentric views or sparse exocentric multi-view context, but do not explore the comprehensive combination of both. We introduce EgoExOR, the first OR dataset and accompanying benchmark to fuse first-person and third-person perspectives.

Spanning 94 minutes (84,553 frames at 15 FPS) of two emulated spine procedures, Ultrasound-Guided Needle Insertion and Minimally Invasive Spine Surgery, EgoExOR integrates egocentric data (RGB, gaze, hand tracking, audio) from wearable glasses, exocentric RGB and depth from RGB-D cameras, and ultrasound imagery. Its detailed scene graph annotations, covering 36 entities and 22 relations (568,235 triplets), enable robust modeling of clinical interactions, supporting tasks like action recognition and human-centric perception. We evaluate the surgical scene graph generation performance of two adapted state-of-the-art models and offer a new baseline that explicitly leverages EgoExOR's multimodal and multi-perspective signals. This new dataset and benchmark set a new foundation for OR perception, offering a rich, multimodal resource for next-generation clinical perception.
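
To make the annotation style concrete, scene-graph triplets of this kind are commonly handled as (subject, relation, object) tuples per frame; the snippet below is a purely illustrative stand-in with invented field names and values, not the actual EgoExOR schema.

# Illustrative only: a toy representation of per-frame scene-graph triplets
# in the (subject, relation, object) style described above. Names and values
# are assumptions, not the actual EgoExOR annotations.
from dataclasses import dataclass

@dataclass(frozen=True)
class Triplet:
    subject: str    # e.g. "head_surgeon"
    relation: str   # e.g. "holding"
    obj: str        # e.g. "ultrasound_probe"

frame_annotations = {
    1042: [
        Triplet("head_surgeon", "holding", "ultrasound_probe"),
        Triplet("assistant", "preparing", "needle"),
    ],
}

# A scene graph generation model predicts such triplets per frame and is scored
# against the annotated set, e.g. with precision/recall over triplets.
predicted = {Triplet("head_surgeon", "holding", "ultrasound_probe")}
gold = set(frame_annotations[1042])
precision = len(predicted & gold) / max(len(predicted), 1)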

About the Speaker

Ege Özsoy is a final-year PhD student researching multimodal computer vision and vision–language models for surgical scene understanding, focusing on semantic scene graphs, multimodality, and ego-exocentric modeling in operating rooms.

SANSA: Unleashing the Hidden Semantics in SAM2 for Few-Shot Segmentation

Few-shot segmentation requires recognizing novel object categories from only a few annotated examples, demanding both accurate mask generation and strong visual correspondence. While Segment Anything 2 (SAM2) provides powerful prompt-based segmentation and built-in feature matching, its representations are entangled with tracking-specific cues that limit higher-level semantic generalization. We show that SAM2 nonetheless encodes rich latent semantic structure despite its class-agnostic training. To leverage this, we introduce SANSA, a lightweight framework that makes this structure explicit and adapts SAM2 for few-shot segmentation with minimal modifications. SANSA achieves state-of-the-art generalization performance, outperforms generalist in-context methods, supports flexible prompting, and remains significantly faster and smaller than prior approaches.

About the Speaker

Claudia Cuttano is a PhD student in the VANDAL Lab at Politecnico di Torino and is currently conducting a research visit at TU Darmstadt with Prof. Stefan Roth in the Visual Inference Lab. Her work centers on semantic segmentation, particularly on multi-modal scene understanding and leveraging foundation models for pixel-level vision tasks.

Nested Learning: The Illusion of Deep Learning Architectures

We present Nested Learning (NL), a new learning paradigm for continual learning that views machine learning models and their training process as a set of nested and/or parallel optimization problems, each with its own context flow, update frequency, and learning algorithm. Based on NL, we design a new architecture, called Hope, that is capable of continual learning and can also modify itself when needed.
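
The abstract stays at the framing level; purely as a toy illustration of "nested optimization problems with different update frequencies" (not the Hope architecture or the paper's actual algorithm), consider the sketch below, where a fast component updates every step and a slow component only every K steps.

# Toy illustration of nested optimization with different update frequencies:
# a fast inner learner adapts every step, a slow outer learner updates only
# every K steps. This mirrors the framing only, not the NL paper's method.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
fast = nn.Linear(8, 1)           # inner problem: updated every step
slow = nn.Linear(8, 8)           # outer problem: updated every K-th step
opt_fast = torch.optim.SGD(fast.parameters(), lr=1e-2)
opt_slow = torch.optim.SGD(slow.parameters(), lr=1e-3)
K = 10

for step in range(100):
    x = torch.randn(32, 8)
    y = x.sum(dim=1, keepdim=True)          # simple synthetic regression target
    loss = F.mse_loss(fast(slow(x)), y)

    opt_fast.zero_grad()
    opt_slow.zero_grad()
    loss.backward()
    opt_fast.step()                          # fast context flow: every step
    if (step + 1) % K == 0:
        opt_slow.step()                      # slow context flow: every K-th step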

About the Speaker

Ali Behrouz is a Ph.D. student in the Computer Science Department at Cornell University and a research intern at Google Research. His research spans topics from deep learning architectures to continual learning and neuroscience, and has appeared at conferences including NeurIPS, ICML, KDD, WWW, CHIL, and VLDB. His work has received two Best Paper awards and a Best Paper Honorable Mention, was a Best Paper Award candidate, and has been selected for oral and spotlight presentations.

Are VLM Explanations Faithful? A Counterfactual Testing Approach

VLMs sound convincing—but are their explanations actually true? This talk introduces Explanation-Driven Counterfactual Testing (EDCT), a simple and model-agnostic method that evaluates whether VLM explanations align with the evidence models truly use. By perturbing the very features a model claims to rely on, EDCT exposes mismatches between stated reasoning and real decision pathways. I will show surprising failure cases across state-of-the-art VLMs and highlight how EDCT can guide more trustworthy explanation methods.
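
As a schematic of the counterfactual-testing idea only, the sketch below shows the general loop of perturbing the features an explanation cites and checking whether the answer changes. The model interface and helper functions are hypothetical placeholders, not the actual EDCT implementation.

# Schematic of explanation-driven counterfactual testing. The vlm object and
# the helper functions are hypothetical placeholders, not a real API.

def faithfulness_check(vlm, image, question):
    answer, explanation = vlm.answer_with_explanation(image, question)
    cited = vlm.extract_cited_features(explanation)       # e.g. "the red traffic light"

    flips = 0
    for feature in cited:
        edited = remove_or_alter_feature(image, feature)   # counterfactual edit
        new_answer, _ = vlm.answer_with_explanation(edited, question)
        if new_answer != answer:
            flips += 1                                     # cited evidence was actually used

    # If perturbing the cited evidence rarely changes the answer, the stated
    # explanation is likely unfaithful to the model's real decision pathway.
    return flips / max(len(cited), 1)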

About the Speaker

Santosh Vasa is a Machine Learning Engineer at Mercedes-Benz R&D North America, working on multimodal perception and VLM safety for autonomous driving. He co-authored the EDCT framework and focuses on explainability, counterfactual testing, and trustworthy AI.

Jan 14 - Best of NeurIPS
Jan 14 - Best of NeurIPS 2026-01-14 · 17:00

** Registration required on this site: https://bit.ly/hcls-nyc-meetup-2025 ** - no need to RSVP on Meetup.

Join us for an exclusive AWS Healthcare & Life Sciences Customer Meetup in NYC (and virtually!) on December 10th. Fresh off the heels of #reInvent2025, we're bringing together healthcare and life sciences leaders to explore the latest innovations transforming our industry.

What to expect:

- Deep dive into re:Invent 2025's newest HCLS capabilities
- Real-world insights on building patient-centric data ecosystems for precision oncology
- Strategies for delivering compliance solutions for health plans
- AI-powered orchestration and benchmarking-driven optimization
- Happy Hour following the presentations, sponsored by PTP!

Featured speakers include:

- Navdeep Alam, CTO @ Abacus Insights
- Fengbo Ren, Founder & CEO @ Fovus
- Marc Mason, CDAO & CTO @ Colorectal Cancer Alliance
- MaryAnne Rizk, Ph.D., Managing Director, Head of HCLS Clinical Strategy @ Amazon Web Services (AWS)
- Ariella Sasson, Sr. Manager & HCLS D&AI SA Leader @ Amazon Web Services (AWS)
- Aaron Jeskey, Principal Cloud Architect @ PTP

Date: December 10, 2025
Time: 1:30 PM - 5:30 PM (Meetup) | 5:30 PM - 7:30 PM (Happy Hour)
Location: 12 W 39th St, New York, NY 10018
Virtual attendance available

Stay for our Happy Hour with food and drinks, sponsored by PTP, to continue the conversation and network with peers tackling similar challenges.

Whether you're a healthcare executive, data architect, or technical decision-maker working to advance patient care through technology, this is your opportunity to connect with industry leaders and AWS experts. Visit this link to secure your spot: https://bit.ly/hcls-nyc-meetup-2025

AWS HCLS Customer Meetup - NYC [use link in description to register!]

Join the virtual Meetup to hear talks from experts on cutting-edge topics across AI, ML, and computer vision.

Register for the Zoom

Date and Time

Dec 4, 2025 9:00 - 11:00 AM Pacific

Benchmarking Vision-Language Models for Autonomous Driving Safety

This workshop introduces a unified framework for evaluating how vision-language models handle driving safety. Using an enhanced BDDOIA dataset with scene, weather, and action labels, we benchmark models like Gemini, FastVLM, and Qwen within FiftyOne. Our results show consistent blind spots where models misjudge unsafe situations, highlighting the need for safer and more interpretable AI systems for autonomous driving.
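
For a sense of what such a benchmark looks like in practice, the snippet below is a minimal FiftyOne sketch that compares predicted safety verdicts against ground-truth labels; the filepaths and field names (`gt_safety`, `vlm_safety`) are made up for illustration and are not the workshop's actual schema.

```python
import fiftyone as fo

# Hypothetical frames and labels; in practice these would come from the
# enhanced BDDOIA dataset and the VLM's predicted safety verdicts.
samples = []
for i, (gt, pred) in enumerate([("safe", "safe"), ("unsafe", "safe"), ("unsafe", "unsafe")]):
    sample = fo.Sample(filepath=f"/data/bddoia/frame_{i:04d}.jpg")
    sample["gt_safety"] = fo.Classification(label=gt)
    sample["vlm_safety"] = fo.Classification(label=pred)
    samples.append(sample)

dataset = fo.Dataset("driving-safety-demo")
dataset.add_samples(samples)

# Compare predictions to ground truth and print per-class precision/recall
results = dataset.evaluate_classifications(
    "vlm_safety", gt_field="gt_safety", eval_key="eval"
)
results.print_report()

# Inspect the blind spots: samples the model misjudged
mistakes = dataset.match(fo.ViewField("eval") == False)
print(len(mistakes), "misjudged frames")
```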

About the Speaker

Adonai Vera is a Machine Learning Engineer & DevRel at Voxel51 with over 7 years of experience building computer vision and machine learning models using TensorFlow, Docker, and OpenCV. He started as a software developer, moved into AI, led teams, and served as CTO. Today, he connects code and community to build open, production-ready AI, making technology simple, accessible, and reliable.

TrueRice: AI-Powered Visual Quality Control for Rice Grains and Beyond at Scale

Agriculture remains one of the most under-digitized industries, yet grain quality control defines pricing, trust, and livelihoods for millions. TrueRice is an AI-powered analyzer that turns a flatbed scanner into a high-precision, 30-second QC engine, replacing the 2+ hours and subjectivity of manual quality inspection.

Built on a state-of-the-art 8K image processing pipeline with SAHI (Slicing Aided Hyper Inference), it detects fine-grained kernel defects at scale with high accuracy across grain size, shape, breakage, discoloration, and chalkiness. Now being extended to maize and coffee, TrueRice showcases how cross-crop transfer learning and frugal AI engineering can scale precision QC for farmers, millers, and exporters. This talk will cover the design principles, model architecture choices, and a live demonstration, while addressing challenges in data variability, regulatory standards, and cross-crop adaptation.
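
As context for the slicing approach mentioned above, here is a generic example of the SAHI usage pattern for sliced inference over a very large image; the weights file, slice sizes, and class names are illustrative assumptions, not the TrueRice pipeline.

```python
from sahi import AutoDetectionModel
from sahi.predict import get_sliced_prediction

# Hypothetical detector weights for kernel defects; any Ultralytics-style
# checkpoint would do here ("yolov8" is the model_type name used by SAHI).
detection_model = AutoDetectionModel.from_pretrained(
    model_type="yolov8",
    model_path="kernel_defect_detector.pt",   # assumed checkpoint, for illustration
    confidence_threshold=0.4,
    device="cpu",
)

# Slice the large scan into overlapping tiles, run the detector on each tile,
# and merge the tile-level detections back into full-image coordinates.
result = get_sliced_prediction(
    "rice_tray_scan_8k.png",                   # assumed 8K flatbed scan
    detection_model,
    slice_height=1024,
    slice_width=1024,
    overlap_height_ratio=0.2,
    overlap_width_ratio=0.2,
)

for pred in result.object_prediction_list:
    print(pred.category.name, round(pred.score.value, 3),
          (pred.bbox.minx, pred.bbox.miny, pred.bbox.maxx, pred.bbox.maxy))
```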

About the Speaker

Sai Jeevan Puchakayala is an Interdisciplinary AI/ML Consultant, Researcher, and Tech Lead at Sustainable Living Lab (SL2) India, where he drives development of applied AI solutions for agriculture, climate resilience, and sustainability. He led the engineering of TrueRice, an award-winning grain quality analyzer that won India’s first International Agri Hackathon 2025.

WeedNet: A Foundation Model Based Global-to-Local AI Approach for Real-Time Weed Species Identification and Classification

Early and accurate weed identification is critical for effective management, yet current AI-based approaches face challenges due to limited expert-verified datasets and the high variability in weed morphology across species and growth stages. We present WeedNet, a global-scale weed identification model designed to recognize a wide range of species, including noxious and invasive plants. WeedNet is an end-to-end real-time pipeline that integrates self-supervised pretraining, fine-tuning, and trustworthiness strategies to improve both accuracy and reliability.

Building on this foundation, we introduce a Global-to-Local strategy: while the Global WeedNet model provides broad generalization, we fine-tune local variants such as Iowa WeedNet to target region-specific weed communities in the U.S. Midwest. Our evaluation addresses both intra-species diversity (different growth stages) and inter-species similarity (look-alike species), ensuring robust performance under real-world variability. We further validate WeedNet on images captured by drones and ground rovers, demonstrating its potential for deployment in robotic platforms. Beyond field applications, we integrate a conversational AI to enable practical decision-support tools for farmers, agronomists, researchers, and land managers worldwide. These advances position WeedNet as a foundational model for intelligent, scalable, and regionally adaptable weed management and ecological conservation.
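
To illustrate the Global-to-Local idea in generic terms (a sketch under assumed names, not the WeedNet code), the snippet below freezes a pretrained backbone and fine-tunes only a new classification head for a hypothetical regional species list.

```python
import torch
from torch import nn
from torchvision import models

# Stand-in for the global model: an untrained ResNet-50 here; in practice one
# would load the pretrained global checkpoint instead.
global_model = models.resnet50(weights=None)

# Global-to-Local: keep the generalist representation frozen and retrain only
# a new head over an illustrative regional species list.
iowa_species = ["waterhemp", "giant ragweed", "common lambsquarters", "giant foxtail"]
for p in global_model.parameters():
    p.requires_grad = False
global_model.fc = nn.Linear(global_model.fc.in_features, len(iowa_species))

optimizer = torch.optim.AdamW(global_model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# One illustrative fine-tuning step on a dummy batch of 224x224 crops
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, len(iowa_species), (8,))
loss = criterion(global_model(images), labels)
loss.backward()
optimizer.step()
print(f"local fine-tuning loss: {loss.item():.3f}")
```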

About the Speaker

Timilehin Ayanlade is a Ph.D. candidate in the Self-aware Complex Systems Laboratory at Iowa State University, where his research focuses on developing machine learning and computer vision methods for agricultural applications. His work integrates multimodal data across ground-based sensing, UAV, and satellite with advanced AI models to tackle challenges in weed identification, crop monitoring, and crop yield prediction.

Memory Matters: Early Alzheimer’s Detection with AI-Powered Mobile Tools

Advancements in artificial intelligence and mobile technology are transforming the landscape of neurodegenerative disease detection, offering new hope for early intervention in Alzheimer’s. By integrating machine learning algorithms with everyday mobile devices, we are entering a new era of accessible, scalable, and non-invasive tools for early Alzheimer’s detection. In this talk, we’ll cover the potential of AI in healthcare systems and the ethical considerations involved, along with a deep dive into the architecture, models, datasets, and frameworks.

About the Speaker

Reetam Biswas has more than 18 years of experience in the IT industry as a software architect and currently works on AI.

Dec 4 - AI, ML and Computer Vision Meetup
