Activities & events

We're back to our usual format of 3 speakers + Q&A followed by networking.

  • 18:30: Doors open
  • 19:00: Welcome by Alexandra Deschamps-Sonsino (organiser since 2011)
  • 19:10: James Harding, Green Custard
  • 19:30: Max Park
  • 19:50: Philip Steele, Octopus Energy
  • 20:10: Networking
  • 21:00: End of event and move to nearby pub

There will be drinks (including non-alcoholic) and nibbles.

About our sponsors: Green Custard is an award-winning AWS IoT specialist consultancy that helps organisations design, build, and scale secure cloud-connected products and solutions. With deep expertise across IoT architectures, data lakes, ML, and agentic AI, its work spans from product innovation to smart factories, backed by full-stack capability from embedded and mobile to cloud-native development. Green Custard partners with clients to deliver on their business outcomes, whether modernising legacy devices, unlocking new value from existing data, or accelerating the delivery of production-ready solutions. Customers trust Green Custard for its technical excellence, pragmatic delivery, and track record of customer success.

Aqua Libra is redefining sustainable hydration through innovative, low-impact dispensing technology. Beyond its zero-sugar infused drinks, the brand has pioneered solutions such as the Aqua Libra Flavour Tap, a digitally controlled dispensing system that delivers chilled, filtered water with natural flavours on demand. By removing the need for single-use bottles and reducing transport emissions, the Flavour Tap enables offices, venues, and public spaces to offer great-tasting drinks with a dramatically smaller environmental footprint. Aqua Libra’s approach blends sustainability, smart technology, and convenience to meet the growing demand for healthier, greener beverage options.

About our host: The British Computer Society (BCS) is a professional body and a learned society that represents those working in information technology, computing, software engineering, computer engineering and computer science, both in the United Kingdom and internationally.

The meetup will be held at 25 Copthall Ave, London EC2R 7BP, NOT their Southampton Row location.

London Internet of Things Meetup No.151

LLMs are powerful, but they still hallucinate facts, especially when asked about entities, relationships, or claims that require up-to-date or structured knowledge.

In this hands-on workshop, we'll explore how to use Wikidata as a grounding and fact-checking layer for LLMs to reduce hallucinations and make AI systems more reliable.

We'll start with a short introduction to Wikidata and then set up the Wikidata MCP so an LLM can retrieve and verify facts rather than relying solely on its internal memory. This already provides a practical way to ground LLM outputs in verifiable data.

From there, we'll go beyond LLM-only approaches and build a small experimental fact-checking pipeline. The system combines semantic retrieval, LLM-based reranking, and natural language inference (NLI) to validate claims against evidence in a more controlled and interpretable way.

This workshop focuses on evidence-driven verification pipelines that make an LLM's reasoning steps explicit and easier to inspect, debug, and improve.
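To make the grounding step concrete before the workshop, here is a minimal Python sketch that looks up a single fact against Wikidata's public SPARQL endpoint. It bypasses the MCP setup used in the session, and the item and property IDs (Q183, P36) are only an example lookup.

    # Minimal sketch: look up a single verifiable fact on Wikidata's public SPARQL endpoint.
    # The lookup below (capital of Germany: item Q183, property P36) is only an example of the
    # kind of structured fact an LLM's claim could be checked against before it is surfaced.
    import requests

    SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"

    query = """
    SELECT ?capitalLabel WHERE {
      wd:Q183 wdt:P36 ?capital .                    # Germany -> its capital
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
    }
    """

    response = requests.get(
        SPARQL_ENDPOINT,
        params={"query": query, "format": "json"},
        headers={"User-Agent": "wikidata-grounding-demo/0.1"},  # Wikidata asks clients to identify themselves
        timeout=30,
    )
    response.raise_for_status()

    bindings = response.json()["results"]["bindings"]
    print([b["capitalLabel"]["value"] for b in bindings])  # expected: ['Berlin']

The MCP server wraps this kind of lookup behind a tool interface so the LLM can call it during generation instead of answering from memory.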

​What we'll cover:

  • ​Wikidata as a structured source for factual verification
  • ​Setting up and querying Wikidata using MCP
  • ​Verifying claims with MCP + an LLM
  • ​Moving beyond pure GenAI to evidence-based fact-checking
  • ​Finding relevant Wikidata statements with semantic search
  • ​Ranking candidate evidence with an LLM
  • Verifying claims using an NLI model

What you'll leave with

By the end of the workshop, you'll be able to:

  • Ground LLM outputs in structured data to reduce hallucinations
  • Understand when LLM-only fact-checking is not enough
  • Build a small, transparent fact-checking pipeline you can adapt to real projects (a minimal sketch of such a pipeline follows below)
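As a rough preview of what such a pipeline can look like, here is a small sketch of the retrieval and NLI stages using the sentence-transformers and transformers libraries. The model names and the toy claim and evidence are illustrative choices, and the LLM-based reranking step is omitted for brevity; this is not the workshop's exact setup.

    # Sketch of two stages of an evidence-based fact-checking pipeline:
    # (1) semantic retrieval of candidate evidence, (2) NLI verification of the claim.
    # Model names and the toy claim/evidence are illustrative; an LLM-based reranking
    # step would sit between the two stages in the full pipeline described above.
    import torch
    from sentence_transformers import SentenceTransformer, util
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    claim = "Berlin is the capital of Germany."
    evidence_pool = [
        "Germany's capital and largest city is Berlin.",
        "Munich is the capital of Bavaria.",
        "The Rhine flows through western Germany.",
    ]

    # Stage 1: semantic retrieval with sentence embeddings.
    embedder = SentenceTransformer("all-MiniLM-L6-v2")
    claim_emb = embedder.encode(claim, convert_to_tensor=True)
    evidence_emb = embedder.encode(evidence_pool, convert_to_tensor=True)
    scores = util.cos_sim(claim_emb, evidence_emb)[0]
    best_evidence = evidence_pool[int(scores.argmax())]

    # Stage 2: NLI verification (any MNLI-style model can be substituted here).
    nli_name = "MoritzLaurer/DeBERTa-v3-base-mnli-fever-anli"
    tokenizer = AutoTokenizer.from_pretrained(nli_name)
    nli_model = AutoModelForSequenceClassification.from_pretrained(nli_name)

    inputs = tokenizer(best_evidence, claim, return_tensors="pt", truncation=True)  # premise, hypothesis
    with torch.no_grad():
        logits = nli_model(**inputs).logits
    label = nli_model.config.id2label[int(logits.argmax())]
    print(best_evidence, "->", label)  # 'entailment' means the retrieved evidence supports the claim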

About the speaker:

Philippe Saadé is the AI/ML project manager at Wikimedia Deutschland. His current work focuses on making Wikidata accessible to AI applications with projects like the Wikidata vector database and the Wikidata Model Context Protocol.

Join our Slack: https://datatalks.club/slack.html

​This event is sponsored by Wikimedia

How to Reduce LLM Hallucinations with Wikidata: Hands-On Fact-Checking Using MCP

Responding to a call for tenders takes an average of 20 to 40 hours of work. What if you could cut that time by 80% while maintaining quality and competitiveness? Artificial intelligence makes this transformation possible today.

In this webinar, hosted by Philippe Guiheneuc of Zenbaia and Camilo Rodriguez of Machine Learning Lab, you will discover an innovative solution that is transforming the way companies analyse and respond to calls for tenders.

This webinar is for you if:

✔ you regularly respond to public or private calls for tenders,
✔ you want to drastically reduce the time spent analysing RFPs,
✔ you want to improve your chances of winning contracts,
✔ you are looking to optimise the quality of your commercial proposals,
✔ you are curious about how AI can be applied concretely to your business processes,
✔ you want to explore the technical aspects of AI-driven automation.

📅 Friday 16 January at 11:00

👉 Registration: https://offre.mlab.ai/zenbaia/comment-automatiser-ses-appels-doffres-grace-a-lIA/

How to automate your responses to calls for tenders with AI


We're super excited for our last meetup of 2025 - just before the holidays start, we're back in Amsterdam, and this time at AI House Amsterdam powered by Prosus, on Wednesday 10 December!

This edition will be extra interesting, since it will be about Profitability AI: Build it right. Make it fast. Keep it cheap.

AI projects don’t become business-critical overnight. They move through stages: from spark-of-an-idea prototypes to hardened, scalable systems that drive real revenue. In this meetup edition, we invite industry leaders to share the technical journeys their AI projects went through before becoming part of their core business. This evening is all about what it actually takes to build profitable AI: not just using the latest models, but creating systems that are efficient, scalable, operationally reliable, and deliver measurable value. You’ll hear how teams navigate the messy middle, from architecture choices to optimization strategies, to transform AI from a cool demo into a cost-effective production engine. Expect an honest look at real-world trade-offs, engineering challenges, and the solutions that made their AI both powerful and economical.

Excited as well?! We'd love to welcome you for an evening full of knowledge sharing, demos, great conversations, networking, and above all fun with the PyData community!

Agenda

  • 18:00 - 19:00: Walk-in with drinks & food
  • 19:00 - 19:45: Talk 1 - Scaling Personalized Push Notifications by Floris Fok
  • 19:45 - 20:00: Short break
  • 20:00 - 20:45: Talk 2 - LLM distillation explained: Make smarter, cheaper, and deployable AI for enterprises by Mashrur Haider
  • 20:45 - 21:30: Networking + drinks & bites

Talk 1 : Scaling Personalized Push Notifications by Floris Fok

This talk explores how we productionize personalized push notifications at scale—moving from proof-of-concept to serving 130 billion tokens per day to nearly half of Brazil's population. We'll share the journey from traditional CRM systems to personalization-powered notifications, covering the data processing pipeline, key architectural decisions, and operational challenges. Learn the trade-offs we navigated between latency and personalization depth, how we achieved a cost per order under 10 cents, and practical insights into productionizing foundation models for commerce.

Floris Fok is a Senior AI Engineer at Prosus Group, specializing in Generative AI. He helped develop Europe's second foundational model, Climate GPT, and has over 4 years of NLP experience spanning technologies from BERT to DeepSeek. Floris played a role in the development of Toqan and has been utilizing it since its early days.

Talk 2 : LLM distillation explained: Make smarter, cheaper, and deployable AI for enterprises by Mashrur Haider

Running large LLMs in production is expensive, but often unnecessary. In this masterclass, Mashrur Haider breaks down how distillation, a popular post-training technique, can cut inference costs by up to 70% while maintaining enterprise-grade performance. You'll learn how distillation compares to quantization and fine-tuning, backed by real benchmarks. Key takeaways:

  • Distillation 101: how it works and why enterprises use it.
  • Benchmarks: cost savings without accuracy trade-offs.
  • Workflow: from data prep to deployment on Nebius Token Factory.
  • Scaling: running distilled models in production with compliance and reliability.
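For readers new to the technique, here is a minimal PyTorch sketch of the classic soft-label distillation loss. It illustrates the general idea only, not Nebius Token Factory's workflow, and the temperature and mixing weight are arbitrary example values.

    # Minimal soft-label knowledge distillation loss: the student matches the teacher's
    # softened output distribution as well as the ground-truth labels.
    # The temperature T and mixing weight alpha are arbitrary example values.
    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
        # Soft targets: KL divergence between softened teacher and student distributions.
        soft = F.kl_div(
            F.log_softmax(student_logits / T, dim=-1),
            F.softmax(teacher_logits / T, dim=-1),
            reduction="batchmean",
        ) * (T * T)  # rescale so gradient magnitudes stay comparable across temperatures
        # Hard targets: ordinary cross-entropy against the true labels.
        hard = F.cross_entropy(student_logits, labels)
        return alpha * soft + (1 - alpha) * hard

    # Toy usage: a batch of 4 examples with 10 classes and random logits.
    student_logits = torch.randn(4, 10, requires_grad=True)
    teacher_logits = torch.randn(4, 10)
    labels = torch.randint(0, 10, (4,))
    loss = distillation_loss(student_logits, teacher_logits, labels)
    loss.backward()

In practice the teacher logits come from a frozen large model and the student is the smaller model that actually gets deployed.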

Mashrur Haider is a Tech PM at Nebius AI Studio with a deep healthcare background (BSc Genetics, Stony Brook; MSc Bioinformatics & ML, University of Amsterdam). He’s researched at Netherlands Cancer Institute, worked in Advanced R&D at Philips IGT Systems, and operated a VC-backed techbio startup. At Nebius Token Factory, he translates real customer needs into scalable, user-friendly products aimed at model customisation and dedicated inference.

Directions: The venue for this meetup is AI House Amsterdam, located at the Prosus Global Headquarters (Gustav Mahlerplein 5, 1082 MS Amsterdam). AI House Amsterdam is conveniently located next to Amsterdam Zuid train station (a 3-minute walk).

Profitability AI: Build it right. Make it fast. Keep it cheap.

📅 Monday, December 8th | 🕒 Online Event | 🎤 Speaker: Ioannis Philippides

Join us for the first session of our 4-part DP-600 preparation series, hosted as part of Fabric Data Days! This session lays the foundation for understanding Microsoft Fabric and the engineering components that are essential for both real-world workloads and the DP-600 exam. Whether you're just starting your Fabric journey or preparing seriously for certification, this session will give you a clear, practical, and hands-on starting point.

📚 What we’ll cover in Session 1

1️⃣ Intro to Microsoft Fabric

A guided overview of the Fabric ecosystem, key experiences, and how everything fits together — in a way that finally makes sense.

2️⃣ Get Started with Lakehouses

How Lakehouses work in Fabric, why they matter, and how they serve as the backbone for analytics workloads.

3️⃣ Ingest Data with Dataflows Gen2

Hands-on concepts around building ingestion processes, managing transformations, and preparing data for further engineering.

4️⃣ Orchestrate Processes with Data Factory Pipelines

Build reliable, automated movement and transformation processes using Fabric Data Factory.

5️⃣ Use Spark in Fabric

Understand when, why, and how Spark fits into your end-to-end architecture — with practical examples relevant for the exam.
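As a flavour of the Spark material, here is a short PySpark sketch of the kind of cell you might run in a Fabric notebook. The Lakehouse table and column names are invented for illustration, and in Fabric the spark session is already provided, so creating one explicitly is only needed when running locally.

    # Illustrative PySpark cell of the kind run in a Fabric notebook.
    # The table and column names (sales, region, amount) are placeholders; in a Fabric
    # notebook the `spark` session already exists, so building one is only needed locally.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("fabric-demo").getOrCreate()

    # Read a Delta table from the Lakehouse attached to the notebook.
    sales = spark.read.table("sales")

    # Simple aggregation: total amount per region, largest first.
    summary = (
        sales.groupBy("region")
             .agg(F.sum("amount").alias("total_amount"))
             .orderBy(F.desc("total_amount"))
    )
    summary.show()

    # Write the result back to the Lakehouse as a managed Delta table.
    summary.write.mode("overwrite").saveAsTable("sales_by_region")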

🎯 Who is this session for?

  • Anyone preparing for DP-600
  • Power BI professionals moving into Fabric
  • Data engineers, analytics engineers, and advanced business analysts
  • Anyone who wants a strong, structured entry into Fabric

🔔 Call to Action

👉 RSVP now to secure your spot!
👉 Share this event with colleagues, friends, and anyone looking to grow their Microsoft Fabric skills.

Let's build the strongest, most collaborative Fabric study community together!

🎓 Get your exam discount voucher

Microsoft is offering limited DP-600 exam vouchers. Request yours here: 👉 https://community.fabric.microsoft.com/t5/custom/page/page-id/campaign_form?campaignID=Q0FNUEFJR05fMTc2MTE4OTgxNTQwMg==

🚀 DP-600 Prep Series – Session 1: Intro to Microsoft Fabric

Welcome to the Best of ICCV series, your virtual pass to some of the groundbreaking research, insights, and innovations that defined this year’s conference. Live streaming from the authors to you.

Date, Time and Location

Nov 20, 2025, 9 AM Pacific, online. Register for the Zoom!

SGBD: Sharpness-Aware Mirror Gradient with BLIP-Based Denoising for Robust Multimodal Product Recommendation

The growing integration of computer vision and machine learning into the retail industry—both online and in physical stores—has driven the adoption of multimodal recommender systems to help users navigate increasingly complex product landscapes. These systems leverage diverse data sources, such as product images, textual descriptions, and user-generated content, to better model user preferences and item characteristics. While the fusion of multimodal data helps address issues like data sparsity and cold-start problems, it also introduces challenges such as information inconsistency, noise, and increased training instability.

In this paper, we analyze these robustness issues through the lens of flat local minima and propose a strategy that incorporates BLIP—a Vision-Language Model with strong denoising capabilities—to mitigate noise in multimodal inputs. Our method, Sharpness-Aware Mirror Gradient with BLIP-Based Denoising (SGBD), is a concise yet effective training strategy that implicitly enhances robustness during optimization. Extensive theoretical and empirical evaluations demonstrate its effectiveness across various multimodal recommendation benchmarks. SGBD offers a scalable solution for improving recommendation performance in real-world retail environments, where noisy, high-dimensional, and fast-evolving product data is the norm, making it a promising paradigm for training robust multimodal recommender systems in the retail industry.
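The abstract's starting point, optimizing towards flat local minima for robustness, can be illustrated with a generic sharpness-aware minimization (SAM) step. The sketch below shows that general idea only; it is not the paper's SGBD or Mirror Gradient method, and the perturbation radius is an arbitrary example value.

    # Generic sharpness-aware minimization (SAM) step, shown only to illustrate the
    # flat-minima idea the abstract builds on; it is not the paper's SGBD / Mirror
    # Gradient algorithm. rho (the perturbation radius) is an arbitrary example value.
    import torch

    def sam_step(model, loss_fn, batch, optimizer, rho=0.05):
        x, y = batch

        # 1) Gradients at the current weights.
        loss = loss_fn(model(x), y)
        loss.backward()

        # 2) Perturb the weights in the locally "sharpest" (gradient) direction.
        grads = [p.grad for p in model.parameters() if p.grad is not None]
        grad_norm = torch.norm(torch.stack([g.norm() for g in grads]))
        eps = []
        with torch.no_grad():
            for p in model.parameters():
                if p.grad is None:
                    eps.append(None)
                    continue
                e = rho * p.grad / (grad_norm + 1e-12)
                p.add_(e)
                eps.append(e)

        # 3) Gradients at the perturbed point, then undo the perturbation.
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()
        with torch.no_grad():
            for p, e in zip(model.parameters(), eps):
                if e is not None:
                    p.sub_(e)

        # 4) Update the original weights with the sharpness-aware gradients.
        optimizer.step()
        optimizer.zero_grad()
        return float(loss)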

About the Speaker

Kathy Wu holds a Ph.D. in Applied Mathematics and dual M.S. degrees in Computer Science and Quantitative Finance from the University of Southern California (USC), Los Angeles, CA, USA. At USC, she served as a course lecturer, teaching ML Foundations and ML for Business Applications in the science and business schools. Her academic research spans high-dimensional statistics, deep learning, and causal inference, among other areas.

Kathy brings industry experience from Meta, LinkedIn, and Morgan Stanley in the Bay Area and New York City, US, where she focused on AI methodologies and real-world applications. She is currently an Applied Scientist at Amazon, within the Global Store organization, leading projects in e-commerce recommendation systems, search engines, multimodal vision-language models (VLMs), and LLM/GenAI applications in retail.

Her work has been published in top-tier conferences including ICCV, CVPR, ICLR, SIGIR, and WACV. At ICCV 2025, she won the Best Paper Award in Retail Vision.

Spatial Mental Modeling from Limited Views

Can VLMs imagine the unobservable space from just a few views, like humans do? Humans form spatial mental models, internal representations of "unseen space", to reason about layout, perspective, and motion. On our proposed MINDCUBE benchmark, we systematically observe a critical gap in VLMs' ability to build robust spatial mental models by representing positions (cognitive mapping), orientations (perspective-taking), and dynamics (mental simulation for "what-if" movements). We then explore three approaches to help VLMs approximate spatial mental models, including unseen intermediate views, natural language reasoning chains, and cognitive maps.

The most significant improvement comes from "map-then-reason", which jointly trains the model to first abstract a cognitive map and then reason upon it. By training models to construct and reason over these internal maps, we boosted accuracy from 37.8% to 60.8% (+23.0%). Adding reinforcement learning pushed performance even further to 70.7% (+32.9%). Our key insight is that such scaffolding of spatial mental models, actively constructing and utilizing internal structured spatial representations with flexible reasoning processes, significantly improves understanding of "unobservable space".

We aim to understand why geometric concepts remain challenging for VLMs and to outline promising research directions towards fostering more robust spatial intelligence.

About the Speaker

Manling Li is an Assistant Professor at Northwestern University and an Amazon Scholar. She was a postdoc at Stanford University and obtained her PhD in Computer Science from the University of Illinois Urbana-Champaign in 2023. She works at the intersection of language, vision, and robotics, and her work has been recognized by MIT TR 35 Under 35, the ACL Inaugural Dissertation Award Honorable Mention, the ACL'24 Outstanding Paper Award, the ACL'20 Best Demo Paper Award, the NAACL'21 Best Demo Paper Award, the Microsoft Research PhD Fellowship, and EECS Rising Stars.

Forecasting and Visualizing Air Pollution via Sky Images and VLM-Guided Generative Models

Air pollution monitoring is traditionally limited by costly sensors and sparse data coverage. Our research introduces a vision-language model framework that predicts air quality directly from real-world sky images and also simulates skies under varying pollution levels to enhance interpretability and robustness. We further develop visualization techniques to make predictions more understandable for policymakers and the public. This talk will present our methodology, key findings, and implications for sustainable urban environments.

About the Speaker

Mohammad Saleh Vahdatpour is a PhD candidate in Computer Science at Georgia State University specializing in deep learning, vision–language models, and sustainable AI systems. His research bridges generative AI, environmental monitoring, and motion perception, focusing on scalable and energy-efficient models that connect scientific innovation with real-world impact.

Sari Sandbox: A Virtual Retail Store Environment for Embodied AI Agents

We present Sari Sandbox, a high-fidelity, photorealistic 3D retail store simulation for benchmarking embodied agents against human performance in shopping tasks. Addressing a gap in retail-specific sim environments for embodied agent training, Sari Sandbox features over 250 interactive grocery items across three store configurations, controlled via an API. It supports both virtual reality (VR) for human interaction and a vision language model (VLM)-powered embodied agent.

We also introduce SariBench, a dataset of annotated human demonstrations across varied task difficulties. Our sandbox enables embodied agents to navigate, inspect, and manipulate retail items, providing baselines against human performance. We conclude with benchmarks, performance analysis, and recommendations for enhancing realism and scalability.

About the Speakers

Emmanuel G. Maminta is a fourth-year Artificial Intelligence Ph.D. student at the Ubiquitous Computing Laboratory (UCL) at the University of the Philippines Diliman, advised by Prof. Rowel O. Atienza.

Janika Deborah B. Gajo is an undergraduate student pursuing a Bachelor of Science in Computer Engineering at the University of the Philippines Diliman.

Nov 20 - Best of ICCV (Day 2)

Welcome to the Best of ICCV series, your virtual pass to some of the groundbreaking research, insights, and innovations that defined this year’s conference. Live streaming from the authors to you.

Date, Time and Location

Nov 20, 2025 9 AM Pacific Online. Register for the Zoom!

SGBD: Sharpness-Aware Mirror Gradient with BLIP-Based Denoising for Robust Multimodal Product Recommendation

The growing integration of computer vision and machine learning into the retail industry—both online and in physical stores—has driven the adoption of multimodal recommender systems to help users navigate increasingly complex product landscapes. These systems leverage diverse data sources, such as product images, textual descriptions, and user-generated content, to better model user preferences and item characteristics. While the fusion of multimodal data helps address issues like data sparsity and cold-start problems, it also introduces challenges such as information inconsistency, noise, and increased training instability.

In this paper, we analyze these robustness issues through the lens of flat local minima and propose a strategy that incorporates BLIP—a Vision-Language Model with strong denoising capabilities—to mitigate noise in multimodal inputs. Our method, Sharpness-Aware Mirror Gradient with BLIP-Based Denoising (SGBD), is a concise yet effective training strategy that implicitly enhances robustness during optimization. Extensive theoretical and empirical evaluations demonstrate its effectiveness across various multimodal recommendation benchmarks. SGBD offers a scalable solution for improving recommendation performance in real-world retail environments, where noisy, high-dimensional, and fast-evolving product data is the norm, making it a promising paradigm for training robust multi-modal recommender systems in retail industry.

About the Speaker

Kathy Wu holds a Ph.D. in Applied Mathematics and dual M.S. degrees in Computer Science and Quantitative Finance from the University of Southern California (USC), Los Angeles, CA, USA. At USC, she served as a course lecturer, offering ML Foundations and ML for Business Applications in the science school and business school. Her academic research spans high-dimensional statistics, deep learning, and causal inference, etc.

Kathy brings industry experience from Meta, LinkedIn, and Morgan Stanley in the Bay Area and New York City, US, where she focused on AI methodologies and real-world applications. She is currently an Applied Scientist at Amazon, within the Global Store organization, leading projects in E-Commerce Recommendation Systems, Search Engines, Multi-Modal Vision-Language Models (VLMs), and LLM/GenAI in retails.

Her work has been published in top-tier conferences including ICCV, CVPR, ICLR, SIGIR, WACV, etc. At ICCV 2025, she won the Best Paper Award in Retail Vision.

Spatial Mental Modeling from Limited Views

Can VLMs imagine the unobservable space from just a few views, like humans do? Humans form spatial mental models, as internal representations of "unseen space" to reason about layout, perspective, and motion. On our proposed MINDCUBE, we see critical gap systematically on VLMs building robust spatial mental models through representing positions (cognitive mapping), orientations (perspective-taking), and dynamics (mental simulation for ''what-if'' movements). We then explore three approaches to help VLMs approximate spatial mental models, including unseen intermediate views, natural language reasoning chains, and cognitive maps.

The significant improvement comes from ''map-then-reason'' that jointly trains the model to first abstract a cognitive map and then reason upon it. By training models to construct and reason over these internal maps, we boosted accuracy from 37.8% to 60.8% (+23.0%). Adding reinforcement learning pushed performance even further to 70.7% (+32.9%). Our key insight is that such scaffolding of spatial mental models, actively constructing and utilizing internal structured spatial representations with flexible reasoning processes, significantly improves understanding of "unobservable space".

We aim to understand why geometric concepts remain challenging for VLMs and outlining promising research directions towards fostering more robust spatial intelligence.

About the Speaker

Manling Li is an Assistant Professor at Northwestern University and Amazon Scholar. She was a postdoc at Stanford University, and obtained the PhD degree in Computer Science at University of Illinois Urbana-Champaign in 2023. She works on the intersection of language, vision, and robotics, recognized by the MIT TR 35 Under 35, ACL Inaugural Dissertation Award Honorable Mention, ACL’24 Outstanding Paper Award, ACL'20 Best Demo Paper Award, and NAACL'21 Best Demo Paper Award, Microsoft Research PhD Fellowship, EE CS Rising Star, etc.

Forecasting and Visualizing Air Pollution via Sky Images and VLM-Guided Generative Models

Air pollution monitoring is traditionally limited by costly sensors and sparse data coverage. Our research introduces a vision-language model framework that predicts air quality directly from real-world sky images and also simulates skies under varying pollution levels to enhance interpretability and robustness. We further develop visualization techniques to make predictions more understandable for policymakers and the public. This talk will present our methodology, key findings, and implications for sustainable urban environments.

About the Speaker

Mohammad Saleh Vahdatpour is a PhD candidate in Computer Science at Georgia State University specializing in deep learning, vision–language models, and sustainable AI systems. His research bridges generative AI, environmental monitoring, and motion perception, focusing on scalable and energy-efficient models that connect scientific innovation with real-world impact.

Sari Sandbox: A Virtual Retail Store Environment for Embodied AI Agents

We present Sari Sandbox, a high-fidelity, photorealistic 3D retail store simulation for benchmarking embodied agents against human performance in shopping tasks. Addressing a gap in retail-specific sim environments for embodied agent training, Sari Sandbox features over 250 interactive grocery items across three store configurations, controlled via an API. It supports both virtual reality (VR) for human interaction and a vision language model (VLM)-powered embodied agent.

We also introduce SariBench, a dataset of annotated human demonstrations across varied task difficulties. Our sandbox enables embodied agents to navigate, inspect, and manipulate retail items, providing baselines against human performance. We conclude with benchmarks, performance analysis, and recommendations for enhancing realism and scalability.

About the Speakers

Emmanuel G. Maminta is a fourth-year Artificial Intelligence Ph.D. student at the Ubiquitous Computing Laboratory (UCL) in the University of the Philippines Diliman, advised by Prof. Rowel O. Atienza.

Janika Deborah B.Gajo is an undergraduate student studying for a Bachelor of Science in Computer Engineering at the University of the Philippines, Diliman.

Nov 20 - Best of ICCV (Day 2)

Welcome to the Best of ICCV series, your virtual pass to some of the groundbreaking research, insights, and innovations that defined this year’s conference. Live streaming from the authors to you.

Date, Time and Location

Nov 20, 2025 9 AM Pacific Online. Register for the Zoom!

SGBD: Sharpness-Aware Mirror Gradient with BLIP-Based Denoising for Robust Multimodal Product Recommendation

The growing integration of computer vision and machine learning into the retail industry—both online and in physical stores—has driven the adoption of multimodal recommender systems to help users navigate increasingly complex product landscapes. These systems leverage diverse data sources, such as product images, textual descriptions, and user-generated content, to better model user preferences and item characteristics. While the fusion of multimodal data helps address issues like data sparsity and cold-start problems, it also introduces challenges such as information inconsistency, noise, and increased training instability.

In this paper, we analyze these robustness issues through the lens of flat local minima and propose a strategy that incorporates BLIP—a Vision-Language Model with strong denoising capabilities—to mitigate noise in multimodal inputs. Our method, Sharpness-Aware Mirror Gradient with BLIP-Based Denoising (SGBD), is a concise yet effective training strategy that implicitly enhances robustness during optimization. Extensive theoretical and empirical evaluations demonstrate its effectiveness across various multimodal recommendation benchmarks. SGBD offers a scalable solution for improving recommendation performance in real-world retail environments, where noisy, high-dimensional, and fast-evolving product data is the norm, making it a promising paradigm for training robust multi-modal recommender systems in retail industry.

About the Speaker

Kathy Wu holds a Ph.D. in Applied Mathematics and dual M.S. degrees in Computer Science and Quantitative Finance from the University of Southern California (USC), Los Angeles, CA, USA. At USC, she served as a course lecturer, offering ML Foundations and ML for Business Applications in the science school and business school. Her academic research spans high-dimensional statistics, deep learning, and causal inference, etc.

Kathy brings industry experience from Meta, LinkedIn, and Morgan Stanley in the Bay Area and New York City, US, where she focused on AI methodologies and real-world applications. She is currently an Applied Scientist at Amazon, within the Global Store organization, leading projects in E-Commerce Recommendation Systems, Search Engines, Multi-Modal Vision-Language Models (VLMs), and LLM/GenAI in retails.

Her work has been published in top-tier conferences including ICCV, CVPR, ICLR, SIGIR, WACV, etc. At ICCV 2025, she won the Best Paper Award in Retail Vision.

Spatial Mental Modeling from Limited Views

Can VLMs imagine the unobservable space from just a few views, like humans do? Humans form spatial mental models, as internal representations of "unseen space" to reason about layout, perspective, and motion. On our proposed MINDCUBE, we see critical gap systematically on VLMs building robust spatial mental models through representing positions (cognitive mapping), orientations (perspective-taking), and dynamics (mental simulation for ''what-if'' movements). We then explore three approaches to help VLMs approximate spatial mental models, including unseen intermediate views, natural language reasoning chains, and cognitive maps.

The significant improvement comes from ''map-then-reason'' that jointly trains the model to first abstract a cognitive map and then reason upon it. By training models to construct and reason over these internal maps, we boosted accuracy from 37.8% to 60.8% (+23.0%). Adding reinforcement learning pushed performance even further to 70.7% (+32.9%). Our key insight is that such scaffolding of spatial mental models, actively constructing and utilizing internal structured spatial representations with flexible reasoning processes, significantly improves understanding of "unobservable space".

We aim to understand why geometric concepts remain challenging for VLMs and outlining promising research directions towards fostering more robust spatial intelligence.

About the Speaker

Manling Li is an Assistant Professor at Northwestern University and Amazon Scholar. She was a postdoc at Stanford University, and obtained the PhD degree in Computer Science at University of Illinois Urbana-Champaign in 2023. She works on the intersection of language, vision, and robotics, recognized by the MIT TR 35 Under 35, ACL Inaugural Dissertation Award Honorable Mention, ACL’24 Outstanding Paper Award, ACL'20 Best Demo Paper Award, and NAACL'21 Best Demo Paper Award, Microsoft Research PhD Fellowship, EE CS Rising Star, etc.

Forecasting and Visualizing Air Pollution via Sky Images and VLM-Guided Generative Models

Air pollution monitoring is traditionally limited by costly sensors and sparse data coverage. Our research introduces a vision-language model framework that predicts air quality directly from real-world sky images and also simulates skies under varying pollution levels to enhance interpretability and robustness. We further develop visualization techniques to make predictions more understandable for policymakers and the public. This talk will present our methodology, key findings, and implications for sustainable urban environments.

About the Speaker

Mohammad Saleh Vahdatpour is a PhD candidate in Computer Science at Georgia State University specializing in deep learning, vision–language models, and sustainable AI systems. His research bridges generative AI, environmental monitoring, and motion perception, focusing on scalable and energy-efficient models that connect scientific innovation with real-world impact.

Sari Sandbox: A Virtual Retail Store Environment for Embodied AI Agents

We present Sari Sandbox, a high-fidelity, photorealistic 3D retail store simulation for benchmarking embodied agents against human performance in shopping tasks. Addressing a gap in retail-specific sim environments for embodied agent training, Sari Sandbox features over 250 interactive grocery items across three store configurations, controlled via an API. It supports both virtual reality (VR) for human interaction and a vision language model (VLM)-powered embodied agent.

We also introduce SariBench, a dataset of annotated human demonstrations across varied task difficulties. Our sandbox enables embodied agents to navigate, inspect, and manipulate retail items, providing baselines against human performance. We conclude with benchmarks, performance analysis, and recommendations for enhancing realism and scalability.

About the Speakers

Emmanuel G. Maminta is a fourth-year Artificial Intelligence Ph.D. student at the Ubiquitous Computing Laboratory (UCL) in the University of the Philippines Diliman, advised by Prof. Rowel O. Atienza.

Janika Deborah B.Gajo is an undergraduate student studying for a Bachelor of Science in Computer Engineering at the University of the Philippines, Diliman.

Nov 20 - Best of ICCV (Day 2)

Welcome to the Best of ICCV series, your virtual pass to some of the groundbreaking research, insights, and innovations that defined this year’s conference. Live streaming from the authors to you.

Date, Time and Location

Nov 20, 2025 9 AM Pacific Online. Register for the Zoom!

SGBD: Sharpness-Aware Mirror Gradient with BLIP-Based Denoising for Robust Multimodal Product Recommendation

The growing integration of computer vision and machine learning into the retail industry—both online and in physical stores—has driven the adoption of multimodal recommender systems to help users navigate increasingly complex product landscapes. These systems leverage diverse data sources, such as product images, textual descriptions, and user-generated content, to better model user preferences and item characteristics. While the fusion of multimodal data helps address issues like data sparsity and cold-start problems, it also introduces challenges such as information inconsistency, noise, and increased training instability.

In this paper, we analyze these robustness issues through the lens of flat local minima and propose a strategy that incorporates BLIP—a Vision-Language Model with strong denoising capabilities—to mitigate noise in multimodal inputs. Our method, Sharpness-Aware Mirror Gradient with BLIP-Based Denoising (SGBD), is a concise yet effective training strategy that implicitly enhances robustness during optimization. Extensive theoretical and empirical evaluations demonstrate its effectiveness across various multimodal recommendation benchmarks. SGBD offers a scalable solution for improving recommendation performance in real-world retail environments, where noisy, high-dimensional, and fast-evolving product data is the norm, making it a promising paradigm for training robust multi-modal recommender systems in retail industry.

About the Speaker

Kathy Wu holds a Ph.D. in Applied Mathematics and dual M.S. degrees in Computer Science and Quantitative Finance from the University of Southern California (USC), Los Angeles, CA, USA. At USC, she served as a course lecturer, offering ML Foundations and ML for Business Applications in the science school and business school. Her academic research spans high-dimensional statistics, deep learning, and causal inference, etc.

Kathy brings industry experience from Meta, LinkedIn, and Morgan Stanley in the Bay Area and New York City, US, where she focused on AI methodologies and real-world applications. She is currently an Applied Scientist at Amazon, within the Global Store organization, leading projects in E-Commerce Recommendation Systems, Search Engines, Multi-Modal Vision-Language Models (VLMs), and LLM/GenAI in retails.

Her work has been published in top-tier conferences including ICCV, CVPR, ICLR, SIGIR, WACV, etc. At ICCV 2025, she won the Best Paper Award in Retail Vision.

Spatial Mental Modeling from Limited Views

Can VLMs imagine the unobservable space from just a few views, like humans do? Humans form spatial mental models, as internal representations of "unseen space" to reason about layout, perspective, and motion. On our proposed MINDCUBE, we see critical gap systematically on VLMs building robust spatial mental models through representing positions (cognitive mapping), orientations (perspective-taking), and dynamics (mental simulation for ''what-if'' movements). We then explore three approaches to help VLMs approximate spatial mental models, including unseen intermediate views, natural language reasoning chains, and cognitive maps.

The significant improvement comes from ''map-then-reason'' that jointly trains the model to first abstract a cognitive map and then reason upon it. By training models to construct and reason over these internal maps, we boosted accuracy from 37.8% to 60.8% (+23.0%). Adding reinforcement learning pushed performance even further to 70.7% (+32.9%). Our key insight is that such scaffolding of spatial mental models, actively constructing and utilizing internal structured spatial representations with flexible reasoning processes, significantly improves understanding of "unobservable space".

We aim to understand why geometric concepts remain challenging for VLMs and outlining promising research directions towards fostering more robust spatial intelligence.

About the Speaker

Manling Li is an Assistant Professor at Northwestern University and Amazon Scholar. She was a postdoc at Stanford University, and obtained the PhD degree in Computer Science at University of Illinois Urbana-Champaign in 2023. She works on the intersection of language, vision, and robotics, recognized by the MIT TR 35 Under 35, ACL Inaugural Dissertation Award Honorable Mention, ACL’24 Outstanding Paper Award, ACL'20 Best Demo Paper Award, and NAACL'21 Best Demo Paper Award, Microsoft Research PhD Fellowship, EE CS Rising Star, etc.

Forecasting and Visualizing Air Pollution via Sky Images and VLM-Guided Generative Models

Air pollution monitoring is traditionally limited by costly sensors and sparse data coverage. Our research introduces a vision-language model framework that predicts air quality directly from real-world sky images and also simulates skies under varying pollution levels to enhance interpretability and robustness. We further develop visualization techniques to make predictions more understandable for policymakers and the public. This talk will present our methodology, key findings, and implications for sustainable urban environments.

About the Speaker

Mohammad Saleh Vahdatpour is a PhD candidate in Computer Science at Georgia State University specializing in deep learning, vision–language models, and sustainable AI systems. His research bridges generative AI, environmental monitoring, and motion perception, focusing on scalable and energy-efficient models that connect scientific innovation with real-world impact.

Sari Sandbox: A Virtual Retail Store Environment for Embodied AI Agents

We present Sari Sandbox, a high-fidelity, photorealistic 3D retail store simulation for benchmarking embodied agents against human performance in shopping tasks. Addressing a gap in retail-specific sim environments for embodied agent training, Sari Sandbox features over 250 interactive grocery items across three store configurations, controlled via an API. It supports both virtual reality (VR) for human interaction and a vision language model (VLM)-powered embodied agent.

We also introduce SariBench, a dataset of annotated human demonstrations across varied task difficulties. Our sandbox enables embodied agents to navigate, inspect, and manipulate retail items, providing baselines against human performance. We conclude with benchmarks, performance analysis, and recommendations for enhancing realism and scalability.

About the Speakers

Emmanuel G. Maminta is a fourth-year Artificial Intelligence Ph.D. student at the Ubiquitous Computing Laboratory (UCL) in the University of the Philippines Diliman, advised by Prof. Rowel O. Atienza.

Janika Deborah B.Gajo is an undergraduate student studying for a Bachelor of Science in Computer Engineering at the University of the Philippines, Diliman.

Nov 20 - Best of ICCV (Day 2)

Welcome to the Best of ICCV series, your virtual pass to some of the groundbreaking research, insights, and innovations that defined this year’s conference. Live streaming from the authors to you.

Date, Time and Location

Nov 20, 2025 9 AM Pacific Online. Register for the Zoom!

SGBD: Sharpness-Aware Mirror Gradient with BLIP-Based Denoising for Robust Multimodal Product Recommendation

The growing integration of computer vision and machine learning into the retail industry—both online and in physical stores—has driven the adoption of multimodal recommender systems to help users navigate increasingly complex product landscapes. These systems leverage diverse data sources, such as product images, textual descriptions, and user-generated content, to better model user preferences and item characteristics. While the fusion of multimodal data helps address issues like data sparsity and cold-start problems, it also introduces challenges such as information inconsistency, noise, and increased training instability.

In this paper, we analyze these robustness issues through the lens of flat local minima and propose a strategy that incorporates BLIP—a Vision-Language Model with strong denoising capabilities—to mitigate noise in multimodal inputs. Our method, Sharpness-Aware Mirror Gradient with BLIP-Based Denoising (SGBD), is a concise yet effective training strategy that implicitly enhances robustness during optimization. Extensive theoretical and empirical evaluations demonstrate its effectiveness across various multimodal recommendation benchmarks. SGBD offers a scalable solution for improving recommendation performance in real-world retail environments, where noisy, high-dimensional, and fast-evolving product data is the norm, making it a promising paradigm for training robust multi-modal recommender systems in retail industry.

About the Speaker

Kathy Wu holds a Ph.D. in Applied Mathematics and dual M.S. degrees in Computer Science and Quantitative Finance from the University of Southern California (USC), Los Angeles, CA, USA. At USC, she served as a course lecturer, offering ML Foundations and ML for Business Applications in the science school and business school. Her academic research spans high-dimensional statistics, deep learning, and causal inference, etc.

Kathy brings industry experience from Meta, LinkedIn, and Morgan Stanley in the Bay Area and New York City, US, where she focused on AI methodologies and real-world applications. She is currently an Applied Scientist at Amazon, within the Global Store organization, leading projects in E-Commerce Recommendation Systems, Search Engines, Multi-Modal Vision-Language Models (VLMs), and LLM/GenAI in retails.

Her work has been published in top-tier conferences including ICCV, CVPR, ICLR, SIGIR, WACV, etc. At ICCV 2025, she won the Best Paper Award in Retail Vision.

Spatial Mental Modeling from Limited Views

Can VLMs imagine the unobservable space from just a few views, like humans do? Humans form spatial mental models, as internal representations of "unseen space" to reason about layout, perspective, and motion. On our proposed MINDCUBE, we see critical gap systematically on VLMs building robust spatial mental models through representing positions (cognitive mapping), orientations (perspective-taking), and dynamics (mental simulation for ''what-if'' movements). We then explore three approaches to help VLMs approximate spatial mental models, including unseen intermediate views, natural language reasoning chains, and cognitive maps.

The significant improvement comes from ''map-then-reason'' that jointly trains the model to first abstract a cognitive map and then reason upon it. By training models to construct and reason over these internal maps, we boosted accuracy from 37.8% to 60.8% (+23.0%). Adding reinforcement learning pushed performance even further to 70.7% (+32.9%). Our key insight is that such scaffolding of spatial mental models, actively constructing and utilizing internal structured spatial representations with flexible reasoning processes, significantly improves understanding of "unobservable space".

We aim to understand why geometric concepts remain challenging for VLMs and outlining promising research directions towards fostering more robust spatial intelligence.

About the Speaker

Manling Li is an Assistant Professor at Northwestern University and Amazon Scholar. She was a postdoc at Stanford University, and obtained the PhD degree in Computer Science at University of Illinois Urbana-Champaign in 2023. She works on the intersection of language, vision, and robotics, recognized by the MIT TR 35 Under 35, ACL Inaugural Dissertation Award Honorable Mention, ACL’24 Outstanding Paper Award, ACL'20 Best Demo Paper Award, and NAACL'21 Best Demo Paper Award, Microsoft Research PhD Fellowship, EE CS Rising Star, etc.

Forecasting and Visualizing Air Pollution via Sky Images and VLM-Guided Generative Models

Air pollution monitoring is traditionally limited by costly sensors and sparse data coverage. Our research introduces a vision-language model framework that predicts air quality directly from real-world sky images and also simulates skies under varying pollution levels to enhance interpretability and robustness. We further develop visualization techniques to make predictions more understandable for policymakers and the public. This talk will present our methodology, key findings, and implications for sustainable urban environments.

About the Speaker

Mohammad Saleh Vahdatpour is a PhD candidate in Computer Science at Georgia State University specializing in deep learning, vision–language models, and sustainable AI systems. His research bridges generative AI, environmental monitoring, and motion perception, focusing on scalable and energy-efficient models that connect scientific innovation with real-world impact.

Sari Sandbox: A Virtual Retail Store Environment for Embodied AI Agents

We present Sari Sandbox, a high-fidelity, photorealistic 3D retail store simulation for benchmarking embodied agents against human performance in shopping tasks. Addressing a gap in retail-specific sim environments for embodied agent training, Sari Sandbox features over 250 interactive grocery items across three store configurations, controlled via an API. It supports both virtual reality (VR) for human interaction and a vision language model (VLM)-powered embodied agent.

We also introduce SariBench, a dataset of annotated human demonstrations across varied task difficulties. Our sandbox enables embodied agents to navigate, inspect, and manipulate retail items, providing baselines against human performance. We conclude with benchmarks, performance analysis, and recommendations for enhancing realism and scalability.

About the Speakers

Emmanuel G. Maminta is a fourth-year Artificial Intelligence Ph.D. student at the Ubiquitous Computing Laboratory (UCL) in the University of the Philippines Diliman, advised by Prof. Rowel O. Atienza.

Janika Deborah B.Gajo is an undergraduate student studying for a Bachelor of Science in Computer Engineering at the University of the Philippines, Diliman.

Nov 20 - Best of ICCV (Day 2)

Welcome to the Best of ICCV series, your virtual pass to some of the groundbreaking research, insights, and innovations that defined this year’s conference. Live streaming from the authors to you.

Date, Time and Location

Nov 20, 2025 9 AM Pacific Online. Register for the Zoom!

SGBD: Sharpness-Aware Mirror Gradient with BLIP-Based Denoising for Robust Multimodal Product Recommendation

The growing integration of computer vision and machine learning into the retail industry—both online and in physical stores—has driven the adoption of multimodal recommender systems to help users navigate increasingly complex product landscapes. These systems leverage diverse data sources, such as product images, textual descriptions, and user-generated content, to better model user preferences and item characteristics. While the fusion of multimodal data helps address issues like data sparsity and cold-start problems, it also introduces challenges such as information inconsistency, noise, and increased training instability.

In this paper, we analyze these robustness issues through the lens of flat local minima and propose a strategy that incorporates BLIP—a Vision-Language Model with strong denoising capabilities—to mitigate noise in multimodal inputs. Our method, Sharpness-Aware Mirror Gradient with BLIP-Based Denoising (SGBD), is a concise yet effective training strategy that implicitly enhances robustness during optimization. Extensive theoretical and empirical evaluations demonstrate its effectiveness across various multimodal recommendation benchmarks. SGBD offers a scalable solution for improving recommendation performance in real-world retail environments, where noisy, high-dimensional, and fast-evolving product data is the norm, making it a promising paradigm for training robust multi-modal recommender systems in retail industry.

About the Speaker

Kathy Wu holds a Ph.D. in Applied Mathematics and dual M.S. degrees in Computer Science and Quantitative Finance from the University of Southern California (USC), Los Angeles, CA, USA. At USC, she served as a course lecturer, offering ML Foundations and ML for Business Applications in the science school and business school. Her academic research spans high-dimensional statistics, deep learning, and causal inference, etc.

Kathy brings industry experience from Meta, LinkedIn, and Morgan Stanley in the Bay Area and New York City, US, where she focused on AI methodologies and real-world applications. She is currently an Applied Scientist at Amazon, within the Global Store organization, leading projects in E-Commerce Recommendation Systems, Search Engines, Multi-Modal Vision-Language Models (VLMs), and LLM/GenAI in retails.

Her work has been published in top-tier conferences including ICCV, CVPR, ICLR, SIGIR, WACV, etc. At ICCV 2025, she won the Best Paper Award in Retail Vision.

Spatial Mental Modeling from Limited Views

Can VLMs imagine the unobservable space from just a few views, like humans do? Humans form spatial mental models, as internal representations of "unseen space" to reason about layout, perspective, and motion. On our proposed MINDCUBE, we see critical gap systematically on VLMs building robust spatial mental models through representing positions (cognitive mapping), orientations (perspective-taking), and dynamics (mental simulation for ''what-if'' movements). We then explore three approaches to help VLMs approximate spatial mental models, including unseen intermediate views, natural language reasoning chains, and cognitive maps.

The significant improvement comes from ''map-then-reason'' that jointly trains the model to first abstract a cognitive map and then reason upon it. By training models to construct and reason over these internal maps, we boosted accuracy from 37.8% to 60.8% (+23.0%). Adding reinforcement learning pushed performance even further to 70.7% (+32.9%). Our key insight is that such scaffolding of spatial mental models, actively constructing and utilizing internal structured spatial representations with flexible reasoning processes, significantly improves understanding of "unobservable space".

We aim to understand why geometric concepts remain challenging for VLMs and outlining promising research directions towards fostering more robust spatial intelligence.

About the Speaker

Manling Li is an Assistant Professor at Northwestern University and Amazon Scholar. She was a postdoc at Stanford University, and obtained the PhD degree in Computer Science at University of Illinois Urbana-Champaign in 2023. She works on the intersection of language, vision, and robotics, recognized by the MIT TR 35 Under 35, ACL Inaugural Dissertation Award Honorable Mention, ACL’24 Outstanding Paper Award, ACL'20 Best Demo Paper Award, and NAACL'21 Best Demo Paper Award, Microsoft Research PhD Fellowship, EE CS Rising Star, etc.

Forecasting and Visualizing Air Pollution via Sky Images and VLM-Guided Generative Models

Air pollution monitoring is traditionally limited by costly sensors and sparse data coverage. Our research introduces a vision-language model framework that predicts air quality directly from real-world sky images and also simulates skies under varying pollution levels to enhance interpretability and robustness. We further develop visualization techniques to make predictions more understandable for policymakers and the public. This talk will present our methodology, key findings, and implications for sustainable urban environments.

About the Speaker

Mohammad Saleh Vahdatpour is a PhD candidate in Computer Science at Georgia State University specializing in deep learning, vision–language models, and sustainable AI systems. His research bridges generative AI, environmental monitoring, and motion perception, focusing on scalable and energy-efficient models that connect scientific innovation with real-world impact.

Sari Sandbox: A Virtual Retail Store Environment for Embodied AI Agents

We present Sari Sandbox, a high-fidelity, photorealistic 3D retail store simulation for benchmarking embodied agents against human performance in shopping tasks. Addressing a gap in retail-specific sim environments for embodied agent training, Sari Sandbox features over 250 interactive grocery items across three store configurations, controlled via an API. It supports both virtual reality (VR) for human interaction and a vision language model (VLM)-powered embodied agent.

We also introduce SariBench, a dataset of annotated human demonstrations across varied task difficulties. Our sandbox enables embodied agents to navigate, inspect, and manipulate retail items, providing baselines against human performance. We conclude with benchmarks, performance analysis, and recommendations for enhancing realism and scalability.

About the Speakers

Emmanuel G. Maminta is a fourth-year Artificial Intelligence Ph.D. student at the Ubiquitous Computing Laboratory (UCL) in the University of the Philippines Diliman, advised by Prof. Rowel O. Atienza.

Janika Deborah B.Gajo is an undergraduate student studying for a Bachelor of Science in Computer Engineering at the University of the Philippines, Diliman.

Nov 20 - Best of ICCV (Day 2)

Welcome to the Best of ICCV series, your virtual pass to some of the groundbreaking research, insights, and innovations that defined this year’s conference. Live streaming from the authors to you.

Date, Time and Location

Nov 20, 2025 9 AM Pacific Online. Register for the Zoom!

SGBD: Sharpness-Aware Mirror Gradient with BLIP-Based Denoising for Robust Multimodal Product Recommendation

The growing integration of computer vision and machine learning into the retail industry—both online and in physical stores—has driven the adoption of multimodal recommender systems to help users navigate increasingly complex product landscapes. These systems leverage diverse data sources, such as product images, textual descriptions, and user-generated content, to better model user preferences and item characteristics. While the fusion of multimodal data helps address issues like data sparsity and cold-start problems, it also introduces challenges such as information inconsistency, noise, and increased training instability.

In this paper, we analyze these robustness issues through the lens of flat local minima and propose a strategy that incorporates BLIP, a Vision-Language Model with strong denoising capabilities, to mitigate noise in multimodal inputs. Our method, Sharpness-Aware Mirror Gradient with BLIP-Based Denoising (SGBD), is a concise yet effective training strategy that implicitly enhances robustness during optimization. Extensive theoretical and empirical evaluations demonstrate its effectiveness across various multimodal recommendation benchmarks. SGBD offers a scalable solution for improving recommendation performance in real-world retail environments, where noisy, high-dimensional, and fast-evolving product data is the norm, making it a promising paradigm for training robust multimodal recommender systems in the retail industry.
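The abstract does not spell out the SGBD update rule, so as context only, here is a minimal PyTorch sketch of a generic sharpness-aware (SAM-style) training step, the optimizer family the "sharpness-aware" part refers to. It is not the authors' SGBD method, and the model, loss function, and rho value are placeholders.

```python
import torch

def sam_style_step(model, loss_fn, inputs, targets, optimizer, rho=0.05):
    """Generic sharpness-aware update: evaluate the gradient at a nearby
    worst-case point so the optimizer is steered towards flat minima."""
    # 1) Gradient at the current weights.
    loss = loss_fn(model(inputs), targets)
    loss.backward()

    # 2) Climb to the locally sharpest point within an L2 ball of radius rho.
    perturbations = []
    with torch.no_grad():
        grads = [p.grad for p in model.parameters() if p.grad is not None]
        grad_norm = torch.norm(torch.stack([g.norm() for g in grads]))
        for p in model.parameters():
            if p.grad is None:
                perturbations.append(None)
                continue
            e = rho * p.grad / (grad_norm + 1e-12)
            p.add_(e)
            perturbations.append(e)
    optimizer.zero_grad()

    # 3) Gradient at the perturbed weights, then undo the perturbation and step.
    loss_fn(model(inputs), targets).backward()
    with torch.no_grad():
        for p, e in zip(model.parameters(), perturbations):
            if e is not None:
                p.sub_(e)
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```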

About the Speaker

Kathy Wu holds a Ph.D. in Applied Mathematics and dual M.S. degrees in Computer Science and Quantitative Finance from the University of Southern California (USC), Los Angeles, CA, USA. At USC, she served as a course lecturer, teaching ML Foundations and ML for Business Applications in the science and business schools. Her academic research spans high-dimensional statistics, deep learning, and causal inference, among other areas.

Kathy brings industry experience from Meta, LinkedIn, and Morgan Stanley in the Bay Area and New York City, where she focused on AI methodologies and real-world applications. She is currently an Applied Scientist at Amazon, within the Global Store organization, leading projects in E-Commerce Recommendation Systems, Search Engines, Multi-Modal Vision-Language Models (VLMs), and LLM/GenAI in retail.

Her work has been published in top-tier conferences including ICCV, CVPR, ICLR, SIGIR, and WACV. At ICCV 2025, she won the Best Paper Award in Retail Vision.

Spatial Mental Modeling from Limited Views

Can VLMs imagine unobservable space from just a few views, the way humans do? Humans form spatial mental models, internal representations of "unseen space", to reason about layout, perspective, and motion. On our proposed MINDCUBE benchmark, we systematically observe a critical gap in VLMs' ability to build robust spatial mental models by representing positions (cognitive mapping), orientations (perspective-taking), and dynamics (mental simulation of "what-if" movements). We then explore three approaches to help VLMs approximate spatial mental models: unseen intermediate views, natural language reasoning chains, and cognitive maps.

The biggest improvement comes from "map-then-reason", which jointly trains the model to first abstract a cognitive map and then reason over it. By training models to construct and reason over these internal maps, we boosted accuracy from 37.8% to 60.8% (+23.0%). Adding reinforcement learning pushed performance even further, to 70.7% (+32.9%). Our key insight is that this scaffolding of spatial mental models, actively constructing and using internal structured spatial representations with flexible reasoning processes, significantly improves understanding of unobservable space.

We aim to understand why geometric concepts remain challenging for VLMs and to outline promising research directions towards more robust spatial intelligence.
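As a rough illustration of the "map-then-reason" scaffold described above, the sketch below first asks a VLM to emit an explicit cognitive map and then feeds that map back as context for the spatial question. The call_vlm() helper is hypothetical, a stand-in for whatever VLM API is used; this is not the authors' released pipeline.

```python
def call_vlm(images, prompt):
    """Hypothetical stand-in for a vision-language model API call.
    Replace with a real client (hosted API, local VLM, etc.)."""
    raise NotImplementedError("plug in your VLM client here")

def map_then_reason(images, question):
    # Step 1: ask the VLM to abstract an explicit cognitive map of the scene.
    cognitive_map = call_vlm(
        images,
        "From these views, describe a top-down map of the space: list objects, "
        "their approximate positions, and their orientations.",
    )
    # Step 2: reason over the map (plus the original views) to answer the question.
    answer = call_vlm(
        images,
        f"Cognitive map of the scene:\n{cognitive_map}\n\n"
        f"Using this map, answer the question step by step: {question}",
    )
    return answer
```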

About the Speaker

Manling Li is an Assistant Professor at Northwestern University and an Amazon Scholar. She was a postdoc at Stanford University and obtained her PhD in Computer Science from the University of Illinois Urbana-Champaign in 2023. She works at the intersection of language, vision, and robotics, and has been recognized with MIT Technology Review's 35 Innovators Under 35, the ACL Inaugural Dissertation Award Honorable Mention, the ACL'24 Outstanding Paper Award, the ACL'20 Best Demo Paper Award, the NAACL'21 Best Demo Paper Award, the Microsoft Research PhD Fellowship, and EECS Rising Star, among others.

Forecasting and Visualizing Air Pollution via Sky Images and VLM-Guided Generative Models

Air pollution monitoring is traditionally limited by costly sensors and sparse data coverage. Our research introduces a vision-language model framework that predicts air quality directly from real-world sky images and also simulates skies under varying pollution levels to enhance interpretability and robustness. We further develop visualization techniques to make predictions more understandable for policymakers and the public. This talk will present our methodology, key findings, and implications for sustainable urban environments.

About the Speaker

Mohammad Saleh Vahdatpour is a PhD candidate in Computer Science at Georgia State University specializing in deep learning, vision–language models, and sustainable AI systems. His research bridges generative AI, environmental monitoring, and motion perception, focusing on scalable and energy-efficient models that connect scientific innovation with real-world impact.

Sari Sandbox: A Virtual Retail Store Environment for Embodied AI Agents

We present Sari Sandbox, a high-fidelity, photorealistic 3D retail store simulation for benchmarking embodied agents against human performance in shopping tasks. Addressing a gap in retail-specific sim environments for embodied agent training, Sari Sandbox features over 250 interactive grocery items across three store configurations, controlled via an API. It supports both virtual reality (VR) for human interaction and a vision language model (VLM)-powered embodied agent.

We also introduce SariBench, a dataset of annotated human demonstrations across varied task difficulties. Our sandbox enables embodied agents to navigate, inspect, and manipulate retail items, providing baselines against human performance. We conclude with benchmarks, performance analysis, and recommendations for enhancing realism and scalability.
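The abstract notes that the store is controlled via an API but does not document it here, so the snippet below only sketches what a gym-style interaction loop with such an environment might look like. The sari_sandbox module, RetailStoreEnv class, and method names are all hypothetical placeholders, not the released package.

```python
from sari_sandbox import RetailStoreEnv  # hypothetical import, for illustration only

class RandomAgent:
    """Placeholder policy; a real agent would be a VLM-driven planner."""
    def act(self, observation, action_space):
        return action_space.sample()

env = RetailStoreEnv(store_config="small")          # hypothetical constructor
agent = RandomAgent()
obs = env.reset(task="place a can of tomatoes in the basket")

done = False
while not done:
    # The agent maps observations (images, proprioception) to navigation
    # and manipulation actions; the env reports task progress.
    action = agent.act(obs, env.action_space)
    obs, reward, done, info = env.step(action)

env.close()
```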

About the Speakers

Emmanuel G. Maminta is a fourth-year Artificial Intelligence Ph.D. student at the Ubiquitous Computing Laboratory (UCL) at the University of the Philippines Diliman, advised by Prof. Rowel O. Atienza.

Janika Deborah B. Gajo is an undergraduate student pursuing a Bachelor of Science in Computer Engineering at the University of the Philippines Diliman.

Nov 20 - Best of ICCV (Day 2)

Venue: Carnival House, 100 Harbour Parade, Southampton, SO15 1ST

📢 Want to speak? 📢 Submit your talk proposal

Main Talks

1️⃣ Unlocking the Black Box: Demystifying ML Models with Shapley Values - Philip Le

Model explainability is key to building trust and enabling decision-making with ML/AI models. We will dig into the theoretical background of the Shapley value and how it addresses the complexity and bias challenges of black-box models, with several examples showing how to use these techniques in practice.
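As a generic taste of the technique the talk covers (not the speaker's own material), the shap library can explain a tree-based model's predictions in a few lines; the dataset and model below are arbitrary examples.

```python
# Minimal SHAP example: explain a random-forest regressor's predictions.
import shap
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

X, y = load_diabetes(return_X_y=True, as_frame=True)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)   # one additive contribution per feature, per row

shap.summary_plot(shap_values, X)        # global view of which features drive predictions
```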

2️⃣ Digital attribution model using LSTM - Vinoth Chelladurai

How we can use an LSTM to build a click-based digital attribution model that estimates assisted orders for each digital activity. This modelling enables marketing teams to optimise spend and drive better ROI.
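A rough sketch of the idea (not the speaker's actual model): encode each user's ordered sequence of channel touchpoints with an LSTM and predict whether the journey converts. The channel vocabulary and dimensions below are illustrative placeholders.

```python
import torch
import torch.nn as nn

class ClickAttributionLSTM(nn.Module):
    """Encodes a sequence of marketing touchpoints and predicts conversion."""
    def __init__(self, n_channels=10, emb_dim=16, hidden_dim=32):
        super().__init__()
        self.embed = nn.Embedding(n_channels, emb_dim)   # one id per channel (search, email, display, ...)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)              # conversion probability

    def forward(self, touchpoints):                       # (batch, seq_len) of channel ids
        h, _ = self.lstm(self.embed(touchpoints))
        return torch.sigmoid(self.head(h[:, -1]))         # use the final state of the journey

# Toy batch: two journeys of five touchpoints each.
model = ClickAttributionLSTM()
journeys = torch.randint(0, 10, (2, 5))
print(model(journeys))                                    # predicted conversion probabilities
```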

⚡ Lightning Talks ⚡

1️⃣ "AI" - Professional liars? - Marcus Toy
2️⃣ TBA

Please note:

  1. 🚨🚨🚨A valid photo ID is required by building security. You MUST use your initial/first name and surname on your meetup profile, otherwise, you will NOT make it on the guest list! 🚨🚨🚨
  2. This event follows the NumFOCUS Code of Conduct, please familiarise yourself with it before the event.

If your RSVP status says "You're going" you will be able to get in. No further confirmation required. You will NOT need to show your RSVP confirmation when signing in. If you can no longer make it, please unRSVP as soon as you know so we can assign your place to someone on the waiting list.

*** Code of Conduct: This event follows the NumFOCUS Code of Conduct; please familiarise yourself with it before the event. Please get in touch with the organisers with any questions or concerns regarding the Code of Conduct. *** There will be pizza & drinks, generously provided by our host, Carnival UK. ***

Logistics: Doors open at 6.30 pm, talks start at 7 pm. For those who wish to continue networking and chatting, we will move to a nearby pub/bar for drinks from 9 pm.

Please unRSVP in good time if you realise you can't make it. We're limited by building security on the number of attendees, so please free up your place for your fellow community members!

Follow us for updates and early announcements: we are on Bluesky/Instagram/Threads as @pydatasoton, and you can find us on LinkedIn.

PyData Southampton - 20th Meetup
Event: AWS AI In Practice #2, 2025-11-12
Phil Basford – Senior Director @ Cognizant UK&I Consulting

The insurance industry has billions of historical documents, with hundreds of thousands more generated every day. These documents, in varying formats, are used both internally and with other insurers to agree terms, assess risk, and create accurate quotes. Historically, each document could take hours or even days to process manually before being loaded into each company's systems. AI is helping several companies reduce this processing time to minutes by automating the workflow with Intelligent Document Processing (IDP), saving time, increasing accuracy, and readying the data for further analysis, giving valuable insights back to the business.

IDP uses the latest AI services: Amazon Bedrock, Amazon Textract to extract text, Amazon Comprehend to classify documents and detect entities, and custom models trained on labelled ground truth in Amazon SageMaker. This talk will look at these services alongside the pre-processing, processing, and post-processing challenges, and showcase how to combine them for the best results. Using IDP, one customer achieved an accuracy of 90%+ and a more than 500x reduction in processing time across over £500 million worth of business.
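As a rough, assumed illustration of the extraction and entity-detection steps in such a pipeline (the bucket and object names below are placeholders, and the production architecture described in the talk involves far more pre- and post-processing):

```python
# Minimal sketch of the Textract + Comprehend part of an IDP pipeline.
# Bucket/key names are placeholders; error handling and async batching are omitted.
import boto3

textract = boto3.client("textract")
comprehend = boto3.client("comprehend")

# 1) Extract raw text from a scanned document page stored in S3.
response = textract.detect_document_text(
    Document={"S3Object": {"Bucket": "my-claims-bucket", "Name": "policy-schedule-page-1.png"}}
)
text = "\n".join(
    block["Text"] for block in response["Blocks"] if block["BlockType"] == "LINE"
)

# 2) Pull out entities (parties, dates, amounts) for downstream systems.
entities = comprehend.detect_entities(Text=text[:5000], LanguageCode="en")
for entity in entities["Entities"]:
    print(entity["Type"], entity["Text"], round(entity["Score"], 2))
```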

amazon bedrock textract amazon comprehend amazon sagemaker
Ashraful Alam – AWS Cloud Technical Lead @ Legal and General

Amazon Q is powerful out of the box, but in the terminal it becomes a superpower. In this talk, I share how I turned Amazon Q CLI into a fully-fledged, verification-first AI platform using the Model Context Protocol (MCP), all at $0 extra cost. I'll walk through how I built 49 MCP servers, browser automation with Playwright, and live AWS verification that eliminates hallucinations in practice. You'll see Financial Services Institution-grade guardrails (production protection, cost controls, and full audit trails) that make Q feel native to a builder's workflow, secure and reliable. You'll leave with a blueprint to extend Amazon Q, or any agentic coding assistant, beyond chat into a trusted, terminal-first, agentic platform you own and can keep evolving.
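The 49 servers themselves aren't shown here, but the verification-first pattern (have the assistant call a tool that checks real AWS state rather than answering from memory) can be sketched with the MCP Python SDK's FastMCP helper plus boto3. This is a minimal assumed setup, not the speaker's implementation; the server and tool names are illustrative.

```python
# Minimal verification-style MCP server: the assistant can call this tool to check
# whether an S3 bucket really exists instead of guessing. Illustrative only.
import boto3
from botocore.exceptions import ClientError
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("aws-verifier")

@mcp.tool()
def bucket_exists(bucket_name: str) -> str:
    """Return whether the given S3 bucket exists and is accessible."""
    s3 = boto3.client("s3")
    try:
        s3.head_bucket(Bucket=bucket_name)
        return f"Bucket '{bucket_name}' exists and is accessible."
    except ClientError as err:
        return f"Bucket '{bucket_name}' not verified: {err.response['Error']['Code']}"

if __name__ == "__main__":
    mcp.run()  # exposes the tool over stdio so Amazon Q CLI (or any MCP client) can call it
```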

amazon q mcps Playwright aws verification guardrails

REGISTRATION at https://analytics-pioneers.com/community-trainings


Topics:

  • Introduction & Core Concept Server-side GTM
  • Standard Setup GA4
  • Quick and Dirty: GA4 Dummy
  • Creating your own custom tags & clients
  • Further development using examples
  • Pitfalls & limitations

Date: Wednesday, November 12th, 2025 | 10am-12pm (GMT). Trainers: Philipp Abendroth & Beatrix Stade, with Marcus Stade & Patrick Mohr

The training is live and will be held in English. It is not recorded.


SIGN UP

Please register for the training here free of charge: https://analytics-pioneers.com/community-trainings

Custom Tracking Solution with Server-Side GTM || FREE Community Training

👉 Book your place here

Faced with the shortage of tech talent and the rise of AI, inclusion is a strategic lever for innovating and for attracting and retaining both talent and clients.

We will look at the HR mechanisms, governance structures, and training formats that make it possible to durably include all profiles in service of the business… and at the role of tech in this dynamic.

-------------- 🎤 Speakers

  • Marion Ranvier, Executive Director of the ContentSquare Foundation
  • Philippe Trotin, Director of the Disability and Digital Accessibility Mission at Microsoft
  • Caroline Pujo, Business Relationship Manager Sodexo Live! & Lenôtre
  • Sarah Huet, founder of AFemaleAgency

⚠️ To confirm your registration, make sure you fill in the form on this page before the event fills up! We favour an intimate format with a limited number of places to guarantee the quality of peer-to-peer exchanges.

From D&I ambition to tech impact