talk-data.com
People (61 results)
See all 61 →
Frannie Helforoush
Senior Digital Product Manager · RBC Global Asset Management
Jim Sterne
guest · Board Chair, Digital Analytics Association - USA
Gabriella Kusz
Global Public Policy and Management Consultant · Global Digital Asset and Cryptocurrency Association
Companies (2 results)
Activities & events
| Title & Speakers | Event |
|---|---|
|
Nov 19 - Best of ICCV (Day 1)
2025-11-19 · 17:00
Welcome to the Best of ICCV series, your virtual pass to some of the groundbreaking research, insights, and innovations that defined this year’s conference. Live streaming from the authors to you.
Date, Time and Location: Nov 19, 2025, 9 AM Pacific, online. Register for the Zoom!
AnimalClue: Recognizing Animals by their Traces
Wildlife observation plays an important role in biodiversity conservation, necessitating robust methodologies for monitoring wildlife populations and interspecies interactions. Recent advances in computer vision have significantly contributed to automating fundamental wildlife observation tasks, such as animal detection and species identification. However, accurately identifying species from indirect evidence like footprints and feces remains relatively underexplored, despite its importance for wildlife monitoring. To bridge this gap, we introduce AnimalClue, the first large-scale dataset for species identification from images of indirect evidence. Our dataset consists of 159,605 bounding boxes encompassing five categories of indirect clues: footprints, feces, eggs, bones, and feathers. It covers 968 species, 200 families, and 65 orders. Each image is annotated with species-level labels, bounding boxes or segmentation masks, and fine-grained trait information, including activity patterns and habitat preferences. Unlike existing datasets primarily focused on direct visual features (e.g., animal appearances), AnimalClue presents unique challenges for classification, detection, and instance segmentation tasks due to the need for recognizing more detailed and subtle visual features. In our experiments, we extensively evaluate representative vision models and identify key challenges in animal identification from their traces.
About the Speaker: Risa Shinoda received her M.S. and Ph.D. in Agricultural Science from Kyoto University in 2022 and 2025. Since April 2025, she has been serving as a Specially Appointed Assistant Professor at the Graduate School of Information Science and Technology, the University of Osaka. She is engaged in research on the application of image recognition to plants and animals, as well as vision-language models.
LOTS of Fashion! Multi-Conditioning for Image Generation via Sketch-Text Pairing
Fashion design is a complex creative process that blends visual and textual expressions. Designers convey ideas through sketches, which define spatial structure and design elements, and textual descriptions, capturing material, texture, and stylistic details. In this paper, we present LOcalized Text and Sketch for fashion image generation (LOTS), an approach for compositional sketch-text based generation of complete fashion outlooks. LOTS leverages a global description with paired localized sketch + text information for conditioning and introduces a novel step-based merging strategy for diffusion adaptation. First, a Modularized Pair-Centric representation encodes sketches and text into a shared latent space while preserving independent localized features; then, a Diffusion Pair Guidance phase integrates both local and global conditioning via attention-based guidance within the diffusion model’s multi-step denoising process. To validate our method, we build on Fashionpedia to release Sketchy, the first fashion dataset where multiple text-sketch pairs are provided per image. Quantitative results show LOTS achieves state-of-the-art image generation performance on both global and localized metrics, while qualitative examples and a human evaluation study highlight its unprecedented level of design customization.
About the Speaker: Federico Girella is a third-year Ph.D. student at the University of Verona (Italy), supervised by Prof. Marco Cristani, with expected graduation in May 2026. His research involves joint representations in the Image and Language multi-modal domain, working with deep neural networks such as (Large) Vision and Language Models and Text-to-Image Generative Models. His main body of work focuses on Text-to-Image Retrieval and Generation in the Fashion domain.
ProtoMedX: Explainable Multi-Modal Prototype Learning for Bone Health Assessment
Early detection of osteoporosis and osteopenia is critical, yet most AI models for bone health rely solely on imaging and offer little transparency into their decisions. In this talk, I will present ProtoMedX, the first prototype-based framework that combines lumbar spine DEXA scans with patient clinical records to deliver accurate and inherently explainable predictions. Unlike black-box deep networks, ProtoMedX classifies patients by comparing them to learned case-based prototypes, mirroring how clinicians reason in practice. Our method not only achieves state-of-the-art accuracy on a real NHS dataset of 4,160 patients but also provides clear, interpretable explanations aligned with the upcoming EU AI Act requirements for high-risk medical AI. Beyond bone health, this work illustrates how prototype learning can make multi-modal AI both powerful and transparent, offering a blueprint for other safety-critical domains.
About the Speaker: Alvaro Lopez is a PhD candidate in Explainable AI at Lancaster University and an AI Research Associate at J.P. Morgan in London. His research focuses on prototype-based learning, multi-modal AI, and AI security. He has led projects on medical AI, fraud detection, and adversarial robustness, with applications ranging from healthcare to financial systems.
CLASP: Adaptive Spectral Clustering for Unsupervised Per-Image Segmentation
We introduce CLASP (Clustering via Adaptive Spectral Processing), a lightweight framework for unsupervised image segmentation that operates without any labeled data or fine-tuning. CLASP first extracts per-patch features using a self-supervised ViT encoder (DINO); then, it builds an affinity matrix and applies spectral clustering. To avoid manual tuning, we select the segment count automatically with an eigengap-silhouette search, and we sharpen the boundaries with a fully connected DenseCRF. Despite its simplicity and training-free nature, CLASP attains competitive mIoU and pixel accuracy on COCO-Stuff and ADE20K, matching recent unsupervised baselines. The zero-training design makes CLASP a strong, easily reproducible baseline for large unannotated corpora—especially common in digital advertising and marketing workflows such as brand-safety screening, creative asset curation, and social-media content moderation.
About the Speaker: Max Curie is a Research Scientist at Integral Ad Science, building fast, lightweight solutions for brand safety, multimedia classification, and recommendation systems. As a former nuclear physicist at Princeton University, he brings rigorous analytical thinking and modeling discipline from his physics background to advance ad tech. |
Nov 19 - Best of ICCV (Day 1)
|
|
#330 Harnessing AI to Help Humanity with Professor Sandy Pentland, HAI Fellow at Stanford, Co-founder of MIT Media Lab
2025-11-10 · 10:00
Richie – host @ DataCamp, Alex “Sandy” Pentland – Professor @ MIT Media Lab
Data storytelling isn't just about presenting numbers—it's about creating shared wisdom that drives better decision-making. In our increasingly polarized world, we often miss that most people actually have reasonable views hidden behind the loudest voices. But how can technology help us cut through the noise and build genuine understanding? What if AI could help us share stories across different communities and contexts, making our collective knowledge more accessible? From reducing unnecessary meetings to enabling more effective collaboration, the way we exchange information is evolving rapidly. Are you prepared for a future where AI helps us communicate more effectively rather than replacing human judgment?
Professor Alex “Sandy” Pentland is a leading computational scientist, co-founder of the MIT Media Lab and Media Lab Asia, and a HAI Fellow at Stanford. Recognized by Forbes as one of the world’s most powerful data scientists, he played a key role in shaping the GDPR through the World Economic Forum and contributed to the UN’s Sustainable Development Goals as one of the Secretary General’s “Data Revolutionaries.” His accolades include MIT’s Toshiba Chair, election to the U.S. National Academy of Engineering, the Harvard Business Review McKinsey Award, and the DARPA 40th Anniversary of the Internet Award. Pentland has served on advisory boards for organizations such as the UN Secretary General, UN Foundation, Consumers Union, and formerly for the OECD, Google, AT&T, and Nissan. Companies originating from his lab have driven major innovations, including India’s Aadhaar digital identity system, Alibaba’s news and advertising arm, and the world’s largest rural health service network. His more recent ventures span mental health (Ginger.io), AI interaction management (Cogito), delivery optimization (Wise Systems), financial privacy (Akoya), and fairness in social services (Prosperia). A mentor to over 80 PhD students—many now leading in academia, research, or entrepreneurship—Pentland helped pioneer fields such as computational social science, wearable computing, and modern biometrics. His books include Social Physics, Honest Signals, Building the New Economy, and Trusted Data.
In the episode, Richie and Sandy explore the role of storytelling in data and AI, how technology reshapes our narratives, the impact of AI on decision-making, the importance of shared wisdom in communities, and much more.
Links Mentioned in the Show:
MIT Media Lab
Sandy’s Books
deliberation.io
Connect with Sandy
Skill Track: Artificial Intelligence (AI) Leadership
Related Episode: The Human Element of AI-Driven Transformation with Steve Lucas, CEO at Boomi
Rewatch RADAR AI
New to DataCamp? Learn on the go using the DataCamp mobile app
Empower your business with world-class data and AI skills with DataCamp for business |
DataFramed |
|
NeuroTech x Innovation Afterwork: Meet Chinese Top 100 Entrepreneur
2025-11-07 · 18:00
Join us for an exclusive DeepTech Afterwork in the heart of Paris, dedicated to the future of Brain–Computer Interfaces (BCI) and human–technology interaction. Our special guest, Sun Yu, founder & CEO of Flexolink, is a pioneer in non-invasive neurotechnology and one of the Forbes China Top 100 influential innovators. His team’s work in soft neuro-electronic sleep patches and neural signal analysis is redefining the boundaries between neuroscience, AI, and digital health. This event will bring together entrepreneurs, researchers, and investors passionate about DeepTech, AI, NeuroTech, and Human Augmentation — in an informal yet high-level networking setting.
🗓 Date & Time: Friday, 19:00 – late
📍 Location: registration closed
🗣 Languages: English & French
🥂 Format: Short talk + open discussion + networking
📞 Contact: +33 07 80 81 46 79 (by SMS or WhatsApp, please)
Let’s explore how the next generation of brain–machine interfaces can bridge science, entrepreneurship, and human experience. |
NeuroTech x Innovation Afterwork: Meet Chinese Top 100 Entrepreneur
|
|
Oct 15 - Visual AI in Agriculture (Day 1)
2025-10-15 · 16:00
Join us for day one of a series of virtual events to hear talks from experts on the latest developments at the intersection of Visual AI in Agriculture. Date and Time Oct 15 at 9 AM Pacific Location Virtual. Register for the Zoom. Paved2Paradise: Scalable LiDAR Simulation for Real-World Perception Training robust perception models for robotics and autonomy often requires massive, diverse 3D datasets. But collecting and annotating real-world LiDAR point clouds at scale is both expensive and time-consuming, especially when high-quality labels are needed. Paved2Paradise introduces a cost-effective alternative: a scalable LiDAR simulation pipeline that generates realistic, fully annotated datasets with minimal human labeling effort. The key idea is to “factor the real world” by separately capturing background scans (e.g., fields, roads, construction sites) and object scans (e.g., vehicles, people, machinery). By intelligently combining these two sources, Paved2Paradise can synthesize a combinatorially large set of diverse training scenes. The pipeline involves four steps: (1) collecting extensive background LiDAR scans, (2) recording high-resolution scans of target objects under controlled conditions, (3) inserting objects into backgrounds with physically consistent placement and occlusion, and (4) simulating LiDAR geometry to ensure realism. Experiments show that models trained on Paved2Paradise-generated data transfer effectively to the real world, achieving strong detection performance with far less manual annotation compared to conventional dataset collection. The approach is not only cost-efficient, but also flexible—allowing practitioners to easily expand to new object classes or domains by swapping in new background or object scans. For ML practitioners working in robotics, autonomous vehicles, or safety-critical perception, Paved2Paradise highlights a practical path toward scaling training data without scaling costs. It bridges the gap between simulation and real-world performance, enabling faster iteration and more reliable deployment of perception models. About the Speaker Michael A. Alcorn is a Senior Machine Learning Engineer at John Deere\, where he develops deep learning models for LiDAR and RGB perception in safety-critical\, real-time systems. He earned his Ph.D. in Computer Science from Auburn University\, with a dissertation on improving computer vision and spatiotemporal deep neural networks\, and also holds a Graduate Minor in Mathematics. Michael’s research has been cited by researchers at DeepMind\, Google\, Meta\, Microsoft\, and OpenAI\, among others\, and his (batter\|pitcher)2vec paper was a prize-winner at the 2018 MIT Sloan Sports Analytics Conference. He has also contributed machine learning code to scikit-learn and Apache Solr\, and his GitHub repositories—which have collectively received over 2\,100 stars—have served as starting points for research and production code at many different organizations. MothBox: inexpensive, open-source, automated insect monitor Dr. Andy Quitmeyer will talk about the design of an exciting new open source science tool, The Mothbox. The Mothbox is an award winning project for broad scale monitoring of insects for biodiversity. It's a low cost device developed in harsh Panamanian jungles which takes super high resolution photos to then automatically ID the levels of biodiversity in forests and agriculture. 
After thousands of insect observations and hundreds of deployments in Panama, Peru, Mexico, Ecuador, and the US, we are now developing a new, manufacturable version to share this important tool worldwide. We will discuss the development of this device in the jungles of Panama and its importance to studying biodiversity worldwide. About the Speaker Dr. Andy Quitmeyer designs new ways to interact with the natural world. He has worked with large organizations like Cartoon Network, IDEO, and the Smithsonian, taught as a tenure-track professor at the National University of Singapore, and even had his research turned into a (silly) television series called “Hacking the Wild,” distributed by Discovery Networks. Now, he spends most of his time volunteering with smaller organizations, and recently founded the field-station makerspace, Digital Naturalism Laboratories. In the rainforest of Gamboa, Panama, Dinalab blends biological fieldwork and technological crafting with a community of local and international scientists, artists, engineers, and animal rehabilitators. He currently also advises students as an affiliate professor at the University of Washington. Foundation Models for Visual AI in Agriculture Foundation models have enabled a new way to address tasks, by benefitting from emerging capabilities in a zero-shot manner. In this talk I will discuss recent research on enabling visual AI in a zero-shot manner and via fine-tuning. Specifically, I will discuss joint work on RELOCATE, a simple training-free baseline designed to perform the challenging task of visual query localization in long videos. To eliminate the need for task-specific training and efficiently handle long videos, RELOCATE leverages a region-based representation derived from pretrained vision models. I will also discuss joint work on enabling multi-modal large language models (MLLMs) to correctly answer prompts that require a holistic spatio-temporal understanding: MLLMs struggle to answer prompts that refer to 1) the entirety of an environment that an agent equipped with an MLLM can operate in; and simultaneously also refer to 2) recent actions that just happened and are encoded in a video clip. However, such a holistic spatio-temporal understanding is important for agents operating in the real world. Our solution involves development of a dedicated data collection pipeline and fine-tuning of an MLLM equipped with projectors to improve both spatial understanding of an environment and temporal understanding of recent observations. About the Speaker Alex Schwing is an Associate Professor at the University of Illinois at Urbana-Champaign working with talented students on artificial intelligence, generative AI, and computer vision topics. He received his B.S. and diploma in Electrical Engineering and Information Technology from the Technical University of Munich in 2006 and 2008 respectively, and obtained a PhD in Computer Science from ETH Zurich in 2014. Afterwards he joined University of Toronto as a postdoctoral fellow until 2016. His research interests are in the area of artificial intelligence, generative AI, and computer vision, where he has co-authored numerous papers on topics in scene understanding, inference and learning algorithms, deep learning, image and language processing, and generative modeling. His PhD thesis was awarded an ETH medal and his team’s research was awarded an NSF CAREER award. 
Beyond the Lab: Real-World Anomaly Detection for Agricultural Computer Vision Anomaly detection is transforming manufacturing and surveillance, but what about agriculture? Can AI actually detect plant diseases and pest damage early enough to make a difference? This talk demonstrates how anomaly detection identifies and localizes crop problems, using coffee leaf health as the primary example. We'll start with the foundational theory, then examine how these models detect rust and leaf-miner damage in leaf imagery. The session includes a comprehensive hands-on workflow using the open-source FiftyOne computer vision toolkit, covering dataset curation, patch extraction, model training, and result visualization. (A minimal FiftyOne curation sketch in this spirit follows this listing.) You'll gain both a theoretical understanding of anomaly detection in computer vision and practical experience applying these techniques to agricultural challenges and other domains. About the Speaker Paula Ramos has a PhD in Computer Vision and Machine Learning, with more than 20 years of experience in technology. Since the early 2000s, she has been developing novel integrated engineering technologies in Colombia, mainly in computer vision, robotics, and machine learning applied to agriculture. |
Oct 15 - Visual AI in Agriculture (Day 1)
|
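The Paved2Paradise abstract above describes a compositing pipeline: separately captured object scans are placed into background scans with plausible ground contact and occlusion. The snippet below is a minimal, illustrative sketch of that compositing idea only, not the authors' implementation; the file names, the ground-height heuristic, and the crude angular-bin occlusion test are assumptions.

```python
"""Minimal sketch of LiDAR scene compositing in the spirit of Paved2Paradise.

Assumptions (not from the talk): point clouds are (N, 3) float arrays in the
sensor frame with the sensor at the origin, loaded from hypothetical .npy
files; occlusion is approximated with a coarse azimuth-bin depth test.
"""
import numpy as np


def insert_object(background: np.ndarray, obj: np.ndarray,
                  xy: tuple[float, float], bin_deg: float = 0.5) -> np.ndarray:
    """Place `obj` on the local ground near `xy` and drop occluded background points."""
    # 1) Estimate local ground height from background points near the insertion spot.
    near = background[np.linalg.norm(background[:, :2] - xy, axis=1) < 2.0]
    ground_z = np.percentile(near[:, 2], 5) if len(near) else 0.0

    # 2) Translate the object so its lowest point rests on the local ground.
    placed = obj + np.array([xy[0], xy[1], ground_z]) - np.array([0.0, 0.0, obj[:, 2].min()])

    # 3) Crude occlusion: in each azimuth bin, background points farther than the
    #    nearest object point in that bin are shadowed by the object and removed.
    def azimuth_bins(points):
        az = np.degrees(np.arctan2(points[:, 1], points[:, 0]))
        return np.floor(az / bin_deg).astype(int), np.linalg.norm(points[:, :2], axis=1)

    obj_bins, obj_r = azimuth_bins(placed)
    bg_bins, bg_r = azimuth_bins(background)
    min_obj_r = {b: obj_r[obj_bins == b].min() for b in np.unique(obj_bins)}
    keep = np.array([bg_r[i] < min_obj_r.get(b, np.inf) for i, b in enumerate(bg_bins)])

    return np.vstack([background[keep], placed])


# Usage with hypothetical scans:
# background = np.load("orchard_background.npy")   # (N, 3)
# person = np.load("person_scan.npy")              # (M, 3)
# scene = insert_object(background, person, xy=(12.0, -3.5))
```

A real pipeline would model occlusion and ray geometry against the actual sensor specification rather than the coarse azimuth binning used here.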
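The "Beyond the Lab" session above centers on a FiftyOne curation workflow. Below is a minimal sketch of what scoring and reviewing leaf images in FiftyOne might look like; it is not the session's actual notebook, and the folder path, the 0.5 threshold, and the `score_leaf_image` stub are placeholders.

```python
"""Minimal sketch of an anomaly-scoring curation loop with FiftyOne.

Only the FiftyOne calls are real API; the scoring function is a stand-in for
whatever patch-based anomaly model you train.
"""
import fiftyone as fo
from fiftyone import ViewField as F


def score_leaf_image(filepath: str) -> float:
    # Placeholder: plug in your trained anomaly-detection model here.
    return 0.0


# Load a folder of leaf images into a FiftyOne dataset (path is an assumption).
dataset = fo.Dataset.from_images_dir("data/coffee_leaves", name="coffee-leaves")

# Attach an anomaly score to every sample.
for sample in dataset:
    sample["anomaly_score"] = score_leaf_image(sample.filepath)
    sample.save()

# Curate: review the most anomalous images first in the FiftyOne App.
suspicious = dataset.match(F("anomaly_score") > 0.5).sort_by("anomaly_score", reverse=True)
session = fo.launch_app(suspicious)
```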
|
Oct 2 - Women in AI Virtual Event
2025-10-02 · 16:00
Hear talks from experts on the latest topics in AI, ML, and computer vision. Date and Time: Oct 2 at 9 AM Pacific. Location: Virtual. Register for the Zoom. The Hidden Order of Intelligent Systems: Complexity, Autonomy, and the Future of AI As artificial intelligence systems grow more autonomous and integrated into our world, they also become harder to predict, control, and fully understand. This talk explores how complexity theory can help us make sense of these challenges by revealing the hidden patterns that drive collective behavior, adaptation, and resilience in intelligent systems. From emergent coordination among autonomous agents to nonlinear feedback in real-world deployments, we'll explore how order arises from chaos, and what that means for the next generation of AI. Along the way, we'll draw connections to neuroscience, agentic AI, and distributed systems that offer fresh insights into designing multi-faceted AI systems. About the Speaker Ria Cheruvu is a Senior Trustworthy AI Architect at NVIDIA. She holds a master's degree in data science from Harvard and teaches data science and ethical AI across global platforms. Ria is passionate about uncovering the hidden dynamics that shape intelligent systems, from natural networks to artificial ones. Managing Medical Imaging Datasets: From Curation to Evaluation High-quality data is the cornerstone of effective machine learning in healthcare. This talk presents practical strategies and emerging techniques for managing medical imaging datasets, from synthetic data generation and curation to evaluation and deployment. We'll begin by highlighting real-world case studies from leading researchers and practitioners who are reshaping medical imaging workflows through data-centric practices. The session will then transition into a hands-on tutorial using FiftyOne, the open-source platform for visual dataset inspection and model evaluation. Attendees will learn how to load, visualize, curate, and evaluate medical datasets across various imaging modalities. Whether you're a researcher, clinician, or ML engineer, this talk will equip you with practical tools and insights to improve dataset quality, model reliability, and clinical impact. About the Speaker Paula Ramos has a PhD in Computer Vision and Machine Learning, with more than 20 years of experience in technology. Since the early 2000s, she has been developing novel integrated engineering technologies in Colombia, mainly in computer vision, robotics, and machine learning applied to agriculture. Building Agents That Learn: Managing Memory in AI Agents In the rapidly evolving landscape of agentic systems, memory management has emerged as a key pillar for building intelligent, context-aware AI agents. Different types of memory, such as short-term and long-term memory, play distinct roles in supporting an agent's functionality. In this talk, we will explore these types of memory, discuss challenges with managing agentic memory, and present practical solutions for building agentic systems that can learn from their past executions and personalize their interactions over time. (A minimal code sketch of the short-term/long-term memory split follows this listing.) About the Speaker Apoorva Joshi is a Data Scientist turned Developer Advocate with over 7 years of experience applying machine learning to problems in domains such as cybersecurity and mental health. As an AI Developer Advocate at MongoDB, she now helps developers succeed at building AI applications through written content and hands-on workshops.
Human-Centered AI: Soft Skills That Make Visual AI Work in Manufacturing Visual AI systems can spot defects and optimize workflows, but it's people who train, deploy, and trust the results. This session explores the often-overlooked soft skills that make Visual AI implementations successful: communication, cross-functional collaboration, documentation habits, and on-the-floor leadership. Sheena Yap Chan shares practical strategies to reduce resistance to AI tools, improve adoption rates, and build inclusive teams where operators, engineers, and executives align. Attendees will leave with actionable techniques to drive smoother, people-first AI rollouts in manufacturing environments. About the Speaker Sheena Yap Chan is a Wall Street Journal bestselling author, leadership speaker, and consultant who helps organizations develop the confidence, communication, and collaboration skills that drive innovation and team performance, especially in high-tech, high-change industries. She has worked with leaders across engineering, operations, and manufacturing to align people with digital transformation goals. |
Oct 2 - Women in AI Virtual Event
|
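The agents talk above distinguishes short-term from long-term memory. Here is a minimal, framework-free sketch of that split; the `AgentMemory` class, its method names, and the naive keyword-overlap retrieval are illustrative assumptions rather than any particular library's API.

```python
"""Minimal sketch of the short-term vs. long-term memory split for an AI agent.

Everything here is illustrative: a rolling window of recent turns plus a
durable fact store queried with naive keyword overlap.
"""
from collections import deque


class AgentMemory:
    def __init__(self, short_term_turns: int = 8):
        self.short_term = deque(maxlen=short_term_turns)  # recent dialogue turns
        self.long_term: list[str] = []                     # durable facts / past outcomes

    def remember_turn(self, role: str, text: str) -> None:
        self.short_term.append(f"{role}: {text}")

    def store_fact(self, fact: str) -> None:
        self.long_term.append(fact)

    def recall(self, query: str, k: int = 3) -> list[str]:
        # Naive retrieval: rank long-term facts by word overlap with the query.
        q = set(query.lower().split())
        scored = sorted(self.long_term,
                        key=lambda f: len(q & set(f.lower().split())),
                        reverse=True)
        return scored[:k]

    def build_context(self, query: str) -> str:
        # Context an agent might prepend to its next model call.
        return "\n".join(["Relevant memories:", *self.recall(query),
                          "Recent conversation:", *self.short_term])


# Usage:
# mem = AgentMemory()
# mem.store_fact("User prefers metric units.")
# mem.remember_turn("user", "What's the field size in hectares?")
# print(mem.build_context("units preference"))
```

A production system would typically replace the keyword overlap with embedding-based retrieval and persist long-term memory in a database.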
|
Oct 2 - Women in AI Virtual Event
2025-10-02 · 16:00
Hear talks from experts on the latest topics in AI, ML, and computer vision. Date and Time: Oct 2 at 9 AM Pacific. Location: Virtual. Register for the Zoom.

The Hidden Order of Intelligent Systems: Complexity, Autonomy, and the Future of AI. As artificial intelligence systems grow more autonomous and integrated into our world, they also become harder to predict, control, and fully understand. This talk explores how complexity theory can help us make sense of these challenges by revealing the hidden patterns that drive collective behavior, adaptation, and resilience in intelligent systems. From emergent coordination among autonomous agents to nonlinear feedback in real-world deployments, we’ll explore how order arises from chaos and what that means for the next generation of AI. Along the way, we’ll draw connections to neuroscience, agentic AI, and distributed systems that offer fresh insights into designing multi-faceted AI systems. About the Speaker: Ria Cheruvu is a Senior Trustworthy AI Architect at NVIDIA. She holds a master’s degree in data science from Harvard and teaches data science and ethical AI across global platforms. Ria is passionate about uncovering the hidden dynamics that shape intelligent systems, from natural networks to artificial ones.

Managing Medical Imaging Datasets: From Curation to Evaluation. High-quality data is the cornerstone of effective machine learning in healthcare. This talk presents practical strategies and emerging techniques for managing medical imaging datasets, from synthetic data generation and curation to evaluation and deployment. We’ll begin by highlighting real-world case studies from leading researchers and practitioners who are reshaping medical imaging workflows through data-centric practices. The session will then transition into a hands-on tutorial using FiftyOne, the open-source platform for visual dataset inspection and model evaluation (a minimal code sketch follows this listing). Attendees will learn how to load, visualize, curate, and evaluate medical datasets across various imaging modalities. Whether you’re a researcher, clinician, or ML engineer, this talk will equip you with practical tools and insights to improve dataset quality, model reliability, and clinical impact. About the Speaker: Paula Ramos holds a PhD in Computer Vision and Machine Learning and has more than 20 years of experience in technology. Since the early 2000s, she has been developing integrated engineering technologies in Colombia, mainly applying computer vision, robotics, and machine learning to agriculture.

Building Agents That Learn: Managing Memory in AI Agents. In the rapidly evolving landscape of agentic systems, memory management has emerged as a key pillar for building intelligent, context-aware AI agents. Different types of memory, such as short-term and long-term memory, play distinct roles in supporting an agent’s functionality. In this talk, we will explore these types of memory, discuss the challenges of managing agentic memory, and present practical solutions for building agentic systems that learn from their past executions and personalize their interactions over time (see the memory sketch after this listing). About the Speaker: Apoorva Joshi is a Data Scientist turned Developer Advocate with over 7 years of experience applying machine learning to problems in domains such as cybersecurity and mental health. As an AI Developer Advocate at MongoDB, she helps developers build AI applications successfully through written content and hands-on workshops.

Human-Centered AI: Soft Skills That Make Visual AI Work in Manufacturing. Visual AI systems can spot defects and optimize workflows, but it is people who train, deploy, and trust the results. This session explores the often-overlooked soft skills that make Visual AI implementations successful: communication, cross-functional collaboration, documentation habits, and on-the-floor leadership. Sheena Yap Chan shares practical strategies to reduce resistance to AI tools, improve adoption rates, and build inclusive teams where operators, engineers, and executives align. Attendees will leave with actionable techniques to drive smoother, people-first AI rollouts in manufacturing environments. About the Speaker: Sheena Yap Chan is a Wall Street Journal bestselling author, leadership speaker, and consultant who helps organizations develop the confidence, communication, and collaboration skills that drive innovation and team performance, especially in high-tech, high-change industries. She has worked with leaders across engineering, operations, and manufacturing to align people with digital transformation goals. |
Oct 2 - Women in AI Virtual Event
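The medical-imaging session in the listing above centers on FiftyOne, an open-source dataset tool. As a rough illustration of the load, curate, and visualize loop the abstract describes, the sketch below builds a classification dataset from a local image folder. The directory path, dataset name, and label values are assumptions for illustration only, not materials from the talk.

```python
# Illustrative only: a minimal FiftyOne workflow for a medical imaging dataset.
# The directory layout, dataset name, and label values are assumptions.
import fiftyone as fo
from fiftyone import ViewField as F

# Build a dataset from an assumed folder of chest X-ray images organized as
# /path/to/xrays/<class-name>/<image>.png
dataset = fo.Dataset.from_dir(
    dataset_dir="/path/to/xrays",
    dataset_type=fo.types.ImageClassificationDirectoryTree,
    name="xray-demo",
)

# Inspect a curated slice: only samples whose ground-truth label is "pneumonia"
pneumonia_view = dataset.match(F("ground_truth.label") == "pneumonia")
print(pneumonia_view)

# If model predictions were stored in a "predictions" field, evaluation against
# the ground truth could look like this:
# results = dataset.evaluate_classifications(
#     "predictions", gt_field="ground_truth", eval_key="eval"
# )
# results.print_report()

# Launch the FiftyOne App to browse and visually curate the samples
session = fo.launch_app(dataset)
session.wait()
```

The agent-memory talk contrasts short-term and long-term memory without prescribing an implementation. The toy class below is a hypothetical sketch of that split, a bounded recent-context buffer plus a durable store with naive keyword retrieval; it is not the speaker's design, and real systems would typically back long-term memory with a database and embedding search.

```python
# Illustrative only: a toy separation of short-term and long-term agent memory.
# Class and method names are hypothetical and not taken from the talk.
from collections import deque


class AgentMemory:
    """Keeps a bounded short-term buffer plus an append-only long-term store."""

    def __init__(self, short_term_size: int = 10):
        self.short_term = deque(maxlen=short_term_size)  # recent turns only
        self.long_term = []  # durable record of past executions

    def remember(self, event: dict) -> None:
        """Record an interaction in both memories."""
        self.short_term.append(event)
        self.long_term.append(event)

    def recall(self, keyword: str) -> list:
        """Naive long-term retrieval; real systems would use embeddings or search."""
        return [e for e in self.long_term if keyword.lower() in str(e).lower()]

    def context(self) -> list:
        """Short-term context an agent would pass to its next model call."""
        return list(self.short_term)


# Example: later runs can be personalized from what was stored earlier
memory = AgentMemory(short_term_size=3)
memory.remember({"user": "alice", "preference": "concise answers"})
memory.remember({"task": "summarize report", "outcome": "success"})
print(memory.recall("alice"))
print(memory.context())
```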
|