talk-data.com
Activities & events
Nov 19 - Best of ICCV (Day 1)
2025-11-19 · 17:00
Welcome to the Best of ICCV series, your virtual pass to some of the groundbreaking research, insights, and innovations that defined this year’s conference, live-streamed from the authors to you.

Date, Time and Location: Nov 19, 2025, 9 AM Pacific, online. Register for the Zoom!

AnimalClue: Recognizing Animals by their Traces

Wildlife observation plays an important role in biodiversity conservation, necessitating robust methodologies for monitoring wildlife populations and interspecies interactions. Recent advances in computer vision have significantly contributed to automating fundamental wildlife observation tasks, such as animal detection and species identification. However, accurately identifying species from indirect evidence like footprints and feces remains relatively underexplored, despite its importance for wildlife monitoring. To bridge this gap, we introduce AnimalClue, the first large-scale dataset for species identification from images of indirect evidence. Our dataset consists of 159,605 bounding boxes encompassing five categories of indirect clues: footprints, feces, eggs, bones, and feathers. It covers 968 species, 200 families, and 65 orders. Each image is annotated with species-level labels, bounding boxes or segmentation masks, and fine-grained trait information, including activity patterns and habitat preferences. Unlike existing datasets primarily focused on direct visual features (e.g., animal appearances), AnimalClue presents unique challenges for classification, detection, and instance segmentation tasks due to the need for recognizing more detailed and subtle visual features. In our experiments, we extensively evaluate representative vision models and identify key challenges in identifying animals from their traces.

About the Speaker: Risa Shinoda received her M.S. and Ph.D. in Agricultural Science from Kyoto University in 2022 and 2025, respectively. Since April 2025, she has been a Specially Appointed Assistant Professor at the Graduate School of Information Science and Technology, the University of Osaka. Her research focuses on applying image recognition to plants and animals, as well as on vision-language models.

LOTS of Fashion! Multi-Conditioning for Image Generation via Sketch-Text Pairing

Fashion design is a complex creative process that blends visual and textual expressions. Designers convey ideas through sketches, which define spatial structure and design elements, and textual descriptions, which capture material, texture, and stylistic details. In this paper, we present LOcalized Text and Sketch for fashion image generation (LOTS), an approach for compositional sketch-text-based generation of complete fashion outlooks. LOTS leverages a global description with paired localized sketch + text information for conditioning and introduces a novel step-based merging strategy for diffusion adaptation. First, a Modularized Pair-Centric representation encodes sketches and text into a shared latent space while preserving independent localized features; then, a Diffusion Pair Guidance phase integrates both local and global conditioning via attention-based guidance within the diffusion model’s multi-step denoising process. To validate our method, we build on Fashionpedia to release Sketchy, the first fashion dataset where multiple text-sketch pairs are provided per image.
Quantitative results show LOTS achieves state-of-the-art image generation performance on both global and localized metrics, while qualitative examples and a human evaluation study highlight its unprecedented level of design customization.

About the Speaker: Federico Girella is a third-year Ph.D. student at the University of Verona (Italy), supervised by Prof. Marco Cristani, with expected graduation in May 2026. His research involves joint representations in the image and language multi-modal domain, working with deep neural networks such as (large) vision and language models and text-to-image generative models. His main body of work focuses on text-to-image retrieval and generation in the fashion domain.

ProtoMedX: Explainable Multi-Modal Prototype Learning for Bone Health Assessment

Early detection of osteoporosis and osteopenia is critical, yet most AI models for bone health rely solely on imaging and offer little transparency into their decisions. In this talk, I will present ProtoMedX, the first prototype-based framework that combines lumbar spine DEXA scans with patient clinical records to deliver accurate and inherently explainable predictions. Unlike black-box deep networks, ProtoMedX classifies patients by comparing them to learned case-based prototypes, mirroring how clinicians reason in practice. Our method not only achieves state-of-the-art accuracy on a real NHS dataset of 4,160 patients but also provides clear, interpretable explanations aligned with the upcoming EU AI Act requirements for high-risk medical AI. Beyond bone health, this work illustrates how prototype learning can make multi-modal AI both powerful and transparent, offering a blueprint for other safety-critical domains.

About the Speaker: Alvaro Lopez is a PhD candidate in Explainable AI at Lancaster University and an AI Research Associate at J.P. Morgan in London. His research focuses on prototype-based learning, multi-modal AI, and AI security. He has led projects on medical AI, fraud detection, and adversarial robustness, with applications ranging from healthcare to financial systems.
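
To make the case-based idea more concrete, here is a minimal, generic sketch of a prototype-based classifier head in PyTorch. It is not the authors’ ProtoMedX architecture: the fused embedding, the three-class bone-health scheme, and the prototype count are illustrative assumptions.

```python
import torch
import torch.nn as nn

class PrototypeHead(nn.Module):
    """Generic prototype-based classifier head (illustrative, not ProtoMedX itself)."""

    def __init__(self, embed_dim: int = 128, num_classes: int = 3, prototypes_per_class: int = 5):
        super().__init__()
        # Learnable prototype bank living in the fused (image + clinical) embedding space
        self.prototypes = nn.Parameter(torch.randn(num_classes * prototypes_per_class, embed_dim))
        self.register_buffer(
            "prototype_class", torch.arange(num_classes).repeat_interleave(prototypes_per_class)
        )
        self.num_classes = num_classes

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        # z: (B, embed_dim) fused embedding of a DEXA scan plus clinical record
        dists = torch.cdist(z, self.prototypes)   # (B, num_prototypes) distances to every prototype
        evidence = -dists                         # closer prototype -> stronger evidence
        # Class score = best-matching prototype of that class; the winning prototype
        # doubles as the "similar case" that can be shown as an explanation.
        scores = torch.stack(
            [evidence[:, self.prototype_class == c].max(dim=1).values for c in range(self.num_classes)],
            dim=1,
        )
        return scores  # (B, num_classes) logits for cross-entropy training

# Usage sketch: z would come from separate image/tabular encoders fused upstream
head = PrototypeHead()
logits = head(torch.randn(4, 128))  # -> shape (4, 3)
```
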
CLASP: Adaptive Spectral Clustering for Unsupervised Per-Image Segmentation

We introduce CLASP (Clustering via Adaptive Spectral Processing), a lightweight framework for unsupervised image segmentation that operates without any labeled data or fine-tuning. CLASP first extracts per-patch features using a self-supervised ViT encoder (DINO); then, it builds an affinity matrix and applies spectral clustering. To avoid manual tuning, we select the segment count automatically with an eigengap-silhouette search, and we sharpen the boundaries with a fully connected DenseCRF. Despite its simplicity and training-free nature, CLASP attains competitive mIoU and pixel accuracy on COCO-Stuff and ADE20K, matching recent unsupervised baselines. The zero-training design makes CLASP a strong, easily reproducible baseline for large unannotated corpora, which are especially common in digital advertising and marketing workflows such as brand-safety screening, creative asset curation, and social-media content moderation.

About the Speaker: Max Curie is a Research Scientist at Integral Ad Science, building fast, lightweight solutions for brand safety, multimedia classification, and recommendation systems. As a former nuclear physicist at Princeton University, he brings rigorous analytical thinking and modeling discipline from his physics background to advance ad tech.
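
For readers curious about the core of such a pipeline, below is a rough NumPy/scikit-learn sketch of the affinity-matrix and spectral-clustering steps, assuming per-patch DINO features have already been extracted into an (N, D) array. The cosine-based affinity, the eigengap-only model selection (the talk pairs it with a silhouette criterion), and the omitted DenseCRF refinement are simplifications, not the authors’ exact implementation.

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import SpectralClustering


def estimate_num_segments(affinity: np.ndarray, k_max: int = 8) -> int:
    """Pick the segment count via the eigengap of the normalized graph Laplacian."""
    deg = affinity.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg + 1e-8)
    lap = np.eye(len(affinity)) - d_inv_sqrt[:, None] * affinity * d_inv_sqrt[None, :]
    eigvals = eigh(lap, eigvals_only=True)[: k_max + 1]  # smallest eigenvalues, ascending
    gaps = np.diff(eigvals)
    return int(np.argmax(gaps[1:]) + 2)                  # skip the trivial first gap; k >= 2


def segment_patches(patch_feats: np.ndarray, grid_hw: tuple, sigma: float = 0.5) -> np.ndarray:
    """patch_feats: (N, D) ViT patch embeddings (e.g. DINO); returns a patch-level label grid."""
    feats = patch_feats / (np.linalg.norm(patch_feats, axis=1, keepdims=True) + 1e-8)
    affinity = np.exp((feats @ feats.T - 1.0) / sigma)   # cosine similarity mapped to (0, 1]
    k = estimate_num_segments(affinity)
    labels = SpectralClustering(
        n_clusters=k, affinity="precomputed", random_state=0
    ).fit_predict(affinity)
    return labels.reshape(grid_hw)  # a fully connected DenseCRF pass would sharpen boundaries
```

For a 224x224 input to a ViT-S/16 backbone, grid_hw would be (14, 14); the resulting patch-level mask is then upsampled and refined to pixel resolution.
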
Oct 16 - Visual AI in Agriculture (Day 2)
2025-10-16 · 16:00
Join us for day two of a series of virtual events featuring talks from experts on the latest developments at the intersection of visual AI and agriculture.

Date and Time: Oct 16 at 9 AM Pacific. Location: Virtual. Register for the Zoom.

Field-Ready Vision: Building the Agricultural Image Repository (AgIR) for Sustainable Farming

Data, not models, is the bottleneck in agricultural computer vision. This talk shares how Precision Sustainable Agriculture (PSA) is tackling that gap with the Agricultural Image Repository (AgIR): a cloud bank of high-resolution, labeled images spanning weeds (40+ species), cover crops, and cash crops across regions, seasons, and sensors. We’ll show how AgIR blends two complementary streams: (1) semi-field, high-throughput data captured by BenchBot, our open-source, modular gantry that autonomously images plants and feeds a semi-automated annotation pipeline; and (2) true field images that capture real environmental variability. Together, they cut labeling cost, accelerate pretraining, and improve robustness in production. On top of AgIR, we’ve built a data-centric training stack: hierarchical augmentation groups, batch mixers, a stand-alone visualizer for rapid iteration, and a reproducible PyTorch Lightning pipeline. We’ll cover practical lessons from segmentation (crop/weed/residue/water/soil), handling domain shift between semi-field and field scenes, and designing metadata schemas that actually pay off at model time.

About the Speaker: Sina Baghbanijam is a Ph.D. candidate in Electrical and Computer Engineering at North Carolina State University, where his research centers on generative AI, computer vision, and machine learning. His work bridges advanced AI methods with real-world applications across agriculture, medicine, and the social sciences, with a focus on large-scale image segmentation, bias-aware modeling, and data-driven analysis. In addition to his academic research, Sina is currently serving as an Agricultural Image Repository Software Engineering Intern with Precision Sustainable Agriculture, where he develops scalable pipelines and metadata systems to support AI-driven analysis of crop, soil, and field imagery.
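
As a flavor of what a reproducible PyTorch Lightning segmentation pipeline of this kind can look like, here is a minimal sketch. Only the five-class crop/weed/residue/water/soil scheme comes from the talk description; the DeepLabV3 backbone, loss, optimizer, and data loading are illustrative assumptions, not the actual AgIR training stack.

```python
import pytorch_lightning as pl
import torch
import torch.nn.functional as F
from torchvision.models.segmentation import deeplabv3_resnet50


class FieldSegmenter(pl.LightningModule):
    """Minimal semantic-segmentation module for field imagery (illustrative sketch)."""

    def __init__(self, num_classes: int = 5, lr: float = 1e-4):
        super().__init__()
        self.save_hyperparameters()
        self.model = deeplabv3_resnet50(weights=None, num_classes=num_classes)

    def forward(self, x):
        return self.model(x)["out"]  # (B, num_classes, H, W) logits

    def training_step(self, batch, batch_idx):
        images, masks = batch  # masks: (B, H, W) integer class ids
        logits = self(images)
        loss = F.cross_entropy(logits, masks)
        self.log("train_loss", loss, prog_bar=True)
        return loss

    def configure_optimizers(self):
        return torch.optim.AdamW(self.parameters(), lr=self.hparams.lr)


# Usage sketch (dataloaders yielding AgIR-style image/mask pairs are assumed to exist):
# trainer = pl.Trainer(max_epochs=20, accelerator="auto", deterministic=True)
# trainer.fit(FieldSegmenter(), train_dataloaders=train_loader)
```
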
Beyond Manual Measurements: How AI is Accelerating Plant Breeding

Traditional plant breeding relies on manual phenotypic measurements that are time-intensive, subjective, and create bottlenecks in variety development. This presentation demonstrates how computer vision and artificial intelligence are revolutionizing plant selection processes by automating trait extraction from simple photographs. Our cloud-based platform transforms images captured with smartphones, drones, or laboratory cameras into instant, quantitative phenotypic data, including fruit count, size measurements, and weight estimations. The system integrates phenotypic data with genotypic, pedigree, and environmental information in a unified database, enabling real-time analytics and decision support through intuitive dashboards. Unlike expensive hardware-dependent solutions, our software-focused approach works with existing camera equipment and standard breeding workflows, making advanced phenotyping accessible to organizations of all sizes.

About the Speaker: Dr. Sharon Inch is a botanist with a PhD in Plant Pathology and over 20 years of experience in horticulture and agricultural research. Throughout her career, she has witnessed firsthand the inefficiencies of traditional breeding methods, inspiring her to found AgriVision Analytics. As CEO, she leads the development of cloud-based computer vision platforms that transform plant breeding workflows through AI-powered phenotyping. Her work focuses on accelerating variety development and improving breeding decision-making through automated trait extraction and data integration. She is passionate about bridging the gap between advanced technology and practical agricultural applications to address global food security challenges.

AI-assisted sweetpotato yield estimation pipelines using optical sensor data

In this presentation, we will introduce the sensor systems and AI-powered analysis algorithms used in high-throughput sweetpotato post-harvest packing pipelines, developed by the Optical Sensing Lab at NC State University. By collecting image data from sweetpotato fields and packing lines, we aim to quantitatively optimize grading and yield estimation, as well as the planning of storage and inventory-order matching. We built two customized sensor devices to collect data from the top bins when receiving sweetpotatoes from farmers and from the eliminator table before the grading and packing process. We also developed a compact instance segmentation pipeline that can run on smartphones for rapid in-field yield estimation under resource limitations. To minimize data privacy concerns and Internet connectivity issues, we keep all the analysis pipelines on the edge, which results in a design tradeoff between resource availability and environmental constraints; we will also discuss sensor building with these considerations in mind. The analysis results and real-time production information are then integrated into an interactive online dashboard that stakeholders can use for inventory-order management and operational decisions.

About the Speaker: Yifan Wu is a Ph.D. candidate at NC State University working in the Optical Sensing Lab (OSL), supervised by Dr. Michael Kudenov. His research focuses on developing sensor systems and machine learning platforms for business intelligence applications.

An End-to-End AgTech Use Case in FiftyOne

The agricultural sector is increasingly turning to computer vision to tackle challenges in crop monitoring, pest detection, and yield optimization. Yet developing robust models in this space often requires careful data exploration, curation, and evaluation, steps that are just as critical as model training itself. In this talk, we will walk through an end-to-end AgTech use case using FiftyOne, an open-source tool for dataset visualization, curation, and model evaluation. Starting with a pest detection dataset, we will explore the samples and annotations to understand dataset quality and potential pitfalls. From there, we will curate the dataset by filtering, tagging, and identifying edge cases that could impact downstream performance. Next, we’ll train a computer vision model to detect different pest species and demonstrate how FiftyOne can be used to rigorously evaluate the results. Along the way, we’ll highlight how dataset-centric workflows can accelerate experimentation, improve model reliability, and surface actionable insights specific to agricultural applications.
By the end of the session, attendees will gain a practical understanding of how to:
- Explore and diagnose real-world agricultural datasets
- Curate training data for improved performance
- Train and evaluate pest detection models
- Use FiftyOne to close the loop between data and models

This talk will be valuable for anyone working at the intersection of agriculture and computer vision, whether you’re building production models or just beginning to explore AgTech use cases.

About the Speaker: Prerna Dhareshwar is a Machine Learning Engineer at Voxel51, where she helps customers leverage FiftyOne to accelerate dataset curation, model development, and evaluation in real-world AI workflows. She brings extensive experience building and deploying computer vision and machine learning systems across industries. Prior to Voxel51, Prerna was a Senior Machine Learning Engineer at Instrumental Inc., where she developed models for defect detection in manufacturing, and a Machine Learning Software Engineer at Pure Storage, focusing on predictive analytics and automation.
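
To ground the explore, curate, and evaluate loop described in this talk, here is a minimal FiftyOne sketch. The dataset path, field names, and the assumption that a "predictions" field already exists (e.g. added via dataset.apply_model()) are placeholders, not a specific pest dataset.

```python
import fiftyone as fo
from fiftyone import ViewField as F

# Load a COCO-format pest detection dataset from disk (path is a placeholder)
dataset = fo.Dataset.from_dir(
    dataset_dir="/path/to/pest-dataset",
    dataset_type=fo.types.COCODetectionDataset,
    label_field="ground_truth",
    name="pest-detection",
)

# Explore: launch the app and eyeball samples and annotations
session = fo.launch_app(dataset)

# Curate: e.g., flag tiny boxes that are likely annotation noise for review
tiny_boxes = dataset.filter_labels(
    "ground_truth", F("bounding_box")[2] * F("bounding_box")[3] < 0.001
)
tiny_boxes.tag_samples("review")

# Evaluate: compare a model's "predictions" field against ground truth
results = dataset.evaluate_detections(
    "predictions",
    gt_field="ground_truth",
    eval_key="eval",
    compute_mAP=True,
)
results.print_report()
print("mAP:", results.mAP())
```
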
|
Oct 16 - Visual AI in Agriculture (Day 2)
2025-10-16 · 16:00
Join us for day two of a series of virtual events to hear talks from experts on the latest developments at the intersection of Visual AI in Agriculture. Date and Time Oct 16 at 9 AM Pacific Location Virtual. Register for the Zoom. Field-Ready Vision: Building the Agricultural Image Repository (AgIR) for Sustainable Farming Data—not models—is the bottleneck in agricultural computer vision. This talk shares how Precision Sustainable Agriculture (PSA) is tackling that gap with the Agricultural Image Repository (AgIR): a cloud bank of high-resolution, labeled images spanning weeds (40+ species), cover crops, and cash crops across regions, seasons, and sensors. We’ll show how AgIR blends two complementary streams: (1) semi-field, high-throughput data captured by BenchBot, our open-source, modular gantry that autonomously images plants and feeds a semi-automated annotation pipeline; (2) true field images that capture real environmental variability. Together, they cut labeling cost, accelerate pretraining, and improve robustness in production. On top of AgIR, we’ve built a data-centric training stack: hierarchical augmentation groups, batch mixers, a stand-alone visualizer for rapid iteration, and a reproducible PyTorch Lightning pipeline. We’ll cover practical lessons from segmentation (crop/weed/residue/water/soil), handling domain shift between semi-field and field scenes, and designing metadata schemas that actually pay off at model time. About the Speaker Sina Baghbanijam is a Ph.D. candidate in Electrical and Computer Engineering at North Carolina State University, where his research centers on generative AI, computer vision, and machine learning. His work bridges advanced AI methods with real-world applications across agriculture, medicine, and the social sciences, with a focus on large-scale image segmentation, bias-aware modeling, and data-driven analysis. In addition to his academic research, Sina is currently serving as an Agricultural Image Repository Software Engineering Intern with Precision Sustainable Agriculture, where he develops scalable pipelines and metadata systems to support AI-driven analysis of crop, soil, and field imagery. Beyond Manual Measurements: How AI is Accelerating Plant Breeding Traditional plant breeding relies on manual phenotypic measurements that are time-intensive, subjective, and create bottlenecks in variety development. This presentation demonstrates how computer vision and artificial intelligence are revolutionizing plant selection processes by automating trait extraction from simple photographs. Our cloud-based platform transforms images captured with smartphones, drones, or laboratory cameras into instant, quantitative phenotypic data including fruit count, size measurements, and weight estimations. The system integrates phenotypic data with genotypic, pedigree, and environmental information in a unified database, enabling real-time analytics and decision support through intuitive dashboards. Unlike expensive hardware-dependent solutions, our software-focused approach works with existing camera equipment and standard breeding workflows, making advanced phenotyping accessible to organizations of all sizes. About the Speaker Dr. Sharon Inch is a botanist with a PhD in Plant Pathology and over 20 years of experience in horticulture and agricultural research. Throughout her career, she has witnessed firsthand the inefficiencies of traditional breeding methods, inspiring her to found AgriVision Analytics. 
As CEO, she leads the development of cloud-based computer vision platforms that transform plant breeding workflows through AI-powered phenotyping. Her work focuses on accelerating variety development and improving breeding decision-making through automated trait extraction and data integration. Dr. Sharon Inch is passionate about bridging the gap between advanced technology and practical agricultural applications to address global food security challenges. AI-assisted sweetpotato yield estimation pipelines using optical sensor data In this presentation, we will introduce the sensor systems and AI-powered analysis algorithms used in high-throughput sweetpotato post-harvest packing pipelines (developed by the Optical Sensing Lab at NC State University). By collecting image data from sweetpotato fields and packing lines, we aim to quantitatively optimize the grading and yield estimation process, as well as planning for storage and inventory-order matching. We built two customized sensor devices to collect data from the top bins where sweetpotatoes are received from farmers and from the eliminator table before the grading and packing process. We also developed a compact instance segmentation pipeline that can run on smartphones for rapid in-field yield estimation under resource limitations. To minimize data privacy concerns and Internet connectivity issues, we keep all the analysis pipelines on the edge, which introduces a design tradeoff between resource availability and environmental constraints. We will also describe how the sensors were built with these considerations in mind. The analysis results and real-time production information are then integrated into an interactive online dashboard that stakeholders can use for inventory-order management and operational decision-making. About the Speaker Yifan Wu is a Ph.D. candidate at NC State University working in the Optical Sensing Lab (OSL), supervised by Dr. Michael Kudenov. His research focuses on developing sensor systems and machine learning platforms for business intelligence applications. An End-to-End AgTech Use Case in FiftyOne The agricultural sector is increasingly turning to computer vision to tackle challenges in crop monitoring, pest detection, and yield optimization. Yet, developing robust models in this space often requires careful data exploration, curation, and evaluation—steps that are just as critical as model training itself. In this talk, we will walk through an end-to-end AgTech use case using FiftyOne, an open-source tool for dataset visualization, curation, and model evaluation. Starting with a pest detection dataset, we will explore the samples and annotations to understand dataset quality and potential pitfalls. From there, we will curate the dataset by filtering, tagging, and identifying edge cases that could impact downstream performance. Next, we’ll train a computer vision model to detect different pest species and demonstrate how FiftyOne can be used to rigorously evaluate the results. Along the way, we’ll highlight how dataset-centric workflows can accelerate experimentation, improve model reliability, and surface actionable insights specific to agricultural applications.
By the end of the session, attendees will gain a practical understanding of how to:
- Explore and diagnose real-world agricultural datasets
- Curate training data for improved performance
- Train and evaluate pest detection models
- Use FiftyOne to close the loop between data and models
This talk will be valuable for anyone working at the intersection of agriculture and computer vision, whether you’re building production models or just beginning to explore AgTech use cases. About the Speaker Prerna Dhareshwar is a Machine Learning Engineer at Voxel51, where she helps customers leverage FiftyOne to accelerate dataset curation, model development, and evaluation in real-world AI workflows. She brings extensive experience building and deploying computer vision and machine learning systems across industries. Prior to Voxel51, Prerna was a Senior Machine Learning Engineer at Instrumental Inc., where she developed models for defect detection in manufacturing, and a Machine Learning Software Engineer at Pure Storage, focusing on predictive analytics and automation. |
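To make the FiftyOne walkthrough above concrete, here is a minimal sketch of that kind of workflow: load a detection dataset, explore it in the App, curate it by filtering labels and tagging edge cases, and evaluate a trained model's predictions. The dataset path, field names, and class names below are illustrative placeholders, not details from the talk.

```python
import fiftyone as fo
from fiftyone import ViewField as F

# Load a pest detection dataset from disk (path and format are placeholders)
dataset = fo.Dataset.from_dir(
    dataset_dir="/data/pest-detection",
    dataset_type=fo.types.COCODetectionDataset,
    name="pest-detection-demo",
)

# Explore: open the App to browse samples and annotations
session = fo.launch_app(dataset)

# Curate: keep only the classes of interest, then tag tiny boxes
# (relative area < 1%) as edge cases for review
view = dataset.filter_labels(
    "ground_truth", F("label").is_in(["aphid", "beetle", "moth"])
)
small_objects = view.filter_labels(
    "ground_truth", F("bounding_box")[2] * F("bounding_box")[3] < 0.01
)
small_objects.tag_samples("edge_case")

# Evaluate: assumes a trained detector has written its outputs to a
# "predictions" field on each sample
results = dataset.evaluate_detections(
    "predictions",
    gt_field="ground_truth",
    eval_key="eval",
    compute_mAP=True,
)
results.print_report()
```

In a real session, the curated views (for example, the tagged edge cases) would feed back into the training split before the evaluation step closes the loop between data and models.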
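The AgIR talk earlier in this entry also mentions a reproducible PyTorch Lightning pipeline for crop/weed/residue/water/soil segmentation. As a rough, hypothetical sketch of what such a Lightning module can look like (the tiny network, loss, and hyperparameters are assumptions, not PSA's actual code):

```python
import torch
import torch.nn as nn
import pytorch_lightning as pl

class SegmentationModule(pl.LightningModule):
    """Minimal 5-class semantic segmentation module (crop, weed, residue, water, soil)."""

    def __init__(self, num_classes: int = 5, lr: float = 1e-3):
        super().__init__()
        self.save_hyperparameters()
        # Toy fully convolutional network; a real pipeline would use a
        # pretrained encoder-decoder (e.g., a U-Net-style model)
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, num_classes, 1),
        )
        self.loss = nn.CrossEntropyLoss()

    def forward(self, x):
        return self.net(x)

    def training_step(self, batch, batch_idx):
        images, masks = batch  # masks: (B, H, W) with class indices 0..4
        logits = self(images)
        loss = self.loss(logits, masks)
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=self.hparams.lr)

# trainer = pl.Trainer(max_epochs=10, deterministic=True)  # reproducible runs
# trainer.fit(SegmentationModule(), train_dataloaders=train_loader)
```

A production pipeline would swap in a pretrained backbone, the AgIR data loaders, and augmentation groups, but the Lightning structure (training_step, configure_optimizers, logged metrics) would stay the same.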
Oct 16 - Visual AI in Agriculture (Day 2)
|
|
Oct 15 - Visual AI in Agriculture (Day 1)
2025-10-15 · 16:00
Join us for day one of a series of virtual events to hear talks from experts on the latest developments at the intersection of Visual AI in Agriculture. Date and Time Oct 15 at 9 AM Pacific Location Virtual. Register for the Zoom. Paved2Paradise: Scalable LiDAR Simulation for Real-World Perception Training robust perception models for robotics and autonomy often requires massive, diverse 3D datasets. But collecting and annotating real-world LiDAR point clouds at scale is both expensive and time-consuming, especially when high-quality labels are needed. Paved2Paradise introduces a cost-effective alternative: a scalable LiDAR simulation pipeline that generates realistic, fully annotated datasets with minimal human labeling effort. The key idea is to “factor the real world” by separately capturing background scans (e.g., fields, roads, construction sites) and object scans (e.g., vehicles, people, machinery). By intelligently combining these two sources, Paved2Paradise can synthesize a combinatorially large set of diverse training scenes. The pipeline involves four steps: (1) collecting extensive background LiDAR scans, (2) recording high-resolution scans of target objects under controlled conditions, (3) inserting objects into backgrounds with physically consistent placement and occlusion, and (4) simulating LiDAR geometry to ensure realism. Experiments show that models trained on Paved2Paradise-generated data transfer effectively to the real world, achieving strong detection performance with far less manual annotation compared to conventional dataset collection. The approach is not only cost-efficient, but also flexible—allowing practitioners to easily expand to new object classes or domains by swapping in new background or object scans. For ML practitioners working in robotics, autonomous vehicles, or safety-critical perception, Paved2Paradise highlights a practical path toward scaling training data without scaling costs. It bridges the gap between simulation and real-world performance, enabling faster iteration and more reliable deployment of perception models. About the Speaker Michael A. Alcorn is a Senior Machine Learning Engineer at John Deere\, where he develops deep learning models for LiDAR and RGB perception in safety-critical\, real-time systems. He earned his Ph.D. in Computer Science from Auburn University\, with a dissertation on improving computer vision and spatiotemporal deep neural networks\, and also holds a Graduate Minor in Mathematics. Michael’s research has been cited by researchers at DeepMind\, Google\, Meta\, Microsoft\, and OpenAI\, among others\, and his (batter\|pitcher)2vec paper was a prize-winner at the 2018 MIT Sloan Sports Analytics Conference. He has also contributed machine learning code to scikit-learn and Apache Solr\, and his GitHub repositories—which have collectively received over 2\,100 stars—have served as starting points for research and production code at many different organizations. MothBox: inexpensive, open-source, automated insect monitor Dr. Andy Quitmeyer will talk about the design of an exciting new open source science tool, The Mothbox. The Mothbox is an award winning project for broad scale monitoring of insects for biodiversity. It's a low cost device developed in harsh Panamanian jungles which takes super high resolution photos to then automatically ID the levels of biodiversity in forests and agriculture. 
After thousands of insect observations and hundreds of deployments in Panama, Peru, Mexico, Ecuador, and the US, we are now developing a new, manufacturable version to share this important tool worldwide. We will discuss the development of this device in the jungles of Panama and its importance to studying biodiversity worldwide. About the Speaker Dr. Andy Quitmeyer designs new ways to interact with the natural world. He has worked with large organizations like Cartoon Network, IDEO, and the Smithsonian, taught as a tenure-track professor at the National University of Singapore, and even had his research turned into a (silly) television series called “Hacking the Wild,” distributed by Discovery Networks. Now, he spends most of his time volunteering with smaller organizations, and recently founded the field-station makerspace, Digital Naturalism Laboratories. In the rainforest of Gamboa, Panama, Dinalab blends biological fieldwork and technological crafting with a community of local and international scientists, artists, engineers, and animal rehabilitators. He currently also advises students as an affiliate professor at the University of Washington. Foundation Models for Visual AI in Agriculture Foundation models have enabled a new way to address tasks, by benefitting from emerging capabilities in a zero-shot manner. In this talk I will discuss recent research on enabling visual AI in a zero-shot manner and via fine-tuning. Specifically, I will discuss joint work on RELOCATE, a simple training-free baseline designed to perform the challenging task of visual query localization in long videos. To eliminate the need for task-specific training and efficiently handle long videos, RELOCATE leverages a region-based representation derived from pretrained vision models. I will also discuss joint work on enabling multi-modal large language models (MLLMs) to correctly answer prompts that require a holistic spatio-temporal understanding: MLLMs struggle to answer prompts that refer to 1) the entirety of an environment that an agent equipped with an MLLM can operate in; and simultaneously also refer to 2) recent actions that just happened and are encoded in a video clip. However, such a holistic spatio-temporal understanding is important for agents operating in the real world. Our solution involves development of a dedicated data collection pipeline and fine-tuning of an MLLM equipped with projectors to improve both spatial understanding of an environment and temporal understanding of recent observations. About the Speaker Alex Schwing is an Associate Professor at the University of Illinois at Urbana-Champaign working with talented students on artificial intelligence, generative AI, and computer vision topics. He received his B.S. and diploma in Electrical Engineering and Information Technology from the Technical University of Munich in 2006 and 2008 respectively, and obtained a PhD in Computer Science from ETH Zurich in 2014. Afterwards he joined University of Toronto as a postdoctoral fellow until 2016. His research interests are in the area of artificial intelligence, generative AI, and computer vision, where he has co-authored numerous papers on topics in scene understanding, inference and learning algorithms, deep learning, image and language processing, and generative modeling. His PhD thesis was awarded an ETH medal and his team’s research was awarded an NSF CAREER award. 
Beyond the Lab: Real-World Anomaly Detection for Agricultural Computer Vision Anomaly detection is transforming manufacturing and surveillance, but what about agriculture? Can AI actually detect plant diseases and pest damage early enough to make a difference? This talk demonstrates how anomaly detection identifies and localizes crop problems using coffee leaf health as our primary example. We'll start with the foundational theory, then examine how these models detect rust and miner damage in leaf imagery. The session includes a comprehensive hands-on workflow using the open-source FiftyOne computer vision toolkit, covering dataset curation, patch extraction, model training, and result visualization. You'll gain both theoretical understanding of anomaly detection in computer vision and practical experience applying these techniques to agricultural challenges and other domains. About the Speaker Paula Ramos has a PhD in Computer Vision and Machine Learning, with more than 20 years of experience in the technological field. She has been developing novel integrated engineering technologies, mainly in Computer Vision, robotics, and Machine Learning applied to agriculture, since the early 2000s in Colombia. |
Oct 15 - Visual AI in Agriculture (Day 1)
|
|
Oct 15 - Visual AI in Agriculture (Day 1)
2025-10-15 · 16:00
Join us for day one of a series of virtual events to hear talks from experts on the latest developments at the intersection of Visual AI in Agriculture. Date and Time Oct 15 at 9 AM Pacific Location Virtual. Register for the Zoom. Paved2Paradise: Scalable LiDAR Simulation for Real-World Perception Training robust perception models for robotics and autonomy often requires massive, diverse 3D datasets. But collecting and annotating real-world LiDAR point clouds at scale is both expensive and time-consuming, especially when high-quality labels are needed. Paved2Paradise introduces a cost-effective alternative: a scalable LiDAR simulation pipeline that generates realistic, fully annotated datasets with minimal human labeling effort. The key idea is to “factor the real world” by separately capturing background scans (e.g., fields, roads, construction sites) and object scans (e.g., vehicles, people, machinery). By intelligently combining these two sources, Paved2Paradise can synthesize a combinatorially large set of diverse training scenes. The pipeline involves four steps: (1) collecting extensive background LiDAR scans, (2) recording high-resolution scans of target objects under controlled conditions, (3) inserting objects into backgrounds with physically consistent placement and occlusion, and (4) simulating LiDAR geometry to ensure realism. Experiments show that models trained on Paved2Paradise-generated data transfer effectively to the real world, achieving strong detection performance with far less manual annotation compared to conventional dataset collection. The approach is not only cost-efficient, but also flexible—allowing practitioners to easily expand to new object classes or domains by swapping in new background or object scans. For ML practitioners working in robotics, autonomous vehicles, or safety-critical perception, Paved2Paradise highlights a practical path toward scaling training data without scaling costs. It bridges the gap between simulation and real-world performance, enabling faster iteration and more reliable deployment of perception models. About the Speaker Michael A. Alcorn is a Senior Machine Learning Engineer at John Deere\, where he develops deep learning models for LiDAR and RGB perception in safety-critical\, real-time systems. He earned his Ph.D. in Computer Science from Auburn University\, with a dissertation on improving computer vision and spatiotemporal deep neural networks\, and also holds a Graduate Minor in Mathematics. Michael’s research has been cited by researchers at DeepMind\, Google\, Meta\, Microsoft\, and OpenAI\, among others\, and his (batter\|pitcher)2vec paper was a prize-winner at the 2018 MIT Sloan Sports Analytics Conference. He has also contributed machine learning code to scikit-learn and Apache Solr\, and his GitHub repositories—which have collectively received over 2\,100 stars—have served as starting points for research and production code at many different organizations. MothBox: inexpensive, open-source, automated insect monitor Dr. Andy Quitmeyer will talk about the design of an exciting new open source science tool, The Mothbox. The Mothbox is an award winning project for broad scale monitoring of insects for biodiversity. It's a low cost device developed in harsh Panamanian jungles which takes super high resolution photos to then automatically ID the levels of biodiversity in forests and agriculture. 
After thousands of insect observations and hundreds of deployments in Panama, Peru, Mexico, Ecuador, and the US, we are now developing a new, manufacturable version to share this important tool worldwide. We will discuss the development of this device in the jungles of Panama and its importance to studying biodiversity worldwide. About the Speaker Dr. Andy Quitmeyer designs new ways to interact with the natural world. He has worked with large organizations like Cartoon Network, IDEO, and the Smithsonian, taught as a tenure-track professor at the National University of Singapore, and even had his research turned into a (silly) television series called “Hacking the Wild,” distributed by Discovery Networks. Now, he spends most of his time volunteering with smaller organizations, and recently founded the field-station makerspace, Digital Naturalism Laboratories. In the rainforest of Gamboa, Panama, Dinalab blends biological fieldwork and technological crafting with a community of local and international scientists, artists, engineers, and animal rehabilitators. He currently also advises students as an affiliate professor at the University of Washington. Foundation Models for Visual AI in Agriculture Foundation models have enabled a new way to address tasks, by benefitting from emerging capabilities in a zero-shot manner. In this talk I will discuss recent research on enabling visual AI in a zero-shot manner and via fine-tuning. Specifically, I will discuss joint work on RELOCATE, a simple training-free baseline designed to perform the challenging task of visual query localization in long videos. To eliminate the need for task-specific training and efficiently handle long videos, RELOCATE leverages a region-based representation derived from pretrained vision models. I will also discuss joint work on enabling multi-modal large language models (MLLMs) to correctly answer prompts that require a holistic spatio-temporal understanding: MLLMs struggle to answer prompts that refer to 1) the entirety of an environment that an agent equipped with an MLLM can operate in; and simultaneously also refer to 2) recent actions that just happened and are encoded in a video clip. However, such a holistic spatio-temporal understanding is important for agents operating in the real world. Our solution involves development of a dedicated data collection pipeline and fine-tuning of an MLLM equipped with projectors to improve both spatial understanding of an environment and temporal understanding of recent observations. About the Speaker Alex Schwing is an Associate Professor at the University of Illinois at Urbana-Champaign working with talented students on artificial intelligence, generative AI, and computer vision topics. He received his B.S. and diploma in Electrical Engineering and Information Technology from the Technical University of Munich in 2006 and 2008 respectively, and obtained a PhD in Computer Science from ETH Zurich in 2014. Afterwards he joined University of Toronto as a postdoctoral fellow until 2016. His research interests are in the area of artificial intelligence, generative AI, and computer vision, where he has co-authored numerous papers on topics in scene understanding, inference and learning algorithms, deep learning, image and language processing, and generative modeling. His PhD thesis was awarded an ETH medal and his team’s research was awarded an NSF CAREER award. 
Beyond the Lab: Real-World Anomaly Detection for Agricultural Computer Vision Anomaly detection is transforming manufacturing and surveillance, but what about agriculture? Can AI detect plant diseases and pest damage early enough to make a difference? This talk demonstrates how anomaly detection identifies and localizes crop problems, using coffee leaf health as the primary example. We'll start with the foundational theory, then examine how these models detect rust and leaf-miner damage in leaf imagery. The session includes a comprehensive hands-on workflow using the open-source FiftyOne computer vision toolkit, covering dataset curation, patch extraction, model training, and result visualization (a minimal sketch of such a workflow follows this listing). You'll gain both a theoretical understanding of anomaly detection in computer vision and practical experience applying these techniques to agricultural challenges and other domains. About the Speaker Paula Ramos has a PhD in Computer Vision and Machine Learning and more than 20 years of experience in technology. Since the early 2000s in Colombia, she has been developing novel integrated engineering technologies, mainly in computer vision, robotics, and machine learning applied to agriculture. |
Oct 15 - Visual AI in Agriculture (Day 1)
|
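As a companion to the anomaly-detection session above, here is a minimal sketch of the kind of FiftyOne workflow it describes: curate a leaf dataset, attach a per-image anomaly score, and review the highest-scoring samples in the App. The directory layout (`/data/coffee-leaves/<class>/...`), the field name `anomaly_score`, and the naive "mean healthy leaf" scorer are assumptions for illustration; the session itself uses a proper anomaly-detection model and also covers patch extraction and training.

```python
# Hedged sketch of a FiftyOne curation + review loop for coffee-leaf imagery.
# Paths, class names, and the stand-in scorer are assumed for this example.
import numpy as np
from PIL import Image

import fiftyone as fo
from fiftyone import ViewField as F

# Assumed layout: /data/coffee-leaves/<class>/<image>.jpg with classes
# such as "healthy", "rust", and "miner".
dataset = fo.Dataset.from_dir(
    dataset_dir="/data/coffee-leaves",
    dataset_type=fo.types.ImageClassificationDirectoryTree,
    name="coffee-leaves",
)

def load_gray(path, size=(128, 128)):
    """Load an image as a small grayscale array for the toy scorer."""
    return np.asarray(Image.open(path).convert("L").resize(size), dtype=np.float32)

# Stand-in "model": mean absolute deviation from an average healthy leaf.
healthy = dataset.match(F("ground_truth.label") == "healthy")
reference = np.mean([load_gray(s.filepath) for s in healthy.take(100)], axis=0)

for sample in dataset:
    sample["anomaly_score"] = float(np.abs(load_gray(sample.filepath) - reference).mean())
    sample.save()

# Review the most anomalous leaves (likely rust / leaf-miner damage) in the App.
view = dataset.sort_by("anomaly_score", reverse=True)
session = fo.launch_app(view)
session.wait()
```

Swapping the stand-in scorer for a real anomaly model only changes the line that assigns `anomaly_score`; the curation, sorting, and App-review steps stay the same.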