Activities & events
June 27 - AI, Machine Learning and Computer Vision Meetup
2024-06-27 · 17:00

When: June 27, 2024 – 10:00 AM Pacific / 1:00 PM Eastern
Register for the Zoom: https://voxel51.com/computer-vision-events/june-27-2024-ai-machine-learning-computer-vision-meetup/

**Leveraging Pre-trained Text2Image Diffusion Models for Zero-Shot Video Editing**

Text-to-image diffusion models demonstrate remarkable editing capabilities in the image domain, especially since Latent Diffusion Models made diffusion models more scalable. Video editing, by contrast, still has much room for improvement, particularly given the relative scarcity of video datasets compared to image datasets. This talk discusses whether pre-trained text-to-image diffusion models can be used for zero-shot video editing without any fine-tuning stage, and closes with possible future work and interesting research ideas in the field.

About the Speaker: Bariscan Kurtkaya is a KUIS AI Fellow and a graduate student in the Department of Computer Science at Koc University. His research interests lie in exploring and leveraging the capabilities of generative models on 2D and 3D data, including scientific observations from space telescopes.

**Improved Visual Grounding through Self-Consistent Explanations**

Vision-and-language models trained to associate images with text have been shown to be effective for many tasks, including object detection and image segmentation. This talk discusses how to enhance such models’ ability to localize objects in images by fine-tuning them for self-consistent visual explanations. The proposed method augments text-image datasets with paraphrases generated by a large language model and employs SelfEQ, a weakly-supervised strategy that promotes self-consistency in visual explanation maps. This approach broadens the model’s working vocabulary and improves object localization accuracy, as demonstrated by performance gains on competitive benchmarks.

About the Speakers: Dr. Paola Cascante-Bonilla received her Ph.D. in Computer Science from Rice University in 2024, advised by Professor Vicente Ordóñez Román, working on computer vision, natural language processing, and machine learning. She received a Master of Computer Science from the University of Virginia and a B.S. in Engineering from the Tecnológico de Costa Rica. Paola will join Stony Brook University (SUNY) as an Assistant Professor in the Department of Computer Science.

Ruozhen (Catherine) He is a first-year Computer Science PhD student at Rice University, advised by Prof. Vicente Ordóñez, focusing on efficient computer vision algorithms under limited or multimodal supervision. She aims to leverage insights from neuroscience and cognitive psychology to develop interpretable algorithms that achieve human-level intelligence across versatile tasks.

**Combining Hugging Face Transformer Models and Image Data with FiftyOne**

Datasets and models are the two pillars of modern machine learning, but connecting the two can be cumbersome and time-consuming. This lightning talk shows how the integration between Hugging Face and FiftyOne removes that friction, enabling more effective data-model co-development. By the end of the talk, you will be able to download and visualize datasets from the Hugging Face Hub with FiftyOne, apply state-of-the-art transformer models directly to your data, and effortlessly share your datasets with others.

About the Speaker: Jacob Marks, PhD, is a Machine Learning Engineer and Developer Evangelist at Voxel51, where he leads open source efforts in vector search, semantic search, and generative AI for the FiftyOne data-centric AI toolkit. Prior to joining Voxel51, Jacob worked at Google X, Samsung Research, and Wolfram Research.
May 8 - AI, Machine Learning and Computer Vision Meetup
2024-05-08 · 17:00

When: May 8, 2024 – 10:00 AM Pacific / 1:00 PM Eastern
Where: Virtual / Zoom: https://voxel51.com/computer-vision-events/may-8-2024-ai-machine-learning-data-science-meetup/

**To Infer or To Defer: Hazy Oracles in Human+AI Collaboration**

This talk explores the evolving dynamics of human+AI collaboration, focusing on the concept of the human as a “hazy oracle” rather than an infallible source. It outlines the journey of integrating AI systems more deeply into practical applications through human+AI cooperation, discussing the potential value and challenges. The discussion includes the modeling of interaction errors and the strategic choice between immediate AI inference and seeking additional human input, supported by results from a user study on optimizing these collaborations.

About the Speaker: Jason Corso is a Professor of Robotics, Electrical Engineering, and Computer Science at the University of Michigan, and Co-Founder / Chief Scientist at the AI startup Voxel51. His research spans computer vision, robotics, and AI, with over 150 peer-reviewed publications.

**From Research to Industry: Bridging Real-World Applications with Anomalib at the CVPR VAND Challenge**

This talk highlights the role of Anomalib, an open-source deep learning framework, in advancing anomaly detection within AI systems, as showcased at the upcoming CVPR Visual Anomaly and Novelty Detection (VAND) workshop. Anomalib integrates advanced algorithms and tools to facilitate both academic research and practical applications in sectors like manufacturing, healthcare, and security. It features capabilities such as experiment tracking, model optimization, and scalable deployment solutions. The discussion will also cover Anomalib’s participation in the VAND challenge, focusing on robust real-world applications and few-shot learning for anomaly detection.

About the Speaker: Samet Akcay, an AI research engineer and tech lead, specializes in semi- and self-supervised learning, zero- and few-shot anomaly detection, and multi-modality, and is best known for his open-source contributions to the ML/DL community. He is the lead author of Anomalib, a major open-source anomaly detection library, and also maintains the OpenVINO Training Extensions, a low-code transfer learning framework for building computer vision models.

**Learning Robot Perception and Control using Vision with Action**

To achieve general utility, robots must continue to learn in unstructured environments. In this talk, I describe how our mobile manipulation robot uses vision with action to 1) learn visual control, 2) annotate its own training data, and 3) learn to estimate depth for new objects and the environment. Using these techniques, I describe how I led a small group to win consecutive robot competitions against teams from Stanford, MIT, and other universities.

About the Speaker: Brent Griffin is the Perception Lead at Agility Robotics and was previously an assistant research scientist at the University of Michigan, conducting research at the intersection of computer vision, control, and robot learning. He is lead author on publications in all of the top IEEE conferences for computer vision, robotics, and control, and his work has been featured in Popular Science, in IEEE Spectrum, and on the Big Ten Network.

**Anomaly Detection with Anomalib and FiftyOne**

Most anomaly detection techniques are unsupervised, meaning that anomaly detection models are trained on unlabeled, non-anomalous data. Developing the highest-quality dataset and data pipeline is therefore essential to training robust anomaly detection models. In this brief walkthrough, I will show how to leverage the open-source FiftyOne and Anomalib libraries to build deployment-ready anomaly detection models. First, we will load and visualize the MVTec AD dataset in the FiftyOne App. Next, we will use Albumentations to test out augmentation techniques. We will then train an anomaly detection model with Anomalib and evaluate it with FiftyOne.

About the Speaker: Jacob Marks is a Senior Machine Learning Engineer and Researcher at Voxel51, where he leads open source efforts in vector search, semantic search, and generative AI for the FiftyOne data-centric AI toolkit. Prior to joining Voxel51, Jacob worked at Google X, Samsung Research, and Wolfram Research.
|
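The anomaly-detection walkthrough above rests on one core idea: the model is fit on normal (non-anomalous) data only, and new samples are scored by how far they fall from that distribution. As a hedged sketch of that idea only (this is not the Anomalib API; the feature shapes and names below are made up for illustration), a Gaussian fit plus Mahalanobis distance captures the essence:

```python
# Sketch of unsupervised anomaly detection: fit a Gaussian to features of
# normal samples only, then score new samples by Mahalanobis distance.
# Higher score = more anomalous. Illustrative only, not the Anomalib API.
import numpy as np

rng = np.random.default_rng(0)

# "Training" features: normal samples only (e.g., embeddings of good parts)
normal = rng.normal(loc=0.0, scale=1.0, size=(500, 8))

mean = normal.mean(axis=0)
cov = np.cov(normal, rowvar=False)
cov_inv = np.linalg.inv(cov + 1e-6 * np.eye(cov.shape[0]))  # regularized inverse

def anomaly_score(x: np.ndarray) -> float:
    """Mahalanobis distance of x from the normal-data distribution."""
    d = x - mean
    return float(np.sqrt(d @ cov_inv @ d))

normal_sample = rng.normal(0.0, 1.0, size=8)      # looks like training data
anomalous_sample = np.full(8, 6.0)                # far outside the cluster

print(anomaly_score(normal_sample) < anomaly_score(anomalous_sample))  # True
```

Production libraries like Anomalib build on the same principle with learned deep features, thresholding, and deployment tooling rather than raw Gaussians.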
London and Kraków Joint Meetup: Jacob Wang and Paweł Marks
2024-03-14 · 17:45
🎉 The London Scala User Group is having a joint meetup with Krakow! 🎉 Come along to the London Scala Talks! This month, we'll be hearing from Jacob Wang (London) and Paweł Marks (streamed from Krakow Scala User Group). Whether you're a complete beginner to creative coding or an experienced Scala tinkerer, there's plenty to enjoy. We look forward to seeing you! ———————————————————— If you can't make it in person, join the webinar at: https://3ds.zoom.us/j/81445941841?pwd=Ykp6YVVMWGdnRFNZUkovVEcza3FvZz09 Passcode: 719040 ———————————————————— *Agenda* 5:45pm - 🍻 Doors open. Come along and grab a drink! 6:15pm - 🗣️ Jacob Wang: Deep Dive into Context Propagation and Otel4s 7:00pm - 🍻 Intermission 7:15pm - 🗣️ Paweł Marks: Conjuring types from the Void 8:00pm - 🍕 Intermission: Join us for some free food and drinks! Vegan, vegetarian and gluten free options are provided. Let us know if you'd like something special - we'd be happy to accommodate. 9:00pm - 🍻 Join us in a pub to discuss the talks! ———————————————————— 🗣️ Jacob Wang: Deep Dive into Context Propagation and Otel4s In this talk, we will look at various ways in Scala where "context" can be passed around without using function parameters. We'll investigate how each of these works under the hood, and use this knowledge to make otel4s and Java libraries work together. (otel4s is an OpenTelemetry implementation for the Typelevel ecosystem) ———————————————————— ⭐ Jacob Wang ⭐ Jacob is a software developer at Medidata. He is passionate about better ways to solve problems with software, whether that’s functional programming, better tools or a cat on his lap. ———————————————————— 🗣️ Paweł Marks: Conjuring types from the Void We like static typing. It gives us a sense of safety, correct code, and those sweet, sweet suggestions in the IDE. But the world we live in is not a type-safe place. 
The code we write is just a tiny airship of compile-time verification floating over the void full of amorphous JSONs, raw data frames, or SQL results full of nulls. I will be here to show you how to conjure some types from this void. This won't be another ORM, annotation processor, or other heavy codegen you know from other languages. I will show you the tools added to Scala 3 to deal with dynamic data. They are lightweight, codegen- and reflection-free, and most importantly, have great support in the IDEs. You will also see how to create a simple, intuitive, and beginner-friendly API that will use the powerful features of Scala under the hood. ———————————————————— ⭐ Paweł Marks ⭐ Paweł is involved in shaping the future of Scala 3 as a member of the Scala Improvement Process Committee and as a leader of the compiler team at VirtusLab in Krakow, Poland. Having been involved as a developer and architect in JVM tooling for many years, in 2020, he returned to Scala - his favorite language from college - and became part of the compiler team shortly after. He is now mainly responsible for the Scala release cycle and helping other compiler developers. ———————————————————— 🗣️ Would you like to present, but are not sure how to start? Give a talk with us and you'll receive mentorship from a trained toastmaster! Get in touch and we'll get you started: https://forms.gle/zv5i9eeto1BsnSwe8 🏡 Interested in hosting or supporting us? Please get in touch and we can discuss how you can get involved: https://forms.gle/3SX3Bm6zHqVodBaMA 📜 All London Scala User Group events operate under the Scala Community Code of Conduct: https://www.scala-lang.org/conduct/ We encourage each of you to report any breach of the Code of Conduct, either anonymously or by contacting one of our team members. We guarantee privacy and confidentiality, and that we will take your report seriously and react quickly. https://forms.gle/9PMMorUWgBnbk1mm6 |
London and Kraków Joint Meetup: Jacob Wang and Paweł Marks
|
|
Transforming Computer Vision with LLMs
2023-11-15 · 20:00
Large language models (LLMs) are revolutionizing the way we interact with computers and the world around us. However, in order to truly understand the world, LLM-powered agents need to be able to see. While vision-language models present a promising pathway to such multimodal understanding, it turns out that text-only LLMs can achieve remarkable success with prompting and tool use. In this talk, Jacob Marks will give an overview of key LLM-centered projects that are transforming the field of computer vision, such as VisProg, ViperGPT, VoxelGPT, and HuggingGPT. He will also discuss his first-hand experience of building VoxelGPT, shedding light on the challenges and lessons learned, as well as a practitioner’s insights into domain-specific prompt engineering. He will conclude with his thoughts on the future of LLMs in computer vision. This event is open to all and is especially relevant for researchers and practitioners interested in computer vision, generative AI, LLMs, and machine learning. RSVP now for an enlightening session! |
Transforming Computer Vision with LLMs
|
|
Transforming Computer Vision with LLMs
2023-11-14 · 20:00
Large language models (LLMs) are revolutionizing the way we interact with computers and the world around us. However, in order to truly understand the world, LLM-powered agents need to be able to see. While vision-language models present a promising pathway to such multimodal understanding, it turns out that text-only LLMs can achieve remarkable success with prompting and tool use. In this talk, Jacob Marks will give an overview of key LLM-centered projects that are transforming the field of computer vision, such as VisProg, ViperGPT, VoxelGPT, and HuggingGPT. He will also discuss his first-hand experience of building VoxelGPT, shedding light on the challenges and lessons learned, as well as a practitioner’s insights into domain-specific prompt engineering. He will conclude with his thoughts on the future of LLMs in computer vision. This event is open to all and is especially relevant for researchers and practitioners interested in computer vision, generative AI, LLMs, and machine learning. RSVP now for an enlightening session! |
Transforming Computer Vision with LLMs
|
|
Aug 2023 - FiftyOne Computer Vision Community Office Hours and AMA
2023-08-16 · 16:00
Join us for a 30-minute interactive session where we’ll demo what’s new in the open-source FiftyOne computer vision toolset. We'll show you some handy tips and tricks, plus explore new and interesting integrations, datasets, and more. We'll also include an Ask Me Anything section at the end. So, make sure to bring your most challenging FiftyOne questions! Zoom Link: https://us02web.zoom.us/meeting/register/tZMtc-ivrjMiHdyXfN3K0SpAw69XQRZ0EdSD#/registration Speakers this month include Jacob Marks and Allen Lee, both machine learning engineers at Voxel51. This is an interactive community call - audio and video will be enabled for all participants! See you there! |
Aug 2023 - FiftyOne Computer Vision Community Office Hours and AMA
|
|
July 2023 Computer Vision Meetup (Virtual - EU and Americas)
2023-07-13 · 17:00
Zoom Link https://voxel51.com/computer-vision-events/july-2023-computer-vision-meetup/ Unleashing the Potential of Visual Data: Vector Databases in Computer Vision Discover the game-changing role of vector databases in computer vision applications. These specialized databases excel at handling unstructured visual data, thanks to their robust support for embeddings and lightning-fast similarity search. Join us as we explore advanced indexing algorithms and showcase real-world examples in healthcare, retail, finance, and more using the FiftyOne engine combined with the Milvus vector database. See how vector databases unlock the full potential of your visual data. Speaker Filip Haltmayer is a Software Engineer at Zilliz working in both software and community development. Computer Vision Applications at Scale with Vector Databases Vector databases enable semantic search at scale over hundreds of millions of unstructured data objects. In this talk I will introduce how you can use multi-modal encoders with the Weaviate vector database to semantically search over images and text. This will include demos across multiple domains including e-commerce and healthcare. Speaker Zain Hasan is a senior developer advocate at Weaviate, an open source vector database. Reverse Image Search for Ecommerce Without Going Crazy Traditional full-text-based search engines have been on the market for a while and we are all currently trying to extend them with semantic search. Still, it might be more beneficial for some ecommerce businesses to introduce reverse image search capabilities instead of relying on text only. However, both semantic search and reverse image search can and should coexist! You may encounter common pitfalls while implementing both, so why don't we discuss the best practices? Let's learn how to extend your existing search system with reverse image search, without getting lost in the process! 
Speaker Kacper Łukawski is a Developer Advocate at Qdrant - an open-source neural search engine. Fast and Flexible Data Discovery & Mining for Computer Vision at Petabyte Scale Improving model performance requires methods to discover computer vision data, sometimes from large repositories, whether it's similar examples to errors previously seen, new examples/scenarios, or more advanced techniques such as active learning and RLHF. LanceDB makes this fast and flexible for multi-modal data, with support for vector search, SQL, Pandas, Polars, Arrow and a growing ecosystem of tools that you're familiar with. We'll walk through some common search examples and show how you can find needles in a haystack to improve your metrics! Speaker Jai Chopra is Head of Product at LanceDB How-To Build Scalable Image and Text Search for Computer Vision Data using Pinecone and Qdrant Have you ever wanted to find the images most similar to an image in your dataset? What if you haven’t picked out an illustrative image yet, but you can describe what you are looking for using natural language? And what if your dataset contains millions, or tens of millions of images? In this talk Jacob will show you step-by-step how to integrate all the technology required to enable search for similar images, search with natural language, plus scaling the searches with Pinecone and Qdrant. He’ll dive deep into the tech and show you a variety of practical examples that can help transform the way you manage your image data. Speaker Jacob Marks is a Machine Learning Engineer and Developer Evangelist at Voxel51. |
July 2023 Computer Vision Meetup (Virtual - EU and Americas)
|
|
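The talks above all revolve around one operation: embed images (and/or text) as vectors, then retrieve nearest neighbors of a query embedding. As a hedged toy sketch of that core operation only (the embeddings below are random stand-ins, not real CLIP features, and real vector databases such as Pinecone, Qdrant, or Milvus add approximate indexing like HNSW to scale this to tens of millions of vectors), brute-force cosine similarity shows what "reverse image search" computes:

```python
# Toy reverse image search: cosine-similarity nearest neighbors over a
# matrix of stored embeddings. Illustrative only; vector databases replace
# the brute-force matmul with approximate nearest-neighbor indexes.
import numpy as np

rng = np.random.default_rng(42)

# Pretend these are embeddings of 1,000 images in our dataset
index = rng.normal(size=(1000, 64))
index /= np.linalg.norm(index, axis=1, keepdims=True)  # unit-normalize once

def search(query: np.ndarray, k: int = 5) -> np.ndarray:
    """Return indices of the k stored vectors most similar to the query."""
    q = query / np.linalg.norm(query)
    scores = index @ q  # cosine similarity, since both sides are unit-norm
    return np.argsort(-scores)[:k]

# A query that is a slightly perturbed copy of item 7 should retrieve item 7 first
query = index[7] + 0.01 * rng.normal(size=64)
print(search(query)[0])  # 7
```

Text-to-image search works the same way when a multi-modal encoder (e.g., CLIP-style) places image and text embeddings in a shared space: the query vector simply comes from the text encoder instead of the image encoder.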