
Feb 20 – Virtual AI, ML and Computer Vision Meetup

Register for the Zoom

Exploring DeepSeek’s Janus-Pro Visual Question Answer (VQA) Capabilities

DeepSeek's Janus-Pro is an advanced multimodal model designed for both multimodal understanding and visual generation, with a particular emphasis on improvements in understanding tasks. The model's architecture is built upon the concept of decoupled visual encoding, which allows it to handle the differing representation needs of these two types of tasks more effectively.

In this talk, we’ll explore Janus-Pro’s Visual Question Answer (VQA) capabilities using FiftyOne’s Janus-Pro VQA Plugin.

The plugin provides a seamless interface to Janus-Pro's visual question understanding capabilities within FiftyOne, offering:

  • Vision-language tasks
  • Hardware acceleration (CUDA/MPS) when available
  • Dynamic version selection from Hugging Face
  • Full integration with FiftyOne’s Dataset and UI

Can’t wait to see it for yourself? Check out the FiftyOne Quickstart with Janus-Pro.
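
To get a feel for the workflow, here is a minimal sketch of setting up a FiftyOne dataset to experiment with the plugin. The plugin's repository URL and operator name are not given in this listing, so treat those details as placeholders to fill in from the official plugin docs:

```python
import fiftyone as fo
import fiftyone.zoo as foz

# Load a small sample dataset to experiment with
dataset = foz.load_zoo_dataset("quickstart")

# Plugins are installed via the FiftyOne CLI, e.g.:
#   fiftyone plugins download <github-url-of-the-janus-pro-vqa-plugin>
# Once installed, the plugin's VQA operator appears in the App's operator
# browser, where you can pose natural-language questions about samples

session = fo.launch_app(dataset)
session.wait()
```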

About the Speaker

Harpreet Sahota is a hacker-in-residence and machine learning engineer with a passion for deep learning and generative AI. He has a deep interest in RAG, agents, and multimodal AI.

Getting the Most Out of FiftyOne Open-Source for Gen AI Workflows

In this talk, we'll explore how we maximize the potential of the FiftyOne open-source SDK and App to efficiently store and annotate training data critical to Finegrain's Generative AI workflows. We will provide an overview of our cloud-based storage and hosting architecture, showcase how we leverage FiftyOne for training and applying models for semi-automatic data annotation, and demonstrate how we extend the CVAT integration to enable pixel-perfect side-by-side evaluation of our Generative AI models.
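
As a rough illustration of the stock FiftyOne-to-CVAT round trip that such a workflow builds on (the talk's extended, pixel-perfect evaluation is not shown here, and the field and class names below are illustrative):

```python
import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("quickstart")

# Send samples to CVAT via FiftyOne's built-in integration
# (requires CVAT credentials configured via environment variables)
anno_key = "semi_auto_pass"  # illustrative key
dataset.annotate(
    anno_key,
    backend="cvat",
    label_field="proposals",  # illustrative new label field
    label_type="detections",
    classes=["cat", "dog"],  # illustrative class list
)

# ... correct the model's proposals in CVAT, then merge them back ...
dataset.load_annotations(anno_key)
```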

About the Speaker

Maxime Brénon is a machine learning and data engineer. An Xoogler, he started his machine learning journey at Moodstocks when AlexNet was all the rage.

BIOSCAN-5M: A Multimodal Dataset for Insect Biodiversity

Measuring biodiversity is crucial for understanding global ecosystem health, especially in the face of anthropogenic environmental changes. Rates of data collection are ever increasing, but access to expert human annotation is limited, making this an ideal use case for machine learning solutions. The newly released BIOSCAN-5M dataset features five million specimens from 47 countries around the world, with paired high-resolution images and DNA barcodes for every sample.

The dataset’s hierarchical taxonomic labels, geographic data, and long-tail distribution of rare species offer valuable resources for ecological research and AI model training. The dataset enables large-scale multimodal modelling for insect biodiversity, and poses challenging machine learning problems for fine-grained classification both for recognising known species of insects (closed-world), and handling novel species (open-world). BIOSCAN-5M represents a significant advancement in biodiversity informatics, facilitated by the International Barcode of Life and the BIOSCAN project, and is publicly available for download via Hugging Face and PyPI.
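
For reference, here is a minimal sketch of pulling the dataset from Hugging Face with the datasets library; the dataset ID below is an assumption, so check the Hugging Face hub for the exact identifier and available configurations:

```python
from datasets import load_dataset

# Stream to avoid downloading all five million records up front
# (dataset ID is illustrative; verify on the Hugging Face hub)
ds = load_dataset("bioscan-ml/BIOSCAN-5M", split="train", streaming=True)

# Each record pairs a specimen image with its DNA barcode and
# hierarchical taxonomic labels
sample = next(iter(ds))
print(sample.keys())
```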

About the Speaker

Scott C. Lowe is a British machine learning researcher based at the Vector Institute in Toronto, Canada. His work is multidisciplinary, spanning several topics. Recently he has focused on biodiversity monitoring applications for both insects (BIOSCAN) and ocean habitats (BenthicNet), self-supervised learning, reasoning capabilities of LLMs, and symbolic music generation. Previously, he completed his PhD in Neuroinformatics from the University of Edinburgh.

Fine Tuning Moondream2

Stay tuned for the talk abstract!
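
While the abstract is forthcoming, a natural starting point for the topic is loading the base model. Here is a minimal sketch, assuming the commonly referenced Hugging Face model ID (verify it, and pin a revision, before relying on it):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# "vikhyatk/moondream2" is the commonly referenced hub ID (assumption;
# verify and pin a specific revision before fine-tuning against it)
model_id = "vikhyatk/moondream2"
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)
```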

About the Speaker

Parsa Khazaeepoul is the Head of Developer Relations at Moondream AI, where he focuses on making computer vision more accessible. A Summa Cum Laude graduate of the University of Washington’s Informatics program, Parsa also spearheaded developer relations at the AI2 Incubator and co-founded Turing Minds, a renowned speaker series featuring Turing Award winners and other leading figures in computer science. His work has impacted thousands through projects like CourseFinder and uwRMP, and he’s a recognized innovator in the Seattle tech scene, named to the Seattle Inno Under 25 Class of 2024.


Feb 2024 – AI, Machine Learning & Data Science Meetup

When: Feb 15, 2024 – 10:00 AM Pacific

Where: Virtual / Zoom – https://voxel51.com/computer-vision-events/feb-2024-ai-machine-learning-data-science-meetup/

Agenda

Lightning Talk: The Next Generation of Video Understanding with Twelve Labs

The evolution of video understanding has followed a similar trajectory to language and image understanding: the rise of large pre-trained foundation models trained on huge amounts of data. Given the recent surge in multimodal research, video foundation models are becoming even more powerful at deciphering the rich visual information embedded in videos. This talk will explore diverse use cases of video understanding and provide a glimpse of Twelve Labs' offerings.

Speaker: James Le is the Head of Developer Experience at Twelve Labs, a startup building multimodal foundation models for video understanding.

Towards Fair Computer Vision: Discover the Hidden Biases of an Image Classifier

Recent work has found that AI algorithms learn biases from data, so identifying those biases is urgent and vital. However, previous bias-identification methods rely heavily on human experts to conjecture potential biases, which may miss underlying biases humans do not anticipate. Is there an automatic way to assist human experts in finding biases across a broad domain of image classifiers? In this talk, I will introduce solutions.

Speaker: Chenliang Xu is an Associate Professor in the Department of Computer Science at the University of Rochester. His research originates in computer vision and tackles interdisciplinary topics, including video understanding, audio-visual learning, vision and language, and methods for trustworthy AI. He has authored over 90 peer-reviewed papers in computer vision, machine learning, multimedia, and AI venues.

Food Waste Classification with AI

One third of all food is wasted, with millions of tons thrown away each day. Food does not mean the same thing everywhere in the world: there are thousands of different meals across the globe, and therefore many different classes to distinguish between. In this talk we'll walk through the challenges of food-waste classification and see how foundation models can be useful for this task. We will also explore how we use FiftyOne to test models during development, as sketched below.
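
As a hedged sketch of the kind of model testing FiftyOne supports (shown here with its built-in detection evaluation on the bundled quickstart dataset; a classification workflow would use evaluate_classifications analogously):

```python
import fiftyone as fo
import fiftyone.zoo as foz

# The quickstart dataset ships with ground truth and model predictions
dataset = foz.load_zoo_dataset("quickstart")

results = dataset.evaluate_detections(
    "predictions",
    gt_field="ground_truth",
    eval_key="eval",
)
results.print_report()  # per-class precision/recall/F1

# Browse the samples with the most false positives in the App
session = fo.launch_app(dataset.sort_by("eval_fp", reverse=True))
```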

Speaker: Luka Posilović is a computer scientist with a PhD from FER, Zagreb, Croatia, working as Head of Machine Learning at Kitro. He and the team are working to reduce the global food-waste problem using AI.

Objects and Image Geo-localization from Visual Data

Localizing images and objects from visual information stands out as one of the most challenging and dynamic topics in computer vision, owing to its broad applications across different domains. In this talk, we will introduce and delve into several research directions aimed at advancing solutions to these complex problems.

Speaker: Safwan Wshah is an Associate Professor in the Department of Computer Science at the University of Vermont. His research interests encompass the intersection of machine learning theory and application, with a particular emphasis on geo-localization from visual information. Additionally, he maintains broader interests in deep learning, computer vision, data analytics, and image processing.

Don’t Forget

  • Voxel51 will make a donation on behalf of the Meetup members to the charity that gets the most votes this month.
  • Can’t make the date and time? No problem! Just make sure to register here so we can send you links to the playbacks.