talk-data.com
People (16 results)
See all 16 →

Tahir Fayyaz
Product Manager at Databricks · Google Cloud Platform Team specialising in Data & Machine Learning · BigQuery expert
Rob Zinkov
machine learning engineer and data scientist
Keetaek Park
Technical Trainer, Data and Machine Learning · Google Cloud
Activities & events
Feb 20 - Virtual AI, ML and Computer Vision Meetup
2025-02-20 · 18:00

Exploring DeepSeek's Janus-Pro Visual Question Answer (VQA) Capabilities
DeepSeek's Janus-Pro is an advanced multimodal model designed for both multimodal understanding and visual generation, with a particular emphasis on improvements in understanding tasks. Its architecture is built on decoupled visual encoding, which allows it to handle the differing representation needs of these two types of tasks more effectively. In this talk, we'll explore Janus-Pro's Visual Question Answer (VQA) capabilities using FiftyOne's Janus-Pro VQA Plugin, which provides a seamless interface to Janus-Pro's visual question understanding capabilities within FiftyOne. Can't wait to see it for yourself? Check out the FiftyOne Quickstart with Janus-Pro (a rough sketch of that workflow follows this listing).
About the Speaker: Harpreet Sahota is a hacker-in-residence and machine learning engineer with a passion for deep learning and generative AI. He has a deep interest in RAG, agents, and multimodal AI.

Getting the Most Out of FiftyOne Open-Source for Gen AI Workflows
In this talk we'll explore how we maximize the potential of the FiftyOne open-source SDK and App to efficiently store and annotate the training data critical to Finegrain's generative AI workflows. We will provide an overview of our cloud-based storage and hosting architecture, showcase how we leverage FiftyOne to train and apply models for semi-automatic data annotation, and demonstrate how we extend the CVAT integration to enable pixel-perfect side-by-side evaluation of our generative AI models.
About the Speaker: Maxime Brénon is a machine learning and data engineer. An Xoogler, he started his machine learning journey at Moodstocks when AlexNet was all the rage.

BIOSCAN-5M: A Multimodal Dataset for Insect Biodiversity
Measuring biodiversity is crucial for understanding global ecosystem health, especially in the face of anthropogenic environmental change. Rates of data collection are ever increasing, but access to expert human annotation is limited, making this an ideal use case for machine learning solutions. The newly released BIOSCAN-5M dataset features five million specimens from 47 countries, with paired high-resolution images and DNA barcodes for every sample. The dataset's hierarchical taxonomic labels, geographic data, and long-tail distribution of rare species offer valuable resources for ecological research and AI model training. It enables large-scale multimodal modelling of insect biodiversity and poses challenging fine-grained classification problems, both for recognising known insect species (closed-world) and for handling novel species (open-world). BIOSCAN-5M represents a significant advance in biodiversity informatics, facilitated by the International Barcode of Life and the BIOSCAN project, and is publicly available for download via Hugging Face and PyPI (see the loading sketch after this listing).
About the Speaker: Scott C. Lowe is a British machine learning researcher based at the Vector Institute in Toronto, Canada. His work is multidisciplinary: recently he has focused on biodiversity monitoring for insects (BIOSCAN) and ocean habitats (BenthicNet), self-supervised learning, the reasoning capabilities of LLMs, and symbolic music generation. He completed his PhD in Neuroinformatics at the University of Edinburgh.

Fine Tuning Moondream2
Stay tuned for the talk abstract!
About the Speaker: Parsa Khazaeepoul is the Head of Developer Relations at Moondream AI, where he focuses on making computer vision more accessible. A summa cum laude graduate of the University of Washington's Informatics program, he also spearheaded developer relations at the AI2 Incubator and co-founded Turing Minds, a renowned speaker series featuring Turing Award winners and other leading figures in computer science. His work has reached thousands through projects like CourseFinder and uwRMP, and he was named to the Seattle Inno Under 25 Class of 2024.
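The Janus-Pro talk above points readers to the FiftyOne Quickstart. As a hedged sketch of what that workflow might look like, the snippet below installs a plugin and opens the App; the plugin repository URL and operator placement are placeholders, not confirmed by this listing:

```python
# Install the Janus-Pro VQA plugin from its GitHub repo (placeholder
# URL - check the plugin's README for the real repository):
#   fiftyone plugins download https://github.com/<org>/janus-pro-vqa-plugin

import fiftyone as fo
import fiftyone.zoo as foz

# Load FiftyOne's small demo dataset to experiment on
dataset = foz.load_zoo_dataset("quickstart")

# Launch the App; once the plugin is installed, its VQA operator
# appears in the App's operator browser, where you can pose natural
# language questions against the selected images
session = fo.launch_app(dataset)
session.wait()
```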
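Since BIOSCAN-5M is distributed via Hugging Face, loading it might look like the sketch below; the repository id is an assumption, so check the official dataset card on the Hub:

```python
from datasets import load_dataset

# Repo id is an assumption - search "BIOSCAN-5M" on the Hugging Face
# Hub for the official dataset card. Streaming avoids downloading all
# five million records up front.
ds = load_dataset("bioscan-ml/BIOSCAN-5M", split="train", streaming=True)

# Each record pairs a specimen image with its DNA barcode and
# hierarchical taxonomic labels
sample = next(iter(ds))
print(sample.keys())
```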
Lightning Talk: The Next Generation of Video Understanding with Twelve Labs
2024-02-15 · 18:00
James Le – Head of Developer Experience @ Twelve Labs
The evolution of video understanding has followed a similar trajectory to language and image understanding, with the rise of large pre-trained foundation models trained on huge amounts of data. Given the recent surge of multimodal research, video foundation models are becoming even more powerful at deciphering the rich visual information embedded in videos. This talk will explore diverse use cases of video understanding and provide a glimpse of Twelve Labs' offerings.
Towards Fair Computer Vision: Discover the Hidden Biases of an Image Classifier
2024-02-15 · 18:00
Chenliang Xu – Associate Professor @ University of Rochester
Recent works find that AI algorithms learn biases from data, so it is urgent and vital to identify biases in AI algorithms. However, previous bias identification methods rely heavily on human experts to conjecture potential biases, which may neglect underlying biases that humans do not realize. Is there an automatic way to assist human experts in finding biases across a broad domain of image classifiers? In this talk, I will introduce solutions.
Food Waste Classification with AI
2024-02-15 · 18:00
Luka Posilović – Head of Machine Learning @ Kitro
A third of all food is wasted, with millions of tons thrown away each day. Food does not mean the same thing everywhere in the world: there are thousands of different meals across the globe, and therefore a great many classes to distinguish between. In this talk we'll walk through the challenges of food-waste classification and see how foundation models can help with this task. We will also explore how we use FiftyOne to test models during development (a minimal evaluation sketch follows this listing).
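The abstract above mentions testing models with FiftyOne during development. A minimal sketch of what such an evaluation pass could look like is below; the dataset name and label fields are hypothetical:

```python
import fiftyone as fo
from fiftyone import ViewField as F

# Hypothetical dataset whose samples carry fo.Classification labels in
# "ground_truth" (annotations) and "predictions" (model output)
dataset = fo.load_dataset("food-waste")

# Compare predictions against ground truth; per-sample correctness is
# stored under the "eval" key
results = dataset.evaluate_classifications(
    "predictions",
    gt_field="ground_truth",
    eval_key="eval",
)

# Per-class precision/recall - useful with thousands of meal classes
results.print_report()

# Open the App on just the misclassified samples for visual review
session = fo.launch_app(dataset.match(F("eval") == False))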
Objects and Image Geo-localization from Visual Data
2024-02-15 · 18:00
Safwan Wshah – Associate Professor @ University of Vermont
Localizing images and objects from visual information stands out as one of the most challenging and dynamic topics in computer vision, owing to its broad applications across different domains. In this talk, we will introduce and delve into several research directions aimed at advancing solutions to these complex problems.
Feb 2024 – AI, Machine Learning & Data Science Meetup
2024-02-15 · 18:00
When: Feb 15, 2024 – 10:00 AM Pacific
Where: Virtual / Zoom - https://voxel51.com/computer-vision-events/feb-2024-ai-machine-learning-data-science-meetup/

Agenda

Lightning Talk: The Next Generation of Video Understanding with Twelve Labs
The evolution of video understanding has followed a similar trajectory to language and image understanding, with the rise of large pre-trained foundation models trained on huge amounts of data. Given the recent surge of multimodal research, video foundation models are becoming even more powerful at deciphering the rich visual information embedded in videos. This talk will explore diverse use cases of video understanding and provide a glimpse of Twelve Labs' offerings.
Speaker: James Le is the Head of Developer Experience at Twelve Labs, a startup building multimodal foundation models for video understanding.

Towards Fair Computer Vision: Discover the Hidden Biases of an Image Classifier
Recent works find that AI algorithms learn biases from data, so it is urgent and vital to identify biases in AI algorithms. However, previous bias identification methods rely heavily on human experts to conjecture potential biases, which may neglect underlying biases that humans do not realize. Is there an automatic way to assist human experts in finding biases across a broad domain of image classifiers? In this talk, I will introduce solutions.
Speaker: Chenliang Xu is an Associate Professor in the Department of Computer Science at the University of Rochester. His research originates in computer vision and tackles interdisciplinary topics, including video understanding, audio-visual learning, vision and language, and methods for trustworthy AI. He has authored over 90 peer-reviewed papers in computer vision, machine learning, multimedia, and AI venues.

Food Waste Classification with AI
A third of all food is wasted, with millions of tons thrown away each day. Food does not mean the same thing everywhere in the world: there are thousands of different meals across the globe, and therefore a great many classes to distinguish between. In this talk we'll walk through the challenges of food-waste classification and see how foundation models can help with this task. We will also explore how we use FiftyOne to test models during development.
Speaker: Luka Posilović is a computer scientist with a PhD from FER, Zagreb, Croatia, working as Head of Machine Learning at Kitro. He and the team are working to reduce the global food-waste problem using AI.

Objects and Image Geo-localization from Visual Data
Localizing images and objects from visual information stands out as one of the most challenging and dynamic topics in computer vision, owing to its broad applications across different domains. In this talk, we will introduce and delve into several research directions aimed at advancing solutions to these complex problems.
Speaker: Safwan Wshah is an Associate Professor in the Department of Computer Science at the University of Vermont. His research interests encompass the intersection of machine learning theory and application, with a particular emphasis on geo-localization from visual information. He also maintains broader interests in deep learning, computer vision, data analytics, and image processing.