talk-data.com
Activities & events
March 20 - AI, Machine Learning and Computer Vision Meetup
2025-03-20 · 15:30
This is a virtual event.

Vision Language Models Are Few-Shot Audio Spectrogram Classifiers
The development of multimodal AI agents marks a pivotal step toward creating systems that can reason across modalities. Current audio language models lag behind text-based LLMs and Vision Language Models (VLMs) in reasoning capabilities. Incorporating audio information into VLMs could help us leverage their advanced language reasoning capabilities for audio input. To explore this, the talk will cover how VLMs (such as GPT-4o and Claude 3.5 Sonnet) can recognize audio content from spectrograms, and how this approach could enhance audio understanding within VLMs.
About the Speaker: Satvik Dixit is a master's student at Carnegie Mellon University, advised by Professors Bhiksha Raj and Chris Donahue. His research interests are audio/speech processing and multimodal learning, with a focus on audio understanding and generation tasks. More details can be found at https://satvik-dixit.github.io/

Agentic Retrieval-Augmented Generation: A Survey on Agentic RAG
In this talk, we will explore Agentic Retrieval-Augmented Generation, or Agentic RAG, a method that enhances Large Language Models (LLMs) by combining intelligent retrieval with autonomous agents. We will discover how Agentic RAG leverages advanced agentic behaviors such as reflection, planning, tool use, and multi-agent collaboration to dynamically refine retrieval strategies and adapt workflows, significantly improving real-time responsiveness and complex task management.
About the Speaker: Aditi Singh is an Assistant College Lecturer in the Department of Computer Science at Cleveland State University, Cleveland, Ohio. She earned her M.S. and Ph.D. in Computer Science from Kent State University and was awarded a prestigious Gold Medal for academic excellence during her undergraduate studies. Her research interests include Artificial Intelligence, Large Language Models (LLMs), and Generative AI. Dr. Singh has published over 25 research papers in these fields.

Active Data Curation Effectively Distills Large-Scale Multimodal Models
Knowledge distillation (KD) is the de facto standard for compressing large-scale models into smaller ones. Prior works have explored ever more complex KD strategies involving different objective functions, teacher ensembles, and weight inheritance. In this talk, I will describe an alternative yet simple approach: active data curation as effective distillation for contrastive multimodal pretraining. Our simple online batch selection method, ACID, outperforms strong KD baselines across various model, data, and compute configurations. Further, we find such an active data curation strategy to be complementary to standard KD; the two can be effectively combined to train highly performant, inference-efficient models. Our simple and scalable pretraining framework, ACED, achieves state-of-the-art results across 27 zero-shot classification and retrieval tasks with up to 11% fewer inference FLOPs. We further demonstrate that our ACED models yield strong vision encoders for training generative multimodal models in the LiT-Decoder setting, outperforming larger vision encoders on image-captioning and visual question-answering tasks.
About the Speaker: Vishaal Udandarao is a third-year ELLIS PhD student, jointly working with Matthias Bethge at the University of Tübingen and Samuel Albanie at Google DeepMind. He completed his undergraduate degree in computer science at IIIT Delhi (2016–2020) and his master's in machine learning at the University of Cambridge in 2021.

Dataset Safari: Adventures from 2024's Top Computer Vision Conferences
Datasets are the lifeblood of machine learning, driving innovation and enabling breakthrough applications in computer vision and AI.
This talk presents a curated exploration of the most compelling visual datasets unveiled at CVPR, ECCV, and NeurIPS 2024, with a unique twist: we'll explore them live using FiftyOne, the open-source tool for dataset curation and analysis. Using FiftyOne's powerful visualization and analysis capabilities, we'll take a deep dive into these collections, examining their unique characteristics through interactive sessions, and demonstrate practical workflows for curating and analyzing each one.
Whether you're a researcher, practitioner, or dataset enthusiast, this session will provide hands-on insights into both the datasets shaping our field and practical tools for dataset exploration. Join us for a live demonstration of how modern dataset analysis tools can unlock deeper understanding of the data driving AI forward.
About the Speaker: Harpreet Sahota is a hacker-in-residence and machine learning engineer with a passion for deep learning and generative AI. He has a deep interest in RAG, agents, and multimodal AI.
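The first talk's premise is that audio can be handed to a VLM by rendering it as an image. As a rough illustration of that input representation (not code from the talk; the function and its parameters are illustrative, and real pipelines typically use librosa or torchaudio), a log-magnitude spectrogram can be computed with plain NumPy:

```python
# Render an audio clip as a log-magnitude spectrogram, the image-like
# representation the talk proposes showing to a VLM such as GPT-4o.
import numpy as np

def spectrogram_db(signal, n_fft=256, hop=128):
    """STFT magnitude in decibels, shape (n_fft // 2 + 1, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1)).T  # rows = frequency bins
    return 20 * np.log10(mag + 1e-8)             # epsilon avoids log(0)

# A 1-second 440 Hz tone at 8 kHz should concentrate its energy in the
# frequency bin nearest 440 Hz (bin width = 8000 / 256 = 31.25 Hz).
sr = 8000
t = np.arange(sr) / sr
spec = spectrogram_db(np.sin(2 * np.pi * 440 * t))
peak_bin = int(spec.mean(axis=1).argmax())
```

Saved as an image (e.g. via matplotlib's `imshow`), such an array is the kind of spectrogram a few-shot prompt could attach alongside labeled examples.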
March 20 - AI, Machine Learning and Computer Vision Meetup
|
|
March 20 - AI, Machine Learning and Computer Vision Meetup
2025-03-20 · 15:30
This is a virtual event. Vision Language Models Are Few-Shot Audio Spectrogram Classifiers The development of multimodal AI agents marks a pivotal step toward creating systems Current audio language models lag behind the text-based LLMs and Vision Language Models (VLMs) in reasoning capabilities. Incorporating audio information into VLMs could help us leverage their advanced language reasoning capabilities for audio input. To explore this, this talk will cover how VLMs (such as GPT-4o and Claude-3.5-sonnet) can recognize audio content from spectrograms and how this approach could enhance audio understanding within VLMs. About the Speaker Satvik Dixit is a masters student at Carnegie Mellon University, advised by Professors Bhiksha Raj and Chris Donahue. His research interests are Audio/Speech Processing and Multimodal Learning, with focus on audio understanding and generation tasks. More details can be found at: https://satvik-dixit.github.io/ Agentic Retrieval-Augmented Generation: A Survey on Agentic RAG In this talk, we will explore Agentic Retrieval-Augmented Generation, or Agentic RAG, a groundbreaking method that enhances Large Language Models (LLMs) by combining intelligent retrieval with autonomous agents. We will discover how Agentic RAG leverages advanced agentic behaviors such as reflection, planning, tool use, and multiagent collaboration to dynamically refine retrieval strategies and adapt workflows, significantly improving real-time responsiveness and complex task management About the Speaker Aditi Singh is an Assistant College Lecturer in the Department of Computer Science at Cleveland State University, Cleveland, Ohio. She earned her M.S. and Ph.D. in Computer Science from Kent State University. She was awarded a prestigious Gold Medal for academic excellence during her undergraduate studies. Her research interests include Artificial Intelligence, Large Language Models (LLM), and Generative AI. Dr. 
Singh has published over 25 research papers in these fields. Active Data Curation Effectively Distills Large-Scale Multimodal Models Knowledge distillation (KD) is the de facto standard for compressing large-scale models into smaller ones. Prior works have explored ever more complex KD strategies involving different objective functions, teacher-ensembles, and weight inheritance. In this talk, I will describe an alternative, yet simple approach — active data curation as effective distillation for contrastive multimodal pretraining. Our simple online batch selection method, ACID, outperforms strong KD baselines across various model-, data- and compute-configurations. Further, we find such an active data curation strategy to in fact be complementary to standard KD, and can be effectively combined to train highly performant inference-efficient models. Our simple and scalable pretraining framework, ACED, achieves state-of-the-art results across 27 zero-shot classification and retrieval tasks with upto 11% less inference FLOPs. We further demonstrate that our ACED models yield strong vision-encoders for training generative multimodal models in the LiT-Decoder setting, outperforming larger vision encoders for image-captioning and visual question-answering tasks. About the Speaker Vishaal Udandarao is a third year ELLIS PhD student, jointly working with Matthias Bethge at The University of Tuebingen and Samuel Albanie at Google Deepmind. He did his undergraduate degree in computer science in IIIT Delhi from 2016 to 2020, and his masters in machine learning in The University of Cambridge in 2021. Dataset Safari: Adventures from 2024’s Top Computer Vision Conferences Datasets are the lifeblood of machine learning, driving innovation and enabling breakthrough applications in computer vision and AI. 
This talk presents a curated exploration of the most compelling visual datasets unveiled at CVPR, ECCV, and NeurIPS 2024, with a unique twist – we’ll explore them live using FiftyOne, the open-source tool for dataset curation and analysis. Using FiftyOne’s powerful visualization and analysis capabilities, we’ll take a deep dive into these collections, examining their unique characteristics through interactive sessions. We’ll demonstrate how to:
Whether you’re a researcher, practitioner, or dataset enthusiast, this session will provide hands-on insights into both the datasets shaping our field and practical tools for dataset exploration. Join us for a live demonstration of how modern dataset analysis tools can unlock deeper understanding of the data driving AI forward. About the Speaker Harpreet Sahota is a hacker-in-residence and machine learning engineer with a passion for deep learning and generative AI. He’s got a deep interest in RAG, Agents, and Multimodal AI. |
March 20 - AI, Machine Learning and Computer Vision Meetup
|
|
March 20 - AI, Machine Learning and Computer Vision Meetup
2025-03-20 · 15:30
This is a virtual event. Vision Language Models Are Few-Shot Audio Spectrogram Classifiers The development of multimodal AI agents marks a pivotal step toward creating systems Current audio language models lag behind the text-based LLMs and Vision Language Models (VLMs) in reasoning capabilities. Incorporating audio information into VLMs could help us leverage their advanced language reasoning capabilities for audio input. To explore this, this talk will cover how VLMs (such as GPT-4o and Claude-3.5-sonnet) can recognize audio content from spectrograms and how this approach could enhance audio understanding within VLMs. About the Speaker Satvik Dixit is a masters student at Carnegie Mellon University, advised by Professors Bhiksha Raj and Chris Donahue. His research interests are Audio/Speech Processing and Multimodal Learning, with focus on audio understanding and generation tasks. More details can be found at: https://satvik-dixit.github.io/ Agentic Retrieval-Augmented Generation: A Survey on Agentic RAG In this talk, we will explore Agentic Retrieval-Augmented Generation, or Agentic RAG, a groundbreaking method that enhances Large Language Models (LLMs) by combining intelligent retrieval with autonomous agents. We will discover how Agentic RAG leverages advanced agentic behaviors such as reflection, planning, tool use, and multiagent collaboration to dynamically refine retrieval strategies and adapt workflows, significantly improving real-time responsiveness and complex task management About the Speaker Aditi Singh is an Assistant College Lecturer in the Department of Computer Science at Cleveland State University, Cleveland, Ohio. She earned her M.S. and Ph.D. in Computer Science from Kent State University. She was awarded a prestigious Gold Medal for academic excellence during her undergraduate studies. Her research interests include Artificial Intelligence, Large Language Models (LLM), and Generative AI. Dr. 
Singh has published over 25 research papers in these fields. Active Data Curation Effectively Distills Large-Scale Multimodal Models Knowledge distillation (KD) is the de facto standard for compressing large-scale models into smaller ones. Prior works have explored ever more complex KD strategies involving different objective functions, teacher-ensembles, and weight inheritance. In this talk, I will describe an alternative, yet simple approach — active data curation as effective distillation for contrastive multimodal pretraining. Our simple online batch selection method, ACID, outperforms strong KD baselines across various model-, data- and compute-configurations. Further, we find such an active data curation strategy to in fact be complementary to standard KD, and can be effectively combined to train highly performant inference-efficient models. Our simple and scalable pretraining framework, ACED, achieves state-of-the-art results across 27 zero-shot classification and retrieval tasks with upto 11% less inference FLOPs. We further demonstrate that our ACED models yield strong vision-encoders for training generative multimodal models in the LiT-Decoder setting, outperforming larger vision encoders for image-captioning and visual question-answering tasks. About the Speaker Vishaal Udandarao is a third year ELLIS PhD student, jointly working with Matthias Bethge at The University of Tuebingen and Samuel Albanie at Google Deepmind. He did his undergraduate degree in computer science in IIIT Delhi from 2016 to 2020, and his masters in machine learning in The University of Cambridge in 2021. Dataset Safari: Adventures from 2024’s Top Computer Vision Conferences Datasets are the lifeblood of machine learning, driving innovation and enabling breakthrough applications in computer vision and AI. 
This talk presents a curated exploration of the most compelling visual datasets unveiled at CVPR, ECCV, and NeurIPS 2024, with a unique twist – we’ll explore them live using FiftyOne, the open-source tool for dataset curation and analysis. Using FiftyOne’s powerful visualization and analysis capabilities, we’ll take a deep dive into these collections, examining their unique characteristics through interactive sessions. We’ll demonstrate how to:
Whether you’re a researcher, practitioner, or dataset enthusiast, this session will provide hands-on insights into both the datasets shaping our field and practical tools for dataset exploration. Join us for a live demonstration of how modern dataset analysis tools can unlock deeper understanding of the data driving AI forward. About the Speaker Harpreet Sahota is a hacker-in-residence and machine learning engineer with a passion for deep learning and generative AI. He’s got a deep interest in RAG, Agents, and Multimodal AI. |
March 20 - AI, Machine Learning and Computer Vision Meetup
|
|
March 20 - AI, Machine Learning and Computer Vision Meetup
2025-03-20 · 15:30
This is a virtual event. Vision Language Models Are Few-Shot Audio Spectrogram Classifiers The development of multimodal AI agents marks a pivotal step toward creating systems Current audio language models lag behind the text-based LLMs and Vision Language Models (VLMs) in reasoning capabilities. Incorporating audio information into VLMs could help us leverage their advanced language reasoning capabilities for audio input. To explore this, this talk will cover how VLMs (such as GPT-4o and Claude-3.5-sonnet) can recognize audio content from spectrograms and how this approach could enhance audio understanding within VLMs. About the Speaker Satvik Dixit is a masters student at Carnegie Mellon University, advised by Professors Bhiksha Raj and Chris Donahue. His research interests are Audio/Speech Processing and Multimodal Learning, with focus on audio understanding and generation tasks. More details can be found at: https://satvik-dixit.github.io/ Agentic Retrieval-Augmented Generation: A Survey on Agentic RAG In this talk, we will explore Agentic Retrieval-Augmented Generation, or Agentic RAG, a groundbreaking method that enhances Large Language Models (LLMs) by combining intelligent retrieval with autonomous agents. We will discover how Agentic RAG leverages advanced agentic behaviors such as reflection, planning, tool use, and multiagent collaboration to dynamically refine retrieval strategies and adapt workflows, significantly improving real-time responsiveness and complex task management About the Speaker Aditi Singh is an Assistant College Lecturer in the Department of Computer Science at Cleveland State University, Cleveland, Ohio. She earned her M.S. and Ph.D. in Computer Science from Kent State University. She was awarded a prestigious Gold Medal for academic excellence during her undergraduate studies. Her research interests include Artificial Intelligence, Large Language Models (LLM), and Generative AI. Dr. 
Singh has published over 25 research papers in these fields. Active Data Curation Effectively Distills Large-Scale Multimodal Models Knowledge distillation (KD) is the de facto standard for compressing large-scale models into smaller ones. Prior works have explored ever more complex KD strategies involving different objective functions, teacher-ensembles, and weight inheritance. In this talk, I will describe an alternative, yet simple approach — active data curation as effective distillation for contrastive multimodal pretraining. Our simple online batch selection method, ACID, outperforms strong KD baselines across various model-, data- and compute-configurations. Further, we find such an active data curation strategy to in fact be complementary to standard KD, and can be effectively combined to train highly performant inference-efficient models. Our simple and scalable pretraining framework, ACED, achieves state-of-the-art results across 27 zero-shot classification and retrieval tasks with upto 11% less inference FLOPs. We further demonstrate that our ACED models yield strong vision-encoders for training generative multimodal models in the LiT-Decoder setting, outperforming larger vision encoders for image-captioning and visual question-answering tasks. About the Speaker Vishaal Udandarao is a third year ELLIS PhD student, jointly working with Matthias Bethge at The University of Tuebingen and Samuel Albanie at Google Deepmind. He did his undergraduate degree in computer science in IIIT Delhi from 2016 to 2020, and his masters in machine learning in The University of Cambridge in 2021. Dataset Safari: Adventures from 2024’s Top Computer Vision Conferences Datasets are the lifeblood of machine learning, driving innovation and enabling breakthrough applications in computer vision and AI. 
This talk presents a curated exploration of the most compelling visual datasets unveiled at CVPR, ECCV, and NeurIPS 2024, with a unique twist – we’ll explore them live using FiftyOne, the open-source tool for dataset curation and analysis. Using FiftyOne’s powerful visualization and analysis capabilities, we’ll take a deep dive into these collections, examining their unique characteristics through interactive sessions. We’ll demonstrate how to:
Whether you’re a researcher, practitioner, or dataset enthusiast, this session will provide hands-on insights into both the datasets shaping our field and practical tools for dataset exploration. Join us for a live demonstration of how modern dataset analysis tools can unlock deeper understanding of the data driving AI forward. About the Speaker Harpreet Sahota is a hacker-in-residence and machine learning engineer with a passion for deep learning and generative AI. He’s got a deep interest in RAG, Agents, and Multimodal AI. |
March 20 - AI, Machine Learning and Computer Vision Meetup
|
|
March 20 - AI, Machine Learning and Computer Vision Meetup
2025-03-20 · 15:30
This is a virtual event. Vision Language Models Are Few-Shot Audio Spectrogram Classifiers The development of multimodal AI agents marks a pivotal step toward creating systems Current audio language models lag behind the text-based LLMs and Vision Language Models (VLMs) in reasoning capabilities. Incorporating audio information into VLMs could help us leverage their advanced language reasoning capabilities for audio input. To explore this, this talk will cover how VLMs (such as GPT-4o and Claude-3.5-sonnet) can recognize audio content from spectrograms and how this approach could enhance audio understanding within VLMs. About the Speaker Satvik Dixit is a masters student at Carnegie Mellon University, advised by Professors Bhiksha Raj and Chris Donahue. His research interests are Audio/Speech Processing and Multimodal Learning, with focus on audio understanding and generation tasks. More details can be found at: https://satvik-dixit.github.io/ Agentic Retrieval-Augmented Generation: A Survey on Agentic RAG In this talk, we will explore Agentic Retrieval-Augmented Generation, or Agentic RAG, a groundbreaking method that enhances Large Language Models (LLMs) by combining intelligent retrieval with autonomous agents. We will discover how Agentic RAG leverages advanced agentic behaviors such as reflection, planning, tool use, and multiagent collaboration to dynamically refine retrieval strategies and adapt workflows, significantly improving real-time responsiveness and complex task management About the Speaker Aditi Singh is an Assistant College Lecturer in the Department of Computer Science at Cleveland State University, Cleveland, Ohio. She earned her M.S. and Ph.D. in Computer Science from Kent State University. She was awarded a prestigious Gold Medal for academic excellence during her undergraduate studies. Her research interests include Artificial Intelligence, Large Language Models (LLM), and Generative AI. Dr. 
Singh has published over 25 research papers in these fields. Active Data Curation Effectively Distills Large-Scale Multimodal Models Knowledge distillation (KD) is the de facto standard for compressing large-scale models into smaller ones. Prior works have explored ever more complex KD strategies involving different objective functions, teacher-ensembles, and weight inheritance. In this talk, I will describe an alternative, yet simple approach — active data curation as effective distillation for contrastive multimodal pretraining. Our simple online batch selection method, ACID, outperforms strong KD baselines across various model-, data- and compute-configurations. Further, we find such an active data curation strategy to in fact be complementary to standard KD, and can be effectively combined to train highly performant inference-efficient models. Our simple and scalable pretraining framework, ACED, achieves state-of-the-art results across 27 zero-shot classification and retrieval tasks with upto 11% less inference FLOPs. We further demonstrate that our ACED models yield strong vision-encoders for training generative multimodal models in the LiT-Decoder setting, outperforming larger vision encoders for image-captioning and visual question-answering tasks. About the Speaker Vishaal Udandarao is a third year ELLIS PhD student, jointly working with Matthias Bethge at The University of Tuebingen and Samuel Albanie at Google Deepmind. He did his undergraduate degree in computer science in IIIT Delhi from 2016 to 2020, and his masters in machine learning in The University of Cambridge in 2021. Dataset Safari: Adventures from 2024’s Top Computer Vision Conferences Datasets are the lifeblood of machine learning, driving innovation and enabling breakthrough applications in computer vision and AI. 
This talk presents a curated exploration of the most compelling visual datasets unveiled at CVPR, ECCV, and NeurIPS 2024, with a unique twist – we’ll explore them live using FiftyOne, the open-source tool for dataset curation and analysis. Using FiftyOne’s powerful visualization and analysis capabilities, we’ll take a deep dive into these collections, examining their unique characteristics through interactive sessions. We’ll demonstrate how to:
Whether you’re a researcher, practitioner, or dataset enthusiast, this session will provide hands-on insights into both the datasets shaping our field and practical tools for dataset exploration. Join us for a live demonstration of how modern dataset analysis tools can unlock deeper understanding of the data driving AI forward. About the Speaker Harpreet Sahota is a hacker-in-residence and machine learning engineer with a passion for deep learning and generative AI. He’s got a deep interest in RAG, Agents, and Multimodal AI. |
March 20 - AI, Machine Learning and Computer Vision Meetup
|
|
March 20 - AI, Machine Learning and Computer Vision Meetup
2025-03-20 · 15:30
This is a virtual event. Vision Language Models Are Few-Shot Audio Spectrogram Classifiers The development of multimodal AI agents marks a pivotal step toward creating systems Current audio language models lag behind the text-based LLMs and Vision Language Models (VLMs) in reasoning capabilities. Incorporating audio information into VLMs could help us leverage their advanced language reasoning capabilities for audio input. To explore this, this talk will cover how VLMs (such as GPT-4o and Claude-3.5-sonnet) can recognize audio content from spectrograms and how this approach could enhance audio understanding within VLMs. About the Speaker Satvik Dixit is a masters student at Carnegie Mellon University, advised by Professors Bhiksha Raj and Chris Donahue. His research interests are Audio/Speech Processing and Multimodal Learning, with focus on audio understanding and generation tasks. More details can be found at: https://satvik-dixit.github.io/ Agentic Retrieval-Augmented Generation: A Survey on Agentic RAG In this talk, we will explore Agentic Retrieval-Augmented Generation, or Agentic RAG, a groundbreaking method that enhances Large Language Models (LLMs) by combining intelligent retrieval with autonomous agents. We will discover how Agentic RAG leverages advanced agentic behaviors such as reflection, planning, tool use, and multiagent collaboration to dynamically refine retrieval strategies and adapt workflows, significantly improving real-time responsiveness and complex task management About the Speaker Aditi Singh is an Assistant College Lecturer in the Department of Computer Science at Cleveland State University, Cleveland, Ohio. She earned her M.S. and Ph.D. in Computer Science from Kent State University. She was awarded a prestigious Gold Medal for academic excellence during her undergraduate studies. Her research interests include Artificial Intelligence, Large Language Models (LLM), and Generative AI. Dr. 
Singh has published over 25 research papers in these fields. Active Data Curation Effectively Distills Large-Scale Multimodal Models Knowledge distillation (KD) is the de facto standard for compressing large-scale models into smaller ones. Prior works have explored ever more complex KD strategies involving different objective functions, teacher-ensembles, and weight inheritance. In this talk, I will describe an alternative, yet simple approach — active data curation as effective distillation for contrastive multimodal pretraining. Our simple online batch selection method, ACID, outperforms strong KD baselines across various model-, data- and compute-configurations. Further, we find such an active data curation strategy to in fact be complementary to standard KD, and can be effectively combined to train highly performant inference-efficient models. Our simple and scalable pretraining framework, ACED, achieves state-of-the-art results across 27 zero-shot classification and retrieval tasks with upto 11% less inference FLOPs. We further demonstrate that our ACED models yield strong vision-encoders for training generative multimodal models in the LiT-Decoder setting, outperforming larger vision encoders for image-captioning and visual question-answering tasks. About the Speaker Vishaal Udandarao is a third year ELLIS PhD student, jointly working with Matthias Bethge at The University of Tuebingen and Samuel Albanie at Google Deepmind. He did his undergraduate degree in computer science in IIIT Delhi from 2016 to 2020, and his masters in machine learning in The University of Cambridge in 2021. Dataset Safari: Adventures from 2024’s Top Computer Vision Conferences Datasets are the lifeblood of machine learning, driving innovation and enabling breakthrough applications in computer vision and AI. 
This talk presents a curated exploration of the most compelling visual datasets unveiled at CVPR, ECCV, and NeurIPS 2024, with a unique twist – we’ll explore them live using FiftyOne, the open-source tool for dataset curation and analysis. Using FiftyOne’s powerful visualization and analysis capabilities, we’ll take a deep dive into these collections, examining their unique characteristics through interactive sessions. We’ll demonstrate how to:
Whether you’re a researcher, practitioner, or dataset enthusiast, this session will provide hands-on insights into both the datasets shaping our field and practical tools for dataset exploration. Join us for a live demonstration of how modern dataset analysis tools can unlock deeper understanding of the data driving AI forward. About the Speaker Harpreet Sahota is a hacker-in-residence and machine learning engineer with a passion for deep learning and generative AI. He’s got a deep interest in RAG, Agents, and Multimodal AI. |
March 20 - AI, Machine Learning and Computer Vision Meetup
|
|
March 20 - AI, Machine Learning and Computer Vision Meetup
2025-03-20 · 15:30
This is a virtual event. Vision Language Models Are Few-Shot Audio Spectrogram Classifiers The development of multimodal AI agents marks a pivotal step toward creating systems Current audio language models lag behind the text-based LLMs and Vision Language Models (VLMs) in reasoning capabilities. Incorporating audio information into VLMs could help us leverage their advanced language reasoning capabilities for audio input. To explore this, this talk will cover how VLMs (such as GPT-4o and Claude-3.5-sonnet) can recognize audio content from spectrograms and how this approach could enhance audio understanding within VLMs. About the Speaker Satvik Dixit is a masters student at Carnegie Mellon University, advised by Professors Bhiksha Raj and Chris Donahue. His research interests are Audio/Speech Processing and Multimodal Learning, with focus on audio understanding and generation tasks. More details can be found at: https://satvik-dixit.github.io/ Agentic Retrieval-Augmented Generation: A Survey on Agentic RAG In this talk, we will explore Agentic Retrieval-Augmented Generation, or Agentic RAG, a groundbreaking method that enhances Large Language Models (LLMs) by combining intelligent retrieval with autonomous agents. We will discover how Agentic RAG leverages advanced agentic behaviors such as reflection, planning, tool use, and multiagent collaboration to dynamically refine retrieval strategies and adapt workflows, significantly improving real-time responsiveness and complex task management About the Speaker Aditi Singh is an Assistant College Lecturer in the Department of Computer Science at Cleveland State University, Cleveland, Ohio. She earned her M.S. and Ph.D. in Computer Science from Kent State University. She was awarded a prestigious Gold Medal for academic excellence during her undergraduate studies. Her research interests include Artificial Intelligence, Large Language Models (LLM), and Generative AI. Dr. 
Singh has published over 25 research papers in these fields. Active Data Curation Effectively Distills Large-Scale Multimodal Models Knowledge distillation (KD) is the de facto standard for compressing large-scale models into smaller ones. Prior works have explored ever more complex KD strategies involving different objective functions, teacher-ensembles, and weight inheritance. In this talk, I will describe an alternative, yet simple approach — active data curation as effective distillation for contrastive multimodal pretraining. Our simple online batch selection method, ACID, outperforms strong KD baselines across various model-, data- and compute-configurations. Further, we find such an active data curation strategy to in fact be complementary to standard KD, and can be effectively combined to train highly performant inference-efficient models. Our simple and scalable pretraining framework, ACED, achieves state-of-the-art results across 27 zero-shot classification and retrieval tasks with upto 11% less inference FLOPs. We further demonstrate that our ACED models yield strong vision-encoders for training generative multimodal models in the LiT-Decoder setting, outperforming larger vision encoders for image-captioning and visual question-answering tasks. About the Speaker Vishaal Udandarao is a third year ELLIS PhD student, jointly working with Matthias Bethge at The University of Tuebingen and Samuel Albanie at Google Deepmind. He did his undergraduate degree in computer science in IIIT Delhi from 2016 to 2020, and his masters in machine learning in The University of Cambridge in 2021. Dataset Safari: Adventures from 2024’s Top Computer Vision Conferences Datasets are the lifeblood of machine learning, driving innovation and enabling breakthrough applications in computer vision and AI. 
This talk presents a curated exploration of the most compelling visual datasets unveiled at CVPR, ECCV, and NeurIPS 2024, with a unique twist: we’ll explore them live using FiftyOne, the open-source tool for dataset curation and analysis. Using FiftyOne’s powerful visualization and analysis capabilities, we’ll take a deep dive into these collections, examining their unique characteristics through interactive sessions and demonstrating practical dataset-exploration workflows along the way.
Whether you’re a researcher, practitioner, or dataset enthusiast, this session will provide hands-on insights into both the datasets shaping our field and practical tools for dataset exploration. Join us for a live demonstration of how modern dataset analysis tools can unlock a deeper understanding of the data driving AI forward. About the Speaker Harpreet Sahota is a hacker-in-residence and machine learning engineer with a passion for deep learning and generative AI. He has a deep interest in RAG, agents, and multimodal AI. |
March 20 - AI, Machine Learning and Computer Vision Meetup
|
|
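The spectrogram-classification idea from the first talk starts by rendering audio as a spectrogram, which can then be saved as an image and attached to a VLM prompt. Below is a minimal, numpy-only sketch of that first step; the STFT parameters and the toy 440 Hz signal are illustrative assumptions, not the speaker's actual setup.

```python
import numpy as np

def spectrogram(signal, n_fft=256, hop=128):
    """Magnitude spectrogram via a Hann-windowed STFT (numpy only)."""
    window = np.hanning(n_fft)
    frames = [signal[i:i + n_fft] * window
              for i in range(0, len(signal) - n_fft + 1, hop)]
    # One FFT per frame; rfft keeps only the non-negative frequencies.
    mags = np.abs(np.fft.rfft(frames, axis=1))
    # Log scale (dB) so quiet content stays visible, shaped (freq, time).
    return 20 * np.log10(mags + 1e-10).T

# 1 second of a 440 Hz tone sampled at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
spec = spectrogram(np.sin(2 * np.pi * 440 * t))
```

From here, `spec` would be rendered to a PNG (e.g. with matplotlib) and passed to a model such as GPT-4o alongside a few labeled spectrogram examples, which is the few-shot setup the talk's title describes.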
AI Seminar (Virtual): Turning ML and AI into Engineering Disciplines
2024-11-19 · 17:00
Important: RSVP here to receive the joining link (RSVPs on Meetup will NOT receive the joining link). Description: Join Hugo Bowne-Anderson and Alex Filipchik (Head of Infrastructure, Cloud Kitchens) for a fireside chat on how machine learning (ML) and AI are evolving from niche specializations into essential engineering disciplines. Alex will share his experience transforming Cloud Kitchens' data and ML infrastructure to empower engineers, support real-time and batch processing, and ensure seamless deployment of data-driven applications. Key Topics of Discussion:
This session is perfect for data engineers, software engineers, and AI/ML practitioners who are interested in making ML and AI an integral part of the software development lifecycle. --------------------------- Speakers/Topics: Stay tuned as we update speakers and schedules. If you have a keen interest in speaking to our community, we invite you to submit topics for consideration: Submit Topics Sponsors: We are actively seeking sponsors to support the AI developer community, whether by offering venue space, providing food, or contributing cash sponsorship. Sponsors will not only have the chance to speak at the meetups and receive prominent recognition, but will also gain exposure to our extensive membership base of 400K+ AI developers worldwide. AICamp Community on Slack/Discord - Event chat: chat and connect with speakers and attendees - Sharing blogs, events, job openings, project collaborations |
AI Seminar (Virtual): Turning ML and AI into Engineering Disciplines
|
|
AI Workshop (Virtual): Real-time chat feed processing with RAG
2024-02-16 · 17:00
*** IMPORTANT: RSVP on the event website to receive the joining link: https://www.aicamp.ai/event/eventdetails/W2024021609 Description: Welcome to the weekly AI virtual seminars. Join us for deep-dive tech talks on AI/ML/Data, hands-on code labs and workshops, and networking with speakers and fellow developers from all over the world. Join us for a two-hour workshop where we will dive into the world of real-time chat enhancements using advanced AI techniques. We will integrate a live data stream (powered by Bytewax) with a Large Language Model (LLM) and Retrieval-Augmented Generation (RAG) to supercharge Slack conversations. This session promises a blend of theory and hands-on practice, equipped with resources and code samples. We will focus on the Python ecosystem and solutions that feel friendly and familiar to Pythonistas. Participants of the workshop will receive a Certificate of Completion. Speakers/Instructors: Zander Matheson (Bytewax), Henrik Nyman (Softlandia), Mikko Lehtimäki (Softlandia) The workshop is provided and sponsored by Bytewax. Community on Slack/Discord - Event chat: chat and connect with speakers and attendees - Sharing blogs, events, job openings, project collaborations Join Slack (search and join the #virtualevents channel) | Join Discord Community Managers: we are looking for passionate community managers who are enthusiastic about nurturing tech communities. Join us in planning and hosting meetup events and in building and growing the local AI/ML developer community. In addition to broadening your personal network and acquiring valuable tech skills, we'll make sure to reward your dedication and time. Speakers: If you have a keen interest in speaking to our community, we invite you to submit topics for consideration: https://forms.gle/JkMt91CZRtoJBSFUA Sponsors: We are actively seeking sponsors to support our community, whether by offering venue space, providing food and drink, or contributing cash sponsorship. 
Sponsors will have the chance to speak at the meetups, receive prominent recognition, and gain exposure to our extensive membership base of 300K+ developers worldwide. |
AI Workshop (Virtual): Real-time chat feed processing with RAG
|
|
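The workshop's core pattern (retrieve the most relevant chat messages, then feed them to an LLM as context) can be sketched without any external services. The snippet below is a stdlib-only illustration of that retrieval step: a real pipeline would use Bytewax for streaming ingestion and an embedding model for similarity, and the sample `messages` and bag-of-words scoring here are made-up stand-ins.

```python
from collections import Counter
import math

def score(query: str, doc: str) -> float:
    """Cosine similarity between bag-of-words vectors of two texts."""
    q, d = Counter(query.lower().split()), Counter(doc.lower().split())
    num = sum(q[w] * d[w] for w in q)
    den = (math.sqrt(sum(v * v for v in q.values()))
           * math.sqrt(sum(v * v for v in d.values())))
    return num / den if den else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Top-k messages by similarity -- the 'R' in RAG."""
    return sorted(docs, key=lambda d: score(query, d), reverse=True)[:k]

messages = [
    "Deploy of the payments service failed with a timeout",
    "Lunch options near the office?",
    "Retry the payments deploy after bumping the timeout",
]

# Retrieved context is prepended to the user question to form the prompt
# that would be sent to the LLM.
context = retrieve("payments deploy timeout", messages)
prompt = ("Answer using this chat context:\n"
          + "\n".join(context)
          + "\nQ: why did the deploy fail?")
```

The design point the workshop makes is that grounding the LLM in freshly retrieved messages keeps answers tied to the live conversation rather than the model's stale training data.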
AI Meetup (Virtual): Real Time Event Data Processing
2023-06-20 · 16:00
Register on the event website to receive the joining link: https://www.aicamp.ai/event/eventdetails/W2023062009 Description: Welcome to the weekly AI/ML/Data virtual meetup, which you can join from anywhere in the world. Join us for deep-dive tech talks on AI/ML/Data, networking with speakers and fellow developers, and a chance to win lucky-draw prizes. Tech Talk: Declarative Reasoning with Timelines: The Next Step in Event Processing Speaker: Ryan Michael, VP of Engineering @ DataStax Abstract: At the heart of modern data processing lies events. Events describe the rawest, most complete picture available of what has happened in the world, and practically every form of data processing ultimately begins with events. While the power of event processing has increased since the emergence of streaming data processing, current systems are still difficult to use for problems that deal with time and order, such as predictive AI/ML. Handling these problems requires a new kind of query language: a way to declaratively reason about events over time. In this talk, we introduce the concept of timelines. Timelines are an intuitive abstraction for reasoning about temporal values. They support a broad range of useful operations that can be computed efficiently at scale. We will demonstrate the power and differentiation of timelines: - How timelines allow declarative queries over events and time in a simple and intuitive manner - Why timelines are ideal for applications such as behavioral predictions, trend analysis, and forecasting, and how existing solutions such as streaming SQL fall short - How to execute timeline-based queries using the open-source Kaskada event-processing engine Community on Slack - Event chat: chat and connect with speakers and attendees - Sharing blogs, events, job openings, project collaborations Join Slack (search and join the #virtualevents channel) Contact us to submit topics and/or sponsor the meetup on venue/food/swag/prizes. 
https://forms.gle/JkMt91CZRtoJBSFUA |
AI Meetup (Virtual): Real Time Event Data Processing
|
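The timeline abstraction from the DataStax talk can be made concrete with a toy example: a timeline is a per-key, time-ordered view of an event stream, over which operations like running aggregates are declared once and computed incrementally. The sketch below is a stdlib-only illustration of the concept, not Kaskada's actual API; the `Event` fields and the `cumulative_sum` operation are assumptions chosen for demonstration.

```python
from dataclasses import dataclass

@dataclass
class Event:
    key: str      # entity the event belongs to (e.g. a user)
    time: int     # event timestamp
    value: float  # payload being aggregated

def timeline(events: list[Event], key: str) -> list[Event]:
    """The per-key, time-ordered view of an event stream."""
    return sorted((e for e in events if e.key == key), key=lambda e: e.time)

def cumulative_sum(events: list[Event], key: str) -> list[tuple[int, float]]:
    """A timeline operation: the running total of `value` over time."""
    total, out = 0.0, []
    for e in timeline(events, key):
        total += e.value
        out.append((e.time, total))
    return out

events = [
    Event("user-1", 3, 5.0),
    Event("user-1", 1, 2.0),  # arrives out of order; the timeline reorders it
    Event("user-2", 2, 7.0),
]
# cumulative_sum(events, "user-1") -> [(1, 2.0), (3, 7.0)]
```

The point the abstract makes is that a user declares *what* to compute over time (here, a cumulative sum per key) while the engine handles ordering and incremental evaluation, which is where streaming SQL tends to get awkward.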