talk-data.com
Activities & events
March 20 - AI, Machine Learning and Computer Vision Meetup
2025-03-20 · 15:30
This is a virtual event.

Vision Language Models Are Few-Shot Audio Spectrogram Classifiers (Satvik Dixit)
Current audio language models lag behind text-based LLMs and Vision Language Models (VLMs) in reasoning capabilities. Incorporating audio information into VLMs could let us leverage their advanced language reasoning for audio input. This talk covers how VLMs such as GPT-4o and Claude 3.5 Sonnet can recognize audio content from spectrograms, and how this approach could enhance audio understanding within VLMs. Satvik Dixit is a master's student at Carnegie Mellon University, advised by Professors Bhiksha Raj and Chris Donahue. His research interests are audio/speech processing and multimodal learning, with a focus on audio understanding and generation tasks. More details: https://satvik-dixit.github.io/

Agentic Retrieval-Augmented Generation: A Survey on Agentic RAG (Aditi Singh)
This talk explores Agentic Retrieval-Augmented Generation (Agentic RAG), a method that enhances Large Language Models (LLMs) by combining intelligent retrieval with autonomous agents. We will see how Agentic RAG leverages agentic behaviors such as reflection, planning, tool use, and multi-agent collaboration to dynamically refine retrieval strategies and adapt workflows, significantly improving real-time responsiveness and complex task management. Aditi Singh is an Assistant College Lecturer in the Department of Computer Science at Cleveland State University in Cleveland, Ohio. She earned her M.S. and Ph.D. in Computer Science from Kent State University and was awarded a Gold Medal for academic excellence during her undergraduate studies. Her research interests include artificial intelligence, Large Language Models (LLMs), and generative AI; Dr. Singh has published over 25 research papers in these fields.

Active Data Curation Effectively Distills Large-Scale Multimodal Models (Vishaal Udandarao)
Knowledge distillation (KD) is the de facto standard for compressing large-scale models into smaller ones. Prior work has explored ever more complex KD strategies involving different objective functions, teacher ensembles, and weight inheritance. This talk describes an alternative, simpler approach: active data curation as effective distillation for contrastive multimodal pretraining. The simple online batch selection method, ACID, outperforms strong KD baselines across various model, data, and compute configurations. Such active data curation turns out to be complementary to standard KD, and the two can be combined to train highly performant, inference-efficient models. The resulting simple and scalable pretraining framework, ACED, achieves state-of-the-art results across 27 zero-shot classification and retrieval tasks with up to 11% fewer inference FLOPs. ACED models also yield strong vision encoders for training generative multimodal models in the LiT-Decoder setting, outperforming larger vision encoders on image-captioning and visual question-answering tasks. Vishaal Udandarao is a third-year ELLIS PhD student, jointly advised by Matthias Bethge at the University of Tuebingen and Samuel Albanie at Google DeepMind. He completed his undergraduate degree in computer science at IIIT Delhi (2016-2020) and his master's in machine learning at the University of Cambridge (2021).

Dataset Safari: Adventures from 2024's Top Computer Vision Conferences (Harpreet Sahota)
Datasets are the lifeblood of machine learning, driving innovation and enabling breakthrough applications in computer vision and AI. This talk presents a curated exploration of the most compelling visual datasets unveiled at CVPR, ECCV, and NeurIPS 2024, with a twist: we'll explore them live using FiftyOne, the open-source tool for dataset curation and analysis. Using FiftyOne's visualization and analysis capabilities, we'll take a deep dive into these collections, examining their unique characteristics through interactive sessions. Whether you're a researcher, practitioner, or dataset enthusiast, this session will provide hands-on insights into both the datasets shaping our field and practical tools for dataset exploration. Join us for a live demonstration of how modern dataset analysis tools can unlock deeper understanding of the data driving AI forward. Harpreet Sahota is a hacker-in-residence and machine learning engineer with a passion for deep learning and generative AI, and a deep interest in RAG, agents, and multimodal AI.
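The spectrogram-as-image idea from the first talk rests on a simple preprocessing step: turning audio into a 2-D magnitude array that a VLM can "see". The following is a minimal sketch of that step (plain NumPy on a synthetic tone, not code from the talk; the function name and parameters are illustrative).

```python
import numpy as np

def spectrogram(signal, frame_len=512, hop=256):
    """Magnitude spectrogram via a Hann-windowed short-time FFT."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    # Real FFT of each frame -> (n_frames, frame_len // 2 + 1) frequency bins
    return np.abs(np.fft.rfft(frames, axis=1))

sr = 16000                                        # sample rate, Hz
t = np.arange(sr) / sr                            # one second of samples
spec = spectrogram(np.sin(2 * np.pi * 440 * t))   # pure 440 Hz tone

# The dominant frequency bin should sit near 440 Hz
peak_hz = spec.mean(axis=0).argmax() * sr / 512
```

In the approach the talk describes, an array like `spec` would be rendered as an image (typically log-scaled) and passed to the VLM as an ordinary picture.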
|
March 20 - AI, Machine Learning and Computer Vision Meetup
2025-03-20 · 15:30
This is a virtual event. Vision Language Models Are Few-Shot Audio Spectrogram Classifiers The development of multimodal AI agents marks a pivotal step toward creating systems Current audio language models lag behind the text-based LLMs and Vision Language Models (VLMs) in reasoning capabilities. Incorporating audio information into VLMs could help us leverage their advanced language reasoning capabilities for audio input. To explore this, this talk will cover how VLMs (such as GPT-4o and Claude-3.5-sonnet) can recognize audio content from spectrograms and how this approach could enhance audio understanding within VLMs. About the Speaker Satvik Dixit is a masters student at Carnegie Mellon University, advised by Professors Bhiksha Raj and Chris Donahue. His research interests are Audio/Speech Processing and Multimodal Learning, with focus on audio understanding and generation tasks. More details can be found at: https://satvik-dixit.github.io/ Agentic Retrieval-Augmented Generation: A Survey on Agentic RAG In this talk, we will explore Agentic Retrieval-Augmented Generation, or Agentic RAG, a groundbreaking method that enhances Large Language Models (LLMs) by combining intelligent retrieval with autonomous agents. We will discover how Agentic RAG leverages advanced agentic behaviors such as reflection, planning, tool use, and multiagent collaboration to dynamically refine retrieval strategies and adapt workflows, significantly improving real-time responsiveness and complex task management About the Speaker Aditi Singh is an Assistant College Lecturer in the Department of Computer Science at Cleveland State University, Cleveland, Ohio. She earned her M.S. and Ph.D. in Computer Science from Kent State University. She was awarded a prestigious Gold Medal for academic excellence during her undergraduate studies. Her research interests include Artificial Intelligence, Large Language Models (LLM), and Generative AI. Dr. 
Singh has published over 25 research papers in these fields. Active Data Curation Effectively Distills Large-Scale Multimodal Models Knowledge distillation (KD) is the de facto standard for compressing large-scale models into smaller ones. Prior works have explored ever more complex KD strategies involving different objective functions, teacher-ensembles, and weight inheritance. In this talk, I will describe an alternative, yet simple approach — active data curation as effective distillation for contrastive multimodal pretraining. Our simple online batch selection method, ACID, outperforms strong KD baselines across various model-, data- and compute-configurations. Further, we find such an active data curation strategy to in fact be complementary to standard KD, and can be effectively combined to train highly performant inference-efficient models. Our simple and scalable pretraining framework, ACED, achieves state-of-the-art results across 27 zero-shot classification and retrieval tasks with upto 11% less inference FLOPs. We further demonstrate that our ACED models yield strong vision-encoders for training generative multimodal models in the LiT-Decoder setting, outperforming larger vision encoders for image-captioning and visual question-answering tasks. About the Speaker Vishaal Udandarao is a third year ELLIS PhD student, jointly working with Matthias Bethge at The University of Tuebingen and Samuel Albanie at Google Deepmind. He did his undergraduate degree in computer science in IIIT Delhi from 2016 to 2020, and his masters in machine learning in The University of Cambridge in 2021. Dataset Safari: Adventures from 2024’s Top Computer Vision Conferences Datasets are the lifeblood of machine learning, driving innovation and enabling breakthrough applications in computer vision and AI. 
This talk presents a curated exploration of the most compelling visual datasets unveiled at CVPR, ECCV, and NeurIPS 2024, with a unique twist – we’ll explore them live using FiftyOne, the open-source tool for dataset curation and analysis. Using FiftyOne’s powerful visualization and analysis capabilities, we’ll take a deep dive into these collections, examining their unique characteristics through interactive sessions. We’ll demonstrate how to:
Whether you’re a researcher, practitioner, or dataset enthusiast, this session will provide hands-on insights into both the datasets shaping our field and practical tools for dataset exploration. Join us for a live demonstration of how modern dataset analysis tools can unlock deeper understanding of the data driving AI forward. About the Speaker Harpreet Sahota is a hacker-in-residence and machine learning engineer with a passion for deep learning and generative AI. He’s got a deep interest in RAG, Agents, and Multimodal AI. |
March 20 - AI, Machine Learning and Computer Vision Meetup
|
|
March 20 - AI, Machine Learning and Computer Vision Meetup
2025-03-20 · 15:30
This is a virtual event. Vision Language Models Are Few-Shot Audio Spectrogram Classifiers The development of multimodal AI agents marks a pivotal step toward creating systems Current audio language models lag behind the text-based LLMs and Vision Language Models (VLMs) in reasoning capabilities. Incorporating audio information into VLMs could help us leverage their advanced language reasoning capabilities for audio input. To explore this, this talk will cover how VLMs (such as GPT-4o and Claude-3.5-sonnet) can recognize audio content from spectrograms and how this approach could enhance audio understanding within VLMs. About the Speaker Satvik Dixit is a masters student at Carnegie Mellon University, advised by Professors Bhiksha Raj and Chris Donahue. His research interests are Audio/Speech Processing and Multimodal Learning, with focus on audio understanding and generation tasks. More details can be found at: https://satvik-dixit.github.io/ Agentic Retrieval-Augmented Generation: A Survey on Agentic RAG In this talk, we will explore Agentic Retrieval-Augmented Generation, or Agentic RAG, a groundbreaking method that enhances Large Language Models (LLMs) by combining intelligent retrieval with autonomous agents. We will discover how Agentic RAG leverages advanced agentic behaviors such as reflection, planning, tool use, and multiagent collaboration to dynamically refine retrieval strategies and adapt workflows, significantly improving real-time responsiveness and complex task management About the Speaker Aditi Singh is an Assistant College Lecturer in the Department of Computer Science at Cleveland State University, Cleveland, Ohio. She earned her M.S. and Ph.D. in Computer Science from Kent State University. She was awarded a prestigious Gold Medal for academic excellence during her undergraduate studies. Her research interests include Artificial Intelligence, Large Language Models (LLM), and Generative AI. Dr. 
Singh has published over 25 research papers in these fields. Active Data Curation Effectively Distills Large-Scale Multimodal Models Knowledge distillation (KD) is the de facto standard for compressing large-scale models into smaller ones. Prior works have explored ever more complex KD strategies involving different objective functions, teacher-ensembles, and weight inheritance. In this talk, I will describe an alternative, yet simple approach — active data curation as effective distillation for contrastive multimodal pretraining. Our simple online batch selection method, ACID, outperforms strong KD baselines across various model-, data- and compute-configurations. Further, we find such an active data curation strategy to in fact be complementary to standard KD, and can be effectively combined to train highly performant inference-efficient models. Our simple and scalable pretraining framework, ACED, achieves state-of-the-art results across 27 zero-shot classification and retrieval tasks with upto 11% less inference FLOPs. We further demonstrate that our ACED models yield strong vision-encoders for training generative multimodal models in the LiT-Decoder setting, outperforming larger vision encoders for image-captioning and visual question-answering tasks. About the Speaker Vishaal Udandarao is a third year ELLIS PhD student, jointly working with Matthias Bethge at The University of Tuebingen and Samuel Albanie at Google Deepmind. He did his undergraduate degree in computer science in IIIT Delhi from 2016 to 2020, and his masters in machine learning in The University of Cambridge in 2021. Dataset Safari: Adventures from 2024’s Top Computer Vision Conferences Datasets are the lifeblood of machine learning, driving innovation and enabling breakthrough applications in computer vision and AI. 
This talk presents a curated exploration of the most compelling visual datasets unveiled at CVPR, ECCV, and NeurIPS 2024, with a unique twist – we’ll explore them live using FiftyOne, the open-source tool for dataset curation and analysis. Using FiftyOne’s powerful visualization and analysis capabilities, we’ll take a deep dive into these collections, examining their unique characteristics through interactive sessions. We’ll demonstrate how to:
Whether you’re a researcher, practitioner, or dataset enthusiast, this session will provide hands-on insights into both the datasets shaping our field and practical tools for dataset exploration. Join us for a live demonstration of how modern dataset analysis tools can unlock deeper understanding of the data driving AI forward. About the Speaker Harpreet Sahota is a hacker-in-residence and machine learning engineer with a passion for deep learning and generative AI. He’s got a deep interest in RAG, Agents, and Multimodal AI. |
March 20 - AI, Machine Learning and Computer Vision Meetup
|
|
March 20 - AI, Machine Learning and Computer Vision Meetup
2025-03-20 · 15:30
This is a virtual event. Vision Language Models Are Few-Shot Audio Spectrogram Classifiers The development of multimodal AI agents marks a pivotal step toward creating systems Current audio language models lag behind the text-based LLMs and Vision Language Models (VLMs) in reasoning capabilities. Incorporating audio information into VLMs could help us leverage their advanced language reasoning capabilities for audio input. To explore this, this talk will cover how VLMs (such as GPT-4o and Claude-3.5-sonnet) can recognize audio content from spectrograms and how this approach could enhance audio understanding within VLMs. About the Speaker Satvik Dixit is a masters student at Carnegie Mellon University, advised by Professors Bhiksha Raj and Chris Donahue. His research interests are Audio/Speech Processing and Multimodal Learning, with focus on audio understanding and generation tasks. More details can be found at: https://satvik-dixit.github.io/ Agentic Retrieval-Augmented Generation: A Survey on Agentic RAG In this talk, we will explore Agentic Retrieval-Augmented Generation, or Agentic RAG, a groundbreaking method that enhances Large Language Models (LLMs) by combining intelligent retrieval with autonomous agents. We will discover how Agentic RAG leverages advanced agentic behaviors such as reflection, planning, tool use, and multiagent collaboration to dynamically refine retrieval strategies and adapt workflows, significantly improving real-time responsiveness and complex task management About the Speaker Aditi Singh is an Assistant College Lecturer in the Department of Computer Science at Cleveland State University, Cleveland, Ohio. She earned her M.S. and Ph.D. in Computer Science from Kent State University. She was awarded a prestigious Gold Medal for academic excellence during her undergraduate studies. Her research interests include Artificial Intelligence, Large Language Models (LLM), and Generative AI. Dr. 
Singh has published over 25 research papers in these fields. Active Data Curation Effectively Distills Large-Scale Multimodal Models Knowledge distillation (KD) is the de facto standard for compressing large-scale models into smaller ones. Prior works have explored ever more complex KD strategies involving different objective functions, teacher-ensembles, and weight inheritance. In this talk, I will describe an alternative, yet simple approach — active data curation as effective distillation for contrastive multimodal pretraining. Our simple online batch selection method, ACID, outperforms strong KD baselines across various model-, data- and compute-configurations. Further, we find such an active data curation strategy to in fact be complementary to standard KD, and can be effectively combined to train highly performant inference-efficient models. Our simple and scalable pretraining framework, ACED, achieves state-of-the-art results across 27 zero-shot classification and retrieval tasks with upto 11% less inference FLOPs. We further demonstrate that our ACED models yield strong vision-encoders for training generative multimodal models in the LiT-Decoder setting, outperforming larger vision encoders for image-captioning and visual question-answering tasks. About the Speaker Vishaal Udandarao is a third year ELLIS PhD student, jointly working with Matthias Bethge at The University of Tuebingen and Samuel Albanie at Google Deepmind. He did his undergraduate degree in computer science in IIIT Delhi from 2016 to 2020, and his masters in machine learning in The University of Cambridge in 2021. Dataset Safari: Adventures from 2024’s Top Computer Vision Conferences Datasets are the lifeblood of machine learning, driving innovation and enabling breakthrough applications in computer vision and AI. 
This talk presents a curated exploration of the most compelling visual datasets unveiled at CVPR, ECCV, and NeurIPS 2024, with a unique twist – we’ll explore them live using FiftyOne, the open-source tool for dataset curation and analysis. Using FiftyOne’s powerful visualization and analysis capabilities, we’ll take a deep dive into these collections, examining their unique characteristics through interactive sessions. We’ll demonstrate how to:
Whether you’re a researcher, practitioner, or dataset enthusiast, this session will provide hands-on insights into both the datasets shaping our field and practical tools for dataset exploration. Join us for a live demonstration of how modern dataset analysis tools can unlock deeper understanding of the data driving AI forward. About the Speaker Harpreet Sahota is a hacker-in-residence and machine learning engineer with a passion for deep learning and generative AI. He’s got a deep interest in RAG, Agents, and Multimodal AI. |
March 20 - AI, Machine Learning and Computer Vision Meetup
|
|
March 20 - AI, Machine Learning and Computer Vision Meetup
2025-03-20 · 15:30
This is a virtual event. Vision Language Models Are Few-Shot Audio Spectrogram Classifiers The development of multimodal AI agents marks a pivotal step toward creating systems Current audio language models lag behind the text-based LLMs and Vision Language Models (VLMs) in reasoning capabilities. Incorporating audio information into VLMs could help us leverage their advanced language reasoning capabilities for audio input. To explore this, this talk will cover how VLMs (such as GPT-4o and Claude-3.5-sonnet) can recognize audio content from spectrograms and how this approach could enhance audio understanding within VLMs. About the Speaker Satvik Dixit is a masters student at Carnegie Mellon University, advised by Professors Bhiksha Raj and Chris Donahue. His research interests are Audio/Speech Processing and Multimodal Learning, with focus on audio understanding and generation tasks. More details can be found at: https://satvik-dixit.github.io/ Agentic Retrieval-Augmented Generation: A Survey on Agentic RAG In this talk, we will explore Agentic Retrieval-Augmented Generation, or Agentic RAG, a groundbreaking method that enhances Large Language Models (LLMs) by combining intelligent retrieval with autonomous agents. We will discover how Agentic RAG leverages advanced agentic behaviors such as reflection, planning, tool use, and multiagent collaboration to dynamically refine retrieval strategies and adapt workflows, significantly improving real-time responsiveness and complex task management About the Speaker Aditi Singh is an Assistant College Lecturer in the Department of Computer Science at Cleveland State University, Cleveland, Ohio. She earned her M.S. and Ph.D. in Computer Science from Kent State University. She was awarded a prestigious Gold Medal for academic excellence during her undergraduate studies. Her research interests include Artificial Intelligence, Large Language Models (LLM), and Generative AI. Dr. 
Singh has published over 25 research papers in these fields. Active Data Curation Effectively Distills Large-Scale Multimodal Models Knowledge distillation (KD) is the de facto standard for compressing large-scale models into smaller ones. Prior works have explored ever more complex KD strategies involving different objective functions, teacher-ensembles, and weight inheritance. In this talk, I will describe an alternative, yet simple approach — active data curation as effective distillation for contrastive multimodal pretraining. Our simple online batch selection method, ACID, outperforms strong KD baselines across various model-, data- and compute-configurations. Further, we find such an active data curation strategy to in fact be complementary to standard KD, and can be effectively combined to train highly performant inference-efficient models. Our simple and scalable pretraining framework, ACED, achieves state-of-the-art results across 27 zero-shot classification and retrieval tasks with upto 11% less inference FLOPs. We further demonstrate that our ACED models yield strong vision-encoders for training generative multimodal models in the LiT-Decoder setting, outperforming larger vision encoders for image-captioning and visual question-answering tasks. About the Speaker Vishaal Udandarao is a third year ELLIS PhD student, jointly working with Matthias Bethge at The University of Tuebingen and Samuel Albanie at Google Deepmind. He did his undergraduate degree in computer science in IIIT Delhi from 2016 to 2020, and his masters in machine learning in The University of Cambridge in 2021. Dataset Safari: Adventures from 2024’s Top Computer Vision Conferences Datasets are the lifeblood of machine learning, driving innovation and enabling breakthrough applications in computer vision and AI. 
This talk presents a curated exploration of the most compelling visual datasets unveiled at CVPR, ECCV, and NeurIPS 2024, with a unique twist – we’ll explore them live using FiftyOne, the open-source tool for dataset curation and analysis. Using FiftyOne’s powerful visualization and analysis capabilities, we’ll take a deep dive into these collections, examining their unique characteristics through interactive sessions. We’ll demonstrate how to:
Whether you’re a researcher, practitioner, or dataset enthusiast, this session will provide hands-on insights into both the datasets shaping our field and practical tools for dataset exploration. Join us for a live demonstration of how modern dataset analysis tools can unlock deeper understanding of the data driving AI forward. About the Speaker Harpreet Sahota is a hacker-in-residence and machine learning engineer with a passion for deep learning and generative AI. He’s got a deep interest in RAG, Agents, and Multimodal AI. |
March 20 - AI, Machine Learning and Computer Vision Meetup
|
|
March 20 - AI, Machine Learning and Computer Vision Meetup
2025-03-20 · 15:30
This is a virtual event. Vision Language Models Are Few-Shot Audio Spectrogram Classifiers The development of multimodal AI agents marks a pivotal step toward creating systems Current audio language models lag behind the text-based LLMs and Vision Language Models (VLMs) in reasoning capabilities. Incorporating audio information into VLMs could help us leverage their advanced language reasoning capabilities for audio input. To explore this, this talk will cover how VLMs (such as GPT-4o and Claude-3.5-sonnet) can recognize audio content from spectrograms and how this approach could enhance audio understanding within VLMs. About the Speaker Satvik Dixit is a masters student at Carnegie Mellon University, advised by Professors Bhiksha Raj and Chris Donahue. His research interests are Audio/Speech Processing and Multimodal Learning, with focus on audio understanding and generation tasks. More details can be found at: https://satvik-dixit.github.io/ Agentic Retrieval-Augmented Generation: A Survey on Agentic RAG In this talk, we will explore Agentic Retrieval-Augmented Generation, or Agentic RAG, a groundbreaking method that enhances Large Language Models (LLMs) by combining intelligent retrieval with autonomous agents. We will discover how Agentic RAG leverages advanced agentic behaviors such as reflection, planning, tool use, and multiagent collaboration to dynamically refine retrieval strategies and adapt workflows, significantly improving real-time responsiveness and complex task management About the Speaker Aditi Singh is an Assistant College Lecturer in the Department of Computer Science at Cleveland State University, Cleveland, Ohio. She earned her M.S. and Ph.D. in Computer Science from Kent State University. She was awarded a prestigious Gold Medal for academic excellence during her undergraduate studies. Her research interests include Artificial Intelligence, Large Language Models (LLM), and Generative AI. Dr. 
Singh has published over 25 research papers in these fields. Active Data Curation Effectively Distills Large-Scale Multimodal Models Knowledge distillation (KD) is the de facto standard for compressing large-scale models into smaller ones. Prior works have explored ever more complex KD strategies involving different objective functions, teacher-ensembles, and weight inheritance. In this talk, I will describe an alternative, yet simple approach — active data curation as effective distillation for contrastive multimodal pretraining. Our simple online batch selection method, ACID, outperforms strong KD baselines across various model-, data- and compute-configurations. Further, we find such an active data curation strategy to in fact be complementary to standard KD, and can be effectively combined to train highly performant inference-efficient models. Our simple and scalable pretraining framework, ACED, achieves state-of-the-art results across 27 zero-shot classification and retrieval tasks with upto 11% less inference FLOPs. We further demonstrate that our ACED models yield strong vision-encoders for training generative multimodal models in the LiT-Decoder setting, outperforming larger vision encoders for image-captioning and visual question-answering tasks. About the Speaker Vishaal Udandarao is a third year ELLIS PhD student, jointly working with Matthias Bethge at The University of Tuebingen and Samuel Albanie at Google Deepmind. He did his undergraduate degree in computer science in IIIT Delhi from 2016 to 2020, and his masters in machine learning in The University of Cambridge in 2021. Dataset Safari: Adventures from 2024’s Top Computer Vision Conferences Datasets are the lifeblood of machine learning, driving innovation and enabling breakthrough applications in computer vision and AI. 
This talk presents a curated exploration of the most compelling visual datasets unveiled at CVPR, ECCV, and NeurIPS 2024, with a unique twist – we’ll explore them live using FiftyOne, the open-source tool for dataset curation and analysis. Using FiftyOne’s powerful visualization and analysis capabilities, we’ll take a deep dive into these collections, examining their unique characteristics through interactive sessions. We’ll demonstrate how to:
Whether you’re a researcher, practitioner, or dataset enthusiast, this session will provide hands-on insights into both the datasets shaping our field and practical tools for dataset exploration. Join us for a live demonstration of how modern dataset analysis tools can unlock deeper understanding of the data driving AI forward. About the Speaker Harpreet Sahota is a hacker-in-residence and machine learning engineer with a passion for deep learning and generative AI. He’s got a deep interest in RAG, Agents, and Multimodal AI. |
March 20 - AI, Machine Learning and Computer Vision Meetup
|
|
March 20 - AI, Machine Learning and Computer Vision Meetup
2025-03-20 · 15:30
This is a virtual event. Vision Language Models Are Few-Shot Audio Spectrogram Classifiers The development of multimodal AI agents marks a pivotal step toward creating systems Current audio language models lag behind the text-based LLMs and Vision Language Models (VLMs) in reasoning capabilities. Incorporating audio information into VLMs could help us leverage their advanced language reasoning capabilities for audio input. To explore this, this talk will cover how VLMs (such as GPT-4o and Claude-3.5-sonnet) can recognize audio content from spectrograms and how this approach could enhance audio understanding within VLMs. About the Speaker Satvik Dixit is a masters student at Carnegie Mellon University, advised by Professors Bhiksha Raj and Chris Donahue. His research interests are Audio/Speech Processing and Multimodal Learning, with focus on audio understanding and generation tasks. More details can be found at: https://satvik-dixit.github.io/ Agentic Retrieval-Augmented Generation: A Survey on Agentic RAG In this talk, we will explore Agentic Retrieval-Augmented Generation, or Agentic RAG, a groundbreaking method that enhances Large Language Models (LLMs) by combining intelligent retrieval with autonomous agents. We will discover how Agentic RAG leverages advanced agentic behaviors such as reflection, planning, tool use, and multiagent collaboration to dynamically refine retrieval strategies and adapt workflows, significantly improving real-time responsiveness and complex task management About the Speaker Aditi Singh is an Assistant College Lecturer in the Department of Computer Science at Cleveland State University, Cleveland, Ohio. She earned her M.S. and Ph.D. in Computer Science from Kent State University. She was awarded a prestigious Gold Medal for academic excellence during her undergraduate studies. Her research interests include Artificial Intelligence, Large Language Models (LLM), and Generative AI. Dr. 
Singh has published over 25 research papers in these fields. Active Data Curation Effectively Distills Large-Scale Multimodal Models Knowledge distillation (KD) is the de facto standard for compressing large-scale models into smaller ones. Prior works have explored ever more complex KD strategies involving different objective functions, teacher-ensembles, and weight inheritance. In this talk, I will describe an alternative, yet simple approach — active data curation as effective distillation for contrastive multimodal pretraining. Our simple online batch selection method, ACID, outperforms strong KD baselines across various model-, data- and compute-configurations. Further, we find such an active data curation strategy to in fact be complementary to standard KD, and can be effectively combined to train highly performant inference-efficient models. Our simple and scalable pretraining framework, ACED, achieves state-of-the-art results across 27 zero-shot classification and retrieval tasks with upto 11% less inference FLOPs. We further demonstrate that our ACED models yield strong vision-encoders for training generative multimodal models in the LiT-Decoder setting, outperforming larger vision encoders for image-captioning and visual question-answering tasks. About the Speaker Vishaal Udandarao is a third year ELLIS PhD student, jointly working with Matthias Bethge at The University of Tuebingen and Samuel Albanie at Google Deepmind. He did his undergraduate degree in computer science in IIIT Delhi from 2016 to 2020, and his masters in machine learning in The University of Cambridge in 2021. Dataset Safari: Adventures from 2024’s Top Computer Vision Conferences Datasets are the lifeblood of machine learning, driving innovation and enabling breakthrough applications in computer vision and AI. 
This talk presents a curated exploration of the most compelling visual datasets unveiled at CVPR, ECCV, and NeurIPS 2024, with a unique twist: we’ll explore them live using FiftyOne, the open-source tool for dataset curation and analysis. Using FiftyOne’s powerful visualization and analysis capabilities, we’ll take a deep dive into these collections, examining their unique characteristics through interactive sessions and live demonstrations.
Whether you’re a researcher, practitioner, or dataset enthusiast, this session will provide hands-on insights into both the datasets shaping our field and practical tools for dataset exploration. Join us for a live demonstration of how modern dataset-analysis tools can unlock a deeper understanding of the data driving AI forward. About the Speaker: Harpreet Sahota is a hacker-in-residence and machine learning engineer with a passion for deep learning and generative AI. He has a deep interest in RAG, agents, and multimodal AI. |
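The spectrogram-as-image idea behind the first talk above can be sketched minimally. This is an illustrative NumPy version only; the speaker's actual pipeline (and the VLM call itself) is not described here, so the framing parameters and function name are assumptions:

```python
import numpy as np

def log_spectrogram(signal, n_fft=256, hop=128):
    """Frame a 1-D audio signal, window it, and return a
    log-magnitude spectrogram (freq bins x frames) that could be
    rendered as an image and shown to a vision-language model."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1)).T  # (n_fft//2 + 1, n_frames)
    return np.log1p(mag)

# One second of a 440 Hz tone sampled at 8 kHz
sr = 8000
t = np.arange(sr) / sr
spec = log_spectrogram(np.sin(2 * np.pi * 440.0 * t))
print(spec.shape)  # (129, 61): 129 frequency bins, 61 frames
```

A real few-shot setup would render this array as a PNG and pass it, alongside labeled example spectrograms, to a multimodal model such as GPT-4o.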
March 20 - AI, Machine Learning and Computer Vision Meetup
|
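The online batch-selection idea in the ACID talk above can be illustrated schematically. This is not the actual ACID method, just a toy "keep the hardest examples" selector under the assumption that hardness is scored by the current learner's loss:

```python
import numpy as np

def select_batch(losses, k):
    """Active data curation as online batch selection: from a
    super-batch of candidates, keep the k examples the current
    learner scores as hardest (highest loss)."""
    hardest = np.argsort(losses)[::-1][:k]
    return np.sort(hardest)

rng = np.random.default_rng(0)
losses = rng.random(16)            # stand-in losses for a 16-example super-batch
chosen = select_batch(losses, k=4)
print(len(chosen))  # 4
```

In a training loop, the selected sub-batch (rather than the full super-batch) would take the gradient step, so compute concentrates on examples the model has not yet mastered.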
|
AI Meetup (March): Generative AI and Train ML Models For Responsible AI
2024-03-20 · 21:00
** Important: RSVP here (due to room capacity and building security, you must pre-register at the link for admission). Description: Welcome to our in-person AI meetup in New York. Join us for deep-dive tech talks on AI, GenAI, LLMs, and ML, hands-on workshops, food/drink, and networking with speakers and fellow developers. Tech Talk: End-to-End Development of Generative AI Applications on Azure. Speaker: Nitya Narasimhan (Microsoft). Abstract: In this talk we’ll introduce the core concepts for building a “copilot” application on Azure AI, from prompt engineering to LLM Ops, using the Contoso Chat application sample as a reference. We’ll also explore the Azure AI Studio (preview) platform from a code-first perspective to understand how you can streamline your development from model exploration to endpoint deployment with a unified platform and workflow. Tech Talk: Train & Debug Your ML Models for Responsible AI on Azure. Speaker: Ruth Yakubu (Microsoft). Abstract: In this talk you will learn to train an AI model using Azure Machine Learning Studio, then use its built-in Responsible AI Dashboard to debug your model for performance, fairness, and responsible AI usage. Speakers/Topics: Stay tuned as we update speakers and schedules. If you have a keen interest in speaking to our community, we invite you to submit topics for consideration: Submit Topics. Sponsors: We are actively seeking sponsors to support our community, whether by offering venue space, providing food/drink, or contributing cash sponsorship. Sponsors will not only speak at the meetups and receive prominent recognition, but also gain exposure to our extensive membership base of 20,000+ AI developers in New York and 300K+ worldwide. Community on Slack/Discord
|
AI Meetup (March): Generative AI and Train ML Models For Responsible AI
|
|
Journey into GenAI: Fine-Tuning Stable Diffusion and boosting CX with LLMs
2024-03-19 · 17:00
The next meetup features two GenAI talks: (1) fine-tuning image-generation models like Stable Diffusion, and (2) the transformative impact of Large Language Models (LLMs) on customer service. Agenda: 18:00 - 18:30: Doors open; 18:30 - 18:40: Welcome; 18:40 - 19:10: Fine-Tune Your Own Stable Diffusion Model - Tips and Tricks Included; 19:10 - 19:40: Pizza & Beers; 19:40 - 20:10: AI Tools in Action: Enhancing Customer Experience with LLMs at Polestar; 20:10 - 21:00: Networking. Fine-Tune Your Own Stable Diffusion Model - Tips and Tricks Included. Daniel Pleus - Data Scientist, Schibsted. Recently, models and tools like Stable Diffusion, Midjourney, and DALL·E 3 have made significant strides in generating high-quality, hyper-realistic images. The innovation doesn't stop with their out-of-the-box capabilities, though: recent advancements have enabled fine-tuning of these models on commodity hardware, unlocking new potential for personalized image generation. In this talk, we'll embark on the practical journey of fine-tuning the Stable Diffusion model with an unexpected protagonist: Otto, Schibsted's plush octopus mascot. Using Stable Diffusion, we send him to faraway countries, place him in historical settings, or change his appearance. You'll discover how Stable Diffusion operates, learn how to fine-tune it with techniques like textual inversion and LoRA, and receive many tips about optimal settings. We will dive deep into prompts, guidance factors, denoising steps, and random seeds. By the end of this talk, you should feel comfortable conducting your own experiments. Speaker Bio: Daniel Pleus is a key member of Schibsted's AI Enablement Program as a Data Scientist Lead, where he assists over 60 Schibsted brands in discovering AI use cases and developing prototypes. His recent focus has been on Generative AI, including fine-tuning and deploying models. Before joining Schibsted, Daniel worked in various data and analytics roles and has an academic background in Economics.
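The LoRA technique mentioned in this talk can be sketched in a few lines. This is a generic NumPy illustration, not the Stable Diffusion integration itself; the rank, scaling, and initialization follow the common convention (A small random, B zero, update scaled by alpha/r):

```python
import numpy as np

class LoRALinear:
    """A frozen pretrained weight W plus a trainable low-rank update
    (alpha / r) * B @ A. Only A and B would be updated during
    fine-tuning, which is why LoRA fits on commodity hardware."""
    def __init__(self, W, r=4, alpha=8, seed=0):
        rng = np.random.default_rng(seed)
        d_out, d_in = W.shape
        self.W = W                                       # frozen
        self.A = rng.normal(0.0, 0.01, size=(r, d_in))   # trainable
        self.B = np.zeros((d_out, r))                    # trainable, zero init
        self.scale = alpha / r

    def __call__(self, x):
        return x @ (self.W + self.scale * self.B @ self.A).T

W = np.eye(3)               # stand-in pretrained weight
layer = LoRALinear(W, r=2)
x = np.ones((1, 3))
out = layer(x)
# With B initialized to zero, the layer starts out identical to the
# frozen model: out == x @ W.T
```

Because B starts at zero, fine-tuning begins from the pretrained model's exact behavior and only gradually learns a personalized update.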
Title: AI Tools in Action: Enhancing Customer Experience with LLMs at Polestar. Olga Larina - AI & Data Scientist @ Polestar Customer Experience Department. Join me for a presentation where I'll delve into the innovative toolkit our team has employed to revolutionize customer service. This year, we've rolled out our first solution based on a Large Language Model (LLM), and let me tell you: it's changing the game. Starting with summarization and information extraction in February 2023, we've now set our sights on enhancing our Care Advisors' capabilities with AI that generates responses to customer cases with the precision and expertise of a seasoned professional. Our advisors were excited about our prototype reveal last year; it was clear they were all in, finding the AI support not just cool but a true timesaver. In my talk, I will reveal how tools like LangChain's sophisticated models and chains, and LlamaIndex's robust vector stores, empower LLMs to excel. Consider this a brief but insightful tour of the technology that's making our customer service smarter, faster, and way more fun. I am thrilled to guide you through the varied applications of AI tools that are proving incredibly relevant in today's tech landscape. The journey is just beginning, and the pace of development is nothing short of breathtaking. Every day brings new surprises and possibilities. Come and join my talk! Speaker Bio: At CX, our mission is to mine diverse data types to enhance customer experience, inform business strategy, and streamline operations. Our objectives span traditional data science, including tabular data analysis and natural language processing (NLP). This year, we're delving into the transformative realm of large language models, exploring their vast potential.
Previously, I contributed as a Data Scientist at Nexer, working on different types of projects including NLP and CV (Computer Vision), and I have a background in software development, where I engaged in automation testing, analysis, and product ownership. My academic journey began with mathematics at a university in Russia. About the event: Date: March 19th, 18:00 - 20:30. Location: Schibsted's Social Kitchen (Kungsbrohuset, Kungsbron 13, 111 22 Stockholm). Directions: Right next to the central station. Tickets: Sign-up required; anyone not on the list will not get in. The event is free of charge. Capacity: Space is limited to 100 participants. If you are signed up but unable to attend, please change your RSVP by February 21st. Food and drinks: Food and drinks will be provided. Questions: Please contact the meetup organizers. Code of Conduct: The NumFOCUS Code of Conduct applies to this event; please familiarize yourself with it before attending. If you have any questions or concerns regarding the Code of Conduct, please contact the organizers. |
Journey into GenAI: Fine-Tuning Stable Diffusion and boosting CX with LLMs
|
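The retrieval side of the LLM tooling described in the Polestar talk above (vector stores feeding context to a model) can be sketched with a toy in-memory retriever. The hashed bag-of-words "embedding" and the example documents are stand-ins for illustration, not Polestar's system or the real LangChain/LlamaIndex APIs:

```python
import re
import numpy as np

def embed(text, dim=256):
    """Toy deterministic embedding: hashed bag of words, L2-normalized.
    A real RAG system would call an embedding model here."""
    v = np.zeros(dim)
    for tok in re.findall(r"[a-z]+", text.lower()):
        v[hash(tok) % dim] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v

class VectorStore:
    """Minimal in-memory vector store with cosine-similarity search."""
    def __init__(self):
        self.docs, self.vecs = [], []

    def add(self, doc):
        self.docs.append(doc)
        self.vecs.append(embed(doc))

    def top_k(self, query, k=1):
        sims = np.array(self.vecs) @ embed(query)
        order = np.argsort(sims)[::-1][:k]
        return [self.docs[i] for i in order]

store = VectorStore()
store.add("Charging guide: use the app to schedule off-peak charging.")
store.add("The warranty covers the battery for eight years.")
context = store.top_k("how do I schedule charging", k=1)
# `context` would be prepended to the LLM prompt in a RAG pipeline
print(context[0])
```

The retrieved passages, not the whole knowledge base, are what give the LLM the case-specific grounding described in the talk.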
|
AI meetup (March): Generative AI and LLMs in Action
2024-03-14 · 17:00
** RSVP: https://www.aicamp.ai/event/eventdetails/W2024031409 Description: Welcome to the monthly in-person AI meetup in Paris. Join us for deep-dive tech talks on AI/ML, food/drink, and networking with speakers and fellow developers. Tech Talk: Make Deep Learning Efficient. Speaker: Bertrand Charpentier (Pruna.ai). Tech Talk: LLM Fine-Tuned for Customer Service. Topics/Speakers: Stay tuned as we update speakers and schedules. If you have a keen interest in speaking to our community, we invite you to submit topics for consideration: Submit Topics. Sponsors: We are actively seeking sponsors to support the AI developer community, whether by offering venue space, providing food, or contributing cash sponsorship. Sponsors will have the chance to speak at the meetups, receive prominent recognition, and gain exposure to our extensive membership base of 10,000+ AI developers in Paris and 300K+ worldwide. Community on Slack/Discord
|
AI meetup (March): Generative AI and LLMs in Action
|
|
Zero to Code: Coding with GPT4
2024-03-13 · 18:00
In this talk, James Bentley (Awin) shares his story of going from zero knowledge of Python to building RAG tools, chatbots, and even multi-modal AI prototypes in under a year by using GPT-4 as a coding assistant. He shares his ups, his downs, and tips on how to move fast without breaking too many things. He will also give his first-ever live demo of a project. |
|
|
Prototype to production with AI-native databases
2024-03-13 · 18:00
In this talk, we will discuss how AI-native databases productionize tasks such as improved search, integration with generative AI models for RAG, and multi-modal operations. You will learn how to achieve data isolation, redundancy, and scalability for your AI-powered apps through features like multi-tenancy, replication, and horizontal scaling. There will be live demos and examples, of course. Join us to learn why an AI-native database should be an integral part of your AI tech stack in production. |
|
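The multi-tenancy idea from the last talk can be made concrete with a toy in-memory store. This is a schematic illustration, not any particular database's API; real systems layer replication and horizontal scaling on top of the same isolation principle:

```python
import numpy as np

class MultiTenantStore:
    """Toy vector store with per-tenant isolation: each tenant's
    vectors live in a separate collection, so one tenant's query can
    never match another tenant's data."""
    def __init__(self):
        self.tenants = {}  # tenant_id -> (vectors, payloads)

    def insert(self, tenant, vec, payload):
        vecs, payloads = self.tenants.setdefault(tenant, ([], []))
        vecs.append(np.asarray(vec, dtype=float))
        payloads.append(payload)

    def search(self, tenant, query, k=1):
        if tenant not in self.tenants:
            return []  # unknown tenant sees nothing
        vecs, payloads = self.tenants[tenant]
        q = np.asarray(query, dtype=float)
        sims = [v @ q / (np.linalg.norm(v) * np.linalg.norm(q))
                for v in vecs]
        order = np.argsort(sims)[::-1][:k]
        return [payloads[i] for i in order]

db = MultiTenantStore()
db.insert("acme", [1.0, 0.0], "acme-doc")
db.insert("globex", [0.0, 1.0], "globex-doc")
print(db.search("acme", [1.0, 0.1]))  # ['acme-doc']: globex data is invisible
```

Keeping the tenant key on every read and write path is what gives the data isolation mentioned in the abstract; replication and sharding then distribute each collection without weakening that boundary.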