
Activities & events

PyData Leeds: March Meet-up · 2025-03-25 · 17:30

PyData Leeds is back and we're very excited to bring you the March Meet-up. We've got a full schedule with 2 presentations, it's going to be great!

PyData Leeds brings together people who are passionate about Python, data, and engineering for evenings focused on learning and networking.

Schedule:
Date: Tuesday 25th March 2025
Time: 17:30
Location: Parallax Offices, The Elbow Rooms, 64 Call Lane, Leeds, LS1 6DT

Agenda:

  • 17:30: Networking and Refreshments
  • 18:00: Welcome & Icebreaker
  • 18:15: Jakub Szamuk, Software Engineer - 'Purr-mission Granted: Machine Vision in the Real World'. In an era where LLMs and machine learning are transforming industries, how do we bring this tech into a real product, quickly? This talk explores the journey of building Purr-mission Granted, a heavily over-engineered machine-vision catflap. From concept to working prototype in just one day, we will dive into the challenges of gathering training data and the lessons learned in implementing machine vision in a physical product. Whether you're an AI enthusiast, a maker, or just a pet owner tired of surprise deliveries, this talk aims to inspire you to start bringing this exciting new technology into your own projects.
  • 19:00: Suze Hawkins, Lead Data Scientist & Magda Nowakowska, Senior Data Scientist - 'Data Science Without Data: Building Models When Real Data is Scarce'. What do you do when you're faced with a data science problem, but there's no real data available? Sometimes access is restricted due to privacy or legal constraints, or simply because the data hasn't been collected yet. However, being able to test and experiment with ideas quickly is an important part of the development-to-production cycle, often as a proof of concept to secure the necessary approvals or access to real data. In this talk, we'll explore practical strategies for tackling machine learning challenges when starting from scratch.
  • 19:45: Wrap-up & Drinks

If you have been before, we look forward to seeing you again, and if you're coming along for the first time, we're excited to meet you and for you to join the Leeds PyData community.

Connect with us on Meetup, Discord or Twitter.

PyData Leeds is a strictly professional event; as such, professional behaviour is expected.

PyData Leeds is a chapter of PyData, an educational program of NumFOCUS and thus abides by the NumFOCUS Code of Conduct - https://pydata.org/code-of-conduct.html

PyData Leeds: March Meet-up

This is a virtual event.

Register for the Zoom

Vision Language Models Are Few-Shot Audio Spectrogram Classifiers

The development of multimodal AI agents marks a pivotal step toward creating systems that can reason across modalities. Current audio language models lag behind text-based LLMs and Vision Language Models (VLMs) in reasoning capabilities. Incorporating audio information into VLMs could let us leverage their advanced language reasoning capabilities for audio input. To explore this, the talk will cover how VLMs (such as GPT-4o and Claude 3.5 Sonnet) can recognize audio content from spectrograms and how this approach could enhance audio understanding within VLMs.
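As a rough illustration of the approach described above (a sketch under assumptions, not the speaker's code), one can render a clip's log-mel spectrogram to an image and ask a VLM to classify it; the file name, prompt, and model choice below are illustrative:

```python
# Minimal sketch: classify audio by showing a VLM its spectrogram.
# Assumes: pip install librosa matplotlib openai; OPENAI_API_KEY set.
import base64
import io

import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np
from openai import OpenAI

def spectrogram_png(path: str) -> bytes:
    """Render a log-mel spectrogram of an audio file to PNG bytes."""
    y, sr = librosa.load(path, sr=16_000)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
    db = librosa.power_to_db(mel, ref=np.max)
    fig, ax = plt.subplots(figsize=(6, 3))
    librosa.display.specshow(db, sr=sr, x_axis="time", y_axis="mel", ax=ax)
    buf = io.BytesIO()
    fig.savefig(buf, format="png", bbox_inches="tight")
    plt.close(fig)
    return buf.getvalue()

client = OpenAI()
png = spectrogram_png("clip.wav")  # hypothetical input file
b64 = base64.b64encode(png).decode()

# Few-shot prompting is possible by prepending labeled example spectrograms.
resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "This is a log-mel spectrogram. "
                                     "Which sound is it: dog bark, siren, or speech?"},
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{b64}"}},
        ],
    }],
)
print(resp.choices[0].message.content)
```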

About the Speaker

Satvik Dixit is a master's student at Carnegie Mellon University, advised by Professors Bhiksha Raj and Chris Donahue. His research interests are Audio/Speech Processing and Multimodal Learning, with a focus on audio understanding and generation tasks. More details can be found at https://satvik-dixit.github.io/

Agentic Retrieval-Augmented Generation: A Survey on Agentic RAG

In this talk, we will explore Agentic Retrieval-Augmented Generation, or Agentic RAG, a groundbreaking method that enhances Large Language Models (LLMs) by combining intelligent retrieval with autonomous agents. We will discover how Agentic RAG leverages advanced agentic behaviors such as reflection, planning, tool use, and multi-agent collaboration to dynamically refine retrieval strategies and adapt workflows, significantly improving real-time responsiveness and complex task management.
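As a rough sketch of the pattern (an illustration, not the speaker's implementation), an agentic RAG loop interleaves retrieval, drafting, and reflection; `retrieve` and `llm` below are hypothetical stand-ins for a vector store and an LLM client:

```python
# Sketch of one agentic-RAG pattern: retrieve, draft, reflect, re-retrieve.
from typing import Callable, List

def agentic_rag(question: str,
                retrieve: Callable[[str], List[str]],
                llm: Callable[[str], str],
                max_rounds: int = 3) -> str:
    query = question
    for _ in range(max_rounds):
        docs = retrieve(query)
        context = "\n\n".join(docs)
        answer = llm(f"Context:\n{context}\n\nQuestion: {question}\nAnswer:")
        # Reflection step: the model critiques its own answer.
        verdict = llm(
            f"Question: {question}\nAnswer: {answer}\n"
            "Is this answer fully supported by the context? "
            "Reply SUPPORTED, or propose a better search query."
        )
        if verdict.strip().startswith("SUPPORTED"):
            return answer
        query = verdict  # use the critique as the refined retrieval query
    return answer
```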

About the Speaker

Aditi Singh is an Assistant College Lecturer in the Department of Computer Science at Cleveland State University, Cleveland, Ohio. She earned her M.S. and Ph.D. in Computer Science from Kent State University. She was awarded a prestigious Gold Medal for academic excellence during her undergraduate studies. Her research interests include Artificial Intelligence, Large Language Models (LLM), and Generative AI. Dr. Singh has published over 25 research papers in these fields.

Active Data Curation Effectively Distills Large-Scale Multimodal Models

Knowledge distillation (KD) is the de facto standard for compressing large-scale models into smaller ones. Prior works have explored ever more complex KD strategies involving different objective functions, teacher-ensembles, and weight inheritance. In this talk, I will describe an alternative, yet simple approach — active data curation as effective distillation for contrastive multimodal pretraining.

Our simple online batch selection method, ACID, outperforms strong KD baselines across various model, data, and compute configurations. Further, we find that such an active data curation strategy is in fact complementary to standard KD and can be effectively combined with it to train highly performant, inference-efficient models. Our simple and scalable pretraining framework, ACED, achieves state-of-the-art results across 27 zero-shot classification and retrieval tasks with up to 11% fewer inference FLOPs.

We further demonstrate that our ACED models yield strong vision-encoders for training generative multimodal models in the LiT-Decoder setting, outperforming larger vision encoders for image-captioning and visual question-answering tasks.
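The abstract does not spell out ACID's selection rule, but a generic learnability-based online batch selection step (one hedged reading of "active data curation") might look like the following; `per_example_loss` is a hypothetical model API, not the paper's code:

```python
# Hedged sketch of learnability-based online batch selection. Idea: from a
# large candidate batch, keep examples the learner still gets wrong but a
# reference model finds easy (high learner loss, low reference loss).
import torch

def select_batch(learner, reference, images, texts, k):
    """Return indices of the k most 'learnable' candidate examples."""
    with torch.no_grad():
        learner_loss = learner.per_example_loss(images, texts)      # hypothetical API
        reference_loss = reference.per_example_loss(images, texts)  # hypothetical API
    learnability = learner_loss - reference_loss
    return torch.topk(learnability, k).indices
```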

About the Speaker

Vishaal Udandarao is a third-year ELLIS PhD student, jointly working with Matthias Bethge at the University of Tuebingen and Samuel Albanie at Google DeepMind. He did his undergraduate degree in computer science at IIIT Delhi from 2016 to 2020, and his master's in machine learning at the University of Cambridge in 2021.

Dataset Safari: Adventures from 2024’s Top Computer Vision Conferences

Datasets are the lifeblood of machine learning, driving innovation and enabling breakthrough applications in computer vision and AI. This talk presents a curated exploration of the most compelling visual datasets unveiled at CVPR, ECCV, and NeurIPS 2024, with a unique twist – we’ll explore them live using FiftyOne, the open-source tool for dataset curation and analysis.

Using FiftyOne’s powerful visualization and analysis capabilities, we’ll take a deep dive into these collections, examining their unique characteristics through interactive sessions. We’ll demonstrate how to:

  • Analyze dataset distributions and potential biases
  • Identify edge cases and interesting samples
  • Compare similar samples across datasets
  • Explore multi-modal annotations and complex label structures

Whether you’re a researcher, practitioner, or dataset enthusiast, this session will provide hands-on insights into both the datasets shaping our field and practical tools for dataset exploration. Join us for a live demonstration of how modern dataset analysis tools can unlock deeper understanding of the data driving AI forward.
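As a rough sketch of the kind of session described above (illustrative, not the actual demo), FiftyOne's core workflow looks like this; the field names and brain keys below are from its quickstart dataset:

```python
# Sketch of the exploration steps listed above, using FiftyOne's quickstart data.
import fiftyone as fo
import fiftyone.brain as fob
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("quickstart")

# Analyze label distributions (and eyeball potential biases)
print(dataset.count_values("ground_truth.detections.label"))

# Index by visual similarity, to compare and find near-duplicate samples
fob.compute_similarity(dataset, brain_key="img_sim")

# Surface unusual samples for edge-case review
fob.compute_uniqueness(dataset)
odd = dataset.sort_by("uniqueness", reverse=True).limit(25)

# Explore everything interactively in the App
session = fo.launch_app(odd)
```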

About the Speaker

Harpreet Sahota is a hacker-in-residence and machine learning engineer with a passion for deep learning and generative AI. He has a deep interest in RAG, agents, and multimodal AI.

March 20 - AI, Machine Learning and Computer Vision Meetup


When: March 21, 2024 – 10:00 AM Pacific

Where: Virtual / Zoom - https://voxel51.com/computer-vision-events/march-2024-ai-machine-learning-data-science-meetup/

Visual Anagrams: Generating Multi-View Optical Illusions with Diffusion Models

Images that change their appearance under a transformation, such as a rotation or a flip, have long fascinated students of perception, from Salvador Dalí to M. C. Escher. The appeal of these multi-view optical illusions lies partly in the challenge of arranging visual elements such that they may be understood in multiple different ways. Creating these illusions requires accurately modeling—and then subverting—visual perception. We propose a simple, zero-shot method for obtaining these illusions from off-the-shelf text-to-image diffusion models. During the reverse diffusion process, we estimate the noise from different views of a noisy image. We then combine these noise estimates together and denoise the image. A theoretical analysis suggests that this method works precisely for views that can be written as orthogonal transformations. An important special case of these transformations is permutations of an image's pixels, which we use to create a variety of "anagram" illusions, such as jigsaw puzzles that can be solved in two different ways.
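To make the noise-combination step concrete, here is a hedged sketch of the procedure described above; `denoiser` and the view objects are hypothetical stand-ins, not the authors' code:

```python
# At each reverse-diffusion step: estimate noise in each view's frame,
# map the estimates back to the canonical frame, and average them.
import torch

def combined_noise(denoiser, x_t, t, prompts, views):
    estimates = []
    for prompt, view in zip(prompts, views):
        eps = denoiser(view.apply(x_t), t, prompt)  # noise seen from this view
        estimates.append(view.invert(eps))          # back to the canonical frame
    return torch.stack(estimates).mean(dim=0)       # one estimate for the step
```

Averaging the estimates is only valid for views that are orthogonal transformations, such as pixel permutations, which is exactly the condition identified in the abstract's theoretical analysis.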

Speaker: Andrew Owens is an assistant professor at the University of Michigan in the Department of Electrical Engineering and Computer Science. Prior to that, he was a postdoctoral scholar at UC Berkeley. He received a Ph.D. in Electrical Engineering and Computer Science from MIT in 2016. He is a recipient of a Computer Vision and Pattern Recognition (CVPR) Best Paper Honorable Mention Award and a Microsoft Research Ph.D. Fellowship.

Omnidirectional Computer Vision

Omnidirectional cameras are a different camera modality than what we are typically used to: we sacrifice nice geometry for significantly larger fields of view. Ciarán will discuss omnidirectional cameras, how to represent them geometrically, the optical effects of using them, neural networks on omnidirectional imagery, and applications that use omnidirectional cameras.
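As a concrete example of the geometry involved (an illustration, not material from the talk), the common equirectangular representation maps each pixel to a ray on the unit sphere:

```python
# Map an equirectangular pixel (u, v) to a unit viewing ray (y-up convention).
import numpy as np

def equirect_to_ray(u, v, width, height):
    lon = (u / width - 0.5) * 2.0 * np.pi  # longitude in [-pi, pi]
    lat = (0.5 - v / height) * np.pi       # latitude in [-pi/2, pi/2]
    return np.array([np.cos(lat) * np.sin(lon),
                     np.sin(lat),
                     np.cos(lat) * np.cos(lon)])
```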

Speaker: Ciarán Eising is an Associate Professor of Artificial Intelligence and Computer Vision at the University of Limerick, Ireland and co-founder of the D2iCE Research Group. Prior to joining the University of Limerick, Ciarán was a Senior Expert of Computer Vision (director level) at Valeo, designing omnidirectional vision algorithms for low-speed vehicle automation.

Illuminating the Underground World with Multimodal Algorithms

Dive deep into the future of underground exploration! This talk introduces a groundbreaking approach that leverages the power of multimodal algorithms, combining advanced computer vision techniques, SLAM algorithms, sensor metadata, and GIS data. By integrating diverse data streams, we unlock unprecedented levels of detail and accuracy in underground inspections, paving the way for safer, more efficient, and more insightful subterranean analyses.

Speaker: Adonaí Vera is a Machine Learning Engineer with expertise in computer vision and AI algorithms, specializing in AI solutions for underground inspections using TensorFlow and OpenCV. Recognized by Google as a top TensorFlow developer in Colombia, he is also the founder of a company focused on AI innovations and currently contributes his expertise to Subterra AI.

The Role of AI in Fixing Hiring

From writing job descriptions, to sourcing candidates, to interviews and evaluation – our current hiring practices are often rife with human bias. Even when such bias is unconscious, it can result in expensive mis-hires. This talk explores the types of biases and the pivotal role of AI in mitigating them. We will discuss the common sources of bias in hiring, the current AI landscape that attempts to address these issues and further opportunities.

Speaker: Saurav Pandit is a seasoned AI leader with expertise in natural language understanding, search and language models. He is on the advisory board of AI 2030, an initiative that promotes AI for good.

Don’t Forget

  • Voxel51 will make a donation on behalf of the Meetup members to the charity that gets the most votes this month.
  • Can’t make the date and time? No problem! Just make sure to register here so we can send you links to the playbacks.
March 2024 – AI, Machine Learning & Data Science Meetup


Welcome to the March edition of the PyData Berlin meetup!

We would like to welcome you all starting from 18:45. The talks begin around 19:15.

Please provide your first and last name when registering, because this is required for the venue's entry policy. If you cannot attend, please cancel your spot so others are able to join, as space is limited.

The Lineup for the Evening

Talk 1: Deploying ML to production with Metaflow

Abstract: Metaflow is an open-source framework for managing AI projects. In this talk, Jacek Filipczuk will give a short introduction to the framework and will share his personal experience of using it, with insights on what worked and what didn't during his project.
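For readers new to the framework, a minimal Metaflow flow (illustrative, not from the talk) looks like this; run it locally with `python train_flow.py run`:

```python
# Minimal Metaflow flow: steps are chained with self.next(), and any value
# assigned to self becomes a versioned, persisted artifact.
from metaflow import FlowSpec, step

class TrainFlow(FlowSpec):

    @step
    def start(self):
        self.examples = list(range(10))  # stand-in for real data loading
        self.next(self.train)

    @step
    def train(self):
        self.model = sum(self.examples) / len(self.examples)  # stand-in "training"
        self.next(self.end)

    @step
    def end(self):
        print("trained model:", self.model)

if __name__ == "__main__":
    TrainFlow()
```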

Speaker: Jacek Filipczuk is a seasoned Lead Machine Learning Engineer with 8 years of experience applying various AI tools across more than 5 startups. Jacek has worked in different domains, with a focus on text data and NLP solutions. In the past 5 years he has focused on kickstarting AI projects in startups, which has made his role a blend of Machine Learning Engineer and business strategist, bridging the gap between company objectives and AI solutions. Currently, Jacek works at Walking on Earth, leveraging state-of-the-art technologies to help users with stress management.

Talk 2: Switching from IC to EM (fresh perspective)

Abstract: In this presentation, Theodore will address his shift from data scientist to management. It covers the essential aspects of managerial responsibilities and offers resources for navigating the transition.

Speaker: Theodore Meynard is a Data Science Manager at GetYourGuide. He previously worked as a data scientist for 3 years and switched to management at the beginning of 2023.

There will be slots for 2-3 lightning talks (3-5 minutes each). Kindly let us know at the start of the meetup if you would like to present something :)

NumFOCUS Code of Conduct (the short version): Be kind to others. Do not insult or put down others. Behave professionally. Remember that harassment and sexist, racist, or exclusionary jokes are not appropriate for NumFOCUS. All communication should be appropriate for a professional audience, including people of many different backgrounds. Sexual language and imagery are not appropriate.

NumFOCUS is dedicated to providing a harassment-free community for everyone, regardless of gender, sexual orientation, gender identity and expression, disability, physical appearance, body size, race, or religion. We do not tolerate harassment of community members in any form. Thank you for helping make this a welcoming, friendly community for all. If you haven't yet, please read the detailed version here: https://numfocus.org/code-of-conduct

Hosts: Google Cloud is excited to welcome you for this month's edition of PyData.

PyData Berlin 2024 March Meetup

Hello Data & AI London community, Merry Christmas and Happy New Year to you all! We will be kicking off our first meetup of 2024 with a bang... As the New Year resolutions kick in and the dry Janners do their dry January, I thought March would be best to get the ball rolling with meetups... and what a line-up it is!

Enda Ridge | Product Analytics and Google Shopping @ Google
Putting Algorithms into the Business - war stories and success stories

Better Algorithms, Sophisticated Cloud Services, More Data. Improvements to the tools at our disposal continue at pace. The business's demands for the latest innovations outpace our supply. We know that accelerating algorithm automation will decelerate business decline. We know that less time on manual process means more time on great user journeys and customer service. But what is simple in principle proves surprisingly difficult in practice.

In this talk, I will take us through what makes it hard to put data science and algorithms into action in a business. We’ll talk about the operational, people, process and technology blockers I’ve encountered and tips for how I (sometimes) overcame them.

About me: Enda is the author of "Guerrilla Analytics – a practical approach to working with data". His teams have helped the world's largest Big Tech companies, FTSE 100 retailers, and global consultancies maximise their cost savings and opportunities using data, analytics, and machine learning. He currently leads the EMEA product analytics team in Google Shopping, helping to understand how millions of merchants receive value from their billions of offers on Google Search and Shopping, using experimentation and data-driven hypotheses. His PhD is in algorithm tuning. He is a bad runner, a good folk dancer, and a great cat dad.

Hudson Mendes | Ex-Machine Learning Manager @ Peloton
Challenges & Learnings when optimising a Classifier for a Class-Imbalanced Multimodal Dataset

Multimodal Deep Learning classifiers are known for being hard to optimise. They are computationally hungry and can't be trained on CPUs. Class-imbalanced datasets also force you out of the comfort of using Cross Entropy, which is convex, as your objective function. In this talk, we investigate aspects of a multimodal dataset (MELD), based on the Friends series, as well as its separation challenges; the struggles and workarounds when training with PyTorch on TPUs and GPUs (over Google Colab); issues regarding the different scales of embeddings from different modalities; and the application of the Dice and Focal losses as alternatives to Cross Entropy for class-imbalanced datasets.
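For context on the losses mentioned above, here is a short, hedged sketch of focal loss in PyTorch (an illustration, not the speaker's code): it down-weights easy examples by (1 - p_t)^gamma so rare classes dominate the gradient.

```python
# Focal loss for multi-class classification (without the optional alpha term).
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma: float = 2.0):
    log_p = F.log_softmax(logits, dim=-1)
    log_pt = log_p.gather(1, targets.unsqueeze(1)).squeeze(1)  # log p of true class
    pt = log_pt.exp()
    return (-(1.0 - pt) ** gamma * log_pt).mean()
```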

Please aim to arrive at 6pm. The talks will start around 6.30pm (give or take a few mins) and there will be pizza and refreshments as always on arrival :)

Look forward to seeing you all soon!

Meetup #8 - Multimodal Deep Learning & Putting Algorithms in Businesses