Feb 5 - AI, ML and Computer Vision Meetup

Join our virtual Meetup to hear talks from experts on cutting-edge topics across AI, ML, and computer vision.

Feb 5, 2026, 9-11 AM Pacific, online. Register for the Zoom!

Unlocking Visual Anomaly Detection: Navigating Challenges and Pioneering with Vision-Language Models

Visual anomaly detection (VAD) is pivotal for ensuring quality in manufacturing, medical imaging, and safety inspections, yet it continues to face challenges such as data scarcity, domain shifts, and the need for precise localization and reasoning. This seminar explores VAD fundamentals, core challenges, and recent advancements leveraging vision-language models and multimodal large language models (MLLMs). We contrast CLIP-based methods for efficient zero/few-shot detection with MLLM-driven reasoning for explainable, threshold-free outcomes. Drawing from recent studies, we highlight emerging trends, benchmarks, and future directions toward building adaptable, real-world VAD systems. This talk is designed for researchers and practitioners interested in AI-driven inspection and next-generation multimodal approaches.
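The CLIP-based zero-shot detection contrasted above can be sketched in a few lines. This is an illustrative toy, not code from the talk: a real pipeline would encode the image and the "normal"/"anomalous" text prompts with a CLIP model, whereas here the embeddings are stand-in vectors.

```python
import numpy as np

def zero_shot_anomaly_score(image_emb, normal_emb, anomalous_emb, temperature=0.07):
    """Score an image by comparing its embedding to text-prompt embeddings
    for "normal" and "anomalous" states (CLIP-style zero-shot VAD).

    All embeddings are L2-normalized; the score is the softmax probability
    assigned to the anomalous prompt, so no hand-tuned threshold is needed.
    """
    def normalize(v):
        return v / np.linalg.norm(v)

    image_emb = normalize(image_emb)
    sims = np.array([
        image_emb @ normalize(normal_emb),
        image_emb @ normalize(anomalous_emb),
    ])
    logits = sims / temperature
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return float(probs[1])  # probability mass on the "anomalous" prompt

# Toy vectors standing in for CLIP encoder outputs.
rng = np.random.default_rng(0)
normal_txt = rng.normal(size=512)
anomal_txt = rng.normal(size=512)
defective_image = 0.2 * normal_txt + 0.8 * anomal_txt  # closer to "anomalous"
score = zero_shot_anomaly_score(defective_image, normal_txt, anomal_txt)
```

Because the output is a probability over prompts rather than a raw distance, the same scoring rule transfers across products without per-class threshold tuning, which is the "threshold-free" property the abstract refers to.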

About the Speaker

Hossein Kashiani is a fourth-year Ph.D. student at Clemson University. His research focuses on developing generalizable and trustworthy AI systems, with publications in top venues such as CVPR, WACV, ICIP, IJCB, and TBIOM. His work spans diverse applications, including anomaly detection, media forensics, biometrics, healthcare, and visual perception.

Data-Centric Lessons To Improve Speech-Language Pretraining

Spoken Question-Answering (SQA) is a core capability for useful and interactive artificial intelligence systems. Recently, several speech-language models (SpeechLMs) have been released with a specific focus on improving their SQA performance. However, a lack of controlled ablations of pretraining data processing and curation makes it challenging to understand what factors account for performance, despite substantial gains from similar studies in other data modalities. In this work, we address this gap by conducting a data-centric exploration for pretraining SpeechLMs.

We focus on three research questions fundamental to speech-language pretraining data:

  • How to process raw web-crawled audio content for speech-text pretraining;
  • How to construct synthetic pretraining datasets to augment web-crawled data;
  • How to interleave (text, audio) segments into training sequences.
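The third question, interleaving (text, audio) segments into training sequences, can be sketched as follows. The sentinel tokens and the alternating-modality policy here are hypothetical illustrations, not the scheme studied in the work:

```python
# Hypothetical sketch: interleave time-aligned (text, audio) segment pairs
# into one training sequence, with sentinel tokens marking modality switches.
TEXT_START, AUDIO_START = "<text>", "<audio>"

def interleave(segments, start_with="text"):
    """segments: list of (text_tokens, audio_tokens) pairs, aligned in time.

    Alternates which modality leads each segment so the model sees both
    text->audio and audio->text transitions during pretraining.
    """
    sequence = []
    lead = start_with
    for text_tokens, audio_tokens in segments:
        if lead == "text":
            sequence += [TEXT_START, *text_tokens, AUDIO_START, *audio_tokens]
        else:
            sequence += [AUDIO_START, *audio_tokens, TEXT_START, *text_tokens]
        lead = "audio" if lead == "text" else "text"  # alternate leading modality
    return sequence

seq = interleave([(["the", "cat"], ["a1", "a2", "a3"]),
                  (["sat"], ["a4", "a5"])])
```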

We apply the insights from our controlled data-centric ablations to pretrain a 3.8B-parameter SpeechLM, called SpeLangy, that outperforms models up to 3x larger by 10.2% absolute performance. We hope our findings highlight the impact of effective data curation for speech-language pretraining and guide future data-centric exploration in SpeechLMs.

About the Speaker

Vishaal Udandarao is a third-year ELLIS PhD student, jointly advised by Matthias Bethge at the University of Tübingen and Samuel Albanie at the University of Cambridge/Google DeepMind. He is also part of the International Max Planck Research School for Intelligent Systems. He is mainly interested in understanding the generalisation properties of foundation models, both vision-language models (VLMs) and large multi-modal models (LMMs), through the lens of their pre-training and test data distributions. His research is funded by a Google PhD Fellowship in Machine Intelligence.

A Practical Pipeline for Synthetic Data with Nano Banana Pro + FiftyOne

Most computer-vision failures come from the rare cases: the dark corners, odd combinations, and edge conditions we never capture often enough in real datasets. In this session, we walk through a practical end-to-end pipeline for generating targeted synthetic data using Google’s Nano Banana Pro and managing it with FiftyOne. We’ll explore how to translate dataset gaps into generation prompts, create thousands of high-quality synthetic images, automatically enrich them with metadata, and bring everything into FiftyOne for inspection, filtering, and validation. By the end, you’ll understand how to build a repeatable synthetic-first workflow that closes real vision gaps and improves model performance on the scenarios that matter most.
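The "translate dataset gaps into generation prompts" step of such a pipeline might look like the following sketch. The gap schema and the prompt template are assumptions for illustration; the actual generation (Nano Banana Pro) and curation (FiftyOne) stages are covered in the session itself.

```python
# Hypothetical sketch of the "dataset gap -> generation prompt" step of a
# synthetic-data pipeline. Each under-represented scenario is expanded into
# several concrete prompts, and the gap fields are kept as metadata so the
# resulting images can later be filtered on (e.g., in FiftyOne).

def gaps_to_prompts(gaps, per_gap=3):
    """gaps: list of dicts like {"object": ..., "condition": ..., "viewpoint": ...}.

    Returns (prompt, metadata) pairs, one per generated-image request.
    """
    prompts = []
    for gap in gaps:
        for i in range(per_gap):
            prompt = (f"A photo of a {gap['object']} in {gap['condition']}, "
                      f"seen from {gap['viewpoint']}, variation {i + 1}")
            prompts.append((prompt, {**gap, "variation": i + 1}))
    return prompts

pairs = gaps_to_prompts([
    {"object": "cyclist", "condition": "heavy rain at night", "viewpoint": "a dashcam"},
])
```

Keeping the structured gap description attached to every prompt is what makes the workflow repeatable: after validation, the same metadata tells you which gaps are now covered and which still need another generation round.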

About the Speaker

Adonai Vera is a Machine Learning Engineer & DevRel at Voxel51, with over 7 years of experience building computer vision and machine learning models using TensorFlow, Docker, and OpenCV. He started as a software developer, moved into AI, led teams, and served as CTO. Today, he connects code and community to build open, production-ready AI, making technology simple, accessible, and reliable.

Making Computer Vision Models Faster: An Introduction to TensorRT Optimization

Modern computer vision applications demand real-time performance, yet many deep learning models struggle with high latency during deployment. This talk introduces how TensorRT can significantly accelerate inference by applying optimizations such as layer fusion, precision calibration, and efficient memory management. Attendees will learn the core concepts behind TensorRT, how it integrates into existing CV pipelines, and how to measure and benchmark improvements. Through practical examples and performance comparisons, the session will demonstrate how substantial speedups can be achieved with minimal model-accuracy loss. By the end, participants will understand when and how to apply TensorRT to make their CV models production-ready.
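The measure-and-benchmark step described above can be sketched as a small, framework-agnostic harness. This is an illustrative sketch, not TensorRT API code: `infer_fn` stands in for any model's forward pass, and for real GPU inference you would also need to synchronize the device (e.g., `torch.cuda.synchronize()`) before stopping the timer.

```python
import statistics
import time

def benchmark(infer_fn, n_warmup=10, n_runs=100):
    """Measure inference latency the way before/after TensorRT comparisons
    are usually done: discard warmup iterations (allocator and cache effects),
    then report mean and p99 over many timed runs."""
    for _ in range(n_warmup):
        infer_fn()
    latencies_ms = []
    for _ in range(n_runs):
        start = time.perf_counter()
        infer_fn()
        latencies_ms.append((time.perf_counter() - start) * 1e3)
    latencies_ms.sort()
    return {
        "mean_ms": statistics.fmean(latencies_ms),
        "p99_ms": latencies_ms[int(0.99 * (n_runs - 1))],
    }

# Dummy stand-in for a model forward pass.
stats = benchmark(lambda: sum(i * i for i in range(10_000)), n_runs=50)
```

Running the same harness on the baseline model and on the optimized engine gives a like-for-like speedup number; reporting p99 as well as the mean catches the latency spikes that matter for real-time deployments.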

About the Speaker

Tushar Gadhiya is a Technical Lead at Infocusp Innovations, specialising in deep learning, computer vision, graph learning, and agentic AI. His experience spans academic research, as a PhD holder, and industry work, where he has contributed to multiple patents.

Feb 5 - AI, ML and Computer Vision Meetup

Join our virtual Meetup to hear talks from experts on cutting-edge topics across AI, ML, and computer vision.

Feb 5, 2026 9 - 11 AM Pacific Online. Register for the Zoom!

Unlocking Visual Anomaly Detection: Navigating Challenges and Pioneering with Vision-Language Models

Visual anomaly detection (VAD) is pivotal for ensuring quality in manufacturing, medical imaging, and safety inspections, yet it continues to face challenges such as data scarcity, domain shifts, and the need for precise localization and reasoning. This seminar explores VAD fundamentals, core challenges, and recent advancements leveraging vision-language models and multimodal large language models (MLLMs). We contrast CLIP-based methods for efficient zero/few-shot detection with MLLM-driven reasoning for explainable, threshold-free outcomes. Drawing from recent studies, we highlight emerging trends, benchmarks, and future directions toward building adaptable, real-world VAD systems. This talk is designed for researchers and practitioners interested in AI-driven inspection and next-generation multimodal approaches.

About the Speaker

Hossein Kashiani is a fourth-year Ph.D. student at Clemson University. His research focuses on developing generalizable and trustworthy AI systems, with publications in top venues such as CVPR, WACV, ICIP, IJCB, and TBIOM. His work spans diverse applications, including anomaly detection, media forensics, biometrics, healthcare, and visual perception.

Data-Centric Lessons To Improve Speech-Language Pretraining

Spoken Question-Answering (SQA) is a core capability for useful and interactive artificial intelligence systems. Recently, several speech-language models (SpeechLMs) have been released with a specific focus on improving their SQA performance. However, a lack of controlled ablations of pretraining data processing and curation makes it challenging to understand what factors account for performance, despite substantial gains from similar studies in other data modalities. In this work, we address this gap by conducting a data-centric exploration for pretraining SpeechLMs.

We focus on three research questions fundamental to speech-language pretraining data:

  • How to process raw web-crawled audio content for speech-text pretraining;
  • How to construct synthetic pretraining datasets to augment web-crawled data;
  • How to interleave (text, audio) segments into training sequences.

We apply the insights from our controlled data-centric ablations to pretrain a 3.8B-parameter SpeechLM, called SpeLangy, that outperforms models that are up to 3x larger by 10.2% absolute performance. We hope our findings highlight the impact of effective data curation for speech-language pretraining and guide future data-centric exploration in SpeechLMs.

About the Speaker

Vishaal Udandarao is a third year ELLIS PhD student, jointly working with Matthias Bethge at The University of Tuebingen and Samuel Albanie at The University of Cambridge/Google Deepmind. He is also a part of the International Max Planck Research School for Intelligent Systems. He is mainly interested in understanding the generalisation properties of foundation models, both vision-language models (VLMs) and large multi-modal models (LMMs), through the lens of their pre-training and test data distributions. His research is funded by a Google PhD Fellowship in Machine Intelligence.

A Practical Pipeline for Synthetic Data with Nano Banana Pro + FiftyOne

Most computer-vision failures come from the rare cases, the dark corners, odd combinations, and edge conditions we never capture enough in real datasets. In this session, we walk through a practical end-to-end pipeline for generating targeted synthetic data using Google’s Nano Banana Pro and managing it with FiftyOne. We’ll explore how to translate dataset gaps into generation prompts, create thousands of high-quality synthetic images, automatically enrich them with metadata, and bring everything into FiftyOne for inspection, filtering, and validation. By the end, you’ll understand how to build a repeatable synthetic-first workflow that closes real vision gaps and improves model performance on the scenarios that matter most.

About the Speaker

Adonai Vera - Machine Learning Engineer & DevRel at Voxel51. With over 7 years of experience building computer vision and machine learning models using TensorFlow\, Docker\, and OpenCV. I started as a software developer\, moved into AI\, led teams\, and served as CTO. Today\, I connect code and community to build open\, production-ready AI\, making technology simple\, accessible\, and reliable.

Making Computer Vision Models Faster: An Introduction to TensorRT Optimization

Modern computer vision applications demand real-time performance, yet many deep learning models struggle with high latency during deployment. This talk introduces how TensorRT can significantly accelerate inference by applying optimizations such as layer fusion, precision calibration, and efficient memory management. Attendees will learn the core concepts behind TensorRT, how it integrates into existing CV pipelines, and how to measure and benchmark improvements. Through practical examples and performance comparisons, the session will demonstrate how substantial speedups can be achieved with minimal model-accuracy loss. By the end, participants will understand when and how to apply TensorRT to make their CV models production-ready.

About the Speaker

Tushar Gadhiya is a Technical Lead at Infocusp Innovations, specialising in deep learning, computer vision, graph learning, and agentic AI. My experience spans academic research as a PhD holder and industry work, where I have contributed to multiple patents.

Feb 5 - AI, ML and Computer Vision Meetup

Join our virtual Meetup to hear talks from experts on cutting-edge topics across AI, ML, and computer vision.

Feb 5, 2026 9 - 11 AM Pacific Online. Register for the Zoom!

Unlocking Visual Anomaly Detection: Navigating Challenges and Pioneering with Vision-Language Models

Visual anomaly detection (VAD) is pivotal for ensuring quality in manufacturing, medical imaging, and safety inspections, yet it continues to face challenges such as data scarcity, domain shifts, and the need for precise localization and reasoning. This seminar explores VAD fundamentals, core challenges, and recent advancements leveraging vision-language models and multimodal large language models (MLLMs). We contrast CLIP-based methods for efficient zero/few-shot detection with MLLM-driven reasoning for explainable, threshold-free outcomes. Drawing from recent studies, we highlight emerging trends, benchmarks, and future directions toward building adaptable, real-world VAD systems. This talk is designed for researchers and practitioners interested in AI-driven inspection and next-generation multimodal approaches.

About the Speaker

Hossein Kashiani is a fourth-year Ph.D. student at Clemson University. His research focuses on developing generalizable and trustworthy AI systems, with publications in top venues such as CVPR, WACV, ICIP, IJCB, and TBIOM. His work spans diverse applications, including anomaly detection, media forensics, biometrics, healthcare, and visual perception.

Data-Centric Lessons To Improve Speech-Language Pretraining

Spoken Question-Answering (SQA) is a core capability for useful and interactive artificial intelligence systems. Recently, several speech-language models (SpeechLMs) have been released with a specific focus on improving their SQA performance. However, a lack of controlled ablations of pretraining data processing and curation makes it challenging to understand what factors account for performance, despite substantial gains from similar studies in other data modalities. In this work, we address this gap by conducting a data-centric exploration for pretraining SpeechLMs.

We focus on three research questions fundamental to speech-language pretraining data:

  • How to process raw web-crawled audio content for speech-text pretraining;
  • How to construct synthetic pretraining datasets to augment web-crawled data;
  • How to interleave (text, audio) segments into training sequences.

We apply the insights from our controlled data-centric ablations to pretrain a 3.8B-parameter SpeechLM, called SpeLangy, that outperforms models that are up to 3x larger by 10.2% absolute performance. We hope our findings highlight the impact of effective data curation for speech-language pretraining and guide future data-centric exploration in SpeechLMs.

About the Speaker

Vishaal Udandarao is a third year ELLIS PhD student, jointly working with Matthias Bethge at The University of Tuebingen and Samuel Albanie at The University of Cambridge/Google Deepmind. He is also a part of the International Max Planck Research School for Intelligent Systems. He is mainly interested in understanding the generalisation properties of foundation models, both vision-language models (VLMs) and large multi-modal models (LMMs), through the lens of their pre-training and test data distributions. His research is funded by a Google PhD Fellowship in Machine Intelligence.

A Practical Pipeline for Synthetic Data with Nano Banana Pro + FiftyOne

Most computer-vision failures come from the rare cases, the dark corners, odd combinations, and edge conditions we never capture enough in real datasets. In this session, we walk through a practical end-to-end pipeline for generating targeted synthetic data using Google’s Nano Banana Pro and managing it with FiftyOne. We’ll explore how to translate dataset gaps into generation prompts, create thousands of high-quality synthetic images, automatically enrich them with metadata, and bring everything into FiftyOne for inspection, filtering, and validation. By the end, you’ll understand how to build a repeatable synthetic-first workflow that closes real vision gaps and improves model performance on the scenarios that matter most.

About the Speaker

Adonai Vera - Machine Learning Engineer & DevRel at Voxel51. With over 7 years of experience building computer vision and machine learning models using TensorFlow\, Docker\, and OpenCV. I started as a software developer\, moved into AI\, led teams\, and served as CTO. Today\, I connect code and community to build open\, production-ready AI\, making technology simple\, accessible\, and reliable.

Making Computer Vision Models Faster: An Introduction to TensorRT Optimization

Modern computer vision applications demand real-time performance, yet many deep learning models struggle with high latency during deployment. This talk introduces how TensorRT can significantly accelerate inference by applying optimizations such as layer fusion, precision calibration, and efficient memory management. Attendees will learn the core concepts behind TensorRT, how it integrates into existing CV pipelines, and how to measure and benchmark improvements. Through practical examples and performance comparisons, the session will demonstrate how substantial speedups can be achieved with minimal model-accuracy loss. By the end, participants will understand when and how to apply TensorRT to make their CV models production-ready.

About the Speaker

Tushar Gadhiya is a Technical Lead at Infocusp Innovations, specialising in deep learning, computer vision, graph learning, and agentic AI. My experience spans academic research as a PhD holder and industry work, where I have contributed to multiple patents.

Feb 5 - AI, ML and Computer Vision Meetup

Join our virtual Meetup to hear talks from experts on cutting-edge topics across AI, ML, and computer vision.

Feb 5, 2026 9 - 11 AM Pacific Online. Register for the Zoom!

Unlocking Visual Anomaly Detection: Navigating Challenges and Pioneering with Vision-Language Models

Visual anomaly detection (VAD) is pivotal for ensuring quality in manufacturing, medical imaging, and safety inspections, yet it continues to face challenges such as data scarcity, domain shifts, and the need for precise localization and reasoning. This seminar explores VAD fundamentals, core challenges, and recent advancements leveraging vision-language models and multimodal large language models (MLLMs). We contrast CLIP-based methods for efficient zero/few-shot detection with MLLM-driven reasoning for explainable, threshold-free outcomes. Drawing from recent studies, we highlight emerging trends, benchmarks, and future directions toward building adaptable, real-world VAD systems. This talk is designed for researchers and practitioners interested in AI-driven inspection and next-generation multimodal approaches.

About the Speaker

Hossein Kashiani is a fourth-year Ph.D. student at Clemson University. His research focuses on developing generalizable and trustworthy AI systems, with publications in top venues such as CVPR, WACV, ICIP, IJCB, and TBIOM. His work spans diverse applications, including anomaly detection, media forensics, biometrics, healthcare, and visual perception.

Data-Centric Lessons To Improve Speech-Language Pretraining

Spoken Question-Answering (SQA) is a core capability for useful and interactive artificial intelligence systems. Recently, several speech-language models (SpeechLMs) have been released with a specific focus on improving their SQA performance. However, a lack of controlled ablations of pretraining data processing and curation makes it challenging to understand what factors account for performance, despite substantial gains from similar studies in other data modalities. In this work, we address this gap by conducting a data-centric exploration for pretraining SpeechLMs.

We focus on three research questions fundamental to speech-language pretraining data:

  • How to process raw web-crawled audio content for speech-text pretraining;
  • How to construct synthetic pretraining datasets to augment web-crawled data;
  • How to interleave (text, audio) segments into training sequences.

We apply the insights from our controlled data-centric ablations to pretrain a 3.8B-parameter SpeechLM, called SpeLangy, that outperforms models that are up to 3x larger by 10.2% absolute performance. We hope our findings highlight the impact of effective data curation for speech-language pretraining and guide future data-centric exploration in SpeechLMs.

About the Speaker

Vishaal Udandarao is a third year ELLIS PhD student, jointly working with Matthias Bethge at The University of Tuebingen and Samuel Albanie at The University of Cambridge/Google Deepmind. He is also a part of the International Max Planck Research School for Intelligent Systems. He is mainly interested in understanding the generalisation properties of foundation models, both vision-language models (VLMs) and large multi-modal models (LMMs), through the lens of their pre-training and test data distributions. His research is funded by a Google PhD Fellowship in Machine Intelligence.

A Practical Pipeline for Synthetic Data with Nano Banana Pro + FiftyOne

Most computer-vision failures come from the rare cases, the dark corners, odd combinations, and edge conditions we never capture enough in real datasets. In this session, we walk through a practical end-to-end pipeline for generating targeted synthetic data using Google’s Nano Banana Pro and managing it with FiftyOne. We’ll explore how to translate dataset gaps into generation prompts, create thousands of high-quality synthetic images, automatically enrich them with metadata, and bring everything into FiftyOne for inspection, filtering, and validation. By the end, you’ll understand how to build a repeatable synthetic-first workflow that closes real vision gaps and improves model performance on the scenarios that matter most.

About the Speaker

Adonai Vera - Machine Learning Engineer & DevRel at Voxel51. With over 7 years of experience building computer vision and machine learning models using TensorFlow\, Docker\, and OpenCV. I started as a software developer\, moved into AI\, led teams\, and served as CTO. Today\, I connect code and community to build open\, production-ready AI\, making technology simple\, accessible\, and reliable.

Making Computer Vision Models Faster: An Introduction to TensorRT Optimization

Modern computer vision applications demand real-time performance, yet many deep learning models struggle with high latency during deployment. This talk introduces how TensorRT can significantly accelerate inference by applying optimizations such as layer fusion, precision calibration, and efficient memory management. Attendees will learn the core concepts behind TensorRT, how it integrates into existing CV pipelines, and how to measure and benchmark improvements. Through practical examples and performance comparisons, the session will demonstrate how substantial speedups can be achieved with minimal model-accuracy loss. By the end, participants will understand when and how to apply TensorRT to make their CV models production-ready.

About the Speaker

Tushar Gadhiya is a Technical Lead at Infocusp Innovations, specialising in deep learning, computer vision, graph learning, and agentic AI. My experience spans academic research as a PhD holder and industry work, where I have contributed to multiple patents.

Feb 5 - AI, ML and Computer Vision Meetup

Join our virtual Meetup to hear talks from experts on cutting-edge topics across AI, ML, and computer vision.

Feb 5, 2026 9 - 11 AM Pacific Online. Register for the Zoom!

Unlocking Visual Anomaly Detection: Navigating Challenges and Pioneering with Vision-Language Models

Visual anomaly detection (VAD) is pivotal for ensuring quality in manufacturing, medical imaging, and safety inspections, yet it continues to face challenges such as data scarcity, domain shifts, and the need for precise localization and reasoning. This seminar explores VAD fundamentals, core challenges, and recent advancements leveraging vision-language models and multimodal large language models (MLLMs). We contrast CLIP-based methods for efficient zero/few-shot detection with MLLM-driven reasoning for explainable, threshold-free outcomes. Drawing from recent studies, we highlight emerging trends, benchmarks, and future directions toward building adaptable, real-world VAD systems. This talk is designed for researchers and practitioners interested in AI-driven inspection and next-generation multimodal approaches.

About the Speaker

Hossein Kashiani is a fourth-year Ph.D. student at Clemson University. His research focuses on developing generalizable and trustworthy AI systems, with publications in top venues such as CVPR, WACV, ICIP, IJCB, and TBIOM. His work spans diverse applications, including anomaly detection, media forensics, biometrics, healthcare, and visual perception.

Data-Centric Lessons To Improve Speech-Language Pretraining

Spoken Question-Answering (SQA) is a core capability for useful and interactive artificial intelligence systems. Recently, several speech-language models (SpeechLMs) have been released with a specific focus on improving their SQA performance. However, a lack of controlled ablations of pretraining data processing and curation makes it challenging to understand what factors account for performance, despite substantial gains from similar studies in other data modalities. In this work, we address this gap by conducting a data-centric exploration for pretraining SpeechLMs.

We focus on three research questions fundamental to speech-language pretraining data:

  • How to process raw web-crawled audio content for speech-text pretraining;
  • How to construct synthetic pretraining datasets to augment web-crawled data;
  • How to interleave (text, audio) segments into training sequences.

We apply the insights from our controlled data-centric ablations to pretrain a 3.8B-parameter SpeechLM, called SpeLangy, that outperforms models that are up to 3x larger by 10.2% absolute performance. We hope our findings highlight the impact of effective data curation for speech-language pretraining and guide future data-centric exploration in SpeechLMs.

About the Speaker

Vishaal Udandarao is a third year ELLIS PhD student, jointly working with Matthias Bethge at The University of Tuebingen and Samuel Albanie at The University of Cambridge/Google Deepmind. He is also a part of the International Max Planck Research School for Intelligent Systems. He is mainly interested in understanding the generalisation properties of foundation models, both vision-language models (VLMs) and large multi-modal models (LMMs), through the lens of their pre-training and test data distributions. His research is funded by a Google PhD Fellowship in Machine Intelligence.

A Practical Pipeline for Synthetic Data with Nano Banana Pro + FiftyOne

Most computer-vision failures come from the rare cases, the dark corners, odd combinations, and edge conditions we never capture enough in real datasets. In this session, we walk through a practical end-to-end pipeline for generating targeted synthetic data using Google’s Nano Banana Pro and managing it with FiftyOne. We’ll explore how to translate dataset gaps into generation prompts, create thousands of high-quality synthetic images, automatically enrich them with metadata, and bring everything into FiftyOne for inspection, filtering, and validation. By the end, you’ll understand how to build a repeatable synthetic-first workflow that closes real vision gaps and improves model performance on the scenarios that matter most.

About the Speaker

Adonai Vera - Machine Learning Engineer & DevRel at Voxel51. With over 7 years of experience building computer vision and machine learning models using TensorFlow\, Docker\, and OpenCV. I started as a software developer\, moved into AI\, led teams\, and served as CTO. Today\, I connect code and community to build open\, production-ready AI\, making technology simple\, accessible\, and reliable.

Making Computer Vision Models Faster: An Introduction to TensorRT Optimization

Modern computer vision applications demand real-time performance, yet many deep learning models struggle with high latency during deployment. This talk introduces how TensorRT can significantly accelerate inference by applying optimizations such as layer fusion, precision calibration, and efficient memory management. Attendees will learn the core concepts behind TensorRT, how it integrates into existing CV pipelines, and how to measure and benchmark improvements. Through practical examples and performance comparisons, the session will demonstrate how substantial speedups can be achieved with minimal model-accuracy loss. By the end, participants will understand when and how to apply TensorRT to make their CV models production-ready.

About the Speaker

Tushar Gadhiya is a Technical Lead at Infocusp Innovations, specialising in deep learning, computer vision, graph learning, and agentic AI. My experience spans academic research as a PhD holder and industry work, where I have contributed to multiple patents.

Feb 5 - AI, ML and Computer Vision Meetup

Join our virtual Meetup to hear talks from experts on cutting-edge topics across AI, ML, and computer vision.

Feb 5, 2026 9 - 11 AM Pacific Online. Register for the Zoom!

Unlocking Visual Anomaly Detection: Navigating Challenges and Pioneering with Vision-Language Models

Visual anomaly detection (VAD) is pivotal for ensuring quality in manufacturing, medical imaging, and safety inspections, yet it continues to face challenges such as data scarcity, domain shifts, and the need for precise localization and reasoning. This seminar explores VAD fundamentals, core challenges, and recent advancements leveraging vision-language models and multimodal large language models (MLLMs). We contrast CLIP-based methods for efficient zero/few-shot detection with MLLM-driven reasoning for explainable, threshold-free outcomes. Drawing from recent studies, we highlight emerging trends, benchmarks, and future directions toward building adaptable, real-world VAD systems. This talk is designed for researchers and practitioners interested in AI-driven inspection and next-generation multimodal approaches.

About the Speaker

Hossein Kashiani is a fourth-year Ph.D. student at Clemson University. His research focuses on developing generalizable and trustworthy AI systems, with publications in top venues such as CVPR, WACV, ICIP, IJCB, and TBIOM. His work spans diverse applications, including anomaly detection, media forensics, biometrics, healthcare, and visual perception.

Data-Centric Lessons To Improve Speech-Language Pretraining

Spoken Question-Answering (SQA) is a core capability for useful and interactive artificial intelligence systems. Recently, several speech-language models (SpeechLMs) have been released with a specific focus on improving their SQA performance. However, a lack of controlled ablations of pretraining data processing and curation makes it challenging to understand what factors account for performance, despite substantial gains from similar studies in other data modalities. In this work, we address this gap by conducting a data-centric exploration for pretraining SpeechLMs.

We focus on three research questions fundamental to speech-language pretraining data:

  • How to process raw web-crawled audio content for speech-text pretraining;
  • How to construct synthetic pretraining datasets to augment web-crawled data;
  • How to interleave (text, audio) segments into training sequences.

We apply the insights from our controlled data-centric ablations to pretrain a 3.8B-parameter SpeechLM, called SpeLangy, that outperforms models that are up to 3x larger by 10.2% absolute performance. We hope our findings highlight the impact of effective data curation for speech-language pretraining and guide future data-centric exploration in SpeechLMs.

About the Speaker

Vishaal Udandarao is a third year ELLIS PhD student, jointly working with Matthias Bethge at The University of Tuebingen and Samuel Albanie at The University of Cambridge/Google Deepmind. He is also a part of the International Max Planck Research School for Intelligent Systems. He is mainly interested in understanding the generalisation properties of foundation models, both vision-language models (VLMs) and large multi-modal models (LMMs), through the lens of their pre-training and test data distributions. His research is funded by a Google PhD Fellowship in Machine Intelligence.

A Practical Pipeline for Synthetic Data with Nano Banana Pro + FiftyOne

Most computer-vision failures come from rare cases: the dark corners, odd combinations, and edge conditions we never capture well enough in real datasets. In this session, we walk through a practical end-to-end pipeline for generating targeted synthetic data using Google’s Nano Banana Pro and managing it with FiftyOne. We’ll explore how to translate dataset gaps into generation prompts, create thousands of high-quality synthetic images, automatically enrich them with metadata, and bring everything into FiftyOne for inspection, filtering, and validation. By the end, you’ll understand how to build a repeatable synthetic-first workflow that closes real vision gaps and improves model performance on the scenarios that matter most.
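The first step described above, translating dataset gaps into generation prompts, can be sketched in a few lines. The gap schema and the prompt template below are hypothetical; in the pipeline the talk describes, these prompts would feed the image generator (Nano Banana Pro) and the results would then be loaded into FiftyOne for inspection.

```python
# Minimal sketch: expand gap descriptors into concrete generation prompts.
# Both the descriptor fields and the template are illustrative assumptions.

TEMPLATE = "a photo of a {subject}, {condition}, {viewpoint} view"

def gaps_to_prompts(gaps):
    """Render one generation prompt per gap descriptor."""
    return [TEMPLATE.format(**gap) for gap in gaps]

gaps = [
    {"subject": "delivery truck", "condition": "heavy rain at night", "viewpoint": "rear"},
    {"subject": "cyclist", "condition": "low sun glare", "viewpoint": "side"},
]
for prompt in gaps_to_prompts(gaps):
    print(prompt)
```

Keeping the gap descriptors structured (rather than hand-writing prompts) is what makes the workflow repeatable: the same descriptors can later tag the generated images as metadata for filtering in FiftyOne.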

About the Speaker

Adonai Vera is a Machine Learning Engineer & DevRel at Voxel51, with over 7 years of experience building computer vision and machine learning models using TensorFlow, Docker, and OpenCV. He started as a software developer, moved into AI, led teams, and served as CTO. Today, he connects code and community to build open, production-ready AI, making technology simple, accessible, and reliable.

Making Computer Vision Models Faster: An Introduction to TensorRT Optimization

Modern computer vision applications demand real-time performance, yet many deep learning models struggle with high latency during deployment. This talk introduces how TensorRT can significantly accelerate inference by applying optimizations such as layer fusion, precision calibration, and efficient memory management. Attendees will learn the core concepts behind TensorRT, how it integrates into existing CV pipelines, and how to measure and benchmark improvements. Through practical examples and performance comparisons, the session will demonstrate how substantial speedups can be achieved with minimal model-accuracy loss. By the end, participants will understand when and how to apply TensorRT to make their CV models production-ready.
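Measuring and benchmarking improvements, as the talk proposes, needs a careful latency harness regardless of framework. The sketch below is a generic, stdlib-only harness; `infer` is a stand-in for any model call, and real GPU measurements would additionally require device synchronization (e.g. `torch.cuda.synchronize`) around the timed region.

```python
# Framework-agnostic latency harness for before/after optimization comparisons.
import time
import statistics

def benchmark(infer, n_warmup=10, n_runs=100):
    for _ in range(n_warmup):      # warmup: exclude one-time setup/JIT costs
        infer()
    latencies = []
    for _ in range(n_runs):
        t0 = time.perf_counter()
        infer()
        latencies.append((time.perf_counter() - t0) * 1e3)  # milliseconds
    latencies.sort()
    return {
        "p50_ms": statistics.median(latencies),
        "p99_ms": latencies[int(0.99 * (len(latencies) - 1))],
    }

stats = benchmark(lambda: sum(range(1000)))
print(sorted(stats))  # the two reported percentiles
```

Reporting tail latency (p99) alongside the median matters for real-time deployments: layer fusion and reduced precision often help the median most, while memory management affects the tail.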

About the Speaker

Tushar Gadhiya is a Technical Lead at Infocusp Innovations, specialising in deep learning, computer vision, graph learning, and agentic AI. His experience spans academic research, where he earned his PhD, and industry work, where he has contributed to multiple patents.

Feb 5 - AI, ML and Computer Vision Meetup


Important: RSVP here to receive the joining link (RSVPs on Meetup will NOT receive the joining link).

Join us for a series of AI-focused webinars designed to enhance your development skills, accelerate productivity, and explore the latest AI innovations. Whether you're building local LLMs, optimizing AI workflows, or deploying intelligent AI agents, these sessions—led by industry experts—will provide invaluable insights and hands-on experience.

Each session is tailored to different skill levels, from novice to advanced developers, offering deep technical insights and real-world applications. Register for each of the sessions:

Session #3: Building AI Assistants and Agents in the Enterprise. Speakers: Dr. Amr Awadallah (Vectara), Ofer Mendelevitch (Vectara). Abstract: Retrieval-Augmented Generation (RAG) is widely recognized as an effective means to empower knowledge workers with low-hallucination results, maximize privacy and security, improve scalability, and foster explainability. Explore the proven techniques for deploying a trusted RAG system versus building a complex, difficult-to-manage approach.

Local and Global AI Community on Discord: join us on Discord for the local and global AI tech community:

  • Events chat: chat and connect with speakers and global and local attendees;
  • Learning AI: events, learning materials, study groups;
  • Startups: innovation, projects collaborations, founders/co-founders;
  • Jobs and Careers: job openings, post resumes, hiring managers.

AI Seminar #3: GenAI and AI Agent with Google and Intel

In this session, you will learn how you can leverage Gemini's 2M context window and multimodal (text, image, audio & video) input to build a question-answering system for financial analysis. You will explore Vertex Gemini API, Vertex text-embeddings and Gemini's cross-modal reasoning capabilities.
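Behind a question-answering system like the one described, the retrieval step can be reduced to embedding chunks, embedding the question, and returning the nearest chunk. The sketch below is a hedged, runnable stand-in: real embeddings would come from the Vertex text-embeddings API, while here a trivial bag-of-words vector substitutes so the flow is self-contained.

```python
# Toy retrieval step for a QA system: nearest chunk by cosine similarity.
# Bag-of-words "embeddings" are an illustrative stand-in for API embeddings.
import math
from collections import Counter

def embed(text):
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(chunks, question):
    """Return the chunk most similar to the question."""
    q = embed(question)
    return max(chunks, key=lambda c: cosine(embed(c), q))

chunks = ["revenue grew 12% in Q3", "the audit covers fiscal 2023"]
print(retrieve(chunks, "what was revenue growth in Q3?"))
# → revenue grew 12% in Q3
```

With a 2M-token context window, many documents can instead be passed to the model directly, but retrieval of this kind still helps keep latency and cost down for large financial corpora.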

Session 5: Google AI Seminar (Virtual)


Important: RSVP here to receive the joining link (RSVPs on Meetup will NOT receive the joining link).

Description: Welcome to the weekly AI virtual seminars, in collaboration with Google. Join us for deep dive tech talks on AI/ML/Data, hands-on experiences on code labs, workshops, and networking with speakers & fellow developers from all over the world.

All sessions:

  • Jul 17th: Session 1, RSVP.
  • Jul 31st: Session 2, RSVP.
  • Aug 14th: Session 3, RSVP.
  • Aug 28th: Session 4, RSVP.
  • Sep 12th: Session 5, RSVP.

Tech Talk: Async, Concurrency, and Batching: The Vertex AI LLMOps Trinity. Speaker: Lavi Nigam (Google). Abstract: In this session, you will learn how you can leverage Gemini's 2M context window and multimodal (text, image, audio & video) input to build a question-answering system for financial analysis. You will explore the Vertex Gemini API, Vertex text-embeddings, and Gemini's cross-modal reasoning capabilities.

Speakers/Topics: Stay tuned as we are updating speakers and schedules. If you have a keen interest in speaking to our community, we invite you to submit topics for consideration: Submit Topics

Sponsors: We are actively seeking sponsors to support the AI developer community, whether by offering venue space, providing food, or contributing cash sponsorship. Sponsors will not only have the chance to speak at the meetups and receive prominent recognition, but also gain exposure to our extensive membership base of 350K+ AI developers worldwide.

AICamp Community on Slack/Discord - Event chat: chat and connect with speakers and attendees - Sharing blogs, events, job openings, projects collaborations

Session 5: Google AI Seminar (Virtual)

Tech Talk: Building a Multimodal Question-Answering (RAG) System with Gemini. Speaker: Lavi Nigam (Google). Abstract: In this session, you will learn how you can leverage Gemini's 2M context window and multimodal (text, image, audio & video) input to build a question-answering system for financial analysis. You will explore the Vertex Gemini API, Vertex text-embeddings, and Gemini's cross-modal reasoning capabilities.


Google AI Seminar (Virtual) on Gemini, Gemma and Vertex AI - Session 4


Tech Talk: Multi-agent games with CrewAI and VertexAI. Speaker: Peter Danenberg (Google). Abstract: In this session, I will discuss how Google Gemini plays games: multi-agent games with CrewAI and VertexAI.


Google AI Seminar (Virtual) on Gemini, Gemma and Vertex AI - Session 3
