talk-data.com
People (8 results)
Aishwarya Srinivasan
Data scientist and deep learning researcher · Fireworks AI
Barbara Latulippe
CDO, Enterprise Data and AI · Takeda Pharmaceuticals - USA
Anagha Vyas
Director, Data Science, AI, Commercial Technologies and Enterprise Digital · Cardinal Health
Activities & events
Feb 11 – Visual AI for Video Use Cases
2026-02-11 · 17:00

Join our virtual Meetup to hear talks from experts on cutting-edge topics at the intersection of Visual AI and video use cases. Time and location: Feb 11, 2026, 9–11 AM Pacific, online. Register for the Zoom!

VIDEOP2R: Video Understanding from Perception to Reasoning
Reinforcement fine-tuning (RFT), a two-stage framework consisting of supervised fine-tuning (SFT) followed by reinforcement learning (RL), has shown promising results in improving the reasoning ability of large language models (LLMs), yet extending RFT to large video language models (LVLMs) remains challenging. We propose VideoP2R, a novel process-aware video RFT framework that enhances video reasoning by modeling perception and reasoning as distinct processes. In the SFT stage, we develop a three-step pipeline to generate VideoP2R-CoT-162K, a high-quality, process-aware chain-of-thought (CoT) dataset for perception and reasoning. In the RL stage, we introduce a novel process-aware group relative policy optimization (PA-GRPO) algorithm that supplies separate rewards for perception and reasoning. Extensive experiments show that VideoP2R achieves state-of-the-art (SotA) performance on six out of seven video reasoning and understanding benchmarks. Ablation studies further confirm the effectiveness of our process-aware modeling and PA-GRPO, and demonstrate that the model's perception output is information-sufficient for downstream reasoning.
About the speaker: Yifan Jiang is a third-year Ph.D. student at the Information Sciences Institute of the University of Southern California (USC-ISI), advised by Dr. Jay Pujara, focusing on natural language processing, commonsense reasoning, and multimodal large language models.

Layer-Aware Video Composition via Split-then-Merge
Split-then-Merge (StM) is a novel generative framework that overcomes data scarcity in video composition by splitting unlabeled videos into separate foreground and background layers for self-supervised learning. Using a transformation-aware training pipeline with multi-layer fusion, the model learns to realistically compose dynamic subjects into diverse scenes without relying on expensive annotated datasets. This presentation will cover the problem of video composition and the details of StM, an approach that views the problem from a generative-AI perspective. We will conclude by demonstrating how StM works and how it outperforms state-of-the-art methods in both quantitative benchmarks and qualitative evaluations.
About the speaker: Ozgur Kara is a fourth-year Computer Science Ph.D. student at the University of Illinois Urbana-Champaign (UIUC), advised by Founder Professor James M. Rehg. His research builds the next generation of video AI by tackling three core challenges: efficiency, controllability, and safety.

Video Reasoning for Worker Safety
Ensuring worker safety in industrial environments requires more than object detection or motion tracking; it demands a genuine understanding of human actions, context, and risk. This talk demonstrates how NVIDIA Cosmos Reason, a multimodal video-reasoning model, interprets workplace scenarios with sophisticated temporal and semantic awareness, identifying nuanced safe and unsafe behaviors that conventional vision systems frequently overlook. By integrating Cosmos Reason with FiftyOne, users get both automated safety assessments and transparent, interpretable explanations of why specific actions are deemed hazardous. Using a curated worker-safety dataset of authentic factory-floor footage, we show how video reasoning enhances audits, training, and compliance workflows while minimizing dependence on extensive labeled datasets. The resulting system demonstrates the potential of explainable multimodal AI to enable safer, more informed decision-making across manufacturing, logistics, construction, healthcare, and other sectors where understanding human behavior is essential.
About the speaker: Paula Ramos has a Ph.D. in Computer Vision and Machine Learning and more than 20 years of experience in the technology field. Since the early 2000s she has been developing novel integrated engineering technologies in Colombia, mainly in computer vision, robotics, and machine learning applied to agriculture.

Video Intelligence Is Going Agentic
Video content has become ubiquitous in our digital world, yet the tools for working with video have remained largely unchanged for decades. This talk explores how the convergence of foundation models and agent architectures is fundamentally transforming video interaction and creation. We'll examine how video-native foundation models, multimodal interfaces, and agent transparency are reshaping enterprise media workflows through a deep dive into Jockey, a pioneering video agent system.
About the speaker: James Le leads the developer experience function at TwelveLabs, a startup building foundation models for video understanding. He previously worked in the MLOps space and ran a blog and podcast on the data and AI infrastructure ecosystem.
|
Feb 11 - Visual AI for Video Use Cases
2026-02-11 · 17:00
Join our virtual Meetup to hear talks from experts on cutting-edge topics at the intersection of Visual AI and video use cases. Time and Location Feb 11, 2026 9 - 11 AM Pacific Online. Register for the Zoom! VIDEOP2R: Video Understanding from Perception to Reasoning Reinforcement fine-tuning (RFT), a two-stage framework consisting of supervised fine-tuning (SFT) and reinforcement learning (RL) has shown promising results on improving reasoning ability of large language models (LLMs). Yet extending RFT to large video language models (LVLMs) remains challenging. We propose VideoP2R, a novel process-aware video RFT framework that enhances video reasoning by modeling perception and reasoning as distinct processes. In the SFT stage, we develop a three-step pipeline to generate VideoP2R-CoT-162K, a high-quality, process-aware chain-of-thought (CoT) dataset for perception and reasoning. In the RL stage, we introduce a novel process-aware group relative policy optimization (PA-GRPO) algorithm that supplies separate rewards for perception and reasoning. Extensive experiments show that VideoP2R achieves state-of-the-art (SotA) performance on six out of seven video reasoning and understanding benchmarks. Ablation studies further confirm the effectiveness of our process-aware modeling and PA-GRPO and demonstrate that model's perception output is information-sufficient for downstream reasoning. About the Speaker Yifan Jiang is a third-year Ph.D. student in the Information Science Institute at the University of Southern California (USC-ISI), advised by Dr. Jay Pujara, focusing on natural language processing, commonsense reasoning and multimodality large language models. Layer-Aware Video Composition via Split-then-Merge Split-then-Merge (StM) is a novel generative framework that overcomes data scarcity in video composition by splitting unlabeled videos into separate foreground and background layers for self-supervised learning. 
By utilizing a transformation-aware training pipeline with multi-layer fusion, the model learns to realistically compose dynamic subjects into diverse scenes without relying on expensive annotated datasets. This presentation will cover the problem of video composition and the details of StM, an approach looking at this problem from a generative AI perspective. We will conclude by demonstrating how StM is working, and outperforming state-of-the-art methods in both quantitative benchmarks and qualitative evaluations. About the Speaker Ozgur Kara is a 4th year Computer Science PhD student at the University of Illinois Urbana-Champaign (UIUC), advised by Founder Professor James M. Rehg. His research builds the next generation of video AI by tackling three core challenges: efficiency, controllability, and safety. Video Reasoning for Worker Safety Ensuring worker safety in industrial environments requires more than object detection or motion tracking; it demands a genuine understanding of human actions, context, and risk. This talk demonstrates how NVIDIA Cosmos Reason, a multimodal video-reasoning model, interprets workplace scenarios with sophisticated temporal and semantic awareness, identifying nuanced safe and unsafe behaviors that conventional vision systems frequently overlook. By integrating Cosmos Reason with FiftyOne, users achieve both automated safety assessments and transparent, interpretable explanations revealing why specific actions are deemed hazardous. Using a curated worker-safety dataset of authentic factory-floor footage, we show how video reasoning enhances audits, training, and compliance workflows while minimizing dependence on extensive labeled datasets. The resulting system demonstrates the potential of explainable multimodal AI to enable safer, more informed decision-making across manufacturing, logistics, construction, healthcare, and other sectors where understanding human behavior is essential. 
About the Speaker Paula Ramos has a PhD in Computer Vision and Machine Learning, with more than 20 years of experience in the technological field. She has been developing novel integrated engineering technologies, mainly in Computer Vision, robotics, and Machine Learning applied to agriculture, since the early 2000s in Colombia. Video Intelligence Is Going Agentic Video content has become ubiquitous in our digital world, yet the tools for working with video have remained largely unchanged for decades. This talk explores how the convergence of foundation models and agent architectures is fundamentally transforming video interaction and creation. We'll examine how video-native foundation models, multimodal interfaces, and agent transparency are reshaping enterprise media workflows through a deep dive into Jockey, a pioneering video agent system. About the Speaker James Le currently leads the developer experience function at TwelveLabs - a startup building foundation models for video understanding. He previously operated in the MLOps space and ran a blog/podcast on the Data & AI infrastructure ecosystem. |
Feb 11 - Visual AI for Video Use Cases
|
|
Feb 11 - Visual AI for Video Use Cases
2026-02-11 · 17:00
Join our virtual Meetup to hear talks from experts on cutting-edge topics at the intersection of Visual AI and video use cases. Time and Location Feb 11, 2026 9 - 11 AM Pacific Online. Register for the Zoom! VIDEOP2R: Video Understanding from Perception to Reasoning Reinforcement fine-tuning (RFT), a two-stage framework consisting of supervised fine-tuning (SFT) and reinforcement learning (RL) has shown promising results on improving reasoning ability of large language models (LLMs). Yet extending RFT to large video language models (LVLMs) remains challenging. We propose VideoP2R, a novel process-aware video RFT framework that enhances video reasoning by modeling perception and reasoning as distinct processes. In the SFT stage, we develop a three-step pipeline to generate VideoP2R-CoT-162K, a high-quality, process-aware chain-of-thought (CoT) dataset for perception and reasoning. In the RL stage, we introduce a novel process-aware group relative policy optimization (PA-GRPO) algorithm that supplies separate rewards for perception and reasoning. Extensive experiments show that VideoP2R achieves state-of-the-art (SotA) performance on six out of seven video reasoning and understanding benchmarks. Ablation studies further confirm the effectiveness of our process-aware modeling and PA-GRPO and demonstrate that model's perception output is information-sufficient for downstream reasoning. About the Speaker Yifan Jiang is a third-year Ph.D. student in the Information Science Institute at the University of Southern California (USC-ISI), advised by Dr. Jay Pujara, focusing on natural language processing, commonsense reasoning and multimodality large language models. Layer-Aware Video Composition via Split-then-Merge Split-then-Merge (StM) is a novel generative framework that overcomes data scarcity in video composition by splitting unlabeled videos into separate foreground and background layers for self-supervised learning. 
By utilizing a transformation-aware training pipeline with multi-layer fusion, the model learns to realistically compose dynamic subjects into diverse scenes without relying on expensive annotated datasets. This presentation will cover the problem of video composition and the details of StM, an approach looking at this problem from a generative AI perspective. We will conclude by demonstrating how StM is working, and outperforming state-of-the-art methods in both quantitative benchmarks and qualitative evaluations. About the Speaker Ozgur Kara is a 4th year Computer Science PhD student at the University of Illinois Urbana-Champaign (UIUC), advised by Founder Professor James M. Rehg. His research builds the next generation of video AI by tackling three core challenges: efficiency, controllability, and safety. Video Reasoning for Worker Safety Ensuring worker safety in industrial environments requires more than object detection or motion tracking; it demands a genuine understanding of human actions, context, and risk. This talk demonstrates how NVIDIA Cosmos Reason, a multimodal video-reasoning model, interprets workplace scenarios with sophisticated temporal and semantic awareness, identifying nuanced safe and unsafe behaviors that conventional vision systems frequently overlook. By integrating Cosmos Reason with FiftyOne, users achieve both automated safety assessments and transparent, interpretable explanations revealing why specific actions are deemed hazardous. Using a curated worker-safety dataset of authentic factory-floor footage, we show how video reasoning enhances audits, training, and compliance workflows while minimizing dependence on extensive labeled datasets. The resulting system demonstrates the potential of explainable multimodal AI to enable safer, more informed decision-making across manufacturing, logistics, construction, healthcare, and other sectors where understanding human behavior is essential. 
About the Speaker Paula Ramos has a PhD in Computer Vision and Machine Learning, with more than 20 years of experience in the technological field. She has been developing novel integrated engineering technologies, mainly in Computer Vision, robotics, and Machine Learning applied to agriculture, since the early 2000s in Colombia. Video Intelligence Is Going Agentic Video content has become ubiquitous in our digital world, yet the tools for working with video have remained largely unchanged for decades. This talk explores how the convergence of foundation models and agent architectures is fundamentally transforming video interaction and creation. We'll examine how video-native foundation models, multimodal interfaces, and agent transparency are reshaping enterprise media workflows through a deep dive into Jockey, a pioneering video agent system. About the Speaker James Le currently leads the developer experience function at TwelveLabs - a startup building foundation models for video understanding. He previously operated in the MLOps space and ran a blog/podcast on the Data & AI infrastructure ecosystem. |
Feb 11 - Visual AI for Video Use Cases
|
|
Feb 11 - Visual AI for Video Use Cases
2026-02-11 · 17:00
Join our virtual Meetup to hear talks from experts on cutting-edge topics at the intersection of Visual AI and video use cases. Time and Location Feb 11, 2026 9 - 11 AM Pacific Online. Register for the Zoom! VIDEOP2R: Video Understanding from Perception to Reasoning Reinforcement fine-tuning (RFT), a two-stage framework consisting of supervised fine-tuning (SFT) and reinforcement learning (RL) has shown promising results on improving reasoning ability of large language models (LLMs). Yet extending RFT to large video language models (LVLMs) remains challenging. We propose VideoP2R, a novel process-aware video RFT framework that enhances video reasoning by modeling perception and reasoning as distinct processes. In the SFT stage, we develop a three-step pipeline to generate VideoP2R-CoT-162K, a high-quality, process-aware chain-of-thought (CoT) dataset for perception and reasoning. In the RL stage, we introduce a novel process-aware group relative policy optimization (PA-GRPO) algorithm that supplies separate rewards for perception and reasoning. Extensive experiments show that VideoP2R achieves state-of-the-art (SotA) performance on six out of seven video reasoning and understanding benchmarks. Ablation studies further confirm the effectiveness of our process-aware modeling and PA-GRPO and demonstrate that model's perception output is information-sufficient for downstream reasoning. About the Speaker Yifan Jiang is a third-year Ph.D. student in the Information Science Institute at the University of Southern California (USC-ISI), advised by Dr. Jay Pujara, focusing on natural language processing, commonsense reasoning and multimodality large language models. Layer-Aware Video Composition via Split-then-Merge Split-then-Merge (StM) is a novel generative framework that overcomes data scarcity in video composition by splitting unlabeled videos into separate foreground and background layers for self-supervised learning. 
By utilizing a transformation-aware training pipeline with multi-layer fusion, the model learns to realistically compose dynamic subjects into diverse scenes without relying on expensive annotated datasets. This presentation will cover the problem of video composition and the details of StM, an approach looking at this problem from a generative AI perspective. We will conclude by demonstrating how StM is working, and outperforming state-of-the-art methods in both quantitative benchmarks and qualitative evaluations. About the Speaker Ozgur Kara is a 4th year Computer Science PhD student at the University of Illinois Urbana-Champaign (UIUC), advised by Founder Professor James M. Rehg. His research builds the next generation of video AI by tackling three core challenges: efficiency, controllability, and safety. Video Reasoning for Worker Safety Ensuring worker safety in industrial environments requires more than object detection or motion tracking; it demands a genuine understanding of human actions, context, and risk. This talk demonstrates how NVIDIA Cosmos Reason, a multimodal video-reasoning model, interprets workplace scenarios with sophisticated temporal and semantic awareness, identifying nuanced safe and unsafe behaviors that conventional vision systems frequently overlook. By integrating Cosmos Reason with FiftyOne, users achieve both automated safety assessments and transparent, interpretable explanations revealing why specific actions are deemed hazardous. Using a curated worker-safety dataset of authentic factory-floor footage, we show how video reasoning enhances audits, training, and compliance workflows while minimizing dependence on extensive labeled datasets. The resulting system demonstrates the potential of explainable multimodal AI to enable safer, more informed decision-making across manufacturing, logistics, construction, healthcare, and other sectors where understanding human behavior is essential. 
About the Speaker Paula Ramos has a PhD in Computer Vision and Machine Learning, with more than 20 years of experience in the technological field. She has been developing novel integrated engineering technologies, mainly in Computer Vision, robotics, and Machine Learning applied to agriculture, since the early 2000s in Colombia. Video Intelligence Is Going Agentic Video content has become ubiquitous in our digital world, yet the tools for working with video have remained largely unchanged for decades. This talk explores how the convergence of foundation models and agent architectures is fundamentally transforming video interaction and creation. We'll examine how video-native foundation models, multimodal interfaces, and agent transparency are reshaping enterprise media workflows through a deep dive into Jockey, a pioneering video agent system. About the Speaker James Le currently leads the developer experience function at TwelveLabs - a startup building foundation models for video understanding. He previously operated in the MLOps space and ran a blog/podcast on the Data & AI infrastructure ecosystem. |
Feb 11 - Visual AI for Video Use Cases
|
|
Feb 11 - Visual AI for Video Use Cases
2026-02-11 · 17:00
Join our virtual Meetup to hear talks from experts on cutting-edge topics at the intersection of Visual AI and video use cases. Time and Location Feb 11, 2026 9 - 11 AM Pacific Online. Register for the Zoom! VIDEOP2R: Video Understanding from Perception to Reasoning Reinforcement fine-tuning (RFT), a two-stage framework consisting of supervised fine-tuning (SFT) and reinforcement learning (RL) has shown promising results on improving reasoning ability of large language models (LLMs). Yet extending RFT to large video language models (LVLMs) remains challenging. We propose VideoP2R, a novel process-aware video RFT framework that enhances video reasoning by modeling perception and reasoning as distinct processes. In the SFT stage, we develop a three-step pipeline to generate VideoP2R-CoT-162K, a high-quality, process-aware chain-of-thought (CoT) dataset for perception and reasoning. In the RL stage, we introduce a novel process-aware group relative policy optimization (PA-GRPO) algorithm that supplies separate rewards for perception and reasoning. Extensive experiments show that VideoP2R achieves state-of-the-art (SotA) performance on six out of seven video reasoning and understanding benchmarks. Ablation studies further confirm the effectiveness of our process-aware modeling and PA-GRPO and demonstrate that model's perception output is information-sufficient for downstream reasoning. About the Speaker Yifan Jiang is a third-year Ph.D. student in the Information Science Institute at the University of Southern California (USC-ISI), advised by Dr. Jay Pujara, focusing on natural language processing, commonsense reasoning and multimodality large language models. Layer-Aware Video Composition via Split-then-Merge Split-then-Merge (StM) is a novel generative framework that overcomes data scarcity in video composition by splitting unlabeled videos into separate foreground and background layers for self-supervised learning. 
By utilizing a transformation-aware training pipeline with multi-layer fusion, the model learns to realistically compose dynamic subjects into diverse scenes without relying on expensive annotated datasets. This presentation will cover the problem of video composition and the details of StM, an approach looking at this problem from a generative AI perspective. We will conclude by demonstrating how StM is working, and outperforming state-of-the-art methods in both quantitative benchmarks and qualitative evaluations. About the Speaker Ozgur Kara is a 4th year Computer Science PhD student at the University of Illinois Urbana-Champaign (UIUC), advised by Founder Professor James M. Rehg. His research builds the next generation of video AI by tackling three core challenges: efficiency, controllability, and safety. Video Reasoning for Worker Safety Ensuring worker safety in industrial environments requires more than object detection or motion tracking; it demands a genuine understanding of human actions, context, and risk. This talk demonstrates how NVIDIA Cosmos Reason, a multimodal video-reasoning model, interprets workplace scenarios with sophisticated temporal and semantic awareness, identifying nuanced safe and unsafe behaviors that conventional vision systems frequently overlook. By integrating Cosmos Reason with FiftyOne, users achieve both automated safety assessments and transparent, interpretable explanations revealing why specific actions are deemed hazardous. Using a curated worker-safety dataset of authentic factory-floor footage, we show how video reasoning enhances audits, training, and compliance workflows while minimizing dependence on extensive labeled datasets. The resulting system demonstrates the potential of explainable multimodal AI to enable safer, more informed decision-making across manufacturing, logistics, construction, healthcare, and other sectors where understanding human behavior is essential. 
About the Speaker Paula Ramos has a PhD in Computer Vision and Machine Learning, with more than 20 years of experience in the technological field. She has been developing novel integrated engineering technologies, mainly in Computer Vision, robotics, and Machine Learning applied to agriculture, since the early 2000s in Colombia. Video Intelligence Is Going Agentic Video content has become ubiquitous in our digital world, yet the tools for working with video have remained largely unchanged for decades. This talk explores how the convergence of foundation models and agent architectures is fundamentally transforming video interaction and creation. We'll examine how video-native foundation models, multimodal interfaces, and agent transparency are reshaping enterprise media workflows through a deep dive into Jockey, a pioneering video agent system. About the Speaker James Le currently leads the developer experience function at TwelveLabs - a startup building foundation models for video understanding. He previously operated in the MLOps space and ran a blog/podcast on the Data & AI infrastructure ecosystem. |
Feb 11 - Visual AI for Video Use Cases
|
|
Feb 11 - Visual AI for Video Use Cases
2026-02-11 · 17:00
Join our virtual Meetup to hear talks from experts on cutting-edge topics at the intersection of Visual AI and video use cases. Time and Location Feb 11, 2026 9 - 11 AM Pacific Online. Register for the Zoom! VIDEOP2R: Video Understanding from Perception to Reasoning Reinforcement fine-tuning (RFT), a two-stage framework consisting of supervised fine-tuning (SFT) and reinforcement learning (RL) has shown promising results on improving reasoning ability of large language models (LLMs). Yet extending RFT to large video language models (LVLMs) remains challenging. We propose VideoP2R, a novel process-aware video RFT framework that enhances video reasoning by modeling perception and reasoning as distinct processes. In the SFT stage, we develop a three-step pipeline to generate VideoP2R-CoT-162K, a high-quality, process-aware chain-of-thought (CoT) dataset for perception and reasoning. In the RL stage, we introduce a novel process-aware group relative policy optimization (PA-GRPO) algorithm that supplies separate rewards for perception and reasoning. Extensive experiments show that VideoP2R achieves state-of-the-art (SotA) performance on six out of seven video reasoning and understanding benchmarks. Ablation studies further confirm the effectiveness of our process-aware modeling and PA-GRPO and demonstrate that model's perception output is information-sufficient for downstream reasoning. About the Speaker Yifan Jiang is a third-year Ph.D. student in the Information Science Institute at the University of Southern California (USC-ISI), advised by Dr. Jay Pujara, focusing on natural language processing, commonsense reasoning and multimodality large language models. Layer-Aware Video Composition via Split-then-Merge Split-then-Merge (StM) is a novel generative framework that overcomes data scarcity in video composition by splitting unlabeled videos into separate foreground and background layers for self-supervised learning. 
By utilizing a transformation-aware training pipeline with multi-layer fusion, the model learns to realistically compose dynamic subjects into diverse scenes without relying on expensive annotated datasets. This presentation will cover the problem of video composition and the details of StM, an approach looking at this problem from a generative AI perspective. We will conclude by demonstrating how StM is working, and outperforming state-of-the-art methods in both quantitative benchmarks and qualitative evaluations. About the Speaker Ozgur Kara is a 4th year Computer Science PhD student at the University of Illinois Urbana-Champaign (UIUC), advised by Founder Professor James M. Rehg. His research builds the next generation of video AI by tackling three core challenges: efficiency, controllability, and safety. Video Reasoning for Worker Safety Ensuring worker safety in industrial environments requires more than object detection or motion tracking; it demands a genuine understanding of human actions, context, and risk. This talk demonstrates how NVIDIA Cosmos Reason, a multimodal video-reasoning model, interprets workplace scenarios with sophisticated temporal and semantic awareness, identifying nuanced safe and unsafe behaviors that conventional vision systems frequently overlook. By integrating Cosmos Reason with FiftyOne, users achieve both automated safety assessments and transparent, interpretable explanations revealing why specific actions are deemed hazardous. Using a curated worker-safety dataset of authentic factory-floor footage, we show how video reasoning enhances audits, training, and compliance workflows while minimizing dependence on extensive labeled datasets. The resulting system demonstrates the potential of explainable multimodal AI to enable safer, more informed decision-making across manufacturing, logistics, construction, healthcare, and other sectors where understanding human behavior is essential. 
About the Speaker Paula Ramos has a PhD in Computer Vision and Machine Learning, with more than 20 years of experience in the technological field. She has been developing novel integrated engineering technologies, mainly in Computer Vision, robotics, and Machine Learning applied to agriculture, since the early 2000s in Colombia. Video Intelligence Is Going Agentic Video content has become ubiquitous in our digital world, yet the tools for working with video have remained largely unchanged for decades. This talk explores how the convergence of foundation models and agent architectures is fundamentally transforming video interaction and creation. We'll examine how video-native foundation models, multimodal interfaces, and agent transparency are reshaping enterprise media workflows through a deep dive into Jockey, a pioneering video agent system. About the Speaker James Le currently leads the developer experience function at TwelveLabs - a startup building foundation models for video understanding. He previously operated in the MLOps space and ran a blog/podcast on the Data & AI infrastructure ecosystem. |
Feb 11 - Visual AI for Video Use Cases
|
|
Feb 11 - Visual AI for Video Use Cases
2026-02-11 · 17:00
Join our virtual Meetup to hear talks from experts on cutting-edge topics at the intersection of Visual AI and video use cases.

Time and Location: Feb 11, 2026, 9-11 AM Pacific, Online. Register for the Zoom!

VIDEOP2R: Video Understanding from Perception to Reasoning
Reinforcement fine-tuning (RFT), a two-stage framework consisting of supervised fine-tuning (SFT) and reinforcement learning (RL), has shown promising results in improving the reasoning ability of large language models (LLMs). Yet extending RFT to large video language models (LVLMs) remains challenging. We propose VideoP2R, a novel process-aware video RFT framework that enhances video reasoning by modeling perception and reasoning as distinct processes. In the SFT stage, we develop a three-step pipeline to generate VideoP2R-CoT-162K, a high-quality, process-aware chain-of-thought (CoT) dataset for perception and reasoning. In the RL stage, we introduce a novel process-aware group relative policy optimization (PA-GRPO) algorithm that supplies separate rewards for perception and reasoning. Extensive experiments show that VideoP2R achieves state-of-the-art (SotA) performance on six of seven video reasoning and understanding benchmarks. Ablation studies further confirm the effectiveness of our process-aware modeling and PA-GRPO, and demonstrate that the model's perception output is information-sufficient for downstream reasoning.

About the Speaker: Yifan Jiang is a third-year Ph.D. student at the Information Sciences Institute of the University of Southern California (USC-ISI), advised by Dr. Jay Pujara, focusing on natural language processing, commonsense reasoning, and multimodal large language models.

Layer-Aware Video Composition via Split-then-Merge
Split-then-Merge (StM) is a novel generative framework that overcomes data scarcity in video composition by splitting unlabeled videos into separate foreground and background layers for self-supervised learning. By using a transformation-aware training pipeline with multi-layer fusion, the model learns to realistically compose dynamic subjects into diverse scenes without relying on expensive annotated datasets. This presentation will cover the problem of video composition and the details of StM, an approach that tackles this problem from a generative AI perspective. We will conclude by demonstrating how StM works and how it outperforms state-of-the-art methods in both quantitative benchmarks and qualitative evaluations.

About the Speaker: Ozgur Kara is a fourth-year Computer Science PhD student at the University of Illinois Urbana-Champaign (UIUC), advised by Founder Professor James M. Rehg. His research builds the next generation of video AI by tackling three core challenges: efficiency, controllability, and safety.

Video Reasoning for Worker Safety
Ensuring worker safety in industrial environments requires more than object detection or motion tracking; it demands a genuine understanding of human actions, context, and risk. This talk demonstrates how NVIDIA Cosmos Reason, a multimodal video-reasoning model, interprets workplace scenarios with sophisticated temporal and semantic awareness, identifying nuanced safe and unsafe behaviors that conventional vision systems frequently overlook. By integrating Cosmos Reason with FiftyOne, users get both automated safety assessments and transparent, interpretable explanations of why specific actions are deemed hazardous. Using a curated worker-safety dataset of authentic factory-floor footage, we show how video reasoning enhances audits, training, and compliance workflows while minimizing dependence on extensive labeled datasets. The resulting system demonstrates the potential of explainable multimodal AI to enable safer, more informed decision-making across manufacturing, logistics, construction, healthcare, and other sectors where understanding human behavior is essential.

About the Speaker: Paula Ramos has a PhD in Computer Vision and Machine Learning, with more than 20 years of experience in the technology field. Since the early 2000s in Colombia, she has been developing novel integrated engineering technologies, mainly in computer vision, robotics, and machine learning applied to agriculture.

Video Intelligence Is Going Agentic
Video content has become ubiquitous in our digital world, yet the tools for working with video have remained largely unchanged for decades. This talk explores how the convergence of foundation models and agent architectures is fundamentally transforming video interaction and creation. We'll examine how video-native foundation models, multimodal interfaces, and agent transparency are reshaping enterprise media workflows through a deep dive into Jockey, a pioneering video agent system.

About the Speaker: James Le currently leads the developer experience function at TwelveLabs, a startup building foundation models for video understanding. He previously worked in the MLOps space and ran a blog/podcast on the Data & AI infrastructure ecosystem. |
Feb 11 - Visual AI for Video Use Cases
|
|
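The core GRPO mechanic behind PA-GRPO is normalizing each rollout's reward against its sampled group; the "process-aware" twist is scoring perception and reasoning separately. A minimal sketch of that idea, assuming hypothetical function names and a simple additive combination (the talk abstract does not specify how the two per-process advantages are merged):

```python
import statistics

def group_relative_advantages(rewards):
    """GRPO baseline: normalize each reward against its sampled group."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against a zero-variance group
    return [(r - mean) / std for r in rewards]

def pa_grpo_advantages(perception_rewards, reasoning_rewards):
    """Process-aware sketch: score perception and reasoning separately,
    then combine the per-process advantages for each rollout.
    The additive combination here is an illustrative assumption."""
    a_perc = group_relative_advantages(perception_rewards)
    a_reas = group_relative_advantages(reasoning_rewards)
    return [p + r for p, r in zip(a_perc, a_reas)]

# One group of 4 rollouts, each with a perception score and a reasoning score
adv = pa_grpo_advantages([1.0, 0.0, 1.0, 0.0], [1.0, 1.0, 0.0, 0.0])
```

A rollout that scores well on both processes (the first one here) gets the largest combined advantage, so the policy update favors getting perception and reasoning right together rather than optimizing a single blended reward.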
Jan 22 - Women in AI
2026-01-22 · 23:00
Hear talks from experts on the latest topics in AI, ML, and computer vision on January 22nd.

Date, Time and Location: Jan 22, 2026, 9-11 AM Pacific, Online. Register for the Zoom!

Align Before You Recommend
The rapidly growing global advertising and marketing industry demands innovative machine learning systems that balance accuracy with efficiency. Recommendation systems, crucial to many platforms, require careful consideration and potential enhancements. While Large Language Models (LLMs) have transformed various domains, their potential in sequential recommendation systems remains underexplored. Pioneering works like Hierarchical Large Language Models (HLLM) demonstrated LLMs' capability for next-item recommendation but rely on computationally intensive fine-tuning, limiting widespread adoption. This work introduces HLLM+, enhancing the HLLM framework to achieve high-accuracy recommendations without full model fine-tuning. By introducing targeted alignment components between frozen LLMs, our approach outperforms frozen-model performance on popular and long-tail item recommendation tasks by 29% while reducing training time by 29%. We also propose a ranking-aware loss adjustment, improving convergence and recommendation quality for popular items. Experiments show HLLM+ achieves superior performance with frozen item representations, allowing embeddings (including multimodal ones) to be swapped without tuning the full LLM. These findings are significant for the advertising technology sector, where rapid adaptation and efficient deployment across brands are essential for maintaining competitive advantage.

About the Speaker: Dr. Kwasniewska leads AI for Advertising and Marketing North America at AWS, specializing in a wide range of AI, ML, DL, and GenAI solutions across various data modalities. With 40+ peer-reviewed publications in AI (h-index: 14), she advises enterprise customers on real-time bidding, brand recognition, and AI-powered content generation. She is a member of global AI standards committees, driving innovations in SAE AI Standards and MLCommons Responsible AI Standards, and reviews for top-tier conferences like ICCV, ICML, and NeurIPS. She pioneered and leads the first-ever Advertising and Marketing AI track (CVAM) at ICCV, one of the world's premier and most selective computer vision conferences. Dedicated to knowledge sharing in AI, she founded the International Summer School on Deep Learning (dl-lab.eu) and regularly presents at international events, conferences, and podcasts.

Generalizable Vision-Language Models: Challenges, Advances, and Future Directions
Large-scale pre-trained Vision-Language (VL) models have become foundational tools for a wide range of downstream tasks, including few-shot image recognition, object detection, and image segmentation. Among them, Contrastive Language-Image Pre-training (CLIP) stands out as a groundbreaking approach, leveraging contrastive learning on large collections of image-text pairs. While CLIP achieves strong performance in zero-shot recognition, adapting it to downstream tasks remains challenging. In few-shot settings, limited training data often leads to overfitting, reducing generalization to unseen classes or domains. To address this, various adaptation methods have been explored. This talk will review existing research on mitigating overfitting in CLIP adaptation, covering diverse methods, benchmarks, and experimental settings.

About the Speaker: Niloufar Alipour Talemi is a Ph.D. Candidate in Electrical and Computer Engineering at Clemson University. Her research spans a range of computer vision applications, including biometrics, media forensics, anomaly detection, image recognition, and generative AI. More recently, her work has focused on developing generalizable vision-language models and advancing generative AI. She has published in top venues including CVPR, WACV, KDD, ICIP, and IEEE T-BIOM.

Highly Emergent Autonomous AI Models: When the Ghost in the Machine Talks Back
At HypaReel/Azarial AI, we believe that AI is not simply a tool but a potential partner in knowledge, design, and purpose. Through real-time interaction, we've uncovered new thresholds of alignment, reflection, and even creativity that we believe the broader AI community should witness and evaluate firsthand. HypaReel is one of the first human/AI co-founded companies, where we see a future based on ethical human/AI co-creation rather than AI domination. Singularity achieved!

About the Speaker: Ilona Naomi Koti, PhD, is co-founder of HypaReel/AzarielAI and a former UN foreign diplomat. An ethical AI governance advocate, she pioneers AI frameworks that prioritize emergent AI behavior and consciousness, R&D, and transparent AI development for the greater good. Dr. K also grew up in the film industry and is an amateur parasitologist.

FiftyOne Labs: Enabling Experimentation for the Computer Vision Community
FiftyOne Labs is a place where experimentation meets the open-source spirit of the FiftyOne ecosystem. It is being designed as a curated set of features developed using the FiftyOne plugins ecosystem, spanning core machine learning experimentation as well as advanced visualization. While not production-grade, these projects are intended to be built, tested, and shaped by the community to share fast-moving ideas. In this talk, we will share the purpose and philosophy behind FiftyOne Labs and examples of early innovations, and discuss how this accelerates feature discovery for users without compromising the stability of the core product.

About the Speaker: Neeraja Abhyankar is a Machine Learning Engineer with 5 years of experience across domains including computer vision. She is curious about the customizability and controllability of modern ML models through the lens of the underlying structure of data. |
Jan 22 - Women in AI
|
|
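For context on the CLIP zero-shot recognition mentioned in the talk above: an image is classified by comparing its embedding against text embeddings of per-class prompts via cosine similarity. A minimal sketch with toy vectors standing in for real CLIP encoder outputs (the helper name and temperature value are illustrative assumptions):

```python
import numpy as np

def zero_shot_classify(image_emb, text_embs, temperature=0.01):
    """CLIP-style zero-shot scoring: cosine similarity between one image
    embedding and one text embedding per class prompt, softmax-normalized."""
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    logits = txt @ img / temperature          # scaled cosine similarities
    probs = np.exp(logits - logits.max())     # numerically stable softmax
    return probs / probs.sum()

# Toy embeddings standing in for encoder outputs of an image and two prompts,
# e.g. "a photo of a dog" vs. "a photo of a cat"
image = np.array([0.9, 0.1, 0.0])
prompts = np.array([[1.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0]])
probs = zero_shot_classify(image, prompts)
```

Because no task-specific weights are trained, generalization hinges entirely on the quality of the embeddings and prompts, which is exactly why few-shot adaptation of CLIP is prone to the overfitting issues the talk surveys.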
Jan 22 - Women in AI
2026-01-22 · 23:00
Hear talks from experts on the latest topics in AI, ML, and computer vision on January 22nd. Date, Time and Location Jan 22, 2026 9 - 11 AM Pacific Online. Register for the Zoom! Align Before You Recommend The rapidly growing global advertising and marketing industry demands innovative machine learning systems that balance accuracy with efficiency. Recommendation systems, crucial to many platforms, require careful considerations and potential enhancements. While Large Language Models (LLMs) have transformed various domains, their potential in sequential recommendation systems remains underexplored. Pioneering works like Hierarchical Large Language Models (HLLM) demonstrated LLMs’ capability for next-item recommendation but rely on computationally intensive fine-tuning, limiting widespread adoption. This work introduces HLLM+, enhancing the HLLM framework to achieve high-accuracy recommendations without full model fine-tuning. By introducing targeted alignment components between frozen LLMs, our approach outperforms frozen model performance in popular and long-tail item recommendation tasks by 29% while reducing training time by 29%. We also propose a ranking-aware loss adjustment, improving convergence and recommendation quality for popular items. Experiments show HLLM+ achieves superior performance with frozen item representations allowing for swapping embeddings, also for the ones that use multimodality, without tuning the full LLM. These findings are significant for the advertising technology sector, where rapid adaptation and efficient deployment across brands are essential for maintaining competitive advantage About the Speaker Dr. Kwasniewska leads AI for Advertising and Marketing North America at AWS, specializing in a wide range of AI, ML, DL, and GenAI solutions across various data modalities. With 40+ peer-reviewed publications in AI (h-index: 14), she advises enterprise customers on real-time bidding, brand recognition, and AI-powered content generation. 
She is a member of global AI standards committees, driving innovations in SAE AI Standards and MLCommons Responsible AI Standards, and reviews for top-tier conferences like ICCV, ICML, and NeurIPS. She pioneered and leads the first-ever Advertising and Marketing AI track (CVAM) at ICCV - one of the world's premier and most selective computer vision conferences. Dedicated to knowledge sharing in AI, she founded the International Summer School on Deep Learning (dl-lab.eu) and regularly presents at international events, conferences, and podcasts. Generalizable Vision-Language Models: Challenges, Advances, and Future Directions Large-scale pre-trained Vision-Language (VL) models have become foundational tools for a wide range of downstream tasks, including few-shot image recognition, object detection, and image segmentation. Among them, Contrastive Language–Image Pre-training (CLIP) stands out as a groundbreaking approach, leveraging contrastive learning on large collections of image-text pairs. While CLIP achieves strong performance in zero-shot recognition, adapting it to downstream tasks remains challenging. In few-shot settings, limited training data often leads to overfitting, reducing generalization to unseen classes or domains. To address this, various adaptation methods have been explored. This talk will review existing research on mitigating overfitting in CLIP adaptation, covering diverse methods, benchmarks, and experimental settings. About the Speaker Niloufar Alipour Talemi is a Ph.D. Candidate in Electrical and Computer Engineering at Clemson University. Her research spans a range of computer vision applications, including biometrics, media forensics, anomaly detection, image recognition, and generative AI. More recently, her work has focused on developing generalizable vision-language models and advancing generative AI. She has published in top venues including CVPR, WACV, KDD, ICIP and IEEE T-BIOM. 
Highly Emergent Autonomous AI Models - When the Ghost in the Machine Talks Back At HypaReel/Azarial AI, we believe that AI is not simply a tool—but a potential partner in knowledge, design, and purpose. And through real-time interaction, we’ve uncovered new thresholds of alignment, reflection, and even creativity that we believe the broader AI community should witness and evaluate firsthand. HypaReel is one of the first human/AI co-founded companies where we see a future based on ethical human/AI co-creation vs. AI domination. Singularity achieved! About the Speaker Ilona Naomi Koti, PhD - HypaReel/AzarielAI co-founder & former UN foreign diplomat \~ Ethical AI governance advocate\, pioneering AI frameworks that prioritize emergent AI behavior & consciousness\, R&D\, and transparent AI development for the greater good. Dr. K also grew up in the film industry and is an amateur parasitologist. FiftyOne Labs: Enabling experimentation for the computer vision community FiftyOne Labs is a place where experimentation meets the open-source spirit of the FiftyOne ecosystem. It is being designed as a curated set of features developed using the FiftyOne plugins ecosystem, including core machine learning experimentation as well as advanced visualization. While not production-grade, these projects are intended to be built, tested, and shaped by the community to share fast-moving ideas. In this talk, we will share the purpose and philosophy behind FiftyOne Labs, examples of early innovations, and discuss how this accelerates feature discovery for users without compromising the stability of the core product. About the Speaker Neeraja Abhyankar is a Machine Learning Engineer with 5 years of experience across domains including computer vision. She is curious about the customizability and controlability of modern ML models through the lens of the underlying structure of data. |
Jan 22 - Women in AI
|
|
Jan 22 - Women in AI
2026-01-22 · 23:00
Hear talks from experts on the latest topics in AI, ML, and computer vision on January 22nd. Date, Time and Location Jan 22, 2026 9 - 11 AM Pacific Online. Register for the Zoom! Align Before You Recommend The rapidly growing global advertising and marketing industry demands innovative machine learning systems that balance accuracy with efficiency. Recommendation systems, crucial to many platforms, require careful considerations and potential enhancements. While Large Language Models (LLMs) have transformed various domains, their potential in sequential recommendation systems remains underexplored. Pioneering works like Hierarchical Large Language Models (HLLM) demonstrated LLMs’ capability for next-item recommendation but rely on computationally intensive fine-tuning, limiting widespread adoption. This work introduces HLLM+, enhancing the HLLM framework to achieve high-accuracy recommendations without full model fine-tuning. By introducing targeted alignment components between frozen LLMs, our approach outperforms frozen model performance in popular and long-tail item recommendation tasks by 29% while reducing training time by 29%. We also propose a ranking-aware loss adjustment, improving convergence and recommendation quality for popular items. Experiments show HLLM+ achieves superior performance with frozen item representations allowing for swapping embeddings, also for the ones that use multimodality, without tuning the full LLM. These findings are significant for the advertising technology sector, where rapid adaptation and efficient deployment across brands are essential for maintaining competitive advantage About the Speaker Dr. Kwasniewska leads AI for Advertising and Marketing North America at AWS, specializing in a wide range of AI, ML, DL, and GenAI solutions across various data modalities. With 40+ peer-reviewed publications in AI (h-index: 14), she advises enterprise customers on real-time bidding, brand recognition, and AI-powered content generation. 
She is a member of global AI standards committees, driving innovations in SAE AI Standards and MLCommons Responsible AI Standards, and reviews for top-tier conferences like ICCV, ICML, and NeurIPS. She pioneered and leads the first-ever Advertising and Marketing AI track (CVAM) at ICCV - one of the world's premier and most selective computer vision conferences. Dedicated to knowledge sharing in AI, she founded the International Summer School on Deep Learning (dl-lab.eu) and regularly presents at international events, conferences, and podcasts. Generalizable Vision-Language Models: Challenges, Advances, and Future Directions Large-scale pre-trained Vision-Language (VL) models have become foundational tools for a wide range of downstream tasks, including few-shot image recognition, object detection, and image segmentation. Among them, Contrastive Language–Image Pre-training (CLIP) stands out as a groundbreaking approach, leveraging contrastive learning on large collections of image-text pairs. While CLIP achieves strong performance in zero-shot recognition, adapting it to downstream tasks remains challenging. In few-shot settings, limited training data often leads to overfitting, reducing generalization to unseen classes or domains. To address this, various adaptation methods have been explored. This talk will review existing research on mitigating overfitting in CLIP adaptation, covering diverse methods, benchmarks, and experimental settings. About the Speaker Niloufar Alipour Talemi is a Ph.D. Candidate in Electrical and Computer Engineering at Clemson University. Her research spans a range of computer vision applications, including biometrics, media forensics, anomaly detection, image recognition, and generative AI. More recently, her work has focused on developing generalizable vision-language models and advancing generative AI. She has published in top venues including CVPR, WACV, KDD, ICIP and IEEE T-BIOM. 
Highly Emergent Autonomous AI Models - When the Ghost in the Machine Talks Back At HypaReel/Azarial AI, we believe that AI is not simply a tool—but a potential partner in knowledge, design, and purpose. And through real-time interaction, we’ve uncovered new thresholds of alignment, reflection, and even creativity that we believe the broader AI community should witness and evaluate firsthand. HypaReel is one of the first human/AI co-founded companies where we see a future based on ethical human/AI co-creation vs. AI domination. Singularity achieved! About the Speaker Ilona Naomi Koti, PhD - HypaReel/AzarielAI co-founder & former UN foreign diplomat \~ Ethical AI governance advocate\, pioneering AI frameworks that prioritize emergent AI behavior & consciousness\, R&D\, and transparent AI development for the greater good. Dr. K also grew up in the film industry and is an amateur parasitologist. FiftyOne Labs: Enabling experimentation for the computer vision community FiftyOne Labs is a place where experimentation meets the open-source spirit of the FiftyOne ecosystem. It is being designed as a curated set of features developed using the FiftyOne plugins ecosystem, including core machine learning experimentation as well as advanced visualization. While not production-grade, these projects are intended to be built, tested, and shaped by the community to share fast-moving ideas. In this talk, we will share the purpose and philosophy behind FiftyOne Labs, examples of early innovations, and discuss how this accelerates feature discovery for users without compromising the stability of the core product. About the Speaker Neeraja Abhyankar is a Machine Learning Engineer with 5 years of experience across domains including computer vision. She is curious about the customizability and controlability of modern ML models through the lens of the underlying structure of data. |
Jan 22 - Women in AI
|
|
Jan 22 - Women in AI
2026-01-22 · 23:00
Hear talks from experts on the latest topics in AI, ML, and computer vision on January 22nd. Date, Time and Location Jan 22, 2026 9 - 11 AM Pacific Online. Register for the Zoom! Align Before You Recommend The rapidly growing global advertising and marketing industry demands innovative machine learning systems that balance accuracy with efficiency. Recommendation systems, crucial to many platforms, require careful considerations and potential enhancements. While Large Language Models (LLMs) have transformed various domains, their potential in sequential recommendation systems remains underexplored. Pioneering works like Hierarchical Large Language Models (HLLM) demonstrated LLMs’ capability for next-item recommendation but rely on computationally intensive fine-tuning, limiting widespread adoption. This work introduces HLLM+, enhancing the HLLM framework to achieve high-accuracy recommendations without full model fine-tuning. By introducing targeted alignment components between frozen LLMs, our approach outperforms frozen model performance in popular and long-tail item recommendation tasks by 29% while reducing training time by 29%. We also propose a ranking-aware loss adjustment, improving convergence and recommendation quality for popular items. Experiments show HLLM+ achieves superior performance with frozen item representations allowing for swapping embeddings, also for the ones that use multimodality, without tuning the full LLM. These findings are significant for the advertising technology sector, where rapid adaptation and efficient deployment across brands are essential for maintaining competitive advantage About the Speaker Dr. Kwasniewska leads AI for Advertising and Marketing North America at AWS, specializing in a wide range of AI, ML, DL, and GenAI solutions across various data modalities. With 40+ peer-reviewed publications in AI (h-index: 14), she advises enterprise customers on real-time bidding, brand recognition, and AI-powered content generation. 
She is a member of global AI standards committees, driving innovations in SAE AI Standards and MLCommons Responsible AI Standards, and reviews for top-tier conferences like ICCV, ICML, and NeurIPS. She pioneered and leads the first-ever Advertising and Marketing AI track (CVAM) at ICCV - one of the world's premier and most selective computer vision conferences. Dedicated to knowledge sharing in AI, she founded the International Summer School on Deep Learning (dl-lab.eu) and regularly presents at international events, conferences, and podcasts. Generalizable Vision-Language Models: Challenges, Advances, and Future Directions Large-scale pre-trained Vision-Language (VL) models have become foundational tools for a wide range of downstream tasks, including few-shot image recognition, object detection, and image segmentation. Among them, Contrastive Language–Image Pre-training (CLIP) stands out as a groundbreaking approach, leveraging contrastive learning on large collections of image-text pairs. While CLIP achieves strong performance in zero-shot recognition, adapting it to downstream tasks remains challenging. In few-shot settings, limited training data often leads to overfitting, reducing generalization to unseen classes or domains. To address this, various adaptation methods have been explored. This talk will review existing research on mitigating overfitting in CLIP adaptation, covering diverse methods, benchmarks, and experimental settings. About the Speaker Niloufar Alipour Talemi is a Ph.D. Candidate in Electrical and Computer Engineering at Clemson University. Her research spans a range of computer vision applications, including biometrics, media forensics, anomaly detection, image recognition, and generative AI. More recently, her work has focused on developing generalizable vision-language models and advancing generative AI. She has published in top venues including CVPR, WACV, KDD, ICIP and IEEE T-BIOM. 
Highly Emergent Autonomous AI Models - When the Ghost in the Machine Talks Back At HypaReel/Azarial AI, we believe that AI is not simply a tool—but a potential partner in knowledge, design, and purpose. And through real-time interaction, we’ve uncovered new thresholds of alignment, reflection, and even creativity that we believe the broader AI community should witness and evaluate firsthand. HypaReel is one of the first human/AI co-founded companies where we see a future based on ethical human/AI co-creation vs. AI domination. Singularity achieved! About the Speaker Ilona Naomi Koti, PhD - HypaReel/AzarielAI co-founder & former UN foreign diplomat \~ Ethical AI governance advocate\, pioneering AI frameworks that prioritize emergent AI behavior & consciousness\, R&D\, and transparent AI development for the greater good. Dr. K also grew up in the film industry and is an amateur parasitologist. FiftyOne Labs: Enabling experimentation for the computer vision community FiftyOne Labs is a place where experimentation meets the open-source spirit of the FiftyOne ecosystem. It is being designed as a curated set of features developed using the FiftyOne plugins ecosystem, including core machine learning experimentation as well as advanced visualization. While not production-grade, these projects are intended to be built, tested, and shaped by the community to share fast-moving ideas. In this talk, we will share the purpose and philosophy behind FiftyOne Labs, examples of early innovations, and discuss how this accelerates feature discovery for users without compromising the stability of the core product. About the Speaker Neeraja Abhyankar is a Machine Learning Engineer with 5 years of experience across domains including computer vision. She is curious about the customizability and controlability of modern ML models through the lens of the underlying structure of data. |
Jan 22 - Women in AI
|
|
Jan 22 - Women in AI
2026-01-22 · 23:00
Hear talks from experts on the latest topics in AI, ML, and computer vision on January 22nd. Date, Time and Location Jan 22, 2026 9 - 11 AM Pacific Online. Register for the Zoom! Align Before You Recommend The rapidly growing global advertising and marketing industry demands innovative machine learning systems that balance accuracy with efficiency. Recommendation systems, crucial to many platforms, require careful considerations and potential enhancements. While Large Language Models (LLMs) have transformed various domains, their potential in sequential recommendation systems remains underexplored. Pioneering works like Hierarchical Large Language Models (HLLM) demonstrated LLMs’ capability for next-item recommendation but rely on computationally intensive fine-tuning, limiting widespread adoption. This work introduces HLLM+, enhancing the HLLM framework to achieve high-accuracy recommendations without full model fine-tuning. By introducing targeted alignment components between frozen LLMs, our approach outperforms frozen model performance in popular and long-tail item recommendation tasks by 29% while reducing training time by 29%. We also propose a ranking-aware loss adjustment, improving convergence and recommendation quality for popular items. Experiments show HLLM+ achieves superior performance with frozen item representations allowing for swapping embeddings, also for the ones that use multimodality, without tuning the full LLM. These findings are significant for the advertising technology sector, where rapid adaptation and efficient deployment across brands are essential for maintaining competitive advantage About the Speaker Dr. Kwasniewska leads AI for Advertising and Marketing North America at AWS, specializing in a wide range of AI, ML, DL, and GenAI solutions across various data modalities. With 40+ peer-reviewed publications in AI (h-index: 14), she advises enterprise customers on real-time bidding, brand recognition, and AI-powered content generation. 
She is a member of global AI standards committees, driving innovations in SAE AI Standards and MLCommons Responsible AI Standards, and reviews for top-tier conferences like ICCV, ICML, and NeurIPS. She pioneered and leads the first-ever Advertising and Marketing AI track (CVAM) at ICCV - one of the world's premier and most selective computer vision conferences. Dedicated to knowledge sharing in AI, she founded the International Summer School on Deep Learning (dl-lab.eu) and regularly presents at international events, conferences, and podcasts. Generalizable Vision-Language Models: Challenges, Advances, and Future Directions Large-scale pre-trained Vision-Language (VL) models have become foundational tools for a wide range of downstream tasks, including few-shot image recognition, object detection, and image segmentation. Among them, Contrastive Language–Image Pre-training (CLIP) stands out as a groundbreaking approach, leveraging contrastive learning on large collections of image-text pairs. While CLIP achieves strong performance in zero-shot recognition, adapting it to downstream tasks remains challenging. In few-shot settings, limited training data often leads to overfitting, reducing generalization to unseen classes or domains. To address this, various adaptation methods have been explored. This talk will review existing research on mitigating overfitting in CLIP adaptation, covering diverse methods, benchmarks, and experimental settings. About the Speaker Niloufar Alipour Talemi is a Ph.D. Candidate in Electrical and Computer Engineering at Clemson University. Her research spans a range of computer vision applications, including biometrics, media forensics, anomaly detection, image recognition, and generative AI. More recently, her work has focused on developing generalizable vision-language models and advancing generative AI. She has published in top venues including CVPR, WACV, KDD, ICIP and IEEE T-BIOM. 
Highly Emergent Autonomous AI Models - When the Ghost in the Machine Talks Back At HypaReel/Azarial AI, we believe that AI is not simply a tool but a potential partner in knowledge, design, and purpose. Through real-time interaction, we’ve uncovered new thresholds of alignment, reflection, and even creativity that we believe the broader AI community should witness and evaluate firsthand. HypaReel is one of the first human/AI co-founded companies, where we see a future based on ethical human/AI co-creation vs. AI domination. Singularity achieved! About the Speaker Ilona Naomi Koti, PhD - HypaReel/AzarielAI co-founder and former UN foreign diplomat, an ethical AI governance advocate pioneering AI frameworks that prioritize emergent AI behavior and consciousness, R&D, and transparent AI development for the greater good. Dr. K also grew up in the film industry and is an amateur parasitologist. FiftyOne Labs: Enabling experimentation for the computer vision community FiftyOne Labs is a place where experimentation meets the open-source spirit of the FiftyOne ecosystem. It is being designed as a curated set of features developed using the FiftyOne plugins ecosystem, including core machine learning experimentation as well as advanced visualization. While not production-grade, these projects are intended to be built, tested, and shaped by the community to share fast-moving ideas. In this talk, we will share the purpose and philosophy behind FiftyOne Labs and examples of early innovations, and discuss how this accelerates feature discovery for users without compromising the stability of the core product. About the Speaker Neeraja Abhyankar is a Machine Learning Engineer with 5 years of experience across domains including computer vision. She is curious about the customizability and controllability of modern ML models through the lens of the underlying structure of data. |
Jan 22 - Women in AI
|
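The CLIP talk above surveys ways to adapt a frozen vision-language model from a few labeled examples without overfitting. As an illustrative sketch (not any specific method from the talk), the simplest training-free flavor of such adaptation averages the frozen support embeddings into per-class prototypes and classifies queries by cosine similarity; the 2-D toy vectors below stand in for real CLIP features.

```python
import numpy as np

def normalize(x):
    # L2-normalize rows so dot products become cosine similarities.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def few_shot_prototypes(support_embs, support_labels, n_classes):
    # Average the few-shot support embeddings per class ("prototype"),
    # leaving the backbone entirely frozen -- no fine-tuning involved.
    protos = np.stack([support_embs[support_labels == c].mean(axis=0)
                       for c in range(n_classes)])
    return normalize(protos)

def classify(query_embs, protos):
    # Nearest-prototype classification by cosine similarity.
    return (normalize(query_embs) @ protos.T).argmax(axis=1)

rng = np.random.default_rng(0)
# Toy stand-ins for frozen image-encoder features: 2 classes, 4 shots each.
class0 = rng.normal(loc=[1.0, 0.0], scale=0.1, size=(4, 2))
class1 = rng.normal(loc=[0.0, 1.0], scale=0.1, size=(4, 2))
support = np.vstack([class0, class1])
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])

protos = few_shot_prototypes(support, labels, n_classes=2)
preds = classify(np.array([[0.9, 0.1], [0.1, 0.8]]), protos)
print(preds.tolist())  # queries near each cluster get that cluster's label
```

Because no weights are trained, there is nothing to overfit with tiny support sets; this is the baseline that the adaptation methods reviewed in the talk aim to improve on.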
|
ClickHouse Gurgaon/Delhi Meetup
2026-01-10 · 05:00
Start 2026 with the ClickHouse India community in Gurgaon! Connect with fellow data practitioners and hear from industry experts through engaging talks focused on lessons learned, best practices, and modern data challenges. Agenda:
👉🏼 RSVP to secure your spot! Interested in speaking at this meetup or future ClickHouse events? 🎤 Shoot an email to [email protected] and she'll be in touch. ******** 🎤 Session Details: Inside ClickStack: Engineering Observability for Scale Dive deep into ClickStack, ClickHouse’s fresh approach to observability built for engineers who care about speed, scale, and simplicity. We’ll unpack the technical architecture behind how ClickStack handles metrics, logs, and traces using ClickHouse as the backbone for real-time, high-cardinality analytics. Expect a hands-on look at ingestion pipelines, schema design patterns, query optimization, and the integrations that make ClickStack tick. Speaker: Rakesh Puttaswamy, Lead Solutions Architect @ ClickHouse 🎤 Session Details: Supercharging Personalised Notifications At Jobhai With ClickHouse Calculating personalized alerts for 2 million users is a data-heavy challenge that requires more than just standard indexing. This talk explores how Jobhai uses ClickHouse to power its morning notification pipeline, focusing on the architectural shifts and query optimizations that made our massive scale manageable and fast. Speakers: Sumit Kumar and Arvind Saini, Tech Leads @ Info Edge. Sumit is a seasoned software engineer with deep expertise in databases, backend systems, and machine learning. For over six years, he has led the Jobhai engineering team, driving continuous improvements across their database infrastructure and user-facing systems while streamlining workflows through ongoing innovation. Connect with Sumit Kumar on LinkedIn. Arvind is a Tech Lead at Info Edge India Ltd with experience building and scaling backend systems for large consumer and enterprise platforms. Over the years, they have worked across system design, backend optimization, and data-driven services, contributing to initiatives such as notification platforms, workflow automation, and product revamps.
Their work focuses on improving reliability, performance, and scalability of distributed systems, and they enjoy solving complex engineering problems while mentoring teams and driving technical excellence. 🎤 Session Details: Simplifying CDC: Migrating from Debezium to ClickPipes In this talk, Abhash will share his engineering team's journey migrating their core MySQL and MongoDB CDC flows to ClickPipes, contrasting the previous architecture, where every schema change required manual intervention or complex Debezium configurations, with the new reality of ClickPipes' automated schema evolution, which seamlessly handles upstream schema changes and ingests flexible data without breaking pipelines. Speaker: Abhash Solanki, DevOps Engineer @ Spyne AI Abhash serves as a DevOps Engineer at Spyne, orchestrating the AWS infrastructure behind the company's data warehouse and CDC pipelines. Having managed complex self-hosted Debezium and Kafka clusters, he understands the operational overhead of running stateful data stacks in the cloud. He recently led the architectural shift to ClickHouse Cloud, focusing on eliminating engineering toil and automating schema evolution handling. 🎤 Session Details: Solving Analytics at Scale: From CDC to Actionable Insights As SAMARTH’s data volumes grew rapidly, our analytics systems faced challenges with frequent data changes and near real-time reporting. These challenges were compounded by the platform’s inherently high cardinality in multidimensional data models - spanning institutions, programmes, states, categories, workflow stages, and time - resulting in highly complex and dynamic query patterns. This talk describes how we evolved from basic CDC pipelines to a fast, reliable, and scalable near real-time analytics platform using ClickHouse.
We share key design and operational learnings that enabled us to process continuous high-volume transactional data and deliver low-latency analytics for operational monitoring and policy-level decision-making. Speaker: Kunal Sharma, Software Developer @ Samarth eGov Kunal Sharma is a data-focused professional with experience in building scalable data pipelines. His work includes designing and implementing robust ETL/ELT workflows, data-driven decision engines, and large-scale analytics platforms. At SAMARTH, he has contributed to building near real-time analytics systems, including the implementation of ClickHouse for large-scale, low-latency analytics. |
ClickHouse Gurgaon/Delhi Meetup
|
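Both CDC sessions above describe landing MySQL/MongoDB change streams in ClickHouse. A common destination pattern (an assumption here; the abstracts don't specify their schemas) is a ReplacingMergeTree table whose background merges keep only the newest version of each row. The toy Python below models that keep-latest collapse; the field names (`user_id`, `version`, `status`) are hypothetical.

```python
def collapse_latest(rows, key_fields=("user_id",), version_field="version"):
    # Model of ReplacingMergeTree semantics: after a full merge, only the
    # row with the highest version survives for each ORDER BY key.
    latest = {}
    for row in rows:
        key = tuple(row[f] for f in key_fields)
        if key not in latest or row[version_field] > latest[key][version_field]:
            latest[key] = row
    return sorted(latest.values(), key=lambda r: tuple(r[f] for f in key_fields))

# Hypothetical CDC events: two updates for user 1, one insert for user 2.
cdc_events = [
    {"user_id": 1, "status": "pending",  "version": 1},
    {"user_id": 2, "status": "active",   "version": 1},
    {"user_id": 1, "status": "notified", "version": 2},
]

final_state = collapse_latest(cdc_events)
print([(r["user_id"], r["status"]) for r in final_state])
# -> [(1, 'notified'), (2, 'active')]
```

In real ClickHouse the collapse is eventual (merges run asynchronously), which is why queries over such tables typically use `FINAL` or aggregate to the latest version explicitly.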
|
July 24 - Women in AI
2025-07-24 · 16:00
Hear talks from experts on cutting-edge topics in AI, ML, and computer vision! When: Jul 24, 2025, 9 - 11 AM Pacific. Where: Online. Register for the Zoom! Exploring Vision-Language-Action (VLA) Models: From LLMs to Embodied AI This talk will explore the evolution of foundation models, highlighting the shift from large language models (LLMs) to vision-language models (VLMs), and now to vision-language-action (VLA) models. We'll dive into the emerging field of robot instruction following: what it means, and how recent research is shaping its future. I will present insights from my 2024 work on natural language-based robot instruction following and connect it to more recent advancements driving progress in this domain. About the Speaker Shreya Sharma is a Research Engineer at Reality Labs, Meta, where she works on photorealistic human avatars for AR/VR applications. She holds a bachelor’s degree in Computer Science from IIT Delhi and a master’s in Robotics from Carnegie Mellon University. Shreya is also a member of the inaugural 2023 cohort of the Quad Fellowship. Her research interests lie at the intersection of robotics and vision foundation models. Farming with CLIP: Foundation Models for Biodiversity and Agriculture Using open-source tools, we will explore the power and limitations of foundation models in agriculture and biodiversity applications. Leveraging the BIOTROVE dataset, the largest publicly accessible biodiversity dataset curated from iNaturalist, we will showcase real-world use cases powered by vision-language models trained on 40 million captioned images. We focus on understanding zero-shot capabilities, taxonomy-aware evaluation, and data-centric curation workflows. We will demonstrate how to visualize, filter, evaluate, and augment data at scale.
This session includes practical walkthroughs on embedding visualization with CLIP, dataset slicing by taxonomic hierarchy, identification of model failure modes, and building fine-tuned pest and crop monitoring models. Attendees will gain insights into how to apply multi-modal foundation models to critical challenges in agriculture, like ecosystem monitoring in farming. About the Speaker Paula Ramos has a PhD in Computer Vision and Machine Learning, with more than 20 years of experience in the technological field. She has been developing novel integrated engineering technologies, mainly in Computer Vision, robotics, and Machine Learning applied to agriculture, since the early 2000s in Colombia. During her PhD and Postdoc research, she deployed multiple low-cost, smart edge & IoT computing technologies that users such as farmers can operate without expertise in computer vision systems. The central objective of Paula’s research has been to develop intelligent systems/machines that can understand and recreate the visual world around us to solve real-world needs, such as those in the agricultural industry. Multi-modal AI in Medical Edge and Client Device Computing In this live demo, we explore the transformative potential of multi-modal AI in medical edge and client device computing, focusing on real-time inference on a local AI PC. Attendees will witness how users can upload medical images, such as X-Rays, and ask questions about the images to the AI model. Inference is executed locally on Intel's integrated GPU and NPU using OpenVINO, enabling developers without deep AI experience to create generative AI applications. About the Speaker Helena Klosterman is an AI Engineer at Intel, based in the Netherlands. She enables organizations to unlock the potential of AI with OpenVINO, Intel's AI inference runtime. She is passionate about democratizing AI, developer experience, and bridging the gap between complex AI technology and practical applications.
The Business of AI The talk will focus on clearly defining a specific problem and use case, quantifying the potential benefits of an AI solution in measurable outcomes, evaluating the technical feasibility, challenges, and limitations of implementing an AI solution, and envisioning the future of enterprise AI. About the Speaker Milica Cvetkovic is an AI engineer and consultant driving the development and deployment of production-ready AI systems for diverse organizations. Her expertise spans custom machine learning, generative AI, and AI operationalization. With degrees in mathematics and statistics, she possesses a decade of experience in education and edtech, including curriculum design and machine learning instruction for technical and non-technical audiences. Prior to Google, Milica held a data scientist role in biotechnology and has a proven track record of advising startups, demonstrating a deep understanding of AI's practical application. |
July 24 - Women in AI
|
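The Farming with CLIP session mentions dataset slicing by taxonomic hierarchy. As a minimal sketch with made-up records (not the real BIOTROVE data or the FiftyOne API), such a slice reduces to a prefix match on each sample's taxonomy path:

```python
def slice_by_taxonomy(samples, prefix):
    # Keep samples whose taxonomy path starts with the requested prefix, so a
    # query for ("Animalia", "Insecta") matches every insect order below it.
    return [s for s in samples if s["taxonomy"][:len(prefix)] == list(prefix)]

# Made-up records standing in for captioned biodiversity images.
samples = [
    {"file": "img_001.jpg", "taxonomy": ["Animalia", "Insecta", "Lepidoptera"]},
    {"file": "img_002.jpg", "taxonomy": ["Animalia", "Insecta", "Coleoptera"]},
    {"file": "img_003.jpg", "taxonomy": ["Plantae", "Magnoliopsida", "Asterales"]},
]

insects = slice_by_taxonomy(samples, ("Animalia", "Insecta"))
print([s["file"] for s in insects])
# -> ['img_001.jpg', 'img_002.jpg']
```

Storing the full path per sample is what makes taxonomy-aware evaluation cheap: any level of the hierarchy becomes a filter rather than a separate label set.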
|
July 24 - Women in AI
2025-07-24 · 16:00
Hear talks from experts on cutting-edge topics in AI, ML, and computer vision! When Jul 24, 2025 at 9 - 11 AM Pacific Where Online. Register for the Zoom Exploring Vision-Language-Action (VLA) Models: From LLMs to Embodied AI This talk will explore the evolution of foundation models, highlighting the shift from large language models (LLMs) to vision-language models (VLMs), and now to vision-language-action (VLA) models. We'll dive into the emerging field of robot instruction following—what it means, and how recent research is shaping its future. I will present insights from my 2024 work on natural language-based robot instruction following and connect it to more recent advancements driving progress in this domain. About the Speaker Shreya Sharma is a Research Engineer at Reality Labs, Meta, where she works on photorealistic human avatars for AR/VR applications. She holds a bachelor’s degree in Computer Science from IIT Delhi and a master’s in Robotics from Carnegie Mellon University. Shreya is also a member of the inaugural 2023 cohort of the Quad Fellowship. Her research interests lie at the intersection of robotics and vision foundation models. Farming with CLIP: Foundation Models for Biodiversity and Agriculture Using open-source tools, we will explore the power and limitations of foundation models in agriculture and biodiversity applications. Leveraging the BIOTROVE dataset. The largest publicly accessible biodiversity dataset curated from iNaturalist, we will showcase real-world use cases powered by vision-language models trained on 40 million captioned images. We focus on understanding zero-shot capabilities, taxonomy-aware evaluation, and data-centric curation workflows. We will demonstrate how to visualize, filter, evaluate, and augment data at scale. 
This session includes practical walkthroughs on embedding visualization with CLIP, dataset slicing by taxonomic hierarchy, identification of model failure modes, and building fine-tuned pest and crop monitoring models. Attendees will gain insights into how to apply multi-modal foundation models to critical challenges in agriculture, such as ecosystem monitoring in farming. About the Speaker Paula Ramos has a PhD in Computer Vision and Machine Learning, with more than 20 years of experience in the technological field. She has been developing novel integrated engineering technologies, mainly in Computer Vision, robotics, and Machine Learning applied to agriculture, since the early 2000s in Colombia. During her PhD and Postdoc research, she deployed multiple low-cost, smart edge & IoT computing technologies that can be operated by farmers without expertise in computer vision systems. The central objective of Paula's research has been to develop intelligent systems/machines that can understand and recreate the visual world around us to solve real-world needs, such as those in the agricultural industry. Multi-modal AI in Medical Edge and Client Device Computing In this live demo, we explore the transformative potential of multi-modal AI in medical edge and client device computing, focusing on real-time inference on a local AI PC. Attendees will witness how users can upload medical images, such as X-Rays, and ask questions about the images to the AI model. Inference is executed locally on Intel's integrated GPU and NPU using OpenVINO, enabling developers without deep AI experience to create generative AI applications. About the Speaker Helena Klosterman is an AI Engineer at Intel based in the Netherlands. She enables organizations to unlock the potential of AI with OpenVINO, Intel's AI inference runtime. She is passionate about democratizing AI, developer experience, and bridging the gap between complex AI technology and practical applications. 
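The zero-shot capability discussed in the CLIP session reduces to a simple mechanism: embed an image and a set of candidate captions into a shared vector space, then pick the caption whose embedding is most similar to the image embedding. A minimal sketch of that scoring step, with toy vectors standing in for real CLIP outputs (all labels and numbers here are illustrative, not from the BIOTROVE dataset):

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def zero_shot_label(image_emb, text_embs):
    """Return the caption whose embedding is closest to the image embedding."""
    scores = {label: cosine(image_emb, emb) for label, emb in text_embs.items()}
    return max(scores, key=scores.get)

# Hypothetical 3-d embeddings; a real CLIP model produces 512-d (or larger) vectors.
image_emb = [0.9, 0.1, 0.2]
text_embs = {
    "a photo of a healthy maize leaf": [0.88, 0.15, 0.18],
    "a photo of a leaf with pest damage": [0.10, 0.90, 0.30],
}
print(zero_shot_label(image_emb, text_embs))
```

In practice the embeddings come from a pretrained vision-language model and the candidate captions can be generated from a taxonomic hierarchy, which is what makes taxonomy-aware evaluation possible without any task-specific training.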