Activities & events
| Title & Speakers | Event |
|---|---|
|
Stripe @ Night: Lessons from 9 exits and $144M raised
2025-12-04 · 19:00
Join us for a special edition of the Stripe Community Meetup, with extended networking afterwards in the beautiful Waterloo WeWork! Go behind the build in a fireside chat with Bernardo and Estefanía, the team behind Blackstone Studio and First Time Founders. With 50+ MVPs shipped, 9 exits, and over $144M raised across the companies they've helped build, they've spent the last seven years working side by side with founders bringing ideas to life. In this session, we'll break down the systems, mindsets, and behind-the-scenes decisions that separate projects that ship from the ones that stall. Expect real stories, practical insights, and the kind of conversation you only get from people who've been in the arena.

🎫 Tickets: RSVPs are capped, and a wait list will operate on Meetup.

📍 Location: The event is hosted in the auditorium at the Waterloo WeWork. If you've never been here, you're in for a treat!

🕚 Rough timings
Partners:

First Time Founders is a free community for first-time founders led by operators behind 50+ MVPs, 9 exits, and $150M+ raised. Learn to launch faster with expert guidance, weekly calls, proven frameworks, and real founder perks. Join here: https://bit.ly/4nd2AjO

Stripe is the financial infrastructure powering millions of high-growth startups. From payments to billing to global expansion, Stripe helps founders scale revenue with simple, developer-friendly tools. Explore Stripe: https://stripe.com

WeWork is a global workspace platform offering flexible offices, meeting rooms, and community-driven environments built for founders and fast-growing teams. From London to New York, WeWork provides the space and support creators need to build, collaborate, and scale. Explore WeWork: https://www.wework.com

Blackstone Studio is a nearshore product and engineering studio that helps US startups ship high-quality software fast. With 50+ MVPs shipped and teams behind 9 exits, Blackstone partners with founders to build and scale production-ready products. Learn more: https://blackstone.studio

SQLPipe is a data engineering consultancy and Stripe Community Partner: https://sqlpipe.com

Don't miss this opportunity to connect with the Stripe community. |
Stripe @ Night: Lessons from 9 exits and $144M raised
|
|
The Ethical Product: Balancing Innovation, Quality and Integrity
2025-12-02 · 17:00
Glen Holmes
– VP of Product
@ Qase
Glen Holmes explores the ethics behind the products we design, test, and release, drawing inspiration from the Jurassic Park dilemma. He discusses how innovation can collide with morality, data misuse, and algorithmic bias, and introduces a practical six-step framework for ethical decision-making in day-to-day QA work. The session includes an engaging, interactive discussion tailored for the QA community, highlighting the moral responsibilities of testers and product managers. |
|
|
Testing Mobile Analytics for Confident Product Decisions
2025-12-02 · 17:00
Evgenii Sukhanov
– Staff iOS Engineer
@ Bolt
Evgenii Sukhanov shares practical insights on testing product analytics in mobile apps, covering common pitfalls, effective testing strategies and best practices for documentation. The session explains what product analytics is, why it matters, and how missing or incorrect event data can lead to misguided product decisions. Combining developer and QA perspectives, it demonstrates how unit tests, UI tests, and manual validation work together to ensure reliable and actionable data, with concrete tips to improve analytics quality and cross-team collaboration. |
|
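The combined developer/QA approach the analytics talk describes can be sketched as a unit test against an in-memory fake analytics sink. All names here (`FakeAnalyticsSink`, `purchase_completed`) are hypothetical illustrations, not Bolt's actual API:

```python
# Hypothetical sketch: unit-testing product analytics events.
# Instead of sending events to a real backend, record them in memory
# and assert on names and payloads, so broken instrumentation fails CI.

class FakeAnalyticsSink:
    def __init__(self):
        self.events = []

    def send(self, name, params):
        self.events.append((name, dict(params)))

def complete_purchase(sink, order_id, amount):
    # ...business logic would run here...
    sink.send("purchase_completed", {"order_id": order_id, "amount": amount})

def test_purchase_event_is_tracked():
    sink = FakeAnalyticsSink()
    complete_purchase(sink, order_id="A-1", amount=9.99)
    # The test pins both the event name and the exact payload shape.
    assert sink.events == [("purchase_completed",
                            {"order_id": "A-1", "amount": 9.99})]

test_purchase_event_is_tracked()
```

The same pattern extends to UI tests: drive the app against the fake sink, then diff the recorded events against a documented event schema.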
|
Quality at the End Is Already Lost
2025-12-02 · 17:00
Anna Prinz
– Quality Coordinator
@ JetBrains
Anna Prinz explains why quality should not be treated as something to verify at the end. She shows how teams design for failure via overloaded delivery pipelines, unclear Definitions of Ready/Done, and quality gates that protect the brand instead of improving the system. The talk connects Conway's Law, Toyota flow principles, the Theory of Constraints, and queuing theory to explain why quality comes from how the system runs, not from inspection. You'll learn to: use shift-left testing and Definitions of Ready/Done to create quality before code is written; apply WIP limits, flow control, and queue management to prevent technical debt; and read overload signals in your pipeline to avoid the kind of "shipping faster" that actually slows delivery. The focus is on building a delivery system that produces quality on purpose, not rescuing it at the end. |
|
|
Aug 28 - AI, ML and Computer Vision Meetup
2025-08-28 · 17:00
Date and Time: Aug 28, 2025 at 10 AM Pacific. Location: Virtual (register for the Zoom link).

Exploiting Vulnerabilities in CV Models Through Adversarial Attacks
As AI and computer vision models are leveraged more broadly in society, we should be better prepared for adversarial attacks by bad actors. In this talk, we'll cover some of the common methods for performing adversarial attacks on CV models. Adversarial attacks are deliberate attempts to deceive neural networks into generating incorrect predictions by making subtle alterations to the input data. About the Speaker: Elisa Chen is a data scientist at Meta on the Ads AI Infra team with 5+ years of experience in the industry.

EffiDec3D: An Optimized Decoder for High-Performance and Efficient 3D Medical Image Segmentation
Recent 3D deep networks such as SwinUNETR, SwinUNETRv2, and 3D UX-Net have shown promising performance by leveraging self-attention and large-kernel convolutions to capture volumetric context. However, their substantial computational requirements limit their use in real-time and resource-constrained environments. In this paper, we propose EffiDec3D, an optimized 3D decoder that employs a channel reduction strategy across all decoder stages and removes the high-resolution layers when their contribution to segmentation quality is minimal. Our optimized EffiDec3D decoder achieves a 96.4% reduction in #Params and a 93.0% reduction in #FLOPs compared to the decoder of the original 3D UX-Net. Our extensive experiments on 12 different medical imaging tasks confirm that EffiDec3D not only significantly reduces computational demands, but also maintains a performance level comparable to the original models, thus establishing a new standard for efficient 3D medical image segmentation. About the Speaker: Md Mostafijur Rahman is a final-year Ph.D. candidate in Electrical and Computer Engineering at The University of Texas at Austin, advised by Dr. Radu Marculescu, where he builds efficient AI methods for biomedical imaging tasks such as segmentation, synthesis, and diagnosis. By uniting efficient architectures with data-efficient training, his work delivers robust, clinically deployable imaging solutions.

What Makes a Good AV Dataset? Lessons from the Front Lines of Sensor Calibration and Projection
Getting autonomous vehicle data ready for real use, whether for training, simulation, or evaluation, isn't just about collecting lidar and camera frames. It's about making sure every point lands where it should, in the right frame, at the right time. In this talk, we'll break down what it actually takes to go from raw logs to a clean, usable AV dataset. We'll walk through the practical process of validating transformations, aligning coordinate systems, checking intrinsics and extrinsics, and making sure your projected points actually show up on camera images. Along the way, we'll share a checklist of common failure points and hard-won debugging tips. Finally, we'll show how doing this right unlocks downstream tools like Omniverse Nurec and Cosmos, enabling powerful workflows like digital reconstruction, simulation, and large-scale synthetic data generation. About the Speaker: Daniel Gural is a seasoned Machine Learning Engineer at Voxel51 with a strong passion for empowering Data Scientists and ML Engineers to unlock the full potential of their data.

Clustering in Computer Vision: From Theory to Applications
In today's AI landscape, clustering techniques are crucial. Clustering methods help organize unstructured data into meaningful groups, aiding knowledge discovery, feature analysis, and retrieval-augmented generation. From k-means to DBSCAN and hierarchical approaches like FINCH, selecting the right method is key: balancing scalability, managing noise sensitivity, and fitting computational demands. This presentation provides an in-depth exploration of the current state of the art in clustering techniques, with a strong focus on their applications within computer vision. About the Speaker: Constantin Seibold leads a research group developing machine learning methods in the diagnostic and interventional radiology department at University Hospital Heidelberg. His research aims to improve the daily lives of both doctors and patients. |
Aug 28 - AI, ML and Computer Vision Meetup
|
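One classic method of the kind the adversarial-attacks talk describes is the fast gradient sign method (FGSM): perturb the input by a small epsilon in the direction of the sign of the loss gradient. A minimal sketch on a toy logistic classifier (the weights, input, and epsilon are invented for illustration; real attacks target deep networks via autodiff):

```python
import math

# FGSM on a toy logistic-regression "model": nudge the input x by
# epsilon * sign(dLoss/dx) to push the prediction across the boundary.
w, b = [2.0, -3.0], 0.5          # fixed toy weights (illustrative only)

def predict(x):
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if z > 0 else 0

x, y_true = [1.0, 1.0], 0        # clean input, correctly classified as 0

# For logistic loss with true label 0, dLoss/dx = sigmoid(z) * w,
# so the gradient's sign matches the sign of w (sigmoid is positive).
z = sum(wi * xi for wi, xi in zip(w, x)) + b
sig = 1.0 / (1.0 + math.exp(-z))
grad = [sig * wi for wi in w]

eps = 0.2
x_adv = [xi + eps * (1 if g > 0 else -1) for xi, g in zip(x, grad)]

print(predict(x), predict(x_adv))  # prints: 0 1 -- the small nudge flips the class
```

The perturbation is bounded per coordinate by epsilon, which is why such attacks can be imperceptible on images while still changing the model's output.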
|
Aug 28 - AI, ML and Computer Vision Meetup
2025-08-28 · 17:00
Date and Time Aug 28, 2025 at 10 AM Pacific Location Virtual - Register for the Zoom Exploiting Vulnerabilities In CV Models Through Adversarial Attacks As AI and computer vision models are leveraged more broadly in society, we should be better prepared for adversarial attacks by bad actors. In this talk, we'll cover some of the common methods for performing adversarial attacks on CV models. Adversarial attacks are deliberate attempts to deceive neural networks into generating incorrect predictions by making subtle alterations to the input data. About the Speaker Elisa Chen is a data scientist at Meta on the Ads AI Infra team with 5+ years of experience in the industry. EffiDec3D: An Optimized Decoder for High-Performance and Efficient 3D Medical Image Segmentation Recent 3D deep networks such as SwinUNETR, SwinUNETRv2, and 3D UX-Net have shown promising performance by leveraging self-attention and large-kernel convolutions to capture the volumetric context. However, their substantial computational requirements limit their use in real-time and resource-constrained environments. In this paper, we propose EffiDec3D, an optimized 3D decoder that employs a channel reduction strategy across all decoder stages and removes the high-resolution layers when their contribution to segmentation quality is minimal. Our optimized EffiDec3D decoder achieves a 96.4% reduction in #Params and a 93.0% reduction in #FLOPs compared to the decoder of original 3D UX-Net. Our extensive experiments on 12 different medical imaging tasks confirm that EffiDec3D not only significantly reduces the computational demands, but also maintains a performance level comparable to original models, thus establishing a new standard for efficient 3D medical image segmentation. About the Speaker Md Mostafijur Rahman is a final-year Ph.D. candidate in Electrical and Computer Engineering at The University of Texas at Austin, advised by Dr. 
Radu Marculescu, where he builds efficient AI methods for biomedical imaging tasks such as segmentation, synthesis, and diagnosis. By uniting efficient architectures with data-efficient training, his work delivers robust and efficient clinically deployable imaging solutions. What Makes a Good AV Dataset? Lessons from the Front Lines of Sensor Calibration and Projection Getting autonomous vehicle data ready for real use, whether for training, simulation, or evaluation, isn’t just about collecting LIDAR and camera frames. It’s about making sure every point lands where it should, in the right frame, at the right time. In this talk, we’ll break down what it actually takes to go from raw logs to a clean, usable AV dataset. We’ll walk through the practical process of validating transformations, aligning coordinate systems, checking intrinsics and extrinsics, and making sure your projected points actually show up on camera images. Along the way, we’ll share a checklist of common failure points and hard-won debugging tips. Finally, we’ll show how doing this right unlocks downstream tools like Omniverse Nurec and Cosmos—enabling powerful workflows like digital reconstruction, simulation, and large-scale synthetic data generation About the Speaker Daniel Gural is a seasoned Machine Learning Engineer at Voxel51 with a strong passion for empowering Data Scientists and ML Engineers to unlock the full potential of their data. Clustering in Computer Vision: From Theory to Applications In today’s AI landscape, these techniques are crucial. Clustering methods help organize unstructured data into meaningful groups, aiding knowledge discovery, feature analysis, and retrieval-augmented generation. From k-means to DBSCAN and hierarchical approaches like FINCH, selecting the right method is key: including balancing scalability, managing noise sensitivity, and fitting computational demands. 
This presentation provides an in-depth exploration of the current state-of-the-art of clustering techniques with a strong focus on their applications within computer vision. About the Speaker Constantin Seibold leads research group on the development of machine learning methods in the diagnostic and interventional radiology department at the university hospital Heidelberg. His research aims to improve the daily life of both doctors and patients. |
Aug 28 - AI, ML and Computer Vision Meetup
|
|
Aug 28 - AI, ML and Computer Vision Meetup
2025-08-28 · 17:00
Date and Time Aug 28, 2025 at 10 AM Pacific Location Virtual - Register for the Zoom Exploiting Vulnerabilities In CV Models Through Adversarial Attacks As AI and computer vision models are leveraged more broadly in society, we should be better prepared for adversarial attacks by bad actors. In this talk, we'll cover some of the common methods for performing adversarial attacks on CV models. Adversarial attacks are deliberate attempts to deceive neural networks into generating incorrect predictions by making subtle alterations to the input data. About the Speaker Elisa Chen is a data scientist at Meta on the Ads AI Infra team with 5+ years of experience in the industry. EffiDec3D: An Optimized Decoder for High-Performance and Efficient 3D Medical Image Segmentation Recent 3D deep networks such as SwinUNETR, SwinUNETRv2, and 3D UX-Net have shown promising performance by leveraging self-attention and large-kernel convolutions to capture the volumetric context. However, their substantial computational requirements limit their use in real-time and resource-constrained environments. In this paper, we propose EffiDec3D, an optimized 3D decoder that employs a channel reduction strategy across all decoder stages and removes the high-resolution layers when their contribution to segmentation quality is minimal. Our optimized EffiDec3D decoder achieves a 96.4% reduction in #Params and a 93.0% reduction in #FLOPs compared to the decoder of original 3D UX-Net. Our extensive experiments on 12 different medical imaging tasks confirm that EffiDec3D not only significantly reduces the computational demands, but also maintains a performance level comparable to original models, thus establishing a new standard for efficient 3D medical image segmentation. About the Speaker Md Mostafijur Rahman is a final-year Ph.D. candidate in Electrical and Computer Engineering at The University of Texas at Austin, advised by Dr. 
Radu Marculescu, where he builds efficient AI methods for biomedical imaging tasks such as segmentation, synthesis, and diagnosis. By uniting efficient architectures with data-efficient training, his work delivers robust and efficient clinically deployable imaging solutions. What Makes a Good AV Dataset? Lessons from the Front Lines of Sensor Calibration and Projection Getting autonomous vehicle data ready for real use, whether for training, simulation, or evaluation, isn’t just about collecting LIDAR and camera frames. It’s about making sure every point lands where it should, in the right frame, at the right time. In this talk, we’ll break down what it actually takes to go from raw logs to a clean, usable AV dataset. We’ll walk through the practical process of validating transformations, aligning coordinate systems, checking intrinsics and extrinsics, and making sure your projected points actually show up on camera images. Along the way, we’ll share a checklist of common failure points and hard-won debugging tips. Finally, we’ll show how doing this right unlocks downstream tools like Omniverse Nurec and Cosmos—enabling powerful workflows like digital reconstruction, simulation, and large-scale synthetic data generation About the Speaker Daniel Gural is a seasoned Machine Learning Engineer at Voxel51 with a strong passion for empowering Data Scientists and ML Engineers to unlock the full potential of their data. Clustering in Computer Vision: From Theory to Applications In today’s AI landscape, these techniques are crucial. Clustering methods help organize unstructured data into meaningful groups, aiding knowledge discovery, feature analysis, and retrieval-augmented generation. From k-means to DBSCAN and hierarchical approaches like FINCH, selecting the right method is key: including balancing scalability, managing noise sensitivity, and fitting computational demands. 
This presentation provides an in-depth exploration of the current state-of-the-art of clustering techniques with a strong focus on their applications within computer vision. About the Speaker Constantin Seibold leads research group on the development of machine learning methods in the diagnostic and interventional radiology department at the university hospital Heidelberg. His research aims to improve the daily life of both doctors and patients. |
Aug 28 - AI, ML and Computer Vision Meetup
|
|
Aug 28 - AI, ML and Computer Vision Meetup
2025-08-28 · 17:00
Date and Time Aug 28, 2025 at 10 AM Pacific Location Virtual - Register for the Zoom Exploiting Vulnerabilities In CV Models Through Adversarial Attacks As AI and computer vision models are leveraged more broadly in society, we should be better prepared for adversarial attacks by bad actors. In this talk, we'll cover some of the common methods for performing adversarial attacks on CV models. Adversarial attacks are deliberate attempts to deceive neural networks into generating incorrect predictions by making subtle alterations to the input data. About the Speaker Elisa Chen is a data scientist at Meta on the Ads AI Infra team with 5+ years of experience in the industry. EffiDec3D: An Optimized Decoder for High-Performance and Efficient 3D Medical Image Segmentation Recent 3D deep networks such as SwinUNETR, SwinUNETRv2, and 3D UX-Net have shown promising performance by leveraging self-attention and large-kernel convolutions to capture the volumetric context. However, their substantial computational requirements limit their use in real-time and resource-constrained environments. In this paper, we propose EffiDec3D, an optimized 3D decoder that employs a channel reduction strategy across all decoder stages and removes the high-resolution layers when their contribution to segmentation quality is minimal. Our optimized EffiDec3D decoder achieves a 96.4% reduction in #Params and a 93.0% reduction in #FLOPs compared to the decoder of original 3D UX-Net. Our extensive experiments on 12 different medical imaging tasks confirm that EffiDec3D not only significantly reduces the computational demands, but also maintains a performance level comparable to original models, thus establishing a new standard for efficient 3D medical image segmentation. About the Speaker Md Mostafijur Rahman is a final-year Ph.D. candidate in Electrical and Computer Engineering at The University of Texas at Austin, advised by Dr. 
Radu Marculescu, where he builds efficient AI methods for biomedical imaging tasks such as segmentation, synthesis, and diagnosis. By uniting efficient architectures with data-efficient training, his work delivers robust and efficient clinically deployable imaging solutions. What Makes a Good AV Dataset? Lessons from the Front Lines of Sensor Calibration and Projection Getting autonomous vehicle data ready for real use, whether for training, simulation, or evaluation, isn’t just about collecting LIDAR and camera frames. It’s about making sure every point lands where it should, in the right frame, at the right time. In this talk, we’ll break down what it actually takes to go from raw logs to a clean, usable AV dataset. We’ll walk through the practical process of validating transformations, aligning coordinate systems, checking intrinsics and extrinsics, and making sure your projected points actually show up on camera images. Along the way, we’ll share a checklist of common failure points and hard-won debugging tips. Finally, we’ll show how doing this right unlocks downstream tools like Omniverse Nurec and Cosmos—enabling powerful workflows like digital reconstruction, simulation, and large-scale synthetic data generation About the Speaker Daniel Gural is a seasoned Machine Learning Engineer at Voxel51 with a strong passion for empowering Data Scientists and ML Engineers to unlock the full potential of their data. Clustering in Computer Vision: From Theory to Applications In today’s AI landscape, these techniques are crucial. Clustering methods help organize unstructured data into meaningful groups, aiding knowledge discovery, feature analysis, and retrieval-augmented generation. From k-means to DBSCAN and hierarchical approaches like FINCH, selecting the right method is key: including balancing scalability, managing noise sensitivity, and fitting computational demands. 
This presentation provides an in-depth exploration of the current state-of-the-art of clustering techniques with a strong focus on their applications within computer vision. About the Speaker Constantin Seibold leads research group on the development of machine learning methods in the diagnostic and interventional radiology department at the university hospital Heidelberg. His research aims to improve the daily life of both doctors and patients. |
Aug 28 - AI, ML and Computer Vision Meetup
|
|
Aug 28 - AI, ML and Computer Vision Meetup
2025-08-28 · 17:00
Date and Time Aug 28, 2025 at 10 AM Pacific Location Virtual - Register for the Zoom Exploiting Vulnerabilities In CV Models Through Adversarial Attacks As AI and computer vision models are leveraged more broadly in society, we should be better prepared for adversarial attacks by bad actors. In this talk, we'll cover some of the common methods for performing adversarial attacks on CV models. Adversarial attacks are deliberate attempts to deceive neural networks into generating incorrect predictions by making subtle alterations to the input data. About the Speaker Elisa Chen is a data scientist at Meta on the Ads AI Infra team with 5+ years of experience in the industry. EffiDec3D: An Optimized Decoder for High-Performance and Efficient 3D Medical Image Segmentation Recent 3D deep networks such as SwinUNETR, SwinUNETRv2, and 3D UX-Net have shown promising performance by leveraging self-attention and large-kernel convolutions to capture the volumetric context. However, their substantial computational requirements limit their use in real-time and resource-constrained environments. In this paper, we propose EffiDec3D, an optimized 3D decoder that employs a channel reduction strategy across all decoder stages and removes the high-resolution layers when their contribution to segmentation quality is minimal. Our optimized EffiDec3D decoder achieves a 96.4% reduction in #Params and a 93.0% reduction in #FLOPs compared to the decoder of original 3D UX-Net. Our extensive experiments on 12 different medical imaging tasks confirm that EffiDec3D not only significantly reduces the computational demands, but also maintains a performance level comparable to original models, thus establishing a new standard for efficient 3D medical image segmentation. About the Speaker Md Mostafijur Rahman is a final-year Ph.D. candidate in Electrical and Computer Engineering at The University of Texas at Austin, advised by Dr. 
Radu Marculescu, where he builds efficient AI methods for biomedical imaging tasks such as segmentation, synthesis, and diagnosis. By uniting efficient architectures with data-efficient training, his work delivers robust and efficient clinically deployable imaging solutions. What Makes a Good AV Dataset? Lessons from the Front Lines of Sensor Calibration and Projection Getting autonomous vehicle data ready for real use, whether for training, simulation, or evaluation, isn’t just about collecting LIDAR and camera frames. It’s about making sure every point lands where it should, in the right frame, at the right time. In this talk, we’ll break down what it actually takes to go from raw logs to a clean, usable AV dataset. We’ll walk through the practical process of validating transformations, aligning coordinate systems, checking intrinsics and extrinsics, and making sure your projected points actually show up on camera images. Along the way, we’ll share a checklist of common failure points and hard-won debugging tips. Finally, we’ll show how doing this right unlocks downstream tools like Omniverse Nurec and Cosmos—enabling powerful workflows like digital reconstruction, simulation, and large-scale synthetic data generation About the Speaker Daniel Gural is a seasoned Machine Learning Engineer at Voxel51 with a strong passion for empowering Data Scientists and ML Engineers to unlock the full potential of their data. Clustering in Computer Vision: From Theory to Applications In today’s AI landscape, these techniques are crucial. Clustering methods help organize unstructured data into meaningful groups, aiding knowledge discovery, feature analysis, and retrieval-augmented generation. From k-means to DBSCAN and hierarchical approaches like FINCH, selecting the right method is key: including balancing scalability, managing noise sensitivity, and fitting computational demands. 
This presentation provides an in-depth exploration of the current state-of-the-art of clustering techniques with a strong focus on their applications within computer vision. About the Speaker Constantin Seibold leads research group on the development of machine learning methods in the diagnostic and interventional radiology department at the university hospital Heidelberg. His research aims to improve the daily life of both doctors and patients. |
Aug 28 - AI, ML and Computer Vision Meetup
|
|
Aug 28 - AI, ML and Computer Vision Meetup
2025-08-28 · 17:00
Date and Time Aug 28, 2025 at 10 AM Pacific Location Virtual - Register for the Zoom Exploiting Vulnerabilities In CV Models Through Adversarial Attacks As AI and computer vision models are leveraged more broadly in society, we should be better prepared for adversarial attacks by bad actors. In this talk, we'll cover some of the common methods for performing adversarial attacks on CV models. Adversarial attacks are deliberate attempts to deceive neural networks into generating incorrect predictions by making subtle alterations to the input data. About the Speaker Elisa Chen is a data scientist at Meta on the Ads AI Infra team with 5+ years of experience in the industry. EffiDec3D: An Optimized Decoder for High-Performance and Efficient 3D Medical Image Segmentation Recent 3D deep networks such as SwinUNETR, SwinUNETRv2, and 3D UX-Net have shown promising performance by leveraging self-attention and large-kernel convolutions to capture the volumetric context. However, their substantial computational requirements limit their use in real-time and resource-constrained environments. In this paper, we propose EffiDec3D, an optimized 3D decoder that employs a channel reduction strategy across all decoder stages and removes the high-resolution layers when their contribution to segmentation quality is minimal. Our optimized EffiDec3D decoder achieves a 96.4% reduction in #Params and a 93.0% reduction in #FLOPs compared to the decoder of original 3D UX-Net. Our extensive experiments on 12 different medical imaging tasks confirm that EffiDec3D not only significantly reduces the computational demands, but also maintains a performance level comparable to original models, thus establishing a new standard for efficient 3D medical image segmentation. About the Speaker Md Mostafijur Rahman is a final-year Ph.D. candidate in Electrical and Computer Engineering at The University of Texas at Austin, advised by Dr. 
Radu Marculescu, where he builds efficient AI methods for biomedical imaging tasks such as segmentation, synthesis, and diagnosis. By uniting efficient architectures with data-efficient training, his work delivers robust and efficient clinically deployable imaging solutions. What Makes a Good AV Dataset? Lessons from the Front Lines of Sensor Calibration and Projection Getting autonomous vehicle data ready for real use, whether for training, simulation, or evaluation, isn’t just about collecting LIDAR and camera frames. It’s about making sure every point lands where it should, in the right frame, at the right time. In this talk, we’ll break down what it actually takes to go from raw logs to a clean, usable AV dataset. We’ll walk through the practical process of validating transformations, aligning coordinate systems, checking intrinsics and extrinsics, and making sure your projected points actually show up on camera images. Along the way, we’ll share a checklist of common failure points and hard-won debugging tips. Finally, we’ll show how doing this right unlocks downstream tools like Omniverse Nurec and Cosmos—enabling powerful workflows like digital reconstruction, simulation, and large-scale synthetic data generation About the Speaker Daniel Gural is a seasoned Machine Learning Engineer at Voxel51 with a strong passion for empowering Data Scientists and ML Engineers to unlock the full potential of their data. Clustering in Computer Vision: From Theory to Applications In today’s AI landscape, these techniques are crucial. Clustering methods help organize unstructured data into meaningful groups, aiding knowledge discovery, feature analysis, and retrieval-augmented generation. From k-means to DBSCAN and hierarchical approaches like FINCH, selecting the right method is key: including balancing scalability, managing noise sensitivity, and fitting computational demands. 
This presentation provides an in-depth exploration of the current state-of-the-art of clustering techniques with a strong focus on their applications within computer vision. About the Speaker Constantin Seibold leads research group on the development of machine learning methods in the diagnostic and interventional radiology department at the university hospital Heidelberg. His research aims to improve the daily life of both doctors and patients. |
Aug 28 - AI, ML and Computer Vision Meetup
|
|
Aug 28 - AI, ML and Computer Vision Meetup
2025-08-28 · 17:00
Date and Time: Aug 28, 2025 at 10 AM Pacific
Location: Virtual (register for the Zoom link)

**Exploiting Vulnerabilities in CV Models Through Adversarial Attacks**
As AI and computer vision models are deployed more broadly across society, we should be better prepared for adversarial attacks by bad actors. Adversarial attacks are deliberate attempts to deceive neural networks into producing incorrect predictions by making subtle alterations to the input data. In this talk, we'll cover some of the common methods for performing adversarial attacks on CV models.
About the speaker: Elisa Chen is a data scientist at Meta on the Ads AI Infra team with 5+ years of industry experience.

**EffiDec3D: An Optimized Decoder for High-Performance and Efficient 3D Medical Image Segmentation**
Recent 3D deep networks such as SwinUNETR, SwinUNETRv2, and 3D UX-Net have shown promising performance by leveraging self-attention and large-kernel convolutions to capture volumetric context. However, their substantial computational requirements limit their use in real-time and resource-constrained environments. EffiDec3D is an optimized 3D decoder that applies a channel-reduction strategy across all decoder stages and removes high-resolution layers when their contribution to segmentation quality is minimal. It achieves a 96.4% reduction in parameters and a 93.0% reduction in FLOPs compared with the original 3D UX-Net decoder, and extensive experiments on 12 medical imaging tasks confirm that it not only significantly reduces computational demands but also maintains performance comparable to the original models, establishing a new standard for efficient 3D medical image segmentation.
About the speaker: Md Mostafijur Rahman is a final-year Ph.D. candidate in Electrical and Computer Engineering at The University of Texas at Austin, advised by Dr. Radu Marculescu, where he builds efficient AI methods for biomedical imaging tasks such as segmentation, synthesis, and diagnosis. By uniting efficient architectures with data-efficient training, his work delivers robust, clinically deployable imaging solutions.

**What Makes a Good AV Dataset? Lessons from the Front Lines of Sensor Calibration and Projection**
Getting autonomous vehicle data ready for real use, whether for training, simulation, or evaluation, isn't just about collecting LIDAR and camera frames. It's about making sure every point lands where it should, in the right frame, at the right time. This talk breaks down what it actually takes to go from raw logs to a clean, usable AV dataset: validating transformations, aligning coordinate systems, checking intrinsics and extrinsics, and making sure your projected points actually show up on the camera images. Along the way, we'll share a checklist of common failure points and hard-won debugging tips. Finally, we'll show how doing this right unlocks downstream tools like Omniverse Nurec and Cosmos, enabling powerful workflows such as digital reconstruction, simulation, and large-scale synthetic data generation.
About the speaker: Daniel Gural is a seasoned Machine Learning Engineer at Voxel51 with a strong passion for empowering Data Scientists and ML Engineers to unlock the full potential of their data.

**Clustering in Computer Vision: From Theory to Applications**
Clustering techniques are crucial in today's AI landscape: they organize unstructured data into meaningful groups, aiding knowledge discovery, feature analysis, and retrieval-augmented generation. From k-means to DBSCAN and hierarchical approaches like FINCH, selecting the right method is key, balancing scalability, noise sensitivity, and computational demands. This presentation provides an in-depth exploration of the current state of the art in clustering techniques, with a strong focus on their applications within computer vision.
About the speaker: Constantin Seibold leads a research group developing machine learning methods in the diagnostic and interventional radiology department at University Hospital Heidelberg. His research aims to improve the daily lives of both doctors and patients.
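The adversarial-attack recipe from the first talk (small, deliberate input perturbations that flip a model's prediction) can be sketched with the classic fast gradient sign method (FGSM). The toy logistic "model" and random data below are illustrative stand-ins under my own assumptions, not material from the talk:

```python
import numpy as np

# A tiny stand-in "model": logistic regression over a flattened 16-pixel input.
rng = np.random.default_rng(0)
w = rng.normal(size=16)   # fixed model weights
x = rng.normal(size=16)   # a clean input
y = 1.0                   # true label

def predict(v):
    """Sigmoid probability that v belongs to class 1."""
    return 1.0 / (1.0 + np.exp(-w @ v))

# For cross-entropy loss, the gradient w.r.t. the input is (p - y) * w.
grad_x = (predict(x) - y) * w

# FGSM: take one step in the sign of the gradient, bounded per-pixel by epsilon.
eps = 0.25
x_adv = x + eps * np.sign(grad_x)

# The perturbation is small everywhere, yet pushes the prediction away from the true class.
print(predict(x), predict(x_adv))
```

The same one-step idea applies to deep CV models, where the input gradient comes from backpropagation rather than a closed form.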
July 17 - AI, ML and Computer Vision Meetup
2025-07-17 · 17:00
When and Where: July 17, 2025 | 10:00 – 11:30 AM Pacific

**Using VLMs to Navigate the Sea of Data**
At SEA.AI, we aim to make ocean navigation safer by enhancing situational awareness with AI. To develop our technology, we process huge amounts of maritime video from onboard cameras. In this talk, we'll show how we use Vision-Language Models (VLMs) to streamline our data workflows, from semantic search using embeddings to automatically surfacing rare or high-interest events like whale spouts or drifting containers. The goal: smarter data curation with minimal manual effort.
About the speaker: Daniel Fortunato, an AI Researcher at SEA.AI, is dedicated to enhancing efficiency through data workflow optimization. His background includes a Master's degree in Electrical Engineering, providing a robust framework for developing innovative AI solutions. Beyond the lab, he is an enthusiastic amateur padel player and surfer.

**SAMWISE: Infusing Wisdom in SAM2 for Text-Driven Video Segmentation**
Referring Video Object Segmentation (RVOS) involves segmenting objects in video based on natural language descriptions. SAMWISE builds on Segment Anything 2 (SAM2) to support RVOS in streaming settings, without fine-tuning and without relying on external large Vision-Language Models. It introduces a novel adapter that injects temporal cues and multi-modal reasoning directly into the feature-extraction process, enabling both language understanding and motion modeling. The work also identifies a phenomenon called tracking bias, where SAM2 may persistently follow an object that only loosely matches the query, and proposes a learnable module to mitigate it. SAMWISE achieves state-of-the-art performance across multiple benchmarks with fewer than 5M additional parameters.
About the speaker: Claudia Cuttano is a PhD student at Politecnico di Torino (VANDAL Lab), currently on a research visit at TU Darmstadt, where she works with Prof. Stefan Roth in the Visual Inference Lab. Her research focuses on semantic segmentation, with particular emphasis on multi-modal understanding and the use of foundation models for pixel-level tasks.

**Building Efficient and Reliable Workflows for Object Detection**
Training complex AI models at scale requires orchestrating multiple steps into a reproducible workflow and understanding how to optimize resource utilization for efficient pipelines. Modern MLOps practices help streamline these processes, improving the efficiency and reliability of your AI pipelines.
About the speaker: Sage Elliott is an AI Engineer with a background in computer vision, LLM evaluation, MLOps, IoT, and robotics. He's taught thousands of people at live workshops. You can usually find him in Seattle, biking to parks or reading in cafes, catching up on the latest read for AI Book Club.

**Your Data Is Lying to You: How Semantic Search Helps You Find the Truth in Visual Datasets**
High-performing models start with high-quality data, but finding noisy, mislabeled, or edge-case samples across massive datasets remains a significant bottleneck. This session explores a scalable approach to curating and refining large-scale visual datasets using semantic search powered by transformer-based embeddings. By leveraging similarity search and multimodal representation learning, you'll learn to surface hidden patterns, detect inconsistencies, and uncover edge cases. We'll also discuss how these techniques can be integrated into data lakes and large-scale pipelines to streamline model debugging, dataset optimization, and the development of more robust foundation models in computer vision.
About the speaker: Paula Ramos has a PhD in Computer Vision and Machine Learning, with more than 20 years of experience in the technology field. She has been developing novel integrated engineering technologies, mainly in computer vision, robotics, and machine learning applied to agriculture, since the early 2000s in Colombia. During her PhD and postdoc research, she deployed multiple low-cost, smart edge and IoT computing technologies that farmers can operate without expertise in computer vision systems. The central objective of Paula's research has been to develop intelligent systems and machines that can understand and recreate the visual world around us to solve real-world needs, such as those in the agricultural industry.
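The embedding-based semantic search that both the SEA.AI and Voxel51 talks describe reduces to nearest-neighbour lookup in a shared vector space. A minimal sketch, with random vectors standing in for real transformer embeddings (the dataset, encoder, and indices here are all hypothetical):

```python
import numpy as np

# Toy "dataset" of 1000 image embeddings; in practice these would come from a
# transformer-based encoder such as CLIP, not a random generator.
rng = np.random.default_rng(1)
dataset = rng.normal(size=(1000, 128))
dataset /= np.linalg.norm(dataset, axis=1, keepdims=True)  # L2-normalise rows

def search(query, k=5):
    """Return indices of the k most similar items by cosine similarity."""
    q = query / np.linalg.norm(query)
    sims = dataset @ q               # dot product of unit vectors = cosine sim
    return np.argsort(-sims)[:k]     # indices sorted by descending similarity

# A query vector close to item 42 should surface item 42 first.
query = dataset[42] + 0.01 * rng.normal(size=128)
top = search(query)
print(top)
```

At real dataset scale the brute-force matrix product is typically replaced by an approximate nearest-neighbour index, but the interface (embed, normalise, rank by similarity) stays the same.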
July 17 - AI, ML and Computer Vision Meetup
|
|
July 17 - AI, ML and Computer Vision Meetup
2025-07-17 · 17:00
When and Where July 17\, 2025 \| 10:00 – 11:30 AM Pacific Using VLMs to Navigate the Sea of Data At SEA.AI, we aim to make ocean navigation safer by enhancing situational awareness with AI. To develop our technology, we process huge amounts of maritime video from onboard cameras. In this talk, we’ll show how we use Vision-Language Models (VLMs) to streamline our data workflows; from semantic search using embeddings to automatically surfacing rare or high-interest events like whale spouts or drifting containers. The goal: smarter data curation with minimal manual effort. About the Speaker Daniel Fortunato, an AI Researcher at SEA.AI, is dedicated to enhancing efficiency through data workflow optimizations. Daniel’s background includes a Master’s degree in Electrical Engineering, providing a robust framework for developing innovative AI solutions. Beyond the lab, he is an enthusiastic amateur padel player and surfer. SAMWISE: Infusing Wisdom in SAM2 for Text-Driven Video Segmentation Referring Video Object Segmentation (RVOS) involves segmenting objects in video based on natural language descriptions. SAMWISE builds on Segment Anything 2 (SAM2) to support RVOS in streaming settings, without fine-tuning and without relying on external large Vision-Language Models. We introduce a novel adapter that injects temporal cues and multi-modal reasoning directly into the feature extraction process, enabling both language understanding and motion modeling. We also unveil a phenomenon we denote tracking bias, where SAM2 may persistently follow an object that only loosely matches the query, and propose a learnable module to mitigate it. SAMWISE achieves state-of-the-art performance across multiple benchmarks with less than 5M additional parameters. About the Speaker Claudia Cuttano is a PhD student at Politecnico di Torino (VANDAL Lab), currently on a research visit at TU Darmstadt, where she works with Prof. Stefan Roth in the Visual Inference Lab. 
Her research focuses on semantic segmentation, with particular emphasis on multi-modal understanding and the use of foundation models for pixel-level tasks. Building Efficient and Reliable Workflows for Object Detection Training complex AI models at scale requires orchestrating multiple steps into a reproducible workflow and understanding how to optimize resource utilization for efficient pipelines. Modern MLOps practices help streamline these processes, improving the efficiency and reliability of your AI pipelines. About the Speaker Sage Elliott is an AI Engineer with a background in computer vision, LLM evaluation, MLOps, IoT, and Robotics. He’s taught thousands of people at live workshops. You can usually find him in Seattle biking around to parks or reading in cafes, catching up on the latest read for AI Book Club. Your Data Is Lying to You: How Semantic Search Helps You Find the Truth in Visual Datasets High-performing models start with high-quality data—but finding noisy, mislabeled, or edge-case samples across massive datasets remains a significant bottleneck. In this session, we’ll explore a scalable approach to curating and refining large-scale visual datasets using semantic search powered by transformer-based embeddings. By leveraging similarity search and multimodal representation learning, you’ll learn to surface hidden patterns, detect inconsistencies, and uncover edge cases. We’ll also discuss how these techniques can be integrated into data lakes and large-scale pipelines to streamline model debugging, dataset optimization, and the development of more robust foundation models in computer vision. Join us to discover how semantic search reshapes how we build and refine AI systems. About the Speaker Paula Ramos has a PhD in Computer Vision and Machine Learning, with more than 20 years of experience in the technological field. 
She has been developing novel integrated engineering technologies, mainly in Computer Vision, robotics, and Machine Learning applied to agriculture, since the early 2000s in Colombia. During her PhD and Postdoc research, she deployed multiple low-cost, smart edge & IoT computing technologies, such as farmers, that can be operated without expertise in computer vision systems. The central objective of Paula’s research has been to develop intelligent systems/machines that can understand and recreate the visual world around us to solve real-world needs, such as those in the agricultural industry. |
July 17 - AI, ML and Computer Vision Meetup
|
|
July 17 - AI, ML and Computer Vision Meetup
2025-07-17 · 17:00
When and Where July 17\, 2025 \| 10:00 – 11:30 AM Pacific Using VLMs to Navigate the Sea of Data At SEA.AI, we aim to make ocean navigation safer by enhancing situational awareness with AI. To develop our technology, we process huge amounts of maritime video from onboard cameras. In this talk, we’ll show how we use Vision-Language Models (VLMs) to streamline our data workflows; from semantic search using embeddings to automatically surfacing rare or high-interest events like whale spouts or drifting containers. The goal: smarter data curation with minimal manual effort. About the Speaker Daniel Fortunato, an AI Researcher at SEA.AI, is dedicated to enhancing efficiency through data workflow optimizations. Daniel’s background includes a Master’s degree in Electrical Engineering, providing a robust framework for developing innovative AI solutions. Beyond the lab, he is an enthusiastic amateur padel player and surfer. SAMWISE: Infusing Wisdom in SAM2 for Text-Driven Video Segmentation Referring Video Object Segmentation (RVOS) involves segmenting objects in video based on natural language descriptions. SAMWISE builds on Segment Anything 2 (SAM2) to support RVOS in streaming settings, without fine-tuning and without relying on external large Vision-Language Models. We introduce a novel adapter that injects temporal cues and multi-modal reasoning directly into the feature extraction process, enabling both language understanding and motion modeling. We also unveil a phenomenon we denote tracking bias, where SAM2 may persistently follow an object that only loosely matches the query, and propose a learnable module to mitigate it. SAMWISE achieves state-of-the-art performance across multiple benchmarks with less than 5M additional parameters. About the Speaker Claudia Cuttano is a PhD student at Politecnico di Torino (VANDAL Lab), currently on a research visit at TU Darmstadt, where she works with Prof. Stefan Roth in the Visual Inference Lab. 
Her research focuses on semantic segmentation, with particular emphasis on multi-modal understanding and the use of foundation models for pixel-level tasks. Building Efficient and Reliable Workflows for Object Detection Training complex AI models at scale requires orchestrating multiple steps into a reproducible workflow and understanding how to optimize resource utilization for efficient pipelines. Modern MLOps practices help streamline these processes, improving the efficiency and reliability of your AI pipelines. About the Speaker Sage Elliott is an AI Engineer with a background in computer vision, LLM evaluation, MLOps, IoT, and Robotics. He’s taught thousands of people at live workshops. You can usually find him in Seattle biking around to parks or reading in cafes, catching up on the latest read for AI Book Club. Your Data Is Lying to You: How Semantic Search Helps You Find the Truth in Visual Datasets High-performing models start with high-quality data—but finding noisy, mislabeled, or edge-case samples across massive datasets remains a significant bottleneck. In this session, we’ll explore a scalable approach to curating and refining large-scale visual datasets using semantic search powered by transformer-based embeddings. By leveraging similarity search and multimodal representation learning, you’ll learn to surface hidden patterns, detect inconsistencies, and uncover edge cases. We’ll also discuss how these techniques can be integrated into data lakes and large-scale pipelines to streamline model debugging, dataset optimization, and the development of more robust foundation models in computer vision. Join us to discover how semantic search reshapes how we build and refine AI systems. About the Speaker Paula Ramos has a PhD in Computer Vision and Machine Learning, with more than 20 years of experience in the technological field. 
She has been developing novel integrated engineering technologies, mainly in Computer Vision, robotics, and Machine Learning applied to agriculture, since the early 2000s in Colombia. During her PhD and Postdoc research, she deployed multiple low-cost, smart edge & IoT computing technologies, such as farmers, that can be operated without expertise in computer vision systems. The central objective of Paula’s research has been to develop intelligent systems/machines that can understand and recreate the visual world around us to solve real-world needs, such as those in the agricultural industry. |
July 17 - AI, ML and Computer Vision Meetup
|
|
July 17 - AI, ML and Computer Vision Meetup
2025-07-17 · 17:00
When and Where July 17\, 2025 \| 10:00 – 11:30 AM Pacific Using VLMs to Navigate the Sea of Data At SEA.AI, we aim to make ocean navigation safer by enhancing situational awareness with AI. To develop our technology, we process huge amounts of maritime video from onboard cameras. In this talk, we’ll show how we use Vision-Language Models (VLMs) to streamline our data workflows; from semantic search using embeddings to automatically surfacing rare or high-interest events like whale spouts or drifting containers. The goal: smarter data curation with minimal manual effort. About the Speaker Daniel Fortunato, an AI Researcher at SEA.AI, is dedicated to enhancing efficiency through data workflow optimizations. Daniel’s background includes a Master’s degree in Electrical Engineering, providing a robust framework for developing innovative AI solutions. Beyond the lab, he is an enthusiastic amateur padel player and surfer. SAMWISE: Infusing Wisdom in SAM2 for Text-Driven Video Segmentation Referring Video Object Segmentation (RVOS) involves segmenting objects in video based on natural language descriptions. SAMWISE builds on Segment Anything 2 (SAM2) to support RVOS in streaming settings, without fine-tuning and without relying on external large Vision-Language Models. We introduce a novel adapter that injects temporal cues and multi-modal reasoning directly into the feature extraction process, enabling both language understanding and motion modeling. We also unveil a phenomenon we denote tracking bias, where SAM2 may persistently follow an object that only loosely matches the query, and propose a learnable module to mitigate it. SAMWISE achieves state-of-the-art performance across multiple benchmarks with less than 5M additional parameters. About the Speaker Claudia Cuttano is a PhD student at Politecnico di Torino (VANDAL Lab), currently on a research visit at TU Darmstadt, where she works with Prof. Stefan Roth in the Visual Inference Lab. 
Her research focuses on semantic segmentation, with particular emphasis on multi-modal understanding and the use of foundation models for pixel-level tasks. Building Efficient and Reliable Workflows for Object Detection Training complex AI models at scale requires orchestrating multiple steps into a reproducible workflow and understanding how to optimize resource utilization for efficient pipelines. Modern MLOps practices help streamline these processes, improving the efficiency and reliability of your AI pipelines. About the Speaker Sage Elliott is an AI Engineer with a background in computer vision, LLM evaluation, MLOps, IoT, and Robotics. He’s taught thousands of people at live workshops. You can usually find him in Seattle biking around to parks or reading in cafes, catching up on the latest read for AI Book Club. Your Data Is Lying to You: How Semantic Search Helps You Find the Truth in Visual Datasets High-performing models start with high-quality data—but finding noisy, mislabeled, or edge-case samples across massive datasets remains a significant bottleneck. In this session, we’ll explore a scalable approach to curating and refining large-scale visual datasets using semantic search powered by transformer-based embeddings. By leveraging similarity search and multimodal representation learning, you’ll learn to surface hidden patterns, detect inconsistencies, and uncover edge cases. We’ll also discuss how these techniques can be integrated into data lakes and large-scale pipelines to streamline model debugging, dataset optimization, and the development of more robust foundation models in computer vision. Join us to discover how semantic search reshapes how we build and refine AI systems. About the Speaker Paula Ramos has a PhD in Computer Vision and Machine Learning, with more than 20 years of experience in the technological field. 
She has been developing novel integrated engineering technologies, mainly in Computer Vision, robotics, and Machine Learning applied to agriculture, since the early 2000s in Colombia. During her PhD and Postdoc research, she deployed multiple low-cost, smart edge & IoT computing technologies, such as farmers, that can be operated without expertise in computer vision systems. The central objective of Paula’s research has been to develop intelligent systems/machines that can understand and recreate the visual world around us to solve real-world needs, such as those in the agricultural industry. |
July 17 - AI, ML and Computer Vision Meetup
|
|
July 17 - AI, ML and Computer Vision Meetup
2025-07-17 · 17:00
When and Where July 17\, 2025 \| 10:00 – 11:30 AM Pacific Using VLMs to Navigate the Sea of Data At SEA.AI, we aim to make ocean navigation safer by enhancing situational awareness with AI. To develop our technology, we process huge amounts of maritime video from onboard cameras. In this talk, we’ll show how we use Vision-Language Models (VLMs) to streamline our data workflows; from semantic search using embeddings to automatically surfacing rare or high-interest events like whale spouts or drifting containers. The goal: smarter data curation with minimal manual effort. About the Speaker Daniel Fortunato, an AI Researcher at SEA.AI, is dedicated to enhancing efficiency through data workflow optimizations. Daniel’s background includes a Master’s degree in Electrical Engineering, providing a robust framework for developing innovative AI solutions. Beyond the lab, he is an enthusiastic amateur padel player and surfer. SAMWISE: Infusing Wisdom in SAM2 for Text-Driven Video Segmentation Referring Video Object Segmentation (RVOS) involves segmenting objects in video based on natural language descriptions. SAMWISE builds on Segment Anything 2 (SAM2) to support RVOS in streaming settings, without fine-tuning and without relying on external large Vision-Language Models. We introduce a novel adapter that injects temporal cues and multi-modal reasoning directly into the feature extraction process, enabling both language understanding and motion modeling. We also unveil a phenomenon we denote tracking bias, where SAM2 may persistently follow an object that only loosely matches the query, and propose a learnable module to mitigate it. SAMWISE achieves state-of-the-art performance across multiple benchmarks with less than 5M additional parameters. About the Speaker Claudia Cuttano is a PhD student at Politecnico di Torino (VANDAL Lab), currently on a research visit at TU Darmstadt, where she works with Prof. Stefan Roth in the Visual Inference Lab. 
Her research focuses on semantic segmentation, with particular emphasis on multi-modal understanding and the use of foundation models for pixel-level tasks. Building Efficient and Reliable Workflows for Object Detection Training complex AI models at scale requires orchestrating multiple steps into a reproducible workflow and understanding how to optimize resource utilization for efficient pipelines. Modern MLOps practices help streamline these processes, improving the efficiency and reliability of your AI pipelines. About the Speaker Sage Elliott is an AI Engineer with a background in computer vision, LLM evaluation, MLOps, IoT, and Robotics. He’s taught thousands of people at live workshops. You can usually find him in Seattle biking around to parks or reading in cafes, catching up on the latest read for AI Book Club. Your Data Is Lying to You: How Semantic Search Helps You Find the Truth in Visual Datasets High-performing models start with high-quality data—but finding noisy, mislabeled, or edge-case samples across massive datasets remains a significant bottleneck. In this session, we’ll explore a scalable approach to curating and refining large-scale visual datasets using semantic search powered by transformer-based embeddings. By leveraging similarity search and multimodal representation learning, you’ll learn to surface hidden patterns, detect inconsistencies, and uncover edge cases. We’ll also discuss how these techniques can be integrated into data lakes and large-scale pipelines to streamline model debugging, dataset optimization, and the development of more robust foundation models in computer vision. Join us to discover how semantic search reshapes how we build and refine AI systems. About the Speaker Paula Ramos has a PhD in Computer Vision and Machine Learning, with more than 20 years of experience in the technological field. 
She has been developing novel integrated engineering technologies, mainly in Computer Vision, robotics, and Machine Learning applied to agriculture, since the early 2000s in Colombia. During her PhD and Postdoc research, she deployed multiple low-cost, smart edge & IoT computing technologies, such as farmers, that can be operated without expertise in computer vision systems. The central objective of Paula’s research has been to develop intelligent systems/machines that can understand and recreate the visual world around us to solve real-world needs, such as those in the agricultural industry. |
July 17 - AI, ML and Computer Vision Meetup
|
|
July 17 - AI, ML and Computer Vision Meetup
2025-07-17 · 17:00
When and Where July 17\, 2025 \| 10:00 – 11:30 AM Pacific Using VLMs to Navigate the Sea of Data At SEA.AI, we aim to make ocean navigation safer by enhancing situational awareness with AI. To develop our technology, we process huge amounts of maritime video from onboard cameras. In this talk, we’ll show how we use Vision-Language Models (VLMs) to streamline our data workflows; from semantic search using embeddings to automatically surfacing rare or high-interest events like whale spouts or drifting containers. The goal: smarter data curation with minimal manual effort. About the Speaker Daniel Fortunato, an AI Researcher at SEA.AI, is dedicated to enhancing efficiency through data workflow optimizations. Daniel’s background includes a Master’s degree in Electrical Engineering, providing a robust framework for developing innovative AI solutions. Beyond the lab, he is an enthusiastic amateur padel player and surfer. SAMWISE: Infusing Wisdom in SAM2 for Text-Driven Video Segmentation Referring Video Object Segmentation (RVOS) involves segmenting objects in video based on natural language descriptions. SAMWISE builds on Segment Anything 2 (SAM2) to support RVOS in streaming settings, without fine-tuning and without relying on external large Vision-Language Models. We introduce a novel adapter that injects temporal cues and multi-modal reasoning directly into the feature extraction process, enabling both language understanding and motion modeling. We also unveil a phenomenon we denote tracking bias, where SAM2 may persistently follow an object that only loosely matches the query, and propose a learnable module to mitigate it. SAMWISE achieves state-of-the-art performance across multiple benchmarks with less than 5M additional parameters. About the Speaker Claudia Cuttano is a PhD student at Politecnico di Torino (VANDAL Lab), currently on a research visit at TU Darmstadt, where she works with Prof. Stefan Roth in the Visual Inference Lab. 
Her research focuses on semantic segmentation, with particular emphasis on multi-modal understanding and the use of foundation models for pixel-level tasks. Building Efficient and Reliable Workflows for Object Detection Training complex AI models at scale requires orchestrating multiple steps into a reproducible workflow and understanding how to optimize resource utilization for efficient pipelines. Modern MLOps practices help streamline these processes, improving the efficiency and reliability of your AI pipelines. About the Speaker Sage Elliott is an AI Engineer with a background in computer vision, LLM evaluation, MLOps, IoT, and Robotics. He’s taught thousands of people at live workshops. You can usually find him in Seattle biking around to parks or reading in cafes, catching up on the latest read for AI Book Club. Your Data Is Lying to You: How Semantic Search Helps You Find the Truth in Visual Datasets High-performing models start with high-quality data—but finding noisy, mislabeled, or edge-case samples across massive datasets remains a significant bottleneck. In this session, we’ll explore a scalable approach to curating and refining large-scale visual datasets using semantic search powered by transformer-based embeddings. By leveraging similarity search and multimodal representation learning, you’ll learn to surface hidden patterns, detect inconsistencies, and uncover edge cases. We’ll also discuss how these techniques can be integrated into data lakes and large-scale pipelines to streamline model debugging, dataset optimization, and the development of more robust foundation models in computer vision. Join us to discover how semantic search reshapes how we build and refine AI systems. About the Speaker Paula Ramos has a PhD in Computer Vision and Machine Learning, with more than 20 years of experience in the technological field. 
She has been developing novel integrated engineering technologies, mainly in Computer Vision, robotics, and Machine Learning applied to agriculture, since the early 2000s in Colombia. During her PhD and Postdoc research, she deployed multiple low-cost, smart edge & IoT computing technologies, such as farmers, that can be operated without expertise in computer vision systems. The central objective of Paula’s research has been to develop intelligent systems/machines that can understand and recreate the visual world around us to solve real-world needs, such as those in the agricultural industry. |
July 17 - AI, ML and Computer Vision Meetup
|
|
July 17 - AI, ML and Computer Vision Meetup
2025-07-17 · 17:00
When and Where July 17\, 2025 \| 10:00 – 11:30 AM Pacific Using VLMs to Navigate the Sea of Data At SEA.AI, we aim to make ocean navigation safer by enhancing situational awareness with AI. To develop our technology, we process huge amounts of maritime video from onboard cameras. In this talk, we’ll show how we use Vision-Language Models (VLMs) to streamline our data workflows; from semantic search using embeddings to automatically surfacing rare or high-interest events like whale spouts or drifting containers. The goal: smarter data curation with minimal manual effort. About the Speaker Daniel Fortunato, an AI Researcher at SEA.AI, is dedicated to enhancing efficiency through data workflow optimizations. Daniel’s background includes a Master’s degree in Electrical Engineering, providing a robust framework for developing innovative AI solutions. Beyond the lab, he is an enthusiastic amateur padel player and surfer. SAMWISE: Infusing Wisdom in SAM2 for Text-Driven Video Segmentation Referring Video Object Segmentation (RVOS) involves segmenting objects in video based on natural language descriptions. SAMWISE builds on Segment Anything 2 (SAM2) to support RVOS in streaming settings, without fine-tuning and without relying on external large Vision-Language Models. We introduce a novel adapter that injects temporal cues and multi-modal reasoning directly into the feature extraction process, enabling both language understanding and motion modeling. We also unveil a phenomenon we denote tracking bias, where SAM2 may persistently follow an object that only loosely matches the query, and propose a learnable module to mitigate it. SAMWISE achieves state-of-the-art performance across multiple benchmarks with less than 5M additional parameters. About the Speaker Claudia Cuttano is a PhD student at Politecnico di Torino (VANDAL Lab), currently on a research visit at TU Darmstadt, where she works with Prof. Stefan Roth in the Visual Inference Lab. 
Her research focuses on semantic segmentation, with particular emphasis on multi-modal understanding and the use of foundation models for pixel-level tasks. Building Efficient and Reliable Workflows for Object Detection Training complex AI models at scale requires orchestrating multiple steps into a reproducible workflow and understanding how to optimize resource utilization for efficient pipelines. Modern MLOps practices help streamline these processes, improving the efficiency and reliability of your AI pipelines. About the Speaker Sage Elliott is an AI Engineer with a background in computer vision, LLM evaluation, MLOps, IoT, and Robotics. He’s taught thousands of people at live workshops. You can usually find him in Seattle biking around to parks or reading in cafes, catching up on the latest read for AI Book Club. Your Data Is Lying to You: How Semantic Search Helps You Find the Truth in Visual Datasets High-performing models start with high-quality data—but finding noisy, mislabeled, or edge-case samples across massive datasets remains a significant bottleneck. In this session, we’ll explore a scalable approach to curating and refining large-scale visual datasets using semantic search powered by transformer-based embeddings. By leveraging similarity search and multimodal representation learning, you’ll learn to surface hidden patterns, detect inconsistencies, and uncover edge cases. We’ll also discuss how these techniques can be integrated into data lakes and large-scale pipelines to streamline model debugging, dataset optimization, and the development of more robust foundation models in computer vision. Join us to discover how semantic search reshapes how we build and refine AI systems. About the Speaker Paula Ramos has a PhD in Computer Vision and Machine Learning, with more than 20 years of experience in the technological field. 
She has been developing novel integrated engineering technologies, mainly in Computer Vision, robotics, and Machine Learning applied to agriculture, since the early 2000s in Colombia. During her PhD and Postdoc research, she deployed multiple low-cost, smart edge & IoT computing technologies that can be operated by farmers without expertise in computer vision systems. The central objective of Paula’s research has been to develop intelligent systems/machines that can understand and recreate the visual world around us to solve real-world needs, such as those in the agricultural industry. |
July 17 - AI, ML and Computer Vision Meetup
|
|
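The semantic search workflow described in the talks above (embed every image, then rank dataset items by similarity to a query embedding) can be sketched in a few lines. This is a minimal illustration, not the speakers' actual pipeline: the random 512-dimensional vectors stand in for real model embeddings, and `top_k_similar` is a hypothetical helper name.

```python
# Minimal sketch of embedding-based semantic search over a visual dataset.
# Random vectors stand in for image/text embeddings from a multimodal model.
import numpy as np

def top_k_similar(query, gallery, k=3):
    """Indices of the k gallery embeddings most similar to query (cosine)."""
    q = query / np.linalg.norm(query)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    return np.argsort(-(g @ q))[:k]

rng = np.random.default_rng(0)
gallery = rng.normal(size=(1000, 512))             # stand-in dataset embeddings
query = gallery[42] + 0.01 * rng.normal(size=512)  # near-duplicate of item 42
hits = top_k_similar(query, gallery)
print(hits[0])  # item 42 ranks first: its cosine similarity to the query is ~1
```

The same ranking step works whether the query embedding comes from text or from an example image, which is what makes this approach useful for surfacing rare events such as whale spouts without manual labels.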
dbt Global Circuit Series: Stockholm dbt Meetup
2025-06-10 · 15:30
This meetup is designed for builders, thinkers, and tinkerers — folks who want to roll up their sleeves (figuratively), learn from each other, and imagine what’s next for the analytics engineering stack. Whether you’re curious about what’s new, want to see how others are approaching similar problems, or just want to ask hard questions — you’re in the right place. 🤝 Organizer: Solita & dbt Labs 🏠 Venue: Solita, Lästmakargatan 10, 111 44 Stockholm SWEDEN Go left when you see SATS and take the elevator to the 4th Floor. 🥣 Refreshments: Light food and drinks will be provided. Details: This dbt Meetup is an opportunity for the local Stockholm dbt Community to connect and collaborate. If you work with data, this event is for you. We welcome data analysts, scientists, engineers, architects, and more! 📝Agenda 17.30: Welcome. Meet and greet, and take some food & drinks while socialising with your peers. 18.30: How Rebtel increased data product value: A migration story (30 min) (Abraham Setiawan, Analytics Engineer @ Rebtel) Rebtel aims to help friends and family in different countries stay in touch and support each other by delivering the world’s most relevant cross-border solutions. Currently, Rebtel provides cross-border calling and mobile top-up, as well as mobile e-SIM. In this presentation, Rebtel shares insights on how it improved data quality and data team productivity by migrating from a legacy architecture to dbt Cloud. 19.00: For Builders, by Builders: The Latest Tools from dbt Labs (30 min) (Ludwig Sewall, Solutions Architect @ dbt Labs) Ludwig will dive into the latest tools and ideas dbt Labs has been shipping — what they unlock, how they fit together, and why dbt Labs built them the way they did. 19.30-21.30: Networking, Discussions & Drinks ➡️ Join the dbt Slack community: https://www.getdbt.com/community/ 🤝 Make sure to join the #local-sweden channel in dbt Slack (https://slack.getdbt.com/). 
To attend, please read the Health and Safety Policy and Terms of Participation: https://www.getdbt.com/legal/health-and-safety-policy dbt is the standard in data transformation, used by over 50,000 organizations worldwide. Through the application of software engineering best practices like modularity, version control, testing, and documentation, dbt’s analytics engineering workflow helps teams work more efficiently to produce data the entire organization can trust. Learn more: https://www.getdbt.com/ |
|
|
May 22 - AI, ML and Computer Vision Meetup
2025-05-22 · 17:00
CountGD: Multi-Modal Open-World Counting We propose CountGD, the first open-world counting model that can count any object specified by text only, visual examples only, or both together. CountGD extends the Grounding DINO architecture and adds components that enable specifying the object with visual examples. This new capability – specifying the target object multi-modally (with text and exemplars) – leads to an improvement in counting accuracy. CountGD powers multiple products and has been applied to problems across different domains, including counting large populations of penguins to monitor the influence of climate change, counting buildings in satellite images, and counting seals for conservation. About the Speaker Niki Amini-Naieni is a DPhil student at the Visual Geometry Group (VGG), Oxford, supervised by Andrew Zisserman, focusing on developing foundation model capabilities for visual understanding of the open world. In the past, Niki has consulted with Amazon and other companies on robotics and computer vision, interned at SpaceX, and studied computer science and engineering at Cornell. GorillaWatch: Advancing Gorilla Re-Identification and Population Monitoring with AI Accurate monitoring of endangered gorilla populations is critical for conservation efforts in the field, where scientists currently rely on labor-intensive manual video labeling methods. The GorillaWatch project applies visual AI to provide robust re-identification of individual gorillas and generate local population estimates from wildlife encounters. About the Speaker Maximilian von Klinski is a Computer Science student at the Hasso-Plattner-Institut and is currently working on the GorillaWatch project alongside seven fellow students. 
This Gets Under Your Skin – The Art of Skin Type Classification Skin analysis is deceptively hard: inconsistent portrait quality, lighting variations, and the presence of sunscreen or makeup often obscure what’s truly “under the skin.” In this talk, I’ll share how we built an AI pipeline for skin type classification that tackles these real-world challenges with a combination of vision models. The architecture includes image quality control, facial segmentation, and a final classifier trained on curated dermatological features. About the Speaker Markus Hinsche is the co-founder and CTO of Thea Care, where he builds AI-powered skincare solutions at the intersection of health, beauty, and longevity. He holds a Master’s in Software Engineering from the Hasso Plattner Institute and brings a deep background in AI and product development. A Spot Pattern Is Like a Fingerprint: Jaguar Identification Project The Jaguar Identification Project is a citizen science initiative actively engaging the public in conservation efforts in Porto Jofre, Brazil. This project increases awareness and provides an interesting and challenging dataset that requires the use of fine-grained visual classification algorithms. We use this rich dataset for dual purposes: teaching data-centric visual AI and directly contributing to conservation efforts for this vulnerable species. About the Speaker Antonio Rueda-Toicen, an AI Engineer in Berlin, has extensive experience in deploying machine learning models and has taught over 300 professionals. He is currently a Research Scientist at the Hasso Plattner Institute. Since 2019, he has organized the Berlin Computer Vision Group and taught at Berlin’s Data Science Retreat. He specializes in computer vision, cloud technologies, and machine learning. Antonio is also a certified instructor of deep learning and diffusion models in NVIDIA’s Deep Learning Institute. |
|
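The re-identification setups mentioned above (GorillaWatch, the Jaguar Identification Project) generally reduce to matching a new sighting's embedding against a gallery of known individuals. Below is a hedged sketch of that nearest-neighbor lookup with a threshold for flagging unknown animals; the names, toy 3-dimensional embeddings, and 0.8 threshold are all illustrative assumptions, not either project's actual method.

```python
# Sketch of embedding-based re-identification: match a new sighting's
# embedding against a gallery of known individuals, or flag it as unknown.
# All values are illustrative; real embeddings come from a trained model.
import numpy as np

# Hypothetical gallery: one reference embedding per known individual.
known = {
    "gorilla_a": np.array([1.0, 0.0, 0.0]),
    "gorilla_b": np.array([0.0, 1.0, 0.0]),
}

def identify(embedding, gallery, threshold=0.8):
    """Return the best-matching individual by cosine similarity,
    or None if no match clears the threshold (likely a new animal)."""
    e = embedding / np.linalg.norm(embedding)
    best_name, best_sim = None, threshold
    for name, ref in gallery.items():
        sim = float(e @ (ref / np.linalg.norm(ref)))
        if sim > best_sim:
            best_name, best_sim = name, sim
    return best_name

print(identify(np.array([0.9, 0.1, 0.0]), known))  # gorilla_a
print(identify(np.array([0.5, 0.5, 0.7]), known))  # None: below threshold
```

The threshold is the key design choice: set too low, new individuals get mislabeled as known ones; set too high, familiar animals are flagged as new. Real systems tune it on held-out identities.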