talk-data.com
People (8 results)
Amritha Arun Babu Mysore
Product Leader · Amazon; Wayfair; Klaviyo (experience across AI platforms, supply chain, and enterprise workflows)
Greg Hintermeister
Distinguished Engineer, Master Inventor, AI+ Enterprise Transformation Lead · IBM
Charlie Huang
Sr. Product Marketing Manager, Enterprise AI · NVIDIA
Activities & events
| Title & Speakers | Event |
|---|---|
|
🤖 Building an Agentic AI HR Assistant with n8n
2026-01-09 · 11:00
Companies across industries are increasingly exploring AI-powered HR assistants to streamline recruitment, employee support, onboarding, and internal HR processes. But moving from existing processes to a reliable, agentic HR assistant requires more than just plugging in an LLM. How do you design, build, and deploy an AI HR assistant that actually works in real business environments? In this 60-minute interactive webinar, we'll walk through the end-to-end design and implementation of an Agentic AI HR Assistant, with a strong focus on practical architecture and real-world execution. You will also see a live implementation of an HR agent built with n8n, orchestrating LLMs, workflows, and enterprise tools to solve concrete HR use cases (a minimal plain-Python sketch of this orchestration pattern follows this listing). You'll learn:
🧩 Core Concepts of Agentic AI for HR. How agentic systems can address the HR challenges many teams are currently facing.
🏗️ Architecture of an AI HR Assistant. How to design a production-ready HR assistant:
🤖 Typical HR Use Cases Powered by Agents. Concrete examples, including:
💡 Live Demo: Building an Agentic HR Assistant with n8n. Watch a full HR agent in action using n8n:
🧪 Interactive Q&A and Architecture Discussion. Ask questions live and discuss design trade-offs, limitations, and best practices for deploying AI agents in HR environments.
📅 Duration: 60 minutes
🔗 URL: https://events.teams.microsoft.com/event/1f0a6abf-96bc-4d89-962f-3c896092055b@d94ea0cb-fd25-43ad-bf69-8d9e42e4d175 |
🤖 Building an Agentic AI HR Assistant with n8n
|
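The webinar above describes the orchestration pattern of an agentic HR assistant: an LLM routes each employee request to the right workflow or enterprise tool. The sketch below is a plain-Python illustration of that pattern only, not n8n itself and not the presenters' implementation; the tool names, canned responses, and keyword router are hypothetical stand-ins (in n8n the same loop would be an AI Agent node wired to tool nodes and system integrations).

```python
# Minimal sketch of an agentic HR assistant: a router picks which HR "tool" answers a request.
# All tools and data here are hypothetical; swap in real HRIS, RAG, and ticketing integrations.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Tool:
    name: str
    description: str
    run: Callable[[str], str]

def leave_balance(query: str) -> str:
    return "You have 12 vacation days remaining."          # stand-in for an HRIS API call

def policy_lookup(query: str) -> str:
    return "Parental leave is 16 weeks, per the handbook."  # stand-in for RAG over policy docs

def open_ticket(query: str) -> str:
    return "Ticket HR-1042 created for the HR team."        # stand-in for a ticketing integration

TOOLS = [
    Tool("leave_balance", "vacation / PTO balance questions", leave_balance),
    Tool("policy_lookup", "questions about written HR policies", policy_lookup),
    Tool("open_ticket", "anything that needs a human in HR", open_ticket),
]

def route(query: str) -> Tool:
    """Toy keyword router. In a real agent, an LLM chooses a tool from the
    descriptions above (function calling) and can chain several calls."""
    q = query.lower()
    if "vacation" in q or "pto" in q:
        return TOOLS[0]
    if "policy" in q or "leave" in q:
        return TOOLS[1]
    return TOOLS[2]

if __name__ == "__main__":
    for question in ["How many vacation days do I have left?",
                     "What is the parental leave policy?",
                     "My payslip looks wrong."]:
        tool = route(question)
        print(f"{question} -> [{tool.name}] {tool.run(question)}")
```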
|
183 - Part II: Designing with the Flow of Work: Accelerating Sales in B2B Analytics and AI Products by Minimizing Behavior Change
2025-11-27 · 02:00
Brian T. O’Neill
– host
In this second part of my three-part series (catch Part I via episode 182), I dig deeper into the key idea that sales in commercial data products can be accelerated by designing for actual user workflows—vs. going wide with a "many-purpose" AI and analytics solution that "does more," but is misaligned with how users' most important work actually gets done. To explain this, I will introduce the concept of user experience (UX) outcomes, and how building your solution to enable these outcomes may be a dependency for you to get sales traction, and for your customer to see the value of your solution. I also share practical steps to improve UX outcomes in commercial data products, from establishing a baseline definition of UX quality to mapping out users' current workflows (and future ones, when agentic AI changes their job). Finally, I talk about how approaching product development as small "bets" helps you build small and learn fast so you can accelerate value creation.
Highlights / Skip to:
- Continuing the journey: designing for users, workflows, and tasks (00:32)
- How UX impacts sales—not just usage and adoption (02:16)
- Understanding how you can leverage users' frustrations and perceived risks as fuel for building an indispensable data product (04:11)
- Definition of a UX outcome (07:30)
- Establishing a baseline definition of product (UX) quality, so you know how to observe and measure improvement (11:04)
- Spotting friction and solving the right customer problems first (15:34)
- Collecting actionable user feedback (20:02)
- Moving users along the scale from frustration to satisfaction to delight (23:04)
- Unique challenges of designing B2B AI and analytics products used for decision intelligence (25:04)
Quotes from Today's Episode
One of the hardest parts of building anything meaningful, especially in B2B or data-heavy spaces, is pausing long enough to ask what the actual 'it' is that we're trying to solve. People rush into building the fix, pitching the feature, or drafting the roadmap before they've taken even a moment to define what the user keeps tripping over in their day-to-day environment. And until you slow down and articulate that shared, observable frustration, you're basically operating on vibes and assumptions instead of behavior and reality. What you want is not a generic problem statement but an agreed-upon description of the two or three most painful frictions that are obvious to everyone involved, frictions the user experiences visibly and repeatedly in the flow of work. Once you have that grounding, everything else (prioritization, design decisions, sequencing, even organizational alignment) suddenly becomes much easier because you're no longer debating abstractions, you're working against the same measurable anchor. And the irony is, the faster you try to skip this step, the longer the project drags on, because every downstream conversation becomes a debate about interpretive language rather than a conversation about a shared, observable experience.
__
Want people to pay for your product? Solve an observable problem—not a vague information or data problem. What do I mean? "When you're trying to solve a problem for users, especially in analytical or AI-driven products, one of the biggest traps is relying on interpretive statements instead of observable ones. Interpretive phrasing like 'they're overwhelmed' or 'they don't trust the data' feels descriptive, but it hides the important question of what, exactly, we can see them doing that signals the problem.
If you can’t film it happening, if you can’t watch the behavior occur in real time, then you don’t actually have a problem definition you can design around. Observable frustration might be the user jumping between four screens, copying and pasting the same value into different systems, or re-running a query five times because something feels off even though they can’t articulate why. Those concrete behaviors are what allow teams to converge and say, ‘Yes, that’s the thing, that is the friction we agree must change,’ and that shift from interpretation to observation becomes the foundation for better design, better decision-making, and far less wasted effort. And once you anchor the conversation in visible behavior, you eliminate so many circular debates and give everyone, from engineering to leadership, a shared starting point that’s grounded in reality instead of theory." __ One of the reasons that measuring the usability/utility/satisfaction of your product’s UX might seem hard is that you don’t have a baseline definition of how satisfactory (or not) the product is right now. As such, it’s very hard to tell if you’re just making product changes—or you’re making improvements that might make the product worth paying for at all, worth paying more for, or easier to buy. "It’s surprisingly common for teams to claim they’re improving something when they’ve never taken the time to document what the current state even looks like. If you want to create a meaningful improvement, something a user actually feels, you need to understand the baseline level of friction they tolerate today, not what you imagine that friction might be. Establishing a baseline is not glamorous work, but it’s the work that prevents you from building changes that make sense on paper but do nothing to the real flow of work. When you diagram the existing workflow, when you map the sequence of steps the user actually takes, the mismatches between your mental model and their lived experience become crystal clear, and the design direction becomes far less ambiguous. That act of grounding yourself in the current state allows every subsequent decision, prioritizing fixes, determining scope, measuring progress, to be aligned with reality rather than assumptions. And without that baseline, you risk designing solutions that float in conceptual space, disconnected from the very pains you claim to be addressing." __ Prototypes are a great way to learn—if you’re actually treating them as a means to learn, and not a product you intend to deliver regardless of the feedback customers give you. "People often think prototyping is about validating whether their solution works, but the deeper purpose is to refine the problem itself. Once you put even a rough prototype in front of someone and watch what they do with it, you discover the edges of the problem more accurately than any conversation or meeting can reveal. Users will click in surprising places, ignore the part you thought mattered most, or reveal entirely different frictions just by trying to interact with the thing you placed in front of them. That process doesn’t just improve the design, it improves the team’s understanding of which parts of the problem are real and which parts were just guesses. Prototyping becomes a kind of externalization of assumptions, forcing you to confront whether you’re solving the friction that actually holds back the flow of work or a friction you merely predicted. 
And every iteration becomes less about perfecting the interface and more about sharpening the clarity of the underlying problem, which is why the teams that prototype early tend to build faster, with better alignment, and far fewer detours." __ Most founders and data people tend to measure UX quality by “counting usage” of their solution. Tracking usage stats, analytics on sessions, etc. The problem with this is that it tells you nothing useful about whether people are satisfied (“meets spec”) or delighted (“a product they can’t live without”). These are product metrics—but they don’t reflect how people feel. There are better measurements to use for evaluating users’ experience that go beyond “willingness to pay.” Payment is great, but in B2B products, buyers aren’t always users—and we’ve all bought something based on the promise of what it would do for us, but the promise fell short. "In B2B analytics and AI products, the biggest challenge isn’t complexity, it’s ambiguity around what outcome the product is actually responsible for changing. Teams often define success in terms of internal goals like ‘adoption,’ ‘usage,’ or ‘efficiency,’ but those metrics don’t tell you what the user’s experience is supposed to look like once the product is working well. A product tied to vague business outcomes tends to drift because no one agrees on what the improvement should feel like in the user’s real workflow. What you want are visible, measurable, user-centric outcomes, outcomes that describe how the user’s behavior or experience will change once the solution is in place, down to the concrete actions they’ll no longer need to take. When you articulate outcomes at that level, it forces the entire organization to align around a shared target, reduces the scope bloat that normally plagues enterprise products, and gives you a way to evaluate whether you’re actually removing friction rather than just adding more layers of tooling. And ironically, the clearer the user outcome is, the easier it becomes to achieve the business outcome, because the product is no longer floating in abstraction, it’s anchored in the lived reality of the people who use it." Links Listen to part one: Episode 182 Schedule a Design-Eyes Assessment with me and get clarity, now. |
Experiencing Data w/ Brian T. O’Neill (AI & data product management leadership—powered by UX design) |
|
Oct 30 - AI, ML and Computer Vision Meetup
2025-10-30 · 16:00
Join the virtual Meetup to hear talks from experts on cutting-edge topics across AI, ML, and computer vision.
Date, Time and Location: Oct 30, 2025, 9 AM Pacific, online. Register for the Zoom!
The Agent Factory: Building a Platform for Enterprise-Wide AI Automation
In this talk we will explore what it takes to build an enterprise-ready AI automation platform at scale. The topics covered will include:
About the Speaker
Virender Bhargav is a seasoned engineering leader at Flipkart whose expertise spans business technology integration, enterprise applications, system design/architecture, and building highly scalable systems. With a deep understanding of technology, he has spearheaded teams, modernized technology landscapes, and managed core platform layers and strategic products. With extensive experience driving innovation at companies like Paytm and Flipkart, his contributions have left a lasting impact on the industry.
Scaling Generative Models at Scale with Ray and PyTorch
Generative image models like Stable Diffusion have opened up exciting possibilities for personalization, creativity, and scalable deployment. However, fine-tuning them in production-grade settings poses challenges: managing compute, hyperparameters, model size, data, and distributed coordination is nontrivial. In this talk, we'll dive deep into how to fine-tune Stable Diffusion models using Ray Train (with HuggingFace Diffusers), including approaches like DreamBooth and LoRA (a minimal Ray Train + LoRA sketch follows this listing). We'll cover what works (and what doesn't) in scaling out training jobs, handling large data, optimizing for GPU memory and speed, and validating outputs. Attendees will come away with practical insights and patterns they can use to fine-tune generative models in their own work.
About the Speaker
Suman Debnath is a Technical Lead (ML) at Anyscale, where he focuses on distributed training, fine-tuning, and inference optimization at scale on the cloud. His work centers on building and optimizing end-to-end machine learning workflows powered by distributed computing frameworks like Ray, enabling scalable and efficient ML systems. Suman's expertise spans Natural Language Processing (NLP), Large Language Models (LLMs), and Retrieval-Augmented Generation (RAG). Earlier in his career, he developed performance benchmarking and monitoring tools for distributed storage systems. Beyond engineering, Suman is an active community contributor, having spoken at over 100 global conferences and events, including PyCon, PyData, ODSC, AIE, and numerous meetups worldwide.
Privacy-preserving Computer Vision through Optics Learning
Cameras are now ubiquitous, powering computer vision systems that assist us in everyday tasks and critical settings such as operating rooms. Yet their widespread use raises serious privacy concerns: traditional cameras are designed to capture high-resolution images, making it easy to identify sensitive attributes such as faces, nudity, or personal objects. Once acquired, such data can be misused if accessed by adversaries. Existing software-based privacy mechanisms, such as blurring or pixelation, often degrade task performance and leave vulnerabilities in the processing pipeline. In this talk, we explore an alternative question: how can we preserve privacy before or during image acquisition? By revisiting the image formation model, we show how camera optics themselves can be learned and optimized to acquire images that are unintelligible to humans yet remain useful for downstream vision tasks like action recognition. We will discuss recent approaches to learning camera lenses that intentionally produce privacy-preserving images, blurry and unrecognizable to the human eye, but still effective for machine perception. This paradigm shift opens the door to a new generation of cameras that embed privacy directly into their hardware design.
About the Speaker
Carlos Hinojosa is a Postdoctoral researcher at King Abdullah University of Science and Technology (KAUST) working with Prof. Bernard Ghanem. His research interests span Computer Vision, Machine Learning, AI Safety, and AI for Science. He focuses on developing safe, accurate, and efficient vision systems and machine-learning models that can reliably perceive, understand, and act on information, while ensuring robustness, protecting privacy, and aligning with societal values.
It's a (Blind) Match! Towards Vision-Language Correspondence without Parallel Data
Can we match vision and language embeddings without any supervision? According to the platonic representation hypothesis, as model and dataset scales increase, distances between corresponding representations are becoming similar in both embedding spaces. Our study demonstrates that pairwise distances are often sufficient to enable unsupervised matching, allowing vision-language correspondences to be discovered without any parallel data (a toy pairwise-distance matching sketch also follows this listing).
About the Speaker
Dominik Schnaus is a third-year Ph.D. student in the Computer Vision Group at the Technical University of Munich (TUM), supervised by Daniel Cremers. His research centers on multimodal and self-supervised learning with a special emphasis on understanding similarities across embedding spaces of different modalities. |
Oct 30 - AI, ML and Computer Vision Meetup
|
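The Ray and PyTorch talk above is a how-to on distributed LoRA/DreamBooth fine-tuning of Stable Diffusion. The sketch below shows the general shape of that recipe; it is not the speaker's code. The base checkpoint, LoRA rank, worker count, and the random placeholder batch are all assumptions made for the example, and a real run would stream actual (image, caption) pairs (for instance via Ray Data) and report checkpoints back to Ray Train.

```python
# Sketch: LoRA fine-tuning of a Stable Diffusion UNet with Ray Train + Diffusers + PEFT.
# Assumptions (not from the talk): the base checkpoint below, LoRA rank 8, two workers,
# and random tensors standing in for a real (image, caption) dataset.
import torch
import torch.nn.functional as F
from diffusers import AutoencoderKL, DDPMScheduler, UNet2DConditionModel
from transformers import CLIPTextModel, CLIPTokenizer
from peft import LoraConfig, get_peft_model
from ray.train import ScalingConfig
from ray.train.torch import TorchTrainer

BASE = "runwayml/stable-diffusion-v1-5"  # assumed base model

def train_loop_per_worker(config):
    device = "cuda" if torch.cuda.is_available() else "cpu"
    tokenizer = CLIPTokenizer.from_pretrained(BASE, subfolder="tokenizer")
    text_encoder = CLIPTextModel.from_pretrained(BASE, subfolder="text_encoder").to(device)
    vae = AutoencoderKL.from_pretrained(BASE, subfolder="vae").to(device)
    unet = UNet2DConditionModel.from_pretrained(BASE, subfolder="unet").to(device)
    scheduler = DDPMScheduler.from_pretrained(BASE, subfolder="scheduler")

    # LoRA adapters on the UNet attention projections; the rest of the model stays frozen.
    unet = get_peft_model(unet, LoraConfig(r=8, lora_alpha=16,
                                           target_modules=["to_q", "to_k", "to_v", "to_out.0"]))
    optimizer = torch.optim.AdamW(unet.parameters(), lr=1e-4)

    for _ in range(config["steps"]):
        # Placeholder batch; a DreamBooth-style run would load photos of one subject
        # paired with a rare-token prompt like the one below.
        images = torch.randn(2, 3, 512, 512, device=device)
        prompts = ["a photo of a sks backpack"] * 2

        with torch.no_grad():
            latents = vae.encode(images).latent_dist.sample() * vae.config.scaling_factor
            ids = tokenizer(prompts, padding="max_length", truncation=True,
                            max_length=tokenizer.model_max_length,
                            return_tensors="pt").input_ids.to(device)
            text_emb = text_encoder(ids).last_hidden_state

        noise = torch.randn_like(latents)
        t = torch.randint(0, scheduler.config.num_train_timesteps,
                          (latents.shape[0],), device=device)
        noisy = scheduler.add_noise(latents, noise, t)
        pred = unet(noisy, t, encoder_hidden_states=text_emb).sample  # predict the added noise
        loss = F.mse_loss(pred, noise)
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

# Ray Train launches the loop on each GPU worker; for true data-parallel training you would
# also wrap the model with ray.train.torch.prepare_model and shard the dataset across workers.
TorchTrainer(
    train_loop_per_worker,
    train_loop_config={"steps": 10},
    scaling_config=ScalingConfig(num_workers=2, use_gpu=True),
).fit()
```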
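The "It's a (Blind) Match!" abstract claims that within-modality pairwise distances are often enough to align vision and language embeddings without parallel data. The toy sketch below illustrates that generic idea only, not the paper's actual method: it casts blind matching as a quadratic assignment problem over the two distance matrices, using synthetic embeddings whose dimensions and construction are made up for the example.

```python
# Toy illustration: recover vision-text correspondences from pairwise distances alone.
# Synthetic data; the shared-latent construction below is an assumption for the demo.
import numpy as np
from scipy.optimize import quadratic_assignment
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)
n, d_latent, d_img, d_txt = 40, 32, 512, 768

# Shared latent "concepts", mapped into two different modality spaces.
latent = rng.normal(size=(n, d_latent))
img_emb = latent @ rng.normal(size=(d_latent, d_img))
txt_emb = latent @ rng.normal(size=(d_latent, d_txt))

perm = rng.permutation(n)        # unknown correspondence we want to recover
txt_emb = txt_emb[perm]

D_img = cdist(img_emb, img_emb)  # pairwise distances within each modality
D_txt = cdist(txt_emb, txt_emb)

# Find the permutation that makes the two distance geometries agree, i.e. minimize
# ||D_img - P D_txt P^T||_F; equivalently maximize the trace term of the QAP.
res = quadratic_assignment(D_img, D_txt, method="faq", options={"maximize": True})
matched = res.col_ind            # image i is matched to text row matched[i]
accuracy = np.mean(matched == np.argsort(perm))
print(f"recovered {accuracy:.0%} of the correspondences from distances alone")
```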
|
Address: 122 Fifth Avenue, New York, NY 10011
Come join us for our first in-person event at the NEW Microsoft office on 5th Avenue, Manhattan (not the Times Square location), to recap Summit for those who were there and for those who were not able to make it! We have an awesome DUG session led by Peter Joeckel & Kati Hvidtfeldt, both of whom have been in the Microsoft ecosystem for years!
Session 1: Best practices for the Budgeting and Discovery stages in selecting an ERP & CRM
Time: 12:45 pm - 2:00 pm
Key Topics Covered:
Insights & Takeaways:
What's different about this session: if you can convince your bosses to take the full afternoon off to join us, we have a second session in the afternoon on The Ultimate Executives Guide for D365 Implementation.
Session 2: The Ultimate Executives Guide for D365 Implementation
Time: 3:00 pm - 5:00 pm
Key Topics Covered
Insights & Takeaways
You can attend just the Dynamics User Group session, or join us for both! Refreshments: lunch will be served at the Dynamics User Group session, and a happy hour will follow the second session! |
Best practices for the Budgeting and Discovery stages in selecting an ERP & CRM
|
|
Building Enterprise AI That Works
2025-10-15 · 23:00
Topic: Building the Next Generation of Enterprise AI: From Intelligent Automation to Document Search with RAG
Description: The promise of AI is here, but how do we move from hype to tangible business value? Organizations today are drowning in unstructured data and slowed by complex manual workflows. The next generation of enterprise AI offers a powerful solution, capable of not just automating tasks but understanding, reasoning, and interacting with information in unprecedented ways. Join Bibin Prathap, a Microsoft MVP for AI and a seasoned AI & Analytics Leader, for a deep dive into the practical architecture and application of modern enterprise AI. Drawing from his hands-on experience building an AI-driven workflow automation platform and a generative AI document explorer, Bibin will demystify the core technologies transforming the modern enterprise (a minimal document-retrieval sketch for the RAG portion follows this listing). This session will provide a technical roadmap for building impactful, scalable, and intelligent systems.
What You Will Learn:
Who Should Attend: This session is designed for AI Engineers, Data Scientists, Software Architects, Developers, and Tech Leaders who are responsible for implementing AI solutions and driving digital transformation.
ABOUT US
WeCloudData is the leading accredited education institute in North America that focuses on Data Science, Data Engineering, DevOps, Artificial Intelligence, and Business Intelligence. Developed by industry experts and hiring managers, and highly recognized by our hiring partners, WeCloudData's learning paths have helped many students make successful transitions into data and DevOps roles that fit their backgrounds and passions. WeCloudData provides a practical teaching methodology, so that students not only learn technical skills but also acquire the soft skills that will make them stand out in a work environment. WeCloudData has also partnered with many big companies to help them adopt the latest tech in Data, AI, and DevOps. Visit our website for more information: https://weclouddata.com |
Building Enterprise AI That Works
|
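The session above mentions a generative AI document explorer built on retrieval-augmented generation (RAG). The sketch below shows only the retrieval-and-prompt step of that pattern, not the speaker's platform: the policy snippets and the TF-IDF ranker are placeholders for a real embedding model and vector store, and the final prompt would be sent to whatever LLM the system uses.

```python
# Minimal RAG retrieval sketch: rank documents for a query, then build a grounded prompt.
# TF-IDF stands in for an embedding model + vector database; documents are made up.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "Expense reports must be submitted within 30 days of purchase.",
    "New hires complete security training during their first week.",
    "The travel policy caps hotel rates at 250 USD per night.",
]

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    vec = TfidfVectorizer().fit(documents + [query])
    doc_matrix, query_matrix = vec.transform(documents), vec.transform([query])
    scores = cosine_similarity(query_matrix, doc_matrix)[0]
    return [documents[i] for i in scores.argsort()[::-1][:k]]

query = "How long do I have to file an expense report?"
context = "\n".join(retrieve(query))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
print(prompt)  # in a full system, this prompt is passed to the LLM endpoint
```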
|
Sept 12 - Visual AI in Manufacturing and Robotics (Day 3)
2025-09-12 · 19:00
Join us for day three in a series of virtual events to hear talks from experts on the latest developments at the intersection of Visual AI, Manufacturing and Robotics.
Date and Time: Sept 12 at 9 AM Pacific. Location: Virtual. Register for the Zoom!
Towards Robotics Foundation Models that Can Reason
In recent years, we have witnessed remarkable progress in generative AI, particularly in language and visual understanding and generation. This leap has been fueled by unprecedentedly large image–text datasets and the scaling of large language and vision models trained on them. Increasingly, these advances are being leveraged to equip and empower robots with open-world visual understanding and reasoning capabilities. Yet, despite these advances, scaling such models for robotics remains challenging due to the scarcity of large-scale, high-quality robot interaction data, limiting their ability to generalize and truly reason about actions in the real world. Nonetheless, promising results are emerging from using multimodal large language models (MLLMs) as the backbone of robotic systems, especially in enabling the acquisition of low-level skills required for robust deployment in everyday household settings. In this talk, I will present three recent works that aim to bridge the gap between rich semantic world knowledge in MLLMs and actionable robot control. I will begin with AHA, a vision-language model that reasons about failures in robotic manipulation and improves the robustness of existing systems. Building on this, I will introduce SAM2Act, a 3D generalist robotic model with a memory-centric architecture capable of performing high-precision manipulation tasks while retaining and reasoning over past observations. Finally, I will present MolmoAct, AI2's flagship robotic foundation model for action reasoning, designed as a generalist system that can be post-trained for a wide range of downstream manipulation tasks.
About the Speaker
Jiafei Duan is a Ph.D. candidate in Computer Science & Engineering at the University of Washington, advised by Professors Dieter Fox and Ranjay Krishna. His research focuses on foundation models for robotics, with an emphasis on developing scalable data collection and generation methods, grounding vision-language models in robotic reasoning, and advancing robust generalization in robot learning. His work has been featured in MIT Technology Review, GeekWire, VentureBeat, and Business Wire.
Beyond Academic Benchmarks: Critical Analysis and Best Practices for Visual Industrial Anomaly Detection
In this talk, I will share our recent research efforts in visual industrial anomaly detection. It will present a comprehensive empirical analysis with a focus on real-world applications, demonstrating that recent SOTA methods perform worse than methods from 2021 when evaluated on a variety of datasets. We will also investigate how different practical aspects, such as input size, distribution shift, data contamination, and having a validation set, affect the results.
About the Speaker
Aimira Baitieva is a Research Engineer at Valeo, where she works primarily on computer vision problems. Her recent work has been focused on deep learning anomaly detection for automating visual inspection, incorporating both research and practical applications in the manufacturing sector.
The Digital Reasoning Thread in Manufacturing: Orchestrating Vision, Simulation, and Robotics
Manufacturing is entering a new phase where AI is no longer confined to isolated tasks like defect detection or predictive maintenance. Advances in reasoning AI, simulation, and robotics are converging to create end-to-end systems that can perceive, decide, and act – in both digital and physical environments. This talk introduces the Digital Reasoning Thread – a consistent layer of AI reasoning that runs through every stage of manufacturing, connecting visual intelligence, digital twins, simulation environments, and robotic execution. By linking perception with advanced reasoning and action, this approach enables faster, higher-quality decisions across the entire value chain. We will explore real-world examples of applying reasoning AI in industrial settings, combining simulation-driven analysis, orchestration frameworks, and the foundations needed for robotic execution in the physical world. Along the way, we will examine the key technical building blocks – from data pipelines and interoperability standards to agentic AI architectures – that make this level of integration possible. Attendees will gain a clear understanding of how to bridge AI-driven perception with simulation and robotics, and what it takes to move from isolated pilots to orchestrated, autonomous manufacturing systems.
About the Speaker
Vlad Larichev is an Industrial AI Lead at Accenture Industry X, specializing in applying AI, generative AI, and agentic AI to engineering, manufacturing, and large-scale industrial operations. With a background as an engineer, solution architect, and software developer, he has led AI initiatives across sectors including automotive, energy, and consumer goods, integrating advanced analytics, computer vision, and simulation into complex industrial environments. Vlad is the creator of the Digital Reasoning Thread – a framework for connecting AI reasoning across visual intelligence, simulation, and physical execution. He is an active public speaker, podcast host, and community builder, sharing practical insights on scaling AI from pilot projects to enterprise-wide adoption.
The Road to Useful Robots
This talk explores the current state of AI-enabled robots and the issues with deploying more advanced models on constrained hardware, including limited compute and power budgets. It then moves on to what's next for developing useful, intelligent robots.
About the Speaker
Michael Hart, also known as Mike Likes Robots, is a robotics software engineer and content creator. His mission is to share knowledge to accelerate robotics. @mikelikesrobots |
Sept 12 - Visual AI in Manufacturing and Robotics (Day 3)
|
|
July 24 - Women in AI
2025-07-24 · 16:00
Hear talks from experts on cutting-edge topics in AI, ML, and computer vision! When Jul 24, 2025 at 9 - 11 AM Pacific Where Online. Register for the Zoom Exploring Vision-Language-Action (VLA) Models: From LLMs to Embodied AI This talk will explore the evolution of foundation models, highlighting the shift from large language models (LLMs) to vision-language models (VLMs), and now to vision-language-action (VLA) models. We'll dive into the emerging field of robot instruction following—what it means, and how recent research is shaping its future. I will present insights from my 2024 work on natural language-based robot instruction following and connect it to more recent advancements driving progress in this domain. About the Speaker Shreya Sharma is a Research Engineer at Reality Labs, Meta, where she works on photorealistic human avatars for AR/VR applications. She holds a bachelor’s degree in Computer Science from IIT Delhi and a master’s in Robotics from Carnegie Mellon University. Shreya is also a member of the inaugural 2023 cohort of the Quad Fellowship. Her research interests lie at the intersection of robotics and vision foundation models. Farming with CLIP: Foundation Models for Biodiversity and Agriculture Using open-source tools, we will explore the power and limitations of foundation models in agriculture and biodiversity applications. Leveraging the BIOTROVE dataset, the largest publicly accessible biodiversity dataset curated from iNaturalist, we will showcase real-world use cases powered by vision-language models trained on 40 million captioned images. We focus on understanding zero-shot capabilities, taxonomy-aware evaluation, and data-centric curation workflows. We will demonstrate how to visualize, filter, evaluate, and augment data at scale. This session includes practical walkthroughs on embedding visualization with CLIP, dataset slicing by taxonomic hierarchy, identification of model failure modes, and building fine-tuned pest and crop monitoring models. Attendees will gain insights into how to apply multi-modal foundation models for critical challenges in agriculture, like ecosystem monitoring in farming. About the Speaker Paula Ramos has a PhD in Computer Vision and Machine Learning, with more than 20 years of experience in the technological field. She has been developing novel integrated engineering technologies, mainly in Computer Vision, robotics, and Machine Learning applied to agriculture, since the early 2000s in Colombia. During her PhD and Postdoc research, she deployed multiple low-cost, smart edge & IoT computing technologies that can be operated by users, such as farmers, without expertise in computer vision systems. The central objective of Paula’s research has been to develop intelligent systems/machines that can understand and recreate the visual world around us to solve real-world needs, such as those in the agricultural industry. Multi-modal AI in Medical Edge and Client Device Computing In this live demo, we explore the transformative potential of multi-modal AI in medical edge and client device computing, focusing on real-time inference on a local AI PC. Attendees will witness how users can upload medical images, such as X-rays, and ask the AI model questions about the images. Inference is executed locally on Intel's integrated GPU and NPU using OpenVINO, enabling developers without deep AI experience to create generative AI applications. 
About the Speaker Helena Klosterman is an AI Engineer at Intel, based in the Netherlands. She enables organizations to unlock the potential of AI with OpenVINO, Intel's AI inference runtime. She is passionate about democratizing AI, developer experience, and bridging the gap between complex AI technology and practical applications. The Business of AI The talk will focus on the importance of clearly defining a specific problem and use case, how to quantify the potential benefits of an AI solution in measurable outcomes, how to evaluate technical feasibility given the challenges and limitations of implementing an AI solution, and how to envision the future of enterprise AI. About the Speaker Milica Cvetkovic is an AI engineer and consultant driving the development and deployment of production-ready AI systems for diverse organizations. Her expertise spans custom machine learning, generative AI, and AI operationalization. With degrees in mathematics and statistics, she possesses a decade of experience in education and edtech, including curriculum design and machine learning instruction for technical and non-technical audiences. Prior to joining Google, Milica held a data scientist role in biotechnology and has a proven track record of advising startups, demonstrating a deep understanding of AI's practical application. |
July 24 - Women in AI
|