talk-data.com
Activities & events
Session #17: Google AI Seminar (Virtual)
2025-05-08 · 16:00
Important: Register on the event website to receive the joining link (RSVPs on Meetup will NOT receive the joining link).
Description: Welcome to the weekly AI virtual seminars, in collaboration with Google. Join us for deep-dive tech talks on AI/ML/Data, hands-on code labs, workshops, and networking with speakers and fellow developers from all over the world.
May 8: AI Seminar (Virtual S17): Google Gemini and Vertex AI
More upcoming sessions:
Local and Global AI Community on Discord: Join us on Discord for the local and global AI tech community.
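
For readers curious what the Gemini and Vertex AI topic involves in practice, below is a minimal sketch of calling a Gemini model through the Vertex AI Python SDK (the google-cloud-aiplatform package). The project ID, region, and model name are placeholders, and the seminar itself may use different tooling.

```python
import vertexai
from vertexai.generative_models import GenerativeModel

# Assumes a GCP project with the Vertex AI API enabled and
# application-default credentials configured locally.
vertexai.init(project="your-gcp-project", location="us-central1")  # placeholders

model = GenerativeModel("gemini-1.5-flash")  # model name is illustrative
response = model.generate_content(
    "In two sentences, explain how Vertex AI relates to the Gemini models."
)
print(response.text)
```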

April 24, 2025 - AI, Machine Learning and Computer Vision Meetup
2025-04-24 · 17:00
This is a virtual event.

Towards a Multimodal AI Agent that Can See, Talk and Act
The development of multimodal AI agents marks a pivotal step toward creating systems capable of understanding, reasoning, and interacting with the world in human-like ways. Building such agents requires models that not only comprehend multi-sensory observations but also act adaptively to achieve goals within their environments. In this talk, I will present my research journey toward this grand goal across three key dimensions. First, I will explore how to bridge the gap between core vision understanding and multimodal learning through unified frameworks at various granularities. Next, I will discuss connecting vision-language models with large language models (LLMs) to create intelligent conversational systems. Finally, I will delve into recent advancements that extend multimodal LLMs into vision-language-action models, forming the foundation for general-purpose robotics policies. To conclude, I will highlight ongoing efforts to develop agentic systems that integrate perception with action, enabling them not only to understand observations but also to take meaningful actions within a single system. Together, these lead to an aspiration of building the next generation of multimodal AI agents capable of seeing, talking, and acting across diverse scenarios in both digital and physical worlds.
About the Speaker: Jianwei Yang is a Principal Researcher at Microsoft Research (MSR), Redmond. His research focuses on the intersection of vision and multimodal learning, with an emphasis on bridging core vision tasks with language, building general-purpose and promptable multimodal models, and enabling these models to take meaningful actions in both virtual and physical environments.

ConceptAttention: Interpreting the Representations of Diffusion Transformers
Recently, diffusion transformers have taken over as the state-of-the-art model class for both image and video generation. However, similar to many existing deep learning architectures, their high-dimensional hidden representations are difficult to understand and interpret. This lack of interpretability is a barrier to their controllability and safe deployment. We introduce ConceptAttention, an approach to interpreting the representations of diffusion transformers. Our method allows users to create rich saliency maps depicting the location and intensity of textual concepts. Our approach exposes how a diffusion model “sees” a generated image and notably requires no additional training. ConceptAttention improves upon widely used approaches like cross-attention maps for isolating the location of visual concepts, and it even generalizes to real-world (not just generated) images and to video generation models. Our work improves the community’s understanding of how diffusion models represent data and has numerous potential applications, such as image editing.
About the Speaker: Alec Helbling is a PhD student at Georgia Tech. His research focuses on improving the interpretability and controllability of generative models, particularly for image generation. His research is application focused, and he has interned at a variety of industrial research labs, including Adobe Firefly, IBM Research, and NASA Jet Propulsion Lab. He also has a passion for creating explanatory videos about interesting machine learning and mathematical concepts.

RelationField: Relate Anything in Radiance Fields
Neural radiance fields recently emerged as a 3D scene representation, extended by distilling open-vocabulary features from vision-language models. Current methods focus on object-centric tasks, leaving semantic relationships largely unexplored. We propose RelationField, the first method to extract inter-object relationships directly from neural radiance fields, using pairs of rays for implicit relationship queries. RelationField distills relationship knowledge from multi-modal LLMs. Evaluated on open-vocabulary 3D scene graph generation and relationship-guided instance segmentation, RelationField achieves state-of-the-art performance.
About the Speaker: Sebastian Koch is a PhD student at Ulm University and the Bosch Center for Artificial Intelligence, supervised by Timo Ropinski of Ulm University. His main research interest lies at the intersection of computer vision and robotics. The goal of his PhD is to develop 3D scene representations of the real world that are valuable for robots to navigate and solve tasks within their environment.

RGB-X Model Development: Exploring Four Channel ML Workflows
Machine learning is rapidly becoming multimodal. With many models in computer vision expanding to areas like vision and 3D, one area that has also quietly been advancing rapidly is RGB-X data, such as infrared, depth, or normals. In this talk we will cover some of the leading models in this exploding field of visual AI and show best practices for working with these complex data formats.
About the Speaker: Daniel Gural is a seasoned Machine Learning Evangelist with a strong passion for empowering data scientists and ML engineers to unlock the full potential of their data. Currently a member of Voxel51, he takes a leading role in bridging the gap between practitioners and the tools they need, enabling them to achieve exceptional outcomes. Daniel’s extensive experience in teaching and developing within the ML field has fueled his commitment to democratizing high-quality AI workflows for a wider audience.
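
As a rough illustration of the "four channel" idea behind the RGB-X talk (not the speaker's code), an aligned depth, infrared, or normal map can be stacked onto an RGB image as an extra channel before being fed to a model; the file names below are placeholders.

```python
import numpy as np
from PIL import Image

# Placeholder paths: any aligned RGB image and single-channel map
# (depth, infrared, ...) of the same resolution would work.
rgb = np.asarray(Image.open("frame_rgb.png").convert("RGB"), dtype=np.float32) / 255.0
extra = np.asarray(Image.open("frame_depth.png").convert("L"), dtype=np.float32) / 255.0

# Stack the extra modality as a fourth channel: shape (H, W, 4) instead of (H, W, 3).
rgbx = np.concatenate([rgb, extra[..., None]], axis=-1)
print(rgbx.shape)

# A model pretrained on 3-channel inputs usually needs its first layer widened
# (for example, initializing the new channel's weights from the mean of the RGB
# filters) before it can consume this 4-channel tensor.
```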

Building a state-of-the-art AI web researcher
2025-04-17 · 21:30
Boris Toledano – COO and Co-founder @ Linkup
In this session, we'll discuss the next-generation search infrastructure that gives AI agents seamless access to web information and hard-to-find intelligence. Traditional methods can't handle these new workflows, and legacy search engines, designed for human attention, aren't built for these emerging AI use cases. We will address: a) the power of web search for LLM-based applications; b) the need to avoid scraping legacy search engines; c) how we're building a new category of "searcher" models; and d) what you can power with a web retrieval engine, including demos.
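
To make the "web retrieval for LLMs" pattern concrete, here is a generic retrieve-then-answer sketch; `search_web` and `ask_llm` are hypothetical stand-ins, not Linkup's actual API.

```python
from dataclasses import dataclass


@dataclass
class SearchResult:
    title: str
    url: str
    snippet: str


def search_web(query: str, top_k: int = 5) -> list[SearchResult]:
    """Placeholder for a retrieval engine built for machine consumers."""
    raise NotImplementedError


def ask_llm(prompt: str) -> str:
    """Placeholder for any chat-completion call."""
    raise NotImplementedError


def answer_with_web_context(question: str) -> str:
    # Retrieve a handful of sources, then ground the LLM's answer in them.
    results = search_web(question)
    context = "\n\n".join(f"[{r.url}] {r.snippet}" for r in results)
    prompt = (
        "Answer the question using only the sources below, citing URLs.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )
    return ask_llm(prompt)
```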

Building a Self-Improving Agent
2025-04-17 · 21:30
John Gilhuly – Arize AI
Agents are powerful, but without feedback they're flying blind. In this talk, we'll walk through how to build self-improving agents by closing the loop with evaluation, experimentation, tracing, and prompt optimization. You'll learn how to capture the right telemetry, run meaningful tests, and apply insights in a way that actually improves performance over time. Whether you're building copilots, chatbots, or autonomous workflows, this session will give you the practical tools and architecture patterns you need to make your agents smarter, automatically.
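
A minimal sketch of the closed loop the talk describes: run an agent over a small evaluation set, score the outputs, and keep the best-scoring prompt variant. `run_agent` and `score` are hypothetical placeholders; a real setup would add tracing and richer evaluators.

```python
def run_agent(prompt: str, task: str) -> str:
    """Placeholder: call your agent / LLM with the given system prompt."""
    raise NotImplementedError


def score(task: str, output: str) -> float:
    """Placeholder: e.g. an LLM-as-judge or exact-match evaluator."""
    raise NotImplementedError


def pick_best_prompt(prompt_variants: list[str], eval_tasks: list[str]) -> str:
    # Evaluate each candidate prompt on the same tasks and keep the winner.
    def avg_score(prompt: str) -> float:
        outputs = [run_agent(prompt, t) for t in eval_tasks]
        return sum(score(t, o) for t, o in zip(eval_tasks, outputs)) / len(eval_tasks)

    return max(prompt_variants, key=avg_score)
```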

AI Meetup (April): Agentic AI
2025-04-17 · 21:30
**Important**: Due to room capacity and building security, you must register on the event website for admission.
Description: Welcome to the GenAI meetup in New York City. Join us for deep-dive tech talks on AI, GenAI, LLMs, and machine learning, food and drinks, and networking with speakers and fellow developers.
Agenda:
* 5:30pm–6:00pm: Check-in, food, and networking
* 6:00pm–6:10pm: Welcome / community update
* 6:10pm–8:00pm: Tech talks
* 8:00pm: Q&A, open discussion
Tech Talk: Building a state-of-the-art AI web researcher
Speaker: Boris Toledano (COO and Co-founder of Linkup)
Abstract: In this session, we'll discuss the next-generation search infrastructure that gives AI agents seamless access to web information and hard-to-find intelligence. Traditional methods can't handle these new workflows, and legacy search engines, designed for human attention, aren't built for these emerging AI use cases. We will address: a) the power of web search for LLM-based applications; b) the need to avoid scraping legacy search engines; c) how we're building a new category of "searcher" models; and d) what you can power with a web retrieval engine, including demos.
Tech Talk: Building a Self-Improving Agent
Speaker: John Gilhuly (Arize AI)
Abstract: Agents are powerful, but without feedback they're flying blind. In this talk, we'll walk through how to build self-improving agents by closing the loop with evaluation, experimentation, tracing, and prompt optimization. You'll learn how to capture the right telemetry, run meaningful tests, and apply insights in a way that actually improves performance over time. Whether you're building copilots, chatbots, or autonomous workflows, this session will give you the practical tools and architecture patterns you need to make your agents smarter, automatically.
Speakers and Topics: Stay tuned as we update speakers and schedules. If you have a keen interest in speaking to our community, we invite you to submit topics for consideration: Submit Topics
Sponsors: We are actively seeking sponsors to support our community, whether by offering venue space, providing food and drinks, or contributing cash sponsorship. Sponsors not only speak at the meetups and receive prominent recognition, but also gain exposure to our extensive membership base of 20,000+ AI developers in New York and 500K+ worldwide.
Local and Global AI Community on Discord: Join us on Discord for the local and global AI tech community.

Global AI Bootcamp {Berlin} | In-person
2025-04-11 · 11:00
Global AI Bootcamp Berlin 2025 · For Community, By the Community
Join us for Global AI Bootcamp Berlin 2025, a dedicated space for tech and AI enthusiasts, developers, and professionals to learn, share, and collaborate. This event is part of a global initiative bringing together AI experts and learners to explore the latest innovations, best practices, and real-world applications of Artificial Intelligence on Microsoft Azure.
Event Details
Pre-recorded Keynote Session: Hear from Scott Hanselman, Guido van Rossum, Jennifer Marsman, and Sarah Bird as they discuss AI’s impact on development, Python’s role, and the importance of ethical AI.
In-person Closing Keynote: Hear from Christian Heilmann, VP of DevRel at WeAreDevelopers, as he explores “Vibe coding, creativity, craft and professionalism – are we making ourselves redundant?”, a thought-provoking session on the evolving role of developers in the age of AI.
Speakers & Sessions at Global AI Bootcamp Berlin 2025
What to Expect?
Agenda | Event Schedule
(Agenda subject to updates.)
Global AI Bootcamp: The Global AI Community connects over 60,000 AI enthusiasts worldwide, fostering learning, collaboration, and skill development. The Global AI Bootcamp is a free, community-driven event dedicated to exploring AI's transformative potential, with a focus on Azure AI and Copilots.
Partners, Friends and Communities
Crew and Team
**Important Notice**: Photos and videos will be taken during the event for community highlights and social media. If you prefer not to be photographed, please inform the organizers upon arrival.
Register & Learn More
📌 Claim your Digital Event Badge Code: MRLFHW
📌 Register for the event
📌 Meetup Event Page
#GlobalAIBootcamp | globalai.community

AI Seminar #2: GenAI and AI Agent with Google and Intel
2025-03-19 · 16:00
Important: RSVP here to receive the joining link (RSVPs on Meetup will NOT receive the joining link).
Join us for a series of AI-focused webinars designed to enhance your development skills, accelerate productivity, and explore the latest AI innovations. Whether you're building local LLMs, optimizing AI workflows, or deploying intelligent AI agents, these sessions, led by industry experts, will provide invaluable insights and hands-on experience. Each session is tailored to different skill levels, from novice to advanced developers, offering deep technical insights and real-world applications. Register for each of the sessions:
Session #2: Building and Deploying AI Agents with OPEA
Speakers: Alex Sin (Intel), Louie Tsai (Intel)
Abstract: AI agents add new capabilities for responding intelligently to queries, data collection, and decision-making, assisted by additional functionality from the Open Platform for Enterprise AI (OPEA). Retrieval-Augmented Generation (RAG) bolstered by OPEA adds another level to agent design, strengthened by Intel® Gaudi® AI accelerators and Intel® Xeon® processors. This session provides guidance on designing, building, customizing, and deploying AI agents for hierarchical, multi-agent systems with greater success than non-agentic GenAI solutions.
Local and Global AI Community on Discord: Join us on Discord for the local and global AI tech community.
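
For context, the RAG step this session builds on can be sketched generically as embed, retrieve, and prepend. The `embed` function below is a placeholder, and OPEA itself packages comparable stages as composable services rather than a single script.

```python
import numpy as np


def embed(text: str) -> np.ndarray:
    """Placeholder: any sentence-embedding model returning a 1-D vector."""
    raise NotImplementedError


def retrieve(query: str, docs: list[str], doc_vecs: np.ndarray, k: int = 3) -> list[str]:
    # Rank documents by cosine similarity to the query embedding.
    q = embed(query)
    sims = doc_vecs @ q / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q) + 1e-9)
    return [docs[i] for i in np.argsort(-sims)[:k]]


def build_prompt(query: str, docs: list[str], doc_vecs: np.ndarray) -> str:
    # Prepend the retrieved passages so the agent's answer is grounded in them.
    context = "\n---\n".join(retrieve(query, docs, doc_vecs))
    return f"Use the context to answer.\n\nContext:\n{context}\n\nQuestion: {query}"
```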