talk-data.com
Activities & events

Google AI DevTalk (Virtual) - Ep 20
2025-08-13 · 15:55
Important: Register on the event website to receive the joining link (an RSVP on Meetup alone will NOT receive the joining link). Description: Welcome to the weekly AI virtual seminars, in collaboration with Google. Join us for deep-dive tech talks on AI, hands-on code labs, workshops, and networking with speakers and fellow developers from all over the world. Tech Talk: Accelerate AI Innovations with Vertex Managed AI Platform. Speaker: Yu-Hua Yang (Google). Abstract: The Vertex AI platform, leveraging Google's expertise, offers a unified solution for the entire AI development and deployment lifecycle, an alternative to using raw compute for AI infrastructure. This managed platform supports both predictive and generative AI, helping customers reduce technical debt and operational overhead, speed up time to market, and maximize the value derived from their AI innovations.
Join the local and global AI tech community on Discord.
Event: Google AI DevTalk (Virtual) - Ep 20

AI DevTalk (Virtual) with Google Cloud - Ep 20
2025-08-13 · 15:50
Important: Register on the event website to receive the joining link (an RSVP on Meetup alone will NOT receive the joining link). Description: Welcome to the weekly AI virtual seminars, in collaboration with Google. Join us for deep-dive tech talks on AI, hands-on code labs, workshops, and networking with speakers and fellow developers from all over the world. Tech Talk: Accelerate AI Innovations with Vertex Managed AI Platform. Speaker: Yu-Hua Yang (Google). Abstract: The Vertex AI platform, leveraging Google's expertise, offers a unified solution for the entire AI development and deployment lifecycle, an alternative to using raw compute for AI infrastructure. This managed platform supports both predictive and generative AI, helping customers reduce technical debt and operational overhead, speed up time to market, and maximize the value derived from their AI innovations.
Join the local and global AI tech community on Discord.
Event: AI DevTalk (Virtual) with Google Cloud - Ep 20

AI DevTalk (Virtual) with Google Cloud - Ep 19
2025-08-13 · 15:50
Important: Register on the event website to receive the joining link (an RSVP on Meetup alone will NOT receive the joining link). Description: Welcome to the weekly AI virtual seminars, in collaboration with Google. Join us for deep-dive tech talks on AI, hands-on code labs, workshops, and networking with speakers and fellow developers from all over the world. Tech Talk: Accelerate AI Innovations with Vertex Managed AI Platform. Speaker: Yu-Hua Yang (Google). Abstract: The Vertex AI platform, leveraging Google's expertise, offers a unified solution for the entire AI development and deployment lifecycle, an alternative to using raw compute for AI infrastructure. This managed platform supports both predictive and generative AI, helping customers reduce technical debt and operational overhead, speed up time to market, and maximize the value derived from their AI innovations.
Join the local and global AI tech community on Discord.
Event: AI DevTalk (Virtual) with Google Cloud - Ep 19
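
The Vertex AI abstract above is descriptive rather than hands-on. Purely as a hedged illustration of the managed workflow it refers to, a minimal upload/deploy/predict flow with the google-cloud-aiplatform Python SDK could look like the sketch below; the project, region, bucket, artifact path, and serving image are placeholders, not details from the talk.

```python
# Minimal sketch of a Vertex AI upload/deploy/predict flow (google-cloud-aiplatform SDK).
# Project, region, bucket, artifact path, and serving image are placeholder assumptions.
from google.cloud import aiplatform

aiplatform.init(
    project="my-gcp-project",                 # placeholder project ID
    location="us-central1",
    staging_bucket="gs://my-staging-bucket",
)

model = aiplatform.Model.upload(
    display_name="demo-sklearn-model",
    artifact_uri="gs://my-staging-bucket/model/",   # exported model artifacts
    serving_container_image_uri=(
        "us-docker.pkg.dev/vertex-ai/prediction/sklearn-cpu.1-0:latest"
    ),
)

endpoint = model.deploy(machine_type="n1-standard-4")        # managed online endpoint
prediction = endpoint.predict(instances=[[1.0, 2.0, 3.0]])   # online prediction request
print(prediction.predictions)
```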

Liuhuaying Yang—This Is Not Interesting, But It Could Be (Outlier 2025)
2025-07-25 · 12:23
Liuhuaying Yang—This Is Not Interesting, But It Could Be (Outlier 2025). 🌟 Outlier is a one-of-a-kind data visualization conference hosted by the Data Visualization Society. Outlier brings together all corners of the data visualization community, from artists to business intelligence developers, working in various tech stacks and media. Attendees stretch their creativity and learn from practitioners who they may not otherwise connect with. Learn more on the Outlier website: https://www.outlierconf.com/ 📈 About the Data Visualization Society: The Data Visualization Society was founded to serve as a professional home for those working across the discipline. Our mission is to connect data visualizers across tech stacks, subject areas, and experience. Advance your skills and grow your network by joining our community: https://www.datavisualizationsociety.org/
Event: Outlier Conference 2025

June Meetup: Testing Audio Apps Silently
2025-06-03 · 16:00
It's another iteration of our monthly meetup. This month, we are looking forward to a talk by Jiajun Yang from Holoplot. See talk details below. As always, there will be time to network over drinks and pizza after the talk. Anyone interested in audio development is welcome! Attendance is limited to people who have RSVP'd. If the event is full, please register for the waiting list in case a spot opens up. Testing Audio Apps Silently: A Walk-Through of Automated Capture Tools. Holoplot is a 3D-audio-beamforming loudspeaker company, but its audio ecosystem spans from embedded hardware running real-time multichannel DSP, through large-scale Audio-over-IP distribution, to desktop applications for auralization. This brings challenges to setting up silent, safe, and repeatable testing. In this talk I'll show mostly open-source, off-the-shelf tools you can use to automate the capture in a range of scenarios. No more punishing your speakers and eardrums. Jiajun Yang is a senior software development engineer in test at Holoplot. With a background in music technology, he joined Holoplot in 2020 and has since worked on a range of audio/software test automation development. His work focuses on Audio-over-IP, embedded systems, and desktop applications.
Event: June Meetup: Testing Audio Apps Silently
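
The abstract does not name the specific capture tools, so nothing here should be read as the speaker's stack. Purely as a generic illustration of "silent" audio testing, one can render a test signal through the processing code under test and assert on the samples offline, with no audio device ever opened; the gain function below is an illustrative stand-in for the DSP block being tested.

```python
# Generic sketch of a silent (offline) audio test: synthesize a signal, run the
# DSP under test, and assert on the rendered samples instead of playing audio.
# apply_gain() is a stand-in for whatever processing block is being tested.
import numpy as np

def apply_gain(samples: np.ndarray, gain_db: float) -> np.ndarray:
    """Toy DSP block: apply a gain specified in decibels."""
    return samples * (10.0 ** (gain_db / 20.0))

def test_gain_stage_offline():
    sr = 48_000
    t = np.arange(sr) / sr
    tone = 0.5 * np.sin(2 * np.pi * 1_000 * t)   # 1 kHz test tone, 1 second
    out = apply_gain(tone, gain_db=-6.0)
    # -6 dB scales the peak by 10**(-6/20); nothing is ever sent to a speaker.
    expected_peak = tone.max() * 10.0 ** (-6.0 / 20.0)
    assert np.isclose(out.max(), expected_peak)

test_gain_stage_offline()
```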

April 24, 2025 - AI, Machine Learning and Computer Vision Meetup
2025-04-24 · 17:00
This is a virtual event.

Towards a Multimodal AI Agent that Can See, Talk and Act
The development of multimodal AI agents marks a pivotal step toward creating systems capable of understanding, reasoning, and interacting with the world in human-like ways. Building such agents requires models that not only comprehend multi-sensory observations but also act adaptively to achieve goals within their environments. In this talk, I will present my research journey toward this grand goal across three key dimensions. First, I will explore how to bridge the gap between core vision understanding and multimodal learning through unified frameworks at various granularities. Next, I will discuss connecting vision-language models with large language models (LLMs) to create intelligent conversational systems. Finally, I will delve into recent advancements that extend multimodal LLMs into vision-language-action models, forming the foundation for general-purpose robotics policies. To conclude, I will highlight ongoing efforts to develop agentic systems that integrate perception with action, enabling them to not only understand observations but also take meaningful actions in a single system. Together, these lead to an aspiration of building the next generation of multimodal AI agents capable of seeing, talking, and acting across diverse scenarios in both digital and physical worlds.
About the Speaker: Jianwei Yang is a Principal Researcher at Microsoft Research (MSR), Redmond. His research focuses on the intersection of vision and multimodal learning, with an emphasis on bridging core vision tasks with language, building general-purpose and promptable multimodal models, and enabling these models to take meaningful actions in both virtual and physical environments.

ConceptAttention: Interpreting the Representations of Diffusion Transformers
Recently, diffusion transformers have taken over as the state-of-the-art model class for both image and video generation. However, similar to many existing deep learning architectures, their high-dimensional hidden representations are difficult to understand and interpret. This lack of interpretability is a barrier to their controllability and safe deployment. We introduce ConceptAttention, an approach to interpreting the representations of diffusion transformers. Our method allows users to create rich saliency maps depicting the location and intensity of textual concepts. Our approach exposes how a diffusion model “sees” a generated image and notably requires no additional training. ConceptAttention improves upon widely used approaches like cross-attention maps for isolating the location of visual concepts and even generalizes to real-world (not just generated) images and video generation models! Our work serves to improve the community’s understanding of how diffusion models represent data and has numerous potential applications, like image editing.
About the Speaker: Alec Helbling is a PhD student at Georgia Tech. His research focuses on improving the interpretability and controllability of generative models, particularly for image generation. His research is more application-focused, and he has interned at a variety of industrial research labs, including Adobe Firefly, IBM Research, and the NASA Jet Propulsion Laboratory. He also has a passion for creating explanatory videos of interesting machine learning and mathematical concepts.

RelationField: Relate Anything in Radiance Fields
Neural radiance fields recently emerged as a 3D scene representation extended by distilling open-vocabulary features from vision-language models. Current methods focus on object-centric tasks, leaving semantic relationships largely unexplored. We propose RelationField, the first method extracting inter-object relationships directly from neural radiance fields, using pairs of rays for implicit relationship queries. RelationField distills relationship knowledge from multi-modal LLMs. Evaluated on open-vocabulary 3D scene graph generation and relationship-guided instance segmentation, RelationField achieves state-of-the-art performance.
About the Speaker: Sebastian Koch is a PhD student at Ulm University and the Bosch Center for Artificial Intelligence, supervised by Timo Ropinski from Ulm University. His main research interest lies at the intersection of computer vision and robotics. The goal of his PhD is to develop 3D scene representations of the real world that are valuable for robots to navigate and solve tasks within their environment.

RGB-X Model Development: Exploring Four Channel ML Workflows
Machine learning is rapidly becoming multimodal. With many models in computer vision expanding to areas like vision and 3D, one area that has also quietly been advancing rapidly is RGB-X data, such as infrared, depth, or normals. In this talk we will cover some of the leading models in this exploding field of Visual AI and show some best practices on how to work with these complex data formats!
About the Speaker: Daniel Gural is a seasoned Machine Learning Evangelist with a strong passion for empowering Data Scientists and ML Engineers to unlock the full potential of their data. Currently serving as a valuable member of Voxel51, he takes a leading role in efforts to bridge the gap between practitioners and the necessary tools, enabling them to achieve exceptional outcomes. Daniel’s extensive experience in teaching and developing within the ML field has fueled his commitment to democratizing high-quality AI workflows for a wider audience.
Event: April 24, 2025 - AI, Machine Learning and Computer Vision Meetup
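
The RGB-X talk above concerns four-channel inputs such as RGB plus depth, infrared, or normals. As a small, generic illustration of that kind of workflow (the file names and the use of NumPy/Pillow are assumptions, not tools named by the speaker), stacking a depth map onto an RGB image to form an RGB-D tensor might look like this:

```python
# Minimal sketch: build a 4-channel RGB-D array from separate RGB and depth images.
# File paths and the normalization choice are illustrative assumptions.
import numpy as np
from PIL import Image

rgb = np.asarray(Image.open("frame.png").convert("RGB"), dtype=np.float32) / 255.0  # (H, W, 3)
depth = np.asarray(Image.open("frame_depth.png"), dtype=np.float32)                 # (H, W)
depth = (depth - depth.min()) / (depth.max() - depth.min() + 1e-8)                  # scale to [0, 1]

rgbd = np.concatenate([rgb, depth[..., None]], axis=-1)  # (H, W, 4) RGB-X tensor
print(rgbd.shape)
```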

IAQF & Thalesians Seminar Series: Towards Professional Readiness of LLMs in Financial Regulations - A Seminar by Xiao-Yang Liu
6:00 PM Seminar Begins; 7:30 PM Reception. Hybrid event. Location: Fordham University, McNally Amphitheater, 140 West 62nd Street, New York, NY 10023. Free registration! For virtual attendees: please email [email protected] for the link.
Abstract: In this talk, Xiao-Yang Liu will showcase FinGPT, an open-source counterpart of BloombergGPT, in the context of financial regulations, in particular the team's two-year effort to benchmark financial large language models, zooming in on financial regulations. He will also share ongoing projects in GenAI research on open finance at Columbia University. The financial industry operates within a labyrinth of complex regulations and industry standards designed to maintain market integrity and ensure reliability in financial reporting and compliance processes. Intricate financial regulations and standards have presented significant challenges for financial professionals and organizations. Large language models (LLMs), such as GPT-4o, Llama 3.1, and DeepSeek's V3/R1 models, have shown remarkable capabilities in natural language understanding and generation, making them promising for applications in the financial sector. However, current LLMs face challenges in the domain of financial regulations and industry standards. These challenges include grasping specialized regulatory language, maintaining up-to-date knowledge of evolving regulations and industry standards, and ensuring interpretability and ethical considerations in their responses.
Bio: Dr. Xiao-Yang Liu graduated in Electrical Engineering from Columbia University. He is a part-time researcher at the SecureFinAI Lab, Columbia University, and a faculty member in RPI's CS department. His research interests include reinforcement learning, large language models, quantum computing, and applications to finance. He created the popular open-source projects FinGPT, FinRL, and ElegantRL.
Event: Hybrid: Professional Readiness of LLMs in Financial Regulations - Xiao-Yang Liu
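
The abstract centers on benchmarking LLMs against financial-regulation questions. As a rough, hypothetical sketch of what such an evaluation loop can look like (the dataset items, the `ask_llm` helper, and the exact-match scoring are illustrative placeholders, not details of FinGPT or the talk):

```python
# Hypothetical sketch of a regulation-QA benchmark loop; ask_llm() is a
# placeholder for whichever model client (GPT-4o, Llama 3.1, DeepSeek, ...) is used.
from dataclasses import dataclass

@dataclass
class RegQA:
    question: str   # e.g. a question about a specific regulatory rule
    reference: str  # expert-written reference answer

def exact_match(prediction: str, reference: str) -> bool:
    """Crude scoring: normalized string equality; real benchmarks use rubrics or judge models."""
    return prediction.strip().lower() == reference.strip().lower()

def evaluate(model_name: str, dataset: list[RegQA], ask_llm) -> float:
    """Return the fraction of regulation questions the model answers correctly."""
    correct = 0
    for item in dataset:
        answer = ask_llm(model_name, item.question)  # placeholder call to the model
        correct += exact_match(answer, item.reference)
    return correct / len(dataset) if dataset else 0.0
```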

ECCV Redux: Day 4 - Nov 22
2024-11-22 · 17:00
Missed the European Conference on Computer Vision (ECCV) last month? Have no fear, we have collected some of the best research from the show into a series of online events.

Zero-shot Video Anomaly Detection: Leveraging Large Language Models for Rule-Based Reasoning
Video Anomaly Detection (VAD) is critical for applications such as surveillance and autonomous driving. However, existing methods lack transparent reasoning, limiting public trust in real-world deployments. We introduce a rule-based reasoning framework that leverages Large Language Models (LLMs) to induce detection rules from few-shot normal samples and apply them to identify anomalies, incorporating strategies such as rule aggregation and perception smoothing to enhance robustness. The abstract nature of language enables rapid adaptation to diverse VAD scenarios, ensuring flexibility and broad applicability. ECCV 2024 Paper: Follow the Rules: Reasoning for Video Anomaly Detection with Large Language Models.
About the Speaker: Yuchen Yang is a Ph.D. candidate in the Department of Computer Science at Johns Hopkins University. Her research aims to deliver functional, trustworthy solutions for machine learning and AI systems.

Open-Vocabulary 3D Semantic Segmentation with Text-to-Image Diffusion Models
In this talk, I will introduce our recent work on open-vocabulary 3D semantic understanding. We propose a novel method, namely Diff2Scene, which leverages frozen representations from text-to-image generative models for open-vocabulary 3D semantic segmentation and visual grounding tasks. Diff2Scene gets rid of any labeled 3D data and effectively identifies objects, appearances, locations, and their compositions in 3D scenes. ECCV 2024 Paper: Open-Vocabulary 3D Semantic Segmentation with Text-to-Image Diffusion Models.
About the Speaker: Xiaoyu Zhu is a Ph.D. student at the Language Technologies Institute, School of Computer Science, Carnegie Mellon University. Her research interests are computer vision, multimodal learning, and generative models.
Event: ECCV Redux: Day 4 - Nov 22
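
The VAD abstract describes inducing textual rules from normal samples with an LLM and then checking new observations against them. The paper's actual pipeline is not reproduced here; the following is a hypothetical minimal sketch of that two-stage idea, where `llm()` is a placeholder chat-completion helper and frame captions stand in for real perception outputs.

```python
# Hypothetical two-stage sketch of rule-based video anomaly detection with an LLM.
# llm(prompt) is a placeholder for any chat-completion call; frame "captions"
# stand in for real perception outputs (detections, dense captions, etc.).

def induce_rules(normal_captions: list[str], llm) -> str:
    """Stage 1: ask the LLM to summarize what 'normal' looks like as explicit rules."""
    prompt = (
        "These descriptions all come from normal video frames:\n"
        + "\n".join(f"- {c}" for c in normal_captions)
        + "\nWrite a short numbered list of rules that normal frames satisfy."
    )
    return llm(prompt)

def is_anomalous(caption: str, rules: str, llm) -> bool:
    """Stage 2: check a new frame description against the induced rules."""
    prompt = (
        f"Rules for normal frames:\n{rules}\n\n"
        f"New frame: {caption}\n"
        "Does this frame violate any rule? Answer YES or NO."
    )
    return llm(prompt).strip().upper().startswith("YES")
```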