Activities & events

[AI Alliance] Improving Chatbot Reliability Through Graph-Based Retrieval

Retrieval-Augmented Generation (RAG) is a powerful technique for building chatbots that answer questions using your own content, but it can struggle with context and accuracy on complex sites. In this session, we demonstrate how integrating GraphRAG into the open-source AllyCat chatbot leads to more accurate and context-aware responses by leveraging graph-based retrieval.

With the addition of GraphRAG, the chatbot delivers more consistent and reliable answers, reducing hallucinations and improving the quality of chatbot interactions—key requirements for effective customer support, documentation, and user engagement.

This talk includes a live demo and code walkthrough showing how developers can quickly integrate GraphRAG with their own applications and adapt it to meet specific needs. By using AllyCat, developers can provide instant, accurate support to users and continuously improve their services.
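
As a rough illustration of the kind of graph-based retrieval step the session describes, the sketch below queries a Neo4j graph for facts about an entity and folds them into an LLM prompt. The graph schema, node properties, credentials, and the final generation step are assumptions for illustration only, not AllyCat's actual code.

    # Illustrative sketch only: a minimal graph-backed retrieval step for a RAG chatbot.
    # The node properties (name, summary), credentials, and schema are assumed placeholders.
    from neo4j import GraphDatabase

    URI = "bolt://localhost:7687"   # assumed local Neo4j instance
    AUTH = ("neo4j", "password")    # placeholder credentials

    def retrieve_graph_context(driver, entity, limit=5):
        """Fetch facts about an entity and its neighbors to ground the chatbot's answer."""
        query = (
            "MATCH (e {name: $entity})-[r]-(n) "
            "RETURN e.name AS subject, type(r) AS rel, n.name AS object, n.summary AS summary "
            "LIMIT $limit"
        )
        with driver.session() as session:
            result = session.run(query, entity=entity, limit=limit)
            return [f"{r['subject']} -{r['rel']}-> {r['object']}: {r['summary']}" for r in result]

    def build_prompt(question, facts):
        """Combine retrieved graph facts with the user question before calling any LLM."""
        bullet_facts = "\n".join(f"- {f}" for f in facts)
        return f"Answer using only these facts:\n{bullet_facts}\n\nQuestion: {question}"

    if __name__ == "__main__":
        driver = GraphDatabase.driver(URI, auth=AUTH)
        facts = retrieve_graph_context(driver, "AllyCat")
        print(build_prompt("What does AllyCat do?", facts))  # send this prompt to an LLM of your choice
        driver.close()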

About the speaker
Nyah Macklin is a Senior Developer Advocate at Neo4j, specializing in GraphRAG, knowledge graphs, and AI-driven developer tooling. An internationally recognized speaker, content creator, and advocate for ethical AI governance, Nyah has built high-impact technical communities and led initiatives that advance a critical understanding of AI and its use cases. They are also the Founder & CTO of Afros in AI, a technical community dedicated to showcasing the multifaceted nature of artificial intelligence. Beyond their technical expertise, Nyah has a background in government leadership and technology policy, having served as Chief of Staff in U.S. state government, where they helped shape tech-driven legislative initiatives and equity-driven legislation. When not immersed in their work, Nyah enjoys empowering, teaching, and tutoring engineers, live-streaming technical deep dives, and building open-source tools that make software more accessible, explainable, and community-driven.

About the AI Alliance
The AI Alliance is an international community of researchers, developers, and organizational leaders committed to supporting and enhancing open innovation across the AI technology landscape in order to accelerate progress, improve safety, security, and trust in AI, and maximize benefits to people and society everywhere. Members of the AI Alliance believe that open innovation is essential to developing safe and responsible AI that benefits society rather than a select few big players.

Join the community
Sign up for the AI Alliance newsletter (check the website footer) and join our new AI Alliance Discord.

April 24, 2025 - AI, Machine Learning and Computer Vision Meetup

This is a virtual event.

Register for the Zoom

Towards a Multimodal AI Agent that Can See, Talk and Act

The development of multimodal AI agents marks a pivotal step toward creating systems capable of understanding, reasoning, and interacting with the world in human-like ways. Building such agents requires models that not only comprehend multi-sensory observations but also act adaptively to achieve goals within their environments. In this talk, I will present my research journey toward this grand goal across three key dimensions.

First, I will explore how to bridge the gap between core vision understanding and multimodal learning through unified frameworks at various granularities. Next, I will discuss connecting vision-language models with large language models (LLMs) to create intelligent conversational systems. Finally, I will delve into recent advancements that extend multimodal LLMs into vision-language-action models, forming the foundation for general-purpose robotics policies. To conclude, I will highlight ongoing efforts to develop agentic systems that integrate perception with action, enabling them to not only understand observations but also take meaningful actions in a single system.

Together, these efforts point toward building the next generation of multimodal AI agents capable of seeing, talking, and acting across diverse scenarios in both digital and physical worlds.

About the Speaker

Jianwei Yang is a Principal Researcher at Microsoft Research (MSR), Redmond. His research focuses on the intersection of vision and multimodal learning, with an emphasis on bridging core vision tasks with language, building general-purpose and promptable multimodal models, and enabling these models to take meaningful actions in both virtual and physical environments.

ConceptAttention: Interpreting the Representations of Diffusion Transformers

Recently, diffusion transformers have taken over as the state-of-the-art model class for both image and video generation. However, similar to many existing deep learning architectures, their high-dimensional hidden representations are difficult to understand and interpret. This lack of interpretability is a barrier to their controllability and safe deployment.

We introduce ConceptAttention, an approach to interpreting the representations of diffusion transformers. Our method allows users to create rich saliency maps depicting the location and intensity of textual concepts. Our approach exposes how a diffusion model “sees” a generated image and notably requires no additional training. ConceptAttention improves upon widely used approaches like cross attention maps for isolating the location of visual concepts and even generalizes to real world (not just generated) images and video generation models!

Our work serves to improve the community’s understanding of how diffusion models represent data and has numerous potential applications, like image editing.
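
As a rough, generic illustration of the underlying idea, the sketch below computes an attention-style saliency map between one text-concept embedding and image patch tokens. It is not the ConceptAttention implementation; the tensor shapes, softmax normalization, and random stand-in activations are assumptions.

    # Illustrative sketch only (not the actual ConceptAttention method): a simple
    # attention-style saliency map between a text-concept embedding and the image
    # patch tokens of a diffusion transformer block. Shapes are assumed for illustration.
    import torch

    def concept_saliency(patch_tokens, concept_embedding, grid_size):
        """Return a grid_size x grid_size map of how strongly each patch matches the concept."""
        dim = patch_tokens.shape[-1]
        scores = patch_tokens @ concept_embedding / dim ** 0.5   # scaled dot product per patch
        weights = torch.softmax(scores, dim=0)                   # normalize over patches
        return weights.reshape(grid_size, grid_size)             # spatial saliency map

    # Random tensors stand in for real model activations and text embeddings.
    tokens = torch.randn(16 * 16, 768)     # 256 patch tokens, 768-dim each
    concept = torch.randn(768)             # embedding of a concept such as "dog"
    saliency = concept_saliency(tokens, concept, grid_size=16)
    print(saliency.shape)                  # torch.Size([16, 16])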

About the Speaker

Alec Helbling is a PhD student at Georgia Tech. His research focuses on improving the interpretability and controllability of generative models, particularly for image generation. His research is application focused, and he has interned at a variety of industrial research labs, including Adobe Firefly, IBM Research, and the NASA Jet Propulsion Lab. He also has a passion for creating explanatory videos about interesting machine learning and mathematical concepts.

RelationField: Relate Anything in Radiance Fields

Neural radiance fields recently emerged as a 3D scene representation extended by distilling open-vocabulary features from vision-language models. Current methods focus on object-centric tasks, leaving semantic relationships largely unexplored. We propose RelationField, the first method extracting inter-object relationships directly from neural radiance fields using pairs of rays for implicit relationship queries. RelationField distills relationship knowledge from multi-modal LLMs. Evaluated on open-vocabulary 3D scene graph generation and relationship-guided instance segmentation, RelationField achieves state-of-the-art performance.

About the Speaker

Sebastian Koch is a PhD student at Ulm University and the Bosch Center for Artificial Intelligence, supervised by Timo Ropinski at Ulm University. His main research interest lies at the intersection of computer vision and robotics. The goal of his PhD is to develop 3D scene representations of the real world that help robots navigate and solve tasks within their environment.

RGB-X Model Development: Exploring Four Channel ML Workflows

Machine learning is rapidly becoming multimodal. With many computer vision models expanding into additional modalities such as 3D, one area that has also quietly been advancing rapidly is RGB-X data, where an extra channel such as infrared, depth, or surface normals accompanies the RGB image. In this talk we will cover some of the leading models in this fast-growing area of visual AI and share best practices for working with these complex data formats.
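
As a minimal, illustrative example of what a four-channel sample can look like in practice, the sketch below stacks an RGB image and a depth map into a single array. The file names and normalization are assumptions, not material from the talk.

    # Illustrative sketch only: combining an RGB image and a single-channel depth map
    # into one four-channel RGB-D array. File names and normalization are placeholders.
    import numpy as np
    from PIL import Image

    rgb = np.asarray(Image.open("frame.png").convert("RGB"), dtype=np.float32) / 255.0  # H x W x 3
    depth = np.asarray(Image.open("frame_depth.png").convert("F"), dtype=np.float32)    # H x W
    depth = (depth - depth.min()) / (depth.max() - depth.min() + 1e-8)                  # scale to [0, 1]

    rgbd = np.concatenate([rgb, depth[..., None]], axis=-1)                             # H x W x 4
    print(rgbd.shape)  # ready for a model whose first layer accepts 4 input channels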

About the Speaker

Daniel Gural is a seasoned Machine Learning Evangelist with a strong passion for empowering Data Scientists and ML Engineers to unlock the full potential of their data. Currently serving as a valuable member of Voxel51, he takes a leading role in efforts to bridge the gap between practitioners and the necessary tools, enabling them to achieve exceptional outcomes. Daniel’s extensive experience in teaching and developing within the ML field has fueled his commitment to democratizing high-quality AI workflows for a wider audience.

Erum Afzal – guest @ Omdena / Omdena Academy

We talked about:

  • Erum's Background
  • Omdena Academy and Erum's Role There
  • Omdena's Community and Projects
  • Course Development and Structure at Omdena Academy
  • Student and Instructor Engagement
  • Engagement and Motivation
  • The Role of Teaching in Community Building
  • The Importance of Communities for Career Building
  • Advice for Aspiring Instructors and Freelancers
  • DS and ML Talent Market Saturation
  • Resources for Learning AI and Community Building
  • Erum's Resource Recommendations

Links:

LinkedIn: https://www.linkedin.com/in/erum-afzal-64827b24/

Twitter: https://twitter.com/Erum55449739

Free Data Engineering course: https://github.com/DataTalksClub/data-engineering-zoomcamp

Join DataTalks.Club: https://datatalks.club/slack.html

Our events: https://datatalks.club/events.html

TBA - Erum Afzal

About the event

Outline:

  • Teaching at and founding a school
  • Content creation and target audience
  • Community building and management

About the speaker:

Erum is an enthusiastic speaker, mentor, and lead ML Engineer with a passion for Data Science and Machine Learning. She holds an MS in Information Technology from NUST, Islamabad, Pakistan. Currently, she is a researcher at Justus Liebig University, Germany, pursuing her PhD in AI solutions for teacher training. Erum is also associated with various international bodies in the field of Data Science and Machine Learning. She serves as a Teaching Expert at Women in AI Academy (Germany), where she instructs courses on Data Science and Machine Learning, and she leads Omdena Academy at Omdena, where she has contributed to numerous projects and received accolades as part of the AI Wonder Girls team.

Previously, Erum taught a deep learning course at Eskewlab Philippines in collaboration with Omdena. She led the WWCode Data Science track, organizing boot camps and workshops in Data Science. Furthermore, Erum served as a trainer at AIDA Lab, Prince Sultan University, Kingdom of Saudi Arabia, where she delivered AI courses as a master trainer and conducted research.

DataTalks.Club is the place to talk about data. Join our Slack community!

Community Building and Teaching in AI & Tech