Title & Speakers	Event
Jan 15 - Best of NeurIPS (Day 2) 2026-01-15 · 17:00 Welcome to day two of the Best of NeurIPS series, your virtual pass to some of the groundbreaking research, insights, and innovations that defined this year’s conference. Live streaming from the authors to you. Time and Location Jan 15, 2026 9:00-11:00 AM Pacific Online. Register for the Zoom! Why Diffusion Models Don't Memorize: The Role of Implicit Dynamical Regularization in Training Diffusion models have achieved impressive results across many generative tasks, yet the mechanisms that prevent memorization and enable generalization remain unclear. In this talk, I will focus on how training dynamics shape the transition from generalization to memorization. Our experiments and theory reveal two key timescales: an early time when high-quality generation emerges and a later one when memorization begins. Notably, the memorization timescale grows linearly with the size of the training set, while the generalization timescale stays constant, creating an increasingly wide window where models generalize well. These results highlight an implicit dynamical regularization that helps diffusion models avoid memorization even in highly overparameterized regimes. About the Author Raphaël Urfin is a PhD student at École Normale Supérieure – PSL in Paris, supervised by Giulio Biroli (ENS) and Marc Mézard (Bocconi University). His work focuses on applying ideas and tools of statistical physics to better understand diffusion models and their generalization properties. Open-Insect: Benchmarking Open-Set Recognition of Novel Species in Biodiversity Monitoring Global biodiversity is declining at an unprecedented rate, yet little information is known about most species and how their populations are changing. Indeed, some 90% Earth’s species are estimated to be completely unknown. Machine learning has recently emerged as a promising tool to facilitate long-term, large-scale biodiversity monitoring, including algorithms for fine-grained classification of species from images. However, such algorithms typically are not designed to detect examples from categories unseen during training – the problem of open-set recognition (OSR) – limiting their applicability for highly diverse, poorly studied taxa such as insects. To address this gap, we introduce Open-Insect, a large-scale, fine-grained dataset to evaluate unknown species detection across different geographic regions with varying difficulty. We benchmark 38 OSR algorithms across three categories: post-hoc, training-time regularization, and training with auxiliary data, finding that simple post-hoc approaches remain a strong baseline. We also demonstrate how to leverage auxiliary data to improve species discovery in regions with limited data. Our results provide timely insights to guide the development of computer vision methods for biodiversity monitoring and species discovery. About the Speaker Yuyan Chen is a PhD student in Computer Science at McGill University and Mila - Quebec AI Institute, supervised by Prof. David Rolnick. My research focuses on machine learning for biodiversity monitoring. GuideFlow3D: Optimization-Guided Rectified Flow For Appearance Transfer Transferring appearance to 3D assets using different representations of the appearance object - such as images or text - has garnered interest due to its wide range of applications in industries like gaming, augmented reality, and digital content creation. However, state-of-the-art methods still fail when the geometry between the input and appearance objects is significantly different. A straightforward approach is to directly apply a 3D generative model, but we show that this ultimately fails to produce appealing results. Instead, we propose a principled approach inspired by universal guidance. Given a pretrained rectified flow model conditioned on image or text, our training-free method interacts with the sampling process by periodically adding guidance. This guidance can be modeled as a differentiable loss function, and we experiment with two different types of guidance including part-aware losses for appearance and self-similarity. Our experiments show that our approach successfully transfers texture and geometric details to the input 3D asset, outperforming baselines both qualitatively and quantitatively. We also show that traditional metrics are not suitable for evaluating the task due to their inability of focusing on local details and comparing dissimilar inputs, in absence of ground truth data. We thus evaluate appearance transfer quality with a GPT-based system objectively ranking outputs, ensuring robust and human-like assessment, as further confirmed by our user study. Beyond showcased scenarios, our method is general and could be extended to different types of diffusion models and guidance functions. About the Speaker Sayan Deb Sarkar is a 2nd-year PhD student at Stanford University in the Gradient Spaces Group, advised by Prof. Iro Armeni, part of the Stanford Vision Lab (SVL). His research interests are on multimodal 3D scene understanding and interactive editing. Past summer, he interned with the Microsoft Spatial AI Lab, hosted by Prof. Marc Pollefeys, working on efficient video understanding in spatial context. Before starting PhD, he was a CS master student at ETH Zürich, in the Computer Vision and Geometry Group (CVG), working on aligning real-world 3D environments from multi-modal data. In the past, he has been a Research Intern at Qualcomm XR labs, Computer Vision Engineer at Mercedes Benz R & D and Research Engineer at ICG, TU Graz. Website: https://sayands.github.io/ HouseLayout3D: A Benchmark and Baseline Method for 3D Layout Estimation in the Wild Current 3D layout estimation models are primarily trained on synthetic datasets containing simple single room or single floor environments. As a consequence, they cannot natively handle large multi floor buildings and require scenes to be split into individual floors before processing, which removes global spatial context that is essential for reasoning about structures such as staircases that connect multiple levels. In this work, we introduce HouseLayout3D, a real world benchmark designed to support progress toward full building scale layout estimation, including multiple floors and architecturally intricate spaces. We also present MultiFloor3D, a simple training free baseline that leverages recent scene understanding methods and already outperforms existing 3D layout estimation models on both our benchmark and prior datasets, highlighting the need for further research in this direction. About the Speaker Valentin Bieri is a Machine Learning Engineer and Researcher specializing in the intersection of 3D Computer Vision and Natural Language Processing. Building on his applied research in SLAM and Vision-Language Models at ETH Zurich, he now develops AI agents for manufacturing at EthonAI.	Jan 15 - Best of NeurIPS (Day 2)
Jan 15 - Best of NeurIPS (Day 2) 2026-01-15 · 17:00 Welcome to day two of the Best of NeurIPS series, your virtual pass to some of the groundbreaking research, insights, and innovations that defined this year’s conference. Live streaming from the authors to you. Time and Location Jan 15, 2026 9:00-11:00 AM Pacific Online. Register for the Zoom! Why Diffusion Models Don't Memorize: The Role of Implicit Dynamical Regularization in Training Diffusion models have achieved impressive results across many generative tasks, yet the mechanisms that prevent memorization and enable generalization remain unclear. In this talk, I will focus on how training dynamics shape the transition from generalization to memorization. Our experiments and theory reveal two key timescales: an early time when high-quality generation emerges and a later one when memorization begins. Notably, the memorization timescale grows linearly with the size of the training set, while the generalization timescale stays constant, creating an increasingly wide window where models generalize well. These results highlight an implicit dynamical regularization that helps diffusion models avoid memorization even in highly overparameterized regimes. About the Author Raphaël Urfin is a PhD student at École Normale Supérieure – PSL in Paris, supervised by Giulio Biroli (ENS) and Marc Mézard (Bocconi University). His work focuses on applying ideas and tools of statistical physics to better understand diffusion models and their generalization properties. Open-Insect: Benchmarking Open-Set Recognition of Novel Species in Biodiversity Monitoring Global biodiversity is declining at an unprecedented rate, yet little information is known about most species and how their populations are changing. Indeed, some 90% Earth’s species are estimated to be completely unknown. Machine learning has recently emerged as a promising tool to facilitate long-term, large-scale biodiversity monitoring, including algorithms for fine-grained classification of species from images. However, such algorithms typically are not designed to detect examples from categories unseen during training – the problem of open-set recognition (OSR) – limiting their applicability for highly diverse, poorly studied taxa such as insects. To address this gap, we introduce Open-Insect, a large-scale, fine-grained dataset to evaluate unknown species detection across different geographic regions with varying difficulty. We benchmark 38 OSR algorithms across three categories: post-hoc, training-time regularization, and training with auxiliary data, finding that simple post-hoc approaches remain a strong baseline. We also demonstrate how to leverage auxiliary data to improve species discovery in regions with limited data. Our results provide timely insights to guide the development of computer vision methods for biodiversity monitoring and species discovery. About the Speaker Yuyan Chen is a PhD student in Computer Science at McGill University and Mila - Quebec AI Institute, supervised by Prof. David Rolnick. My research focuses on machine learning for biodiversity monitoring. GuideFlow3D: Optimization-Guided Rectified Flow For Appearance Transfer Transferring appearance to 3D assets using different representations of the appearance object - such as images or text - has garnered interest due to its wide range of applications in industries like gaming, augmented reality, and digital content creation. However, state-of-the-art methods still fail when the geometry between the input and appearance objects is significantly different. A straightforward approach is to directly apply a 3D generative model, but we show that this ultimately fails to produce appealing results. Instead, we propose a principled approach inspired by universal guidance. Given a pretrained rectified flow model conditioned on image or text, our training-free method interacts with the sampling process by periodically adding guidance. This guidance can be modeled as a differentiable loss function, and we experiment with two different types of guidance including part-aware losses for appearance and self-similarity. Our experiments show that our approach successfully transfers texture and geometric details to the input 3D asset, outperforming baselines both qualitatively and quantitatively. We also show that traditional metrics are not suitable for evaluating the task due to their inability of focusing on local details and comparing dissimilar inputs, in absence of ground truth data. We thus evaluate appearance transfer quality with a GPT-based system objectively ranking outputs, ensuring robust and human-like assessment, as further confirmed by our user study. Beyond showcased scenarios, our method is general and could be extended to different types of diffusion models and guidance functions. About the Speaker Sayan Deb Sarkar is a 2nd-year PhD student at Stanford University in the Gradient Spaces Group, advised by Prof. Iro Armeni, part of the Stanford Vision Lab (SVL). His research interests are on multimodal 3D scene understanding and interactive editing. Past summer, he interned with the Microsoft Spatial AI Lab, hosted by Prof. Marc Pollefeys, working on efficient video understanding in spatial context. Before starting PhD, he was a CS master student at ETH Zürich, in the Computer Vision and Geometry Group (CVG), working on aligning real-world 3D environments from multi-modal data. In the past, he has been a Research Intern at Qualcomm XR labs, Computer Vision Engineer at Mercedes Benz R & D and Research Engineer at ICG, TU Graz. Website: https://sayands.github.io/ HouseLayout3D: A Benchmark and Baseline Method for 3D Layout Estimation in the Wild Current 3D layout estimation models are primarily trained on synthetic datasets containing simple single room or single floor environments. As a consequence, they cannot natively handle large multi floor buildings and require scenes to be split into individual floors before processing, which removes global spatial context that is essential for reasoning about structures such as staircases that connect multiple levels. In this work, we introduce HouseLayout3D, a real world benchmark designed to support progress toward full building scale layout estimation, including multiple floors and architecturally intricate spaces. We also present MultiFloor3D, a simple training free baseline that leverages recent scene understanding methods and already outperforms existing 3D layout estimation models on both our benchmark and prior datasets, highlighting the need for further research in this direction. About the Speaker Valentin Bieri is a Machine Learning Engineer and Researcher specializing in the intersection of 3D Computer Vision and Natural Language Processing. Building on his applied research in SLAM and Vision-Language Models at ETH Zurich, he now develops AI agents for manufacturing at EthonAI.	Jan 15 - Best of NeurIPS (Day 2)
Jan 15 - Best of NeurIPS (Day 2) 2026-01-15 · 17:00 Welcome to day two of the Best of NeurIPS series, your virtual pass to some of the groundbreaking research, insights, and innovations that defined this year’s conference. Live streaming from the authors to you. Time and Location Jan 15, 2026 9:00-11:00 AM Pacific Online. Register for the Zoom! Why Diffusion Models Don't Memorize: The Role of Implicit Dynamical Regularization in Training Diffusion models have achieved impressive results across many generative tasks, yet the mechanisms that prevent memorization and enable generalization remain unclear. In this talk, I will focus on how training dynamics shape the transition from generalization to memorization. Our experiments and theory reveal two key timescales: an early time when high-quality generation emerges and a later one when memorization begins. Notably, the memorization timescale grows linearly with the size of the training set, while the generalization timescale stays constant, creating an increasingly wide window where models generalize well. These results highlight an implicit dynamical regularization that helps diffusion models avoid memorization even in highly overparameterized regimes. About the Author Raphaël Urfin is a PhD student at École Normale Supérieure – PSL in Paris, supervised by Giulio Biroli (ENS) and Marc Mézard (Bocconi University). His work focuses on applying ideas and tools of statistical physics to better understand diffusion models and their generalization properties. Open-Insect: Benchmarking Open-Set Recognition of Novel Species in Biodiversity Monitoring Global biodiversity is declining at an unprecedented rate, yet little information is known about most species and how their populations are changing. Indeed, some 90% Earth’s species are estimated to be completely unknown. Machine learning has recently emerged as a promising tool to facilitate long-term, large-scale biodiversity monitoring, including algorithms for fine-grained classification of species from images. However, such algorithms typically are not designed to detect examples from categories unseen during training – the problem of open-set recognition (OSR) – limiting their applicability for highly diverse, poorly studied taxa such as insects. To address this gap, we introduce Open-Insect, a large-scale, fine-grained dataset to evaluate unknown species detection across different geographic regions with varying difficulty. We benchmark 38 OSR algorithms across three categories: post-hoc, training-time regularization, and training with auxiliary data, finding that simple post-hoc approaches remain a strong baseline. We also demonstrate how to leverage auxiliary data to improve species discovery in regions with limited data. Our results provide timely insights to guide the development of computer vision methods for biodiversity monitoring and species discovery. About the Speaker Yuyan Chen is a PhD student in Computer Science at McGill University and Mila - Quebec AI Institute, supervised by Prof. David Rolnick. My research focuses on machine learning for biodiversity monitoring. GuideFlow3D: Optimization-Guided Rectified Flow For Appearance Transfer Transferring appearance to 3D assets using different representations of the appearance object - such as images or text - has garnered interest due to its wide range of applications in industries like gaming, augmented reality, and digital content creation. However, state-of-the-art methods still fail when the geometry between the input and appearance objects is significantly different. A straightforward approach is to directly apply a 3D generative model, but we show that this ultimately fails to produce appealing results. Instead, we propose a principled approach inspired by universal guidance. Given a pretrained rectified flow model conditioned on image or text, our training-free method interacts with the sampling process by periodically adding guidance. This guidance can be modeled as a differentiable loss function, and we experiment with two different types of guidance including part-aware losses for appearance and self-similarity. Our experiments show that our approach successfully transfers texture and geometric details to the input 3D asset, outperforming baselines both qualitatively and quantitatively. We also show that traditional metrics are not suitable for evaluating the task due to their inability of focusing on local details and comparing dissimilar inputs, in absence of ground truth data. We thus evaluate appearance transfer quality with a GPT-based system objectively ranking outputs, ensuring robust and human-like assessment, as further confirmed by our user study. Beyond showcased scenarios, our method is general and could be extended to different types of diffusion models and guidance functions. About the Speaker Sayan Deb Sarkar is a 2nd-year PhD student at Stanford University in the Gradient Spaces Group, advised by Prof. Iro Armeni, part of the Stanford Vision Lab (SVL). His research interests are on multimodal 3D scene understanding and interactive editing. Past summer, he interned with the Microsoft Spatial AI Lab, hosted by Prof. Marc Pollefeys, working on efficient video understanding in spatial context. Before starting PhD, he was a CS master student at ETH Zürich, in the Computer Vision and Geometry Group (CVG), working on aligning real-world 3D environments from multi-modal data. In the past, he has been a Research Intern at Qualcomm XR labs, Computer Vision Engineer at Mercedes Benz R & D and Research Engineer at ICG, TU Graz. Website: https://sayands.github.io/ HouseLayout3D: A Benchmark and Baseline Method for 3D Layout Estimation in the Wild Current 3D layout estimation models are primarily trained on synthetic datasets containing simple single room or single floor environments. As a consequence, they cannot natively handle large multi floor buildings and require scenes to be split into individual floors before processing, which removes global spatial context that is essential for reasoning about structures such as staircases that connect multiple levels. In this work, we introduce HouseLayout3D, a real world benchmark designed to support progress toward full building scale layout estimation, including multiple floors and architecturally intricate spaces. We also present MultiFloor3D, a simple training free baseline that leverages recent scene understanding methods and already outperforms existing 3D layout estimation models on both our benchmark and prior datasets, highlighting the need for further research in this direction. About the Speaker Valentin Bieri is a Machine Learning Engineer and Researcher specializing in the intersection of 3D Computer Vision and Natural Language Processing. Building on his applied research in SLAM and Vision-Language Models at ETH Zurich, he now develops AI agents for manufacturing at EthonAI.	Jan 15 - Best of NeurIPS (Day 2)
Jan 15 - Best of NeurIPS (Day 2) 2026-01-15 · 17:00 Welcome to day two of the Best of NeurIPS series, your virtual pass to some of the groundbreaking research, insights, and innovations that defined this year’s conference. Live streaming from the authors to you. Time and Location Jan 15, 2026 9:00-11:00 AM Pacific Online. Register for the Zoom! Why Diffusion Models Don't Memorize: The Role of Implicit Dynamical Regularization in Training Diffusion models have achieved impressive results across many generative tasks, yet the mechanisms that prevent memorization and enable generalization remain unclear. In this talk, I will focus on how training dynamics shape the transition from generalization to memorization. Our experiments and theory reveal two key timescales: an early time when high-quality generation emerges and a later one when memorization begins. Notably, the memorization timescale grows linearly with the size of the training set, while the generalization timescale stays constant, creating an increasingly wide window where models generalize well. These results highlight an implicit dynamical regularization that helps diffusion models avoid memorization even in highly overparameterized regimes. About the Author Raphaël Urfin is a PhD student at École Normale Supérieure – PSL in Paris, supervised by Giulio Biroli (ENS) and Marc Mézard (Bocconi University). His work focuses on applying ideas and tools of statistical physics to better understand diffusion models and their generalization properties. Open-Insect: Benchmarking Open-Set Recognition of Novel Species in Biodiversity Monitoring Global biodiversity is declining at an unprecedented rate, yet little information is known about most species and how their populations are changing. Indeed, some 90% Earth’s species are estimated to be completely unknown. Machine learning has recently emerged as a promising tool to facilitate long-term, large-scale biodiversity monitoring, including algorithms for fine-grained classification of species from images. However, such algorithms typically are not designed to detect examples from categories unseen during training – the problem of open-set recognition (OSR) – limiting their applicability for highly diverse, poorly studied taxa such as insects. To address this gap, we introduce Open-Insect, a large-scale, fine-grained dataset to evaluate unknown species detection across different geographic regions with varying difficulty. We benchmark 38 OSR algorithms across three categories: post-hoc, training-time regularization, and training with auxiliary data, finding that simple post-hoc approaches remain a strong baseline. We also demonstrate how to leverage auxiliary data to improve species discovery in regions with limited data. Our results provide timely insights to guide the development of computer vision methods for biodiversity monitoring and species discovery. About the Speaker Yuyan Chen is a PhD student in Computer Science at McGill University and Mila - Quebec AI Institute, supervised by Prof. David Rolnick. My research focuses on machine learning for biodiversity monitoring. GuideFlow3D: Optimization-Guided Rectified Flow For Appearance Transfer Transferring appearance to 3D assets using different representations of the appearance object - such as images or text - has garnered interest due to its wide range of applications in industries like gaming, augmented reality, and digital content creation. However, state-of-the-art methods still fail when the geometry between the input and appearance objects is significantly different. A straightforward approach is to directly apply a 3D generative model, but we show that this ultimately fails to produce appealing results. Instead, we propose a principled approach inspired by universal guidance. Given a pretrained rectified flow model conditioned on image or text, our training-free method interacts with the sampling process by periodically adding guidance. This guidance can be modeled as a differentiable loss function, and we experiment with two different types of guidance including part-aware losses for appearance and self-similarity. Our experiments show that our approach successfully transfers texture and geometric details to the input 3D asset, outperforming baselines both qualitatively and quantitatively. We also show that traditional metrics are not suitable for evaluating the task due to their inability of focusing on local details and comparing dissimilar inputs, in absence of ground truth data. We thus evaluate appearance transfer quality with a GPT-based system objectively ranking outputs, ensuring robust and human-like assessment, as further confirmed by our user study. Beyond showcased scenarios, our method is general and could be extended to different types of diffusion models and guidance functions. About the Speaker Sayan Deb Sarkar is a 2nd-year PhD student at Stanford University in the Gradient Spaces Group, advised by Prof. Iro Armeni, part of the Stanford Vision Lab (SVL). His research interests are on multimodal 3D scene understanding and interactive editing. Past summer, he interned with the Microsoft Spatial AI Lab, hosted by Prof. Marc Pollefeys, working on efficient video understanding in spatial context. Before starting PhD, he was a CS master student at ETH Zürich, in the Computer Vision and Geometry Group (CVG), working on aligning real-world 3D environments from multi-modal data. In the past, he has been a Research Intern at Qualcomm XR labs, Computer Vision Engineer at Mercedes Benz R & D and Research Engineer at ICG, TU Graz. Website: https://sayands.github.io/ HouseLayout3D: A Benchmark and Baseline Method for 3D Layout Estimation in the Wild Current 3D layout estimation models are primarily trained on synthetic datasets containing simple single room or single floor environments. As a consequence, they cannot natively handle large multi floor buildings and require scenes to be split into individual floors before processing, which removes global spatial context that is essential for reasoning about structures such as staircases that connect multiple levels. In this work, we introduce HouseLayout3D, a real world benchmark designed to support progress toward full building scale layout estimation, including multiple floors and architecturally intricate spaces. We also present MultiFloor3D, a simple training free baseline that leverages recent scene understanding methods and already outperforms existing 3D layout estimation models on both our benchmark and prior datasets, highlighting the need for further research in this direction. About the Speaker Valentin Bieri is a Machine Learning Engineer and Researcher specializing in the intersection of 3D Computer Vision and Natural Language Processing. Building on his applied research in SLAM and Vision-Language Models at ETH Zurich, he now develops AI agents for manufacturing at EthonAI.	Jan 15 - Best of NeurIPS (Day 2)
Jan 15 - Best of NeurIPS (Day 2) 2026-01-15 · 17:00 Welcome to day two of the Best of NeurIPS series, your virtual pass to some of the groundbreaking research, insights, and innovations that defined this year’s conference. Live streaming from the authors to you. Time and Location Jan 15, 2026 9:00-11:00 AM Pacific Online. Register for the Zoom! Why Diffusion Models Don't Memorize: The Role of Implicit Dynamical Regularization in Training Diffusion models have achieved impressive results across many generative tasks, yet the mechanisms that prevent memorization and enable generalization remain unclear. In this talk, I will focus on how training dynamics shape the transition from generalization to memorization. Our experiments and theory reveal two key timescales: an early time when high-quality generation emerges and a later one when memorization begins. Notably, the memorization timescale grows linearly with the size of the training set, while the generalization timescale stays constant, creating an increasingly wide window where models generalize well. These results highlight an implicit dynamical regularization that helps diffusion models avoid memorization even in highly overparameterized regimes. About the Author Raphaël Urfin is a PhD student at École Normale Supérieure – PSL in Paris, supervised by Giulio Biroli (ENS) and Marc Mézard (Bocconi University). His work focuses on applying ideas and tools of statistical physics to better understand diffusion models and their generalization properties. Open-Insect: Benchmarking Open-Set Recognition of Novel Species in Biodiversity Monitoring Global biodiversity is declining at an unprecedented rate, yet little information is known about most species and how their populations are changing. Indeed, some 90% Earth’s species are estimated to be completely unknown. Machine learning has recently emerged as a promising tool to facilitate long-term, large-scale biodiversity monitoring, including algorithms for fine-grained classification of species from images. However, such algorithms typically are not designed to detect examples from categories unseen during training – the problem of open-set recognition (OSR) – limiting their applicability for highly diverse, poorly studied taxa such as insects. To address this gap, we introduce Open-Insect, a large-scale, fine-grained dataset to evaluate unknown species detection across different geographic regions with varying difficulty. We benchmark 38 OSR algorithms across three categories: post-hoc, training-time regularization, and training with auxiliary data, finding that simple post-hoc approaches remain a strong baseline. We also demonstrate how to leverage auxiliary data to improve species discovery in regions with limited data. Our results provide timely insights to guide the development of computer vision methods for biodiversity monitoring and species discovery. About the Speaker Yuyan Chen is a PhD student in Computer Science at McGill University and Mila - Quebec AI Institute, supervised by Prof. David Rolnick. My research focuses on machine learning for biodiversity monitoring. GuideFlow3D: Optimization-Guided Rectified Flow For Appearance Transfer Transferring appearance to 3D assets using different representations of the appearance object - such as images or text - has garnered interest due to its wide range of applications in industries like gaming, augmented reality, and digital content creation. However, state-of-the-art methods still fail when the geometry between the input and appearance objects is significantly different. A straightforward approach is to directly apply a 3D generative model, but we show that this ultimately fails to produce appealing results. Instead, we propose a principled approach inspired by universal guidance. Given a pretrained rectified flow model conditioned on image or text, our training-free method interacts with the sampling process by periodically adding guidance. This guidance can be modeled as a differentiable loss function, and we experiment with two different types of guidance including part-aware losses for appearance and self-similarity. Our experiments show that our approach successfully transfers texture and geometric details to the input 3D asset, outperforming baselines both qualitatively and quantitatively. We also show that traditional metrics are not suitable for evaluating the task due to their inability of focusing on local details and comparing dissimilar inputs, in absence of ground truth data. We thus evaluate appearance transfer quality with a GPT-based system objectively ranking outputs, ensuring robust and human-like assessment, as further confirmed by our user study. Beyond showcased scenarios, our method is general and could be extended to different types of diffusion models and guidance functions. About the Speaker Sayan Deb Sarkar is a 2nd-year PhD student at Stanford University in the Gradient Spaces Group, advised by Prof. Iro Armeni, part of the Stanford Vision Lab (SVL). His research interests are on multimodal 3D scene understanding and interactive editing. Past summer, he interned with the Microsoft Spatial AI Lab, hosted by Prof. Marc Pollefeys, working on efficient video understanding in spatial context. Before starting PhD, he was a CS master student at ETH Zürich, in the Computer Vision and Geometry Group (CVG), working on aligning real-world 3D environments from multi-modal data. In the past, he has been a Research Intern at Qualcomm XR labs, Computer Vision Engineer at Mercedes Benz R & D and Research Engineer at ICG, TU Graz. Website: https://sayands.github.io/ HouseLayout3D: A Benchmark and Baseline Method for 3D Layout Estimation in the Wild Current 3D layout estimation models are primarily trained on synthetic datasets containing simple single room or single floor environments. As a consequence, they cannot natively handle large multi floor buildings and require scenes to be split into individual floors before processing, which removes global spatial context that is essential for reasoning about structures such as staircases that connect multiple levels. In this work, we introduce HouseLayout3D, a real world benchmark designed to support progress toward full building scale layout estimation, including multiple floors and architecturally intricate spaces. We also present MultiFloor3D, a simple training free baseline that leverages recent scene understanding methods and already outperforms existing 3D layout estimation models on both our benchmark and prior datasets, highlighting the need for further research in this direction. About the Speaker Valentin Bieri is a Machine Learning Engineer and Researcher specializing in the intersection of 3D Computer Vision and Natural Language Processing. Building on his applied research in SLAM and Vision-Language Models at ETH Zurich, he now develops AI agents for manufacturing at EthonAI.	Jan 15 - Best of NeurIPS (Day 2)
Jan 15 - Best of NeurIPS (Day 2) 2026-01-15 · 17:00 Welcome to day two of the Best of NeurIPS series, your virtual pass to some of the groundbreaking research, insights, and innovations that defined this year’s conference. Live streaming from the authors to you. Time and Location Jan 15, 2026 9:00-11:00 AM Pacific Online. Register for the Zoom! Why Diffusion Models Don't Memorize: The Role of Implicit Dynamical Regularization in Training Diffusion models have achieved impressive results across many generative tasks, yet the mechanisms that prevent memorization and enable generalization remain unclear. In this talk, I will focus on how training dynamics shape the transition from generalization to memorization. Our experiments and theory reveal two key timescales: an early time when high-quality generation emerges and a later one when memorization begins. Notably, the memorization timescale grows linearly with the size of the training set, while the generalization timescale stays constant, creating an increasingly wide window where models generalize well. These results highlight an implicit dynamical regularization that helps diffusion models avoid memorization even in highly overparameterized regimes. About the Author Raphaël Urfin is a PhD student at École Normale Supérieure – PSL in Paris, supervised by Giulio Biroli (ENS) and Marc Mézard (Bocconi University). His work focuses on applying ideas and tools of statistical physics to better understand diffusion models and their generalization properties. Open-Insect: Benchmarking Open-Set Recognition of Novel Species in Biodiversity Monitoring Global biodiversity is declining at an unprecedented rate, yet little information is known about most species and how their populations are changing. Indeed, some 90% Earth’s species are estimated to be completely unknown. Machine learning has recently emerged as a promising tool to facilitate long-term, large-scale biodiversity monitoring, including algorithms for fine-grained classification of species from images. However, such algorithms typically are not designed to detect examples from categories unseen during training – the problem of open-set recognition (OSR) – limiting their applicability for highly diverse, poorly studied taxa such as insects. To address this gap, we introduce Open-Insect, a large-scale, fine-grained dataset to evaluate unknown species detection across different geographic regions with varying difficulty. We benchmark 38 OSR algorithms across three categories: post-hoc, training-time regularization, and training with auxiliary data, finding that simple post-hoc approaches remain a strong baseline. We also demonstrate how to leverage auxiliary data to improve species discovery in regions with limited data. Our results provide timely insights to guide the development of computer vision methods for biodiversity monitoring and species discovery. About the Speaker Yuyan Chen is a PhD student in Computer Science at McGill University and Mila - Quebec AI Institute, supervised by Prof. David Rolnick. My research focuses on machine learning for biodiversity monitoring. GuideFlow3D: Optimization-Guided Rectified Flow For Appearance Transfer Transferring appearance to 3D assets using different representations of the appearance object - such as images or text - has garnered interest due to its wide range of applications in industries like gaming, augmented reality, and digital content creation. However, state-of-the-art methods still fail when the geometry between the input and appearance objects is significantly different. A straightforward approach is to directly apply a 3D generative model, but we show that this ultimately fails to produce appealing results. Instead, we propose a principled approach inspired by universal guidance. Given a pretrained rectified flow model conditioned on image or text, our training-free method interacts with the sampling process by periodically adding guidance. This guidance can be modeled as a differentiable loss function, and we experiment with two different types of guidance including part-aware losses for appearance and self-similarity. Our experiments show that our approach successfully transfers texture and geometric details to the input 3D asset, outperforming baselines both qualitatively and quantitatively. We also show that traditional metrics are not suitable for evaluating the task due to their inability of focusing on local details and comparing dissimilar inputs, in absence of ground truth data. We thus evaluate appearance transfer quality with a GPT-based system objectively ranking outputs, ensuring robust and human-like assessment, as further confirmed by our user study. Beyond showcased scenarios, our method is general and could be extended to different types of diffusion models and guidance functions. About the Speaker Sayan Deb Sarkar is a 2nd-year PhD student at Stanford University in the Gradient Spaces Group, advised by Prof. Iro Armeni, part of the Stanford Vision Lab (SVL). His research interests are on multimodal 3D scene understanding and interactive editing. Past summer, he interned with the Microsoft Spatial AI Lab, hosted by Prof. Marc Pollefeys, working on efficient video understanding in spatial context. Before starting PhD, he was a CS master student at ETH Zürich, in the Computer Vision and Geometry Group (CVG), working on aligning real-world 3D environments from multi-modal data. In the past, he has been a Research Intern at Qualcomm XR labs, Computer Vision Engineer at Mercedes Benz R & D and Research Engineer at ICG, TU Graz. Website: https://sayands.github.io/ HouseLayout3D: A Benchmark and Baseline Method for 3D Layout Estimation in the Wild Current 3D layout estimation models are primarily trained on synthetic datasets containing simple single room or single floor environments. As a consequence, they cannot natively handle large multi floor buildings and require scenes to be split into individual floors before processing, which removes global spatial context that is essential for reasoning about structures such as staircases that connect multiple levels. In this work, we introduce HouseLayout3D, a real world benchmark designed to support progress toward full building scale layout estimation, including multiple floors and architecturally intricate spaces. We also present MultiFloor3D, a simple training free baseline that leverages recent scene understanding methods and already outperforms existing 3D layout estimation models on both our benchmark and prior datasets, highlighting the need for further research in this direction. About the Speaker Valentin Bieri is a Machine Learning Engineer and Researcher specializing in the intersection of 3D Computer Vision and Natural Language Processing. Building on his applied research in SLAM and Vision-Language Models at ETH Zurich, he now develops AI agents for manufacturing at EthonAI.	Jan 15 - Best of NeurIPS (Day 2)
Jan 15 - Best of NeurIPS (Day 2) 2026-01-15 · 17:00 Welcome to day two of the Best of NeurIPS series, your virtual pass to some of the groundbreaking research, insights, and innovations that defined this year’s conference. Live streaming from the authors to you. Time and Location Jan 15, 2026 9:00-11:00 AM Pacific Online. Register for the Zoom! Why Diffusion Models Don't Memorize: The Role of Implicit Dynamical Regularization in Training Diffusion models have achieved impressive results across many generative tasks, yet the mechanisms that prevent memorization and enable generalization remain unclear. In this talk, I will focus on how training dynamics shape the transition from generalization to memorization. Our experiments and theory reveal two key timescales: an early time when high-quality generation emerges and a later one when memorization begins. Notably, the memorization timescale grows linearly with the size of the training set, while the generalization timescale stays constant, creating an increasingly wide window where models generalize well. These results highlight an implicit dynamical regularization that helps diffusion models avoid memorization even in highly overparameterized regimes. About the Author Raphaël Urfin is a PhD student at École Normale Supérieure – PSL in Paris, supervised by Giulio Biroli (ENS) and Marc Mézard (Bocconi University). His work focuses on applying ideas and tools of statistical physics to better understand diffusion models and their generalization properties. Open-Insect: Benchmarking Open-Set Recognition of Novel Species in Biodiversity Monitoring Global biodiversity is declining at an unprecedented rate, yet little information is known about most species and how their populations are changing. Indeed, some 90% Earth’s species are estimated to be completely unknown. Machine learning has recently emerged as a promising tool to facilitate long-term, large-scale biodiversity monitoring, including algorithms for fine-grained classification of species from images. However, such algorithms typically are not designed to detect examples from categories unseen during training – the problem of open-set recognition (OSR) – limiting their applicability for highly diverse, poorly studied taxa such as insects. To address this gap, we introduce Open-Insect, a large-scale, fine-grained dataset to evaluate unknown species detection across different geographic regions with varying difficulty. We benchmark 38 OSR algorithms across three categories: post-hoc, training-time regularization, and training with auxiliary data, finding that simple post-hoc approaches remain a strong baseline. We also demonstrate how to leverage auxiliary data to improve species discovery in regions with limited data. Our results provide timely insights to guide the development of computer vision methods for biodiversity monitoring and species discovery. About the Speaker Yuyan Chen is a PhD student in Computer Science at McGill University and Mila - Quebec AI Institute, supervised by Prof. David Rolnick. My research focuses on machine learning for biodiversity monitoring. GuideFlow3D: Optimization-Guided Rectified Flow For Appearance Transfer Transferring appearance to 3D assets using different representations of the appearance object - such as images or text - has garnered interest due to its wide range of applications in industries like gaming, augmented reality, and digital content creation. However, state-of-the-art methods still fail when the geometry between the input and appearance objects is significantly different. A straightforward approach is to directly apply a 3D generative model, but we show that this ultimately fails to produce appealing results. Instead, we propose a principled approach inspired by universal guidance. Given a pretrained rectified flow model conditioned on image or text, our training-free method interacts with the sampling process by periodically adding guidance. This guidance can be modeled as a differentiable loss function, and we experiment with two different types of guidance including part-aware losses for appearance and self-similarity. Our experiments show that our approach successfully transfers texture and geometric details to the input 3D asset, outperforming baselines both qualitatively and quantitatively. We also show that traditional metrics are not suitable for evaluating the task due to their inability of focusing on local details and comparing dissimilar inputs, in absence of ground truth data. We thus evaluate appearance transfer quality with a GPT-based system objectively ranking outputs, ensuring robust and human-like assessment, as further confirmed by our user study. Beyond showcased scenarios, our method is general and could be extended to different types of diffusion models and guidance functions. About the Speaker Sayan Deb Sarkar is a 2nd-year PhD student at Stanford University in the Gradient Spaces Group, advised by Prof. Iro Armeni, part of the Stanford Vision Lab (SVL). His research interests are on multimodal 3D scene understanding and interactive editing. Past summer, he interned with the Microsoft Spatial AI Lab, hosted by Prof. Marc Pollefeys, working on efficient video understanding in spatial context. Before starting PhD, he was a CS master student at ETH Zürich, in the Computer Vision and Geometry Group (CVG), working on aligning real-world 3D environments from multi-modal data. In the past, he has been a Research Intern at Qualcomm XR labs, Computer Vision Engineer at Mercedes Benz R & D and Research Engineer at ICG, TU Graz. Website: https://sayands.github.io/ HouseLayout3D: A Benchmark and Baseline Method for 3D Layout Estimation in the Wild Current 3D layout estimation models are primarily trained on synthetic datasets containing simple single room or single floor environments. As a consequence, they cannot natively handle large multi floor buildings and require scenes to be split into individual floors before processing, which removes global spatial context that is essential for reasoning about structures such as staircases that connect multiple levels. In this work, we introduce HouseLayout3D, a real world benchmark designed to support progress toward full building scale layout estimation, including multiple floors and architecturally intricate spaces. We also present MultiFloor3D, a simple training free baseline that leverages recent scene understanding methods and already outperforms existing 3D layout estimation models on both our benchmark and prior datasets, highlighting the need for further research in this direction. About the Speaker Valentin Bieri is a Machine Learning Engineer and Researcher specializing in the intersection of 3D Computer Vision and Natural Language Processing. Building on his applied research in SLAM and Vision-Language Models at ETH Zurich, he now develops AI agents for manufacturing at EthonAI.	Jan 15 - Best of NeurIPS (Day 2)
Jan 15 - Best of NeurIPS (Day 2) 2026-01-15 · 17:00 Welcome to day two of the Best of NeurIPS series, your virtual pass to some of the groundbreaking research, insights, and innovations that defined this year’s conference. Live streaming from the authors to you. Time and Location Jan 15, 2026 9:00-11:00 AM Pacific Online. Register for the Zoom! Why Diffusion Models Don't Memorize: The Role of Implicit Dynamical Regularization in Training Diffusion models have achieved impressive results across many generative tasks, yet the mechanisms that prevent memorization and enable generalization remain unclear. In this talk, I will focus on how training dynamics shape the transition from generalization to memorization. Our experiments and theory reveal two key timescales: an early time when high-quality generation emerges and a later one when memorization begins. Notably, the memorization timescale grows linearly with the size of the training set, while the generalization timescale stays constant, creating an increasingly wide window where models generalize well. These results highlight an implicit dynamical regularization that helps diffusion models avoid memorization even in highly overparameterized regimes. About the Author Raphaël Urfin is a PhD student at École Normale Supérieure – PSL in Paris, supervised by Giulio Biroli (ENS) and Marc Mézard (Bocconi University). His work focuses on applying ideas and tools of statistical physics to better understand diffusion models and their generalization properties. Open-Insect: Benchmarking Open-Set Recognition of Novel Species in Biodiversity Monitoring Global biodiversity is declining at an unprecedented rate, yet little information is known about most species and how their populations are changing. Indeed, some 90% Earth’s species are estimated to be completely unknown. Machine learning has recently emerged as a promising tool to facilitate long-term, large-scale biodiversity monitoring, including algorithms for fine-grained classification of species from images. However, such algorithms typically are not designed to detect examples from categories unseen during training – the problem of open-set recognition (OSR) – limiting their applicability for highly diverse, poorly studied taxa such as insects. To address this gap, we introduce Open-Insect, a large-scale, fine-grained dataset to evaluate unknown species detection across different geographic regions with varying difficulty. We benchmark 38 OSR algorithms across three categories: post-hoc, training-time regularization, and training with auxiliary data, finding that simple post-hoc approaches remain a strong baseline. We also demonstrate how to leverage auxiliary data to improve species discovery in regions with limited data. Our results provide timely insights to guide the development of computer vision methods for biodiversity monitoring and species discovery. About the Speaker Yuyan Chen is a PhD student in Computer Science at McGill University and Mila - Quebec AI Institute, supervised by Prof. David Rolnick. My research focuses on machine learning for biodiversity monitoring. GuideFlow3D: Optimization-Guided Rectified Flow For Appearance Transfer Transferring appearance to 3D assets using different representations of the appearance object - such as images or text - has garnered interest due to its wide range of applications in industries like gaming, augmented reality, and digital content creation. However, state-of-the-art methods still fail when the geometry between the input and appearance objects is significantly different. A straightforward approach is to directly apply a 3D generative model, but we show that this ultimately fails to produce appealing results. Instead, we propose a principled approach inspired by universal guidance. Given a pretrained rectified flow model conditioned on image or text, our training-free method interacts with the sampling process by periodically adding guidance. This guidance can be modeled as a differentiable loss function, and we experiment with two different types of guidance including part-aware losses for appearance and self-similarity. Our experiments show that our approach successfully transfers texture and geometric details to the input 3D asset, outperforming baselines both qualitatively and quantitatively. We also show that traditional metrics are not suitable for evaluating the task due to their inability of focusing on local details and comparing dissimilar inputs, in absence of ground truth data. We thus evaluate appearance transfer quality with a GPT-based system objectively ranking outputs, ensuring robust and human-like assessment, as further confirmed by our user study. Beyond showcased scenarios, our method is general and could be extended to different types of diffusion models and guidance functions. About the Speaker Sayan Deb Sarkar is a 2nd-year PhD student at Stanford University in the Gradient Spaces Group, advised by Prof. Iro Armeni, part of the Stanford Vision Lab (SVL). His research interests are on multimodal 3D scene understanding and interactive editing. Past summer, he interned with the Microsoft Spatial AI Lab, hosted by Prof. Marc Pollefeys, working on efficient video understanding in spatial context. Before starting PhD, he was a CS master student at ETH Zürich, in the Computer Vision and Geometry Group (CVG), working on aligning real-world 3D environments from multi-modal data. In the past, he has been a Research Intern at Qualcomm XR labs, Computer Vision Engineer at Mercedes Benz R & D and Research Engineer at ICG, TU Graz. Website: https://sayands.github.io/ HouseLayout3D: A Benchmark and Baseline Method for 3D Layout Estimation in the Wild Current 3D layout estimation models are primarily trained on synthetic datasets containing simple single room or single floor environments. As a consequence, they cannot natively handle large multi floor buildings and require scenes to be split into individual floors before processing, which removes global spatial context that is essential for reasoning about structures such as staircases that connect multiple levels. In this work, we introduce HouseLayout3D, a real world benchmark designed to support progress toward full building scale layout estimation, including multiple floors and architecturally intricate spaces. We also present MultiFloor3D, a simple training free baseline that leverages recent scene understanding methods and already outperforms existing 3D layout estimation models on both our benchmark and prior datasets, highlighting the need for further research in this direction. About the Speaker Valentin Bieri is a Machine Learning Engineer and Researcher specializing in the intersection of 3D Computer Vision and Natural Language Processing. Building on his applied research in SLAM and Vision-Language Models at ETH Zurich, he now develops AI agents for manufacturing at EthonAI.	Jan 15 - Best of NeurIPS (Day 2)
Building LLM applications with Python 2026-01-05 · 18:00 Overview Students, developers, and anyone interested in getting started with theory and practice on building LLM-based applications with Python. Who is this for? Undeniably, large language models (LLMs) are at the centre of a modern gold-rush in technology. Students, developers, and anyone interested in getting started with theory and practice on building LLM-based applications with Python. Who is leading the session? The session is led by Dr. Stelios Sotiriadis, CEO of Warestack, Associate Professor and MSc Programme Director at Birkbeck, University of London. His expertise includes cloud computing, distributed systems, and AI engineering. Stelios holds a PhD from the University of Derby, completed a postdoctoral fellowship at the University of Toronto, and has worked with Huawei, IBM, Autodesk, and several startups. Since 2018 he has taught at Birkbeck and, in 2021, founded Warestack, building software for startups globally. What we’ll cover A practical introduction on the basics of local models and cloud APIs to build real software systems. You will learn: Introduction to natural language processing LLMs theory and intuition Agents are and how to build them Running local models with Ollama (free and offline) Calling local models using Python Building a ChatGPT-like chatbot with Python libraries Requirements A laptop with Python (Windows, macOS, or Linux) Visual Studio Code installed Python pip installed At least 10 GB free disk space At least 8 GB RAM This space is needed for running local models. You may also use the lab computers if your device doesn’t meet the requirements. Format A 1.5-hours live session including: Interactive theory Hands-on coding Step-by-step exercises The session will run in person, with streaming available for remote attendees. Prerequisites You should be comfortable writing Python scripts (basic to intermediate level).	Building LLM applications with Python
Hands-On LLM Engineering with Python (Part 1) 2025-12-18 · 18:00 REGISTER BELOW FOR MORE AVAILABLE DATES! ↓↓↓↓↓ https://luma.com/stelios ----------------------------------------------------------------------------------- Who is this for? Students, developers, and anyone interested in using Large Language Models (LLMs) to build real software solutions with Python. Tired of vibe coding with AI tools? Want to actually understand and own your code, instead of relying on black-box magic? This session shows you how to build LLM systems properly, with full control and clear engineering principles. Who is leading the session? The session is led by Dr. Stelios Sotiriadis, CEO of Warestack, Associate Professor and MSc Programme Director at Birkbeck, University of London, specialising in cloud computing, distributed systems, and AI engineering. Stelios holds a PhD from the University of Derby, completed a postdoctoral fellowship at the University of Toronto, and has worked on industry and research projects with Huawei, IBM, Autodesk, and multiple startups. Since moving to London in 2018, he has been teaching at Birkbeck. In 2021, he founded Warestack, building software for startups around the world. What we’ll cover? A hands-on introduction to building software with LLMs using Python, Ollama, and LiteLLM, including: How LLMs, embeddings, and agents work. Calling local models with Ollama or cloud models (OpenAI, Gemini and more). Using LiteLLM for custom prompts and tool-calling. Building simple agents from scratch. Introduction to RAG (Retrieval-Augmented Generation). Working with vector databases (ChromaDB) and vector similarity search library (FAISS). Storing, searching, and retrieving embeddings. Introduction to Streamlit for interactive data apps. End-to-end examples you can run on your own machine. This session focuses on theory, fundamentals and real code you can re-use. Why LiteLLM? LiteLLM gives you low-level control to build custom LLM solutions your own way, without a heavy framework like LangChain, so you understand how everything works and design your own architecture. A dedicated LangChain session will follow for those who want to go further. What are the requirements? Bring a laptop with Python installed (Windows, macOS, or Linux), along with Visual Studio Code or a similar IDE, with at least 10GB of free disk space and 8GB of RAM*. This space is needed for running local models during the workshop.* If you don’t have a suitable laptop, please contact Stelios ([email protected]) before registering. What is the format? A 3-hour live session with: Interactive theory blocks Hands-on coding Step-by-step exercises Small group support Three 10-minute breaks Q&A and class quizzes This is a highly practical, hands-on class focused on code and building working LLM systems. What are the prerequisites? A good understanding of programming with Python is required (basic to intermediate level). I assume you are already comfortable writing Python scripts. What comes after? Participants will receive an optional mini capstone project with one-to-one personalised feedback. Is it just one session? This is the first session in a new sequence on applied AI, covering agents, RAG systems, vector databases, and production-ready LLM workflows. Later sessions will dive deeper into topics such as embeddings with deep neural networks, LangChain, advanced retrieval, and multi-agent architectures. You can decide afterwards whether you’d like to join future sessions. How many participants? To keep this interactive, only 15 spots are available. Please register as soon as possible.	Hands-On LLM Engineering with Python (Part 1)
Use MCP Toolbox and Gemini CLI to Build LookML 2025-12-12 · 18:15 Mike DeAngelo – Developer Relations Engineer @ Google (Looker group) The rise of agentic AI systems presents a new paradigm for business intelligence, shifting from human-driven dashboards to AI-driven, conversational data exploration. However, a critical challenge remains: how can we securely and reliably connect Large Language Models (LLMs) to the curated, governed data locked within enterprise BI platforms? This session introduces the Looker MCP Server, a new service designed to bridge this exact gap. Built upon the robust foundation of the MCP Toolbox for Databases, the Looker MCP Server exposes Looker's powerful semantic model as a stable, easy-to-use set of tools for AI agents. By providing a secure and scalable API, it empowers developers to build a new class of "Agentic BI" applications that can independently query data, analyze trends, and deliver insights, all while inheriting Looker's existing governance and data permissions. This talk is for AI developers, data platform owners, and BI practitioners who are looking to leverage their existing investment in Looker to build the next generation of data-driven AI applications. Attendees will leave with a clear understanding of how to use the Looker MCP Server to safely unlock their enterprise data for agentic AI systems. mcp toolbox for databases looker mcp server gemini cli lookml agentic bi	NYC - Agentic AI Meetup with Google Cloud
Event DSC DACH 25 2025-12-10
Building Responsible AI Agents in Decentralized Open-Source Environments \| K.Dasgupta \| DSC DACH 25 2025-12-10 · 15:28 In his tech tutorial, Krishnendu guided participants through building a complete responsible AI pipeline using open-source large language models like LLaMA, Gemma, and GPT-OSS. He demonstrated how to create and orchestrate AI agents using frameworks such as Model Context Protocol (MCP), Agent-to-Agent (A2A), AutoGen, and Google ADK, while ensuring privacy and transparency. The session included live coding with tools like vLLM, Hugging Face Transformers, and Presidio, covering use case identification, agent building, orchestration, and privacy safeguards. By the end, attendees gained practical starter code, a minimal end-to-end implementation pipeline, and guidelines for developing decentralized, transparent, and privacy-preserving AI agents. This tutorial by Krishnendu Dasgupta was held on October 14th at DSC DACH 25 in Vienna. Follow us on social media : LinkedIn: https://www.linkedin.com/company/11184830/admin/ Instagram: https://www.instagram.com/datasciconf/ Facebook page: https://www.facebook.com/DataSciConference Website: https://datasciconference.com/	YouTube
A Practical Starter's Guide to building LLM based projects \| Marcin S. \| DSC DACH 25 2025-12-10 · 15:28 In his tech tutorial, Marcin showed how to go beyond creating prompts for ChatGPT and build full applications leveraging generative AI. He covered the fundamentals of large language models (LLMs), introduced LangChain, and demonstrated techniques like question answering over documents and creating reasoning agents. The session also addressed advanced methods and practical challenges of deploying LLMs in production. By the end, participants with Python experience gained hands-on knowledge to develop GPT-driven applications while understanding potential pitfalls and limitations. This tutorial by Marcin Szymaniuk was held on October 14th at DSC DACH 25 in Vienna. Follow us on social media : LinkedIn: https://www.linkedin.com/company/11184830/admin/ Instagram: https://www.instagram.com/datasciconf/ Facebook page: https://www.facebook.com/DataSciConference Website: https://datasciconference.com/	YouTube

Keynote by Lisa Amini- What’s Next in AI for Data and Data Management? 2025-12-09 · 18:45 Advances in large language models (LLMs) have propelled a recent flurry of AI tools for data management and operations. For example, AI-powered code assistants leverage LLMs to generate code for dataflow pipelines. RAG pipelines enable LLMs to ground responses with relevant information from external data sources. Data agents leverage LLMs to turn natural language questions into data-driven answers and actions. While challenges remain, these advances are opening exciting new opportunities for data scientists and engineers. In this talk, we will examine recent advances, along with some still incubating in research labs, with the goal of understanding where this is all heading, and present our perspective on what’s next for AI in data management and data operations. AI/ML Data Management Dataflow LLM RAG	PyData Boston 2025 Video
GenerationAI Conference for Agentic Enterprise (Free Tickets) 2025-12-09 · 08:00 This is a paid conference, the organizer is offering our members FREE tickets. Be sure to register at AICamp website to claim your free ticket. Welcome to the Agentic Enterprise: GenAI, APIs, MCP, Agents GenerationAI 2025 is the premier conference dedicated to the forefront of GenAI APIs and Large Language Models (LLM) APIs. This event is your gateway to explore the latest advancements, innovations, and applications in the world of AI, featuring insights from leading experts and visionaries. Why attending: Keynote speakers: Hear from top representatives of major GenAI and LLM API providers such as Google, Microsoft, Nvidia, OpenAI, Mistral, and Anthropic. Discover their latest developments and strategic visions shaping the future of AI. Industry leaders: Gain invaluable perspectives from GenAI leaders at renowned companies like BNP Paribas, Renault, Malt, VLC, Oympe who will share their experiences and the impact of AI within their organizations. Innovative tooling companies: Engage with pioneering researchers and entrepreneurs from cutting-edge GenAI tooling companies like ZML, Pleias, and Alinia.ai. Learn about the newest tools and technologies driving the AI revolution. Join us at GenerationAI 2025 to connect with industry leaders, innovators, and enthusiasts who are shaping the future of AI. Whether you’re an AI professional, researcher, developer, or simply passionate about the possibilities of AI, this conference is the perfect opportunity to expand your knowledge and network with the best in the field. Conference details and Register at AICamp website to claim your free ticket.	GenerationAI Conference for Agentic Enterprise (Free Tickets)
GenerationAI Conference for Agentic Enterprise (Free Tickets) 2025-12-09 · 08:00 This is a paid conference, the organizer is offering our members FREE tickets. Be sure to register at AICamp website to claim your free ticket. Welcome to the Agentic Enterprise: GenAI, APIs, MCP, Agents GenerationAI 2025 is the premier conference dedicated to the forefront of GenAI APIs and Large Language Models (LLM) APIs. This event is your gateway to explore the latest advancements, innovations, and applications in the world of AI, featuring insights from leading experts and visionaries. Why attending: Keynote speakers: Hear from top representatives of major GenAI and LLM API providers such as Google, Microsoft, Nvidia, OpenAI, Mistral, and Anthropic. Discover their latest developments and strategic visions shaping the future of AI. Industry leaders: Gain invaluable perspectives from GenAI leaders at renowned companies like BNP Paribas, Renault, Malt, VLC, Oympe who will share their experiences and the impact of AI within their organizations. Innovative tooling companies: Engage with pioneering researchers and entrepreneurs from cutting-edge GenAI tooling companies like ZML, Pleias, and Alinia.ai. Learn about the newest tools and technologies driving the AI revolution. Join us at GenerationAI 2025 to connect with industry leaders, innovators, and enthusiasts who are shaping the future of AI. Whether you’re an AI professional, researcher, developer, or simply passionate about the possibilities of AI, this conference is the perfect opportunity to expand your knowledge and network with the best in the field. Conference details and Register at AICamp website to claim your free ticket.	GenerationAI Conference for Agentic Enterprise (Free Tickets)
AWS re:Invent 2025 - Accelerate analytics and AI w/ an open and secure lakehouse architecture-ANT309 2025-12-05 · 04:19 Data lakes, data warehouses, or both? Join this session to explore how to build a unified, open, and secure data lakehouse architecture, fully compatible with Apache Iceberg, in Amazon SageMaker. Learn how the lakehouse breaks down data silos and opens your data estate offering flexibility to use your preferred query engines and tools that accelerate time to insights. Learn about recent launches that improve data interoperability and performance, and enable large language models (LLMs) and AI agents to interact with your data. Discover robust security features, including consistent fine-grained access controls, attribute-based access control, and tag-based access control that help democratize data without compromises. Learn more: More AWS events: https://go.aws/3kss9CP Subscribe: More AWS videos: http://bit.ly/2O3zS75 More AWS events videos: http://bit.ly/316g9t4 ABOUT AWS: Amazon Web Services (AWS) hosts events, both online and in-person, bringing the cloud computing community together to connect, collaborate, and learn from AWS experts. AWS is the world's most comprehensive and broadly adopted cloud platform, offering over 200 fully featured services from data centers globally. Millions of customers—including the fastest-growing startups, largest enterprises, and leading government agencies—are using AWS to lower costs, become more agile, and innovate faster. AWSreInvent #AWSreInvent2025 #AWS Agile/Scrum AI/ML Analytics AWS Cloud Computing Data Lakehouse Iceberg LLM Amazon SageMaker Cyber Security	AWS re:Invent 2024 YouTube
AI Agents in 2025: Beyond Chatbots to Autonomous Workflows 2025-12-04 · 19:10 Stephen Toriola – Software & AI Engineer @ Compare the Market This talk explores AI agents as the next step beyond prompt‑by‑prompt assistants. Modern AI agents use large language models plus planning, tool‑calling, and memory to execute multi‑step workflows, not just answer isolated questions. The session explains, in accessible terms, what makes something an “agent” rather than a simple chatbot: the ability to decompose tasks, call APIs or tools, and maintain context over time. It then surveys real use cases, from automating repetitive knowledge‑work tasks to orchestrating complex enterprise workflows that blend human decisions with autonomous actions. For the technical audience, the talk briefly outlines typical agent architectures and how they integrate with RAG, vector search, and existing backend services. For everyone else, it focuses on capabilities, limitations, and where AI agents are realistically being used in 2025. Attendees will understand what AI agents can and cannot do today, and how they differ from the hype. AI/ML API RAG	#1 - London - Data & Agentic AI in Financial Services - Brainstation
Crafting with AI #17 2025-12-04 · 17:45 Nouvelle édition, même ADN : concret, zéro bullshit. Au menu : 2 talks qui collent au terrain. 1) Évaluation continue d’agents IA en production (Wakam) Déployer des agents IA en production c'est bien, mais comment s'assurer qu'ils ne régressent pas au fil du temps ? Chez Wakam, on a construit une plateforme complète d'évaluation qui combine : Génération automatique de datasets synthétiques via des agents IA spécialisés dans Dust Validation humaine par les experts métiers dans une interface dédiée via Retool Orchestration des évaluations via Prefect Monitoring continu dans Langfuse On vous montrera notre architecture et stack (Dust.tt), Retool, Prefect, Langfuse) et comment elle résout deux pain points majeurs : l'absence d'évaluation native dans les plateformes d'agents SaaS et la complexité de maintenir des datasets à jour. Démo incluse sur notre cas d'usage RH. Speakers: Wided Ahlem Touhami, Hamza Ben Marzouk, Ouarda Boumansour (Équipe AI Engineering, Wakam) 2) Du “large” au “small” : pourquoi les Small Language Models changent la donne Les LLMs sont impressionnants, mais coûteux et lourds à opérer et parfois disproportionnés par rapport aux besoins réels. Les Small Language Models proposent une alternative plus légère, adaptable dans de bonnes conditions, moins chère et beaucoup plus simple à déployer, tout en couvrant une large partie des cas d’usage opérationnels. Voyons ensemble comment et pourquoi “small” devient souvent le choix le plus pragmatique… et parfois le plus stratégique. Speaker: Olivier Bergeret (Head of Data) Lieu: Thiga, 23 rue Taitbout, 75009 Paris Accueil: 18:45 Talks: 19:00 puis Q&A Apéro: networking sur place Places limitées. Merci à Thiga pour l’accueil.	Crafting with AI #17
Let’s Talk Tech: How to Build and Govern Agentic Systems 2025-11-26 · 17:30 Join us on November 26 to dive into the best practices for building agentic AI safely and effectively with Guillaume Laforge, Developer Advocate at Google, and Maxime Appé, Manager, Product Management at Dataiku. 6:30 to 7:15 pm: Guillaume Laforge, Google AI Agents: The New Frontier for LLMs Know your way around large language models? Mastered retrieval-augmented generation to help an LLM search your documents? It’s time to take the next step with AI agents. In this session, you’ll learn what makes a system “agentic”, the limitations of LLMs, and how to build different types of agents in Java using LangChain4j and the Agent Development Kit (ADK). Expect practical examples of agent patterns that go beyond a simple LLM call to respond intelligently, take action, and adapt to user needs. Building retrieval-augmented generation (RAG) apps is just the start. Learn what comes next with Agents. 7:15 to 8 pm: Maxime Appé, Dataiku Navigating the AI Safety Landscape: Learning from CeSIA’s Bootcamp AI is evolving fast, and safety research is racing to keep up. In this session, we’ll break down key insights from the AI Safety Bootcamp hosted by the Centre for AI Security. We’ll explore how to navigate AI’s rapid progress, unpacking what “intelligence” or “model agency” really mean, before turning to the risks and mitigation strategies that define the AI safety landscape today. 8 - 10PM: Meet & greet with drinks Dataiku will only use your personal information to provide the product or service you requested and contact you with related content that may interest you. You may unsubscribe from these communications at any time. For more information on unsubscribing and how we protect and respect your privacy, check out our Privacy Policy.	Let’s Talk Tech: How to Build and Govern Agentic Systems

Jan 15 - Best of NeurIPS (Day 2) 2026-01-15 · 17:00

Welcome to day two of the Best of NeurIPS series, your virtual pass to some of the groundbreaking research, insights, and innovations that defined this year’s conference. Live streaming from the authors to you.

Time and Location

Jan 15, 2026 9:00-11:00 AM Pacific Online. Register for the Zoom!

Why Diffusion Models Don't Memorize: The Role of Implicit Dynamical Regularization in Training

Diffusion models have achieved impressive results across many generative tasks, yet the mechanisms that prevent memorization and enable generalization remain unclear. In this talk, I will focus on how training dynamics shape the transition from generalization to memorization. Our experiments and theory reveal two key timescales: an early time when high-quality generation emerges and a later one when memorization begins. Notably, the memorization timescale grows linearly with the size of the training set, while the generalization timescale stays constant, creating an increasingly wide window where models generalize well. These results highlight an implicit dynamical regularization that helps diffusion models avoid memorization even in highly overparameterized regimes.

About the Author

Raphaël Urfin is a PhD student at École Normale Supérieure – PSL in Paris, supervised by Giulio Biroli (ENS) and Marc Mézard (Bocconi University). His work focuses on applying ideas and tools of statistical physics to better understand diffusion models and their generalization properties.

Open-Insect: Benchmarking Open-Set Recognition of Novel Species in Biodiversity Monitoring

Global biodiversity is declining at an unprecedented rate, yet little information is known about most species and how their populations are changing. Indeed, some 90% Earth’s species are estimated to be completely unknown. Machine learning has recently emerged as a promising tool to facilitate long-term, large-scale biodiversity monitoring, including algorithms for fine-grained classification of species from images. However, such algorithms typically are not designed to detect examples from categories unseen during training – the problem of open-set recognition (OSR) – limiting their applicability for highly diverse, poorly studied taxa such as insects. To address this gap, we introduce Open-Insect, a large-scale, fine-grained dataset to evaluate unknown species detection across different geographic regions with varying difficulty. We benchmark 38 OSR algorithms across three categories: post-hoc, training-time regularization, and training with auxiliary data, finding that simple post-hoc approaches remain a strong baseline. We also demonstrate how to leverage auxiliary data to improve species discovery in regions with limited data. Our results provide timely insights to guide the development of computer vision methods for biodiversity monitoring and species discovery.

About the Speaker

Yuyan Chen is a PhD student in Computer Science at McGill University and Mila - Quebec AI Institute, supervised by Prof. David Rolnick. My research focuses on machine learning for biodiversity monitoring.

GuideFlow3D: Optimization-Guided Rectified Flow For Appearance Transfer

Transferring appearance to 3D assets using different representations of the appearance object - such as images or text - has garnered interest due to its wide range of applications in industries like gaming, augmented reality, and digital content creation. However, state-of-the-art methods still fail when the geometry between the input and appearance objects is significantly different. A straightforward approach is to directly apply a 3D generative model, but we show that this ultimately fails to produce appealing results.

Instead, we propose a principled approach inspired by universal guidance. Given a pretrained rectified flow model conditioned on image or text, our training-free method interacts with the sampling process by periodically adding guidance. This guidance can be modeled as a differentiable loss function, and we experiment with two different types of guidance including part-aware losses for appearance and self-similarity. Our experiments show that our approach successfully transfers texture and geometric details to the input 3D asset, outperforming baselines both qualitatively and quantitatively.

We also show that traditional metrics are not suitable for evaluating the task due to their inability of focusing on local details and comparing dissimilar inputs, in absence of ground truth data. We thus evaluate appearance transfer quality with a GPT-based system objectively ranking outputs, ensuring robust and human-like assessment, as further confirmed by our user study. Beyond showcased scenarios, our method is general and could be extended to different types of diffusion models and guidance functions.

About the Speaker

Sayan Deb Sarkar is a 2nd-year PhD student at Stanford University in the Gradient Spaces Group, advised by Prof. Iro Armeni, part of the Stanford Vision Lab (SVL). His research interests are on multimodal 3D scene understanding and interactive editing. Past summer, he interned with the Microsoft Spatial AI Lab, hosted by Prof. Marc Pollefeys, working on efficient video understanding in spatial context. Before starting PhD, he was a CS master student at ETH Zürich, in the Computer Vision and Geometry Group (CVG), working on aligning real-world 3D environments from multi-modal data. In the past, he has been a Research Intern at Qualcomm XR labs, Computer Vision Engineer at Mercedes Benz R & D and Research Engineer at ICG, TU Graz. Website: https://sayands.github.io/

HouseLayout3D: A Benchmark and Baseline Method for 3D Layout Estimation in the Wild 

Current 3D layout estimation models are primarily trained on synthetic datasets containing simple single room or single floor environments. As a consequence, they cannot natively handle large multi floor buildings and require scenes to be split into individual floors before processing, which removes global spatial context that is essential for reasoning about structures such as staircases that connect multiple levels. In this work, we introduce HouseLayout3D, a real world benchmark designed to support progress toward full building scale layout estimation, including multiple floors and architecturally intricate spaces. We also present MultiFloor3D, a simple training free baseline that leverages recent scene understanding methods and already outperforms existing 3D layout estimation models on both our benchmark and prior datasets, highlighting the need for further research in this direction.

About the Speaker

Valentin Bieri is a Machine Learning Engineer and Researcher specializing in the intersection of 3D Computer Vision and Natural Language Processing. Building on his applied research in SLAM and Vision-Language Models at ETH Zurich, he now develops AI agents for manufacturing at EthonAI.

Jan 15 - Best of NeurIPS (Day 2)

Jan 15 - Best of NeurIPS (Day 2) 2026-01-15 · 17:00

Welcome to day two of the Best of NeurIPS series, your virtual pass to some of the groundbreaking research, insights, and innovations that defined this year’s conference. Live streaming from the authors to you.

Time and Location

Jan 15, 2026 9:00-11:00 AM Pacific Online. Register for the Zoom!

Why Diffusion Models Don't Memorize: The Role of Implicit Dynamical Regularization in Training

Diffusion models have achieved impressive results across many generative tasks, yet the mechanisms that prevent memorization and enable generalization remain unclear. In this talk, I will focus on how training dynamics shape the transition from generalization to memorization. Our experiments and theory reveal two key timescales: an early time when high-quality generation emerges and a later one when memorization begins. Notably, the memorization timescale grows linearly with the size of the training set, while the generalization timescale stays constant, creating an increasingly wide window where models generalize well. These results highlight an implicit dynamical regularization that helps diffusion models avoid memorization even in highly overparameterized regimes.

About the Author

Raphaël Urfin is a PhD student at École Normale Supérieure – PSL in Paris, supervised by Giulio Biroli (ENS) and Marc Mézard (Bocconi University). His work focuses on applying ideas and tools of statistical physics to better understand diffusion models and their generalization properties.

Open-Insect: Benchmarking Open-Set Recognition of Novel Species in Biodiversity Monitoring

Global biodiversity is declining at an unprecedented rate, yet little information is known about most species and how their populations are changing. Indeed, some 90% Earth’s species are estimated to be completely unknown. Machine learning has recently emerged as a promising tool to facilitate long-term, large-scale biodiversity monitoring, including algorithms for fine-grained classification of species from images. However, such algorithms typically are not designed to detect examples from categories unseen during training – the problem of open-set recognition (OSR) – limiting their applicability for highly diverse, poorly studied taxa such as insects. To address this gap, we introduce Open-Insect, a large-scale, fine-grained dataset to evaluate unknown species detection across different geographic regions with varying difficulty. We benchmark 38 OSR algorithms across three categories: post-hoc, training-time regularization, and training with auxiliary data, finding that simple post-hoc approaches remain a strong baseline. We also demonstrate how to leverage auxiliary data to improve species discovery in regions with limited data. Our results provide timely insights to guide the development of computer vision methods for biodiversity monitoring and species discovery.

About the Speaker

Yuyan Chen is a PhD student in Computer Science at McGill University and Mila - Quebec AI Institute, supervised by Prof. David Rolnick. My research focuses on machine learning for biodiversity monitoring.

GuideFlow3D: Optimization-Guided Rectified Flow For Appearance Transfer

Transferring appearance to 3D assets using different representations of the appearance object - such as images or text - has garnered interest due to its wide range of applications in industries like gaming, augmented reality, and digital content creation. However, state-of-the-art methods still fail when the geometry between the input and appearance objects is significantly different. A straightforward approach is to directly apply a 3D generative model, but we show that this ultimately fails to produce appealing results.

Instead, we propose a principled approach inspired by universal guidance. Given a pretrained rectified flow model conditioned on image or text, our training-free method interacts with the sampling process by periodically adding guidance. This guidance can be modeled as a differentiable loss function, and we experiment with two different types of guidance including part-aware losses for appearance and self-similarity. Our experiments show that our approach successfully transfers texture and geometric details to the input 3D asset, outperforming baselines both qualitatively and quantitatively.

We also show that traditional metrics are not suitable for evaluating the task due to their inability of focusing on local details and comparing dissimilar inputs, in absence of ground truth data. We thus evaluate appearance transfer quality with a GPT-based system objectively ranking outputs, ensuring robust and human-like assessment, as further confirmed by our user study. Beyond showcased scenarios, our method is general and could be extended to different types of diffusion models and guidance functions.

About the Speaker

Sayan Deb Sarkar is a 2nd-year PhD student at Stanford University in the Gradient Spaces Group, advised by Prof. Iro Armeni, part of the Stanford Vision Lab (SVL). His research interests are on multimodal 3D scene understanding and interactive editing. Past summer, he interned with the Microsoft Spatial AI Lab, hosted by Prof. Marc Pollefeys, working on efficient video understanding in spatial context. Before starting PhD, he was a CS master student at ETH Zürich, in the Computer Vision and Geometry Group (CVG), working on aligning real-world 3D environments from multi-modal data. In the past, he has been a Research Intern at Qualcomm XR labs, Computer Vision Engineer at Mercedes Benz R & D and Research Engineer at ICG, TU Graz. Website: https://sayands.github.io/

HouseLayout3D: A Benchmark and Baseline Method for 3D Layout Estimation in the Wild 

Current 3D layout estimation models are primarily trained on synthetic datasets containing simple single room or single floor environments. As a consequence, they cannot natively handle large multi floor buildings and require scenes to be split into individual floors before processing, which removes global spatial context that is essential for reasoning about structures such as staircases that connect multiple levels. In this work, we introduce HouseLayout3D, a real world benchmark designed to support progress toward full building scale layout estimation, including multiple floors and architecturally intricate spaces. We also present MultiFloor3D, a simple training free baseline that leverages recent scene understanding methods and already outperforms existing 3D layout estimation models on both our benchmark and prior datasets, highlighting the need for further research in this direction.

About the Speaker

Valentin Bieri is a Machine Learning Engineer and Researcher specializing in the intersection of 3D Computer Vision and Natural Language Processing. Building on his applied research in SLAM and Vision-Language Models at ETH Zurich, he now develops AI agents for manufacturing at EthonAI.

Jan 15 - Best of NeurIPS (Day 2)

Jan 15 - Best of NeurIPS (Day 2) 2026-01-15 · 17:00

Welcome to day two of the Best of NeurIPS series, your virtual pass to some of the groundbreaking research, insights, and innovations that defined this year’s conference. Live streaming from the authors to you.

Time and Location

Jan 15, 2026 9:00-11:00 AM Pacific Online. Register for the Zoom!

Why Diffusion Models Don't Memorize: The Role of Implicit Dynamical Regularization in Training

Diffusion models have achieved impressive results across many generative tasks, yet the mechanisms that prevent memorization and enable generalization remain unclear. In this talk, I will focus on how training dynamics shape the transition from generalization to memorization. Our experiments and theory reveal two key timescales: an early time when high-quality generation emerges and a later one when memorization begins. Notably, the memorization timescale grows linearly with the size of the training set, while the generalization timescale stays constant, creating an increasingly wide window where models generalize well. These results highlight an implicit dynamical regularization that helps diffusion models avoid memorization even in highly overparameterized regimes.

About the Author

Raphaël Urfin is a PhD student at École Normale Supérieure – PSL in Paris, supervised by Giulio Biroli (ENS) and Marc Mézard (Bocconi University). His work focuses on applying ideas and tools of statistical physics to better understand diffusion models and their generalization properties.

Open-Insect: Benchmarking Open-Set Recognition of Novel Species in Biodiversity Monitoring

Global biodiversity is declining at an unprecedented rate, yet little information is known about most species and how their populations are changing. Indeed, some 90% Earth’s species are estimated to be completely unknown. Machine learning has recently emerged as a promising tool to facilitate long-term, large-scale biodiversity monitoring, including algorithms for fine-grained classification of species from images. However, such algorithms typically are not designed to detect examples from categories unseen during training – the problem of open-set recognition (OSR) – limiting their applicability for highly diverse, poorly studied taxa such as insects. To address this gap, we introduce Open-Insect, a large-scale, fine-grained dataset to evaluate unknown species detection across different geographic regions with varying difficulty. We benchmark 38 OSR algorithms across three categories: post-hoc, training-time regularization, and training with auxiliary data, finding that simple post-hoc approaches remain a strong baseline. We also demonstrate how to leverage auxiliary data to improve species discovery in regions with limited data. Our results provide timely insights to guide the development of computer vision methods for biodiversity monitoring and species discovery.

About the Speaker

Yuyan Chen is a PhD student in Computer Science at McGill University and Mila - Quebec AI Institute, supervised by Prof. David Rolnick. My research focuses on machine learning for biodiversity monitoring.

GuideFlow3D: Optimization-Guided Rectified Flow For Appearance Transfer

Transferring appearance to 3D assets using different representations of the appearance object - such as images or text - has garnered interest due to its wide range of applications in industries like gaming, augmented reality, and digital content creation. However, state-of-the-art methods still fail when the geometry between the input and appearance objects is significantly different. A straightforward approach is to directly apply a 3D generative model, but we show that this ultimately fails to produce appealing results.

Instead, we propose a principled approach inspired by universal guidance. Given a pretrained rectified flow model conditioned on image or text, our training-free method interacts with the sampling process by periodically adding guidance. This guidance can be modeled as a differentiable loss function, and we experiment with two different types of guidance including part-aware losses for appearance and self-similarity. Our experiments show that our approach successfully transfers texture and geometric details to the input 3D asset, outperforming baselines both qualitatively and quantitatively.

We also show that traditional metrics are not suitable for evaluating the task due to their inability of focusing on local details and comparing dissimilar inputs, in absence of ground truth data. We thus evaluate appearance transfer quality with a GPT-based system objectively ranking outputs, ensuring robust and human-like assessment, as further confirmed by our user study. Beyond showcased scenarios, our method is general and could be extended to different types of diffusion models and guidance functions.

About the Speaker

Sayan Deb Sarkar is a 2nd-year PhD student at Stanford University in the Gradient Spaces Group, advised by Prof. Iro Armeni, part of the Stanford Vision Lab (SVL). His research interests are on multimodal 3D scene understanding and interactive editing. Past summer, he interned with the Microsoft Spatial AI Lab, hosted by Prof. Marc Pollefeys, working on efficient video understanding in spatial context. Before starting PhD, he was a CS master student at ETH Zürich, in the Computer Vision and Geometry Group (CVG), working on aligning real-world 3D environments from multi-modal data. In the past, he has been a Research Intern at Qualcomm XR labs, Computer Vision Engineer at Mercedes Benz R & D and Research Engineer at ICG, TU Graz. Website: https://sayands.github.io/

HouseLayout3D: A Benchmark and Baseline Method for 3D Layout Estimation in the Wild 

Current 3D layout estimation models are primarily trained on synthetic datasets containing simple single room or single floor environments. As a consequence, they cannot natively handle large multi floor buildings and require scenes to be split into individual floors before processing, which removes global spatial context that is essential for reasoning about structures such as staircases that connect multiple levels. In this work, we introduce HouseLayout3D, a real world benchmark designed to support progress toward full building scale layout estimation, including multiple floors and architecturally intricate spaces. We also present MultiFloor3D, a simple training free baseline that leverages recent scene understanding methods and already outperforms existing 3D layout estimation models on both our benchmark and prior datasets, highlighting the need for further research in this direction.

About the Speaker

Valentin Bieri is a Machine Learning Engineer and Researcher specializing in the intersection of 3D Computer Vision and Natural Language Processing. Building on his applied research in SLAM and Vision-Language Models at ETH Zurich, he now develops AI agents for manufacturing at EthonAI.

Jan 15 - Best of NeurIPS (Day 2)

Jan 15 - Best of NeurIPS (Day 2) 2026-01-15 · 17:00

Welcome to day two of the Best of NeurIPS series, your virtual pass to some of the groundbreaking research, insights, and innovations that defined this year’s conference. Live streaming from the authors to you.

Time and Location

Jan 15, 2026 9:00-11:00 AM Pacific Online. Register for the Zoom!

Why Diffusion Models Don't Memorize: The Role of Implicit Dynamical Regularization in Training

Diffusion models have achieved impressive results across many generative tasks, yet the mechanisms that prevent memorization and enable generalization remain unclear. In this talk, I will focus on how training dynamics shape the transition from generalization to memorization. Our experiments and theory reveal two key timescales: an early time when high-quality generation emerges and a later one when memorization begins. Notably, the memorization timescale grows linearly with the size of the training set, while the generalization timescale stays constant, creating an increasingly wide window where models generalize well. These results highlight an implicit dynamical regularization that helps diffusion models avoid memorization even in highly overparameterized regimes.

About the Author

Raphaël Urfin is a PhD student at École Normale Supérieure – PSL in Paris, supervised by Giulio Biroli (ENS) and Marc Mézard (Bocconi University). His work focuses on applying ideas and tools of statistical physics to better understand diffusion models and their generalization properties.

Open-Insect: Benchmarking Open-Set Recognition of Novel Species in Biodiversity Monitoring

Global biodiversity is declining at an unprecedented rate, yet little information is known about most species and how their populations are changing. Indeed, some 90% Earth’s species are estimated to be completely unknown. Machine learning has recently emerged as a promising tool to facilitate long-term, large-scale biodiversity monitoring, including algorithms for fine-grained classification of species from images. However, such algorithms typically are not designed to detect examples from categories unseen during training – the problem of open-set recognition (OSR) – limiting their applicability for highly diverse, poorly studied taxa such as insects. To address this gap, we introduce Open-Insect, a large-scale, fine-grained dataset to evaluate unknown species detection across different geographic regions with varying difficulty. We benchmark 38 OSR algorithms across three categories: post-hoc, training-time regularization, and training with auxiliary data, finding that simple post-hoc approaches remain a strong baseline. We also demonstrate how to leverage auxiliary data to improve species discovery in regions with limited data. Our results provide timely insights to guide the development of computer vision methods for biodiversity monitoring and species discovery.

About the Speaker

Yuyan Chen is a PhD student in Computer Science at McGill University and Mila - Quebec AI Institute, supervised by Prof. David Rolnick. My research focuses on machine learning for biodiversity monitoring.

GuideFlow3D: Optimization-Guided Rectified Flow For Appearance Transfer

Transferring appearance to 3D assets using different representations of the appearance object - such as images or text - has garnered interest due to its wide range of applications in industries like gaming, augmented reality, and digital content creation. However, state-of-the-art methods still fail when the geometry between the input and appearance objects is significantly different. A straightforward approach is to directly apply a 3D generative model, but we show that this ultimately fails to produce appealing results.

Instead, we propose a principled approach inspired by universal guidance. Given a pretrained rectified flow model conditioned on image or text, our training-free method interacts with the sampling process by periodically adding guidance. This guidance can be modeled as a differentiable loss function, and we experiment with two different types of guidance including part-aware losses for appearance and self-similarity. Our experiments show that our approach successfully transfers texture and geometric details to the input 3D asset, outperforming baselines both qualitatively and quantitatively.

We also show that traditional metrics are not suitable for evaluating the task due to their inability of focusing on local details and comparing dissimilar inputs, in absence of ground truth data. We thus evaluate appearance transfer quality with a GPT-based system objectively ranking outputs, ensuring robust and human-like assessment, as further confirmed by our user study. Beyond showcased scenarios, our method is general and could be extended to different types of diffusion models and guidance functions.

About the Speaker

Sayan Deb Sarkar is a 2nd-year PhD student at Stanford University in the Gradient Spaces Group, advised by Prof. Iro Armeni, part of the Stanford Vision Lab (SVL). His research interests are on multimodal 3D scene understanding and interactive editing. Past summer, he interned with the Microsoft Spatial AI Lab, hosted by Prof. Marc Pollefeys, working on efficient video understanding in spatial context. Before starting PhD, he was a CS master student at ETH Zürich, in the Computer Vision and Geometry Group (CVG), working on aligning real-world 3D environments from multi-modal data. In the past, he has been a Research Intern at Qualcomm XR labs, Computer Vision Engineer at Mercedes Benz R & D and Research Engineer at ICG, TU Graz. Website: https://sayands.github.io/

HouseLayout3D: A Benchmark and Baseline Method for 3D Layout Estimation in the Wild 

Current 3D layout estimation models are primarily trained on synthetic datasets containing simple single room or single floor environments. As a consequence, they cannot natively handle large multi floor buildings and require scenes to be split into individual floors before processing, which removes global spatial context that is essential for reasoning about structures such as staircases that connect multiple levels. In this work, we introduce HouseLayout3D, a real world benchmark designed to support progress toward full building scale layout estimation, including multiple floors and architecturally intricate spaces. We also present MultiFloor3D, a simple training free baseline that leverages recent scene understanding methods and already outperforms existing 3D layout estimation models on both our benchmark and prior datasets, highlighting the need for further research in this direction.

About the Speaker

Valentin Bieri is a Machine Learning Engineer and Researcher specializing in the intersection of 3D Computer Vision and Natural Language Processing. Building on his applied research in SLAM and Vision-Language Models at ETH Zurich, he now develops AI agents for manufacturing at EthonAI.

Jan 15 - Best of NeurIPS (Day 2)

Jan 15 - Best of NeurIPS (Day 2) 2026-01-15 · 17:00

Welcome to day two of the Best of NeurIPS series, your virtual pass to some of the groundbreaking research, insights, and innovations that defined this year’s conference. Live streaming from the authors to you.

Time and Location

Jan 15, 2026 9:00-11:00 AM Pacific Online. Register for the Zoom!

Why Diffusion Models Don't Memorize: The Role of Implicit Dynamical Regularization in Training

Diffusion models have achieved impressive results across many generative tasks, yet the mechanisms that prevent memorization and enable generalization remain unclear. In this talk, I will focus on how training dynamics shape the transition from generalization to memorization. Our experiments and theory reveal two key timescales: an early time when high-quality generation emerges and a later one when memorization begins. Notably, the memorization timescale grows linearly with the size of the training set, while the generalization timescale stays constant, creating an increasingly wide window where models generalize well. These results highlight an implicit dynamical regularization that helps diffusion models avoid memorization even in highly overparameterized regimes.

About the Author

Raphaël Urfin is a PhD student at École Normale Supérieure – PSL in Paris, supervised by Giulio Biroli (ENS) and Marc Mézard (Bocconi University). His work focuses on applying ideas and tools of statistical physics to better understand diffusion models and their generalization properties.

Open-Insect: Benchmarking Open-Set Recognition of Novel Species in Biodiversity Monitoring

Global biodiversity is declining at an unprecedented rate, yet little information is known about most species and how their populations are changing. Indeed, some 90% Earth’s species are estimated to be completely unknown. Machine learning has recently emerged as a promising tool to facilitate long-term, large-scale biodiversity monitoring, including algorithms for fine-grained classification of species from images. However, such algorithms typically are not designed to detect examples from categories unseen during training – the problem of open-set recognition (OSR) – limiting their applicability for highly diverse, poorly studied taxa such as insects. To address this gap, we introduce Open-Insect, a large-scale, fine-grained dataset to evaluate unknown species detection across different geographic regions with varying difficulty. We benchmark 38 OSR algorithms across three categories: post-hoc, training-time regularization, and training with auxiliary data, finding that simple post-hoc approaches remain a strong baseline. We also demonstrate how to leverage auxiliary data to improve species discovery in regions with limited data. Our results provide timely insights to guide the development of computer vision methods for biodiversity monitoring and species discovery.

About the Speaker

Yuyan Chen is a PhD student in Computer Science at McGill University and Mila - Quebec AI Institute, supervised by Prof. David Rolnick. My research focuses on machine learning for biodiversity monitoring.

GuideFlow3D: Optimization-Guided Rectified Flow For Appearance Transfer

Transferring appearance to 3D assets using different representations of the appearance object - such as images or text - has garnered interest due to its wide range of applications in industries like gaming, augmented reality, and digital content creation. However, state-of-the-art methods still fail when the geometry between the input and appearance objects is significantly different. A straightforward approach is to directly apply a 3D generative model, but we show that this ultimately fails to produce appealing results.

Instead, we propose a principled approach inspired by universal guidance. Given a pretrained rectified flow model conditioned on image or text, our training-free method interacts with the sampling process by periodically adding guidance. This guidance can be modeled as a differentiable loss function, and we experiment with two different types of guidance including part-aware losses for appearance and self-similarity. Our experiments show that our approach successfully transfers texture and geometric details to the input 3D asset, outperforming baselines both qualitatively and quantitatively.

We also show that traditional metrics are not suitable for evaluating the task due to their inability of focusing on local details and comparing dissimilar inputs, in absence of ground truth data. We thus evaluate appearance transfer quality with a GPT-based system objectively ranking outputs, ensuring robust and human-like assessment, as further confirmed by our user study. Beyond showcased scenarios, our method is general and could be extended to different types of diffusion models and guidance functions.

About the Speaker

Sayan Deb Sarkar is a 2nd-year PhD student at Stanford University in the Gradient Spaces Group, advised by Prof. Iro Armeni, part of the Stanford Vision Lab (SVL). His research interests are on multimodal 3D scene understanding and interactive editing. Past summer, he interned with the Microsoft Spatial AI Lab, hosted by Prof. Marc Pollefeys, working on efficient video understanding in spatial context. Before starting PhD, he was a CS master student at ETH Zürich, in the Computer Vision and Geometry Group (CVG), working on aligning real-world 3D environments from multi-modal data. In the past, he has been a Research Intern at Qualcomm XR labs, Computer Vision Engineer at Mercedes Benz R & D and Research Engineer at ICG, TU Graz. Website: https://sayands.github.io/

HouseLayout3D: A Benchmark and Baseline Method for 3D Layout Estimation in the Wild 

Current 3D layout estimation models are primarily trained on synthetic datasets containing simple single room or single floor environments. As a consequence, they cannot natively handle large multi floor buildings and require scenes to be split into individual floors before processing, which removes global spatial context that is essential for reasoning about structures such as staircases that connect multiple levels. In this work, we introduce HouseLayout3D, a real world benchmark designed to support progress toward full building scale layout estimation, including multiple floors and architecturally intricate spaces. We also present MultiFloor3D, a simple training free baseline that leverages recent scene understanding methods and already outperforms existing 3D layout estimation models on both our benchmark and prior datasets, highlighting the need for further research in this direction.

About the Speaker

Valentin Bieri is a Machine Learning Engineer and Researcher specializing in the intersection of 3D Computer Vision and Natural Language Processing. Building on his applied research in SLAM and Vision-Language Models at ETH Zurich, he now develops AI agents for manufacturing at EthonAI.

Jan 15 - Best of NeurIPS (Day 2)

Jan 15 - Best of NeurIPS (Day 2) 2026-01-15 · 17:00

Welcome to day two of the Best of NeurIPS series, your virtual pass to some of the groundbreaking research, insights, and innovations that defined this year’s conference. Live streaming from the authors to you.

Time and Location

Jan 15, 2026 9:00-11:00 AM Pacific Online. Register for the Zoom!

Why Diffusion Models Don't Memorize: The Role of Implicit Dynamical Regularization in Training

Diffusion models have achieved impressive results across many generative tasks, yet the mechanisms that prevent memorization and enable generalization remain unclear. In this talk, I will focus on how training dynamics shape the transition from generalization to memorization. Our experiments and theory reveal two key timescales: an early time when high-quality generation emerges and a later one when memorization begins. Notably, the memorization timescale grows linearly with the size of the training set, while the generalization timescale stays constant, creating an increasingly wide window where models generalize well. These results highlight an implicit dynamical regularization that helps diffusion models avoid memorization even in highly overparameterized regimes.

About the Author

Raphaël Urfin is a PhD student at École Normale Supérieure – PSL in Paris, supervised by Giulio Biroli (ENS) and Marc Mézard (Bocconi University). His work focuses on applying ideas and tools of statistical physics to better understand diffusion models and their generalization properties.

Open-Insect: Benchmarking Open-Set Recognition of Novel Species in Biodiversity Monitoring

Global biodiversity is declining at an unprecedented rate, yet little information is known about most species and how their populations are changing. Indeed, some 90% Earth’s species are estimated to be completely unknown. Machine learning has recently emerged as a promising tool to facilitate long-term, large-scale biodiversity monitoring, including algorithms for fine-grained classification of species from images. However, such algorithms typically are not designed to detect examples from categories unseen during training – the problem of open-set recognition (OSR) – limiting their applicability for highly diverse, poorly studied taxa such as insects. To address this gap, we introduce Open-Insect, a large-scale, fine-grained dataset to evaluate unknown species detection across different geographic regions with varying difficulty. We benchmark 38 OSR algorithms across three categories: post-hoc, training-time regularization, and training with auxiliary data, finding that simple post-hoc approaches remain a strong baseline. We also demonstrate how to leverage auxiliary data to improve species discovery in regions with limited data. Our results provide timely insights to guide the development of computer vision methods for biodiversity monitoring and species discovery.

About the Speaker

Yuyan Chen is a PhD student in Computer Science at McGill University and Mila - Quebec AI Institute, supervised by Prof. David Rolnick. My research focuses on machine learning for biodiversity monitoring.

GuideFlow3D: Optimization-Guided Rectified Flow For Appearance Transfer

Transferring appearance to 3D assets using different representations of the appearance object - such as images or text - has garnered interest due to its wide range of applications in industries like gaming, augmented reality, and digital content creation. However, state-of-the-art methods still fail when the geometry between the input and appearance objects is significantly different. A straightforward approach is to directly apply a 3D generative model, but we show that this ultimately fails to produce appealing results.

Instead, we propose a principled approach inspired by universal guidance. Given a pretrained rectified flow model conditioned on image or text, our training-free method interacts with the sampling process by periodically adding guidance. This guidance can be modeled as a differentiable loss function, and we experiment with two different types of guidance including part-aware losses for appearance and self-similarity. Our experiments show that our approach successfully transfers texture and geometric details to the input 3D asset, outperforming baselines both qualitatively and quantitatively.

We also show that traditional metrics are not suitable for evaluating the task due to their inability of focusing on local details and comparing dissimilar inputs, in absence of ground truth data. We thus evaluate appearance transfer quality with a GPT-based system objectively ranking outputs, ensuring robust and human-like assessment, as further confirmed by our user study. Beyond showcased scenarios, our method is general and could be extended to different types of diffusion models and guidance functions.

About the Speaker

Sayan Deb Sarkar is a 2nd-year PhD student at Stanford University in the Gradient Spaces Group, advised by Prof. Iro Armeni, part of the Stanford Vision Lab (SVL). His research interests are on multimodal 3D scene understanding and interactive editing. Past summer, he interned with the Microsoft Spatial AI Lab, hosted by Prof. Marc Pollefeys, working on efficient video understanding in spatial context. Before starting PhD, he was a CS master student at ETH Zürich, in the Computer Vision and Geometry Group (CVG), working on aligning real-world 3D environments from multi-modal data. In the past, he has been a Research Intern at Qualcomm XR labs, Computer Vision Engineer at Mercedes Benz R & D and Research Engineer at ICG, TU Graz. Website: https://sayands.github.io/

HouseLayout3D: A Benchmark and Baseline Method for 3D Layout Estimation in the Wild 

Current 3D layout estimation models are primarily trained on synthetic datasets containing simple single room or single floor environments. As a consequence, they cannot natively handle large multi floor buildings and require scenes to be split into individual floors before processing, which removes global spatial context that is essential for reasoning about structures such as staircases that connect multiple levels. In this work, we introduce HouseLayout3D, a real world benchmark designed to support progress toward full building scale layout estimation, including multiple floors and architecturally intricate spaces. We also present MultiFloor3D, a simple training free baseline that leverages recent scene understanding methods and already outperforms existing 3D layout estimation models on both our benchmark and prior datasets, highlighting the need for further research in this direction.

About the Speaker

Valentin Bieri is a Machine Learning Engineer and Researcher specializing in the intersection of 3D Computer Vision and Natural Language Processing. Building on his applied research in SLAM and Vision-Language Models at ETH Zurich, he now develops AI agents for manufacturing at EthonAI.

Jan 15 - Best of NeurIPS (Day 2)

Jan 15 - Best of NeurIPS (Day 2) 2026-01-15 · 17:00

Welcome to day two of the Best of NeurIPS series, your virtual pass to some of the groundbreaking research, insights, and innovations that defined this year’s conference. Live streaming from the authors to you.

Time and Location

Jan 15, 2026 9:00-11:00 AM Pacific Online. Register for the Zoom!

Why Diffusion Models Don't Memorize: The Role of Implicit Dynamical Regularization in Training

Diffusion models have achieved impressive results across many generative tasks, yet the mechanisms that prevent memorization and enable generalization remain unclear. In this talk, I will focus on how training dynamics shape the transition from generalization to memorization. Our experiments and theory reveal two key timescales: an early time when high-quality generation emerges and a later one when memorization begins. Notably, the memorization timescale grows linearly with the size of the training set, while the generalization timescale stays constant, creating an increasingly wide window where models generalize well. These results highlight an implicit dynamical regularization that helps diffusion models avoid memorization even in highly overparameterized regimes.

About the Author

Raphaël Urfin is a PhD student at École Normale Supérieure – PSL in Paris, supervised by Giulio Biroli (ENS) and Marc Mézard (Bocconi University). His work focuses on applying ideas and tools of statistical physics to better understand diffusion models and their generalization properties.

Open-Insect: Benchmarking Open-Set Recognition of Novel Species in Biodiversity Monitoring

Global biodiversity is declining at an unprecedented rate, yet little information is known about most species and how their populations are changing. Indeed, some 90% Earth’s species are estimated to be completely unknown. Machine learning has recently emerged as a promising tool to facilitate long-term, large-scale biodiversity monitoring, including algorithms for fine-grained classification of species from images. However, such algorithms typically are not designed to detect examples from categories unseen during training – the problem of open-set recognition (OSR) – limiting their applicability for highly diverse, poorly studied taxa such as insects. To address this gap, we introduce Open-Insect, a large-scale, fine-grained dataset to evaluate unknown species detection across different geographic regions with varying difficulty. We benchmark 38 OSR algorithms across three categories: post-hoc, training-time regularization, and training with auxiliary data, finding that simple post-hoc approaches remain a strong baseline. We also demonstrate how to leverage auxiliary data to improve species discovery in regions with limited data. Our results provide timely insights to guide the development of computer vision methods for biodiversity monitoring and species discovery.

About the Speaker

Yuyan Chen is a PhD student in Computer Science at McGill University and Mila - Quebec AI Institute, supervised by Prof. David Rolnick. My research focuses on machine learning for biodiversity monitoring.

GuideFlow3D: Optimization-Guided Rectified Flow For Appearance Transfer

Transferring appearance to 3D assets using different representations of the appearance object - such as images or text - has garnered interest due to its wide range of applications in industries like gaming, augmented reality, and digital content creation. However, state-of-the-art methods still fail when the geometry between the input and appearance objects is significantly different. A straightforward approach is to directly apply a 3D generative model, but we show that this ultimately fails to produce appealing results.

Instead, we propose a principled approach inspired by universal guidance. Given a pretrained rectified flow model conditioned on image or text, our training-free method interacts with the sampling process by periodically adding guidance. This guidance can be modeled as a differentiable loss function, and we experiment with two different types of guidance including part-aware losses for appearance and self-similarity. Our experiments show that our approach successfully transfers texture and geometric details to the input 3D asset, outperforming baselines both qualitatively and quantitatively.

We also show that traditional metrics are not suitable for evaluating the task due to their inability of focusing on local details and comparing dissimilar inputs, in absence of ground truth data. We thus evaluate appearance transfer quality with a GPT-based system objectively ranking outputs, ensuring robust and human-like assessment, as further confirmed by our user study. Beyond showcased scenarios, our method is general and could be extended to different types of diffusion models and guidance functions.

About the Speaker

Sayan Deb Sarkar is a 2nd-year PhD student at Stanford University in the Gradient Spaces Group, advised by Prof. Iro Armeni, part of the Stanford Vision Lab (SVL). His research interests are on multimodal 3D scene understanding and interactive editing. Past summer, he interned with the Microsoft Spatial AI Lab, hosted by Prof. Marc Pollefeys, working on efficient video understanding in spatial context. Before starting PhD, he was a CS master student at ETH Zürich, in the Computer Vision and Geometry Group (CVG), working on aligning real-world 3D environments from multi-modal data. In the past, he has been a Research Intern at Qualcomm XR labs, Computer Vision Engineer at Mercedes Benz R & D and Research Engineer at ICG, TU Graz. Website: https://sayands.github.io/

HouseLayout3D: A Benchmark and Baseline Method for 3D Layout Estimation in the Wild 

Current 3D layout estimation models are primarily trained on synthetic datasets containing simple single room or single floor environments. As a consequence, they cannot natively handle large multi floor buildings and require scenes to be split into individual floors before processing, which removes global spatial context that is essential for reasoning about structures such as staircases that connect multiple levels. In this work, we introduce HouseLayout3D, a real world benchmark designed to support progress toward full building scale layout estimation, including multiple floors and architecturally intricate spaces. We also present MultiFloor3D, a simple training free baseline that leverages recent scene understanding methods and already outperforms existing 3D layout estimation models on both our benchmark and prior datasets, highlighting the need for further research in this direction.

About the Speaker

Valentin Bieri is a Machine Learning Engineer and Researcher specializing in the intersection of 3D Computer Vision and Natural Language Processing. Building on his applied research in SLAM and Vision-Language Models at ETH Zurich, he now develops AI agents for manufacturing at EthonAI.

Jan 15 - Best of NeurIPS (Day 2)

Jan 15 - Best of NeurIPS (Day 2) 2026-01-15 · 17:00

Welcome to day two of the Best of NeurIPS series, your virtual pass to some of the groundbreaking research, insights, and innovations that defined this year’s conference. Live streaming from the authors to you.

Time and Location

Jan 15, 2026 9:00-11:00 AM Pacific Online. Register for the Zoom!

Why Diffusion Models Don't Memorize: The Role of Implicit Dynamical Regularization in Training

Diffusion models have achieved impressive results across many generative tasks, yet the mechanisms that prevent memorization and enable generalization remain unclear. In this talk, I will focus on how training dynamics shape the transition from generalization to memorization. Our experiments and theory reveal two key timescales: an early time when high-quality generation emerges and a later one when memorization begins. Notably, the memorization timescale grows linearly with the size of the training set, while the generalization timescale stays constant, creating an increasingly wide window where models generalize well. These results highlight an implicit dynamical regularization that helps diffusion models avoid memorization even in highly overparameterized regimes.

About the Author

Raphaël Urfin is a PhD student at École Normale Supérieure – PSL in Paris, supervised by Giulio Biroli (ENS) and Marc Mézard (Bocconi University). His work focuses on applying ideas and tools of statistical physics to better understand diffusion models and their generalization properties.

Open-Insect: Benchmarking Open-Set Recognition of Novel Species in Biodiversity Monitoring

Global biodiversity is declining at an unprecedented rate, yet little information is known about most species and how their populations are changing. Indeed, some 90% Earth’s species are estimated to be completely unknown. Machine learning has recently emerged as a promising tool to facilitate long-term, large-scale biodiversity monitoring, including algorithms for fine-grained classification of species from images. However, such algorithms typically are not designed to detect examples from categories unseen during training – the problem of open-set recognition (OSR) – limiting their applicability for highly diverse, poorly studied taxa such as insects. To address this gap, we introduce Open-Insect, a large-scale, fine-grained dataset to evaluate unknown species detection across different geographic regions with varying difficulty. We benchmark 38 OSR algorithms across three categories: post-hoc, training-time regularization, and training with auxiliary data, finding that simple post-hoc approaches remain a strong baseline. We also demonstrate how to leverage auxiliary data to improve species discovery in regions with limited data. Our results provide timely insights to guide the development of computer vision methods for biodiversity monitoring and species discovery.

About the Speaker

Yuyan Chen is a PhD student in Computer Science at McGill University and Mila - Quebec AI Institute, supervised by Prof. David Rolnick. My research focuses on machine learning for biodiversity monitoring.

GuideFlow3D: Optimization-Guided Rectified Flow For Appearance Transfer

Transferring appearance to 3D assets using different representations of the appearance object - such as images or text - has garnered interest due to its wide range of applications in industries like gaming, augmented reality, and digital content creation. However, state-of-the-art methods still fail when the geometry between the input and appearance objects is significantly different. A straightforward approach is to directly apply a 3D generative model, but we show that this ultimately fails to produce appealing results.

Instead, we propose a principled approach inspired by universal guidance. Given a pretrained rectified flow model conditioned on image or text, our training-free method interacts with the sampling process by periodically adding guidance. This guidance can be modeled as a differentiable loss function, and we experiment with two different types of guidance including part-aware losses for appearance and self-similarity. Our experiments show that our approach successfully transfers texture and geometric details to the input 3D asset, outperforming baselines both qualitatively and quantitatively.

We also show that traditional metrics are not suitable for evaluating the task due to their inability of focusing on local details and comparing dissimilar inputs, in absence of ground truth data. We thus evaluate appearance transfer quality with a GPT-based system objectively ranking outputs, ensuring robust and human-like assessment, as further confirmed by our user study. Beyond showcased scenarios, our method is general and could be extended to different types of diffusion models and guidance functions.

About the Speaker

Sayan Deb Sarkar is a 2nd-year PhD student at Stanford University in the Gradient Spaces Group, advised by Prof. Iro Armeni, part of the Stanford Vision Lab (SVL). His research interests are on multimodal 3D scene understanding and interactive editing. Past summer, he interned with the Microsoft Spatial AI Lab, hosted by Prof. Marc Pollefeys, working on efficient video understanding in spatial context. Before starting PhD, he was a CS master student at ETH Zürich, in the Computer Vision and Geometry Group (CVG), working on aligning real-world 3D environments from multi-modal data. In the past, he has been a Research Intern at Qualcomm XR labs, Computer Vision Engineer at Mercedes Benz R & D and Research Engineer at ICG, TU Graz. Website: https://sayands.github.io/

HouseLayout3D: A Benchmark and Baseline Method for 3D Layout Estimation in the Wild 

Current 3D layout estimation models are primarily trained on synthetic datasets containing simple single room or single floor environments. As a consequence, they cannot natively handle large multi floor buildings and require scenes to be split into individual floors before processing, which removes global spatial context that is essential for reasoning about structures such as staircases that connect multiple levels. In this work, we introduce HouseLayout3D, a real world benchmark designed to support progress toward full building scale layout estimation, including multiple floors and architecturally intricate spaces. We also present MultiFloor3D, a simple training free baseline that leverages recent scene understanding methods and already outperforms existing 3D layout estimation models on both our benchmark and prior datasets, highlighting the need for further research in this direction.

About the Speaker

Valentin Bieri is a Machine Learning Engineer and Researcher specializing in the intersection of 3D Computer Vision and Natural Language Processing. Building on his applied research in SLAM and Vision-Language Models at ETH Zurich, he now develops AI agents for manufacturing at EthonAI.

Jan 15 - Best of NeurIPS (Day 2)

Building LLM applications with Python 2026-01-05 · 18:00

Overview

Students, developers, and anyone interested in getting started with theory and practice on building LLM-based applications with Python.

Who is this for?

Undeniably, large language models (LLMs) are at the centre of a modern gold-rush in technology.

Students, developers, and anyone interested in getting started with theory and practice on building LLM-based applications with Python.

Who is leading the session?

The session is led by Dr. Stelios Sotiriadis, CEO of Warestack, Associate Professor and MSc Programme Director at Birkbeck, University of London. His expertise includes cloud computing, distributed systems, and AI engineering.

Stelios holds a PhD from the University of Derby, completed a postdoctoral fellowship at the University of Toronto, and has worked with Huawei, IBM, Autodesk, and several startups. Since 2018 he has taught at Birkbeck and, in 2021, founded Warestack, building software for startups globally.

What we’ll cover

A practical introduction on the basics of local models and cloud APIs to build real software systems. You will learn:

Introduction to natural language processing
LLMs theory and intuition
Agents are and how to build them
Running local models with Ollama (free and offline)
Calling local models using Python
Building a ChatGPT-like chatbot with Python libraries

Requirements

A laptop with Python (Windows, macOS, or Linux)
Visual Studio Code installed
Python pip installed
At least 10 GB free disk space
At least 8 GB RAM

This space is needed for running local models.

You may also use the lab computers if your device doesn’t meet the requirements.

Format

A 1.5-hours live session including:

Interactive theory
Hands-on coding
Step-by-step exercises

The session will run in person, with streaming available for remote attendees.

Prerequisites You should be comfortable writing Python scripts (basic to intermediate level).

Building LLM applications with Python

Hands-On LLM Engineering with Python (Part 1) 2025-12-18 · 18:00

REGISTER BELOW FOR MORE AVAILABLE DATES! ↓↓↓↓↓ https://luma.com/stelios

-----------------------------------------------------------------------------------

Who is this for?

Students, developers, and anyone interested in using Large Language Models (LLMs) to build real software solutions with ** Python.

Tired of vibe coding with AI tools? Want to actually understand and own your code, instead of relying on black-box magic? This session shows you how to build LLM systems properly, with full control and clear engineering principles. Who is leading the session?

The session is led by Dr. Stelios Sotiriadis, CEO of Warestack, Associate Professor and MSc Programme Director at Birkbeck, University of London, specialising in cloud computing, distributed systems, and AI engineering.

Stelios holds a PhD from the University of Derby, completed a postdoctoral fellowship at the University of Toronto, and has worked on industry and research projects with Huawei, IBM, Autodesk, and multiple startups. Since moving to London in 2018, he has been teaching at Birkbeck. In 2021, he founded Warestack, building software for startups around the world. What we’ll cover?

A hands-on introduction to building software with LLMs using Python, Ollama, and LiteLLM, including:

How LLMs, embeddings, and agents work.
Calling local models with Ollama or cloud models (OpenAI, Gemini and more).
Using LiteLLM for custom prompts and tool-calling.
Building simple agents from scratch.
Introduction to RAG (Retrieval-Augmented Generation).
Working with vector databases (ChromaDB) and vector similarity search library (FAISS).
Storing, searching, and retrieving embeddings.
Introduction to Streamlit for interactive data apps.
End-to-end examples you can run on your own machine.

This session focuses on theory, fundamentals and real code you can re-use.

Why LiteLLM?

LiteLLM gives you low-level control to build custom LLM solutions your own way, without a heavy framework like LangChain, so you understand how everything works and design your own architecture. A dedicated LangChain session will follow for those who want to go further.

What are the requirements?

Bring a laptop with Python installed (Windows, macOS, or Linux), along with Visual Studio Code or a similar IDE, with at least 10GB of free disk space and 8GB of RAM.

This space is needed for running local models during the workshop. If you don’t have a suitable laptop, please contact Stelios ([email protected]) before registering.

What is the format?

A 3-hour live session with:

Interactive theory blocks
Hands-on coding
Step-by-step exercises
Small group support
Three 10-minute breaks
Q&A and class quizzes

This is a highly practical, hands-on class focused on code and building working LLM systems.

What are the prerequisites?

A good understanding of programming with Python is required (basic to intermediate level). I assume you are already comfortable writing Python scripts.

What comes after?

Participants will receive an optional mini capstone project with one-to-one personalised feedback.

Is it just one session?

This is the first session in a new sequence on applied AI, covering agents, RAG systems, vector databases, and production-ready LLM workflows. Later sessions will dive deeper into topics such as embeddings with deep neural networks, LangChain, advanced retrieval, and multi-agent architectures.

You can decide afterwards whether you’d like to join future sessions.

How many participants?

To keep this interactive, only 15 spots are available. Please register as soon as possible.

Hands-On LLM Engineering with Python (Part 1)

Use MCP Toolbox and Gemini CLI to Build LookML 2025-12-12 · 18:15

Mike DeAngelo – Developer Relations Engineer @ Google (Looker group)

The rise of agentic AI systems presents a new paradigm for business intelligence, shifting from human-driven dashboards to AI-driven, conversational data exploration. However, a critical challenge remains: how can we securely and reliably connect Large Language Models (LLMs) to the curated, governed data locked within enterprise BI platforms?

This session introduces the Looker MCP Server, a new service designed to bridge this exact gap. Built upon the robust foundation of the MCP Toolbox for Databases, the Looker MCP Server exposes Looker's powerful semantic model as a stable, easy-to-use set of tools for AI agents. By providing a secure and scalable API, it empowers developers to build a new class of "Agentic BI" applications that can independently query data, analyze trends, and deliver insights, all while inheriting Looker's existing governance and data permissions.

This talk is for AI developers, data platform owners, and BI practitioners who are looking to leverage their existing investment in Looker to build the next generation of data-driven AI applications. Attendees will leave with a clear understanding of how to use the Looker MCP Server to safely unlock their enterprise data for agentic AI systems.

mcp toolbox for databases looker mcp server gemini cli lookml agentic bi

NYC - Agentic AI Meetup with Google Cloud

Keynote by Lisa Amini- What’s Next in AI for Data and Data Management? 2025-12-09 · 18:45

Advances in large language models (LLMs) have propelled a recent flurry of AI tools for data management and operations. For example, AI-powered code assistants leverage LLMs to generate code for dataflow pipelines. RAG pipelines enable LLMs to ground responses with relevant information from external data sources. Data agents leverage LLMs to turn natural language questions into data-driven answers and actions. While challenges remain, these advances are opening exciting new opportunities for data scientists and engineers. In this talk, we will examine recent advances, along with some still incubating in research labs, with the goal of understanding where this is all heading, and present our perspective on what’s next for AI in data management and data operations.

AI/ML Data Management Dataflow LLM RAG

PyData Boston 2025

Video

GenerationAI Conference for Agentic Enterprise (Free Tickets) 2025-12-09 · 08:00

This is a paid conference, the organizer is offering our members FREE tickets. Be sure to register at AICamp website to claim your free ticket.

Welcome to the Agentic Enterprise: GenAI, APIs, MCP, Agents

GenerationAI 2025 is the premier conference dedicated to the forefront of GenAI APIs and Large Language Models (LLM) APIs. This event is your gateway to explore the latest advancements, innovations, and applications in the world of AI, featuring insights from leading experts and visionaries.

Why attending:

Keynote speakers: Hear from top representatives of major GenAI and LLM API providers such as Google, Microsoft, Nvidia, OpenAI, Mistral, and Anthropic. Discover their latest developments and strategic visions shaping the future of AI.
Industry leaders: Gain invaluable perspectives from GenAI leaders at renowned companies like BNP Paribas, Renault, Malt, VLC, Oympe who will share their experiences and the impact of AI within their organizations.
Innovative tooling companies: Engage with pioneering researchers and entrepreneurs from cutting-edge GenAI tooling companies like ZML, Pleias, and Alinia.ai. Learn about the newest tools and technologies driving the AI revolution.

Join us at GenerationAI 2025 to connect with industry leaders, innovators, and enthusiasts who are shaping the future of AI. Whether you’re an AI professional, researcher, developer, or simply passionate about the possibilities of AI, this conference is the perfect opportunity to expand your knowledge and network with the best in the field.

Conference details and Register at AICamp website to claim your free ticket.

GenerationAI Conference for Agentic Enterprise (Free Tickets)

GenerationAI Conference for Agentic Enterprise (Free Tickets) 2025-12-09 · 08:00

This is a paid conference, the organizer is offering our members FREE tickets. Be sure to register at AICamp website to claim your free ticket.

Welcome to the Agentic Enterprise: GenAI, APIs, MCP, Agents

GenerationAI 2025 is the premier conference dedicated to the forefront of GenAI APIs and Large Language Models (LLM) APIs. This event is your gateway to explore the latest advancements, innovations, and applications in the world of AI, featuring insights from leading experts and visionaries.

Why attending:

Keynote speakers: Hear from top representatives of major GenAI and LLM API providers such as Google, Microsoft, Nvidia, OpenAI, Mistral, and Anthropic. Discover their latest developments and strategic visions shaping the future of AI.
Industry leaders: Gain invaluable perspectives from GenAI leaders at renowned companies like BNP Paribas, Renault, Malt, VLC, Oympe who will share their experiences and the impact of AI within their organizations.
Innovative tooling companies: Engage with pioneering researchers and entrepreneurs from cutting-edge GenAI tooling companies like ZML, Pleias, and Alinia.ai. Learn about the newest tools and technologies driving the AI revolution.

Join us at GenerationAI 2025 to connect with industry leaders, innovators, and enthusiasts who are shaping the future of AI. Whether you’re an AI professional, researcher, developer, or simply passionate about the possibilities of AI, this conference is the perfect opportunity to expand your knowledge and network with the best in the field.

Conference details and Register at AICamp website to claim your free ticket.

GenerationAI Conference for Agentic Enterprise (Free Tickets)

AWS re:Invent 2025 - Accelerate analytics and AI w/ an open and secure lakehouse architecture-ANT309 2025-12-05 · 04:19

Data lakes, data warehouses, or both? Join this session to explore how to build a unified, open, and secure data lakehouse architecture, fully compatible with Apache Iceberg, in Amazon SageMaker. Learn how the lakehouse breaks down data silos and opens your data estate offering flexibility to use your preferred query engines and tools that accelerate time to insights. Learn about recent launches that improve data interoperability and performance, and enable large language models (LLMs) and AI agents to interact with your data. Discover robust security features, including consistent fine-grained access controls, attribute-based access control, and tag-based access control that help democratize data without compromises.

Learn more: More AWS events: https://go.aws/3kss9CP

Subscribe: More AWS videos: http://bit.ly/2O3zS75 More AWS events videos: http://bit.ly/316g9t4

ABOUT AWS: Amazon Web Services (AWS) hosts events, both online and in-person, bringing the cloud computing community together to connect, collaborate, and learn from AWS experts. AWS is the world's most comprehensive and broadly adopted cloud platform, offering over 200 fully featured services from data centers globally. Millions of customers—including the fastest-growing startups, largest enterprises, and leading government agencies—are using AWS to lower costs, become more agile, and innovate faster.

AWSreInvent #AWSreInvent2025 #AWS

Agile/Scrum AI/ML Analytics AWS Cloud Computing Data Lakehouse Iceberg LLM Amazon SageMaker Cyber Security

AWS re:Invent 2024

YouTube

AI Agents in 2025: Beyond Chatbots to Autonomous Workflows 2025-12-04 · 19:10

Stephen Toriola – Software & AI Engineer @ Compare the Market

This talk explores AI agents as the next step beyond prompt‑by‑prompt assistants. Modern AI agents use large language models plus planning, tool‑calling, and memory to execute multi‑step workflows, not just answer isolated questions. The session explains, in accessible terms, what makes something an “agent” rather than a simple chatbot: the ability to decompose tasks, call APIs or tools, and maintain context over time. It then surveys real use cases, from automating repetitive knowledge‑work tasks to orchestrating complex enterprise workflows that blend human decisions with autonomous actions. For the technical audience, the talk briefly outlines typical agent architectures and how they integrate with RAG, vector search, and existing backend services. For everyone else, it focuses on capabilities, limitations, and where AI agents are realistically being used in 2025. Attendees will understand what AI agents can and cannot do today, and how they differ from the hype.

AI/ML API RAG

#1 - London - Data & Agentic AI in Financial Services - Brainstation

Crafting with AI #17 2025-12-04 · 17:45

Nouvelle édition, même ADN : concret, zéro bullshit. Au menu : 2 talks qui collent au terrain.

1) Évaluation continue d’agents IA en production (Wakam) Déployer des agents IA en production c'est bien, mais comment s'assurer qu'ils ne régressent pas au fil du temps ? Chez Wakam, on a construit une plateforme complète d'évaluation qui combine :

Génération automatique de datasets synthétiques via des agents IA spécialisés dans Dust
Validation humaine par les experts métiers dans une interface dédiée via Retool
Orchestration des évaluations via Prefect
Monitoring continu dans Langfuse

On vous montrera notre architecture et stack (Dust.tt), Retool, Prefect, Langfuse) et comment elle résout deux pain points majeurs : l'absence d'évaluation native dans les plateformes d'agents SaaS et la complexité de maintenir des datasets à jour. Démo incluse sur notre cas d'usage RH.

Speakers: Wided Ahlem Touhami, Hamza Ben Marzouk, Ouarda Boumansour (Équipe AI Engineering, Wakam)

2) Du “large” au “small” : pourquoi les Small Language Models changent la donne Les LLMs sont impressionnants, mais coûteux et lourds à opérer et parfois disproportionnés par rapport aux besoins réels. Les Small Language Models proposent une alternative plus légère, adaptable dans de bonnes conditions, moins chère et beaucoup plus simple à déployer, tout en couvrant une large partie des cas d’usage opérationnels. Voyons ensemble comment et pourquoi “small” devient souvent le choix le plus pragmatique… et parfois le plus stratégique.

Speaker: Olivier Bergeret (Head of Data)

Lieu: Thiga, 23 rue Taitbout, 75009 Paris Accueil: 18:45 Talks: 19:00 puis Q&A Apéro: networking sur place Places limitées. Merci à Thiga pour l’accueil.

Crafting with AI #17

Let’s Talk Tech: How to Build and Govern Agentic Systems 2025-11-26 · 17:30

Join us on November 26 to dive into the best practices for building agentic AI safely and effectively with Guillaume Laforge, Developer Advocate at Google, and Maxime Appé, Manager, Product Management at Dataiku.

6:30 to 7:15 pm: Guillaume Laforge, Google AI Agents: The New Frontier for LLMs

Know your way around large language models? Mastered retrieval-augmented generation to help an LLM search your documents? It’s time to take the next step with AI agents.

In this session, you’ll learn what makes a system “agentic”, the limitations of LLMs, and how to build different types of agents in Java using LangChain4j and the Agent Development Kit (ADK). Expect practical examples of agent patterns that go beyond a simple LLM call to respond intelligently, take action, and adapt to user needs. Building retrieval-augmented generation (RAG) apps is just the start. Learn what comes next with Agents.

7:15 to 8 pm: Maxime Appé, Dataiku Navigating the AI Safety Landscape: Learning from CeSIA’s Bootcamp

AI is evolving fast, and safety research is racing to keep up. In this session, we’ll break down key insights from the AI Safety Bootcamp hosted by the Centre for AI Security. We’ll explore how to navigate AI’s rapid progress, unpacking what “intelligence” or “model agency” really mean, before turning to the risks and mitigation strategies that define the AI safety landscape today.

8 - 10PM: Meet & greet with drinks

Dataiku will only use your personal information to provide the product or service you requested and contact you with related content that may interest you. You may unsubscribe from these communications at any time. For more information on unsubscribing and how we protect and respect your privacy, check out our Privacy Policy.

Let’s Talk Tech: How to Build and Govern Agentic Systems

talk-data.com

People (6 results)

Activities & events

AWSreInvent #AWSreInvent2025 #AWS