talk-data.com
People (147 results)
Activities & events
ODSC AI East 2026 | The #1 AI Builders Conference
2026-04-28 · 13:00
This is a paid event and pre-registration is required. RSVP here: https://luma.com/fzypluc8. Use code COMMUNITYEAST2026 for an extra discount. Where the global AI community unites to build the future. Get ready to deepen your expertise, forge powerful connections, and stay at the absolute forefront of artificial intelligence. ODSC AI East is returning to Boston, MA at the Hynes Convention Center, April 28–30, for three days dedicated to practical, hands-on learning and community growth. You may also join the virtual conference. More details here: https://odsc.ai/east/
This is more than just a conference; it's the essential event for data science practitioners, AI builders, technical leaders, and anyone looking to pivot into an AI-driven career. With 300+ hours of content from 250+ expert speakers, you'll gain job-ready skills and strategic insights you can implement immediately.
Why You Need to Be Here: Learning and Value. At ODSC AI East, we prioritize tangible value and immersive learning, ensuring you walk away certified and skilled. Our expansive agenda is packed with cutting-edge workshops and deep-dive tutorials across the most in-demand domains:
Technical Tracks: Data Engineering | Physical AI | AI for BioPharma & Health | LLMs, GenAI & RAG | Agentic AI & Workflow Automation | Keynotes & Industry Leadership | Data Science, Machine Learning & MLOps | AI Engineering & AIOps
Non-Technical Tracks: AI Strategy | AI Risk & Governance | Agentic AI for Enterprise | AI Products & Innovation | AI & Future of Work | Executive Track | AI Founder Track
Beyond the Talks: Our Expanded Community Events. The true value of ODSC AI East is the opportunity to connect and collaborate. Your pass unlocks a rich ecosystem of co-located events and unique networking opportunities designed for every career level.
Join us in celebrating our community's pursuit of knowledge, inclusivity, and fairness as we work together to move the world of data science forward. Ready to build better AI? Find your perfect pass and secure your spot at ODSC AI East 2026 today!
Event: ODSC AI East 2026 | The #1 AI Builders Conference
Aniket Roy
– PhD student in Computer Science
@ Johns Hopkins University
We tackle the challenge of jointly personalizing content and style from a few examples. A promising approach is to train separate Low-Rank Adapters (LoRA) and merge them effectively, preserving both content and style. Existing methods, such as ZipLoRA, treat content and style as independent entities, merging them by learning masks in LoRA's output dimensions. However, content and style are intertwined, not independent. To address this, we propose DuoLoRA, a content-style personalization framework featuring three key components: (i) rank-dimension mask learning, (ii) effective merging via layer priors, and (iii) Constyle loss, which leverages cycle-consistency in the merging process. First, we introduce ZipRank, which performs content-style merging within the rank dimension, offering adaptive rank flexibility and significantly reducing the number of learnable parameters. Additionally, we incorporate SDXL layer priors to apply implicit rank constraints informed by each layer's content-style bias and adaptive merger initialization, enhancing the integration of content and style. To further refine the merging process, we introduce Constyle loss, which leverages the cycle-consistency between content and style. Our experimental results demonstrate that DuoLoRA outperforms state-of-the-art content-style merging methods across multiple benchmarks.
Event: Nov 24 - Best of ICCV (Day 4)
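For readers unfamiliar with rank-dimension LoRA merging, here is a minimal NumPy sketch of blending two LoRA factor pairs rank-by-rank with a sigmoid mask, roughly the kind of operation the abstract's ZipRank component works on. This is not the authors' implementation: the `merge_loras` helper, the shapes, and the sigmoid parameterization are assumptions for the example, and the paper additionally learns the mask with layer priors and the Constyle cycle-consistency loss.

```python
import numpy as np

def merge_loras(A_content, B_content, A_style, B_style, mask_logits):
    """Blend two LoRA adapters rank-by-rank (illustrative only).

    Each LoRA is a low-rank update W ~= B @ A with A: (rank, d_in) and
    B: (d_out, rank). A sigmoid over `mask_logits` (one logit per rank
    index) decides how much each rank component takes from the content
    adapter versus the style adapter.
    """
    m = 1.0 / (1.0 + np.exp(-mask_logits))           # (rank,), values in [0, 1]
    delta_content = (B_content * m) @ A_content       # content components, masked per rank
    delta_style = (B_style * (1.0 - m)) @ A_style     # style components, complementary mask
    return delta_content + delta_style                # merged low-rank update, (d_out, d_in)

# Toy usage: random rank-4 adapters for a 16x32 weight; zero logits give a 50/50 blend.
rng = np.random.default_rng(0)
rank, d_in, d_out = 4, 32, 16
A_c, A_s = rng.normal(size=(rank, d_in)), rng.normal(size=(rank, d_in))
B_c, B_s = rng.normal(size=(d_out, rank)), rng.normal(size=(d_out, rank))
delta_W = merge_loras(A_c, B_c, A_s, B_s, mask_logits=np.zeros(rank))
print(delta_W.shape)  # (16, 32)
```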
Rethinking Few Shot CLIP Benchmarks: A Critical Analysis in the Inductive Setting
2025-11-24 · 17:00
Alexey Kravets
– PhD student in AI
@ University of Bath
CLIP is a foundational model with transferable classification performance in the few-shot setting. Several methods have shown improved performance of CLIP using few-shot examples. However, so far, all these techniques have been benchmarked using standard few-shot datasets. We argue that this mode of evaluation does not provide a true indication of the inductive generalization ability using few-shot examples. As most datasets have been seen by the CLIP model, the resultant setting can be termed as partially transductive. To solve this, we propose a pipeline that uses an unlearning technique to obtain true inductive baselines. In this new inductive setting, the methods show a significant drop in performance (-55% on average among 13 baselines with multiple datasets). We validate the unlearning technique using oracle baselines. An improved few-shot classification technique is proposed that consistently obtains state-of-the-art performance over 13 other recent baseline methods on a comprehensive analysis with 5880 experiments - varying the datasets, differing number of few-shot examples, unlearning setting, and with different seeds. Thus, we identify the issue with the evaluation of CLIP-based few-shot classification, provide a solution using unlearning, propose new benchmarks, and provide an improved method.
Event: Nov 24 - Best of ICCV (Day 4)
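For context on the few-shot CLIP setting being benchmarked, the sketch below implements a nearest-class-mean classifier over precomputed, L2-normalized CLIP image embeddings, one of the simplest baselines of this kind. It illustrates the evaluation setup only, not the proposed method or the unlearning pipeline; the random embeddings and shapes are placeholders.

```python
import numpy as np

def few_shot_accuracy(support_feats, support_labels, query_feats, query_labels):
    """Nearest-class-mean few-shot classification over precomputed CLIP
    image embeddings (a simple baseline of the kind benchmarked here).

    support_feats: (n_support, d) L2-normalized embeddings of the few-shot examples
    query_feats:   (n_query, d)   L2-normalized embeddings of the test images
    """
    classes = np.unique(support_labels)
    # Class prototype = mean of the few-shot embeddings for that class.
    prototypes = np.stack([support_feats[support_labels == c].mean(axis=0) for c in classes])
    prototypes /= np.linalg.norm(prototypes, axis=1, keepdims=True)
    # Cosine similarity (dot product of normalized vectors), argmax over classes.
    preds = classes[np.argmax(query_feats @ prototypes.T, axis=1)]
    return float((preds == query_labels).mean())

# Toy example: 3 classes, 5 shots each, 30 queries, 512-d embeddings.
rng = np.random.default_rng(0)
d, shots, n_query = 512, 5, 30
support_labels = np.repeat(np.arange(3), shots)
support_feats = rng.normal(size=(len(support_labels), d))
support_feats /= np.linalg.norm(support_feats, axis=1, keepdims=True)
query_labels = rng.integers(0, 3, size=n_query)
query_feats = rng.normal(size=(n_query, d))
query_feats /= np.linalg.norm(query_feats, axis=1, keepdims=True)
print(few_shot_accuracy(support_feats, support_labels, query_feats, query_labels))
```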
UnMix-NeRF: Spectral Unmixing Meets Neural Radiance Fields
2025-11-24 · 17:00
Fabian Perez
– computer science student
@ Universidad Industrial de Santander (UIS)
Neural Radiance Field (NeRF)-based segmentation methods focus on object semantics and rely solely on RGB data, lacking intrinsic material properties. We introduce UnMix-NeRF, a framework that integrates spectral unmixing into NeRF, enabling joint hyperspectral novel view synthesis and unsupervised material segmentation. Our method models spectral reflectance via diffuse and specular components, where a learned dictionary of global endmembers represents pure material signatures, and per-point abundances capture their distribution. For material segmentation, we use spectral signature predictions along learned endmembers, allowing unsupervised material clustering. Additionally, UnMix-NeRF enables scene editing by modifying learned endmember dictionaries for flexible material-based appearance manipulation. Extensive experiments validate our approach, demonstrating superior spectral reconstruction and material segmentation compared to existing methods.
Event: Nov 24 - Best of ICCV (Day 4)
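The linear mixing model behind the abstract, a global endmember dictionary combined with per-point abundances, can be illustrated in a few lines. This is only a toy sketch of the mixing and of argmax-based material assignment, not the UnMix-NeRF implementation; the shapes, the softmax parameterizations, and the random data are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
n_points, n_endmembers, n_bands = 1000, 4, 32

# Global dictionary: M endmembers, each a B-band spectrum (toy positive, normalized signatures).
endmembers = softmax(rng.normal(size=(n_endmembers, n_bands)))

# Per-point abundances (non-negative, sum to one) from unconstrained logits.
abundances = softmax(rng.normal(size=(n_points, n_endmembers)))   # (n_points, n_endmembers)

# Linear mixing: each point's spectrum is an abundance-weighted combination
# of the endmember signatures (the diffuse term in the abstract).
spectra = abundances @ endmembers                                  # (n_points, n_bands)

# Unsupervised "material segmentation": assign each point to its dominant endmember.
material_id = abundances.argmax(axis=1)
print(spectra.shape, np.bincount(material_id))
```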
VLM4D: Towards Spatiotemporal Awareness in Vision Language Models
2025-11-24 · 17:00
Shijie Zhou
– final-year PhD candidate
@ UCLA
Are Vision-Language Models Ready for Physical AI? Humans easily understand how objects move, rotate, and shift, while current AI models that connect vision and language still make mistakes in what seem like simple situations: deciding “left” versus “right” when something is moving, recognizing how perspective changes, or keeping track of motion over time. To reveal these kinds of limitations, we created VLM4D, a testing suite made up of real-world and synthetic videos, each paired with questions about motion, rotation, perspective, and continuity. When we put modern vision-language models through these challenges, they performed far below human levels, especially when visual cues must be combined or the sequence of events must be maintained. But there is hope: new methods such as reconstructing visual features in 4D and fine-tuning focused on space and time show noticeable improvement, bringing us closer to AI that truly understands a dynamic physical world.
Event: Nov 24 - Best of ICCV (Day 4)
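As a rough sketch of the evaluation protocol described above (clips paired with questions, scored against ground-truth answers), the snippet below uses a stubbed `answer_question` function in place of a real vision-language model. The record format, file names, and the stub are hypothetical, not the VLM4D release.

```python
from dataclasses import dataclass

@dataclass
class Item:
    video_path: str   # real-world or synthetic clip
    question: str     # e.g. about motion, rotation, perspective, or continuity
    choices: list     # multiple-choice options
    answer: str       # ground-truth option

def answer_question(video_path, question, choices):
    """Stub standing in for a vision-language model; a real evaluation would call the model here."""
    return choices[0]

def evaluate(items):
    correct = sum(answer_question(i.video_path, i.question, i.choices) == i.answer for i in items)
    return correct / len(items)

items = [
    Item("clip_001.mp4", "Is the car turning left or right?", ["left", "right"], "right"),
    Item("clip_002.mp4", "Does the camera orbit the object clockwise?", ["yes", "no"], "yes"),
]
print(f"accuracy = {evaluate(items):.2f}")
```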
Forecasting Continuous Non-Conservative Dynamical Systems in SO(3)
2025-11-24 · 17:00
Lennart Bastian
– PhD candidate at TU Munich's CAMP lab
@ TU Munich, CAMP Lab
Tracking and forecasting the rotation of objects is fundamental in computer vision and robotics, yet SO(3) extrapolation remains challenging as (1) sensor observations can be noisy and sparse, (2) motion patterns can be governed by complex dynamics, and (3) application settings can demand long-term forecasting. This work proposes modeling continuous-time rotational object dynamics on SO(3) using Neural Controlled Differential Equations guided by Savitzky-Golay paths. Unlike existing methods that rely on simplified motion assumptions, our method learns a general latent dynamical system of the underlying object trajectory while respecting the geometric structure of rotations. Experimental results on real-world data demonstrate compelling forecasting capabilities compared to existing approaches.
Event: Nov 24 - Best of ICCV (Day 4)
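To make the "geometric structure of rotations" point concrete, the sketch below integrates a hand-coded angular-velocity model on SO(3) via the exponential map, so every forecast step remains a valid rotation. It is not the Neural Controlled Differential Equation / Savitzky-Golay method from the talk; the `rollout` interface and the toy dynamics are assumptions.

```python
import numpy as np

def hat(w):
    """Map a 3-vector to its skew-symmetric matrix (an so(3) element)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def expm_so3(w):
    """Rodrigues' formula: exponential map from so(3) to SO(3)."""
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return np.eye(3)
    K = hat(w / theta)
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def rollout(R0, angular_velocity, n_steps, dt=0.05):
    """Forecast a rotation trajectory by integrating body-frame angular
    velocities on the manifold, so each step stays a valid rotation."""
    R, traj = R0, [R0]
    for t in range(n_steps):
        w = angular_velocity(t * dt)        # here: a fixed toy dynamics model
        R = R @ expm_so3(w * dt)
        traj.append(R)
    return traj

# Toy dynamics: a spin whose axis slowly tilts over time.
traj = rollout(np.eye(3), lambda t: np.array([0.1, 1.0 + 0.2 * np.sin(t), 0.0]), n_steps=100)
print(np.allclose(traj[-1] @ traj[-1].T, np.eye(3), atol=1e-6))  # forecast is still orthonormal
```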