Transformers are foundational in deep learning but suffer from computational inefficiency on long sequences. Inspired by continuous systems, Mamba is a simplified sequence model that makes the State Space Model (SSM) parameters input-dependent and uses a hardware-aware parallel algorithm, achieving up to 5× faster inference than Transformers and linear scaling with sequence length. Mamba performs strongly on language, audio, and genomics tasks without attention mechanisms or MLP blocks. Mamba has since been adapted to vision tasks, where challenges such as position sensitivity and the need for global context are crucial. VMamba employs Visual State-Space (VSS) blocks and a 2D Selective Scan (SS2D) module to handle visual data efficiently, setting new benchmarks in computational efficiency and performance. Similarly, Vim (Vision Mamba) uses bidirectional Mamba blocks with position embeddings and outperforms models like DeiT without relying on self-attention, highlighting the versatility of state-space models in vision applications.
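To make the "selective" mechanism concrete, here is a minimal sketch of the per-channel selective state-space recurrence in NumPy. The discretization uses the standard zero-order hold for a diagonal A; the weight names (`w_delta`, `W_B`, `W_C`) and the scalar per-channel projections are illustrative simplifications, not Mamba's reference implementation, which projects Δ, B, and C from the full input vector and evaluates the scan with a parallel, hardware-aware kernel rather than a Python loop.

```python
import numpy as np

def selective_ssm_scan(x, A, w_delta, W_B, W_C):
    """Sequential selective SSM over one channel (illustrative sketch).

    x:       (L,) one input channel over L time steps
    A:       (N,) diagonal entries of the continuous state matrix (negative)
    w_delta: scalar weight producing the step size Delta_t from x_t
    W_B:     (N,) weights producing the input matrix B_t from x_t
    W_C:     (N,) weights producing the output matrix C_t from x_t
    """
    L, N = x.shape[0], A.shape[0]
    h = np.zeros(N)                # hidden state carried across time steps
    y = np.zeros(L)
    for t in range(L):
        # "Selective": Delta, B, and C depend on the current input x_t,
        # unlike a time-invariant (LTI) state-space layer.
        delta = np.log1p(np.exp(w_delta * x[t]))   # softplus keeps Delta_t > 0
        B_t = W_B * x[t]
        C_t = W_C * x[t]
        # Zero-order-hold discretization for diagonal A:
        # A_bar = exp(Delta*A), B_bar = (A_bar - 1)/A * B
        A_bar = np.exp(delta * A)
        B_bar = (A_bar - 1.0) / A * B_t
        h = A_bar * h + B_bar * x[t]               # linear recurrence
        y[t] = C_t @ h                             # read out the state
    return y

# Tiny usage example with random weights (hypothetical shapes).
rng = np.random.default_rng(0)
L, N = 16, 4
out = selective_ssm_scan(
    x=rng.standard_normal(L),
    A=-np.exp(rng.standard_normal(N)),  # negative diagonal for stability
    w_delta=0.5,
    W_B=rng.standard_normal(N),
    W_C=rng.standard_normal(N),
)
```

The vision variants described above reuse this same recurrence but change the scan order: Vim runs two such scans per block (forward and backward) over patch embeddings, while VMamba's SS2D unfolds a 2D feature map along four scan routes and merges the four outputs to recover global context.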