Abstract: The talk introduces Any Compression via Iterative Pruning (ACIP), a novel approach designed to give users intuitive control over the compression-performance trade-off. ACIP uses a single gradient descent run of iterative pruning to establish a global parameter ranking, enabling immediate materialization of models of any target size. It demonstrates strong predictive performance on downstream tasks without costly fine-tuning and achieves state-of-the-art compression for open-weight LLMs, often complementing common quantization techniques.
talk-data.com
Company
Merantix Momentum
Speakers
1
Activities
1
Speakers from Merantix Momentum
Talks & appearances
1 activities from Merantix Momentum speakers