Abstract: The talk introduces Any Compression via Iterative Pruning (ACIP), a novel approach that gives users intuitive control over the trade-off between compression and performance. ACIP uses a single gradient-descent run of iterative pruning to establish a global ranking of parameters. From this ranking, models of any target size can be materialized immediately. ACIP demonstrates strong predictive performance on downstream tasks without costly fine-tuning and achieves state-of-the-art compression for open-weight LLMs. It can often be combined with common quantization techniques.
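The key property claimed above is that one global parameter ranking supports any target size without re-running the optimization. The following is a minimal sketch of that idea only, not ACIP's implementation. The per-parameter `scores` are hypothetical placeholders for whatever importance signal the pruning run produces, and `global_ranking` and `materialize` are illustrative helper names that do not come from the source.

```python
from typing import Dict, List, Tuple

# Hypothetical importance scores: tensor name -> one score per prunable unit.
Scores = Dict[str, List[float]]
Ranking = List[Tuple[str, int]]


def global_ranking(scores: Scores) -> Ranking:
    """Return (tensor_name, index) pairs ordered from most to least important."""
    entries = [
        (score, name, idx)
        for name, values in scores.items()
        for idx, score in enumerate(values)
    ]
    # Sort once, by descending score; ties are broken by name and index
    # so that the ranking is deterministic.
    entries.sort(key=lambda e: (-e[0], e[1], e[2]))
    return [(name, idx) for _, name, idx in entries]


def materialize(ranking: Ranking, scores: Scores, keep_ratio: float) -> Dict[str, List[bool]]:
    """Build keep-masks for an arbitrary target size from a precomputed ranking."""
    if not 0.0 <= keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in [0, 1]")
    n_keep = round(keep_ratio * len(ranking))
    masks = {name: [False] * len(values) for name, values in scores.items()}
    # Keep the top-n_keep units across all tensors (a global, not per-layer, cut).
    for name, idx in ranking[:n_keep]:
        masks[name][idx] = True
    return masks


if __name__ == "__main__":
    scores = {"layer1": [0.9, 0.1, 0.5], "layer2": [0.7, 0.3]}
    ranking = global_ranking(scores)
    for ratio in (0.4, 0.6, 1.0):
        print(ratio, materialize(ranking, scores, ratio))
```

The ranking is computed once. Each new target size then costs only a prefix selection over that list, so producing many differently sized models requires no further optimization in this sketch. Because the cut is global, some tensors lose more units than others.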
Speaker: Dr. Martin Genzel