2025-11-08 at 18:10
Practical Quantization in Keras: Running Large Models on Small Devices
Event: PyData Seattle 2025
Description
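Large language models are often too large to run on personal machines and require specialized hardware with large amounts of memory. Quantization shrinks these models by storing their weights in lower-precision numeric formats (for example, 8-bit integers instead of 32-bit floats). This reduces memory usage and often speeds up inference, while retaining most of the original accuracy.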
This talk introduces the fundamentals of neural network quantization and its key techniques, then demonstrates how to apply them using Keras’s extensible quantization framework.
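To make the core idea concrete, here is a minimal plain-Python sketch of one common technique, symmetric per-tensor int8 quantization. Each weight `w` is stored as an integer `q` in [-127, 127] together with a single float `scale`, so that `w ≈ scale * q`. This is not Keras’s implementation or API: the function names `quantize_int8`, `dequantize`, and `footprint_gb` are hypothetical, and the 7-billion-parameter model in the usage example is likewise an illustrative assumption.

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization (hypothetical helper).

    Maps floats to integers in [-127, 127] using one shared scale,
    so that each weight w is approximated by scale * q.
    """
    max_abs = max((abs(w) for w in weights), default=0.0)
    # The largest-magnitude weight maps to +/-127; avoid division by zero.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to nearest (Python's round uses ties-to-even), then clamp.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from int8 values and the scale."""
    return [v * scale for v in q]


def footprint_gb(num_params, bits_per_param):
    """Raw weight storage in GB (1e9 bytes), ignoring scales and overhead."""
    return num_params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    weights = [0.5, -1.27, 0.02, 1.0]
    q, scale = quantize_int8(weights)
    print(q)  # [50, -127, 2, 100]
    print(dequantize(q, scale))

    # Memory for a hypothetical 7-billion-parameter model at each precision.
    for name, bits in [("float16", 16), ("int8", 8), ("int4", 4)]:
        print(name, footprint_gb(7_000_000_000, bits), "GB")
```

Rounding to the nearest integer bounds each weight’s reconstruction error by `scale / 2`. Halving the bits per weight halves the raw storage: under these assumptions, 14 GB at float16 becomes 7 GB at int8 and 3.5 GB at int4. Production frameworks typically refine this basic scheme, for example by using a separate scale per output channel.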