1 article tagged #quantization.
Techniques for reducing the cost, latency, and resource requirements of running language models in production, from quantization to distributed serving.