2 articles tagged #optimization.
Techniques to reduce the cost, latency, and resources needed to run language models in production, from quantization to distributed serving.
A technique that stores the model's internal computation for reused prompt prefixes across LLM calls, reducing costs by up to 90% and latency by up to 85% in applications with repetitive context.
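As a rough illustration of the idea (not the article's implementation), here is a minimal sketch using the Hugging Face transformers library: the key/value state of a shared prefix is computed once and reused on every call, so only the new suffix tokens are processed. The model name, prompt text, and `answer` helper are placeholders chosen for the example.

```python
import copy
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Small model purely for illustration; any causal LM works the same way.
model_name = "gpt2"
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

# The repetitive context shared by every call (system prompt, docs, few-shot examples).
shared_prefix = "You are a support assistant. Product manual: ...\n"
prefix_ids = tok(shared_prefix, return_tensors="pt").input_ids

# Pay for the prefix once: run it through the model and keep its key/value cache.
with torch.no_grad():
    prefix_cache = model(prefix_ids, use_cache=True).past_key_values

def answer(question: str, max_new_tokens: int = 30) -> str:
    """Greedy decoding that reuses the cached prefix; only the question
    and the newly generated tokens are actually computed."""
    cache = copy.deepcopy(prefix_cache)  # keep the shared cache intact across calls
    ids = tok(question, return_tensors="pt").input_ids
    new_tokens = []
    with torch.no_grad():
        for _ in range(max_new_tokens):
            out = model(ids, past_key_values=cache, use_cache=True)
            cache = out.past_key_values
            next_id = out.logits[:, -1, :].argmax(dim=-1, keepdim=True)
            new_tokens.append(next_id.item())
            ids = next_id
    return tok.decode(new_tokens)

# Each call now skips recomputing the shared prefix.
print(answer("How do I reset the device?"))
print(answer("What is the warranty period?"))
```

Hosted LLM APIs expose the same idea as a server-side feature, which is where the quoted cost and latency savings come from without managing any cache yourself.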