2 articles tagged #performance.
Techniques to reduce cost, latency, and resources needed to run language models in production, from quantization to distributed serving.
React paradigm where components execute on the server, sending only HTML to the client, reducing the JavaScript bundle and improving performance.