ML Notes

SGLang

14 Mar 2025, 1 min read

SGLang is an LLM inference engine introduced by LMSYS in January 2024.

It replaced vLLM as the inference engine that powers the Chatbot Arena.

At launch, the main contributions were as follows:

RadixAttention: a technique for automatic and efficient KV cache reuse across multiple LLM generation calls
a domain-specific language embedded in Python for controlling the generation process, including structured output
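
To make the KV cache reuse idea more concrete, here is a minimal sketch of prefix matching across generation calls. It is not SGLang's implementation: the names `Node`, `PrefixCache`, and `run_request` are hypothetical, the token ids are made up, and a plain per-token prefix tree stands in for a real radix tree (which would compress runs of tokens into single edges and also handle eviction). The sketch only counts how many tokens could be reused; it stores no actual key/value tensors.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Maps the next token id to the child node that extends this prefix.
    children: dict = field(default_factory=dict)


class PrefixCache:
    """Toy token-level prefix tree standing in for a KV cache index (hypothetical)."""

    def __init__(self):
        self.root = Node()

    def match_prefix(self, tokens):
        """Return how many leading tokens are already present in the tree."""
        node, matched = self.root, 0
        for tok in tokens:
            child = node.children.get(tok)
            if child is None:
                break
            node, matched = child, matched + 1
        return matched

    def insert(self, tokens):
        """Record that cache entries for this token sequence now exist."""
        node = self.root
        for tok in tokens:
            node = node.children.setdefault(tok, Node())

    def run_request(self, tokens):
        """Simulate one generation call; return (reused, computed) token counts."""
        reused = self.match_prefix(tokens)
        self.insert(tokens)
        return reused, len(tokens) - reused


cache = PrefixCache()
system_prompt = [1, 2, 3, 4]  # assumed token ids for a prompt shared by both calls
first = cache.run_request(system_prompt + [10, 11])
second = cache.run_request(system_prompt + [20, 21, 22])
print(first, second)
```

In this toy run the second call finds the shared prompt already in the tree, so only its new suffix would need to be computed. The reuse is automatic in the sense that the caller never declares which part of the prompt is shared.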

References

Fast and Expressive LLM Inference with RadixAttention and SGLang
