SGLang

14 Mar 2025

SGLang is an LLM inference engine introduced by LMSYS in January 2024.

It replaced vLLM as the inference engine that powers the Chatbot Arena.

At launch, the main contributions were as follows:

  • RadixAttention: a technique for automatic, efficient KV cache reuse across multiple LLM generation calls
  • a Python-embedded domain-specific language for controlling the generation process, including structured output
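
To make the KV cache reuse idea concrete, here is a minimal sketch of prefix matching across calls. It is not SGLang's implementation: the names `Node`, `PrefixCache`, and `process` are hypothetical, the token ids are made up, and a plain per-token trie stands in for a real radix tree. It also leaves out cache eviction and stores no actual KV tensors; it only counts how many tokens a call could reuse.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Children keyed by token id. A real radix tree would compress chains of
    # single-child nodes into one edge; this simplified trie does not.
    children: dict = field(default_factory=dict)


class PrefixCache:
    """Toy stand-in for a KV cache indexed by token prefix (hypothetical)."""

    def __init__(self):
        self.root = Node()

    def match_prefix(self, tokens):
        """Return how many leading tokens already have cached entries."""
        node, matched = self.root, 0
        for tok in tokens:
            child = node.children.get(tok)
            if child is None:
                break
            node, matched = child, matched + 1
        return matched

    def insert(self, tokens):
        """Record that cached entries now exist for every prefix of tokens."""
        node = self.root
        for tok in tokens:
            node = node.children.setdefault(tok, Node())

    def process(self, tokens):
        """Simulate one generation call; return (reused, computed) counts."""
        reused = self.match_prefix(tokens)
        self.insert(tokens)
        return reused, len(tokens) - reused


cache = PrefixCache()
system = [1, 2, 3, 4]  # made-up token ids for a shared system prompt
first = cache.process(system + [10, 11])
second = cache.process(system + [20, 21, 22])
print(first)   # (0, 6): nothing cached yet, all six tokens are computed
print(second)  # (4, 3): the shared four-token prefix is reused
```

The second call only has to compute the three tokens that follow the shared prefix, which is the saving that prefix-based cache reuse aims for when many requests share a system prompt or few-shot examples.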

References

  • Fast and Expressive LLM Inference with RadixAttention and SGLang
