SGLang is an LLM inference engine introduced by LMSYS in January 2024.
It replaced vLLM as the inference engine that powers the Chatbot Arena.
At launch, the main contributions were as follows:
- RadixAttention, a technique for automatic and efficient KV cache reuse across multiple LLM generation calls
- a Python-based domain-specific language (DSL) for controlling the generation process, including constraining it to structured output
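
The prefix-reuse idea behind the first contribution can be illustrated with a small sketch. This is not SGLang's implementation: it uses a plain, uncompressed trie over token ids rather than a true radix tree, stores no actual KV tensors, and has no eviction policy. The names `ToyPrefixCache`, `match_prefix`, and `insert`, as well as the token ids, are hypothetical and chosen only for illustration. It shows how a second generation call that shares a prompt prefix with an earlier call would only need to compute the tokens after the shared prefix.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Children keyed by the next token id. A real radix tree would compress
    # single-child chains into one edge; this toy trie does not.
    children: dict = field(default_factory=dict)


class ToyPrefixCache:
    """Hypothetical stand-in for a KV cache indexed by token prefix."""

    def __init__(self):
        self.root = Node()

    def match_prefix(self, tokens):
        """Return how many leading tokens are already recorded in the cache."""
        node, matched = self.root, 0
        for tok in tokens:
            child = node.children.get(tok)
            if child is None:
                break
            node, matched = child, matched + 1
        return matched

    def insert(self, tokens):
        """Record a token sequence and return the number of newly added tokens."""
        node, added = self.root, 0
        for tok in tokens:
            if tok not in node.children:
                node.children[tok] = Node()
                added += 1
            node = node.children[tok]
        return added


cache = ToyPrefixCache()
system_prompt = [1, 2, 3, 4]  # made-up token ids shared by both calls
first_call = system_prompt + [10, 11]
second_call = system_prompt + [20, 21, 22]

cache.insert(first_call)
reused = cache.match_prefix(second_call)  # tokens whose cached entries could be reused
to_compute = len(second_call) - reused    # tokens that would still need to be processed
```

In this sketch the second call matches the four shared prompt tokens and only its three remaining tokens are new, which is the saving that automatic prefix reuse aims for when many calls share a system prompt or few-shot examples.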
References