Skip to content

Commit c1b82ce

Browse files
authored
Adds community post "SGLang Join PyTorch Ecosystem: Efficient LLM Serving Engine" (#1941)
* Adds community post "SGLang Join PyTorch Ecosystem: Efficient LLM Serving Engine" Signed-off-by: Chris Abraham <[email protected]> * fix Signed-off-by: Chris Abraham <[email protected]> * fix typo Signed-off-by: Chris Abraham <[email protected]> * update date Signed-off-by: Chris Abraham <[email protected]> --------- Signed-off-by: Chris Abraham <[email protected]>
1 parent e13f037 commit c1b82ce

File tree

4 files changed

+113
-0
lines changed

4 files changed

+113
-0
lines changed
+8
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,8 @@
1+
---
2+
title: "SGLang Joins PyTorch Ecosystem: Efficient LLM Serving Engine"
3+
author: "SGLang Team"
4+
ext_url: /blog/sglang-joins-pytorch/
5+
date: Mar 19, 2025
6+
---
7+
8+
We’re thrilled to announce that the SGLang project has been integrated into the PyTorch ecosystem! This integration ensures that SGLang aligns with PyTorch’s standards and practices, providing developers with a reliable and community-supported framework for fast and flexible serving of LLMs.
+105
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,105 @@
1+
---
2+
layout: blog_detail
3+
title: "SGLang Joins PyTorch Ecosystem: Efficient LLM Serving Engine"
4+
author: "SGLang Team"
5+
hidden: true
6+
---
7+
8+
9+
![sglang logo](/assets/images/sglang-join-pytorch/fg1.png){:style="max-width:400px; display: block; margin-left: auto; margin-right: auto"}
10+
11+
12+
We’re thrilled to announce that the SGLang project has been integrated into the PyTorch ecosystem! This integration ensures that SGLang aligns with PyTorch’s standards and practices, providing developers with a reliable and community-supported framework for fast and flexible serving of LLMs.
13+
14+
To view the PyTorch Ecosystem, see the [PyTorch Landscape](https://landscape.pytorch.org/) and learn more about how projects can [join the PyTorch Ecosystem](https://github.com/pytorch-fdn/ecosystem).
15+
16+
17+
## About SGLang
18+
19+
SGLang is a fast-serving engine for large language models and vision language models. It makes the interaction with models faster and more controllable by co-designing the backend runtime and frontend language.
20+
21+
The core features include:
22+
23+
* Fast Backend Runtime: Provides efficient serving with RadixAttention for prefix caching, zero-overhead CPU scheduler, continuous batching, token attention (paged attention), speculative decoding, tensor parallelism, chunked prefill, structured outputs, and quantization (FP8/INT4/AWQ/GPTQ).
24+
* Flexible Frontend Language: Offers an intuitive interface for programming LLM applications, including chained generation calls, advanced prompting, control flow, multi-modal inputs, parallelism, and external interactions.
25+
* Extensive Model Support: Supports a wide range of generative models (Llama, Gemma, Mistral, Qwen, DeepSeek, LLaVA, etc.), embedding models (e5-mistral, gte, mcdse) and reward models (Skywork), with easy extensibility for integrating new models.
26+
* Active Community: SGLang is open source and backed by an active community with industry adoption.
27+
28+
SGLang is famous for its fast speed. It can often significantly outperform other state-of-the-art frameworks in terms of serving throughput and latency. You can learn more about the underlying techniques from the past release blog posts: [v0.2 blog](https://lmsys.org/blog/2024-07-25-sglang-llama3/), [v0.3 blog](https://lmsys.org/blog/2024-09-04-sglang-v0-3/), [v0.4 blog](https://lmsys.org/blog/2024-12-04-sglang-v0-4/).
29+
30+
SGLang has been widely adopted by leading industry companies and frontier research labs. For example, xAI uses SGLang to serve its flagship model, [Grok 3](https://grok.com/), which is currently the best model according to the Chatbot Arena leaderboard. Microsoft Azure uses SGLang to serve [DeepSeek R1](https://techcommunity.microsoft.com/blog/azurehighperformancecomputingblog/running-deepseek-r1-on-a-single-ndv5-mi300x-vm/4372726) on AMD GPUs, which is currently the best open source model.
31+
32+
33+
## Serving DeepSeek Models
34+
35+
You can easily launch a Docker container to serve a DeepSeek model with the following command:
36+
37+
```
38+
# Pull the latest image
39+
docker pull lmsysorg/sglang:latest
40+
41+
# Launch a server
42+
docker run --gpus all --shm-size 32g -p 30000:30000 -v ~/.cache/huggingface:/root/.cache/huggingface --ipc=host --network=host --privileged lmsysorg/sglang:latest \
43+
python3 -m sglang.launch_server --model deepseek-ai/DeepSeek-V3 --tp 8 --trust-remote-code --port 30000
44+
```
45+
46+
Then you can query the server with the OpenAI-compatible API
47+
48+
```
49+
import openai
50+
client = openai.Client(base_url=f"http://127.0.0.1:30000/v1", api_key="None")
51+
52+
response = client.chat.completions.create(
53+
model="deepseek-ai/DeepSeek-V3",
54+
messages=[
55+
{"role": "user", "content": "List 3 countries and their capitals."},
56+
],
57+
temperature=0,
58+
max_tokens=64,
59+
)
60+
```
61+
62+
The server launch command above works for 8xH200. You can find detailed instructions for other hardware (MI300X, H100, A100, H20, L40S) at https://docs.sglang.ai/references/deepseek.html.
63+
64+
SGLang integrates DeepSeek-specific optimizations, such as MLA throughput optimizations, MLA-optimized kernels, data-parallel attention, multi-token prediction, and DeepGemm, making it the top choice for serving DeepSeek models by dozens of [companies](https://x.com/lmsysorg/status/1887262321636221412), including AMD, NVIDIA, and many cloud providers. The team is actively working on integrating more optimizations following the 2025 H1 roadmap below.
65+
66+
67+
## Serving Llama Models
68+
69+
Similarly, you can launch the server for a Llama 3.1 text model with:
70+
71+
```
72+
python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3.1-8B-Instruct
73+
```
74+
75+
Or a Llama 3.2 multimodal model with:
76+
77+
```
78+
python3 -m sglang.launch_server --model-path meta-llama/Llama-3.2-11B-Vision-Instruct --chat-template=llama_3_vision
79+
```
80+
81+
82+
## Roadmap
83+
84+
This year, the SGLang team will continue to push the boundaries of system efficiency. You can find the roadmap of 2025H1 [here](https://github.com/sgl-project/sglang/issues/4042). The focus is
85+
86+
- Throughput-oriented large-scale deployment similar to the DeepSeek inference system
87+
- Long context optimizations
88+
- Low latency speculative decoding
89+
- Reinforcement learning training framework integration
90+
- Kernel optimizations
91+
92+
## Community
93+
94+
SGLang has been deployed to large-scale production, generating trillions of tokens every day. It has an active community with over three hundred contributors on GitHub. It is supported by the following institutions: AMD, Atlas Cloud, Baseten, Cursor, DataCrunch, Etched, Hyperbolic, iFlytek, Jam & Tea Studios, LinkedIn, LMSYS, Meituan, Nebius, Novita AI, NVIDIA, RunPod, Stanford, UC Berkeley, UCLA, xAI, and 01.AI.
95+
96+
97+
![logos](/assets/images/sglang-join-pytorch/fg2.png){:style="width:100%;"}
98+
99+
100+
101+
## Conclusion
102+
103+
We’re excited to welcome SGLang to the PyTorch ecosystem. SGLang accelerates the serving of large language and vision language models. It’s widely adopted by industry, powering the large-scale online serving of frontier models like Grok and DeepSeek.
104+
105+
We invite you to explore the [SGLang GitHub repo](https://github.com/sgl-project/sglang/tree/main), join the [community on Slack](https://slack.mindee.com/), and reach out to [[email protected]](mailto:[email protected]) for inquiries or collaboration opportunities. Together, we can make powerful AI models accessible to everyone.
286 KB
Loading
234 KB
Loading

0 commit comments

Comments
 (0)