Merged
54 commits
20453ce
[test] Lower number of top logprobs to get rid of `-inf` (#3212)
ByronHsu Jan 30, 2025
c38b5fb
update 3rdparty and rms norm for sgl-kernel (#3213)
zhyncs Jan 30, 2025
468d23c
update setup for sgl-kernel (#3214)
zhyncs Jan 30, 2025
222ce6f
add tensorrt_llm common and cutlass_extensions as 3rdparty (#3216)
zhyncs Jan 30, 2025
e81d7f1
add tensorrt_llm moe_gemm as 3rdparty (#3217)
zhyncs Jan 30, 2025
9602c2a
keep the parts needed for moe_kernels (#3218)
zhyncs Jan 30, 2025
cde4bbd
docs: add Novita for adoption and sponsorship (#3227)
Ying1123 Jan 31, 2025
9829e77
Docs: Update supported models with Mistral 3 (#3229)
ravi03071991 Jan 31, 2025
3ee6223
revert the MoE dependence (#3230)
zhyncs Jan 31, 2025
734daed
[fix] Clamp logprob with dtype min to prevent `-inf` (#3224)
ByronHsu Jan 31, 2025
c02e313
Fix block wise fp8 torch compile (#3232)
ispobock Jan 31, 2025
b49d6d0
support 12.5 CUDA runtime (#3231)
zhyncs Jan 31, 2025
cf0f7ea
chore: bump v0.4.2.post1 (#3233)
zhyncs Jan 31, 2025
656f7fc
Docs: Quick fix for Speculative_decoding doc (#3228)
jhinpan Jan 31, 2025
7811bfd
compatible with flashinfer v0.2 (#3235)
zhyncs Jan 31, 2025
1ebe1d6
Optimize MoE topk with torch compile (#3236)
ispobock Jan 31, 2025
34e405e
update sgl-kernel version for sglang (#3238)
zhyncs Jan 31, 2025
7876279
update cutlass dependency (#3240)
zhyncs Jan 31, 2025
7b020cc
add tuning block wise fp8 (#3242)
zhyncs Jan 31, 2025
d7c0b32
[Docs] Add more details to profiling docs (#3221)
Edenzzzz Jan 31, 2025
5317902
Add test for fp8 torch compile (#3246)
ispobock Feb 1, 2025
17dbf97
update ENV to ROCm dockers (#3248)
HaiShaw Feb 1, 2025
4eb4b40
update and simplify CustomOp (#3249)
zhyncs Feb 1, 2025
8db776f
support QuickGELU (#3250)
zhyncs Feb 1, 2025
ad67409
add contact us in README (#3251)
zhyncs Feb 1, 2025
f2b3a31
Update README
zhyncs Feb 1, 2025
959dca4
use srt VocabParallelEmbedding (#3252)
zhyncs Feb 1, 2025
d9eb935
Tune paged attention parameters for AMD GPU. (#3255)
whchung Feb 2, 2025
c27c378
docs/accuracy evaluation (#3114)
simveit Feb 2, 2025
55f5fc6
Docs: Update accuracy evaluation (#3261)
zhaochenyang20 Feb 2, 2025
566d61d
ROCm: bump 6.3.0 (#3259)
HaiShaw Feb 2, 2025
28b0a62
Bug: Fix min_p sampling crash when using flashinfer backend (#3207)
zifeitong Feb 2, 2025
455bfe8
Add a Doc about guide on nvidia jetson #3182 (#3205)
lycanlancelot Feb 3, 2025
3c8ac78
optimize test_fused_moe style (#3268)
BBuf Feb 3, 2025
013021b
refactor EAGLE 2 (#3269)
zhyncs Feb 3, 2025
00fa7d0
add copyright for sgl-kernel (#3270)
zhyncs Feb 3, 2025
d54cee1
adding Triton configs for DeepSeekV3 on Blackwell (#3272)
kushanam Feb 3, 2025
897e2e2
add Nebius for Adoption and Sponsorship (#3274)
zhyncs Feb 3, 2025
4b6f62e
add Atlas Cloud for Adoption and Sponsorship (#3276)
zhyncs Feb 3, 2025
7b5a374
Update server args doc (#3273)
simveit Feb 3, 2025
70817a7
[Feature] Define backends and add Triton backend for Lora (#3161)
Fridge003 Feb 4, 2025
d39899e
upgrade flashinfer v0.2.0.post2 (#3288)
zhyncs Feb 4, 2025
2c1a695
ROCm: sgl-kernel enablement starting with sgl_moe_align_block (#3287)
HaiShaw Feb 4, 2025
a07364c
Update Triton decode backend interface (#3292)
ispobock Feb 4, 2025
6186a8f
update flashinfer install index url (#3293)
zhyncs Feb 4, 2025
c7256ca
[ROCm] Add tuning configs for AMD Radeon Graphics. (#3294)
whchung Feb 4, 2025
c2723a4
[ROCm] Manually unroll _w8a8_block_fp8_matmul kernel on AMD GPU. (#3299)
whchung Feb 4, 2025
4885b90
Use forward_cuda to execute custom op for hip platform (#3305)
kkHuang-amd Feb 5, 2025
7ab8494
[ROCm] Logic to decide whether to used manually unrolled kernel. (#3306)
whchung Feb 5, 2025
76fa2d1
Fix lora flashinfer import bug on ROCM (#3312)
Fridge003 Feb 5, 2025
7aad8d1
chore: bump v0.4.2.post2 (#3313)
zhyncs Feb 5, 2025
de55333
Update Triton extend backend interface (#3309)
ispobock Feb 5, 2025
883a547
Merge branch 'main' into deepauto/feat/update-upstream
daniel-geon-park Feb 5, 2025
6f1509b
fix bug
daniel-geon-park Feb 5, 2025
1 change: 1 addition & 0 deletions .clang-format-ignore
@@ -0,0 +1 @@
+sgl-kernel/3rdparty/tensorrt_llm/*
2 changes: 1 addition & 1 deletion .github/pull_request_template.md
@@ -13,4 +13,4 @@
- [ ] Format your code according to the [Code Formatting with Pre-Commit](https://docs.sglang.ai/references/contribution_guide.html#code-formatting-with-pre-commit).
- [ ] Add unit tests as outlined in the [Running Unit Tests](https://docs.sglang.ai/references/contribution_guide.html#running-unit-tests-adding-to-ci).
- [ ] Update documentation / docstrings / example tutorials as needed, according to [Writing Documentation](https://docs.sglang.ai/references/contribution_guide.html#writing-documentation-running-docs-ci).
-- [ ] Provide throughput / latency benchmark results and accuracy evaluation results as needed, according to [Benchmark and Profiling](https://docs.sglang.ai/references/benchmark_and_profiling.html).
+- [ ] Provide throughput / latency benchmark results and accuracy evaluation results as needed, according to [Benchmark and Profiling](https://docs.sglang.ai/references/benchmark_and_profiling.html) and [Accuracy Results](https://docs.sglang.ai/references/accuracy_evaluation.html).
16 changes: 8 additions & 8 deletions .github/workflows/pr-test.yml
@@ -37,7 +37,7 @@ jobs:

- name: Install dependencies
env:
-FLASHINFER_REPO: ${{ inputs.version == 'nightly' && 'https://flashinfer.ai/whl/nightly/cu124/torch2.4/flashinfer' || 'https://flashinfer.ai/whl/cu124/torch2.4/flashinfer' }}
+FLASHINFER_REPO: ${{ inputs.version == 'nightly' && 'https://flashinfer.ai/whl/nightly/cu124/torch2.5/flashinfer' || 'https://flashinfer.ai/whl/cu124/torch2.5/flashinfer' }}
run: |
bash scripts/ci_install_dependency.sh

@@ -60,7 +60,7 @@ jobs:

- name: Install dependencies
env:
-FLASHINFER_REPO: ${{ inputs.version == 'nightly' && 'https://flashinfer.ai/whl/nightly/cu124/torch2.4/flashinfer' || 'https://flashinfer.ai/whl/cu124/torch2.4/flashinfer' }}
+FLASHINFER_REPO: ${{ inputs.version == 'nightly' && 'https://flashinfer.ai/whl/nightly/cu124/torch2.5/flashinfer' || 'https://flashinfer.ai/whl/cu124/torch2.5/flashinfer' }}
run: |
bash scripts/ci_install_dependency.sh

@@ -84,7 +84,7 @@ jobs:

- name: Install dependencies
env:
-FLASHINFER_REPO: ${{ inputs.version == 'nightly' && 'https://flashinfer.ai/whl/nightly/cu124/torch2.4/flashinfer' || 'https://flashinfer.ai/whl/cu124/torch2.4/flashinfer' }}
+FLASHINFER_REPO: ${{ inputs.version == 'nightly' && 'https://flashinfer.ai/whl/nightly/cu124/torch2.5/flashinfer' || 'https://flashinfer.ai/whl/cu124/torch2.5/flashinfer' }}
run: |
bash scripts/ci_install_dependency.sh

@@ -121,7 +121,7 @@ jobs:

- name: Install dependencies
env:
-FLASHINFER_REPO: ${{ inputs.version == 'nightly' && 'https://flashinfer.ai/whl/nightly/cu124/torch2.4/flashinfer' || 'https://flashinfer.ai/whl/cu124/torch2.4/flashinfer' }}
+FLASHINFER_REPO: ${{ inputs.version == 'nightly' && 'https://flashinfer.ai/whl/nightly/cu124/torch2.5/flashinfer' || 'https://flashinfer.ai/whl/cu124/torch2.5/flashinfer' }}
run: |
bash scripts/ci_install_dependency.sh

@@ -165,7 +165,7 @@ jobs:

- name: Install dependencies
env:
-FLASHINFER_REPO: ${{ inputs.version == 'nightly' && 'https://flashinfer.ai/whl/nightly/cu124/torch2.4/flashinfer' || 'https://flashinfer.ai/whl/cu124/torch2.4/flashinfer' }}
+FLASHINFER_REPO: ${{ inputs.version == 'nightly' && 'https://flashinfer.ai/whl/nightly/cu124/torch2.5/flashinfer' || 'https://flashinfer.ai/whl/cu124/torch2.5/flashinfer' }}
run: |
bash scripts/ci_install_dependency.sh

@@ -196,7 +196,7 @@ jobs:

- name: Install dependencies
env:
-FLASHINFER_REPO: ${{ inputs.version == 'nightly' && 'https://flashinfer.ai/whl/nightly/cu124/torch2.4/flashinfer' || 'https://flashinfer.ai/whl/cu124/torch2.4/flashinfer' }}
+FLASHINFER_REPO: ${{ inputs.version == 'nightly' && 'https://flashinfer.ai/whl/nightly/cu124/torch2.5/flashinfer' || 'https://flashinfer.ai/whl/cu124/torch2.5/flashinfer' }}
run: |
bash scripts/ci_install_dependency.sh

@@ -234,7 +234,7 @@ jobs:

- name: Install dependencies
env:
-FLASHINFER_REPO: ${{ inputs.version == 'nightly' && 'https://flashinfer.ai/whl/nightly/cu124/torch2.4/flashinfer' || 'https://flashinfer.ai/whl/cu124/torch2.4/flashinfer' }}
+FLASHINFER_REPO: ${{ inputs.version == 'nightly' && 'https://flashinfer.ai/whl/nightly/cu124/torch2.5/flashinfer' || 'https://flashinfer.ai/whl/cu124/torch2.5/flashinfer' }}
run: |
bash scripts/ci_install_dependency.sh

@@ -258,7 +258,7 @@ jobs:

- name: Install dependencies
env:
-FLASHINFER_REPO: ${{ inputs.version == 'nightly' && 'https://flashinfer.ai/whl/nightly/cu124/torch2.4/flashinfer' || 'https://flashinfer.ai/whl/cu124/torch2.4/flashinfer' }}
+FLASHINFER_REPO: ${{ inputs.version == 'nightly' && 'https://flashinfer.ai/whl/nightly/cu124/torch2.5/flashinfer' || 'https://flashinfer.ai/whl/cu124/torch2.5/flashinfer' }}
run: |
bash scripts/ci_install_dependency.sh

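The `${{ inputs.version == 'nightly' && '…' || '…' }}` expression repeated in the hunks above picks the nightly FlashInfer wheel index for nightly builds and the stable index otherwise, now pointing at torch2.5 wheels. A minimal shell sketch of the same selection, with an illustrative `VERSION` variable standing in for the workflow's `inputs.version`:

```shell
# Illustrative stand-in for the workflow's conditional expression:
# choose the nightly FlashInfer wheel index for nightly builds, else the stable one.
VERSION="nightly"  # hypothetical input; the real workflow reads inputs.version
if [ "$VERSION" = "nightly" ]; then
  FLASHINFER_REPO="https://flashinfer.ai/whl/nightly/cu124/torch2.5/flashinfer"
else
  FLASHINFER_REPO="https://flashinfer.ai/whl/cu124/torch2.5/flashinfer"
fi
echo "$FLASHINFER_REPO"
```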
6 changes: 3 additions & 3 deletions .github/workflows/release-docker-amd.yml
@@ -14,7 +14,7 @@ jobs:
environment: 'prod'
strategy:
matrix:
-rocm_version: ['6.2.0']
+rocm_version: ['6.3.0']
build_type: ['all', 'srt']
steps:
- name: Checkout repository
@@ -41,8 +41,8 @@
run: |
version=$(cat python/sglang/version.py | cut -d'"' -f2)

-if [ "${{ matrix.rocm_version }}" = "6.2.0" ]; then
-  rocm_tag="rocm620"
+if [ "${{ matrix.rocm_version }}" = "6.3.0" ]; then
+  rocm_tag="rocm630"
else
echo "Unsupported ROCm version"
exit 1
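The branch above now keys on ROCm 6.3.0 only and fails fast on anything else. A sketch of that version-to-tag mapping in isolation (`rocm_version` is set inline here for illustration; the workflow takes it from `matrix.rocm_version`):

```shell
# Map the ROCm version from the build matrix to the Docker image tag suffix;
# any version without a known tag aborts the release job.
rocm_version="6.3.0"  # hypothetical value; the workflow uses matrix.rocm_version
if [ "$rocm_version" = "6.3.0" ]; then
  rocm_tag="rocm630"
else
  echo "Unsupported ROCm version"
  exit 1
fi
echo "$rocm_tag"
```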
6 changes: 4 additions & 2 deletions .github/workflows/release-docker.yml
@@ -14,7 +14,7 @@ jobs:
environment: 'prod'
strategy:
matrix:
-cuda_version: ['11.8.0', '12.1.1', '12.4.1']
+cuda_version: ['11.8.0', '12.1.1', '12.4.1', '12.5.1']
build_type: ['all', 'srt']
steps:
- name: Delete huge unnecessary tools folder
@@ -39,6 +39,8 @@
cuda_tag="cu121"
elif [ "${{ matrix.cuda_version }}" = "12.4.1" ]; then
cuda_tag="cu124"
+elif [ "${{ matrix.cuda_version }}" = "12.5.1" ]; then
+  cuda_tag="cu125"
else
echo "Unsupported CUDA version"
exit 1
@@ -58,7 +60,7 @@
docker build . -f docker/Dockerfile --build-arg CUDA_VERSION=${{ matrix.cuda_version }} --build-arg BUILD_TYPE=${{ matrix.build_type }} -t lmsysorg/sglang:${tag}${tag_suffix} --no-cache
docker push lmsysorg/sglang:${tag}${tag_suffix}

-if [ "${{ matrix.cuda_version }}" = "12.4.1" ]; then
+if [ "${{ matrix.cuda_version }}" = "12.5.1" ]; then
docker tag lmsysorg/sglang:${tag}${tag_suffix} lmsysorg/sglang:latest${tag_suffix}
docker push lmsysorg/sglang:latest${tag_suffix}
fi
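With this change the version-to-tag chain gains a 12.5.1 entry and the `latest` tag now tracks the 12.5.1 build instead of 12.4.1. A sketch of the mapping as a `case` statement; only the mappings visible in the diff are reproduced, and `cuda_version` is set inline for illustration (the workflow uses `matrix.cuda_version`):

```shell
# Map the CUDA version from the build matrix to the Docker image tag suffix.
# Earlier matrix entries (e.g. 11.8.0) are handled above the shown hunk and
# omitted here rather than guessed at.
cuda_version="12.5.1"  # hypothetical value; the workflow uses matrix.cuda_version
case "$cuda_version" in
  12.1.1) cuda_tag="cu121" ;;
  12.4.1) cuda_tag="cu124" ;;
  12.5.1) cuda_tag="cu125" ;;
  *) echo "Unsupported CUDA version"; exit 1 ;;
esac
echo "$cuda_tag"
```

The workflow then re-tags the 12.5.1 image as `lmsysorg/sglang:latest`, so a default `docker pull` picks up the CUDA 12.5 build.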
6 changes: 5 additions & 1 deletion README.md
@@ -101,7 +101,11 @@ Learn more in the release blogs: [v0.2 blog](https://lmsys.org/blog/2024-07-25-s
[Development Roadmap (2024 Q4)](https://github.com/sgl-project/sglang/issues/1487)

## Adoption and Sponsorship
-The project is supported by (alphabetically): AMD, Baseten, Cursor, DataCrunch, Etched, Hyperbolic, Jam & Tea Studios, LinkedIn, LMSYS.org, Meituan, NVIDIA, RunPod, Stanford, UC Berkeley, UCLA, xAI, 01.AI.
+The project is supported by (alphabetically): AMD, Atlas Cloud, Baseten, Cursor, DataCrunch, Etched, Hyperbolic, Jam & Tea Studios, LinkedIn, LMSYS CORP, Meituan, Nebius, Novita AI, NVIDIA, RunPod, Stanford, UC Berkeley, UCLA, xAI, 01.AI.

+## Contact Us
+
+For enterprises interested in adopting or deploying SGLang at scale, including technical consulting, sponsorship opportunities, or partnership inquiries, please contact us at contact@sglang.ai.

## Acknowledgment and Citation
We learned the design and reused code from the following projects: [Guidance](https://github.com/guidance-ai/guidance), [vLLM](https://github.com/vllm-project/vllm), [LightLLM](https://github.com/ModelTC/lightllm), [FlashInfer](https://github.com/flashinfer-ai/flashinfer), [Outlines](https://github.com/outlines-dev/outlines), and [LMQL](https://github.com/eth-sri/lmql). Please cite the paper, [SGLang: Efficient Execution of Structured Language Model Programs](https://arxiv.org/abs/2312.07104), if you find the project useful.