---
layout: homepage
---
I am a second-year Ph.D. student in the HAN Lab at MIT EECS, advised by Prof. Song Han. Before that, I received my Bachelor's degree with highest honor from the Department of Electronic Engineering, Tsinghua University, China, where I was fortunate to be advised by Prof. Yu Wang.
My long-term goal is to build efficient machine learning systems for applications at different scales, especially Large Language Models (LLMs). Recently, I have been actively working on efficient inference systems for LLMs/VLMs.
- [2025/02] 🏆 Both QServe and LServe have been accepted by MLSys 2025!
- [2025/02] 🔥 We released LServe, substantially accelerating long-sequence LLM inference with Unified Sparse Attention.
- [2024/05] 🔥 We released QServe, an efficient large-scale LLM serving framework with W4A8KV4 Quantization.
- [2024/05] 🏆 AWQ & TinyChat received the Best Paper Award at MLSys 2024!
- [2024/03] We released an updated version of TinyChat. Visual Language Models (e.g., VILA) are now supported! Play with our demo!
- [2024/02] 🔥 AWQ is accepted by MLSys 2024!
- [2023/10] 🔥 I presented TorchSparse++ at MICRO 2023! See the video and slides here!
{% include_relative _includes/publications.md %}
{% include_relative _includes/blogs.md %}
{% include_relative _includes/services.md %}