layout
homepage

Shang Yang

I am a second-year Ph.D. student at HAN LAB of MIT EECS, advised by Prof. Song Han. Before that, I received my Bachelor's degree with the highest honors from the Department of Electronic Engineering, Tsinghua University, China, where I was fortunate to be advised by Prof. Yu Wang.

My long-term goal is to build efficient machine learning systems for applications at different scales, especially Large Language Models (LLMs). Recently, I have been actively working on efficient inference systems for LLMs/VLMs.

News

  • [2025/02] 🏆 Both QServe and LServe have been accepted by MLSys 2025!
  • [2025/02] 🔥 We released LServe, substantially accelerating long-sequence LLM inference with Unified Sparse Attention.
  • [2024/05] 🔥 We released QServe, an efficient large-scale LLM serving framework with W4A8KV4 Quantization.
  • [2024/05] 🏆 AWQ & TinyChat received the Best Paper Award of MLSys 2024!
  • [2024/03] We have released an updated version of TinyChat. Visual Language Models (e.g. VILA) are supported! Play with our demo!
  • [2024/02] 🔥 AWQ has been accepted by MLSys 2024!
  • [2023/10] 🔥 I presented TorchSparse++ at MICRO 2023! See the video and slides here!

{% include_relative _includes/publications.md %}

{% include_relative _includes/blogs.md %}

{% include_relative _includes/services.md %}