---
layout: homepage
---
I am a second-year Ph.D. student in the HAN Lab at MIT EECS, advised by Prof. Song Han. Before that, I received my Bachelor's degree with highest honor from the Department of Electronic Engineering, Tsinghua University, China, where I was fortunate to be advised by Prof. Yu Wang.
My long-term goal is to build efficient machine learning systems for applications at different scales, especially Large Language Models (LLMs). Recently, I have been actively working on efficient inference systems for LLMs/VLMs.
- [2025/02] 🏆 Both QServe and LServe have been accepted by MLSys 2025!
- [2025/02] 🔥 We released LServe, substantially accelerating long-sequence LLM inference with Unified Sparse Attention.
- [2024/05] 🔥 We released QServe, an efficient large-scale LLM serving framework with W4A8KV4 Quantization.
- [2024/05] 🏆 AWQ & TinyChat received the Best Paper Award at MLSys 2024!
- [2024/03] We released an updated version of TinyChat. Visual Language Models (e.g., VILA) are now supported! Play with our demo!
- [2024/02] 🔥 AWQ is accepted by MLSys 2024!
- [2023/10] 🔥 I presented TorchSparse++ at MICRO 2023! See the video and slides here!
{% include_relative _includes/publications.md %}
{% include_relative _includes/blogs.md %}
{% include_relative _includes/services.md %}