⭐ If this work is helpful for you, please star this repo. Thanks! 🤗
1️⃣ VAR exhibits scale and spatial redundancy, causing high GPU memory consumption.
2️⃣ The proposed method enables MVAR generation without relying on a KV cache during inference.
- 2025-05-20: Our MVAR paper has been published on arXiv.
Our MVAR introduces a scale and spatial Markovian assumption: it adopts only the adjacent preceding scale for next-scale prediction, and restricts each token's attention to a localized neighborhood of size k around the corresponding position on the adjacent scale.
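To make the spatial Markovian constraint concrete, here is a minimal illustrative sketch (not the paper's implementation) of how such an attention mask could be built: each token on the current scale attends only to a k×k neighborhood around its corresponding position on the adjacent preceding scale. The function name and the nearest-position coordinate mapping are our own assumptions for illustration.

```python
import numpy as np

def markov_neighborhood_mask(h_cur, w_cur, h_prev, w_prev, k):
    """Illustrative sketch (hypothetical helper, not the official MVAR code).

    Returns a boolean mask of shape (h_cur*w_cur, h_prev*w_prev) where entry
    [q, p] is True iff current-scale token q may attend to preceding-scale
    token p, i.e. p lies in a k x k window around q's corresponding position.
    """
    mask = np.zeros((h_cur * w_cur, h_prev * w_prev), dtype=bool)
    r = k // 2  # neighborhood radius
    for i in range(h_cur):
        for j in range(w_cur):
            # Map current-scale coordinates to the preceding (coarser) scale.
            pi = int(i * h_prev / h_cur)
            pj = int(j * w_prev / w_cur)
            # Allow attention only inside the k x k window (clipped at borders).
            for di in range(-r, r + 1):
                for dj in range(-r, r + 1):
                    qi, qj = pi + di, pj + dj
                    if 0 <= qi < h_prev and 0 <= qj < w_prev:
                        mask[i * w_cur + j, qi * w_prev + qj] = True
    return mask

# Example: 4x4 current scale attending to a 2x2 preceding scale with k = 3.
mask = markov_neighborhood_mask(4, 4, 2, 2, 3)
```

Because each row of the mask has at most k² True entries regardless of the scale sizes, attention cost and memory per token stay constant instead of growing with the full sequence of all preceding scales.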
- 📄 Paper available on arXiv
- 🔧 Codebase under preparation
- 📈 Planned improvements and model refinement
Our MVAR model achieves a 3.0× reduction in GPU memory footprint compared to VAR. Detailed results can be found in the paper.
Please cite us if our work is useful for your research.
@article{zhang2025mvar,
  title={MVAR: Visual Autoregressive Modeling with Scale and Spatial Markovian Conditioning},
  author={Zhang, Jinhua and Long, Wei and Han, Minghao and You, Weiyi and Gu, Shuhang},
  journal={arXiv preprint arXiv:2505.12742},
  year={2025}
}
If you have any questions, feel free to contact me at [email protected]