It seems that vLLM does not support the InfLLM-v2 attention kernel. How much of a speedup does InfLLM-v2 offer, and what are the trade-offs? For example, is there any loss in accuracy or output quality? Also, how can CPM.cu be set up to serve an OpenAI-compatible API, the way vLLM does?
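
For reference, the kind of endpoint I have in mind is sketched below: a minimal `/v1/chat/completions` server in the OpenAI wire format, roughly what `vllm serve <model>` provides out of the box. The `cpmcu` import, `LLM` class, and `generate()` call are placeholders I made up; I don't know CPM.cu's actual Python API, so those parts are commented out:

```python
# Minimal sketch of an OpenAI-compatible /v1/chat/completions endpoint.
# NOTE: `cpmcu.LLM` and its `generate()` signature are hypothetical stand-ins;
# CPM.cu's real Python bindings may look quite different.
import time
import uuid

from fastapi import FastAPI
from pydantic import BaseModel

# from cpmcu import LLM                              # hypothetical import
# engine = LLM(model_path="openbmb/MiniCPM4-8B")     # hypothetical constructor

app = FastAPI()

class ChatMessage(BaseModel):
    role: str
    content: str

class ChatRequest(BaseModel):
    model: str
    messages: list[ChatMessage]
    max_tokens: int = 256
    temperature: float = 0.7

@app.post("/v1/chat/completions")
def chat_completions(req: ChatRequest):
    # Flatten the chat history into a single prompt string.
    prompt = "\n".join(f"{m.role}: {m.content}" for m in req.messages)
    # text = engine.generate(prompt,                  # hypothetical call
    #                        max_new_tokens=req.max_tokens,
    #                        temperature=req.temperature)
    text = "(generated text)"  # placeholder so the sketch runs standalone
    # Return a response shaped like OpenAI's chat.completion object.
    return {
        "id": f"chatcmpl-{uuid.uuid4().hex}",
        "object": "chat.completion",
        "created": int(time.time()),
        "model": req.model,
        "choices": [{
            "index": 0,
            "message": {"role": "assistant", "content": text},
            "finish_reason": "stop",
        }],
        "usage": {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0},
    }
```

If CPM.cu already ships something equivalent to this, a pointer to the relevant docs or entry point would be much appreciated.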