Repositories list
28 repositories
Pixelle-Video (Public)
🚀 AI Fully Automated Short Video Engine

- A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.

- Awesome Unified Multimodal Models

Marco-Longspeech (Public)

Marco-LLM (Public)

Marco-MT (Public)

ComfyUI-Copilot (Public)

Ovis-Image (Public)
Ovis-Image is a 7B text-to-image model specifically optimized for high-quality text rendering, designed to operate efficiently under stringent computational con…

Pixelle-MCP (Public)

- A unified model that seamlessly integrates multimodal understanding, text-to-image generation, and image editing within a single powerful framework.

Agentic-ADK (Public)

Diffusion-SDPO (Public)
Diffusion-SDPO: Safeguarded Direct Preference Optimization for Diffusion Models

- CHATS: Combining Human-Aligned Optimization and Test-Time Sampling for Text-to-Image Generation (ICML 2025)

TeEFusion (Public)
TeEFusion: Blending Text Embeddings to Distill Classifier-Free Guidance (ICCV 2025)

flashinfer (Public)

- 🎉 The code repository for "Parrot: Multilingual Visual Instruction Tuning" in PyTorch.

Wings (Public)
The code repository for "Wings: Learning Multimodal LLMs without Text-only Forgetting" [NeurIPS 2024]

M3Bench (Public)

AutoGPTQ (Public)