AIDC-AI


28 repositories

Pixelle-Video
Public
🚀 AI Fully Automated Short Video Engine
Topics: tts image-generation video-generation aigc comfyui
Python • Apache License 2.0 • 517 forks • 3.1k stars • 37 issues • 2 pull requests • Updated Mar 8, 2026
Pixelle-Studio
Public
Your Zero-Code AI File Expert
Topics: agent ai skills filesystem mcp assistant llm tooluse
Python • Apache License 2.0 • 1 fork • 5 stars • 0 issues • 0 pull requests • Updated Feb 27, 2026
Marco-o1
Public
An Open Large Reasoning Model for Real-World Solutions
Python • Other license • 80 forks • 1.5k stars • 10 issues • 0 pull requests • Updated Feb 13, 2026
Marco-DeepResearch
Public
Marco Search Agent for Realistic and Challenging Agentic Search
Python • Apache License 2.0 • 22 forks • 254 stars • 3 issues • 0 pull requests • Updated Feb 12, 2026
Ovis
Public
A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.
Topics: chatbot multimodality multimodal vision-language-model multimodal-large-language-models vision-language-learning qwen llama3
Python • Apache License 2.0 • 84 forks • 1.4k stars • 79 issues • 3 pull requests • Updated Feb 11, 2026
Awesome-Unified-Multimodal-Models
Public
Awesome Unified Multimodal Models
Topics: multimodal-models text-to-image-generation vision-language-model multimodal-large-language-models unified-multimodal-models
37 forks • 1.1k stars • 6 issues • 7 pull requests • Updated Feb 6, 2026
Marco-Longspeech
Public
Apache License 2.0 • 0 forks • 1 star • 0 issues • 0 pull requests • Updated Feb 6, 2026
Marco-LLM
Public
Multilingual and Multicultural Benchmark and LLM
Python • 0 forks • 13 stars • 0 issues • 0 pull requests • Updated Feb 3, 2026
Marco-MT
Public
3 forks • 22 stars • 0 issues • 0 pull requests • Updated Feb 3, 2026
Omni-View
Public
Python • Apache License 2.0 • 4 forks • 104 stars • 1 issue • 0 pull requests • Updated Jan 27, 2026
ComfyUI-Copilot
Public
An AI-powered custom node for ComfyUI designed to enhance workflow automation and provide intelligent assistance
Topics: agent flux ai copilot rag gpt-4 stable-diffusion comfyui llm-agent deepseek
TypeScript • MIT License • 287 forks • 4.7k stars • 33 issues • 3 pull requests • Updated Jan 12, 2026
Ovis-Image
Public
Ovis-Image is a 7B text-to-image model specifically optimized for high-quality text rendering, designed to operate efficiently under stringent computational con…
Topics: image-generation text-to-image
Python • Apache License 2.0 • 19 forks • 307 stars • 4 issues • 0 pull requests • Updated Dec 21, 2025
Pixelle-MCP
Public
An Open-Source Multimodal AIGC Solution based on ComfyUI + MCP + LLM https://pixelle.ai
Python • MIT License • 123 forks • 934 stars • 9 issues • 6 pull requests • Updated Dec 17, 2025
Marco-Voice
Public
A Unified Framework for Expressive Speech Synthesis with Voice Cloning
Python • Apache License 2.0 • 36 forks • 408 stars • 7 issues • 0 pull requests • Updated Dec 3, 2025
Ovis-U1
Public
A unified model that seamlessly integrates multimodal understanding, text-to-image generation, and image editing within a single powerful framework.
Topics: image-editing text-to-image multimodal-large-language-models
Python • Apache License 2.0 • 14 forks • 452 stars • 3 issues • 0 pull requests • Updated Dec 2, 2025
Agentic-ADK
Public
Agentic ADK is an Agent application development framework launched by Alibaba International AI Business, based on Google-ADK and Ali-LangEngine.
Java • Apache License 2.0 • 123 forks • 656 stars • 13 issues • 4 pull requests • Updated Nov 24, 2025
Diffusion-SDPO
Public
Diffusion-SDPO: Safeguarded Direct Preference Optimization for Diffusion Models
Topics: text-to-image diffusion-model dpo flowmatching
Python • Apache License 2.0 • 1 fork • 21 stars • 3 issues • 0 pull requests • Updated Nov 11, 2025
CHATS
Public
CHATS: Combining Human-Aligned Optimization and Test-Time Sampling for Text-to-Image Generation (ICML2025)
Topics: text-to-image dpo sdxl
Python • Apache License 2.0 • 2 forks • 114 stars • 1 issue • 0 pull requests • Updated Aug 19, 2025
TeEFusion
Public
TeEFusion: Blending Text Embeddings to Distill Classifier-Free Guidance (ICCV 2025)
Topics: text-to-image distillation-model sd3 classifier-free-guidance
Python • Other license • 2 forks • 9 stars • 2 issues • 0 pull requests • Updated Jul 25, 2025
flashinfer
Public
FlashInfer: Kernel Library for LLM Serving
Cuda • Apache License 2.0 • 803 forks • 1 star • 0 issues • 0 pull requests • Updated Jul 15, 2025
UNIC-Adapter
Public
Python • MIT License • 0 forks • 10 stars • 1 issue • 0 pull requests • Updated Jul 10, 2025
Parrot
Public
🎉 The code repository for "Parrot: Multilingual Visual Instruction Tuning" in PyTorch.
Topics: multilingual mixture-of-experts vision-language-model multimodal-large-language-models
Python • Apache License 2.0 • 3 forks • 77 stars • 1 issue • 0 pull requests • Updated Jun 12, 2025
TransBench
Public
2 forks • 39 stars • 4 issues • 0 pull requests • Updated May 29, 2025
TG-LLaVA
Public
Python • Apache License 2.0 • 0 forks • 9 stars • 0 issues • 0 pull requests • Updated Jan 14, 2025
Wings
Public
The code repository for "Wings: Learning Multimodal LLMs without Text-only Forgetting" [NeurIPS 2024]
Topics: deep-learning mllm multimodal-large-language-models multimodal-llm text-only-forgetting
Python • Apache License 2.0 • 0 forks • 26 stars • 1 issue • 0 pull requests • Updated Dec 28, 2024
M3Bench
Public
Python • Apache License 2.0 • 5 forks • 2 stars • 0 issues • 0 pull requests • Updated Dec 15, 2024
Meissonic
Public
Python • Other license • 0 forks • 3 stars • 0 issues • 0 pull requests • Updated Nov 14, 2024
AutoGPTQ
Public
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
Python • Other license • 533 forks • 3 stars • 0 issues • 0 pull requests • Updated Nov 4, 2024