Commit 75b660e

Cosmos text to video example.
1 parent be1ba46 commit 75b660e

4 files changed: +706 -0 lines changed

README.md (+2)
@@ -62,6 +62,8 @@ Here are some more advanced examples:
 
 [Hunyuan Video](hunyuan_video)
 
+[Nvidia Cosmos](cosmos)
+
 [Audio Models](audio)
 

cosmos/README.md (+41)
@@ -0,0 +1,41 @@
# Nvidia Cosmos Models

[Nvidia Cosmos](https://www.nvidia.com/en-us/ai/cosmos/) is a family of "World Models". ComfyUI currently supports the 7B and 14B text to video diffusion models and the 7B and 14B image to video diffusion models.

## Files to Download

You will first need:

#### Text encoder and VAE:

[oldt5_xxl_fp8_e4m3fn_scaled.safetensors](https://huggingface.co/comfyanonymous/cosmos_1.0_text_encoder_and_VAE_ComfyUI/tree/main/text_encoders) goes in: ComfyUI/models/text_encoders/
12+
13+
[cosmos_cv8x8x8_1.0.safetensors](https://huggingface.co/comfyanonymous/cosmos_1.0_text_encoder_and_VAE_ComfyUI/blob/main/vae/cosmos_cv8x8x8_1.0.safetensors) goes in: ComfyUI/models/vae/
14+
15+
Note: oldt5_xxl is not the same as the t5xxl used in flux and other models.
16+
oldt5_xxl is t5xxl 1.0 while the one used in flux and others is t5xxl 1.1
17+
#### Video Models

The video models can be found [in safetensors format here](https://huggingface.co/mcmonkey/cosmos-1.0/tree/main).

The workflows on this page use [Cosmos-1_0-Diffusion-7B-Text2World.safetensors](https://huggingface.co/mcmonkey/cosmos-1.0/blob/main/Cosmos-1_0-Diffusion-7B-Text2World.safetensors) and [Cosmos-1_0-Diffusion-7B-Video2World.safetensors](https://huggingface.co/mcmonkey/cosmos-1.0/blob/main/Cosmos-1_0-Diffusion-7B-Video2World.safetensors).

These files go in: ComfyUI/models/diffusion_models/

Note: "Text to World" means text to video and "Video to World" means image/video to video.

If you want the original diffusion models in .pt format instead of the repacked safetensors, the official links are: [7B-Text2World](https://huggingface.co/nvidia/Cosmos-1.0-Diffusion-7B-Text2World), [7B-Video2World](https://huggingface.co/nvidia/Cosmos-1.0-Diffusion-7B-Video2World), [14B-Text2World](https://huggingface.co/nvidia/Cosmos-1.0-Diffusion-14B-Text2World), [14B-Video2World](https://huggingface.co/nvidia/Cosmos-1.0-Diffusion-14B-Video2World).
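
To double-check that the downloads ended up in the folders listed above, a small script along these lines may help. It is only a sketch: the helper name `missing_cosmos_files` and the `ComfyUI` root path are assumptions for illustration and are not part of ComfyUI itself. The filenames and folders are the ones named in this section.

```python
from pathlib import Path

# Folders (relative to the ComfyUI root) and filenames as listed in this README.
EXPECTED_FILES = {
    "models/text_encoders": ["oldt5_xxl_fp8_e4m3fn_scaled.safetensors"],
    "models/vae": ["cosmos_cv8x8x8_1.0.safetensors"],
    "models/diffusion_models": [
        "Cosmos-1_0-Diffusion-7B-Text2World.safetensors",
        "Cosmos-1_0-Diffusion-7B-Video2World.safetensors",
    ],
}


def missing_cosmos_files(comfyui_root):
    """Hypothetical helper: list expected files that are absent under comfyui_root."""
    root = Path(comfyui_root)
    missing = []
    for folder, names in EXPECTED_FILES.items():
        for name in names:
            path = root / folder / name
            if not path.is_file():
                missing.append(path)
    return missing


if __name__ == "__main__":
    # "ComfyUI" is an assumed install location; adjust it to your setup.
    for path in missing_cosmos_files("ComfyUI"):
        print("missing:", path)
```

If the script prints nothing, all four files were found where this page says they should go.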

## Workflows

### Text to Video

This workflow requires the 7B text to video model, which you can download from the links above.

![Example](text_to_video_cosmos_7B.webp)

[Workflow in Json format](text_to_video_cosmos_7B.json)
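
Before loading the workflow, it can be handy to see which model files it refers to. The sketch below assumes the JSON is a UI-exported workflow with a top-level `nodes` list whose entries may carry a `widgets_values` field. That layout is an assumption about the file, not something this page states, and `referenced_model_files` is a hypothetical helper name.

```python
import json
from pathlib import Path


def referenced_model_files(workflow):
    """Hypothetical helper: collect widget values that look like .safetensors filenames.

    Assumes `workflow` is a dict with a "nodes" list and that each node may have
    a "widgets_values" list (or dict); nodes without it are skipped.
    """
    found = set()
    for node in workflow.get("nodes", []):
        values = node.get("widgets_values") or []
        if isinstance(values, dict):
            values = list(values.values())
        for value in values:
            if isinstance(value, str) and value.endswith(".safetensors"):
                found.add(value)
    return sorted(found)


if __name__ == "__main__":
    workflow_path = Path("text_to_video_cosmos_7B.json")
    if workflow_path.exists():
        with workflow_path.open(encoding="utf-8") as f:
            for name in referenced_model_files(json.load(f)):
                print(name)
```

Any filename printed here should match a file placed in the folders from the download section.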

### Image to Video
