Contributor

@wwwjn wwwjn commented Oct 12, 2025

FLUX.1 is a diffusion model rather than a language model, so it needs its own extension of train.py. The FLUX files are reorganized as follows:

  • dataset.py
    • moved to datasets/flux_dataset.py
  • tokenizer.py
    • Keep it under models/flux folder
  • integration_test.py
    • moved to tests/integration_tests/flux.py
    • Because FLUX uses a separate train.py and run_train.sh, I kept a copy of run_tests() instead of generalizing integration_tests/run_test.py.
  • train.py
    • Keep it under models/flux folder
  • validate.py
  • sample.py
    • moved to inference/

@meta-cla meta-cla bot added the CLA Signed label Oct 12, 2025
Contributor

@tianyu-l tianyu-l left a comment

Thanks! Left some comments.

parser.add_argument(
    "--config_path",
-   default="./torchtitan/experiments/flux/train_configs/debug_model.toml",
+   default="./tests/integration_tests/base_config.toml",
Contributor

With this, I think you'll need to add the custom job config import to all FLUX OverrideDefinitions. BTW, I plan to refactor this so that we don't always need to specify the custom import.

Contributor Author

That's a nice catch! I double-checked our current design: we use base_config.toml to keep cross-model tests consistent, but the FLUX model has a separate integration test file, so it's OK to reuse the debug model config directly instead of adding a bunch of parameters that base_config.toml does not contain (not only custom_job_config, but also inference, validation, etc.). WDYT?
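As a rough illustration of the option discussed here, the sketch below shows a FLUX-only test script defaulting `--config_path` to the debug model config while still accepting an override. The `build_parser` helper and the `FLUX_DEBUG_CONFIG` constant are hypothetical names, and the path is simply the one quoted in the diff above, so it may differ after the move out of the experiments folder.

```python
import argparse

# Path copied from the diff quoted above; treat it as an assumption, since the
# file may live elsewhere once FLUX leaves the experiments folder.
FLUX_DEBUG_CONFIG = "./torchtitan/experiments/flux/train_configs/debug_model.toml"


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser for a FLUX-only integration test script."""
    parser = argparse.ArgumentParser(description="FLUX integration tests (sketch)")
    parser.add_argument(
        "--config_path",
        default=FLUX_DEBUG_CONFIG,
        help="Base TOML config that every FLUX test case starts from",
    )
    return parser


# No flag given: fall back to the FLUX debug config.
default_args = build_parser().parse_args([])
# An explicit flag still takes precedence over the default.
custom_args = build_parser().parse_args(["--config_path", "./my_config.toml"])
print(default_args.config_path)
print(custom_args.config_path)
```

With a FLUX-specific default, individual test cases would not each have to repeat FLUX-only settings on top of a shared base config.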

)

self.hf_module = self.hf_module.eval().requires_grad_(False)
# This is to make sure the encoders works with FSDP
Contributor

Could you share more context around this change? I vaguely remember someone requesting this in an issue, but we couldn't reproduce it.

Contributor Author

This is a new change needed to run the FLUX model with Hugging Face encoders. In my recent tests, I found that the HF encoder parameters are all strided (non-contiguous) tensors, which are not compatible with FSDP, so I manually convert the encoder's parameters to contiguous tensors.
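A minimal sketch of the idea, assuming the fix amounts to replacing each non-contiguous parameter with a contiguous copy. This is not the code from the PR: `make_parameters_contiguous` is a hypothetical helper, and a small `nn.Linear` stands in for the Hugging Face encoder.

```python
import torch
import torch.nn as nn


def make_parameters_contiguous(module: nn.Module) -> int:
    """Hypothetical helper: rewrite non-contiguous parameters in place.

    Returns the number of parameters that were changed.
    """
    changed = 0
    with torch.no_grad():
        for param in module.parameters():
            if not param.is_contiguous():
                # .contiguous() copies the values into a dense, row-major buffer.
                param.data = param.data.contiguous()
                changed += 1
    return changed


# Toy stand-in for an encoder: a Linear layer whose weight is a transposed view.
encoder = nn.Linear(4, 3)
encoder.weight.data = torch.randn(4, 3).t()  # shape (3, 4), but non-contiguous
was_contiguous = encoder.weight.is_contiguous()

num_changed = make_parameters_contiguous(encoder)
now_contiguous = all(p.is_contiguous() for p in encoder.parameters())
```

In this toy case only the weight is rewritten; the bias is already contiguous and is left untouched.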

@tianyu-l
Contributor

could you also rebase onto #1871

@wwwjn
Contributor Author

wwwjn commented Oct 14, 2025

> could you also rebase onto #1871

Sure, will rebase later

@wwwjn wwwjn force-pushed the graduate-flux branch 2 times, most recently from 26ef236 to 3f607a9 Compare October 19, 2025 04:56
@wwwjn wwwjn changed the title from "[WIP] Graduate flux from experiment folder to core torchtitan" to "Graduate flux from experiment folder to core torchtitan" Oct 19, 2025
@wwwjn wwwjn requested a review from tianyu-l October 19, 2025 05:01
Contributor

@tianyu-l tianyu-l left a comment

Looks really good. Thanks for the refactor! Had some minor comments.

Contributor

For a more logical organization of files, do you think we could put the CLIP and T5 config.json files into tests/assets/flux_test_encoders/?

Contributor

@tianyu-l tianyu-l left a comment

need to rebase onto #1851
