f5_tts_mlx

Latest commit: 47ea442 · Mar 19, 2025

Name	Last commit message	Last commit date
tests	Add a script entrypoint for easier invocation.	Oct 14, 2024
README.md	Update README.md	Jan 22, 2025
__init__.py	Add a script entrypoint for easier invocation.	Oct 14, 2024
audio.py	Reorganization and fix recompilation every step in the training loop.	Dec 6, 2024
cfm.py	Support quantized weights.	Mar 19, 2025
convnext_v2.py	Reorganization and fix recompilation every step in the training loop.	Dec 6, 2024
data.py	Reorganization and fix recompilation every step in the training loop.	Dec 6, 2024
dit.py	Add v1 model support.	Mar 18, 2025
duration.py	Add v1 model support.	Mar 18, 2025
duration_trainer.py	Reorganization and fix recompilation every step in the training loop.	Dec 6, 2024
generate.py	Add v1 model support.	Mar 18, 2025
rope.py	Reorganization and fix recompilation every step in the training loop.	Dec 6, 2024
trainer.py	Reorganization and fix recompilation every step in the training loop.	Dec 6, 2024
utils.py	Add v1 model support.	Mar 18, 2025

README.md

Required Parameters

--text

string

Provide the text that you want to convert to speech.

Optional Parameters

--duration

float

Specify the length of the generated audio in seconds.

--estimate-duration

bool, default: false

If true, estimate the duration using a heuristic based on the text instead of the duration predictor model.

--speed

float, default: 1.0

Speaking speed modifier, used when an exact duration is not specified.
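
The README does not describe the heuristic behind --estimate-duration or how --speed enters it, so the following is only a hypothetical sketch of how a text-based estimate could work: it scales the speaking rate of the reference clip by the length of the new text and divides by the speed modifier. The function name `estimate_duration` and the per-character rate are assumptions made for illustration, not the project's code.

```python
def estimate_duration(ref_audio_seconds: float, ref_text: str, text: str,
                      speed: float = 1.0) -> float:
    """Hypothetical heuristic: reuse the reference clip's seconds-per-character rate."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    # Seconds of reference audio per character of reference caption.
    seconds_per_char = ref_audio_seconds / max(len(ref_text), 1)
    # A higher speed shortens the estimate; a lower speed lengthens it.
    return seconds_per_char * len(text) / speed


normal = estimate_duration(5.0, "a" * 50, "b" * 100, speed=1.0)
faster = estimate_duration(5.0, "a" * 50, "b" * 100, speed=2.0)
```

Under this assumption, doubling --speed halves the estimated duration, which matches the description of --speed as a modifier that only applies when no exact --duration is given.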

--model

string, default: "lucasnewman/f5-tts-mlx"

Specify a custom model to use for generation. If not provided, the script will use the default model.

--ref-audio

string, default: "tests/test_en_1_ref_short.wav"

Provide a reference audio file path to help guide the generation.

--ref-text

string, default: "Some call me nature, others call me mother nature."

Provide a caption for the reference audio.

--output

string, default: None

Specify the output path where the generated audio will be saved. If not specified, audio will play as it's generated.

--cfg

float, default: 2.0

Specify the strength used for classifier-free guidance.
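
To show what the guidance strength controls, here is a minimal sketch of one common classifier-free guidance formulation, in which the difference between the conditional and unconditional predictions is scaled by the strength and added to the conditional prediction. Whether this package combines the two predictions in exactly this way is an assumption; the function name `apply_cfg` and the plain-list inputs are illustrative only.

```python
def apply_cfg(cond: list[float], uncond: list[float], cfg: float) -> list[float]:
    """Push the conditional prediction away from the unconditional one.

    Assumed formulation: guided = cond + cfg * (cond - uncond).
    """
    return [c + cfg * (c - u) for c, u in zip(cond, uncond)]


cond = [1.0, 2.0]      # prediction with text and reference audio
uncond = [0.5, 2.0]    # prediction with the conditioning dropped
guided = apply_cfg(cond, uncond, cfg=2.0)
unguided = apply_cfg(cond, uncond, cfg=0.0)
```

With a strength of 0 the conditional prediction is returned unchanged; larger values amplify whatever the conditioning contributes.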

--method

string, default: "rk4"

Specify the sampling method for the ODE. Options are "euler", "midpoint", and "rk4".
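
The three options are standard fixed-step ODE integrators. The sketch below applies them to a toy scalar problem (dy/dt = y on [0, 1]) rather than to the model, to illustrate how they differ in accuracy for the same number of steps. The `integrate` helper is written for this example and is not the package's sampler.

```python
import math


def integrate(f, y0: float, steps: int, method: str = "rk4") -> float:
    """Integrate dy/dt = f(t, y) from t=0 to t=1 with a fixed step size."""
    h = 1.0 / steps
    t, y = 0.0, y0
    for _ in range(steps):
        if method == "euler":        # 1 function evaluation per step
            y = y + h * f(t, y)
        elif method == "midpoint":   # 2 function evaluations per step
            k1 = f(t, y)
            y = y + h * f(t + h / 2, y + h / 2 * k1)
        elif method == "rk4":        # 4 function evaluations per step
            k1 = f(t, y)
            k2 = f(t + h / 2, y + h / 2 * k1)
            k3 = f(t + h / 2, y + h / 2 * k2)
            k4 = f(t + h, y + h * k3)
            y = y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        else:
            raise ValueError(f"unknown method: {method}")
        t += h
    return y


def growth(t: float, y: float) -> float:
    return y  # exact solution is y(1) = e for y(0) = 1


errors = {
    m: abs(integrate(growth, 1.0, steps=8, method=m) - math.e)
    for m in ("euler", "midpoint", "rk4")
}
```

On this toy problem the error shrinks from "euler" to "midpoint" to "rk4", while the number of function evaluations per step grows from 1 to 2 to 4. In a neural ODE each evaluation is a model forward pass, so the higher-order methods cost more per step.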

--steps

int, default: 8

Specify the number of steps used to sample the neural ODE. Fewer steps reduce latency at the cost of quality.

--sway-coef

float, default: -1.0

Set the sway sampling coefficient. The best values according to the paper are in the range [-1.0, 1.0].
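
As an illustration of what the coefficient does, the sketch below warps a uniform time grid with a sway function of the form t + s * (cos(pi/2 * t) - 1 + t). This form is an assumption based on the sway sampling description in the F5-TTS paper and may not match the package's code exactly; the function name `sway` is hypothetical.

```python
import math


def sway(t: float, coef: float) -> float:
    """Warp a flow step t in [0, 1]; assumed form of the sway sampling function."""
    return t + coef * (math.cos(math.pi / 2 * t) - 1 + t)


steps = 8
uniform = [i / steps for i in range(steps + 1)]
# With the default coefficient of -1.0 the grid is denser near t = 0.
warped = [sway(t, -1.0) for t in uniform]
# A coefficient of 0.0 leaves the uniform grid unchanged.
unchanged = [sway(t, 0.0) for t in uniform]
```

Under this assumption the endpoints stay at 0 and 1, a negative coefficient spends more of the steps early in the flow, and a positive coefficient shifts them toward the end.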

--seed

int, default: None (random)

Set a random seed for reproducible results.

--q

int, default: None

Number of bits to use for quantization. 4 and 8 are supported.
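
For reference, the documented parameters can be summarized as an argparse definition. This is a sketch that mirrors the names, types, and defaults listed above, not the project's actual parser. Treating --estimate-duration as a store-true flag, using None as the default for --duration, and restricting --q to 4 and 8 through `choices` are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Mirror the documented command-line interface (illustrative only)."""
    p = argparse.ArgumentParser(description="Sketch of the documented options")
    p.add_argument("--text", type=str, required=True)
    p.add_argument("--duration", type=float, default=None)       # assumed default
    p.add_argument("--estimate-duration", action="store_true")   # assumed flag style
    p.add_argument("--speed", type=float, default=1.0)
    p.add_argument("--model", type=str, default="lucasnewman/f5-tts-mlx")
    p.add_argument("--ref-audio", type=str, default="tests/test_en_1_ref_short.wav")
    p.add_argument("--ref-text", type=str,
                   default="Some call me nature, others call me mother nature.")
    p.add_argument("--output", type=str, default=None)
    p.add_argument("--cfg", type=float, default=2.0)
    p.add_argument("--method", type=str, default="rk4",
                   choices=["euler", "midpoint", "rk4"])
    p.add_argument("--steps", type=int, default=8)
    p.add_argument("--sway-coef", type=float, default=-1.0)
    p.add_argument("--seed", type=int, default=None)
    p.add_argument("--q", type=int, default=None, choices=[4, 8])  # assumed choices
    return p


args = build_parser().parse_args(["--text", "Hello world.", "--steps", "16", "--q", "4"])
```

Only --text is required; every option left out falls back to the default listed above.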
