Problem with training #255

diegobernagozzi · 2025-03-04T15:54:12Z

Hi everyone,

I'm trying to train MeloTTS on Italian, but I don't understand why training does not work. When I run this command:

python3 preprocess_text.py --metadata data/example/metadata.list

everything works fine, even with my Italian language modifications. But when I run this command:

bash train.sh data/example/config.json 1

the command never seems to finish, and the training progress always stays at 0:

0it [00:00, ?it/s]

Train.log says this:

25-03-04 16:25:18,787	example	INFO	{'train': {'log_interval': 200, 'eval_interval': 1000, 'seed': 52, 'epochs': 10000, 'learning_rate': 0.0003, 'betas': [0.8, 0.99], 'eps': 1e-09, 'batch_size': 6, 'fp16_run': False, 'lr_decay': 0.999875, 'segment_size': 16384, 'init_lr_ratio': 1, 'warmup_epochs': 0, 'c_mel': 45, 'c_kl': 1.0, 'skip_optimizer': True}, 'data': {'training_files': 'data/example/train.list', 'validation_files': 'data/example/val.list', 'max_wav_value': 32768.0, 'sampling_rate': 44100, 'filter_length': 2048, 'hop_length': 512, 'win_length': 2048, 'n_mel_channels': 128, 'mel_fmin': 0.0, 'mel_fmax': None, 'add_blank': True, 'n_speakers': 1, 'cleaned_text': True, 'spk2id': {'Italian': 0}}, 'model': {'use_spk_conditioned_encoder': True, 'use_noise_scaled_mas': True, 'use_mel_posterior_encoder': False, 'use_duration_discriminator': True, 'inter_channels': 192, 'hidden_channels': 192, 'filter_channels': 768, 'n_heads': 2, 'n_layers': 6, 'n_layers_trans_flow': 3, 'kernel_size': 3, 'p_dropout': 0.1, 'resblock': '1', 'resblock_kernel_sizes': [3, 7, 11], 'resblock_dilation_sizes': [[1, 3, 5], [1, 3, 5], [1, 3, 5]], 'upsample_rates': [8, 8, 2, 2, 2], 'upsample_initial_channel': 512, 'upsample_kernel_sizes': [16, 16, 8, 2, 2], 'n_layers_q': 3, 'use_spectral_norm': False, 'gin_channels': 256}, 'num_languages': 9, 'num_tones': 17, 'symbols': ['_', '"', '(', ')', '*', '/', ':', 'AA', 'E', 'EE', 'En', 'N', 'OO', 'Q', 'V', '[', '\\', ']', '^', 'a', 'a:', 'aa', 'ae', 'ah', 'ai', 'an', 'ang', 'ao', 'aw', 'ay', 'b', 'by', 'c', 'ch', 'd', 'dh', 'dy', 'e', 'e:', 'eh', 'ei', 'en', 'eng', 'er', 'ey', 'f', 'g', 'gy', 'h', 'hh', 'hy', 'i', 'i0', 'i:', 'ia', 'ian', 'iang', 'iao', 'ie', 'ih', 'in', 'ing', 'iong', 'ir', 'iu', 'iy', 'j', 'jh', 'k', 'ky', 'l', 'm', 'my', 'n', 'ng', 'ny', 'o', 'o:', 'ong', 'ou', 'ow', 'oy', 'p', 'py', 'q', 'r', 'ry', 's', 'sh', 't', 'th', 'ts', 'ty', 'u', 'u:', 'ua', 'uai', 'uan', 'uang', 'uh', 'ui', 'un', 'uo', 'uw', 'v', 'van', 've', 'vn', 'w', 'x', 
'y', 'z', 'zh', 'zy', '~', 'æ', 'ç', 'ð', 'ø', 'ŋ', 'œ', 'ɐ', 'ɑ', 'ɒ', 'ɔ', 'ɕ', 'ə', 'ɛ', 'ɜ', 'ɡ', 'ɣ', 'ɥ', 'ɦ', 'ɪ', 'ɫ', 'ɬ', 'ɭ', 'ɯ', 'ɲ', 'ɵ', 'ɸ', 'ɹ', 'ɾ', 'ʁ', 'ʃ', 'ʊ', 'ʌ', 'ʎ', 'ʏ', 'ʑ', 'ʒ', 'ʝ', 'ʲ', 'ˈ', 'ˌ', 'ː', '̃', '̩', 'β', 'θ', 'ᄀ', 'ᄁ', 'ᄂ', 'ᄃ', 'ᄄ', 'ᄅ', 'ᄆ', 'ᄇ', 'ᄈ', 'ᄉ', 'ᄊ', 'ᄋ', 'ᄌ', 'ᄍ', 'ᄎ', 'ᄏ', 'ᄐ', 'ᄑ', 'ᄒ', 'ᅡ', 'ᅢ', 'ᅣ', 'ᅤ', 'ᅥ', 'ᅦ', 'ᅧ', 'ᅨ', 'ᅩ', 'ᅪ', 'ᅫ', 'ᅬ', 'ᅭ', 'ᅮ', 'ᅯ', 'ᅰ', 'ᅱ', 'ᅲ', 'ᅳ', 'ᅴ', 'ᅵ', 'ᆨ', 'ᆫ', 'ᆮ', 'ᆯ', 'ᆷ', 'ᆸ', 'ᆼ', 'ㄸ', '!', '?', '…', ',', '.', "'", '-', '¿', '¡', 'SP', 'UNK', 'ɛ', 'ɔ', 'dz', 'dʒ', 'ʎ', 'ɲ', 'ŋ', 'ʃ', 'ts', 'tʃ', ' ͡ '], 'model_dir': './logs/example', 'pretrain_G': None, 'pretrain_D': None, 'pretrain_dur': None, 'port': 10000}
2025-03-04 16:25:18,788	example	WARNING	/home/ecuser/MeloTTS/melo is not a git repository, therefore hash value comparison will be ignored.
2025-03-04 16:25:21,891	example	ERROR	enc_p.emb.weight is not in the checkpoint
2025-03-04 16:25:21,891	example	ERROR	enc_p.tone_emb.weight is not in the checkpoint
2025-03-04 16:25:21,891	example	ERROR	enc_p.language_emb.weight is not in the checkpoint
2025-03-04 16:25:21,892	example	ERROR	emb_g.weight is not in the checkpoint
2025-03-04 16:25:21,942	example	INFO	Loaded checkpoint '/home/ecuser/.cache/cached_path/73ad3d5a37c82356ed81630b0a435b4b376ca49523854fe2b8302609fd71c193.133b77b9d9162e348486a0a0778fa47d726930e3ec12ea5e2684c0c919743a65' (iteration 0)
2025-03-04 16:25:22,032	example	INFO	Loaded checkpoint '/home/ecuser/.cache/cached_path/c3d3c787a8711093a79ee95f091a35de75e527b6e8e28424ad7010f6e86cce58.e5f88bb1eca17c37beb511b15a932e84fdc8b66d8a8d5c5075334650425954f2' (iteration 0)
2025-03-04 16:25:22,040	example	INFO	Loaded checkpoint '/home/ecuser/.cache/cached_path/c7b373ab8939eb672a985a802d21420534ca0cd43fa4aecf4fa6088a569ee2a1.ce7a8153914d9727ebc28e4b4e3d31eed35aa0b4e3d125eb54e6f8363968dd7a' (iteration 0)
2025-03-04 16:25:22,239	example	INFO	====> Epoch: 1
2025-03-04 16:25:47,138	example	INFO	====> Epoch: 2
2025-03-04 16:25:47,143	example	INFO	====> Epoch: 3
2025-03-04 16:25:47,147	example	INFO	====> Epoch: 4
2025-03-04 16:25:47,152	example	INFO	====> Epoch: 5
2025-03-04 16:25:47,155	example	INFO	====> Epoch: 6
2025-03-04 16:25:47,160	example	INFO	====> Epoch: 7
2025-03-04 16:25:47,164	example	INFO	====> Epoch: 8
2025-03-04 16:25:47,168	example	INFO	====> Epoch: 9
2025-03-04 16:25:47,172	example	INFO	====> Epoch: 10

Any suggestions?

TheSweetestGirlInTheUniverse · 2025-03-09T12:40:53Z

I noticed that your eval_interval is set to 1000, which means the model will be saved every 1000 batches.
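
As a rough illustration of what that interval means in practice, the sketch below estimates how many epochs pass before a given step count is reached. It assumes that checkpoints are written whenever the global step count is a multiple of eval_interval and that an incomplete final batch is dropped; the function name and the dataset size are made up for the example, while batch_size and eval_interval are taken from the posted config.

```python
import math


def epochs_to_reach_step(target_step: int, num_samples: int, batch_size: int) -> int:
    """Return how many full epochs are needed before `target_step` steps have run."""
    # Assumption: an incomplete final batch is dropped.
    steps_per_epoch = num_samples // batch_size
    if steps_per_epoch == 0:
        # An epoch with zero batches never advances the step counter.
        raise ValueError("dataset yields no batches, so no step is ever reached")
    return math.ceil(target_step / steps_per_epoch)


# batch_size=6 and eval_interval=1000 come from the posted config;
# num_samples=600 is a hypothetical dataset size.
print(epochs_to_reach_step(target_step=1000, num_samples=600, batch_size=6))
```

Note the error branch: if every epoch yields zero batches, the step counter never moves, which would be consistent with epochs finishing within milliseconds while the progress bar stays at 0.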

diegobernagozzi · 2025-03-10T17:49:16Z

Hi @TheSweetestGirlInTheUniverse

I resolved the previous error by changing the values in this list in train.py:

train_sampler = DistributedBucketSampler(
    train_dataset,
    hps.train.batch_size,
    [1200, 1400, 1600, 1800, 2000, 2200, 2400, 2600, 2800],
    num_replicas=n_gpus,
    rank=rank,
    shuffle=True,
)

Now the training starts correctly and creates checkpoints.
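
Why the list matters can be sketched as follows, assuming the sampler behaves like the bucket sampler in VITS-style code, where a sample is placed in bucket i only if boundaries[i] < length <= boundaries[i + 1] and is skipped when it fits no bucket. The helper names, the sample lengths, and the smaller contrast list are all hypothetical; only the second boundary list is the one quoted above.

```python
from typing import List, Optional


def assign_bucket(length: int, boundaries: List[int]) -> Optional[int]:
    """Return i such that boundaries[i] < length <= boundaries[i + 1], else None."""
    for i in range(len(boundaries) - 1):
        if boundaries[i] < length <= boundaries[i + 1]:
            return i
    return None  # outside every bucket: the sample would be skipped


def count_kept(lengths: List[int], boundaries: List[int]) -> int:
    """Count how many samples land in some bucket."""
    return sum(assign_bucket(n, boundaries) is not None for n in lengths)


# Hypothetical sample lengths (in whatever unit the sampler measures).
lengths = [1250, 1900, 2750]

# Hypothetical smaller boundary list, used only for contrast.
small_boundaries = [32, 300, 400, 500, 600, 700, 800, 900, 1000]
# Boundary list quoted in the post.
new_boundaries = [1200, 1400, 1600, 1800, 2000, 2200, 2400, 2600, 2800]

print(count_kept(lengths, small_boundaries))  # 0: every sample is too long
print(count_kept(lengths, new_boundaries))    # 3: every sample fits a bucket
```

Under that assumption, a dataset whose clips all fall outside the boundaries yields no batches at all, so it is worth comparing the actual sample lengths against the list before training.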
But now I've run into another problem: when I run infer.py with one of the checkpoints generated by training, the resulting output.wav contains only unintelligible sounds. I don't understand why.

Here is my config.json:

{
  "train": {
    "log_interval": 200,
    "eval_interval": 1000,
    "seed": 52,
    "epochs": 10000,
    "learning_rate": 0.0003,
    "betas": [
      0.8,
      0.99
    ],
    "eps": 1e-09,
    "batch_size": 6,
    "fp16_run": false,
    "lr_decay": 0.999875,
    "segment_size": 16384,
    "init_lr_ratio": 1,
    "warmup_epochs": 0,
    "c_mel": 45,
    "c_kl": 1.0,
    "skip_optimizer": true
  },
  "data": {
    "training_files": "data/example/train.list",
    "validation_files": "data/example/val.list",
    "max_wav_value": 32768.0,
    "sampling_rate": 44100,
    "filter_length": 2048,
    "hop_length": 512,
    "win_length": 2048,
    "n_mel_channels": 128,
    "mel_fmin": 0.0,
    "mel_fmax": null,
    "add_blank": true,
    "n_speakers": 1,
    "cleaned_text": true,
    "spk2id": {
      "Italian": 0
    }
  },
  "model": {
    "use_spk_conditioned_encoder": true,
    "use_noise_scaled_mas": true,
    "use_mel_posterior_encoder": false,
    "use_duration_discriminator": true,
    "inter_channels": 192,
    "hidden_channels": 192,
    "filter_channels": 768,
    "n_heads": 2,
    "n_layers": 6,
    "n_layers_trans_flow": 3,
    "kernel_size": 3,
    "p_dropout": 0.1,
    "resblock": "1",
    "resblock_kernel_sizes": [
      3,
      7,
      11
    ],
    "resblock_dilation_sizes": [
      [
        1,
        3,
        5
      ],
      [
        1,
        3,
        5
      ],
      [
        1,
        3,
        5
      ]
    ],
    "upsample_rates": [
      8,
      8,
      2,
      2,
      2
    ],
    "upsample_initial_channel": 512,
    "upsample_kernel_sizes": [
      16,
      16,
      8,
      2,
      2
    ],
    "n_layers_q": 3,
    "use_spectral_norm": false,
    "gin_channels": 256
  },
  "num_languages": 9,
  "num_tones": 17,
  "symbols": [
    "_",
    "\"",
    "(",
    ")",
    "*",
    "/",
    ":",
    "AA",
    "E",
    "EE",
    "En",
    "N",
    "OO",
    "Q",
    "V",
    "[",
    "\\",
    "]",
    "^",
    "a",
    "a:",
    "aa",
    "ae",
    "ah",
    "ai",
    "an",
    "ang",
    "ao",
    "aw",
    "ay",
    "b",
    "by",
    "c",
    "ch",
    "d",
    "dh",
    "dy",
    "e",
    "e:",
    "eh",
    "ei",
    "en",
    "eng",
    "er",
    "ey",
    "f",
    "g",
    "gy",
    "h",
    "hh",
    "hy",
    "i",
    "i0",
    "i:",
    "ia",
    "ian",
    "iang",
    "iao",
    "ie",
    "ih",
    "in",
    "ing",
    "iong",
    "ir",
    "iu",
    "iy",
    "j",
    "jh",
    "k",
    "ky",
    "l",
    "m",
    "my",
    "n",
    "ng",
    "ny",
    "o",
    "o:",
    "ong",
    "ou",
    "ow",
    "oy",
    "p",
    "py",
    "q",
    "r",
    "ry",
    "s",
    "sh",
    "t",
    "th",
    "ts",
    "ty",
    "u",
    "u:",
    "ua",
    "uai",
    "uan",
    "uang",
    "uh",
    "ui",
    "un",
    "uo",
    "uw",
    "v",
    "van",
    "ve",
    "vn",
    "w",
    "x",
    "y",
    "z",
    "zh",
    "zy",
    "~",
    "æ",
    "ç",
    "ð",
    "ø",
    "ŋ",
    "œ",
    "ɐ",
    "ɑ",
    "ɒ",
    "ɔ",
    "ɕ",
    "ə",
    "ɛ",
    "ɜ",
    "ɡ",
    "ɣ",
    "ɥ",
    "ɦ",
    "ɪ",
    "ɫ",
    "ɬ",
    "ɭ",
    "ɯ",
    "ɲ",
    "ɵ",
    "ɸ",
    "ɹ",
    "ɾ",
    "ʁ",
    "ʃ",
    "ʊ",
    "ʌ",
    "ʎ",
    "ʏ",
    "ʑ",
    "ʒ",
    "ʝ",
    "ʲ",
    "ˈ",
    "ˌ",
    "ː",
    "̃",
    "̩",
    "β",
    "θ",
    "ᄀ",
    "ᄁ",
    "ᄂ",
    "ᄃ",
    "ᄄ",
    "ᄅ",
    "ᄆ",
    "ᄇ",
    "ᄈ",
    "ᄉ",
    "ᄊ",
    "ᄋ",
    "ᄌ",
    "ᄍ",
    "ᄎ",
    "ᄏ",
    "ᄐ",
    "ᄑ",
    "ᄒ",
    "ᅡ",
    "ᅢ",
    "ᅣ",
    "ᅤ",
    "ᅥ",
    "ᅦ",
    "ᅧ",
    "ᅨ",
    "ᅩ",
    "ᅪ",
    "ᅫ",
    "ᅬ",
    "ᅭ",
    "ᅮ",
    "ᅯ",
    "ᅰ",
    "ᅱ",
    "ᅲ",
    "ᅳ",
    "ᅴ",
    "ᅵ",
    "ᆨ",
    "ᆫ",
    "ᆮ",
    "ᆯ",
    "ᆷ",
    "ᆸ",
    "ᆼ",
    "ㄸ",
    "!",
    "?",
    "…",
    ",",
    ".",
    "'",
    "-",
    "¿",
    "¡",
    "SP",
    "UNK",
    "dz",
    "dʒ",
    "tʃ",
    " ͡ "
  ]
}

This is one row from my train.list:

../../audio_1/Untitled_MIC_1_960.wav|Italian|IT|Ogni regione italiana ha le sue tradizioni culinarie|_ ˈ o ɲ ɲ i r e ˈ d ͡ ʒ o n e i t a ˈ l j a n a a l e ˈ s u e t r a d i ˈ t ͡ s j o n i k u l i ˈ n a r j e _|0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0|1 5 9 9 1 2 4 13 5 5 1

Thanks in advance,

Diego

TheSweetestGirlInTheUniverse · 2025-03-12T11:05:37Z

@diegobernagozzi
Hi Diego,
I'm not sure where the issue is, but here are some ideas.
First, check the language-related preprocessing: for example, check whether the BERT features support Italian, or try training without BERT features. Also make sure the Italian-specific symbols have been added to symbols.py.
When using infer.py, verify that its preprocessing is correct too. Otherwise, even if training is correct, you may still get wrong results.
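
One way to start checking the preprocessing is to validate the rows of train.list directly. The sketch below is only an illustration: the seven-field layout (path, speaker, language, text, phones, tones, word2ph) is inferred from the row quoted earlier in the thread, the consistency rules are assumptions about what a well-formed row looks like, and check_row plus the tiny symbol set are hypothetical.

```python
from typing import List, Set


def check_row(row: str, symbols: Set[str]) -> List[str]:
    """Return a list of human-readable problems found in one metadata row."""
    fields = row.rstrip("\n").split("|")
    if len(fields) != 7:
        return [f"expected 7 fields, found {len(fields)}"]
    _path, _speaker, _language, _text, phones, tones, word2ph = fields

    phone_list = phones.split()
    tone_list = tones.split()
    word2ph_list = [int(n) for n in word2ph.split()]

    problems = []
    # Assumption: there is exactly one tone per phone.
    if len(phone_list) != len(tone_list):
        problems.append(f"{len(phone_list)} phones but {len(tone_list)} tones")
    # Assumption: the word2ph counts add up to the number of phones.
    if sum(word2ph_list) != len(phone_list):
        problems.append(f"word2ph sums to {sum(word2ph_list)}, not {len(phone_list)}")
    # Every space-separated phone should be an entry of the symbol list.
    unknown = sorted({p for p in phone_list if p not in symbols})
    if unknown:
        problems.append(f"phones missing from symbols: {unknown}")
    return problems


# Tiny hypothetical symbol set and rows, just to exercise the checks.
symbols = {"_", "tʃ", "a", "o"}
good_row = "a.wav|Italian|IT|ciao|_ tʃ a o _|0 0 0 0 0|1 3 1"
bad_row = "a.wav|Italian|IT|ciao|_ t ʃ a o _|0 0 0 0 0|1 3 1"

print(check_row(good_row, symbols))
print(check_row(bad_row, symbols))
```

In the second row the affricate is split into two tokens, so it no longer matches the multi-character symbol and the counts drift apart. Running the same check over the real train.list with the symbols from config.json would show whether any phone produced by the Italian frontend is absent from the symbol list.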
