Problem with training #255

Open
diegobernagozzi opened this issue Mar 4, 2025 · 3 comments

Comments

@diegobernagozzi

Hi everyone,

I'm trying to train MeloTTS on Italian, but training does not work and I don't understand why. When I run this command:

python3 preprocess_text.py --metadata data/example/metadata.list

everything works fine, even after my Italian language modifications. But when I run this command:

bash train.sh data/example/config.json 1

the command never seems to finish, and the training progress always stays at 0:

0it [00:00, ?it/s]

train.log says this:

25-03-04 16:25:18,787	example	INFO	{'train': {'log_interval': 200, 'eval_interval': 1000, 'seed': 52, 'epochs': 10000, 'learning_rate': 0.0003, 'betas': [0.8, 0.99], 'eps': 1e-09, 'batch_size': 6, 'fp16_run': False, 'lr_decay': 0.999875, 'segment_size': 16384, 'init_lr_ratio': 1, 'warmup_epochs': 0, 'c_mel': 45, 'c_kl': 1.0, 'skip_optimizer': True}, 'data': {'training_files': 'data/example/train.list', 'validation_files': 'data/example/val.list', 'max_wav_value': 32768.0, 'sampling_rate': 44100, 'filter_length': 2048, 'hop_length': 512, 'win_length': 2048, 'n_mel_channels': 128, 'mel_fmin': 0.0, 'mel_fmax': None, 'add_blank': True, 'n_speakers': 1, 'cleaned_text': True, 'spk2id': {'Italian': 0}}, 'model': {'use_spk_conditioned_encoder': True, 'use_noise_scaled_mas': True, 'use_mel_posterior_encoder': False, 'use_duration_discriminator': True, 'inter_channels': 192, 'hidden_channels': 192, 'filter_channels': 768, 'n_heads': 2, 'n_layers': 6, 'n_layers_trans_flow': 3, 'kernel_size': 3, 'p_dropout': 0.1, 'resblock': '1', 'resblock_kernel_sizes': [3, 7, 11], 'resblock_dilation_sizes': [[1, 3, 5], [1, 3, 5], [1, 3, 5]], 'upsample_rates': [8, 8, 2, 2, 2], 'upsample_initial_channel': 512, 'upsample_kernel_sizes': [16, 16, 8, 2, 2], 'n_layers_q': 3, 'use_spectral_norm': False, 'gin_channels': 256}, 'num_languages': 9, 'num_tones': 17, 'symbols': ['_', '"', '(', ')', '*', '/', ':', 'AA', 'E', 'EE', 'En', 'N', 'OO', 'Q', 'V', '[', '\\', ']', '^', 'a', 'a:', 'aa', 'ae', 'ah', 'ai', 'an', 'ang', 'ao', 'aw', 'ay', 'b', 'by', 'c', 'ch', 'd', 'dh', 'dy', 'e', 'e:', 'eh', 'ei', 'en', 'eng', 'er', 'ey', 'f', 'g', 'gy', 'h', 'hh', 'hy', 'i', 'i0', 'i:', 'ia', 'ian', 'iang', 'iao', 'ie', 'ih', 'in', 'ing', 'iong', 'ir', 'iu', 'iy', 'j', 'jh', 'k', 'ky', 'l', 'm', 'my', 'n', 'ng', 'ny', 'o', 'o:', 'ong', 'ou', 'ow', 'oy', 'p', 'py', 'q', 'r', 'ry', 's', 'sh', 't', 'th', 'ts', 'ty', 'u', 'u:', 'ua', 'uai', 'uan', 'uang', 'uh', 'ui', 'un', 'uo', 'uw', 'v', 'van', 've', 'vn', 'w', 'x', 
'y', 'z', 'zh', 'zy', '~', 'æ', 'ç', 'ð', 'ø', 'ŋ', 'œ', 'ɐ', 'ɑ', 'ɒ', 'ɔ', 'ɕ', 'ə', 'ɛ', 'ɜ', 'ɡ', 'ɣ', 'ɥ', 'ɦ', 'ɪ', 'ɫ', 'ɬ', 'ɭ', 'ɯ', 'ɲ', 'ɵ', 'ɸ', 'ɹ', 'ɾ', 'ʁ', 'ʃ', 'ʊ', 'ʌ', 'ʎ', 'ʏ', 'ʑ', 'ʒ', 'ʝ', 'ʲ', 'ˈ', 'ˌ', 'ː', '̃', '̩', 'β', 'θ', 'ᄀ', 'ᄁ', 'ᄂ', 'ᄃ', 'ᄄ', 'ᄅ', 'ᄆ', 'ᄇ', 'ᄈ', 'ᄉ', 'ᄊ', 'ᄋ', 'ᄌ', 'ᄍ', 'ᄎ', 'ᄏ', 'ᄐ', 'ᄑ', 'ᄒ', 'ᅡ', 'ᅢ', 'ᅣ', 'ᅤ', 'ᅥ', 'ᅦ', 'ᅧ', 'ᅨ', 'ᅩ', 'ᅪ', 'ᅫ', 'ᅬ', 'ᅭ', 'ᅮ', 'ᅯ', 'ᅰ', 'ᅱ', 'ᅲ', 'ᅳ', 'ᅴ', 'ᅵ', 'ᆨ', 'ᆫ', 'ᆮ', 'ᆯ', 'ᆷ', 'ᆸ', 'ᆼ', 'ㄸ', '!', '?', '…', ',', '.', "'", '-', '¿', '¡', 'SP', 'UNK', 'ɛ', 'ɔ', 'dz', 'dʒ', 'ʎ', 'ɲ', 'ŋ', 'ʃ', 'ts', 'tʃ', ' ͡ '], 'model_dir': './logs/example', 'pretrain_G': None, 'pretrain_D': None, 'pretrain_dur': None, 'port': 10000}
2025-03-04 16:25:18,788	example	WARNING	/home/ecuser/MeloTTS/melo is not a git repository, therefore hash value comparison will be ignored.
2025-03-04 16:25:21,891	example	ERROR	enc_p.emb.weight is not in the checkpoint
2025-03-04 16:25:21,891	example	ERROR	enc_p.tone_emb.weight is not in the checkpoint
2025-03-04 16:25:21,891	example	ERROR	enc_p.language_emb.weight is not in the checkpoint
2025-03-04 16:25:21,892	example	ERROR	emb_g.weight is not in the checkpoint
2025-03-04 16:25:21,942	example	INFO	Loaded checkpoint '/home/ecuser/.cache/cached_path/73ad3d5a37c82356ed81630b0a435b4b376ca49523854fe2b8302609fd71c193.133b77b9d9162e348486a0a0778fa47d726930e3ec12ea5e2684c0c919743a65' (iteration 0)
2025-03-04 16:25:22,032	example	INFO	Loaded checkpoint '/home/ecuser/.cache/cached_path/c3d3c787a8711093a79ee95f091a35de75e527b6e8e28424ad7010f6e86cce58.e5f88bb1eca17c37beb511b15a932e84fdc8b66d8a8d5c5075334650425954f2' (iteration 0)
2025-03-04 16:25:22,040	example	INFO	Loaded checkpoint '/home/ecuser/.cache/cached_path/c7b373ab8939eb672a985a802d21420534ca0cd43fa4aecf4fa6088a569ee2a1.ce7a8153914d9727ebc28e4b4e3d31eed35aa0b4e3d125eb54e6f8363968dd7a' (iteration 0)
2025-03-04 16:25:22,239	example	INFO	====> Epoch: 1
2025-03-04 16:25:47,138	example	INFO	====> Epoch: 2
2025-03-04 16:25:47,143	example	INFO	====> Epoch: 3
2025-03-04 16:25:47,147	example	INFO	====> Epoch: 4
2025-03-04 16:25:47,152	example	INFO	====> Epoch: 5
2025-03-04 16:25:47,155	example	INFO	====> Epoch: 6
2025-03-04 16:25:47,160	example	INFO	====> Epoch: 7
2025-03-04 16:25:47,164	example	INFO	====> Epoch: 8
2025-03-04 16:25:47,168	example	INFO	====> Epoch: 9
2025-03-04 16:25:47,172	example	INFO	====> Epoch: 10

Any suggestions?

@TheSweetestGirlInTheUniverse

I noticed that your eval_interval is set to 1000, which means the model will be saved every 1000 batches.
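To put that number in perspective, here is a rough sketch of how many epochs it takes for the step counter to reach `eval_interval`. It assumes that a checkpoint is written whenever the global step is a multiple of `eval_interval` and that each epoch only runs full batches; `epochs_until_step` and the sample counts are made up for illustration and are not part of MeloTTS.

```python
import math


def epochs_until_step(num_samples, batch_size, target_step):
    """Estimate how many epochs are needed before the step counter reaches target_step.

    Assumption: every epoch runs only full batches (incomplete ones are dropped).
    Returns None when an epoch contains no batch at all, i.e. the counter never moves.
    """
    steps_per_epoch = num_samples // batch_size
    if steps_per_epoch == 0:
        return None
    return math.ceil(target_step / steps_per_epoch)


# Hypothetical dataset sizes, with batch_size=6 and eval_interval=1000 as in the log.
print(epochs_until_step(120, 6, 1000))  # 20 steps per epoch
print(epochs_until_step(3, 6, 1000))    # no full batch, so no step is ever taken
```

If an epoch yields no batches at all, as the `0it` progress bar and the near-instant epochs in the log suggest, the step counter never advances and no checkpoint can be written, whatever the interval is.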

@diegobernagozzi

Hi @TheSweetestGirlInTheUniverse,

I resolved the previous error by changing the values in this list in train.py:

train_sampler = DistributedBucketSampler(
    train_dataset,
    hps.train.batch_size,
    # Bucket length boundaries (the values I changed)
    [1200, 1400, 1600, 1800, 2000, 2200, 2400, 2600, 2800],
    num_replicas=n_gpus,
    rank=rank,
    shuffle=True,
)
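A possible explanation for why the boundaries matter, as a simplified sketch: it assumes the sampler works like the VITS-style bucket sampler, which groups samples by length and discards any sample whose length falls outside the boundary range. `assign_buckets`, the sample lengths, and the narrower boundary list are hypothetical and are not the real `DistributedBucketSampler` code.

```python
from bisect import bisect_left


def assign_buckets(lengths, boundaries):
    """Group sample indices by length bucket and collect the samples that fit nowhere.

    Bucket i holds lengths in the half-open range (boundaries[i], boundaries[i + 1]].
    """
    buckets = [[] for _ in range(len(boundaries) - 1)]
    dropped = []
    for idx, length in enumerate(lengths):
        bucket = bisect_left(boundaries, length) - 1
        if 0 <= bucket < len(buckets):
            buckets[bucket].append(idx)
        else:
            dropped.append(idx)  # too short or too long for every bucket
    return buckets, dropped


# Hypothetical sample lengths for three long utterances.
lengths = [1250, 1900, 2500]

# With a narrow (hypothetical) boundary list every sample is discarded.
_, dropped = assign_buckets(lengths, [32, 300, 400])
print(dropped)

# With the boundaries from the snippet above every sample lands in a bucket.
buckets, dropped = assign_buckets(lengths, [1200, 1400, 1600, 1800, 2000, 2200, 2400, 2600, 2800])
print(buckets, dropped)
```

If every sample is discarded this way, the data loader is empty, which would match the `0it` progress bar and the epochs that finish in a few milliseconds.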

Now training starts correctly and creates checkpoints.
However, I have run into another problem: when I run infer.py with one of the checkpoints produced by training, the generated output.wav contains only unintelligible sounds. I don't understand why.

Here is my config.json:

{
  "train": {
    "log_interval": 200,
    "eval_interval": 1000,
    "seed": 52,
    "epochs": 10000,
    "learning_rate": 0.0003,
    "betas": [
      0.8,
      0.99
    ],
    "eps": 1e-09,
    "batch_size": 6,
    "fp16_run": false,
    "lr_decay": 0.999875,
    "segment_size": 16384,
    "init_lr_ratio": 1,
    "warmup_epochs": 0,
    "c_mel": 45,
    "c_kl": 1.0,
    "skip_optimizer": true
  },
  "data": {
    "training_files": "data/example/train.list",
    "validation_files": "data/example/val.list",
    "max_wav_value": 32768.0,
    "sampling_rate": 44100,
    "filter_length": 2048,
    "hop_length": 512,
    "win_length": 2048,
    "n_mel_channels": 128,
    "mel_fmin": 0.0,
    "mel_fmax": null,
    "add_blank": true,
    "n_speakers": 1,
    "cleaned_text": true,
    "spk2id": {
      "Italian": 0
    }
  },
  "model": {
    "use_spk_conditioned_encoder": true,
    "use_noise_scaled_mas": true,
    "use_mel_posterior_encoder": false,
    "use_duration_discriminator": true,
    "inter_channels": 192,
    "hidden_channels": 192,
    "filter_channels": 768,
    "n_heads": 2,
    "n_layers": 6,
    "n_layers_trans_flow": 3,
    "kernel_size": 3,
    "p_dropout": 0.1,
    "resblock": "1",
    "resblock_kernel_sizes": [
      3,
      7,
      11
    ],
    "resblock_dilation_sizes": [
      [
        1,
        3,
        5
      ],
      [
        1,
        3,
        5
      ],
      [
        1,
        3,
        5
      ]
    ],
    "upsample_rates": [
      8,
      8,
      2,
      2,
      2
    ],
    "upsample_initial_channel": 512,
    "upsample_kernel_sizes": [
      16,
      16,
      8,
      2,
      2
    ],
    "n_layers_q": 3,
    "use_spectral_norm": false,
    "gin_channels": 256
  },
  "num_languages": 9,
  "num_tones": 17,
  "symbols": [
    "_",
    "\"",
    "(",
    ")",
    "*",
    "/",
    ":",
    "AA",
    "E",
    "EE",
    "En",
    "N",
    "OO",
    "Q",
    "V",
    "[",
    "\\",
    "]",
    "^",
    "a",
    "a:",
    "aa",
    "ae",
    "ah",
    "ai",
    "an",
    "ang",
    "ao",
    "aw",
    "ay",
    "b",
    "by",
    "c",
    "ch",
    "d",
    "dh",
    "dy",
    "e",
    "e:",
    "eh",
    "ei",
    "en",
    "eng",
    "er",
    "ey",
    "f",
    "g",
    "gy",
    "h",
    "hh",
    "hy",
    "i",
    "i0",
    "i:",
    "ia",
    "ian",
    "iang",
    "iao",
    "ie",
    "ih",
    "in",
    "ing",
    "iong",
    "ir",
    "iu",
    "iy",
    "j",
    "jh",
    "k",
    "ky",
    "l",
    "m",
    "my",
    "n",
    "ng",
    "ny",
    "o",
    "o:",
    "ong",
    "ou",
    "ow",
    "oy",
    "p",
    "py",
    "q",
    "r",
    "ry",
    "s",
    "sh",
    "t",
    "th",
    "ts",
    "ty",
    "u",
    "u:",
    "ua",
    "uai",
    "uan",
    "uang",
    "uh",
    "ui",
    "un",
    "uo",
    "uw",
    "v",
    "van",
    "ve",
    "vn",
    "w",
    "x",
    "y",
    "z",
    "zh",
    "zy",
    "~",
    "æ",
    "ç",
    "ð",
    "ø",
    "ŋ",
    "œ",
    "ɐ",
    "ɑ",
    "ɒ",
    "ɔ",
    "ɕ",
    "ə",
    "ɛ",
    "ɜ",
    "ɡ",
    "ɣ",
    "ɥ",
    "ɦ",
    "ɪ",
    "ɫ",
    "ɬ",
    "ɭ",
    "ɯ",
    "ɲ",
    "ɵ",
    "ɸ",
    "ɹ",
    "ɾ",
    "ʁ",
    "ʃ",
    "ʊ",
    "ʌ",
    "ʎ",
    "ʏ",
    "ʑ",
    "ʒ",
    "ʝ",
    "ʲ",
    "ˈ",
    "ˌ",
    "ː",
    "̃",
    "̩",
    "β",
    "θ",
    "ᄀ",
    "ᄁ",
    "ᄂ",
    "ᄃ",
    "ᄄ",
    "ᄅ",
    "ᄆ",
    "ᄇ",
    "ᄈ",
    "ᄉ",
    "ᄊ",
    "ᄋ",
    "ᄌ",
    "ᄍ",
    "ᄎ",
    "ᄏ",
    "ᄐ",
    "ᄑ",
    "ᄒ",
    "ᅡ",
    "ᅢ",
    "ᅣ",
    "ᅤ",
    "ᅥ",
    "ᅦ",
    "ᅧ",
    "ᅨ",
    "ᅩ",
    "ᅪ",
    "ᅫ",
    "ᅬ",
    "ᅭ",
    "ᅮ",
    "ᅯ",
    "ᅰ",
    "ᅱ",
    "ᅲ",
    "ᅳ",
    "ᅴ",
    "ᅵ",
    "ᆨ",
    "ᆫ",
    "ᆮ",
    "ᆯ",
    "ᆷ",
    "ᆸ",
    "ᆼ",
    "ㄸ",
    "!",
    "?",
    "…",
    ",",
    ".",
    "'",
    "-",
    "¿",
    "¡",
    "SP",
    "UNK",
    "dz",
    "",
    "",
    " ͡ "
  ]
}

This is one row from my train.list:

../../audio_1/Untitled_MIC_1_960.wav|Italian|IT|Ogni regione italiana ha le sue tradizioni culinarie|_ ˈ o ɲ ɲ i r e ˈ d ͡ ʒ o n e i t a ˈ l j a n a a l e ˈ s u e t r a d i ˈ t ͡ s j o n i k u l i ˈ n a r j e _|0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0|1 5 9 9 1 2 4 13 5 5 1

Thanks in advance,

Diego

@TheSweetestGirlInTheUniverse

TheSweetestGirlInTheUniverse commented Mar 12, 2025

@diegobernagozzi
Hi Diego,
I'm not sure where the issue is, but here are some ideas.
First, check the language-related preprocessing: for example, whether the BERT features support Italian, or try training without BERT features. Also, make sure the Italian-specific symbols are added in symbols.py.
When using infer.py, verify that the preprocessing is correct there as well. Otherwise, even if training is correct, you may still get wrong results.
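
A quick way to check the symbol side of this is a small script like the sketch below. It assumes the row layout shown in the train.list example above (path|speaker|language|text|phonemes|tones|word2ph) and that the phoneme count should equal both the tone count and the sum of word2ph. `check_metadata_line` is a hypothetical helper, not part of MeloTTS, and the sample rows are made up.

```python
from collections import Counter


def check_metadata_line(line, symbols):
    """Return a list of human-readable problems found in one metadata row."""
    problems = []

    # A symbol listed twice gets two ids, and only one of them can be looked up.
    duplicated = sorted(s for s, n in Counter(symbols).items() if n > 1)
    if duplicated:
        problems.append(f"duplicate symbols: {duplicated}")

    fields = line.rstrip("\n").split("|")
    if len(fields) != 7:
        problems.append(f"expected 7 fields, found {len(fields)}")
        return problems

    phones = fields[4].split()
    tones = fields[5].split()
    word2ph = [int(n) for n in fields[6].split()]

    known = set(symbols)
    unknown = sorted({p for p in phones if p not in known})
    if unknown:
        problems.append(f"phonemes missing from symbols: {unknown}")
    if len(tones) != len(phones):
        problems.append(f"{len(phones)} phonemes but {len(tones)} tones")
    if sum(word2ph) != len(phones):
        problems.append(f"word2ph sums to {sum(word2ph)}, expected {len(phones)}")
    return problems


# Made-up rows: the first is consistent, the second has several problems.
print(check_metadata_line("a.wav|Spk|IT|ab|_ a b _|0 0 0 0|1 2 1", ["_", "a", "b"]))
print(check_metadata_line("a.wav|Spk|IT|ab|_ a c _|0 0 0|1 2 2", ["_", "a", "b", "a"]))
```

Running it over every row of train.list with the `symbols` list from config.json should show whether a phoneme token, such as a tie bar that is split off as its own token, is missing from the list or listed more than once.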
