Releases: DrewThomasson/ebook2audiobook

V25.2.18

19 Feb 03:32
cd48073

CHANGELOG

version 25.2.18:

  • version structure is now based on YEAR.MONTH.PATCH_NUMBER

  • admin privileges are no longer needed on Windows to install ebook2audiobook packages (chocolatey replaced by scoop)

  • added MPS processor

  • added custom models dropdown list

  • added voices dropdown list with a play button to listen to each of them

  • added voice extractor for uploaded voices (separates vocals from background noise and music)

  • added delete button for voices, custom models and audiobooks list

  • added builtin voices to the voices list and can be used for all TTS models

  • added "--output_dir" for custom output folder in headless mode

  • added directory options for ebook upload batch files in gradio/gui mode

  • added new output audio formats: ['m4b', 'm4a', 'mp4', 'webm', 'mov', 'mp3', 'flac', 'wav', 'ogg', 'aac'].
    More can be added on request.

  • added running conversion cancellation via the ebook upload gradio component (when the "X" is clicked)

  • new global config settings:
    tmp_expire: number of days an inactive session is kept before cleanup
    max_custom_model: maximum number of custom models in the list (per session id)
    max_custom_voices: maximum number of custom voices in the list (per session id)
    tts_default_settings: fine-tuned XTTS default parameters
    (refer to ./lib/conf.py for all new configuration settings)

  • gradio GUI settings are now saved and restored on refresh and browser exit

  • conversion can be resumed in headless and gradio GUI mode when the client page or connection is lost or reloaded
    (the user must restart the process manually with the same session id)

  • math symbols and numbers are now converted to phonemes on all TTS engines
    (languages not covered are pronounced with the default_language_code set in ./lib/conf.py.
    PRs are welcome to fix missing translations)

  • audio filtering, normalization and enhancement of all uploaded voices and the final audiobook
    for the best sound presence and clarity.

  • fixed custom model upload

  • fixed missing pages in conversion

  • fixed modules and libraries missing during installation (regex, mecab, etc.)

  • various gradio design improvements

  • optimized multi language sentence splitting to minimize hallucinations and unnatural pauses

  • numbers and math symbols are now spoken for fairseq and XTTSv2

  • the TTS model is now loaded once in the script and shared by all users of the same model

  • added coqui-tts built-in voices for all TTS engines and as standard in all languages

  • added new modal alerts for info, error, exception and warnings

  • removed docker_utils which was a docker with ffmpeg and calibre only

  • removed fine-tuned parameters, as they made results worse rather than better

  • optimized sentence splitting

  • many more fixes and new features than can be remembered here; see for yourself ;)
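
The YEAR.MONTH.PATCH_NUMBER scheme listed above can be compared numerically once a tag is split into integers. A minimal sketch, assuming tags look like "25.2.18" with an optional leading "v" or "V"; the function name parse_version is hypothetical and not part of ebook2audiobook:

```python
from typing import Tuple


def parse_version(tag: str) -> Tuple[int, int, int]:
    """Split a YEAR.MONTH.PATCH_NUMBER tag into a tuple of integers.

    Hypothetical helper for illustration only; not part of ebook2audiobook.
    """
    # Drop surrounding whitespace and an optional "v"/"V" prefix.
    parts = tag.strip().lstrip("vV").split(".")
    if len(parts) != 3:
        raise ValueError(f"expected YEAR.MONTH.PATCH_NUMBER, got {tag!r}")
    year, month, patch = (int(p) for p in parts)
    if not 1 <= month <= 12:
        raise ValueError(f"month out of range in {tag!r}")
    return year, month, patch


# Tuples compare field by field, so month 10 sorts after month 2,
# which a plain string comparison would get wrong.
print(parse_version("V25.2.18"))
print(parse_version("25.2.18") < parse_version("25.10.1"))
```

Comparing the parsed tuples rather than the raw strings avoids ordering "25.10.1" before "25.2.18".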

Currently in development:

  • add terminal output console to gradio/gui
  • implement more TTS engines (list not decided yet)
  • apprise notification
  • implement chapter summarizing to create background music and sounds
  • implement indices in the metadata for each sentence in the final file
    to eventually improve the pronunciation and replace it with the new sentence.
  • add built-in voice list of xttsv2
  • add Czech, Croatian and others with cv/vits
  • add music interlude between chapters
  • add chapter names (if chapters are well detected) in place of numbers in the final metadata
  • split the output into multiple files if > 12 hours # chapters as final
  • installation of the right torch and cuda version if GPU available so deepspeed can be used
  • automatic user crash bug report by email via a URL request
  • create a legends.py file for all gradio/gui legends to manage multilanguage
  • mark each sentence number in the metadata with the timecode so
    the user can re-convert one sentence before exporting the audiobook
    (this requires not deleting the ebook temp folder)
  • use "websocat" in "cmd.exe" and "bash/zsh" script to connect in headless mode via gradio and avoid tts load at each command

V2.0: Tons of improvements and support for 1,107+ languages! 🤯

25 Dec 08:14
d7aed38
  • New Improved v2.0 gui
  • Easy access to fine-tuned models
  • Loading bar actually works 🤯
  • Support for 1,107+ languages! 🤯
  • Single run/installer script for running locally on Mac, Windows and Linux
  • THANK YOU @ROBERT-MCDOWELL

Full Changelog: 1.2.1...2.0

V1.2.1

11 Oct 20:18

Fixed custom model loading issue.

Full Changelog: 1.2...1.2.1

V1.2

09 Oct 03:30
c68d44e

New and improved App

  • Single app that runs in gui or headless mode

  • Fixed Sentence splitting for all 16 languages

New and Improved Web GUI

Added these parameters for headless mode:

usage: app.py [-h] [--share SHARE] [--headless HEADLESS] [--ebook EBOOK] [--voice VOICE]
              [--language LANGUAGE] [--use_custom_model USE_CUSTOM_MODEL]
              [--custom_model CUSTOM_MODEL] [--custom_config CUSTOM_CONFIG]
              [--custom_vocab CUSTOM_VOCAB] [--custom_model_url CUSTOM_MODEL_URL]
              [--temperature TEMPERATURE] [--length_penalty LENGTH_PENALTY]
              [--repetition_penalty REPETITION_PENALTY] [--top_k TOP_K] [--top_p TOP_P]
              [--speed SPEED] [--enable_text_splitting ENABLE_TEXT_SPLITTING]

Convert eBooks to Audiobooks using a Text-to-Speech model. You can either launch the
Gradio interface or run the script in headless mode for direct conversion.

options:
  -h, --help            show this help message and exit
  --share SHARE         Set to True to enable a public shareable Gradio link. Defaults
                        to False.
  --headless HEADLESS   Set to True to run in headless mode without the Gradio
                        interface. Defaults to False.
  --ebook EBOOK         Path to the ebook file for conversion. Required in headless
                        mode.
  --voice VOICE         Path to the target voice file for TTS. Optional, uses a default
                        voice if not provided.
  --language LANGUAGE   Language for the audiobook conversion. Options: en, es, fr, de,
                        it, pt, pl, tr, ru, nl, cs, ar, zh-cn, ja, hu, ko. Defaults to
                        English (en).
  --use_custom_model USE_CUSTOM_MODEL
                        Set to True to use a custom TTS model. Defaults to False. Must
                        be True to use custom models, otherwise you'll get an error.
  --custom_model CUSTOM_MODEL
                        Path to the custom model file (.pth). Required if using a custom
                        model.
  --custom_config CUSTOM_CONFIG
                        Path to the custom config file (config.json). Required if using
                        a custom model.
  --custom_vocab CUSTOM_VOCAB
                        Path to the custom vocab file (vocab.json). Required if using a
                        custom model.
  --custom_model_url CUSTOM_MODEL_URL
                        URL to download the custom model as a zip file. Optional, but
                        will be used if provided. Examples include David Attenborough's
                        model: 'https://huggingface.co/drewThomasson/xtts_David_Attenbor
                        ough_fine_tune/resolve/main/Finished_model_files.zip?download=tr
                        ue'. More XTTS fine-tunes can be found on my Hugging Face at
                        'https://huggingface.co/drewThomasson'.
  --temperature TEMPERATURE
                        Temperature for the model. Defaults to 0.65. Higher temperatures
                        lead to more creative outputs, i.e. more hallucinations.
                        Lower temperatures lead to more monotone outputs, i.e. fewer
                        hallucinations.
  --length_penalty LENGTH_PENALTY
                        A length penalty applied to the autoregressive decoder. Defaults
                        to 1.0.
  --repetition_penalty REPETITION_PENALTY
                        A penalty that prevents the autoregressive decoder from
                        repeating itself. Defaults to 2.0.
  --top_k TOP_K         Top-k sampling. Lower values mean more likely outputs and
                        increased audio generation speed. Defaults to 50.
  --top_p TOP_P         Top-p sampling. Lower values mean more likely outputs and
                        increased audio generation speed. Defaults to 0.8.
  --speed SPEED         Speed factor for the speech generation, i.e. how fast the
                        narrator will speak. Defaults to 1.0.
  --enable_text_splitting ENABLE_TEXT_SPLITTING
                        Enable splitting text into sentences. Defaults to True.

Example: python app.py --headless True --ebook path_to_ebook --voice path_to_voice
--language en --use_custom_model True --custom_model model.pth --custom_config
config.json --custom_vocab vocab.json
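
A headless run can also be launched from another Python script by assembling the argument list first. A minimal sketch using only flags from the help text above; the function name build_headless_command and the file names are hypothetical, and it assumes app.py is in the current working directory. Note that the usage line declares --headless as taking a value, so it is passed as "--headless True":

```python
import shlex
import sys


def build_headless_command(ebook, voice=None, language="en"):
    """Return an argv list for a headless conversion.

    Hypothetical helper for illustration; only flags shown in the
    help text above are used.
    """
    cmd = [
        sys.executable, "app.py",
        "--headless", "True",   # the flag expects an explicit value
        "--ebook", ebook,
        "--language", language,
    ]
    # --voice is optional; a default voice is used when it is omitted.
    if voice is not None:
        cmd += ["--voice", voice]
    return cmd


# Print the command in shell-quoted form for inspection.
print(shlex.join(build_headless_command("book.epub", voice="narrator.wav")))
```

The returned list can be passed to subprocess.run(cmd, check=True); keeping it as a list rather than a single string avoids quoting problems with paths that contain spaces.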

What's Changed

  • 1.wav missing - change to default_voice.wav by @matthiss in #12

Full Changelog: 1.1...1.2

V1.1

22 Feb 07:12
e6dd63f