Releases · DrewThomasson/ebook2audiobook

19 Feb 03:32

ROBERT-MCDOWELL

25.2.18

cd48073

V25.2.18 Latest

CHANGELOG

version 25.2.18:

version structure is now based on YEAR.MONTH.PATCH_NUMBER
admin privileges are no longer needed on Windows to install ebook2audiobook packages (Chocolatey replaced with Scoop)
added MPS processor support
added custom models dropdown list
added voices dropdown list with a play button to listen to each voice
added voice extractor for uploaded voices (separates vocals from background noise and music)
added delete button for the voices, custom models and audiobooks lists
added built-in voices to the voices list; they can be used with all TTS models
added "--output_dir" for custom output folder in headless mode
added directory options for ebook upload batch files in gradio/gui mode
added new output audio formats: ['m4b', 'm4a', 'mp4', 'webm', 'mov', 'mp3', 'flac', 'wav', 'ogg', 'aac']. More can be added on demand.
added cancellation of a running conversion via the ebook upload gradio component (when the "X" is clicked)
new global config settings:
tmp_expire: lifetime of an inactive session before cleanup, in days
max_custom_model: maximum number of custom models in the list (per session id)
max_custom_voices: maximum number of custom voices in the list (per session id)
tts_default_settings: fine-tuned XTTS default parameters
(refer to ./lib/conf.py for all new configuration settings)
gradio GUI settings are now saved and restored on refresh and browser exit
conversion can be resumed in headless and gradio GUI mode when the client page or connection is lost or reloaded (however, the user must restart the process manually with the same session id)
math symbols and numbers are now converted to phonemes on all TTS engines (languages not covered are pronounced with the default_language_code set in ./lib/conf.py; PRs are welcome to fix missing translations)
audio filtering, normalization and enhancement of all uploaded voices and of the final audiobook, for the best sound presence and clarity
fixed custom model upload
fixed missing pages in conversion
fixed modules and libraries missing during installation (regex, mecab, etc.)
various gradio design improvements
optimized multi language sentence splitting to minimize hallucinations and unnatural pauses
numbers and math symbols are now spoken for fairseq and XTTSv2
the TTS model is now loaded once in the script and shared by all users using the same model
added coqui-tts built-in voices for all TTS engines and as standard in all languages
added new modal alerts for info, error, exception and warnings
removed docker_utils, which was a Docker image containing only ffmpeg and calibre
removed fine-tuned parameters, as they made results worse rather than better
optimized sentence splitting
Many more fixes and new features, too many to remember them all... see for yourself ;)
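
The YEAR.MONTH.PATCH_NUMBER scheme can sit alongside the older tags (1.1, 2.0, ...) as long as versions are compared numerically rather than as strings. A minimal sketch, assuming tags look like the ones listed on this page; `parse_version` is a hypothetical helper and not part of ebook2audiobook:

```python
from typing import Tuple


def parse_version(tag: str) -> Tuple[int, ...]:
    """Turn a release tag such as 'V25.2.18' or '2.0' into a tuple of ints."""
    cleaned = tag.strip().lstrip("vV")  # drop an optional leading "V"/"v"
    return tuple(int(part) for part in cleaned.split("."))


# Tags taken from this releases page, oldest first.
tags = ["1.1", "1.2", "1.2.1", "2.0", "V25.2.18"]

# Tuples compare element by element, so (25, 2, 18) sorts after (2, 0).
latest = max(tags, key=parse_version)
print(latest)  # V25.2.18
```

Comparing tuples of integers avoids the string-comparison trap where "2.0" would sort after "25.2.18".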

Currently in development:

add terminal output console to gradio/gui
implement more TTS engines (list not decided yet)
Apprise notifications
implement chapter summarizing to create background music and sounds
implement indices in the metadata for each sentence in the final file, to eventually improve the pronunciation and replace it with the new sentence
add built-in voice list of xttsv2
add Czech, Croatian and others with cv/vits
add music interlude between chapters
add chapter names (if chapters are well detected) in place of numbers in the final metadata
split the output into multiple files if longer than 12 hours # chapters as final
installation of the right torch and cuda version if GPU available so deepspeed can be used
automatic user crash bug report by email via a URL request
create a legends.py file for all gradio/gui legends to manage multilanguage
mark each sentence number in the metadata with the timecode so the user can re-convert a single sentence before exporting the audiobook (this requires keeping the ebook temp folder)
use "websocat" in "cmd.exe" and "bash/zsh" script to connect in headless mode via gradio and avoid tts load at each command
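
The roadmap does not say how the 12-hour split will be implemented. One possible approach, sketched here purely as an illustration, is to group whole chapters greedily so that no output part exceeds the limit. The function name `group_chapters` and the sample durations are assumptions, not project code:

```python
from typing import List

MAX_PART_SECONDS = 12 * 60 * 60  # the 12-hour limit mentioned in the roadmap


def group_chapters(durations: List[float],
                   limit: float = MAX_PART_SECONDS) -> List[List[int]]:
    """Group chapter indices into parts whose total duration stays within limit.

    A chapter longer than the limit still gets a part of its own,
    since chapters are never cut in this sketch.
    """
    parts: List[List[int]] = []
    current: List[int] = []
    total = 0.0
    for index, seconds in enumerate(durations):
        # Start a new part when adding this chapter would exceed the limit.
        if current and total + seconds > limit:
            parts.append(current)
            current, total = [], 0.0
        current.append(index)
        total += seconds
    if current:
        parts.append(current)
    return parts


# Hypothetical chapter lengths in hours, converted to seconds.
hours = [5, 4, 4, 6, 7]
print(group_chapters([h * 3600 for h in hours]))  # [[0, 1], [2, 3], [4]]
```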

Assets 2

25 Dec 08:14

DrewThomasson

2.0

d7aed38

V2.0: Tons of improvements and support for 1,107+ languages! 🤯

New Improved v2.0 gui
Easy access to fine-tuned models
Loading bar actually works 🤯
Support for 1,107+ languages! 🤯
Single run/installer script for running locally on Mac, Windows and Linux
THANK YOU @ROBERT-MCDOWELL

What's Changed

Major update version 2.0.0 by @ROBERT-MCDOWELL in #35
pull attempty into v2.0 by @DrewThomasson in #43
swapped download_xttsv2_model with existing download_and_extract by @DrewThomasson in #44
PR#1 by @ROBERT-MCDOWELL in #45
V2.0 base model downloader patch by @DrewThomasson in #46
Update README.md by @ROBERT-MCDOWELL in #48
Update colab_ebook2audiobookxtts.ipynb by @pafend in #49
renamed split_long_sentence to get_sentences, added punctuation for each language into language_mapping, unused code removed by @ROBERT-MCDOWELL in #50
Added 1162 languages, removed unused code by @ROBERT-MCDOWELL in #51
Last PR before merge to main by @ROBERT-MCDOWELL in #57
Added around 57 more test_ebooks as well as a ebook generator by @DrewThomasson in #58
Add disclaimer to README about DRM and legal use by @DrewThomasson in #61
delete .DS_Store files by @DrewThomasson in #62
added Fairseq supported language list to english readme by @DrewThomasson in #63
Major commit by @ROBERT-MCDOWELL in #64
more fixes by @ROBERT-MCDOWELL in #65
rebuild test files, new tools folder, various typo fixes by @ROBERT-MCDOWELL in #66
Rebuild voices folder tree by @ROBERT-MCDOWELL in #67
regenerate test files, various bug fixes, new resume process implementation by @ROBERT-MCDOWELL in #70
double quotes to simple quotes normalization by @ROBERT-MCDOWELL in #71
various fixes by @ROBERT-MCDOWELL in #73
more fixes by @ROBERT-MCDOWELL in #74
Multiprocessing, multithread, multiuser ready, various changes and fixes by @ROBERT-MCDOWELL in #75
fix model loading, removed unused conf model options by @ROBERT-MCDOWELL in #77
Minor spelling and punctuation corrections by @Bynanaa in #79
fix chapter audio and settences order, device gpu to cuda by @ROBERT-MCDOWELL in #81
added voice actor voices by @DrewThomasson in #80
implementation of fine-tuned and various fixes by @ROBERT-MCDOWELL in #82
fixed fine-tuned dropdown by @ROBERT-MCDOWELL in #83
fixed chapter combien audio bug. fine-tuned model cache still to solve by @ROBERT-MCDOWELL in #84
Added discord server link to readme by @DrewThomasson in #86
fixed fine-tuned, custom modal upload back and working(?) by @ROBERT-MCDOWELL in #87
added ref.wav in custom model upload by @ROBERT-MCDOWELL in #88
custom_model should work now by @ROBERT-MCDOWELL in #89
v2.0 updated gui demo gif by @DrewThomasson in #90
various fixes by @ROBERT-MCDOWELL in #91
added convert to gif script to make it easier to create a new gif for readme by @DrewThomasson in #92
V2.0 update readme improved and added assets folder by @DrewThomasson in #93
added new options in conf.py, custom_model still in dev by @ROBERT-MCDOWELL in #94
updated custom_model now managed by session, fixed various bugs by @ROBERT-MCDOWELL in #97
fix conf inversion, added BobRoss in fine-tuned by @ROBERT-MCDOWELL in #98
added fine tuned models in conf.py, renamed some by @ROBERT-MCDOWELL in #99
optimize audio ffmpeg cmd by @ROBERT-MCDOWELL in #100
Various fixes by @ROBERT-MCDOWELL in #101
fixed crlf to lf unix on app.py, various other important fixes by @ROBERT-MCDOWELL in #104
removed old code by @ROBERT-MCDOWELL in #105
V2.0 update readme and Mac launcher by @DrewThomasson in #106
fix convert_btn and more... by @ROBERT-MCDOWELL in #108
fixed f-string errors by @ROBERT-MCDOWELL in #109
optimizing audio presence by @ROBERT-MCDOWELL in #110
varioux fixes by @ROBERT-MCDOWELL in #112
V2.0: Tons of improvements and support for 1,107+ languages! 🤯 by @DrewThomasson in #111

New Contributors

@pafend made their first contribution in #49
@Bynanaa made their first contribution in #79

Full Changelog: 1.2.1...2.0

Contributors

ROBERT-MCDOWELL, Bynanaa, and 2 other contributors

Assets 2

11 Oct 20:18

DrewThomasson

1.2.1

9cb2f33

V1.2.1

Fixed custom model loading issue.

What's Changed

chinese readme by @WUYIN66 in #25
Installation with pip in edit mode by @ROBERT-MCDOWELL in #26
Revert "Installation with pip in edit mode" by @DrewThomasson in #27
Merge new Kaggel additions by @DrewThomasson in #29

New Contributors

@WUYIN66 made their first contribution in #25
@ROBERT-MCDOWELL made their first contribution in #26
@DrewThomasson made their first contribution in #27

Full Changelog: 1.2...1.2.1

Contributors

ROBERT-MCDOWELL, WUYIN66, and DrewThomasson

Assets 3

09 Oct 03:30

DrewThomasson

1.2

c68d44e

V1.2

New and improved App

Single app that runs in gui or headless mode
Fixed Sentence splitting for all 16 languages

New and Improved Web GUI

Added these parameters for headless mode:

usage: app.py [-h] [--share SHARE] [--headless HEADLESS] [--ebook EBOOK] [--voice VOICE]
              [--language LANGUAGE] [--use_custom_model USE_CUSTOM_MODEL]
              [--custom_model CUSTOM_MODEL] [--custom_config CUSTOM_CONFIG]
              [--custom_vocab CUSTOM_VOCAB] [--custom_model_url CUSTOM_MODEL_URL]
              [--temperature TEMPERATURE] [--length_penalty LENGTH_PENALTY]
              [--repetition_penalty REPETITION_PENALTY] [--top_k TOP_K] [--top_p TOP_P]
              [--speed SPEED] [--enable_text_splitting ENABLE_TEXT_SPLITTING]

Convert eBooks to Audiobooks using a Text-to-Speech model. You can either launch the
Gradio interface or run the script in headless mode for direct conversion.

options:
  -h, --help            show this help message and exit
  --share SHARE         Set to True to enable a public shareable Gradio link. Defaults
                        to False.
  --headless HEADLESS   Set to True to run in headless mode without the Gradio
                        interface. Defaults to False.
  --ebook EBOOK         Path to the ebook file for conversion. Required in headless
                        mode.
  --voice VOICE         Path to the target voice file for TTS. Optional, uses a default
                        voice if not provided.
  --language LANGUAGE   Language for the audiobook conversion. Options: en, es, fr, de,
                        it, pt, pl, tr, ru, nl, cs, ar, zh-cn, ja, hu, ko. Defaults to
                        English (en).
  --use_custom_model USE_CUSTOM_MODEL
                        Set to True to use a custom TTS model. Defaults to False. Must
                        be True to use custom models, otherwise you'll get an error.
  --custom_model CUSTOM_MODEL
                        Path to the custom model file (.pth). Required if using a custom
                        model.
  --custom_config CUSTOM_CONFIG
                        Path to the custom config file (config.json). Required if using
                        a custom model.
  --custom_vocab CUSTOM_VOCAB
                        Path to the custom vocab file (vocab.json). Required if using a
                        custom model.
  --custom_model_url CUSTOM_MODEL_URL
                        URL to download the custom model as a zip file. Optional, but
                        will be used if provided. Examples include David Attenborough's
                        model: 'https://huggingface.co/drewThomasson/xtts_David_Attenbor
                        ough_fine_tune/resolve/main/Finished_model_files.zip?download=tr
                        ue'. More XTTS fine-tunes can be found on my Hugging Face at
                        'https://huggingface.co/drewThomasson'.
  --temperature TEMPERATURE
                        Temperature for the model. Defaults to 0.65. Higher Tempatures
                        will lead to more creative outputs IE: more Hallucinations.
                        Lower Tempatures will be more monotone outputs IE: less
                        Hallucinations.
  --length_penalty LENGTH_PENALTY
                        A length penalty applied to the autoregressive decoder. Defaults
                        to 1.0.
  --repetition_penalty REPETITION_PENALTY
                        A penalty that prevents the autoregressive decoder from
                        repeating itself. Defaults to 2.0.
  --top_k TOP_K         Top-k sampling. Lower values mean more likely outputs and
                        increased audio generation speed. Defaults to 50.
  --top_p TOP_P         Top-p sampling. Lower values mean more likely outputs and
                        increased audio generation speed. Defaults to 0.8.
  --speed SPEED         Speed factor for the speech generation. IE: How fast the
                        Narrerator will speak. Defaults to 1.0.
  --enable_text_splitting ENABLE_TEXT_SPLITTING
                        Enable splitting text into sentences. Defaults to True.

Example: python script.py --headless --ebook path_to_ebook --voice path_to_voice
--language en --use_custom_model True --custom_model model.pth --custom_config
config.json --custom_vocab vocab.json
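
The usage line above lists `--headless HEADLESS` as taking a value ("Set to True ..."), so a scripted call probably needs to pass `True` explicitly. The sketch below only builds the argument list for such a call. It assumes `app.py` is in the current directory, and `build_headless_command` plus the file names are hypothetical, chosen for illustration:

```python
import shlex
import sys
from typing import List, Optional


def build_headless_command(ebook: str,
                           voice: Optional[str] = None,
                           language: str = "en",
                           temperature: Optional[float] = None) -> List[str]:
    """Assemble an app.py headless invocation from the options in the help text."""
    cmd = [sys.executable, "app.py",
           "--headless", "True",      # value passed explicitly, per the usage line
           "--ebook", ebook,
           "--language", language]
    if voice is not None:
        cmd += ["--voice", voice]     # optional; a default voice is used otherwise
    if temperature is not None:
        cmd += ["--temperature", str(temperature)]
    return cmd


cmd = build_headless_command("book.epub", voice="narrator.wav")
print(shlex.join(cmd))
# To actually start the conversion: subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with ebook paths that contain spaces.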

What's Changed

1.wav missing - change to default_voice.wav by @matthiss in #12

New Contributors

@matthiss made their first contribution in #12

Full Changelog: 1.1...1.2

Contributors

matthiss

Assets 2

22 Feb 07:12

DrewThomasson

1.1

e6dd63f

V1.1

Full Changelog: https://github.com/DrewThomasson/ebook2audiobookXTTS/commits/1.1

Assets 2
