Releases: DrewThomasson/ebook2audiobook
version 25.2.18:
version structure is now based on YEAR.MONTH.PATCH_NUMBER
admin privileges are no longer needed on Windows to install ebook2audiobook packages (Chocolatey replaced by Scoop)
added MPS processor
added custom models dropdown list
added voices dropdown list and a play button to listen to each of them
added voice extractor for uploaded voices (separates vocals from background and music)
added delete button for voices, custom models and audiobooks list
added built-in voices to the voices list; they can be used with all TTS models
added "--output_dir" for custom output folder in headless mode
added directory option for batch ebook uploads in gradio/GUI mode
added new output audio formats: ['m4b', 'm4a', 'mp4', 'webm', 'mov', 'mp3', 'flac', 'wav', 'ogg', 'aac'].
More can be added on demand.
added cancellation of a running conversion via the ebook upload gradio component (when the "X" is clicked)
new global config settings:
tmp_expire: delay, in days, before an inactive session is cleaned up
max_custom_model: maximum number of custom models in the list (per session id)
max_custom_voices: maximum number of custom voices in the list (per session id)
tts_default_settings: fine-tuned XTTS default parameters
(refer to ./lib/ for all new configuration settings)
gradio GUI settings are now saved and restored on refresh and browser exit
conversion can be resumed in headless and gradio GUI mode when the client page or connection is lost or reloaded
(however, the user must restart the process manually with the same session id)
math symbols and numbers are now converted to phonemes on all TTS engines
(languages not covered are pronounced with the default_language_code set in ./lib/
PRs are welcome to fix missing translations)
audio filtering, normalization and enhancement of all uploaded voices and of the final audiobook,
for the best sound presence and clarity
fixed custom model upload
fixed missing pages in conversion
fixed modules and libraries missing during the installation (regex, mecab, etc.)
various gradio design improvements
optimized multi language sentence splitting to minimize hallucinations and unnatural pauses
numbers and math symbols are now spoken with fairseq and XTTSv2
the TTS model is now loaded once and shared by all users of the same model
added coqui-tts built-in voices for all TTS engines and as standard in all languages
added new modal alerts for info, error, exception and warnings
removed docker_utils which was a docker with ffmpeg and calibre only
removed fine-tuned parameters, as they made results worse rather than better
optimized sentence splitting
Many more fixes and new features, but we don't remember them all... see for yourself ;)
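The YEAR.MONTH.PATCH_NUMBER scheme mentioned at the top of these notes is easier to order when parsed into numbers instead of compared as text. The following is a minimal sketch only, assuming the year is written with two digits (25 for 2025) as in 25.2.18; `ReleaseVersion` and `parse_version` are hypothetical names used for illustration and are not part of ebook2audiobook.

```python
from typing import NamedTuple


class ReleaseVersion(NamedTuple):
    """One release number split into its three parts (hypothetical helper)."""
    year: int   # assumed two-digit year, e.g. 25 for 2025
    month: int  # 1..12
    patch: int  # patch number of the release


def parse_version(text: str) -> ReleaseVersion:
    """Parse a 'YEAR.MONTH.PATCH_NUMBER' string such as '25.2.18'."""
    parts = text.strip().split(".")
    if len(parts) != 3:
        raise ValueError(f"expected YEAR.MONTH.PATCH_NUMBER, got {text!r}")
    year, month, patch = (int(part) for part in parts)
    if not 1 <= month <= 12:
        raise ValueError(f"month out of range in {text!r}")
    return ReleaseVersion(year, month, patch)


if __name__ == "__main__":
    # Tuples compare field by field, so October sorts after February.
    print(parse_version("25.10.1") > parse_version("25.2.18"))
```

Comparing the parsed tuples avoids the trap of plain string comparison, where "25.10.1" would sort before "25.2.18".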
Currently in development:
- add terminal output console to gradio/GUI
- implement more TTS engines (list not decided yet)
- apprise notification
- implement chapter summarizing to create background music and sounds
- implement indices in the metadata for each sentence in the final file
to eventually improve the pronunciation and replace it with the new sentence.
- add built-in voice list of xttsv2
- add Czech, Croatian and others with cv/vits
- add music interlude between chapters
- add chapter names (if chapters are detected correctly) in place of numbers in the final metadata
- split the output into multiple files if > 12 hours # chapters as final
- installation of the right torch and cuda version if GPU available so deepspeed can be used
- automatic user crash bug report by email via a URL request
- create a file for all gradio/gui legends to manage multilanguage
- mark each sentence number in the metadata with the timecode so
the user can re-convert one sentence before exporting the audiobook
(this requires not deleting the ebook temp folder)
- use "websocat" in "cmd.exe" and "bash/zsh" scripts to connect in headless mode via gradio and avoid loading the TTS at each command
V2.0: Tons of improvements and support for 1,107+ languages! 🤯
- New Improved v2.0 gui
- Easy access to fine-tuned models
- Loading bar actually works 🤯
- Support for 1,107+ languages! 🤯
- Single run/installer script for running locally on Mac, Windows and Linux
What's Changed
- Major update version 2.0.0 by @ROBERT-MCDOWELL in #35
- pull attempty into v2.0 by @DrewThomasson in #43
- swapped download_xttsv2_model with existing download_and_extract by @DrewThomasson in #44
- PR#1 by @ROBERT-MCDOWELL in #45
- V2.0 base model downloader patch by @DrewThomasson in #46
- Update by @ROBERT-MCDOWELL in #48
- Update colab_ebook2audiobookxtts.ipynb by @pafend in #49
- renamed split_long_sentence to get_sentences, added punctuation for each language into language_mapping, unused code removed by @ROBERT-MCDOWELL in #50
- Added 1162 languages, removed unused code by @ROBERT-MCDOWELL in #51
- Last PR before merge to main by @ROBERT-MCDOWELL in #57
- Added around 57 more test_ebooks as well as a ebook generator by @DrewThomasson in #58
- Add disclaimer to README about DRM and legal use by @DrewThomasson in #61
- delete .DS_Store files by @DrewThomasson in #62
- added Fairseq supported language list to english readme by @DrewThomasson in #63
- Major commit by @ROBERT-MCDOWELL in #64
- more fixes by @ROBERT-MCDOWELL in #65
- rebuild test files, new tools folder, various typo fixes by @ROBERT-MCDOWELL in #66
- Rebuild voices folder tree by @ROBERT-MCDOWELL in #67
- regenerate test files, various bug fixes, new resume process implementation by @ROBERT-MCDOWELL in #70
- double quotes to simple quotes normalization by @ROBERT-MCDOWELL in #71
- various fixes by @ROBERT-MCDOWELL in #73
- more fixes by @ROBERT-MCDOWELL in #74
- Multiprocessing, multithread, multiuser ready, various changes and fixes by @ROBERT-MCDOWELL in #75
- fix model loading, removed unused conf model options by @ROBERT-MCDOWELL in #77
- Minor spelling and punctuation corrections by @Bynanaa in #79
- fix chapter audio and settences order, device gpu to cuda by @ROBERT-MCDOWELL in #81
- added voice actor voices by @DrewThomasson in #80
- implementation of fine-tuned and various fixes by @ROBERT-MCDOWELL in #82
- fixed fine-tuned dropdown by @ROBERT-MCDOWELL in #83
- fixed chapter combien audio bug. fine-tuned model cache still to solve by @ROBERT-MCDOWELL in #84
- Added discord server link to readme by @DrewThomasson in #86
- fixed fine-tuned, custom modal upload back and working(?) by @ROBERT-MCDOWELL in #87
- added ref.wav in custom model upload by @ROBERT-MCDOWELL in #88
- custom_model should work now by @ROBERT-MCDOWELL in #89
- v2.0 updated gui demo gif by @DrewThomasson in #90
- various fixes by @ROBERT-MCDOWELL in #91
- added convert to gif script to make it easier to create a new gif for readme by @DrewThomasson in #92
- V2.0 update readme improved and added assets folder by @DrewThomasson in #93
- added new options in, custom_model still in dev by @ROBERT-MCDOWELL in #94
- updated custom_model now managed by session, fixed various bugs by @ROBERT-MCDOWELL in #97
- fix conf inversion, added BobRoss in fine-tuned by @ROBERT-MCDOWELL in #98
- added fine tuned models in, renamed some by @ROBERT-MCDOWELL in #99
- optimize audio ffmpeg cmd by @ROBERT-MCDOWELL in #100
- Various fixes by @ROBERT-MCDOWELL in #101
- fixed crlf to lf unix on, various other important fixes by @ROBERT-MCDOWELL in #104
- removed old code by @ROBERT-MCDOWELL in #105
- V2.0 update readme and Mac launcher by @DrewThomasson in #106
- fix convert_btn and more... by @ROBERT-MCDOWELL in #108
- fixed f-string errors by @ROBERT-MCDOWELL in #109
- optimizing audio presence by @ROBERT-MCDOWELL in #110
- varioux fixes by @ROBERT-MCDOWELL in #112
- V2.0: Tons of improvements and support for 1,107+ languages! 🤯 by @DrewThomasson in #111
Full Changelog: 1.2.1...2.0
Fixed custom model loading issue.
What's Changed
- chinese readme by @WUYIN66 in #25
- Installation with pip in edit mode by @ROBERT-MCDOWELL in #26
- Revert "Installation with pip in edit mode" by @DrewThomasson in #27
- Merge new Kaggel additions by @DrewThomasson in #29
New Contributors
- @WUYIN66 made their first contribution in #25
- @ROBERT-MCDOWELL made their first contribution in #26
- @DrewThomasson made their first contribution in #27
Full Changelog: 1.2...1.2.1
New and improved App
Single app that runs in gui or headless mode
Fixed Sentence splitting for all 16 languages
New and Improved Web GUI
Added these parameters for headless mode:
usage: [-h] [--share SHARE] [--headless HEADLESS] [--ebook EBOOK] [--voice VOICE]
[--language LANGUAGE] [--use_custom_model USE_CUSTOM_MODEL]
[--custom_model CUSTOM_MODEL] [--custom_config CUSTOM_CONFIG]
[--custom_vocab CUSTOM_VOCAB] [--custom_model_url CUSTOM_MODEL_URL]
[--temperature TEMPERATURE] [--length_penalty LENGTH_PENALTY]
[--repetition_penalty REPETITION_PENALTY] [--top_k TOP_K] [--top_p TOP_P]
[--speed SPEED] [--enable_text_splitting ENABLE_TEXT_SPLITTING]
Convert eBooks to Audiobooks using a Text-to-Speech model. You can either launch the
Gradio interface or run the script in headless mode for direct conversion.
-h, --help show this help message and exit
--share SHARE Set to True to enable a public shareable Gradio link. Defaults
to False.
--headless HEADLESS Set to True to run in headless mode without the Gradio
interface. Defaults to False.
--ebook EBOOK Path to the ebook file for conversion. Required in headless
mode.
--voice VOICE Path to the target voice file for TTS. Optional, uses a default
voice if not provided.
--language LANGUAGE Language for the audiobook conversion. Options: en, es, fr, de,
it, pt, pl, tr, ru, nl, cs, ar, zh-cn, ja, hu, ko. Defaults to
English (en).
--use_custom_model USE_CUSTOM_MODEL
Set to True to use a custom TTS model. Defaults to False. Must
be True to use custom models, otherwise you'll get an error.
--custom_model CUSTOM_MODEL
Path to the custom model file (.pth). Required if using a custom
model.
--custom_config CUSTOM_CONFIG
Path to the custom config file (config.json). Required if using
a custom model.
--custom_vocab CUSTOM_VOCAB
Path to the custom vocab file (vocab.json). Required if using a
custom model.
--custom_model_url CUSTOM_MODEL_URL
URL to download the custom model as a zip file. Optional, but
will be used if provided. Examples include David Attenborough's
model: '
ue'. More XTTS fine-tunes can be found on my Hugging Face at
--temperature TEMPERATURE
Temperature for the model. Defaults to 0.65. Higher temperatures
will lead to more creative outputs, i.e. more hallucinations.
Lower temperatures will give more monotone outputs, i.e. fewer
hallucinations.
--length_penalty LENGTH_PENALTY
A length penalty applied to the autoregressive decoder. Defaults
to 1.0.
--repetition_penalty REPETITION_PENALTY
A penalty that prevents the autoregressive decoder from
repeating itself. Defaults to 2.0.
--top_k TOP_K Top-k sampling. Lower values mean more likely outputs and
increased audio generation speed. Defaults to 50.
--top_p TOP_P Top-p sampling. Lower values mean more likely outputs and
increased audio generation speed. Defaults to 0.8.
--speed SPEED Speed factor for the speech generation, i.e. how fast the
narrator will speak. Defaults to 1.0.
--enable_text_splitting ENABLE_TEXT_SPLITTING
Enable splitting text into sentences. Defaults to True.
Example: python --headless --ebook path_to_ebook --voice path_to_voice
--language en --use_custom_model True --custom_model model.pth --custom_config
config.json --custom_vocab vocab.json
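The options and defaults listed above can be mirrored with Python's standard argparse module. This is a rough sketch for illustration, not the project's actual code: `str2bool`, `build_parser` and `check_args` are hypothetical names, and accepting a bare `--headless` (with no value) is an assumption made so that the example command parses as written.

```python
import argparse


def str2bool(value: str) -> bool:
    """Turn 'True'/'False' style text into a bool, as the help text implies."""
    lowered = value.strip().lower()
    if lowered in {"true", "1", "yes"}:
        return True
    if lowered in {"false", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"expected True or False, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Convert eBooks to Audiobooks using a Text-to-Speech model."
    )
    # Boolean options: nargs="?" lets both "--headless" and "--headless True"
    # parse (an assumption, see the note above the code).
    for flag, default in [("--share", False), ("--headless", False),
                          ("--use_custom_model", False),
                          ("--enable_text_splitting", True)]:
        parser.add_argument(flag, type=str2bool, nargs="?", const=True,
                            default=default)
    # Path and URL options have no documented default value.
    for flag in ["--ebook", "--voice", "--custom_model", "--custom_config",
                 "--custom_vocab", "--custom_model_url"]:
        parser.add_argument(flag, default=None)
    # Defaults below are the ones stated in the help text.
    parser.add_argument("--language", default="en")
    parser.add_argument("--temperature", type=float, default=0.65)
    parser.add_argument("--length_penalty", type=float, default=1.0)
    parser.add_argument("--repetition_penalty", type=float, default=2.0)
    parser.add_argument("--top_k", type=int, default=50)
    parser.add_argument("--top_p", type=float, default=0.8)
    parser.add_argument("--speed", type=float, default=1.0)
    return parser


def check_args(args: argparse.Namespace) -> None:
    """Apply the two rules the help text states explicitly."""
    if args.headless and not args.ebook:
        raise ValueError("--ebook is required in headless mode")
    custom_files = [args.custom_model, args.custom_config, args.custom_vocab]
    if any(custom_files) and not args.use_custom_model:
        raise ValueError("--use_custom_model must be True to use custom models")


if __name__ == "__main__":
    cli_args = build_parser().parse_args()
    check_args(cli_args)
    print(cli_args)
```

Keeping the validation in a separate function makes the two documented constraints (an ebook path in headless mode, and `--use_custom_model True` whenever custom model files are passed) easy to test without launching a conversion.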
Full Changelog: 1.1...1.2