3.0.0 (2021-05-23)
- INCOMPATIBLE CHANGE: Dropped support for Python 3.4 and below.
- INCOMPATIBLE CHANGE: Stop words are case insensitive.
- INCOMPATIBLE CHANGE: Dropped support for Python 3.2.
- BUG FIX: Preserve new lines from original text in paragraphs.
- BUG FIX: Function decode_html now respects the errors parameter when falling back to default_encoding (#9).
- FEATURE: Added XPath selector to the paragraphs. The XPath selector is also available in detailed output as the xpath attribute of the <p> tag (#5).
- FEATURE: Added pluggable DOM preprocessor.
- FEATURE: Added support for Python 3.2+.
- INCOMPATIBLE CHANGE: Paragraphs are instances of justext.paragraph.Paragraph.
- INCOMPATIBLE CHANGE: Script 'justext' removed in favour of the command python -m justext.
- FEATURE: It's possible to enter a URI as the input document in the CLI.
- FEATURE: It is possible to pass a Unicode string directly.
- FEATURE: Character counts are used instead of word counts where possible, so that the algorithm works well in language-independent mode (without a stoplist) for languages where counting words is not easy (Japanese, Chinese, Thai, etc.).
- BUG FIX: More robust parsing of meta tags containing the information about used charset.
- BUG FIX: Corrected decoding of HTML entities &#128; to &#159; (rendered as € to Ÿ).
- First public release.
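
The entry above about character counts can be illustrated with a rough sketch. This is not jusText's actual code: the helper names word_count and char_count and the sample strings are hypothetical, chosen only to show why whitespace-based word counting breaks down for text written without spaces between words.

```python
# Illustrative sketch only; these helpers are hypothetical, not part of jusText.

def word_count(text: str) -> int:
    """Count words by splitting on whitespace."""
    return len(text.split())


def char_count(text: str) -> int:
    """Count characters, ignoring whitespace."""
    return sum(1 for ch in text if not ch.isspace())


english = "This is a short sentence."
japanese = "これは短い文です。"  # written without spaces between words

# For English, both measures grow with the length of the text.
print(word_count(english), char_count(english))

# For Japanese, whitespace splitting sees the whole sentence as one "word",
# while the character count still reflects the amount of text.
print(word_count(japanese), char_count(japanese))
```

Under whitespace splitting, the Japanese sentence counts as a single word, so any length threshold expressed in words would misjudge it. A character count avoids the need for a language-specific tokenizer.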