Commit aee2657

Updated CHANGELOG for v5
1 parent c9e7a98 commit aee2657

1 file changed

+117
-27
lines changed

Diff for: CHANGELOG.md

@@ -7,38 +7,128 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
77

88
## [Unreleased]
99

10-
### Breaking Changes
11-
12-
- Renamed `filesystem.validate_zimfile_creatable` to `filesystem.file_creatable` to reflect general applicability to check file creation beyond ZIM files #200
13-
- Remove any "ZIM" reference in exceptions while working with files #200
14-
- Significantly enhance the safety of metadata manipulation (#205)
15-
- add types for all metadata, one type per metadata name plus some generic ones for non-standard metadata
16-
- all types are responsible to validate metadata value at initialization time
17-
- validation checks for adherence to the ZIM specification and conventions are automated
18-
- cleanup of unwanted control characters and stripping white characters are **automated in all text metadata**
19-
- whenever possible, try to **automatically clean "reasonably" bad metadata** (e.g. automatically accept and remove duplicate tags, which are harmless, but not duplicate language codes, since codes are supposed to be ordered and duplicates are a weird situation); this aligns the paradigm, because the lib was permissive for some metadata and quite restrictive for others; this PR **makes the lib as permissive as possible**, so that a scraper does not fail on something which could be fixed automatically
20-
- it is now possible to disable ZIM conventions checks with `zim.metadata.check_metadata_conventions`
21-
- simplify `zim.creator.Creator.config_metadata` by using these types and being more strict:
22-
- add new `StandardMetadata` class for standard metadata, including the list of mandatory ones
23-
- by default, all non-standard metadata must start with `X-` prefix
24-
- this is not yet an openZIM convention / specification, so it is possible to disable this check with the `fail_on_missing_prefix` argument
25-
- simplify `add_metadata`, use same metadata types
26-
- simplify `zim.creator.Creator.start` with new types, and drop all metadata from memory after being passed to the libzim
27-
- drop `zim.creator.convert_and_check_metadata` (not useful anymore, simply use the proper metadata type)
28-
- move `MANDATORY_ZIM_METADATA_KEYS` and `DEFAULT_DEV_ZIM_METADATA` from `constants` to `zim.metadata` to avoid circular dependencies
29-
- new `inputs.unique_values` utility function to compute the list of unique values from a given list while preserving the initial list order
30-
- in `__init__` of `zim.creator.Creator`, rename `disable_metadata_checks` to `check_metadata_conventions` for clarity and brevity
31-
- beware that this manipulates the global `zim.metadata.check_metadata_conventions`, so if you have many creators running in parallel, they can't have different settings; the last one initialized will "win"
10+
This is a major release with many breaking changes, but most of them are easy to adapt to.
11+
12+
It focuses on type safety with the introduction of runtime checks: any call to the zimscraperlib API must match the type definition or an exception will be raised.
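
To illustrate what "runtime checks" means in practice, here is a minimal stdlib-only sketch of a decorator that rejects arguments not matching simple annotations. It is not the library's mechanism (the changelog states that scraperlib relies on beartype, which is far more complete), and `check_types` and `set_title` are hypothetical names introduced for this example only.

```python
import functools
import inspect
from typing import get_origin, get_type_hints


def check_types(func):
    """Raise TypeError when an argument does not match a plain-class annotation."""
    hints = get_type_hints(func)
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            expected = hints.get(name)
            # Only plain classes are checked; parameterized generics such as
            # list[int] and unions are skipped in this simplified sketch.
            is_plain_class = get_origin(expected) is None and isinstance(expected, type)
            if is_plain_class and not isinstance(value, expected):
                raise TypeError(
                    f"{name} must be {expected.__name__}, got {type(value).__name__}"
                )
        return func(*args, **kwargs)

    return wrapper


@check_types
def set_title(title: str) -> str:
    # Hypothetical function standing in for any annotated API call
    return title.strip()
```

With such a check in place, `set_title("  Hello ")` succeeds while `set_title(42)` fails immediately at the call site, which is the behavior scrapers upgrading to v5 should expect from mistyped calls.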
13+
14+
Documentation is available as docstrings and on https://python-scraperlib.readthedocs.io
15+
16+
Main changes include:
17+
18+
- ZIM metadata handling has changed completely, with new types for each kind of metadata.
19+
- `i18n` module has been redesigned around a single main class `Language`
20+
- New `rewriting` module for HTML/CSS/JS (JS rewriting being done at runtime via Wombat)
21+
- Now supporting only Python 3.12
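
The first item in the list above, one type per kind of metadata with validation at initialization time, can be pictured with the sketch below. It is an assumption-laden illustration and not the `zim.metadata` implementation: `TextMetadataSketch` is a hypothetical class, and both the exact set of control characters removed and the rejection of empty values are choices made for this example.

```python
import re
from dataclasses import dataclass

# Control characters except tab, newline and carriage return (assumed set)
_CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")


@dataclass
class TextMetadataSketch:
    """Hypothetical text metadata that cleans and validates itself on creation."""

    name: str
    value: str

    def __post_init__(self) -> None:
        if not isinstance(self.value, str):
            raise TypeError(f"{self.name} metadata must be a str")
        # Automatic cleanup: drop control characters, then strip whitespace
        cleaned = _CONTROL_CHARS.sub("", self.value).strip()
        if not cleaned:
            # Assumption for this sketch: empty text metadata is rejected
            raise ValueError(f"{self.name} metadata must not be empty")
        self.value = cleaned


title = TextMetadataSketch("Title", " My\x00 ZIM \n")
```

Because validation happens in `__post_init__`, an invalid value can never exist as an instance, so code receiving the object does not need to re-check it.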
3222

3323
### Added
3424

35-
- Add `filesystem.validate_folder_writable` to check if a folder can be written to #200
36-
- Expose `constants.VERSION` to have access to zimscraperlib version from scrapers #224
37-
- Added mkdocs based documentation site. #92
25+
- Documentation using `mkdocs`, published on readthedocs.io (#92)
26+
- `rewriting` module to rewrite URLs in content for generic scrapers
27+
- `rewriting.css` to rewrite URLs in CSS files
28+
- `rewriting.html` to rewrite URLs in HTML files
29+
- `rewriting.js` to rewrite URLs in JS files (at runtime, using `wombat`)
30+
- `wombat-setup` javascript module in `javascript/`
31+
- `typing` module with custom types:
32+
- `Callback` to use where we expect callbacks
33+
- `SupportsWrite`, `SupportsRead`, `SupportsSeeking`, `SupportsSeekableRead` and `SupportsSeekableWrite`: protocols for IO type annotations
34+
- `zim.metadata` module with a type-based approach for each kind of metadata and helpers for custom ones
35+
- [`zim.metadata`] `APPLY_RECOMMENDATIONS`: general flag to toggle openZIM-recommended constraints
36+
- [`zim.metadata`] Type-based classes: `Metadata`, `TextBasedMetadata`, `TextListBasedMetadata`, `DateBasedMetadata`, `IllustrationBasedMetadata`
37+
- [`zim.metadata`] Usage-based classes: `NameMetadata`, `LanguageMetadata`, `DefaultIllustrationMetadata`, etc.
38+
- [`zim.metadata`] `StandardMetadataList` to package the standard metadata
39+
- See details for additional API endpoints and variables
40+
- [`constants`] `DEFAULT_WEB_REQUESTS_TIMEOUT` exposed for `download` module
41+
- [`download`] `stream_file()` now accepts `timeout: int` param (defaults to constant timeout) (#222)
42+
- [`filesystem`] `path_from` context manager to acquire a pathlib `Path` from `Path` or `TemporaryDirectory`
43+
- [`i18n`] `Language`, `get_language()` and `get_language_or_none()`. See breaking changes
44+
- [`image.optimization`] `OptimizePngOptions` dataclass to store PNG options
45+
- [`image.optimization`] `OptimizeJpgOptions` dataclass to store JPEG options
46+
- [`image.optimization`] `OptimizeGifOptions` dataclass to store GIF options
47+
- [`image.optimization`] `OptimizeOptions` dataclass to store cross-format options
48+
- [`inputs`] `unique_values()` to deduplicate a list while preserving order
49+
- [`logging`] `DEFAULT_FORMAT_WITH_THREADS` as many scrapers use threads
50+
- [`video.encoding`] `reencode()`'s `existing_tmp_path` param
51+
- [`zim.filesystem`] `validate_folder_writable()` to ensure one can write into a folder (#200)
52+
- [`zim.creator`] `Creator._get_first_language_metadata_value()` to retrieve first language from metadata
53+
- [`zim.items`] `no_indexing_indexdata()` to get an IndexData that disables indexing
54+
- [`zim.items`] `URLItem.get_mimetype()` now only returning `str`
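
The `unique_values()` entry above describes order-preserving deduplication. A minimal sketch of that technique follows; `unique_values_sketch` is a hypothetical stand-in written for illustration, and the library's actual implementation and signature may differ.

```python
from collections.abc import Hashable, Iterable
from typing import TypeVar

T = TypeVar("T", bound=Hashable)


def unique_values_sketch(items: Iterable[T]) -> list[T]:
    """Return items without duplicates, keeping first-seen order."""
    # dict keys are unique and keep insertion order (guaranteed since Python 3.7)
    return list(dict.fromkeys(items))


tags = unique_values_sketch(["wikipedia", "maths", "wikipedia", "science", "maths"])
```

Using `set()` would also deduplicate but would lose the ordering, which matters for metadata such as tags where the original order should be kept.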
55+
56+
## Changed (Breaking)
57+
58+
- Entire API is now type-protected using beartype. Any call to scraperlib that doesn't satisfy the annotated types will raise an exception
59+
- [`constants`] `MANDATORY_ZIM_METADATA_KEYS` and `DEFAULT_DEV_ZIM_METADATA` moved to `zim/metadata`
60+
- [`download`] `YoutubeDownloader.download`'s `options` parameter now expects a `dict[str, Any]` instead of `dict`
61+
- [`download`] `YoutubeConfig` options now limited to `str | bool | int | None`
62+
- [`download`] `_get_retry_adapter()` now exposed as `get_retry_adapter()`
63+
- [`download`] `stream_file`'s `byte_stream` param now more flexible, accepting `SupportsWrite[bytes] | SupportsSeekableWrite[bytes]`
64+
- [`download`] `stream_file`'s `proxies` param now accepting `dict[str, str]` instead of `dict`
65+
- [`filesystem`] `delete_callback()` is now a simple callback accepting an `fpath` and deleting it (doesn't chain other callbacks anymore).
66+
- [`filesystem`] `delete_callback()` doesn't fail on missing file (#192)
67+
- [`i18n`] Redesigned API around a single object:
68+
- `Language`, which is initialized with any acceptable code. Raises `NotFoundError` on ISO 639-3 matching failure
69+
- `find_language_names()` is retained but only accepts a `query: str`
70+
- added `get_language()` and `get_language_or_none()` as shortcuts around `Language`
71+
- `is_valid_iso_639_3()` is retained
72+
- [`image.conversion`] `convert_image()` now accepts `io.BytesIO` in place of `IO[bytes]` for `src` and `dst`.
73+
- [`image.conversion`] `convert_svg2png()` now accepts `io.BytesIO` in place of `IO[bytes]` for `src` and `dst`.
74+
- [`image.optimization`] `optimize_png()` now accepts `options: OptimizePngOptions` instead of individual params.
75+
- [`image.optimization`] `optimize_jpeg()` now accepts `options: OptimizeJpgOptions` instead of individual params.
76+
- [`image.optimization`] `optimize_webp()` now accepts `options: OptimizeWebpOptions` instead of individual params.
77+
- [`image.optimization`] `optimize_gif()` now accepts `options: OptimizeGifOptions` instead of individual params.
78+
- [`image.presets`] All presets now use the new options dataclass instead of ClassVar dict
79+
- [`image.probing`] `format_for()` now accepts `io.BytesIO` in place of `IO[bytes]` for `src`.
80+
- [`image.probing`] `is_valid_image()` now accepts `io.BytesIO` in place of `IO[bytes]` for `image`.
81+
- [`image.utils`] `save_image()` now accepts `io.BytesIO` in place of `IO[bytes]` for `dst`.
82+
- [`video.config`] `Config` was mostly not using type annotations.
83+
- [`video.config`] `Config` options only expecting `str | None`
84+
- [`video.presets`] All options only expecting `str | None`
85+
- [`video.encoding`] `reencode()` now always returning a `tuple[bool, CompletedProcess]`
86+
- [`zim._libkiwix`] `MimetypeAndCounter` now expects specific types for `mimetype: str` and `value: int`
87+
- [`zim.filesystem`] `make_zim_file()`'s `publisher` param now properly expects a `str`
88+
- [`zim.filesystem`] `IncorrectZIMPathError` renamed to `IncorrectPathError`
89+
- [`zim.filesystem`] `MissingZIMFolderError` renamed to `MissingFolderError`
90+
- [`zim.filesystem`] `NotADirectoryZIMFolderError` renamed to `NotADirectoryFolderError`
91+
- [`zim.filesystem`] `NotWritableZIMFolderError` renamed to `NotWritableFolderError`
92+
- [`zim.filesystem`] `IncorrectZIMFilenameError` renamed to `IncorrectFilenameError`
93+
- [`zim.filesystem`] `validate_zimfile_creatable()` renamed to `validate_file_creatable()`
94+
- [`zim.items`] `Item` and `StaticItem` now expecting `hints` as `dict[libzim.writer.Hint, int]` instead of `dict`
95+
- [`zim.items`] `Item.get_hints()` now returning `dict[libzim.writer.Hint, int]` instead of `dict`
96+
- [`zim.items`] `URLItem.download_for_size()` now specifying type annotations and reordered params
97+
- [`zim.providers`] `FileLikeProvider.gen_blob()` and `URLProvider.gen_blob()` now properly annotate their return type (`Generator[libzim.writer.Blob, None, None]`)
98+
- [`zim.providers`] `URLProvider.get_size_of()` param `url` now explicitly expects a `str`
99+
- [`zim.creator`] `Creator.config_metadata()` signature changed, now mainly accepting a `StandardMetadataList`
100+
- [`zim.creator`] `Creator.config_dev_metadata()` signature changed to now accept metadata types
101+
- [`zim.creator`] `Creator.add_item_for()`'s `callback` renamed to `callbacks` and accepting `Callback`
102+
- [`zim.creator`] `Creator.add_item()`'s `callback` renamed to `callbacks` and accepting `Callback`
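
Several `image.optimization` entries above replace individual keyword parameters with a single options dataclass. The sketch below shows that general pattern only: `PngOptionsSketch`, its field names and `describe_png_optimization` are hypothetical and are not the fields or functions of the real `OptimizePngOptions` API.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PngOptionsSketch:
    """Hypothetical option names, chosen for illustration only."""

    reduce_colors: bool = False
    max_colors: int = 256
    remove_transparency: bool = False


def describe_png_optimization(options: PngOptionsSketch) -> str:
    """Stand-in for an optimizer: reports what the options would request."""
    steps = []
    if options.reduce_colors:
        steps.append(f"quantize to {options.max_colors} colors")
    if options.remove_transparency:
        steps.append("flatten alpha")
    return ", ".join(steps) or "no-op"


default_options = PngOptionsSketch()
# Presets can be derived from one another without mutating the original
aggressive = replace(default_options, reduce_colors=True, max_colors=64)
```

Grouping parameters this way gives each format one typed object that can be validated, stored in presets and passed around, instead of a loosely typed `dict` or a long keyword list.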
103+
104+
## Changed
105+
106+
- [deps] `iso639-lang` now requires at least v2.4.0
107+
- [`download`] `stream_file()` now returns `tuple[int, requests.structures.CaseInsensitiveDict[str]]` instead of `tuple[int, requests.structures.CaseInsensitiveDict]`
108+
- [`download`] `stream_file()` now accepts both `fpath` and `byte_stream` params (writes to both)
109+
- [`image.utils`] `save_image()` now accepts `Any` `**params`.
110+
- [`zim.archive`] `Archive.counters` now returning `CounterMap` (compatible with previous `dict[str, int]`)
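
The `stream_file()` entry above mentions writing to both `fpath` and `byte_stream` when both are given. A sketch of that fan-out behavior, under the assumption that content arrives as an iterable of byte chunks, could look like this; `write_chunks_sketch` is a hypothetical helper and performs no network access, unlike the real function.

```python
import io
import tempfile
from collections.abc import Iterable
from contextlib import ExitStack
from pathlib import Path
from typing import BinaryIO, Optional


def write_chunks_sketch(
    chunks: Iterable[bytes],
    fpath: Optional[Path] = None,
    byte_stream: Optional[BinaryIO] = None,
) -> int:
    """Write every chunk to the file, the stream, or both; return bytes consumed."""
    if fpath is None and byte_stream is None:
        raise ValueError("need fpath, byte_stream, or both")
    total = 0
    with ExitStack() as stack:
        sinks = []
        if fpath is not None:
            # The file is opened here, so it is closed when the stack exits
            sinks.append(stack.enter_context(fpath.open("wb")))
        if byte_stream is not None:
            # The caller owns this stream; it is deliberately left open
            sinks.append(byte_stream)
        for chunk in chunks:
            for sink in sinks:
                sink.write(chunk)
            total += len(chunk)
    return total


buffer = io.BytesIO()
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "payload.bin"
    written = write_chunks_sketch([b"ab", b"cd"], fpath=target, byte_stream=buffer)
    on_disk = target.read_bytes()
```

Each chunk is written to every sink before the next one is read, so the source is consumed only once regardless of how many destinations are requested.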
38111

39-
### Fixed
112+
## Fixed
40113

41-
- Set default timeout in `download.stream_file` to 10 seconds, and allow to override value #222
114+
- Direct dependencies are now properly referenced: pillow, urllib3, piexif, idna (#226)
115+
- [`download`] `YoutubeDownloader.download` now respects its return type (`bool | Future[Any]`)
116+
- [`image.conversion`] `convert_image()` `**params` properly declared as accepting `None`.
117+
- [`logging`] `getLogger()`'s `console` now properly accepting `TextIO | io.StringIO | None`
118+
- [`video.probing`] `get_media_info()` type annotation for `src_path`
119+
- [`zim.archive`] `Archive.get_item()` return type (`libzim.reader.Item`)
120+
121+
## Removed
122+
123+
- Support for Python 3.8/3.9/3.10/3.11. Only Python 3.12 is supported now.
124+
- [`i18n`] `Lang` (See breaking changes)
125+
- [`i18n`] `get_iso_lang_data()` (See breaking changes)
126+
- [`i18n`] `update_with_macro()` (See breaking changes)
127+
- [`i18n`] `get_language_details()` (See breaking changes)
128+
- [`uri`] `rebuild_uri` `failsafe` param (was only handling incorrect types)
129+
- [`video.encoding`] `reencode()`'s `with_process` param
130+
- [`zim.creator`] `Creator.validate_metadata()`
131+
- [`zim.creator`] `Creator.convert_and_check_metadata()`
42132

43133
## [4.0.0] - 2024-08-05
44134
