CHANGELOG.md (+117 -27)
Original file line number
Diff line number
Diff line change
@@ -7,38 +7,128 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
7
7
8
8
## [Unreleased]
9
9
10
-
### Breaking Changes
11
-
12
-
- Renamed `filesystem.validate_zimfile_creatable` to `filesystem.file_creatable` to reflect general applicability to check file creation beyond ZIM files #200
13
-
- Remove any "ZIM" reference in exceptions while working with files #200
14
-
- Significantly enhance the safety of metadata manipulation (#205)
15
-
- add types for all metadata, one type per metadata name plus some generic ones for non-standard metadata
16
-
- all types are responsible to validate metadata value at initialization time
17
-
- validation checks for adherence to the ZIM specification and conventions are automated
18
-
- cleanup of unwanted control characters and stripping white characters are **automated in all text metadata**
19
-
- whenever possible, try to **automatically clean a "reasonably" bad metadata** (e.g. automaticall accept and remove duplicate tags - harmless - but not duplicate language codes - codes are supposed to be ordered, so it is a weird situation) ; this is an alignment of paradigm, because for some metadata the lib was permissive, while for other it was quite restrictive ; this PR tries to align this and **make the lib as permissive as possible**, avoiding to fail a scraper for something which could be automatically fixed
20
-
- it is now possible to disable ZIM conventions checks with `zim.metadata.check_metadata_conventions`
21
-
- simplify `zim.creator.Creator.config_metadata` by using these types and been more strict:
22
-
- add new `StandardMetadata` class for standard metadata, including list of mandatory one
23
-
- by default, all non-standard metadata must start with `X-` prefix
24
-
- this not yet an openZIM convention / specification, so it is possible to disable this check with `fail_on_missing_prefix` argument
25
-
- simplify `add_metadata`, use same metadata types
26
-
- simplify `zim.creator.Creator.start` with new types, and drop all metadata from memory after being passed to the libzim
27
-
- drop `zim.creator.convert_and_check_metadata` (not usefull anymore, simply use proper metadata type)
28
-
- move `MANDATORY_ZIM_METADATA_KEYS` and `DEFAULT_DEV_ZIM_METADATA` from `constants` to `zim.metadata` to avoid circular dependencies
29
-
- new `inputs.unique_values` utility function to compute the list of uniques values from a given list, but preserving initial list order
30
-
- in `__init__` of `zim.creator.Creator`, rename `disable_metadata_checks` to `check_metadata_conventions` for clarity and brevity
31
-
- beware that this manipulate the global `zim.metadata.check_metadata_conventions`, so if you have many creator running in parallel, they can't have different settings, last one initialized will "win"
10
+
This is a major release with many breaking changes, but most of them are easy to adapt to.
11
+
12
+
It focuses on type safety with the introduction of runtime checks: any call to the zimscraperlib API must match the type annotations, or an exception is raised.
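To illustrate what a runtime type check means for a caller, here is a minimal standard-library sketch. It is not how the library is implemented (the changelog states that beartype is used); `enforce_types` and `make_title` are hypothetical names introduced only for this example, and only plain class annotations are checked.

```python
import inspect
from functools import wraps
from typing import get_type_hints


def enforce_types(func):
    """Hypothetical stand-in for a runtime type checker (not beartype itself)."""
    hints = get_type_hints(func)
    signature = inspect.signature(func)

    @wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            expected = hints.get(name)
            # Only plain classes are checked in this sketch; generics are skipped.
            if isinstance(expected, type) and not isinstance(value, expected):
                raise TypeError(
                    f"{name} must be {expected.__name__}, got {type(value).__name__}"
                )
        return func(*args, **kwargs)

    return wrapper


@enforce_types
def make_title(title: str, max_length: int = 30) -> str:
    # Hypothetical API function: truncates a title to max_length characters.
    return title[:max_length]
```

With such a check in place, `make_title("Hello", 3)` returns normally, while `make_title(123)` raises at call time instead of failing somewhere deeper in the code.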
13
+
14
+
Documentation is available as docstrings and on https://python-scraperlib.readthedocs.io
15
+
16
+
Main changes include:
17
+
18
+
- ZIM metadata handling has changed completely, with new types for each kind of metadata.
19
+
- `i18n` module has been redesigned around a single main class, `Language`
20
+
- New `rewriting` module for HTML/CSS/JS (JS rewriting is done at runtime via Wombat)
21
+
- Now supporting only Python 3.12
32
22
33
23
### Added
34
24
35
-
- Add `filesystem.validate_folder_writable` to check if a folder can be written to #200
36
-
- Expose `constants.VERSION` to have access to zimscraperlib version from scrapers #224
37
-
- Added mkdocs based documentation site. #92
25
+
- Documentation using `mkdocs`, published on readthedocs.io (#92)
26
+
- `rewriting` module to rewrite URLs in content for generic scrapers
27
+
  - `rewriting.css` to rewrite URLs in CSS files
28
+
  - `rewriting.html` to rewrite URLs in HTML files
29
+
  - `rewriting.js` to rewrite URLs in JS files (at runtime, using `wombat`)
30
+
- `wombat-setup` JavaScript module in `javascript/`
31
+
- `typing` module with custom types:
32
+
  - `Callback` to use where we expect callbacks
33
+
  - `SupportsWrite`, `SupportsRead`, `SupportsSeeking`, `SupportsSeekableRead` and `SupportsSeekableWrite`: protocols for IO type annotations
34
+
- `zim.metadata` module with a type-based approach for each kind of metadata and helpers for custom ones
35
+
- [`zim.metadata`] `APPLY_RECOMMENDATIONS`: general flag to toggle openZIM-recommended constraints
- [`zim.filesystem`] `validate_folder_writable()` to ensure one can write into a folder (#200)
52
+
- [`zim.creator`] `Creator._get_first_language_metadata_value()` to retrieve first language from metadata
53
+
- [`zim.items`] `no_indexing_indexdata()` to get an IndexData that disables indexing
54
+
- [`zim.items`] `URLItem.get_mimetype()` now only returning `str`
55
+
56
+
### Changed (Breaking)
57
+
58
+
- Entire API is now type-protected using beartype. Any call to scraperlib that doesn't satisfy the annotated types will raise an exception
59
+
- [`constants`] `MANDATORY_ZIM_METADATA_KEYS` and `DEFAULT_DEV_ZIM_METADATA` moved to `zim.metadata`
60
+
- [`download`] `YoutubeDownloader.download`'s `options` parameter now expects a `dict[str, Any]` instead of `dict`
61
+
- [`download`] `YoutubeConfig` options now limited to `str | bool | int | None`
62
+
- [`download`] `_get_retry_adapter()` now exposed as `get_retry_adapter()`
63
+
- [`download`] `stream_file`'s `byte_stream` param is now more flexible, accepting `SupportsWrite[bytes] | SupportsSeekableWrite[bytes]`
64
+
- [`download`] `stream_file`'s `proxies` param now accepting `dict[str, str]` instead of `dict`
65
+
- [`filesystem`] `delete_callback()` is now a simple callback accepting an `fpath` and deleting it (it no longer chains other callbacks).
66
+
- [`filesystem`] `delete_callback()` no longer fails on a missing file (#192)
67
+
-[`i18n`] Redesigned API around a single object:
68
+
  - `Language`, which is initialized with any acceptable code. Raises `NotFoundError` on ISO 639-3 matching failure
69
+
  - `find_language_names()` is retained but only accepts a `query: str`
70
+
  - added `get_language()` and `get_language_or_none()` as shortcuts around `Language`
71
+
  - `is_valid_iso_639_3()` is retained
72
+
- [`image.conversion`] `convert_image()` now accepts `io.BytesIO` in place of `IO[bytes]` for `src` and `dst`.
73
+
- [`image.conversion`] `convert_svg2png()` now accepts `io.BytesIO` in place of `IO[bytes]` for `src` and `dst`.
74
+
- [`image.optimization`] `optimize_png()` now accepts `options: OptimizePngOptions` instead of individual params.
75
+
- [`image.optimization`] `optimize_jpeg()` now accepts `options: OptimizeJpgOptions` instead of individual params.
76
+
- [`image.optimization`] `optimize_webp()` now accepts `options: OptimizeWebpOptions` instead of individual params.
77
+
- [`image.optimization`] `optimize_gif()` now accepts `options: OptimizeGifOptions` instead of individual params.
78
+
- [`image.presets`] All presets now use the new options dataclass instead of ClassVar dict
79
+
- [`image.probing`] `format_for()` now accepts `io.BytesIO` in place of `IO[bytes]` for `src`.
80
+
- [`image.probing`] `is_valid_image()` now accepts `io.BytesIO` in place of `IO[bytes]` for `image`.
81
+
- [`image.utils`] `save_image()` now accepts `io.BytesIO` in place of `IO[bytes]` for `dst`.
82
+
- [`video.config`] `Config` was mostly not using type annotations.
83
+
- [`video.config`] `Config` options only expecting `str | None`
84
+
- [`video.presets`] All options only expecting `str | None`
85
+
- [`video.encoding`] `reencode()` now always returning a `tuple[bool, CompletedProcess]`
86
+
- [`zim._libkiwix`] `MimetypeAndCounter` now expects specific types for `mimetype: str` and `value: int`
87
+
- [`zim.filesystem`] `make_zim_file()` `publisher` param now properly expects a `str`
88
+
- [`zim.filesystem`] `IncorrectZIMPathError` renamed to `IncorrectPathError`
89
+
- [`zim.filesystem`] `MissingZIMFolderError` renamed to `MissingFolderError`
90
+
- [`zim.filesystem`] `NotADirectoryZIMFolderError` renamed to `NotADirectoryFolderError`
91
+
- [`zim.filesystem`] `NotWritableZIMFolderError` renamed to `NotWritableFolderError`
92
+
- [`zim.filesystem`] `IncorrectZIMFilenameError` renamed to `IncorrectFilenameError`
93
+
- [`zim.filesystem`] `validate_zimfile_creatable()` renamed to `validate_file_creatable()`
94
+
- [`zim.items`] `Item` and `StaticItem` now expecting `hints` as `dict[libzim.writer.Hint, int]` instead of `dict`
95
+
- [`zim.items`] `Item.get_hints()` now returning `dict[libzim.writer.Hint, int]` instead of `dict`
96
+
- [`zim.items`] `URLItem.download_for_size()` now specifying type annotations and reordered params
97
+
- [`zim.providers`] `FileLikeProvider.gen_blob()` and `URLProvider.gen_blob()` now properly annotate their return type (`Generator[libzim.writer.Blob, None, None]`)
98
+
- [`zim.providers`] `URLProvider.get_size_of()` param `url` now explicitly expects a `str`
99
+
- [`zim.creator`] `Creator.config_metadata()` signature changed, now mainly accepting a `StandardMetadataList`
100
+
- [`zim.creator`] `Creator.config_dev_metadata()` signature changed to accept the new metadata types
101
+
- [`zim.creator`] `Creator.add_item_for()`'s `callback` renamed to `callbacks` and accepting `Callback`
102
+
- [`zim.creator`] `Creator.add_item()`'s `callback` renamed to `callbacks` and accepting `Callback`
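The `delete_callback()` entries in the list above describe a callback that takes an `fpath`, deletes it, and tolerates a file that is already gone. A minimal sketch of that behaviour follows, assuming a `pathlib.Path` argument. `delete_if_present` is a hypothetical name used for illustration and is not the library's function.

```python
import tempfile
from pathlib import Path


def delete_if_present(fpath: Path) -> None:
    """Delete fpath; do nothing when it is already gone (hypothetical helper)."""
    fpath.unlink(missing_ok=True)


with tempfile.TemporaryDirectory() as tmp_dir:
    target = Path(tmp_dir) / "scratch.txt"
    target.write_text("temporary")
    delete_if_present(target)
    existed_after_first_call = target.exists()
    # Second call: the file is missing, and no exception is raised.
    delete_if_present(target)
```

Tolerating a missing file makes the callback safe to call more than once for the same path.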
103
+
104
+
### Changed
105
+
106
+
- [deps] `iso639-lang` now requires at least v2.4.0
107
+
- [`download`] `stream_file()` now returns `tuple[int, requests.structures.CaseInsensitiveDict[str]]` instead of `tuple[int, requests.structures.CaseInsensitiveDict]`
108
+
- [`download`] `stream_file()` now accepts both `fpath` and `byte_stream` params (writes to both)
109
+
- [`image.utils`] `save_image()` now accepts `Any` `**params`.
110
+
- [`zim.archive`] `Archive.counters` now returning `CounterMap` (compatible with previous `dict[str, int]`)
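The `stream_file()` entry in the list above mentions writing to both `fpath` and `byte_stream`, and the `typing` additions mention a `SupportsWrite` protocol. The sketch below shows how such a dual write could look with a structural protocol. `write_chunks` and the local `SupportsWrite` definition are hypothetical stand-ins for illustration; they are not the library's code, and no network access is involved.

```python
from __future__ import annotations

import io
import tempfile
from pathlib import Path
from typing import Iterable, Protocol, TypeVar

T_contra = TypeVar("T_contra", contravariant=True)


class SupportsWrite(Protocol[T_contra]):
    """Anything with a write() method (local stand-in for the library's protocol)."""

    def write(self, s: T_contra, /) -> object: ...


def write_chunks(
    chunks: Iterable[bytes],
    fpath: Path | None = None,
    byte_stream: SupportsWrite[bytes] | None = None,
) -> int:
    """Write every chunk to fpath and/or byte_stream; return the byte count."""
    if fpath is None and byte_stream is None:
        raise ValueError("fpath or byte_stream is required")
    total = 0
    file_handle = fpath.open("wb") if fpath is not None else None
    try:
        for chunk in chunks:
            if file_handle is not None:
                file_handle.write(chunk)
            if byte_stream is not None:
                byte_stream.write(chunk)
            total += len(chunk)
    finally:
        if file_handle is not None:
            file_handle.close()
    return total


buffer = io.BytesIO()
with tempfile.TemporaryDirectory() as tmp_dir:
    target = Path(tmp_dir) / "payload.bin"
    written = write_chunks([b"ab", b"cd"], fpath=target, byte_stream=buffer)
    on_disk = target.read_bytes()
```

Because the protocol is structural, any object exposing a compatible `write()` method can be passed as `byte_stream`, not only `io.BytesIO`.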
38
111
39
-
###Fixed
112
+
### Fixed
40
113
41
-
- Set default timeout in `download.stream_file` to 10 seconds, and allow to override value #222
114
+
- Direct dependencies are now properly referenced: pillow, urllib3, piexif, idna (#226)
115
+
- [`download`] `YoutubeDownloader.download` now respects its return type (`bool | Future[Any]`)
116
+
- [`image.conversion`] `convert_image()` `**params` properly declared as accepting `None`.
117
+
- [`logging`] `getLogger()`'s `console` now properly accepting `TextIO | io.StringIO | None`
118
+
- [`video.probing`] `get_media_info()` type annotation for `src_path`
119
+
- [`zim.archive`] `Archive.get_item()` return type (`libzim.reader.Item`)
120
+
121
+
### Removed
122
+
123
+
- Support for Python 3.8/3.9/3.10/3.11. Only Python 3.12 is supported now.
124
+
- [`i18n`] `Lang` (see breaking changes)
125
+
- [`i18n`] `get_iso_lang_data()` (see breaking changes)
126
+
- [`i18n`] `update_with_macro()` (see breaking changes)
127
+
- [`i18n`] `get_language_details()` (see breaking changes)
128
+
- [`uri`] `rebuild_uri` `failsafe` param (was only handling incorrect types)