|
1 | 1 | # Release notes |
2 | 2 |
|
| 3 | +## Changes from 4.1.2 to 4.2.0 |
| 4 | + |
| 5 | +### CTable: columnar compressed tables |
| 6 | + |
| 7 | +- Introduced `blosc2.CTable`, a new columnar table container for compressed, typed columns. CTables support dataclass- and schema-based construction, row iteration, column access, table views, `head()` / `tail()` / `sample()`, sorting, selection and compact `where` expressions. |
| 8 | +- Added persistent CTables backed by `TreeStore`, with support for `blosc2.open()`, `CTable.open()`, `CTable.load()`, `CTable.save()`, `CTable.to_b2d()` and `CTable.to_b2z()`. CTable views can be saved too, and `.b2z`/`.b2d` path handling has been tightened. |
| 9 | +- Added mutation operations for CTables, including `append()`, `extend()`, `delete()`, `compact()`, `add_column()`, `drop_column()`, `rename_column()` and related schema validation. |
| 10 | +- Added computed columns, including virtual computed columns backed by lazy expressions, materialized computed columns and automatic filling of materialized computed columns during inserts. |
| 11 | +- Added CTable indexing support, including persistent indexes, direct expression indexes, ordered index reuse, boolean `LazyExpr`/`NDArray` masks in `CTable.__getitem__`, `iter_sorted()` and indexing support for `.b2z` tables. |
| 12 | +- Added nullable schema support and null policies for CTable scalar columns, preserving nullable scalar Parquet round-trips. |
| 13 | +- Added variable-length CTable column support via `ListArray` / `ObjectArray`, including `vlstring` and `vlbytes` schema specs, fixed-length string/bytes import support and list/struct Arrow/Parquet round-trips. |
| 14 | +- Added Arrow, Parquet and CSV interoperability for CTables, including batch-wise Arrow/Parquet import/export, Arrow schema metadata preservation, `CTable.from_arrow_batches()` improvements and a new `parquet-to-blosc2` CLI utility. |
| 15 | +- Added CTable documentation, tutorials, examples and benchmarks covering schema definition, persistence, querying, indexing, mutations, nullable columns, computed columns and variable-length columns. |
| 16 | + |
| 17 | +### Indexing and ordering |
| 18 | + |
| 19 | +- Added a new indexing subsystem for NDArrays and CTables, including full, partial/bucket, light/medium and OPSI-style index kinds, out-of-core index builders and sidecar storage. |
| 20 | +- Added `blosc2.Index` as the unified public index handle, plus APIs such as `create_index()`, `compact_index()`, `iter_sorted()`, `will_use_index()` and related query explanation support. |
| 21 | +- Added materialized expression indexes for NDArrays and direct expression indexes for CTables. |
| 22 | +- Added persistent query-result caching for indexed lookups, with FIFO pruning and cache accounting. |
| 23 | +- Added `blosc2.argsort()` and refactored indexing APIs around explicit index enums and sorting helpers. |
| 24 | +- Improved indexed query performance with Cython accelerators, threaded chunk batching, zero-copy/cached mmap reads, chunk-aware and reduced-order layouts and faster scattered row gathering. |
| 25 | +- Reduced memory usage during index creation and lookup by avoiding full sidecar materialization, replacing memmap staging with Blosc2 scratch arrays and adding `tmpdir` support for full out-of-core indexes. |
| 26 | + |
| 27 | +### Persistence, stores and serialization |
| 28 | + |
| 29 | +- Added structured Blosc2 serialization based on b2object carriers, including persisted `C2Array`, `LazyExpr` and DSL `LazyUDF` objects. |
| 30 | +- Added `blosc2.Ref` for serializing external references, plus examples for b2object bundles and persisted expressions/UDFs. |
| 31 | +- Added `blosc2.load()` as a convenience loader. |
| 32 | +- Added `vlmeta` support to `LazyArray` objects. |
| 33 | +- Improved store handling by preserving lazy b2object carriers in `DictStore`, allowing reopened proxies to refill caches after read-only opens, relaxing `DictStore`/`TreeStore` suffix requirements and adding `DictStore.to_b2d()`. |
| 34 | +- Accelerated `blosc2.open()` by trying standard opens first; it now also warns on implicit append mode. |
| 35 | + |
| 36 | +### Arrays, computation and containers |
| 37 | + |
| 38 | +- Added `ObjectArray` for fully general object data and renamed the earlier `VLArray` work accordingly; added `ListArray` docstrings and Arrow integration improvements. |
| 39 | +- Added schema helpers including numeric specs, `blosc2.struct()` and `blosc2.object()` for nested/fully general column declarations. |
| 40 | +- Improved `fromiter()` with direct chunked construction and substantially lower peak memory use. |
| 41 | +- Improved `asarray()` behavior for NDArray inputs when copy-inducing keyword arguments are supplied. |
| 42 | +- Added `SChunk.reorder_offsets()`. |
| 43 | +- Improved `BatchArray` defaults and documentation; the default compression level is now tuned for faster lookup/scan behavior. |
| 44 | +- Continued matmul/linalg optimization work and shared-thread-pool integration. |
| 45 | + |
| 46 | +### CLI, docs and examples |
| 47 | + |
| 48 | +- Added the `parquet-to-blosc2` command with options such as `--max-rows`, `--parquet-batch-size`, `--blosc2-items-per-block` and `--use-dict`. |
| 49 | +- Added new CTable, ObjectArray, BatchArray, containers, indexing and serialization tutorials and examples. |
| 50 | +- Reorganized and expanded the API reference for CTable, Column, schema specs, Index, save/load helpers and miscellaneous APIs. |
| 51 | +- Updated benchmark suites for CTables, indexing, Parquet import/export, BatchArray and NDArray construction/indexing. |
| 52 | + |
| 53 | +### Fixes and compatibility |
| 54 | + |
| 55 | +- Updated bundled C-Blosc2 to v3.0.2; building against a system library now requires C-Blosc2 >= 3.0.0. |
| 56 | +- Updated bundled C-Blosc2 and miniexpr sources multiple times. |
| 57 | +- Restored compatibility with NumPy < 2. |
| 58 | +- Fixed Windows and mmap/file-locking issues in index creation, rebuilds and temporary file cleanup. |
| 59 | +- Fixed full-index query failures for large CTable columns and full out-of-core merge failures on systems with small `/tmp`. |
| 60 | +- Fixed stale sidecar/cache reuse and added targeted cache invalidation when persistent sidecars are replaced. |
| 61 | +- Fixed `.b2z` double-open corruption caused by GC-triggered repacking and made temporary `.b2z` unpacking default to the source file directory. |
| 62 | +- Fixed a regression when reopening persisted proxies in read-only mode. |
| 63 | +- Fixed GC-induced thread hangs on macOS with Python 3.14 and hardened async chunk reading/cache cleanup paths. |
| 64 | +- Fixed lazy-chunk source-size handling in decode/getitem callers. |
| 65 | +- Fixed nullable validation, dictionary extend validation, CTable close propagation, print alignment and NumPy mask support. |
| 66 | +- Fixed `arange()` regressions and several pre-existing `set_slice` error-handling issues. |
| 67 | +- Clamped indexing/thread defaults for wasm32. |
| 68 | + |
3 | 69 | ## Changes from 4.1.1 to 4.1.2 |
4 | 70 |
|
5 | 71 | - A new fast path for src/blosc2/linalg.py that uses the matmul prefilter machinery in src/blosc2/blosc2_ext.pyx. |