Commit 0512b21

Various amendments to the docs for CTable methods

1 parent b43ee3b · commit 0512b21
3 files changed · 156 additions & 31 deletions
doc/reference/classes.rst

Lines changed: 16 additions & 14 deletions

@@ -3,12 +3,11 @@ Blosc2 Classes
 
 .. currentmodule:: blosc2
 
+
 Main Classes
 ------------
 .. autosummary::
 
-    CTable
-    Column
     NDArray
     NDField
     LazyArray
@@ -17,39 +16,42 @@ Main Classes
     BatchArray
     ListArray
     ObjectArray
-    SChunk
-    DictStore
-    TreeStore
-    EmbedStore
-    Index
     Ref
     Proxy
     ProxySource
     ProxyNDSource
     SimpleProxy
+    DictStore
+    TreeStore
+    EmbedStore
+    CTable
+    Column
+    Index
+    SChunk
 
 .. toctree::
     :maxdepth: 1
 
-    ctable
     ndarray
-    index_class
+    ndfield
     lazyarray
     c2array
     array
-    schunk
-    dict_store
-    tree_store
-    embed_store
     batch_array
     list_array
     objectarray
-    ndfield
     ref
     proxy
     proxysource
     proxyndsource
     simpleproxy
+    dict_store
+    tree_store
+    embed_store
+    ctable
+    index_class
+    schunk
+
 
 Other Classes
 -------------

doc/reference/ctable.rst

Lines changed: 27 additions & 16 deletions

@@ -225,18 +225,6 @@ When a NumPy structured array is needed, materialize explicitly::
 .. automethod:: CTable.iter_sorted
 
 
-Aggregates & statistics
------------------------
-
-.. autosummary::
-
-    CTable.describe
-    CTable.cov
-
-.. automethod:: CTable.describe
-.. automethod:: CTable.cov
-
-
 Mutations
 ---------
 
@@ -286,31 +274,54 @@ ordered reuse is required.
 Persistence
 -----------
 
+Persist CTables to disk or interchange formats, and restore them later without
+losing schema information. These methods cover native Blosc2 persistence as
+well as import/export paths for CSV, Arrow, and Parquet data.
+
 .. autosummary::
 
+    CTable.load
+    CTable.open
     CTable.save
     CTable.to_csv
     CTable.to_arrow
     CTable.to_parquet
+    CTable.from_arrow
+    CTable.from_parquet
+    CTable.from_csv
 
+.. automethod:: CTable.load
+.. automethod:: CTable.open
 .. automethod:: CTable.save
 .. automethod:: CTable.to_csv
 .. automethod:: CTable.to_arrow
 .. automethod:: CTable.to_parquet
+.. automethod:: CTable.from_arrow
+.. automethod:: CTable.from_parquet
+.. automethod:: CTable.from_csv
 
 
-Inspection
-----------
+Inspection & statistics
+-----------------------
+
+Compute common descriptive statistics directly on ``CTable`` data without
+materializing rows first. These methods operate column-wise on the compressed
+representation, making it easy to summarize distributions or measure
+relationships between numeric columns.
 
 .. autosummary::
 
+    CTable.column_schema
     CTable.info
     CTable.schema_dict
-    CTable.column_schema
+    CTable.describe
+    CTable.cov
 
+.. automethod:: CTable.column_schema
 .. automethod:: CTable.info
 .. automethod:: CTable.schema_dict
-.. automethod:: CTable.column_schema
+.. automethod:: CTable.describe
+.. automethod:: CTable.cov
 
 
 ----
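The new "Inspection & statistics" section groups ``describe`` and ``cov`` with the schema helpers. As a rough illustration of the column-wise statistics those methods expose, here is a minimal stdlib sketch over plain Python lists; ``describe_column`` and ``cov_columns`` are illustrative names, not the CTable implementation (which works on the compressed representation):

```python
import math
import statistics


def describe_column(values: list[float]) -> dict[str, float]:
    """Summary statistics for a single numeric column."""
    return {
        "count": float(len(values)),
        "mean": statistics.fmean(values),
        # Sample standard deviation is undefined for fewer than 2 rows.
        "std": statistics.stdev(values) if len(values) > 1 else math.nan,
        "min": min(values),
        "max": max(values),
    }


def cov_columns(x: list[float], y: list[float]) -> float:
    """Sample covariance between two equal-length numeric columns."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


amount = [10.0, 20.0, 30.0]
quantity = [1.0, 2.0, 3.0]
print(describe_column(amount))        # count 3, mean 20.0, std 10.0
print(cov_columns(amount, quantity))  # 10.0
```

A real ``CTable.describe`` would compute such summaries per column without materializing rows first, as the section text explains.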

src/blosc2/ctable.py

Lines changed: 113 additions & 1 deletion

@@ -3338,7 +3338,119 @@ def from_parquet(
         blosc2_items_per_block: int | None = None,
         **kwargs,
     ) -> CTable:
-        """Read a Parquet file into a :class:`CTable` batch-wise using pyarrow."""
+        """Read a Parquet file into a :class:`CTable`.
+
+        The Parquet file is streamed batch by batch through :mod:`pyarrow` and then
+        converted into a typed :class:`CTable`. By default, the result is created in
+        memory, but you can also persist it on disk via ``urlpath``.
+
+        This method delegates the actual table construction to
+        :meth:`CTable.from_arrow`, so Arrow schema handling, nullable-column support,
+        and Blosc2 write tuning follow the same rules as that method.
+
+        Parameters
+        ----------
+        path : str or path-like
+            Path to the source Parquet file.
+
+        columns : list[str] or None, optional
+            Subset of columns to read from the Parquet file. If provided, only these
+            columns are loaded and their order in the resulting table matches the
+            order in this list. Column names must be unique.
+
+        batch_size : int, optional
+            Number of rows per Arrow batch read from the Parquet file. This controls
+            how much data is pulled from the file at a time before being handed off
+            to the CTable builder. Must be greater than 0.
+
+        urlpath : str or None, optional
+            Destination storage path for the resulting CTable. If ``None`` (the
+            default), the table is created in memory. If provided, the table is backed
+            by persistent on-disk storage.
+
+        mode : str, optional
+            Storage open mode for ``urlpath``. Defaults to ``"w"``. This is passed
+            through to :meth:`CTable.from_arrow`.
+
+        cparams : object, optional
+            Compression parameters for the created Blosc2 containers. Passed through
+            to :meth:`CTable.from_arrow`.
+
+        dparams : object, optional
+            Decompression parameters for the created Blosc2 containers. Passed through
+            to :meth:`CTable.from_arrow`.
+
+        validate : bool, optional
+            Whether to enable extra internal validation while building the table.
+            Defaults to ``False``.
+
+        auto_null_sentinels : bool, optional
+            If ``True`` (default), nullable scalar columns imported from Parquet may
+            automatically receive per-column null sentinel values when needed. Sentinel
+            selection follows the current null-policy rules used by CTable schema
+            handling.
+
+        blosc2_batch_size : int or None, optional
+            Number of items written to Blosc2 containers per internal write batch.
+            Passed through to :meth:`CTable.from_arrow`.
+
+        blosc2_items_per_block : int or None, optional
+            Target number of items per internal Blosc2 block. Passed through to
+            :meth:`CTable.from_arrow`.
+
+        **kwargs
+            Additional keyword arguments forwarded to ``pyarrow.parquet.ParquetFile``.
+            Use these for Parquet-reader-specific options supported by PyArrow.
+
+        Returns
+        -------
+        CTable
+            A new :class:`CTable` populated from the Parquet file. The table contains
+            all selected columns and all rows from the file. If ``urlpath`` is
+            provided, the returned table is disk-backed; otherwise it is in-memory.
+
+        Raises
+        ------
+        ImportError
+            If :mod:`pyarrow` is not installed.
+        ValueError
+            If ``batch_size`` is not greater than 0.
+        ValueError
+            If ``columns`` contains duplicate names.
+        Exception
+            Any exception raised by :mod:`pyarrow` while opening or reading the Parquet
+            file, or by :meth:`CTable.from_arrow` while converting Arrow data into a
+            CTable.
+
+        Examples
+        --------
+        Load an entire Parquet file into an in-memory table:
+
+        >>> import blosc2
+        >>> t = blosc2.CTable.from_parquet("data.parquet")
+
+        Load only a subset of columns:
+
+        >>> t = blosc2.CTable.from_parquet(
+        ...     "data.parquet",
+        ...     columns=["user_id", "amount", "country"],
+        ... )
+
+        Create a disk-backed table while reading in batches:
+
+        >>> t = blosc2.CTable.from_parquet(
+        ...     "data.parquet",
+        ...     batch_size=50_000,
+        ...     urlpath="data.ctable",
+        ... )
+
+        Pass additional options through to PyArrow's Parquet reader:
+
+        >>> t = blosc2.CTable.from_parquet(
+        ...     "data.parquet",
+        ...     memory_map=True,
+        ... )
+        """
         pq = cls._require_pyarrow_parquet("from_parquet()")
         pa = cls._require_pyarrow("from_parquet()")
         cls._validate_arrow_batch_size(batch_size)

0 commit comments