Merged
Changes from 4 commits
74 changes: 74 additions & 0 deletions doc/reference/ctable.rst
@@ -27,8 +27,27 @@ explicitly call :meth:`~blosc2.CTable.to_arrow` or iterate with
        CTable.__str__

    .. automethod:: __len__

        Return the number of live (non-deleted) rows.

    .. automethod:: __iter__

        Iterate over live rows in insertion order, yielding namedtuple-like
        row objects with one attribute per column.

    .. automethod:: __getitem__

        Type-driven indexing:

        * ``str`` — column name returns a :class:`Column`; any other string
          is interpreted as a boolean expression and behaves like
          :meth:`where`.
        * ``int`` — single row as a namedtuple-like object.
        * ``slice`` — row-range view.
        * ``list[int]`` / ``ndarray[int]`` — gathered-row view.
        * ``ndarray[bool]`` — boolean-mask filtered view.
        * ``list[str]`` — column-projection view (same as :meth:`select`).

    .. automethod:: __repr__
    .. automethod:: __str__
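
The type-driven indexing documented for ``__getitem__`` above can be pictured with a small stand-alone sketch. ``ToyTable`` is a hypothetical pure-Python class invented for illustration; it is not ``blosc2.CTable`` and does not use its storage or its expression engine. Plain lists stand in for NumPy arrays, and strings that are not column names simply raise ``KeyError`` here instead of being evaluated as boolean expressions.

```python
from collections import namedtuple


class ToyTable:
    """Hypothetical in-memory table used only to illustrate key dispatch."""

    def __init__(self, columns):
        # columns: dict of column name -> list of values, all the same length
        self.columns = {name: list(values) for name, values in columns.items()}
        self.Row = namedtuple("Row", list(self.columns))

    def __len__(self):
        return len(next(iter(self.columns.values()), []))

    def _take(self, positions):
        # Build a new table holding only the rows at the given positions.
        return ToyTable({n: [c[i] for i in positions] for n, c in self.columns.items()})

    def __getitem__(self, key):
        if isinstance(key, str):
            if key in self.columns:
                return self.columns[key]  # column access by name
            # The documented behaviour treats other strings as expressions;
            # this sketch does not implement an expression parser.
            raise KeyError(key)
        if isinstance(key, int):
            # Single row as a namedtuple with one attribute per column.
            return self.Row(*(c[key] for c in self.columns.values()))
        if isinstance(key, slice):
            return self._take(range(*key.indices(len(self))))
        if isinstance(key, list):
            if key and all(isinstance(k, str) for k in key):
                # Column projection.
                return ToyTable({n: self.columns[n] for n in key})
            if key and all(isinstance(k, bool) for k in key):
                # Boolean mask: keep rows whose flag is True.
                return self._take([i for i, keep in enumerate(key) if keep])
            # List of integer positions: gather rows.
            return self._take(key)
        raise TypeError(f"unsupported key type: {type(key).__name__}")
```

The boolean check comes before the integer gather because ``bool`` is a subclass of ``int`` in Python, so a mask would otherwise be read as a list of positions 0 and 1.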

@@ -113,12 +132,14 @@ Attributes
    CTable.schema
    CTable.base

.. autoattribute:: CTable.col_names
.. autoproperty:: CTable.computed_columns
.. autoproperty:: CTable.nrows
.. autoproperty:: CTable.ncols
.. autoproperty:: CTable.cbytes
.. autoproperty:: CTable.nbytes
.. autoproperty:: CTable.schema
.. autoattribute:: CTable.base


Inserting data
@@ -130,8 +151,25 @@ Inserting data
    CTable.extend

.. automethod:: CTable.append

    Append a single row to the table. *data* may be a list, tuple,
    ``numpy.void``, or structured ``numpy.ndarray`` whose fields match the
    schema column order. Materialized columns whose values are omitted are
    auto-filled from their recorded expression. Raises ``ValueError`` if the
    table is read-only or a view.

.. automethod:: CTable.extend

    Append multiple rows at once. *data* may be:

    * a **dict of arrays** ``{"col": array, ...}`` — all arrays must have the
      same length; missing columns are filled with their default value;
    * a **list of rows**, each compatible with :meth:`append`;
    * another **CTable** — columns are matched by name.

    Pass ``validate=False`` to skip per-row Pydantic validation on trusted
    bulk imports. Raises ``ValueError`` if the table is read-only or a view.
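
The two simplest input shapes accepted by ``extend`` (a dict of arrays and a list of rows) can be reduced to one column-wise form before writing. The sketch below shows one way to do that normalization. ``to_columns`` and its ``defaults`` argument are hypothetical names made up for this example; they are not part of blosc2, and the real method may validate and fill values differently.

```python
def to_columns(data, col_names, defaults):
    """Normalize bulk input into {column name: list of values}.

    Hypothetical helper for illustration only, not blosc2 code.
    """
    if isinstance(data, dict):
        unknown = set(data) - set(col_names)
        if unknown:
            raise KeyError(f"unknown columns: {sorted(unknown)}")
        lengths = {len(values) for values in data.values()}
        if len(lengths) > 1:
            raise ValueError("all column arrays must have the same length")
        n = lengths.pop() if lengths else 0
        # Columns missing from the input are filled with their default value.
        return {
            name: list(data[name]) if name in data else [defaults[name]] * n
            for name in col_names
        }
    # Otherwise treat the input as a list of rows in schema column order.
    rows = [tuple(row) for row in data]
    for row in rows:
        if len(row) != len(col_names):
            raise ValueError("row length does not match the number of columns")
    return {name: [row[i] for row in rows] for i, name in enumerate(col_names)}
```

Converting everything to columns first means the write path only has to handle one layout, which suits column-oriented storage.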


Querying
--------
@@ -186,18 +224,27 @@ When a NumPy structured array is needed, materialize explicitly::
.. autosummary::

    CTable.where
    CTable.view
    CTable.select
    CTable.head
    CTable.tail
    CTable.sample
    CTable.sort_by
    CTable.iter_sorted

.. automethod:: CTable.where
.. automethod:: CTable.view
.. automethod:: CTable.select
.. automethod:: CTable.head

    Return a view of the first *N* live rows (default 5).

.. automethod:: CTable.tail

    Return a view of the last *N* live rows (default 5).
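
Since ``head`` and ``tail`` return views over live rows, each can be thought of as producing a new boolean mask over the physical rows. The following is a minimal sketch of that idea using plain lists. ``head_mask`` and ``tail_mask`` are hypothetical helper names, and the actual implementation in ``ctable.py`` works on blosc2 arrays rather than Python lists.

```python
def head_mask(valid, n=5):
    """Mask selecting the first n live rows; `valid` marks non-deleted rows."""
    mask = [False] * len(valid)
    if n <= 0:
        return mask
    kept = 0
    for i, is_live in enumerate(valid):
        if is_live:
            mask[i] = True
            kept += 1
            if kept == n:
                break
    return mask


def tail_mask(valid, n=5):
    """Mask selecting the last n live rows, computed by scanning in reverse."""
    return head_mask(valid[::-1], n)[::-1]
```

Deleted rows never count toward *N*, so the selected physical positions need not be contiguous.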
.. automethod:: CTable.sample
.. automethod:: CTable.sort_by
.. automethod:: CTable.iter_sorted


Aggregates & statistics
@@ -249,7 +296,19 @@ ordered reuse is required.
    CTable.rename_column

.. automethod:: CTable.delete

    Mark one or more rows as deleted (tombstone deletion). *ind* may be a
    logical row index (``int``), a slice, or an iterable of logical indices.
    Deleted rows are excluded from all subsequent queries and aggregates.
    Physical storage is not reclaimed until :meth:`compact` is called.
    Raises ``ValueError`` if the table is read-only or a view.

.. automethod:: CTable.compact

    Physically rewrite every column array keeping only live rows, closing the
    gaps left by prior :meth:`delete` calls. All existing indexes are dropped
    and must be recreated afterwards. Raises ``ValueError`` if the table is
    read-only or a view.
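
The relationship between tombstone deletion and compaction described for ``delete`` and ``compact`` can be sketched in a few lines. ``TombstoneColumn`` is a hypothetical toy class written for this illustration under the assumption of a per-row validity flag; it is not how blosc2 stores data and it ignores indexes entirely.

```python
class TombstoneColumn:
    """Toy column with a validity mask (illustration only, not blosc2 code)."""

    def __init__(self, values):
        self.values = list(values)              # physical storage
        self.valid = [True] * len(self.values)  # False marks a tombstone

    def __len__(self):
        # Only live rows count.
        return sum(self.valid)

    def __iter__(self):
        # Skip tombstoned slots while preserving insertion order.
        return (v for v, is_live in zip(self.values, self.valid) if is_live)

    def _physical(self, logical):
        # Map a logical index (position among live rows) to a physical slot.
        live = [i for i, is_live in enumerate(self.valid) if is_live]
        return live[logical]

    def delete(self, logical):
        # Mark the row as deleted; storage is left untouched.
        self.valid[self._physical(logical)] = False

    def compact(self):
        # Rewrite storage keeping only live rows, closing the gaps.
        self.values = [v for v, is_live in zip(self.values, self.valid) if is_live]
        self.valid = [True] * len(self.values)
```

Deleting is cheap because it only flips a flag, while ``compact`` pays the rewrite cost once for all accumulated deletions.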
.. automethod:: CTable.add_column
.. automethod:: CTable.add_computed_column
.. automethod:: CTable.materialize_computed_column
@@ -266,10 +325,12 @@ Persistence
    CTable.save
    CTable.to_csv
    CTable.to_arrow
    CTable.to_parquet

.. automethod:: CTable.save
.. automethod:: CTable.to_csv
.. automethod:: CTable.to_arrow
.. automethod:: CTable.to_parquet


Inspection
@@ -311,10 +372,19 @@ All index operations and aggregates apply the table's tombstone mask
        Column.__setitem__

    .. automethod:: __len__

        Return the number of live (non-deleted) values in this column.

    .. automethod:: __iter__

        Iterate over live values in insertion order, skipping deleted rows.

    .. automethod:: __getitem__
    .. automethod:: __setitem__

        Set one or more live column values. Accepts the same index forms as
        :meth:`__getitem__`.


Attributes
----------
@@ -461,10 +531,14 @@ Text & binary

    string
    bytes
    vlstring
    vlbytes
    list

.. autoclass:: string
.. autoclass:: bytes
.. autofunction:: vlstring
.. autofunction:: vlbytes
.. autofunction:: list

List columns
28 changes: 28 additions & 0 deletions src/blosc2/ctable.py
@@ -817,6 +817,7 @@ def view(self) -> ColumnViewIndexer:
        return ColumnViewIndexer(self)

    def __setitem__(self, key: int | slice | list | np.ndarray, value):  # noqa: C901
        """Set one or more live column values; accepts the same index forms as :meth:`__getitem__`."""
        if self._table._read_only:
            raise ValueError("Table is read-only (opened with mode='r').")
        if self.is_computed:
@@ -871,6 +872,7 @@ def __setitem__(self, key: int | slice | list | np.ndarray, value): # noqa: C90
        self._table._root_table._mark_all_indexes_stale()

    def __iter__(self):
        """Iterate over live column values in insertion order, skipping deleted rows."""
        if self.is_computed:
            yield from self._iter_chunks_computed(size=None)
            return
@@ -914,6 +916,7 @@ def __repr__(self) -> str:
        return f"Column({self._col_name!r}, dtype={self.dtype}, len={len(self)}, values=[{preview}])"

    def __len__(self):
        """Return the number of live (non-deleted) values in this column."""
        return blosc2.count_nonzero(self._valid_rows)

    @property
@@ -1082,6 +1085,9 @@ def __ge__(self, other):

    @property
    def dtype(self):
        """NumPy dtype of the underlying storage, or ``None`` for
        variable-length columns (:func:`~blosc2.vlstring`,
        :func:`~blosc2.vlbytes`, :func:`~blosc2.list`)."""
        return getattr(self._raw_col, "dtype", None)

    def iter_chunks(self, size: int = 65536):
@@ -1566,6 +1572,16 @@ def _fmt_bytes(n: int) -> str:


class CTable(Generic[RowT]):
    #: Ordered list of stored column names. Computed columns are **not**
    #: included; access those via :attr:`computed_columns`.
    col_names: list[str]

    #: Parent table when this instance is a row-filter or column-projection
    #: view (created by :meth:`where`, :meth:`select`, or :meth:`view`).
    #: ``None`` for top-level tables. Structural mutations such as
    #: :meth:`add_column` and :meth:`drop_column` are blocked on views.
    base: CTable | None

    def __init__(
        self,
        row_type: type[RowT],
@@ -1993,6 +2009,7 @@ def _rows_to_dicts(self, positions) -> list[dict]:
        return rows

    def __str__(self) -> str:
        """Pandas-style tabular display with column names, dtypes, and a row count footer."""
        nrows = self._n_rows
        ncols = len(self.col_names)
        head_pos, tail_pos, hidden = self._display_positions()
@@ -2025,13 +2042,16 @@ def __str__(self) -> str:
        return "\n".join(lines)

    def __repr__(self) -> str:
        """Short ``CTable<cols>(N rows, X compressed)`` summary string."""
        cols = ", ".join(self.col_names)
        return f"CTable<{cols}>({self._n_rows:,} rows, {_fmt_bytes(self.cbytes)} compressed)"

    def __len__(self):
        """Return the number of live (non-deleted) rows."""
        return self._n_rows

    def __iter__(self):
        """Iterate over live rows in insertion order, yielding namedtuple-like row objects."""
        for i in range(self.nrows):
            yield self._materialize_row(i)

@@ -2428,6 +2448,7 @@ def _make_view(cls, parent: CTable, new_valid_rows: blosc2.NDArray) -> CTable:
        return obj

    def view(self, new_valid_rows):
        """Return a row-filter view backed by a boolean mask array without copying data."""
        if isinstance(new_valid_rows, np.ndarray) and new_valid_rows.dtype == np.bool_:
            new_valid_rows = blosc2.asarray(new_valid_rows)
        if not (
@@ -2448,6 +2469,7 @@ def view(self, new_valid_rows):
        return CTable._make_view(self, new_valid_rows)

    def head(self, N: int = 5) -> CTable:
        """Return a view of the first *N* live rows (default 5)."""
        if N <= 0:
            return self.view(blosc2.zeros(shape=len(self._valid_rows), dtype=np.bool_))
        if self._n_rows <= N:
@@ -2468,6 +2490,7 @@ def head(self, N: int = 5) -> CTable:
        return self.view(mask_arr)

    def tail(self, N: int = 5) -> CTable:
        """Return a view of the last *N* live rows (default 5)."""
        if N <= 0:
            return self.view(blosc2.zeros(shape=len(self._valid_rows), dtype=np.bool_))
        if self._n_rows <= N:
@@ -4253,6 +4276,7 @@ def __array__(self, dtype=None, copy=None):
        return arr.copy() if copy else arr

    def __getitem__(self, key):
        """Type-driven indexing: column name, boolean expression, row int/slice, mask, or column list."""
        if isinstance(key, str):
            if key in self._cols or key in self._computed_cols:
                return Column(self, key)
@@ -4271,6 +4295,7 @@ def __getattr__(self, s: str):
    # ------------------------------------------------------------------

    def compact(self):
        """Rewrite all columns keeping only live rows, physically reclaiming deleted-row storage."""
        if self._read_only:
            raise ValueError("Table is read-only (opened with mode='r').")
        if self.base is not None:
@@ -5656,6 +5681,7 @@ def _load_initial_data(self, new_data) -> None:
        self.extend(new_data)

    def append(self, data: list | np.void | np.ndarray) -> None:
        """Append a single row to the table."""
        if self._read_only:
            raise ValueError("Table is read-only (opened with mode='r').")
        if self.base is not None:
@@ -5688,6 +5714,7 @@ def append(self, data: list | np.void | np.ndarray) -> None:
        self._mark_all_indexes_stale()

    def delete(self, ind: int | slice | str | Iterable) -> None:
        """Mark one or more rows as deleted (tombstone); physical storage is reclaimed by :meth:`compact`."""
        if self._read_only:
            raise ValueError("Table is read-only (opened with mode='r').")
        if self.base is not None:
@@ -5710,6 +5737,7 @@ def delete(self, ind: int | slice | str | Iterable) -> None:
        self._storage.bump_visibility_epoch()

    def extend(self, data: list | CTable | Any, *, validate: bool | None = None) -> None:  # noqa: C901
        """Append multiple rows at once from a dict of arrays, a list of rows, or another :class:`CTable`."""
        if self._read_only:
            raise ValueError("Table is read-only (opened with mode='r').")
        if self.base is not None: