Commit bbd9823

Add sections in tutorials and examples
1 parent c9deed3 commit bbd9823

2 files changed

Lines changed: 116 additions & 47 deletions

doc/getting_started/tutorials/13.containers.ipynb

Lines changed: 70 additions & 46 deletions
@@ -5,22 +5,7 @@
    "id": "cell-01",
    "metadata": {},
    "source": [
-    "# Working with Containers\n",
-    "\n",
-    "This notebook is a guided tour of the main data containers in `python-blosc2`.\n",
-    "\n",
-    "The goal is to build a practical mental model first: what each container is, how the containers relate, and when each one is the right tool.\n",
-    "\n",
-    "We will cover these containers in this order:\n",
-    "\n",
-    "1. `SChunk`\n",
-    "2. `NDArray`\n",
-    "3. `ObjectArray`\n",
-    "4. `BatchArray`\n",
-    "5. `EmbedStore`\n",
-    "6. `DictStore`\n",
-    "7. `TreeStore`\n",
-    "8. `C2Array`"
+    "# Working with Containers\n\nThis notebook is a guided tour of the main data containers in `python-blosc2`.\n\nThe goal is to build a practical mental model first: what each container is, how the containers relate, and when each one is the right tool.\n\nWe will cover these containers in this order:\n\n1. `SChunk`\n2. `NDArray`\n3. `ObjectArray`\n4. `BatchArray`\n5. `EmbedStore`\n6. `DictStore`\n7. `TreeStore` (including inline `CTable` support)\n8. `C2Array`"
    ]
   },
   {
@@ -444,6 +429,73 @@
     "    show(\"/exp/run2/data\", tstore[\"/exp/run2/data\"][:])"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "cell-17b",
+   "metadata": {},
+   "source": [
+    "### Storing CTables inside a TreeStore\n",
+    "\n",
+    "A `TreeStore` can hold **both NDArrays and CTables** in the same bundle. A `CTable` is stored inline as a named subtree — all its columns, metadata, and index sidecars live as ordinary Blosc2 leaves inside the outer store. From the outside it appears as a single key, exactly like any other leaf:\n",
+    "\n",
+    "* `ts[\"/table\"] = ctable` — stores the CTable inline (same syntax as NDArray).\n",
+    "* `ts[\"/table\"]` — returns a `CTable` object transparently.\n",
+    "* `\"/table/_meta\" not in ts` — internal keys are hidden from normal traversal.\n",
+    "* `del ts[\"/table\"]` — removes the whole object and all its leaves at once.\n",
+    "\n",
+    "The inline layout means there are **no nested ZIP files**: all leaves are flat members of the outer `.b2z` archive and can be opened by offset without extraction."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "cell-17c",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from dataclasses import dataclass\n",
+    "\n",
+    "\n",
+    "@dataclass\n",
+    "class Reading:\n",
+    "    sensor_id: int = 0\n",
+    "    value: float = 0.0\n",
+    "\n",
+    "\n",
+    "bundle_path = reset(\"bundle.b2z\")\n",
+    "\n",
+    "# --- Write: mix NDArrays and CTables in one bundle ----------------------\n",
+    "t = blosc2.CTable(Reading)\n",
+    "for i in range(6):\n",
+    "    t.append(Reading(sensor_id=i, value=round(i * 1.1, 2)))\n",
+    "\n",
+    "with blosc2.TreeStore(bundle_path, mode=\"w\") as ts:\n",
+    "    ts[\"/raw/signal\"] = np.arange(8, dtype=np.float32)\n",
+    "    ts[\"/tables/readings\"] = t  # CTable stored inline\n",
+    "    show(\"keys after write\", sorted(ts.keys()))\n",
+    "    show(\"/tables/readings/_meta in ts (hidden)\", \"/tables/readings/_meta\" in ts)\n",
+    "\n",
+    "# --- Read back from the .b2z archive ------------------------------------\n",
+    "with blosc2.open(bundle_path, mode=\"r\") as ts:\n",
+    "    readings = ts[\"/tables/readings\"]  # returns CTable transparently\n",
+    "    show(\"type\", type(readings).__name__)\n",
+    "    show(\"rows\", len(readings))\n",
+    "    show(\"sensor_id\", list(readings[\"sensor_id\"][:]))\n",
+    "    show(\"value\", list(readings[\"value\"][:]))\n",
+    "\n",
+    "# --- Append a row in-place (append mode) --------------------------------\n",
+    "with blosc2.TreeStore(bundle_path, mode=\"a\") as ts:\n",
+    "    r = ts[\"/tables/readings\"]\n",
+    "    r.append(Reading(sensor_id=99, value=-1.0))\n",
+    "    r.close()  # optional; outer store also closes it on __exit__\n",
+    "    show(\"rows after append\", len(ts[\"/tables/readings\"]))\n",
+    "\n",
+    "# --- Delete the CTable (all internal leaves removed) -------------------\n",
+    "with blosc2.TreeStore(bundle_path, mode=\"a\") as ts:\n",
+    "    del ts[\"/tables/readings\"]\n",
+    "    show(\"keys after delete\", sorted(ts.keys()))"
+   ]
+  },
   {
    "cell_type": "markdown",
    "id": "cell-18",
@@ -494,43 +546,15 @@
    "id": "cell-20",
    "metadata": {},
    "source": [
-    "## Choosing The Right Container\n",
-    "\n",
-    "| Container | Backing idea | Best for |\n",
-    "| --- | --- | --- |\n",
-    "| `SChunk` | raw compressed chunks | direct chunk-level storage control |\n",
-    "| `NDArray` | `SChunk` plus array metadata | dense numeric arrays |\n",
-    "| `ObjectArray` | one variable-length entry per chunk | ragged or heterogeneous Python values |\n",
-    "| `BatchArray` | one batch per chunk | batch-oriented ingestion and access |\n",
-    "| `EmbedStore` | one bundled object store | packaging a few Blosc2 objects together |\n",
-    "| `DictStore` | keyed collection of leaves | portable multi-object datasets |\n",
-    "| `TreeStore` | hierarchical keyed collection | tree-structured datasets |\n",
-    "| `C2Array` | remote array handle | arrays hosted by a remote Caterva2 service |\n",
-    "\n",
-    "A simple rule of thumb is:\n",
-    "\n",
-    "- start with `NDArray` for dense numeric data\n",
-    "- drop down to `SChunk` if you need chunk-level control\n",
-    "- use `ObjectArray` or `BatchArray` for variable-length Python objects\n",
-    "- use `EmbedStore`, `DictStore`, or `TreeStore` when your dataset contains multiple objects"
+    "## Choosing The Right Container\n\n| Container | Backing idea | Best for |\n| --- | --- | --- |\n| `SChunk` | raw compressed chunks | direct chunk-level storage control |\n| `NDArray` | `SChunk` plus array metadata | dense numeric arrays |\n| `ObjectArray` | one variable-length entry per chunk | ragged or heterogeneous Python values |\n| `BatchArray` | one batch per chunk | batch-oriented ingestion and access |\n| `EmbedStore` | one bundled object store | packaging a few Blosc2 objects together |\n| `DictStore` | keyed collection of leaves | portable multi-object datasets |\n| `TreeStore` | hierarchical keyed collection | tree-structured datasets with NDArrays and/or CTables |\n| `C2Array` | remote array handle | arrays hosted by a remote Caterva2 service |\n\nA simple rule of thumb is:\n\n- start with `NDArray` for dense numeric data\n- drop down to `SChunk` if you need chunk-level control\n- use `ObjectArray` or `BatchArray` for variable-length Python objects\n- use `EmbedStore`, `DictStore`, or `TreeStore` when your dataset contains multiple objects"
    ]
   },
   {
    "cell_type": "markdown",
    "id": "cell-21",
    "metadata": {},
    "source": [
-    "## Final Notes\n",
-    "\n",
-    "This notebook is intentionally organized from low-level storage to higher-level organization:\n",
-    "\n",
-    "- understand `SChunk` first\n",
-    "- use `NDArray` for most dense numeric workloads\n",
-    "- move to `ObjectArray` or `BatchArray` when entries stop being fixed-size arrays\n",
-    "- use `EmbedStore`, `DictStore`, or `TreeStore` when you need to package multiple objects together\n",
-    "- use `C2Array` when the data lives on a remote service\n",
-    "\n",
-    "For deeper details on a specific class, continue with the reference docs and the dedicated tutorials for `ObjectArray`, `BatchArray`, and indexing."
+    "## Final Notes\n\nThis notebook is intentionally organized from low-level storage to higher-level organization:\n\n- understand `SChunk` first\n- use `NDArray` for most dense numeric workloads\n- move to `ObjectArray` or `BatchArray` when entries stop being fixed-size arrays\n- use `EmbedStore`, `DictStore`, or `TreeStore` when you need to package multiple objects together\n- use `TreeStore` + `CTable` together when your bundle mixes dense arrays with structured tables\n- use `C2Array` when the data lives on a remote service\n\nFor deeper details on a specific class, continue with the reference docs and the dedicated tutorials for `ObjectArray`, `BatchArray`, and indexing."
    ]
   },
   {
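
Note on the "opened by offset without extraction" claim in the new markdown cell: the sketch below illustrates the general idea using only the standard library. It assumes the bundle behaves like an ordinary ZIP archive whose members are stored uncompressed; the member names and payloads are hypothetical placeholders, not the layout `python-blosc2` actually writes, and `member_payload_offset` is a helper invented for this illustration.

```python
import io
import struct
import zipfile


def member_payload_offset(raw: bytes, info: zipfile.ZipInfo) -> int:
    """Return the byte offset where a member's data starts in the archive."""
    # A ZIP local file header has 30 fixed bytes; the name length and the
    # extra-field length sit at byte offsets 26 and 28 of that header.
    name_len, extra_len = struct.unpack_from("<HH", raw, info.header_offset + 26)
    return info.header_offset + 30 + name_len + extra_len


# Build a small in-memory archive with stored (uncompressed) members.
# The names below are made up for the example.
buf = io.BytesIO()
with zipfile.ZipFile(buf, mode="w", compression=zipfile.ZIP_STORED) as zf:
    zf.writestr("raw/signal.b2nd", b"SIGNAL-BYTES")
    zf.writestr("tables/readings/_meta.b2f", b"META-BYTES")
    zf.writestr("tables/readings/_cols/value.b2nd", b"VALUE-BYTES")

raw = buf.getvalue()

# Read each member by slicing the archive bytes directly, without extraction.
slices = {}
with zipfile.ZipFile(io.BytesIO(raw)) as zf:
    for info in zf.infolist():
        start = member_payload_offset(raw, info)
        slices[info.filename] = raw[start:start + info.file_size]

for name, payload in slices.items():
    print(name, payload)
```

Because every leaf is a top-level member of one archive, a single offset and length is enough to locate it; a nested archive would need to be unpacked (or its own directory parsed) first.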

examples/tree-store.py

Lines changed: 46 additions & 1 deletion
@@ -5,7 +5,9 @@
 # SPDX-License-Identifier: BSD-3-Clause
 #######################################################################
 
-# Example usage of TreeStore with hierarchical navigation and vlmeta
+# Example usage of TreeStore with hierarchical navigation, vlmeta, and CTables
+
+from dataclasses import dataclass
 
 import numpy as np
 
@@ -66,3 +68,46 @@
 rsub = tstore2["/child0"]
 print("/child0/new_leaf via subtree:", rsub["/new_leaf"][:])
 print(f"TreeStore file at: {tstore2.localpath}")
+
+# ---------------------------------------------------------------------------
+# Mixing NDArrays and CTables in the same TreeStore
+# ---------------------------------------------------------------------------
+
+
+@dataclass
+class Reading:
+    sensor_id: int = 0
+    value: float = 0.0
+
+
+with blosc2.TreeStore("example_tree.b2z", mode="a") as ts:
+    # Create a small CTable in memory and store it inline
+    t = blosc2.CTable(Reading)
+    for i in range(5):
+        t.append(Reading(sensor_id=i, value=float(i) * 1.1))
+
+    # Assignment syntax is identical to NDArray
+    ts["/readings"] = t
+    print("Keys after adding CTable:", sorted(ts.keys()))
+
+    # Object internals are hidden from normal traversal
+    print("/readings/_meta in ts:", "/readings/_meta" in ts)  # False
+
+with blosc2.open("example_tree.b2z", mode="r") as ts:
+    # CTable is returned transparently; no special open call needed
+    readings = ts["/readings"]
+    print(f"CTable type: {type(readings).__name__}, rows: {len(readings)}")
+    print("sensor_id column:", list(readings["sensor_id"][:]))
+    print("value column :", list(readings["value"][:]))
+
+# Append a row to an inline CTable via append mode
+with blosc2.TreeStore("example_tree.b2z", mode="a") as ts:
+    readings = ts["/readings"]
+    readings.append(Reading(sensor_id=99, value=-1.0))
+    readings.close()  # explicit close before outer store repacks
+    print("After append, rows:", len(ts["/readings"]))
+
+# Delete the CTable object root (removes all internal leaves)
+with blosc2.TreeStore("example_tree.b2z", mode="a") as ts:
+    del ts["/readings"]
+    print("After deleting /readings:", sorted(ts.keys()))
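
Note on the key-hiding behavior exercised above: the example relies on an object root appearing as one key while its internal leaves stay out of `keys()` and `in` checks, and on `del` removing the whole subtree. The toy model below sketches that contract in plain Python. `ToyStore`, `put_leaf`, and `put_object` are hypothetical names invented for this illustration; this is not how `TreeStore` is implemented, only one simple way such semantics could work.

```python
class ToyStore:
    """Flat key -> value mapping that shows some prefixes as single objects."""

    def __init__(self):
        self._leaves = {}      # every physical leaf, internal ones included
        self._objects = set()  # roots that should appear as one key

    def put_leaf(self, key, value):
        """Store an ordinary leaf under its full key."""
        self._leaves[key] = value

    def put_object(self, root, parts):
        """Store a multi-leaf object (e.g. a table with columns) under root."""
        for name, value in parts.items():
            self._leaves[f"{root}/{name}"] = value
        self._objects.add(root)

    def _owner(self, key):
        """Return the object root that owns key, or None for a plain leaf."""
        for root in self._objects:
            if key.startswith(root + "/"):
                return root
        return None

    def keys(self):
        """List plain leaves plus object roots; internal leaves are hidden."""
        visible = {k for k in self._leaves if self._owner(k) is None}
        return sorted(visible | self._objects)

    def __contains__(self, key):
        return key in self.keys()

    def __delitem__(self, key):
        if key in self._objects:
            # Removing an object root drops every leaf underneath it.
            for leaf in [k for k in self._leaves if self._owner(k) == key]:
                del self._leaves[leaf]
            self._objects.remove(key)
        else:
            del self._leaves[key]


store = ToyStore()
store.put_leaf("/raw/signal", [0.0, 1.0, 2.0])
store.put_object("/tables/readings", {"_meta": {"rows": 2}, "_cols/value": [0.0, 1.1]})

keys_after_write = store.keys()
meta_visible = "/tables/readings/_meta" in store
del store["/tables/readings"]
keys_after_delete = store.keys()

print(keys_after_write)   # ['/raw/signal', '/tables/readings']
print(meta_visible)       # False
print(keys_after_delete)  # ['/raw/signal']
```

Keeping the physical leaves flat and tracking object roots separately is what lets the store hide internals without nesting one container inside another.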
