
Commit 6b45130

Merge pull request #59 from fema-ffrd/feature/zmeta
Zarr Metadata
2 parents 000829f + 497bdeb commit 6b45130

19 files changed: +19205 -43 lines

docs/source/API.rst

Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
+API
+===
+.. toctree::
+   :maxdepth: 1
+
+   RasGeomHdf
+   RasPlanHdf
+   RasHdf
+
+:code:`rashdf` provides two primary classes for reading data from
+HEC-RAS geometry and plan HDF files: :code:`RasGeomHdf` and :code:`RasPlanHdf`.
+Both of these classes inherit from the :code:`RasHdf` base class, which
+inherits from the :code:`h5py.File` class.
+
+Note that :code:`RasPlanHdf` inherits from :code:`RasGeomHdf`, so all of the
+methods available in :code:`RasGeomHdf` are also available in :code:`RasPlanHdf`.

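Because of this class hierarchy, a single open plan file exposes both the geometry and the plan-output API. A minimal sketch of what that looks like in practice (the file path and mesh name are placeholders, and the argument-free call to river_reaches() is an assumption rather than something documented in this diff):

    from rashdf import RasPlanHdf

    # Placeholder path; the file is opened read-only through h5py.File.
    with RasPlanHdf("BigRiver.p01.hdf") as plan_hdf:
        # Geometry method inherited from RasGeomHdf (no-argument call assumed).
        reaches = plan_hdf.river_reaches()
        # Plan-output method defined on RasPlanHdf itself.
        wsel = plan_hdf.mesh_cells_timeseries_output("BigRiverMesh1")
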
docs/source/Advanced.rst

Lines changed: 91 additions & 0 deletions
@@ -0,0 +1,91 @@
+Advanced
+========
+:code:`rashdf` provides convenience methods for generating
+Zarr metadata for HEC-RAS HDF5 files. This is particularly useful
+for working with stochastic ensemble simulations, where many
+HEC-RAS HDF5 files are generated for different model realizations,
+forcing scenarios, or other sources of uncertainty.
+
+To illustrate this, consider a set of HEC-RAS HDF5 files stored
+in an S3 bucket, where each file represents a different simulation
+of a river model. We can generate Zarr metadata for each simulation
+and then combine the metadata into a single Kerchunk metadata file
+that includes a new "sim" dimension. This combined metadata file
+can then be used to open a single Zarr dataset that includes all
+simulations.
+
+The cell timeseries output for a single simulation might look
+something like this::
+
+    >>> from rashdf import RasPlanHdf
+    >>> plan_hdf = RasPlanHdf.open_uri("s3://bucket/simulations/1/BigRiver.p01.hdf")
+    >>> plan_hdf.mesh_cells_timeseries_output("BigRiverMesh1")
+    <xarray.Dataset> Size: 66MB
+    Dimensions:                              (time: 577, cell_id: 14188)
+    Coordinates:
+      * time                                 (time) datetime64[ns] 5kB 1996-01-14...
+      * cell_id                              (cell_id) int64 114kB 0 1 ... 14187
+    Data variables:
+        Water Surface                        (time, cell_id) float32 33MB dask.array<chunksize=(3, 14188), meta=np.ndarray>
+        Cell Cumulative Precipitation Depth  (time, cell_id) float32 33MB dask.array<chunksize=(3, 14188), meta=np.ndarray>
+    Attributes:
+        mesh_name: BigRiverMesh1
+
+Note that the example below requires installation of the optional
+libraries :code:`kerchunk`, :code:`zarr`, :code:`fsspec`, and :code:`s3fs`::
+
+    from rashdf import RasPlanHdf
+    from kerchunk.combine import MultiZarrToZarr
+    import json
+
+    # Example S3 URL pattern for HEC-RAS plan HDF5 files
+    s3_url_pattern = "s3://bucket/simulations/{sim}/BigRiver.p01.hdf"
+
+    zmeta_files = []
+    sims = list(range(1, 11))
+
+    # Generate Zarr metadata for each simulation
+    for sim in sims:
+        s3_url = s3_url_pattern.format(sim=sim)
+        plan_hdf = RasPlanHdf.open_uri(s3_url)
+        zmeta = plan_hdf.zmeta_mesh_cells_timeseries_output("BigRiverMesh1")
+        json_file = f"BigRiver.{sim}.p01.hdf.json"
+        with open(json_file, "w") as f:
+            json.dump(zmeta, f)
+        zmeta_files.append(json_file)
+
+    # Combine Zarr metadata files into a single Kerchunk metadata file
+    # with a new "sim" dimension
+    mzz = MultiZarrToZarr(zmeta_files, concat_dims=["sim"], coo_map={"sim": sims})
+    mzz_dict = mzz.translate()
+
+    with open("BigRiver.combined.p01.json", "w") as f:
+        json.dump(mzz_dict, f)
+
+Now, we can open the combined dataset with :code:`xarray`::
+
+    import xarray as xr
+
+    ds = xr.open_dataset(
+        "reference://",
+        engine="zarr",
+        backend_kwargs={
+            "consolidated": False,
+            "storage_options": {"fo": "BigRiver.combined.p01.json"},
+        },
+        chunks="auto",
+    )
+
+The resulting combined dataset includes a new :code:`sim` dimension::
+
+    <xarray.Dataset> Size: 674MB
+    Dimensions:                              (sim: 10, time: 577, cell_id: 14606)
+    Coordinates:
+      * cell_id                              (cell_id) int64 117kB 0 1 ... 14605
+      * sim                                  (sim) int64 80B 1 2 3 4 5 6 7 8 9 10
+      * time                                 (time) datetime64[ns] 5kB 1996-01-14...
+    Data variables:
+        Cell Cumulative Precipitation Depth  (sim, time, cell_id) float32 337MB dask.array<chunksize=(10, 228, 14606), meta=np.ndarray>
+        Water Surface                        (sim, time, cell_id) float32 337MB dask.array<chunksize=(10, 228, 14606), meta=np.ndarray>
+    Attributes:
+        mesh_name: BigRiverMesh1

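With the "sim" dimension in place, ordinary xarray selection and reduction work across the ensemble. A short follow-on sketch that reuses the combined reference file written above (the variable names here are purely illustrative):

    import xarray as xr

    ds = xr.open_dataset(
        "reference://",
        engine="zarr",
        backend_kwargs={
            "consolidated": False,
            "storage_options": {"fo": "BigRiver.combined.p01.json"},
        },
        chunks="auto",
    )

    # Water surface time series for a single realization.
    ws_sim3 = ds["Water Surface"].sel(sim=3)

    # Per-cell maximum water surface across all simulations and times.
    max_ws = ds["Water Surface"].max(dim=["sim", "time"]).compute()
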
docs/source/RasGeomHdf.rst

Lines changed: 2 additions & 1 deletion
@@ -1,5 +1,6 @@
 RasGeomHdf
 ==========
+
 .. currentmodule:: rashdf
 .. autoclass:: RasGeomHdf
     :show-inheritance:
@@ -21,6 +22,6 @@ RasGeomHdf
     get_geom_structures_attrs,
     get_geom_2d_flow_area_attrs,
     cross_sections_elevations,
-    cross_sections
+    cross_sections,
     river_reaches

docs/source/RasPlanHdf.rst

Lines changed: 9 additions & 1 deletion
@@ -13,6 +13,10 @@ RasPlanHdf
     mesh_max_ws_err,
     mesh_max_iter,
     mesh_last_iter,
+    mesh_cells_summary_output,
+    mesh_faces_summary_output,
+    mesh_cells_timeseries_output,
+    mesh_faces_timeseries_output,
     reference_lines,
     reference_lines_names,
     reference_points,
@@ -31,4 +35,8 @@ RasPlanHdf
     cross_sections_flow,
     cross_sections_wsel,
     steady_flow_names,
-    steady_profile_xs_output
+    steady_profile_xs_output,
+    zmeta_mesh_cells_timeseries_output,
+    zmeta_mesh_faces_timeseries_output,
+    zmeta_reference_lines_timeseries_output,
+    zmeta_reference_points_timeseries_output

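The new zmeta_* methods listed above produce Kerchunk-style reference metadata, as demonstrated for cells in Advanced.rst. A sketch of the face-based variant, assuming it mirrors zmeta_mesh_cells_timeseries_output (mesh name in, JSON-serializable dict out); the diff lists the method but not its signature:

    from rashdf import RasPlanHdf
    import json

    plan_hdf = RasPlanHdf.open_uri("s3://bucket/simulations/1/BigRiver.p01.hdf")

    # Assumed to accept the mesh name like its cell-based counterpart.
    zmeta_faces = plan_hdf.zmeta_mesh_faces_timeseries_output("BigRiverMesh1")

    with open("BigRiver.1.p01.faces.json", "w") as f:
        json.dump(zmeta_faces, f)
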
docs/source/conf.py

Lines changed: 2 additions & 0 deletions
@@ -30,6 +30,8 @@
 templates_path = ["_templates"]
 exclude_patterns = []
 
+master_doc = "index"
+
 
 # -- Options for HTML output -------------------------------------------------
 # https://www.sphinx-doc.org/en/master/usage/configuration.html#options-for-html-output

docs/source/index.rst

Lines changed: 6 additions & 18 deletions
@@ -11,6 +11,12 @@ HDF5 files. It is a wrapper around the :code:`h5py` library, and provides an int
 convenience functions for reading key HEC-RAS geometry data, output data,
 and metadata.
 
+.. toctree::
+   :maxdepth: 2
+
+   API
+   Advanced
+
 Installation
 ============
 With :code:`pip`::
@@ -82,21 +88,3 @@ credentials)::
     'Simulation Start Time': datetime.datetime(1996, 1, 14, 12, 0),
     'Time Window': [datetime.datetime(1996, 1, 14, 12, 0),
                     datetime.datetime(1996, 2, 7, 12, 0)]}
-
-
-API
-===
-.. toctree::
-   :maxdepth: 1
-
-   RasGeomHdf
-   RasPlanHdf
-   RasHdf
-
-:code:`rashdf` provides two primary classes for reading data from
-HEC-RAS geometry and plan HDF files: :code:`RasGeomHdf` and :code:`RasPlanHdf`.
-Both of these classes inherit from the :code:`RasHdf` base class, which
-inherits from the :code:`h5py.File` class.
-
-Note that :code:`RasPlanHdf` inherits from :code:`RasGeomHdf`, so all of the
-methods available in :code:`RasGeomHdf` are also available in :code:`RasPlanHdf`.

pyproject.toml

Lines changed: 2 additions & 2 deletions
@@ -12,11 +12,11 @@ classifiers = [
     "Programming Language :: Python :: 3.11",
     "Programming Language :: Python :: 3.12",
 ]
-version = "0.5.0"
+version = "0.6.0"
 dependencies = ["h5py", "geopandas>=1.0,<2.0", "pyarrow", "xarray"]
 
 [project.optional-dependencies]
-dev = ["pre-commit", "ruff", "pytest", "pytest-cov", "fiona"]
+dev = ["pre-commit", "ruff", "pytest", "pytest-cov", "fiona", "kerchunk", "zarr", "dask", "fsspec", "s3fs"]
 docs = ["sphinx", "numpydoc", "sphinx_rtd_theme"]
 
 [project.urls]

src/rashdf/base.py

Lines changed: 4 additions & 1 deletion
@@ -19,6 +19,7 @@ def __init__(self, name: str, **kwargs):
             Additional keyword arguments to pass to h5py.File
         """
         super().__init__(name, mode="r", **kwargs)
+        self._loc = name
 
     @classmethod
     def open_uri(
@@ -49,7 +50,9 @@ def open_uri(
         import fsspec
 
         remote_file = fsspec.open(uri, mode="rb", **fsspec_kwargs)
-        return cls(remote_file.open(), **h5py_kwargs)
+        result = cls(remote_file.open(), **h5py_kwargs)
+        result._loc = uri
+        return result
 
     def get_attrs(self, attr_path: str) -> Dict:
         """Convert attributes from a HEC-RAS HDF file into a Python dictionary for a given attribute path.

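Since open_uri now records the source URI on the returned object and forwards **fsspec_kwargs to fsspec.open, remote files can presumably be opened with filesystem-specific options. A sketch under that assumption (the fsspec_kwargs keyword name is inferred from this hunk, not from the full signature, which the diff does not show):

    from rashdf import RasPlanHdf

    # Anonymous S3 access passed through to fsspec.open (keyword name assumed).
    plan_hdf = RasPlanHdf.open_uri(
        "s3://bucket/simulations/1/BigRiver.p01.hdf",
        fsspec_kwargs={"anon": True},
    )

    # The new _loc attribute keeps the URI the file was opened from.
    print(plan_hdf._loc)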