Advanced
========
:code:`rashdf` provides convenience methods for generating
Zarr metadata for HEC-RAS HDF5 files. This is particularly useful
for working with stochastic ensemble simulations, where many
HEC-RAS HDF5 files are generated for different model realizations,
forcing scenarios, or other sources of uncertainty.

To illustrate this, consider a set of HEC-RAS HDF5 files stored
in an S3 bucket, where each file represents a different simulation
of a river model. We can generate Zarr metadata for each simulation
and then combine the metadata into a single Kerchunk metadata file
that includes a new "sim" dimension. This combined metadata file
can then be used to open a single Zarr dataset that includes all
simulations.

The cell timeseries output for a single simulation might look
something like this::

    >>> from rashdf import RasPlanHdf
    >>> plan_hdf = RasPlanHdf.open_uri("s3://bucket/simulations/1/BigRiver.p01.hdf")
    >>> plan_hdf.mesh_cells_timeseries_output("BigRiverMesh1")
    <xarray.Dataset> Size: 66MB
    Dimensions: (time: 577, cell_id: 14188)
    Coordinates:
      * time     (time) datetime64[ns] 5kB 1996-01-14...
      * cell_id  (cell_id) int64 114kB 0 1 ... 14187
    Data variables:
        Water Surface                        (time, cell_id) float32 33MB dask.array<chunksize=(3, 14188), meta=np.ndarray>
        Cell Cumulative Precipitation Depth  (time, cell_id) float32 33MB dask.array<chunksize=(3, 14188), meta=np.ndarray>
    Attributes:
        mesh_name: BigRiverMesh1
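Once opened, the single-simulation dataset can be reduced along its
:code:`time` dimension like any other :code:`xarray` dataset. The sketch
below uses a small synthetic dataset standing in for the S3-backed output
above (integer time steps and a handful of cells, purely illustrative),
and computes the per-cell maximum water surface elevation:

```python
import numpy as np
import xarray as xr

# Small synthetic stand-in for the mesh_cells_timeseries_output() result
# (real dims are time x cell_id; the variable name matches the repr above)
rng = np.random.default_rng(42)
ds = xr.Dataset(
    {"Water Surface": (("time", "cell_id"), rng.uniform(10.0, 20.0, size=(5, 4)))},
    coords={"time": np.arange(5), "cell_id": np.arange(4)},
    attrs={"mesh_name": "BigRiverMesh1"},
)

# Per-cell maximum water surface over the simulation period
max_ws = ds["Water Surface"].max(dim="time")
```

The result is a one-dimensional :code:`DataArray` indexed by
:code:`cell_id`, which is a convenient input for mapping peak
inundation over the mesh.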

Note that the example below requires installation of the optional
libraries :code:`kerchunk`, :code:`zarr`, :code:`fsspec`, and :code:`s3fs`::

    from rashdf import RasPlanHdf
    from kerchunk.combine import MultiZarrToZarr
    import json

    # Example S3 URL pattern for HEC-RAS plan HDF5 files
    s3_url_pattern = "s3://bucket/simulations/{sim}/BigRiver.p01.hdf"

    zmeta_files = []
    sims = list(range(1, 11))

    # Generate Zarr metadata for each simulation
    for sim in sims:
        s3_url = s3_url_pattern.format(sim=sim)
        plan_hdf = RasPlanHdf.open_uri(s3_url)
        zmeta = plan_hdf.zmeta_mesh_cells_timeseries_output("BigRiverMesh1")
        json_file = f"BigRiver.{sim}.p01.hdf.json"
        with open(json_file, "w") as f:
            json.dump(zmeta, f)
        zmeta_files.append(json_file)

    # Combine the Zarr metadata files into a single Kerchunk metadata file
    # with a new "sim" dimension
    mzz = MultiZarrToZarr(zmeta_files, concat_dims=["sim"], coo_map={"sim": sims})
    mzz_dict = mzz.translate()

    with open("BigRiver.combined.p01.json", "w") as f:
        json.dump(mzz_dict, f)

Now, we can open the combined dataset with :code:`xarray`::

    import xarray as xr

    ds = xr.open_dataset(
        "reference://",
        engine="zarr",
        backend_kwargs={
            "consolidated": False,
            "storage_options": {"fo": "BigRiver.combined.p01.json"},
        },
        chunks="auto",
    )

The resulting combined dataset includes a new :code:`sim` dimension::

    <xarray.Dataset> Size: 674MB
    Dimensions: (sim: 10, time: 577, cell_id: 14606)
    Coordinates:
      * cell_id  (cell_id) int64 117kB 0 1 ... 14605
      * sim      (sim) int64 80B 1 2 3 4 5 6 7 8 9 10
      * time     (time) datetime64[ns] 5kB 1996-01-14...
    Data variables:
        Cell Cumulative Precipitation Depth  (sim, time, cell_id) float32 337MB dask.array<chunksize=(10, 228, 14606), meta=np.ndarray>
        Water Surface                        (sim, time, cell_id) float32 337MB dask.array<chunksize=(10, 228, 14606), meta=np.ndarray>
    Attributes:
        mesh_name: BigRiverMesh1
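
With the :code:`sim` dimension in place, ensemble statistics reduce to
ordinary :code:`xarray` operations. The sketch below uses a small
synthetic dataset with the same dimension order as the combined dataset
above (the shapes and values are illustrative, not from a real model),
and computes the ensemble mean and spread, plus a single-realization
selection:

```python
import numpy as np
import xarray as xr

# Small synthetic stand-in for the combined dataset (sim x time x cell_id)
rng = np.random.default_rng(0)
ds = xr.Dataset(
    {"Water Surface": (("sim", "time", "cell_id"), rng.uniform(10.0, 20.0, size=(3, 4, 5)))},
    coords={"sim": [1, 2, 3], "time": np.arange(4), "cell_id": np.arange(5)},
)

# Ensemble mean and range across the "sim" dimension
ens_mean = ds["Water Surface"].mean(dim="sim")
ens_range = ds["Water Surface"].max(dim="sim") - ds["Water Surface"].min(dim="sim")

# Pull out a single realization by its coordinate label
sim_2 = ds.sel(sim=2)
```

Because the combined dataset is backed by Dask arrays, these reductions
are evaluated lazily and only read the HDF5 chunks they need.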