Allow selection of one or more members in an ensemble dataset #156

Merged: 6 commits, Dec 13, 2024
2 changes: 2 additions & 0 deletions docs/index.rst
@@ -45,6 +45,7 @@ datasets <building-introduction>`.
- :doc:`using/subsetting`
- :doc:`using/combining`
- :doc:`using/selecting`
- :doc:`using/ensembles`
- :doc:`using/grids`
- :doc:`using/zip`
- :doc:`using/statistics`
@@ -65,6 +66,7 @@ datasets <building-introduction>`.
using/subsetting
using/combining
using/selecting
using/ensembles
using/grids
using/zip
using/statistics
4 changes: 4 additions & 0 deletions docs/using/code/number1_.py
@@ -0,0 +1,4 @@
ds = open_dataset(
    dataset,
    number=1,
)
4 changes: 4 additions & 0 deletions docs/using/code/number2_.py
@@ -0,0 +1,4 @@
ds = open_dataset(
    dataset,
    number=[1, 3, 5],
)
27 changes: 27 additions & 0 deletions docs/using/ensembles.rst
@@ -0,0 +1,27 @@
.. _selecting-members:

###################
Selecting members
###################

This section describes how to subset data that are part of an ensemble.
To combine ensembles, see :ref:`ensembles` in the
:ref:`combining-datasets` section.

.. _number:

If a dataset is an ensemble, you can select one or more specific members
using the ``number`` option. You can also use ``numbers`` (an alias for
``number``), or ``member`` (with its alias ``members``). The difference
between the two pairs is the indexing convention: ``number`` is
**1-based**, while ``member`` is **0-based**, so ``number=1`` and
``member=0`` select the same member.

Select a single member:

.. literalinclude:: code/number1_.py
   :language: python

... or a list:

.. literalinclude:: code/number2_.py
   :language: python
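
As a rough illustration of the two indexing conventions, the sketch below converts both spellings to zero-based indices. The offsets mirror the ``OFFSETS`` table added in ``ensemble.py`` in this PR; the helper name ``to_zero_based`` is hypothetical and is not part of the library.

```python
# Offsets mirror the OFFSETS table in this PR: `number(s)` is 1-based,
# `member(s)` is 0-based.
OFFSETS = dict(number=1, numbers=1, member=0, members=0)


def to_zero_based(**kwargs):
    """Hypothetical helper: turn number/member options into sorted zero-based indices."""
    indices = set()
    for key, values in kwargs.items():
        # Accept a single value as well as a list or tuple.
        if not isinstance(values, (list, tuple)):
            values = [values]
        indices.update(int(v) - OFFSETS[key] for v in values)
    return sorted(indices)


by_number = to_zero_based(number=[1, 3, 5])
by_member = to_zero_based(member=[0, 2, 4])
```

Under these assumptions, ``by_number`` and ``by_member`` hold the same indices, which is why ``number=[1, 3, 5]`` and ``member=[0, 2, 4]`` are expected to select the same members.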
26 changes: 25 additions & 1 deletion docs/using/selecting.rst
@@ -67,6 +67,28 @@ You can also rename variables:
This will be useful when you join datasets and do not want variables
from one dataset to override the ones from the other.

********
number
********

If a dataset is an ensemble, you can select one or more specific members
using the ``number`` option. You can also use ``numbers`` (an alias for
``number``), or ``member`` (with its alias ``members``). The difference
between the two pairs is the indexing convention: ``number`` is
**1-based**, while ``member`` is **0-based**, so ``number=1`` and
``member=0`` select the same member.

Select a single member:

.. literalinclude:: code/number1_.py
   :language: python

... or a list:

.. literalinclude:: code/number2_.py
   :language: python

.. _rescale:

*********
rescale
*********
@@ -87,7 +109,9 @@ rescale the data.
.. warning::

   When providing units, the library assumes that the mapping between
-  them is a linear transformation. No check is does to ensure this is
+  them is a linear transformation. No check is done to ensure this is
   the case.

.. _cfunits: https://github.com/NCAS-CMS/cfunits

.. _number:
10 changes: 10 additions & 0 deletions src/anemoi/datasets/data/dataset.py
@@ -168,6 +168,16 @@ def __subset(self, **kwargs):
            bbox = kwargs.pop("area")
            return Cropping(self, bbox)._subset(**kwargs).mutate()

        if "number" in kwargs or "numbers" in kwargs or "member" in kwargs or "members" in kwargs:
            from .ensemble import Number

            members = {}
            for key in ["number", "numbers", "member", "members"]:
                if key in kwargs:
                    members[key] = kwargs.pop(key)

            return Number(self, **members)._subset(**kwargs).mutate()

        if "set_missing_dates" in kwargs:
            from .missing import MissingDates

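The branch above pops the member-selection options out of `kwargs` before forwarding the rest to `_subset`. A minimal standalone sketch of that pattern follows; `MEMBER_KEYS` and `split_member_options` are hypothetical names, and `frequency` is used only as an example of an unrelated option.

```python
# Hypothetical names, for illustration only.
MEMBER_KEYS = ("number", "numbers", "member", "members")


def split_member_options(kwargs):
    """Pop the member-selection options out of kwargs and return them as a dict."""
    return {key: kwargs.pop(key) for key in MEMBER_KEYS if key in kwargs}


kwargs = {"members": [0, 2], "frequency": "6h"}
members = split_member_options(kwargs)

# True only when at least one member-selection key is actually present.
has_member_option = any(key in {"frequency": "6h"} for key in MEMBER_KEYS)
```

Giving each key its own `in` test (or using `any(...)` as here) matters because a bare non-empty string in an `or` chain is always truthy, so the branch would otherwise be taken for every call.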
55 changes: 55 additions & 0 deletions src/anemoi/datasets/data/ensemble.py
@@ -10,13 +10,68 @@

import logging

import numpy as np

from .debug import Node
from .forwards import Forwards
from .forwards import GivenAxis
from .indexing import apply_index_to_slices_changes
from .indexing import index_to_slices
from .indexing import update_tuple
from .misc import _auto_adjust
from .misc import _open

LOG = logging.getLogger(__name__)

OFFSETS = dict(number=1, numbers=1, member=0, members=0)


class Number(Forwards):
    """Select ensemble members of the forwarded dataset (ensemble axis is axis 2)."""

    def __init__(self, forward, **kwargs):
        super().__init__(forward)

        # Convert every requested value to a zero-based member index.
        self.members = []
        for key, values in kwargs.items():
            if not isinstance(values, (list, tuple)):
                values = [values]
            self.members.extend([int(v) - OFFSETS[key] for v in values])

        # Remove duplicates and check the indices against the ensemble size.
        self.members = sorted(set(self.members))
        for n in self.members:
            if not (0 <= n < forward.shape[2]):
                raise ValueError(f"Member {n} is out of range. `number(s)` is one-based, `member(s)` is zero-based.")

        # Boolean mask over the ensemble axis, True for the selected members.
        self.mask = np.array([n in self.members for n in range(forward.shape[2])], dtype=bool)
        self._shape, _ = update_tuple(forward.shape, 2, len(self.members))

    @property
    def shape(self):
        return self._shape

    def __getitem__(self, index):
        if isinstance(index, int):
            # A single item drops the first axis, so the ensemble axis becomes axis 1.
            result = self.forward[index]
            result = result[:, self.mask, :]
            return result

        if isinstance(index, slice):
            result = self.forward[index]
            result = result[:, :, self.mask, :]
            return result

        index, changes = index_to_slices(index, self.shape)
        result = self.forward[index]
        result = result[:, :, self.mask, :]
        return apply_index_to_slices_changes(result, changes)

    def tree(self):
        # Report the selection as one-based numbers.
        return Node(self, [self.forward.tree()], numbers=[n + 1 for n in self.members])

    def metadata_specific(self):
        return {
            "numbers": [n + 1 for n in self.members],
        }


class Ensemble(GivenAxis):
    def tree(self):
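The `Number` class above selects members with a boolean mask along axis 2. The sketch below reproduces that indexing on a toy NumPy array; the `(dates, variables, members, grid points)` layout and the array sizes are assumptions chosen for illustration, not taken from the library.

```python
import numpy as np

# Toy array, assumed layout: (dates, variables, ensemble members, grid points).
data = np.arange(2 * 3 * 5 * 4).reshape(2, 3, 5, 4)

members = [0, 2, 4]  # zero-based member indices

# Boolean mask over the ensemble axis, True for the selected members.
mask = np.array([n in members for n in range(data.shape[2])], dtype=bool)

# Slice access keeps all four axes, so the mask applies to axis 2.
subset = data[:, :, mask, :]

# Integer access drops the first axis, so the mask applies to axis 1.
single_date = data[0][:, mask, :]
```

In this sketch `subset` keeps three of the five members, and `single_date` matches `subset[0]`, which is the reason the integer and slice branches of `__getitem__` index different axes.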
3 changes: 3 additions & 0 deletions src/anemoi/datasets/data/merge.py
@@ -134,6 +134,9 @@ def check_compatibility(self, d1, d2):
    def tree(self):
        return Node(self, [d.tree() for d in self.datasets], allow_gaps_in_dates=self.allow_gaps_in_dates)

    def metadata_specific(self):
        return {"allow_gaps_in_dates": self.allow_gaps_in_dates}

    @debug_indexing
    def __getitem__(self, n):
        if isinstance(n, tuple):