1 change: 1 addition & 0 deletions docs/changes/2999.maintenance.rst
@@ -0,0 +1 @@
+Reorganized the chunk and statistics containers.
52 changes: 20 additions & 32 deletions src/ctapipe/containers.py
@@ -61,6 +61,7 @@
     "TelescopePointingContainer",
     "ArrayPointingContainer",
     "StatisticsContainer",
+    "ChunkContainer",
     "ChunkStatisticsContainer",
     "ImageStatisticsContainer",
     "IntensityStatisticsContainer",
@@ -1237,46 +1238,33 @@ class CameraCalibrationContainer(Container):


 class StatisticsContainer(Container):
-    """Store descriptive statistics of a pixel-wise quantity for each channel"""
+    """Container for descriptive statistics"""

-    mean = Field(
-        None,
-        "mean of a pixel-wise quantity for each channel"
-        "Type: float; Shape: (n_channels, n_pixel)",
-    )
-    median = Field(
-        None,
-        "median of a pixel-wise quantity for each channel"
-        "Type: float; Shape: (n_channels, n_pixel)",
-    )
-    std = Field(
-        None,
-        "standard deviation of a pixel-wise quantity for each channel"
-        "Type: float; Shape: (n_channels, n_pixel)",
-    )
-    n_events = Field(-1, "number of events used for the extraction of the statistics")
-    outlier_mask = Field(
-        None,
-        "Boolean mask indicating which pixels are considered outliers."
-        " Shape: (n_channels, n_pixels)",
-    )
-    is_valid = Field(
-        False,
-        (
-            "True if the pixel statistics are valid, False if they are not valid or "
-            "if a high fraction of faulty pixels exceeding the pre-defined threshold "
-            "is detected across the chunk of images."
-        ),
-    )
+    mean = Field(None, "mean value")
+    median = Field(None, "median value")
+    std = Field(None, "standard deviation")


-class ChunkStatisticsContainer(StatisticsContainer):
-    """Store descriptive statistics of a chunk of images"""
+class ChunkContainer(Container):
+    """Store values of a chunk"""

     time_start = Field(NAN_TIME, "high resolution start time of the chunk")
     time_end = Field(NAN_TIME, "high resolution end time of the chunk")
     event_id_start = Field(None, "event id of the first event of the chunk")
     event_id_end = Field(None, "event id of the last event of the chunk")
+    n_events = Field(
+        -1, "number of events used for the calculation of the chunk values"
+    )
+    outlier_mask = Field(None, "boolean mask indicating outliers in the chunk")
+    is_valid = Field(False, "true if chunk values are valid")
+
+
+class ChunkStatisticsContainer(ChunkContainer):
+    """Container for descriptive statistics of the chunk distribution"""
+
+    mean = Field(None, "mean value of the chunk distribution")
+    median = Field(None, "median value of the chunk distribution")
+    std = Field(None, "standard deviation of the chunk distribution")
Comment on lines +1262 to +1267
Member Author

@TjarkMiener TjarkMiener Apr 29, 2026


I was thinking of replacing this duplication with

class ChunkStatisticsContainer(ChunkContainer):
    """Container for descriptive statistics of the chunk distribution"""
    stats = Field(
        default_factory=StatisticsContainer,
        description="Statistical description of the chunk distribution",
    )

which would be nice but would also lead to breaking changes here and elsewhere.
@mexanick @maxnoe @kosack
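
To make the trade-off concrete, here is a minimal sketch that uses plain dataclasses as stand-ins for the ctapipe `Container` classes. All `*Sketch` names are hypothetical and the fields are reduced for brevity; this is not ctapipe code. It illustrates that nesting removes the repeated fields but moves `mean` from `container.mean` to `container.stats.mean`, which is the breaking change mentioned above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class StatisticsSketch:
    """Stand-in for StatisticsContainer (hypothetical, not ctapipe code)."""

    mean: Optional[float] = None
    median: Optional[float] = None
    std: Optional[float] = None


@dataclass
class ChunkSketch:
    """Stand-in for ChunkContainer, reduced to two fields for brevity."""

    n_events: int = -1
    is_valid: bool = False


@dataclass
class FlatChunkStatisticsSketch(ChunkSketch):
    """Layout in this PR: statistics fields are repeated on the subclass."""

    mean: Optional[float] = None
    median: Optional[float] = None
    std: Optional[float] = None


@dataclass
class NestedChunkStatisticsSketch(ChunkSketch):
    """Proposed layout: statistics live in one nested object."""

    stats: StatisticsSketch = field(default_factory=StatisticsSketch)


flat = FlatChunkStatisticsSketch(n_events=100, is_valid=True, mean=1.5)
nested = NestedChunkStatisticsSketch(
    n_events=100, is_valid=True, stats=StatisticsSketch(mean=1.5)
)

# Same information, different access path: code that reads
# `container.mean` would have to switch to `container.stats.mean`.
print(flat.mean, nested.stats.mean)
print(hasattr(nested, "mean"))
```

The `default_factory` gives every instance its own nested statistics object, so instances do not share mutable state.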

Member


What we do elsewhere is have separate containers for the index and the data and write multiple containers to the same table.

I.e. you could have the ChunkContainer, StatsContainer and a HistogramContainer and write like this:

writer.write(table, (chunk_container, stats_container))

This way, you also don't need separate containers for the interpolated result and the chunk storage.
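
The idea of writing several containers into one table row can be sketched without ctapipe. The dataclasses and the `merge_row` helper below are hypothetical and only illustrate how the columns of an index-like container and a data container combine; they do not show how the ctapipe writer is implemented.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class ChunkSketch:
    """Stand-in for the chunk (index-like) container; hypothetical."""

    event_id_start: Optional[int] = None
    event_id_end: Optional[int] = None
    n_events: int = -1


@dataclass
class StatsSketch:
    """Stand-in for the statistics (data) container; hypothetical."""

    mean: Optional[float] = None
    median: Optional[float] = None
    std: Optional[float] = None


def merge_row(containers):
    """Flatten several containers into one row, rejecting duplicate columns."""
    row = {}
    for container in containers:
        for name, value in asdict(container).items():
            if name in row:
                raise ValueError(f"duplicate column: {name}")
            row[name] = value
    return row


chunk = ChunkSketch(event_id_start=1, event_id_end=500, n_events=500)
stats = StatsSketch(mean=2.0, median=1.9, std=0.3)
row = merge_row((chunk, stats))
print(list(row))
```

Because the chunk description and the statistics stay separate, the same chunk object could be paired with a histogram container instead, without defining a new combined class.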

Member

@maxnoe maxnoe Apr 29, 2026


See e.g.:

self._writer.write(
    table_name="simulation/event/subarray/shower",
    containers=[event.index, event.simulation.shower],
)

Member Author


I think n_events should be moved to the StatisticsContainer. Then compute_stats() from the PlainAggregator and SigmaClippingAggregator should return a StatisticsContainer, and compute_histo() from the HistogramAggregator in #2996 should return a HistogramContainer.

Member


Mmh, that seems a bit weird. n_events is a property of the chunk and should be the same for all chunk aggregations, regardless of whether you compute a histogram, stats, or something else.

n_events should really be the number of events inside the chunk, which is very interesting in the case of time-based chunks.

It should also be independent of which values are actually used after e.g. outlier detection, sigma clipping, or under/overflow.

Member Author


Right, the number of events relates to the chunk, while the number of entries relates to the values actually used in the aggregation.
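
The distinction between the two counts can be illustrated with a small sketch. The function name `aggregate_chunk`, the `n_entries` key, and the single-pass clipping around the median are assumptions made for illustration; they are not the behaviour of ctapipe's aggregators. Empty chunks are not handled here.

```python
import statistics


def aggregate_chunk(values, max_sigma=2.0):
    """Return chunk-level and aggregation-level counts for one chunk.

    n_events counts every event in the chunk; n_entries counts only the
    values that survive one pass of clipping around the median.
    """
    n_events = len(values)
    center = statistics.median(values)
    spread = statistics.pstdev(values)
    kept = [v for v in values if abs(v - center) <= max_sigma * spread]
    return {
        "n_events": n_events,
        "n_entries": len(kept),
        "mean": statistics.fmean(kept),
    }


# Nine ordinary values and one obvious outlier in the same chunk.
chunk_values = [10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 10.3, 9.7, 10.0, 50.0]
result = aggregate_chunk(chunk_values)
print(result["n_events"], result["n_entries"])
```

In this sketch the outlier is dropped from the aggregation, so `n_entries` is smaller than `n_events`, while `n_events` still reports how many events fell into the chunk.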



class PixelStatisticsContainer(Container):