reorganized stats/chunk containers #2999
kosack
left a comment
Looks good, just need a changelog entry
class ChunkStatisticsContainer(ChunkContainer):
    """Container for descriptive statistics of the chunk distribution"""

    mean = Field(None, "mean value of the chunk distribution")
    median = Field(None, "median value of the chunk distribution")
    std = Field(None, "standard deviation of the chunk distribution")
I was thinking of replacing this duplication by
class ChunkStatisticsContainer(ChunkContainer):
    """Container for descriptive statistics of the chunk distribution"""

    stats = Field(
        default_factory=StatisticsContainer,
        description="Statistical description of the chunk distribution",
    )
which would be nice but would also lead to breaking changes here and elsewhere.
@mexanick @maxnoe @kosack
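To make the breaking change concrete, here is a minimal sketch using plain dataclasses as stand-ins for the ctapipe containers. The class names `FlatChunkStatistics` and `NestedChunkStatistics` are hypothetical and only illustrate how attribute access would change under the proposal.

```python
from dataclasses import dataclass, field


@dataclass
class StatisticsContainer:
    """Stand-in for the statistics container (plain dataclass, not ctapipe)."""
    mean: float = 0.0
    median: float = 0.0
    std: float = 0.0


@dataclass
class FlatChunkStatistics:
    """Hypothetical current layout: statistics fields duplicated on the chunk."""
    n_events: int = 0
    mean: float = 0.0
    median: float = 0.0
    std: float = 0.0


@dataclass
class NestedChunkStatistics:
    """Hypothetical proposed layout: statistics live in a nested container."""
    n_events: int = 0
    stats: StatisticsContainer = field(default_factory=StatisticsContainer)


flat = FlatChunkStatistics(n_events=100, mean=1.5)
nested = NestedChunkStatistics(n_events=100, stats=StatisticsContainer(mean=1.5))

flat_mean = flat.mean            # existing callers read the field directly
nested_mean = nested.stats.mean  # after the change they must go through .stats
has_flat_attr = hasattr(nested, "mean")  # the old access path no longer exists
```

Any code that reads `container.mean` would have to be updated to `container.stats.mean`, which is where the breakage would come from.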
What we do elsewhere is have separate containers for the index and the data and write multiple containers to the same table.
I.e. you could have the ChunkContainer, StatsContainer and a HistogramContainer and write like this:
writer.write(table, (chunk_container, stats_container))
This way, you also don't need separate containers for the interpolated result and the chunk storage.
See e.g.:
ctapipe/src/ctapipe/io/datawriter.py
Lines 309 to 312 in 8705606
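As a rough illustration of the pattern described above (separate index and data containers written as one row of the same table), here is a self-contained sketch. `ToyTableWriter` is a hypothetical stand-in and not the ctapipe writer API; the `time_start` and `time_end` fields and the table name are assumptions made for the example.

```python
from dataclasses import dataclass, asdict


@dataclass
class ChunkContainer:
    """Index part: what the chunk covers (time fields are assumed for this sketch)."""
    time_start: float
    time_end: float
    n_events: int


@dataclass
class StatisticsContainer:
    """Data part: descriptive statistics, field names as in the diff above."""
    mean: float
    median: float
    std: float


class ToyTableWriter:
    """Hypothetical in-memory writer; it only mimics the write(table, containers) call."""

    def __init__(self):
        self.tables = {}

    def write(self, table_name, containers):
        # Flatten the fields of all containers into a single row.
        row = {}
        for container in containers:
            for key, value in asdict(container).items():
                if key in row:
                    raise ValueError(f"duplicate column {key!r}")
                row[key] = value
        self.tables.setdefault(table_name, []).append(row)


writer = ToyTableWriter()
chunk_container = ChunkContainer(time_start=0.0, time_end=10.0, n_events=2500)
stats_container = StatisticsContainer(mean=1.2, median=1.1, std=0.3)

# One row holds both the chunk index columns and the statistics columns.
writer.write("monitoring/example", (chunk_container, stats_container))
row = writer.tables["monitoring/example"][0]
```

With this layout a `HistogramContainer` could be written next to the same `ChunkContainer` without defining a combined container class.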
I think n_events should be moved to the StatisticsContainer. Then compute_stats() from the PlainAggregator and SigmaClippingAggregator should return a StatisticsContainer, and compute_histo() from the HistogramAggregator in #2996 should return a HistogramContainer.
Hmm, that seems a bit weird. n_events is a property of the chunk and should be the same for all chunk aggregations, regardless of whether you compute a histogram, stats, or something else.
n_events should really be the number of events inside the chunk, which is very interesting in the case of time-based chunks.
It should also be independent of which values are actually used due to e.g. outlier detection, sigma clipping, or under/overflow.
Right, the number of events is a property of the chunk, and the number of entries is the number of values actually used in the aggregation.
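The distinction can be sketched with a small example. This uses a simplified single-pass sigma clip from the standard library, not ctapipe's SigmaClippingAggregator; the function name `clipped_stats` and the `n_entries` key are assumptions made for illustration.

```python
import statistics


def clipped_stats(values, max_sigma=3.0):
    """Single-pass sigma clipping (simplified sketch, not the ctapipe implementation)."""
    n_events = len(values)  # property of the chunk, independent of clipping
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    # Keep only values within max_sigma standard deviations of the mean.
    kept = [v for v in values if std == 0 or abs(v - mean) <= max_sigma * std]
    return {
        "n_events": n_events,
        "n_entries": len(kept),  # values actually used in the aggregation
        "mean": statistics.fmean(kept),
        "median": statistics.median(kept),
        "std": statistics.pstdev(kept),
    }


# Ten events in the chunk, one of which is an obvious outlier.
chunk_values = [10.0] * 9 + [100.0]
result = clipped_stats(chunk_values, max_sigma=2.0)
```

Here the chunk holds 10 events, but only 9 entries survive the clip, so the two counts differ even though they describe the same chunk.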




We noticed in #2996 that containers dealing with chunks and stats needed some maintenance and docstring polishing.