Skip to content

reorganized stats/chunk containers#2999

Open
TjarkMiener wants to merge 2 commits into
mainfrom
container_maintenance
Open

reorganized stats/chunk containers#2999
TjarkMiener wants to merge 2 commits into
mainfrom
container_maintenance

Conversation

@TjarkMiener
Copy link
Copy Markdown
Member

We noticed in #2996 that containers dealing with chunks and stats needed some maintenance and docstring polishing.

Copy link
Copy Markdown
Member

@kosack kosack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just need a changelog entry

@ctao-sonarqube
Copy link
Copy Markdown

Comment thread src/ctapipe/containers.py
Comment thread src/ctapipe/containers.py
Comment on lines +1262 to +1267
class ChunkStatisticsContainer(ChunkContainer):
"""Container for descriptive statistics of the chunk distribution"""

mean = Field(None, "mean value of the chunk distribution")
median = Field(None, "median value of the chunk distribution")
std = Field(None, "standard deviation of the chunk distribution")
Copy link
Copy Markdown
Member Author

@TjarkMiener TjarkMiener Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking of replacing this duplication by

class ChunkStatisticsContainer(ChunkContainer):
    """Container for descriptive statistics of the chunk distribution"""
    stats = Field(
        default_factory=StatisticsContainer,
        description="Statistical description of the chunk distribution",
    )

which would be nice but would also lead to breaking changes here and elsewhere.
@mexanick @maxnoe @kosack

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What we do elsewhere is have separate containers for the index and the data and write multiple containers to the same table.

I.e. you could have the ChunkContainer, StatsContainer and a HistogramContainer and write like this:

writer.write(table, (chunk_container, stats_container))

like this, you also don't need separate containers for the interpolated result and the chunk storage.

Copy link
Copy Markdown
Member

@maxnoe maxnoe Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See e.g.:

self._writer.write(
table_name="simulation/event/subarray/shower",
containers=[event.index, event.simulation.shower],
)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think n_events should be moved to the StatisticsContainer and then compute_stats() from the PlainAggregator and SigmaClippingAggregator should return StatisticsContainer and the compute_histo() from HistorgramAggregator from #2996 should return a HistogramContainer

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mmh, that seems a bit weird. the n_events is a property of the chunk and should be the same for all chunk aggregations, regardless of whether you compute a histogram or stats or something else.

n_events should be really the number of events inside the chunk, which is very interesting in case of time-based chunks.

It should also be independent from which values are actually used due to e.g. outlier detection, sigma clipping or under/overflow.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right, the number of events are related to the chunks and the number of entries are related to the number of values used in the aggregation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants