added HistogramsAggregator #2996
base: container_maintenance
Changes from all commits
b19566a
0a13a75
cfa8e35
ed35129
a29981b
4348f4b
4fb91e6
acb1ff5
aad09a0
819f417
358d028
958a8f2
397ffbb
bc12780
94fd377
d3c8dbf
e23ca0a
69b54b0
bf20fe0
2d8d092
d79757c
320eb75
ec1403e
cd47153
3eacb9b
35d1918
408f51e
62d4b60
3e5f00e
aa3483a
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| Added a new class HistogramAggregator to compute histograms along a specified axis, and updated the documentation to reflect this new functionality. The documentation includes examples of how to use the HistogramAggregator in practice. |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,248 @@ | ||
| """ | ||
| Histogram aggregation with HistogramAggregator | ||
| ============================================== | ||
|
|
||
| This tutorial shows how to: | ||
|
|
||
| 1. Build an event table with camera-like data (images and peak times) and some invalid values. | ||
| 2. Configure and run HistogramAggregator in chunks. | ||
| 3. Access histogram counts, bin edges, summary statistics, and valid-event counts (n_events). | ||
| 4. Plot one pixel histogram from the selected chunks and both gain channels for both image and peak_time columns. | ||
| 5. Overlay mean, median, and std on top of the histogram curves. | ||
| 6. Plot the same histogram using Hist's built-in plotting functionality after filling a Hist object with the aggregated histogram counts and variances. | ||
| """ | ||
|
|
||
| import matplotlib.pyplot as plt | ||
| import numpy as np | ||
| import hist | ||
| from astropy.table import Table | ||
| from astropy.time import Time | ||
| from traitlets.config import Config | ||
|
|
||
| from ctapipe.monitoring.aggregator import HistogramAggregator | ||
| from hist import Hist | ||
|
|
||
|
|
||
| # ------------------------------------------------------------------- | ||
| # Create synthetic event-wise camera data | ||
| # ------------------------------------------------------------------- | ||
| rng = np.random.default_rng(42) | ||
|
|
||
| n_events = 2000 | ||
| n_channels = 2 | ||
| n_pixels = 100 | ||
|
|
||
| times = Time( | ||
| np.linspace(60117.911, 60117.9258, num=n_events), | ||
| scale="tai", | ||
| format="mjd", | ||
| ) | ||
| event_ids = np.arange(n_events) | ||
| images = rng.normal(loc=85.0, scale=10.0, size=(n_events, n_channels, n_pixels)) | ||
| images[:, 1, :] -= 15 # Simulate lower gain channel by shifting the mean down by 15 | ||
| peak_time = rng.normal(loc=20.0, scale=2.0, size=(n_events, n_channels, n_pixels)) | ||
|
|
||
| # Add a few invalid values to demonstrate n_events behavior. | ||
| images[3, 0, 10] = np.nan | ||
| images[15, 0, 10] = np.nan | ||
| peak_time[5, 0, 10] = np.nan | ||
| peak_time[35, 1, 10] = np.nan | ||
|
|
||
| # Optional static mask over sample dimensions (channel, pixel). | ||
| # Here we exclude channel 1, pixel 99 for all events. | ||
| masked_elements_of_sample = np.zeros((n_channels, n_pixels), dtype=bool) | ||
| masked_elements_of_sample[1, 99] = True | ||
|
|
||
| table = Table( | ||
| [times, event_ids, images, peak_time], | ||
| names=("time", "event_id", "image", "peak_time"), | ||
| ) | ||
|
|
||
|
|
||
| # ------------------------------------------------------------------- | ||
| # Configure and run histogram aggregation | ||
| # ------------------------------------------------------------------- | ||
| config_image = Config( | ||
| { | ||
| "HistogramAggregator": { | ||
| "chunking_type": "SizeChunking", | ||
| "axis_definition": { | ||
| "class_name": "Regular", | ||
| "bins": 50, | ||
| "start": 40.0, | ||
| "stop": 110.0, | ||
| }, | ||
| }, | ||
| "SizeChunking": {"chunk_size": 1000}, | ||
| } | ||
| ) | ||
|
|
||
| aggregator_image = HistogramAggregator(config=config_image) | ||
| result = aggregator_image( | ||
| table=table, | ||
| col_name="image", | ||
| masked_elements_of_sample=masked_elements_of_sample, | ||
| ) | ||
|
|
||
| config_peak_time = Config( | ||
| { | ||
| "HistogramAggregator": { | ||
| "chunking_type": "SizeChunking", | ||
| "axis_definition": { | ||
| "class_name": "Regular", | ||
| "bins": 50, | ||
| "start": 2.0, | ||
| "stop": 38.0, | ||
| }, | ||
| }, | ||
| "SizeChunking": {"chunk_size": 1000}, | ||
| } | ||
| ) | ||
|
|
||
| aggregator_peak_time = HistogramAggregator(config=config_peak_time) | ||
| result_peak_time = aggregator_peak_time( | ||
| table=table, | ||
| col_name="peak_time", | ||
| masked_elements_of_sample=masked_elements_of_sample, | ||
| ) | ||
|
|
||
| print(f"Number of chunks: {len(result)}") | ||
| print(f"histogram shape per chunk: {result[0]['histogram'].shape}") | ||
| print(f"bin edges shape per chunk: {result[0].meta['bin_edges'].shape}") | ||
| print(f"bin centers shape per chunk: {result[0].meta['bin_centers'].shape}") | ||
| print(f"n_events shape per chunk: {result[0]['n_events'].shape}") | ||
|
|
||
|
|
||
| # ------------------------------------------------------------------- | ||
| # Plot the histograms for one pixel with two gain channels | ||
| # ------------------------------------------------------------------- | ||
| # We aggregated the histograms in two chunks of 1000 events each, so there are two histograms per gain channel | ||
| # for the selected pixel. We plot both chunks for the selected pixel and both gain channels | ||
| # side by side for comparison, and then do the same for the peak_time column in a separate figure. | ||
|
|
||
| pixel_index = 10 | ||
| gain_label = {0: "High Gain", 1: "Low Gain"} | ||
|
|
||
| fig, axes = plt.subplots(1, 2, figsize=(12, 4), sharey=True) | ||
| for chunk_index, ax in enumerate(axes): | ||
| bin_edges = result[chunk_index].meta["bin_edges"] | ||
| bin_centers = result[chunk_index].meta["bin_centers"] | ||
| channel_handles = [] | ||
|
|
||
| for channel_index in range(n_channels): | ||
| counts = result[chunk_index]["histogram"][:, channel_index, pixel_index] | ||
| valid_events = result[chunk_index]["n_events"][channel_index, pixel_index] | ||
|
|
||
| line = ax.step( | ||
| bin_edges[:-1], | ||
| counts, | ||
| where="post", | ||
| label=f"{gain_label[channel_index]} (n_events={valid_events})", | ||
| )[0] | ||
| channel_handles.append(line) | ||
| color = line.get_color() | ||
|
|
||
| # Plot Poisson errors as error bars at bin centers (the variance of a bin count equals the count, so error = sqrt(counts)) | ||
| bin_errors = np.sqrt(counts) | ||
| ax.errorbar( | ||
| bin_centers, | ||
| counts, | ||
| yerr=bin_errors, | ||
| fmt="none", | ||
| color=color, | ||
| elinewidth=1.0, | ||
| capsize=3, | ||
| alpha=0.6, | ||
| ) | ||
|
|
||
| ax.set_title(f"Chunk {chunk_index}, pixel {pixel_index}") | ||
| ax.set_xlabel("image value") | ||
| ax.set_ylabel("Counts") | ||
|
|
||
| ax.legend( | ||
| handles=channel_handles, | ||
| loc="upper left", | ||
| fontsize=8, | ||
| ) | ||
|
|
||
| plt.show() | ||
|
|
||
|
|
||
| # ------------------------------------------------------------------- | ||
| # Plot peak_time histograms in a separate figure | ||
| # ------------------------------------------------------------------- | ||
| fig, axes = plt.subplots(1, 2, figsize=(12, 4), sharey=True) | ||
| for chunk_index, ax in enumerate(axes): | ||
| bin_edges = result_peak_time[chunk_index].meta["bin_edges"] | ||
| bin_centers = result_peak_time[chunk_index].meta["bin_centers"] | ||
|
|
||
| channel_handles = [] | ||
|
|
||
| for channel_index in range(n_channels): | ||
| counts = result_peak_time[chunk_index]["histogram"][ | ||
| :, channel_index, pixel_index | ||
| ] | ||
| valid_events = result_peak_time[chunk_index]["n_events"][ | ||
| channel_index, pixel_index | ||
| ] | ||
|
|
||
| line = ax.step( | ||
| bin_edges[:-1], | ||
| counts, | ||
| where="post", | ||
| label=f"{gain_label[channel_index]} (n_events={valid_events})", | ||
| )[0] | ||
| channel_handles.append(line) | ||
| color = line.get_color() | ||
|
|
||
| # Plot Poisson errors as error bars at bin centers (the variance of a bin count equals the count, so error = sqrt(counts)) | ||
| bin_errors = np.sqrt(counts) | ||
| ax.errorbar( | ||
| bin_centers, | ||
| counts, | ||
| yerr=bin_errors, | ||
| fmt="none", | ||
| color=color, | ||
| elinewidth=1.0, | ||
| capsize=3, | ||
| alpha=0.6, | ||
| ) | ||
|
|
||
| ax.set_title(f"Peak Time - Chunk {chunk_index}, pixel {pixel_index}") | ||
| ax.set_xlabel("peak_time value") | ||
| ax.set_ylabel("Counts") | ||
| ax.legend( | ||
| handles=channel_handles, | ||
| loc="upper left", | ||
| fontsize=8, | ||
| ) | ||
|
|
||
| plt.show() | ||
|
|
||
|
|
||
| # ------------------------------------------------------------------- | ||
| # Initialize hist, fill it and plot via Hist functionality | ||
| # ------------------------------------------------------------------- | ||
|
|
||
| # Create a Hist object with the same binning as the aggregator | ||
| bin_edges = result[0].meta["bin_edges"] | ||
| h = Hist( | ||
| hist.axis.Regular(len(bin_edges) - 1, bin_edges[0], bin_edges[-1], name="value") | ||
| ) | ||
|
|
||
| # Get the histogram counts for the selected chunk, channel (high gain), and pixel | ||
| chunk_index = 0 | ||
| channel_index = 0 | ||
| counts = result[chunk_index]["histogram"][:, channel_index, pixel_index] | ||
|
|
||
| # Set the histogram values using the view interface | ||
| h.view(flow=False)[:] = counts | ||
|
|
||
| # Plot the histogram with error bars using Hist's built-in plotting functionality | ||
| # Requires 'hist[plot]' to be installed in the environment. | ||
| h.plot(yerr=True) | ||
| plt.title( | ||
| f"Chunk {chunk_index}, Pixel {pixel_index} (High Gain) histogram from Hist object" | ||
| ) | ||
| plt.xlabel("image value") | ||
| plt.ylabel("Counts") | ||
| plt.show() | ||
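
For readers who want to see what the chunked aggregation amounts to for a single pixel, here is a minimal standard-library sketch. It is not the ctapipe implementation: the helper name `histogram_in_chunks` is hypothetical, and it assumes that NaN entries are skipped and do not count towards the valid-event total, which is how the tutorial describes `n_events`.

```python
import math
import random


def histogram_in_chunks(values, chunk_size, n_bins, start, stop):
    """Histogram a 1-D sequence chunk by chunk (hypothetical helper).

    Returns one (counts, n_valid) pair per chunk. NaN entries are skipped
    (assumption), and values outside [start, stop) land in no bin.
    """
    width = (stop - start) / n_bins
    results = []
    for first in range(0, len(values), chunk_size):
        counts = [0] * n_bins
        n_valid = 0
        for value in values[first : first + chunk_size]:
            if math.isnan(value):
                continue  # invalid sample: neither binned nor counted
            n_valid += 1
            if start <= value < stop:
                # min() guards against rounding at the upper edge
                counts[min(int((value - start) / width), n_bins - 1)] += 1
        results.append((counts, n_valid))
    return results


# Synthetic values for one pixel, mirroring the tutorial's image column
rng = random.Random(42)
pixel_values = [rng.gauss(85.0, 10.0) for _ in range(2000)]
pixel_values[3] = float("nan")
pixel_values[15] = float("nan")

chunks = histogram_in_chunks(
    pixel_values, chunk_size=1000, n_bins=50, start=40.0, stop=110.0
)
for index, (counts, n_valid) in enumerate(chunks):
    print(f"chunk {index}: n_valid={n_valid}, binned={sum(counts)}")
```

The two NaN entries fall into the first chunk, so its valid-event count is lower than the chunk size, while the number of binned entries can be lower still because values outside the axis range are dropped.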
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -63,6 +63,7 @@ | |
| "StatisticsContainer", | ||
| "ChunkContainer", | ||
| "ChunkStatisticsContainer", | ||
| "ChunkHistogramsContainer", | ||
| "ImageStatisticsContainer", | ||
| "IntensityStatisticsContainer", | ||
| "PeakTimeStatisticsContainer", | ||
|
|
@@ -1267,6 +1268,12 @@ class ChunkStatisticsContainer(ChunkContainer): | |
| std = Field(None, "standard deviation of the chunk distribution") | ||
|
|
||
|
|
||
| class ChunkHistogramsContainer(ChunkContainer): | ||
|
Member
I'd name this singular |
||
| """Container for histograms of the chunk distribution""" | ||
|
|
||
| histogram = Field(None, "histogram of the chunk distribution") | ||
|
|
||
|
Member
It would be nice to have a helper function to convert this Container back to a Hist object. Here is an example:

    def hist_from_container(
        cont: ChunkHistogramsContainer, axis_names=["pedestal", "channel", "pixel"]
    ) -> Hist:
        """Returns a Hist constructed from a stored ChunkHistogramsContainer."""
        bin_edges = cont.meta["bin_edges"]
        axes = [hist.axis.Variable(edges=bin_edges, name=axis_names[0])]
        # the rest of the dimensions
        for name, n_bins in zip(axis_names[1:], cont.histogram.shape[1:]):
            if n_bins == 2:
                axes.append(hist.axis.IntCategory(categories=np.arange(2), name=name))
            else:
                axes.append(
                    hist.axis.Regular(bins=n_bins, start=0, stop=n_bins - 1, name=name)
                )
        h = Hist(*axes)
        h[...] = cont.histogram[...]
        return h

Then I can do things like:

    with HDF5TableReader("stats.h5") as reader:
        for i, container in enumerate(
            reader.read(
                table_name="/dl1/monitoring/telescope/calibration/camera/pixel_histograms/sky_pedestal_image/tel_001",
                containers=ChunkHistogramsContainer,
                prefixes=[""],
            )
        ):
            h = hist_from_container(container)
            fig, ax = plt.subplots(1, 3, figsize=(10, 3), layout="constrained")
            fig.suptitle(f"Chunk {i}")
            h[:, 0, :].plot(ax=ax[0], norm="log")
            h[:, 1, :].plot(ax=ax[1], norm="log")
            h.integrate("pixel").stack("channel").plot(ax=ax[2], legend=True)
            ax[0].set_title(f"Channel 0")
            ax[1].set_title(f"Channel 1")
            ax[2].set_title("Integral over all pixels")

Ideally, you could also store the axis names (
Member
What we did in datapipe-testbench is to serialize all the necessary Hist info in the metadata: the axis names, the axis types (e.g. Regular, Variable, Category), units, etc. That would work well here as well (and in the future I could use this class directly).
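
The idea of carrying the axis description alongside the counts can be sketched without any dependency on `hist`. The key names below (`name`, `type`, `bins`, `start`, `stop`, `edges`, `unit`), the `hist_axes` metadata key, and the helper functions are all hypothetical; this is not the datapipe-testbench format, which is not shown in this thread. The sketch only illustrates that a JSON string stored in the metadata is enough to rebuild the bin edges later.

```python
import json


def regular_axis(name, bins, start, stop, unit=""):
    """Describe an evenly spaced axis (hypothetical schema)."""
    return {
        "name": name,
        "type": "Regular",
        "bins": bins,
        "start": start,
        "stop": stop,
        "unit": unit,
    }


def variable_axis(name, edges, unit=""):
    """Describe an axis with explicit bin edges (hypothetical schema)."""
    return {"name": name, "type": "Variable", "edges": list(edges), "unit": unit}


def edges_from_description(axis):
    """Rebuild the bin edges from a stored axis description."""
    if axis["type"] == "Regular":
        width = (axis["stop"] - axis["start"]) / axis["bins"]
        return [axis["start"] + i * width for i in range(axis["bins"] + 1)]
    if axis["type"] == "Variable":
        return list(axis["edges"])
    raise ValueError(f"unsupported axis type: {axis['type']}")


axes = [
    regular_axis("pedestal", bins=50, start=40.0, stop=110.0),
    variable_axis("channel", edges=[0, 1, 2]),
]

# Serialize to a plain string, which is easy to keep in table metadata
meta = {"hist_axes": json.dumps(axes)}

# Later: restore the descriptions and rebuild the edges
restored = json.loads(meta["hist_axes"])
for axis in restored:
    edges = edges_from_description(axis)
    print(axis["name"], axis["type"], len(edges) - 1, "bins")
```

With the axis type stored explicitly, a reader would not have to guess it from the array shape, as the `n_bins == 2` check in the helper above does.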
Member
For column-wise access, you could also do a similar
Member
One more nice demo of using this if it's a Hist:

    fig, ax = plt.subplots(1, 2, figsize=(10, 3))
    disp1 = CameraDisplay(geom, image=h[:, 0, :].profile("pedestal").values(), ax=ax[0])
    disp2 = CameraDisplay(geom, image=h[:, 0, :].profile("pedestal").variances(), ax=ax[1])
    ax[0].set_title(ax[0].get_title() + " Pedestal")
    ax[1].set_title(ax[1].get_title() + " Pedestal Variance")
    disp1.add_colorbar()
    disp2.add_colorbar()

There, you don't even need the stats. However, as @maxnoe pointed out above, computing the mean and variance from a histogram isn't as precise as doing it from the events. |
||
|
|
||
| class PixelStatisticsContainer(Container): | ||
| """ | ||
| Container for pixel statistics from flat-field and sky pedestal events | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -108,6 +108,7 @@ | |
| "v7.3.0", | ||
| "v7.4.0", | ||
| "v7.5.0", | ||
| "v7.6.0", | ||
| ] | ||
|
|
||
|
|
||
|
|
||


See prev comment: a helper function to do this would be nice.