Merge telescope-wise data from the same OB#2916
Conversation
|
I'm not sure that we need the merger of the subarray. The individual telescope data stream has the For now, In any case, I don't really foresee the need to combine "per-telescope" subarray descriptions, at least not for observed CTAO data. Do you have a usecase that requires merging subarrays? |
Thanks for the feedback, that makes sense for the standard CTAO production workflow. The motivation for including subarray merging here is mainly to support a few non-standard but actively used workflows rather than the core CTAO analysis chain. In calibpipe, we work with telescope-wise monitoring outputs that we want to merge at a later stage; this could be handled by a dedicated calibpipe tool, but since ctapipe-merge already provides a generic merging interface and now supports different merge strategies, it feels convenient to support this pattern there as well. The required changes are relatively minor, with the main added complexity being the handling of the trigger tables in _flush(). A second motivation comes from CTLearn, where some stereo deep learning models operate on lower-level data and combine information from multiple telescopes early in the analysis chain. While it is possible to redesign the readers to explicitly handle multiple telescope-wise files, this quickly becomes more complex, whereas allowing merged telescope-wise inputs simplifies the workflow. In addition, the static helper SubarrayDescription.merge_subarrays() is needed in CTLearn for one of our best-performing models, where predictions are made on multiple telescope-pair subarrays and then combined. I agree that this functionality is likely not needed for standard observed CTAO data processing, but with relatively small, opt-in changes we can support these additional use cases without impacting the default workflow, while improving convenience for downstream tools. |
@maxnoe , I would like to add here the use case of telescope cross calibration. There we want to process all the data from all subarrays of a given night, and depending on the case, we might also want to process data from more than one nights. Reading @TjarkMiener reply, if it is straightforward to add such functionality in ctapipe, that can merge DL2 data from different OBs and merge the subarrays, that would be great. |
|
Ok, thanks, these are fine usecases and merging the subarray data should be straight forward. However, we should make sure that the default for merging telescopes is merging of the same ob, which should require the same subarray. |
| if self.subarray != subarray: | ||
| raise CannotMerge( | ||
| f"Subarrays do not match for file: {other.filename}" | ||
| ) |
There was a problem hiding this comment.
@Voutsi the default merge strategy will check if the subarrays match between different observation blocks. For your use case you would need to disable that check? Should we add another merge strategy for this then?
There was a problem hiding this comment.
@TjarkMiener , yes for my usecase I need to merge different subarrays. But I am not sure I need a new strategy. Wouldn't this strategy "events-multiple-obs" do the job?
There was a problem hiding this comment.
This needs much more work.
Thinking about it, you just cannot simply combine subarrays, there are multiple arrays in the data that are arrays of length n_telescopes. You need to change these values to be consistent over all events, e.g.the tels_with_trigger in the trigger subarray trigger table.
There was a problem hiding this comment.
You need to change these values to be consistent over all events, e.g.the tels_with_trigger in the trigger subarray trigger table.
Yes, you are missing a test that checks that sub.tel_ids_to_mask() is consistent for the original and merged subarrays, which of course will fail. It's unfortunately a pretty strong underlying assumption that the telescope list doesn't change.
Maybe another reason we need a format validation tool, since checking the consistency of event fields like tels_with_trigger or similar is quite high-level.
Do you really need to merge subarrays though? I would just refactor the problem into two steps/tools:
- tool that collects statistics you need for each telescope and writes them to a file, inside of which are data indexed by the telescope type name or tel_id, or whatever you need. That tool should be able to run on run over multiple DL2/Event input files, perhaps just appending to the initial file so you end up with one "merged" stats file.
- tool that reads this intermediate stats file to compute inter/intra-telescope calibration.
There was a problem hiding this comment.
Thinking about it, you just cannot simply combine subarrays, there are multiple arrays in the data that are arrays of length
n_telescopes. You need to change these values to be consistent over all events, e.g.thetels_with_triggerin the trigger subarray trigger table.
Yes, we need to run the _flush() also for this usecase. I added a new merge strategy 'events-multiple-obs-different-subarrays' for it. I also exclude DL2 subarray from the merging since it is not really needed and it will contain arrays of length n_telescopes. It is also not a simple merge since it will involve some logic of how to combine those predictions.
There was a problem hiding this comment.
I think we before we continue here with implementation we need a better formulation of the different use-cases and what we expect as input and output data for each of them.
This looks now like a footgun for people to end up with files that violate assumptions of e.g. the HDF5EventSource and the TableLoader, i.e. of our data model.
I don't really see why this would require you to merge input files for that. Why not work on multiple files and load into memory from them as needed? |
|
@maxnoe Sorry for the long silence, I misinterpreted your comment about the multiple files slightly. First, I thought with multiple files you mean per
Besides excluding DL2 subarray from merging with strategy Can you please confirm so we can go ahead with the implementation, @maxnoe @kosack? |
Yes, indeed. |
An alternative approach here would be not to merge, but read multiple per-telescope DL1 files during the DL1 to DL2 step. This might be the more flexible approach and it doesn't require an extra step and copying around data. |
8cc1b2a to
5b8e7d0
Compare
|
The docs build is fixed in main, please update |
…e-wise data from the same OB
Extraction from timing parameters do not seems deterministic.
Co-authored-by: Karl Kosack <kosack@users.noreply.github.com>
use data format defs in merge tool tests
5b8e7d0 to
e9b8c4e
Compare
|
Thanks @maxnoe it is passing now. Regarding the SonarQube analysis, can you please have a look and let me know if the issues are acceptable or if I should resolve them. |
I don't think adding a "foot-gun" level feature to ctapipe, where we might create broken subarray descriptions and non-matching trigger and dl2 tables is warranted for that. Where do these telescope-wise files come from? Why do they need to be merged? Why merge at all? I think we need to solve the case of per-telescope DL1 input files, and not by just trying to merge them together. This only results in unnecessary data copies and possibly inconsistent files. |
I don’t think this is possible with the current implementation. Can you think of a specific misuse of the component that would violate the assumptions of the data model? DL2 is explicitly excluded when using |




This PR adds a new merge strategy for combining telescope-wise data from the same OB. It also added a static method to the
SubarrayDescriptionto merge subarrays into one.closes #1625