
Add possibility to produce DL3 with ctapipe#2727

Draft
mdebony wants to merge 74 commits into
cta-observatory:mainfrom
mdebony:create_dl3

@mdebony
Contributor

@mdebony mdebony commented Mar 25, 2025

The purpose of this PR is to add support for creating DL3 files in ctapipe. The current output format is GADF, as described at https://gamma-astro-data-formats.readthedocs.io/en/v0.3/

The modifications include several changes to parts of the code used for IRF production, so that they can also be used for DL3 production (loading events and applying cuts).

For now, this PR should be considered a draft, as several items are missing:

  • Fix the errors encountered in some automatic tests (help here is welcome)
  • User documentation on how to use the new functionality
  • Automatic tests
  • A script to generate the HDU and OBS index
  • Add an entry to the changelog

The objective of first submitting it as a draft is to be able to discuss several points:

  1. Handling of time
    The current time format in DL2 is not very clear to me, so I am not sure that all the conversions performed are in line with what should be done.
    Also, what is the best time scale to use in our case: TAI or UTC?
    What reference time should be used? It is currently set in the code to UNIX time, but maybe we want a dedicated CTA one, as other experiments do.

  2. Optional columns for events
    Most of the optional columns defined in the GADF format (https://gamma-astro-data-formats.readthedocs.io/en/v0.3/events/events.html) are currently supported. The two exceptions are x max and Hillas parameters.
    For x max, I currently export h max instead. Is there a simple library to convert h max into x max?
    For Hillas parameters, since the intended use is mainly stereo, it was not obvious which ones to add to the file, so I have currently skipped all of them.

  3. Metadata
    For much of the metadata, I didn't find the information in the DL2 file, though this could partly be because I am currently using MC DL2 files:

    • Dead time
    • Name of the organization, site, and subarray
    • Information about the target object: name and coordinates
    • Version of the calibration used, and version of ctapipe used for lower-level processing
  4. Data quality metadata
    The optional GADF metadata include quite a few entries linked to quality (trigger rate, broken pixels, muon efficiency, humidity, NSB, ...). I guess that for CTA we would like to handle quality a bit differently. Should they be included anyway? If so, how do I retrieve all this information?

  5. Code organization and implementation
    I'm not yet used to ctapipe's specifics (tools and components). I would like to check that my use of them matches their intent. Also, I've currently put the code for DL3 production mainly in the irf folder, as a very large fraction is shared. Should we rename it or move it?

  6. Speed
    Currently the code is crazy slow (it took close to 30 minutes on my laptop to process a single gamma MC DL2 file). I ran into some issues when I tried to profile it (any help here is welcome), but I guess most of the time goes into coordinate conversions. How important is this for the first version?
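
On point 1, the bookkeeping behind a reference time can be illustrated without any dependency. The sketch below is not code from this PR: the helper names `mjd_reference` and `time_header` are hypothetical, the `TIMESYS` value is an assumption pending the TAI/UTC decision, and `datetime` ignores leap seconds, so a real implementation would more likely go through `astropy.time.Time`. It only shows how a chosen epoch maps onto the GADF `MJDREFI` / `MJDREFF` keywords and how event times become seconds relative to that epoch.

```python
from datetime import datetime, timezone

MJD_ZERO = datetime(1858, 11, 17, tzinfo=timezone.utc)  # MJD 0.0
SECONDS_PER_DAY = 86400.0


def mjd_reference(epoch: datetime) -> tuple[int, float]:
    """Split a reference epoch into integer and fractional MJD parts."""
    delta = epoch - MJD_ZERO
    fraction = (delta.seconds + delta.microseconds / 1e6) / SECONDS_PER_DAY
    return delta.days, fraction


def time_header(epoch: datetime) -> dict:
    """Build the time-related header keywords for a given reference epoch."""
    mjdrefi, mjdreff = mjd_reference(epoch)
    return {
        "MJDREFI": mjdrefi,
        "MJDREFF": mjdreff,
        "TIMEUNIT": "s",
        "TIMESYS": "utc",  # assumption: the time scale is still under discussion
    }


# Reference epoch currently used in the PR description: UNIX time.
unix_epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
header = time_header(unix_epoch)

# An event time is then stored as seconds elapsed since the reference epoch.
event_time = datetime(2025, 3, 25, 12, 0, 0, tzinfo=timezone.utc)
relative_seconds = (event_time - unix_epoch).total_seconds()
print(header, relative_seconds)
```

Switching to a dedicated reference epoch would only change the argument passed to `time_header`; the event column stays a plain offset in seconds.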

@mdebony
Contributor Author

mdebony commented Mar 25, 2025

That may not be so necessary, as in general CTAO will have multiple targets per observation, but some info is contained in the scheduling block and observation blocks you can read from the EventSource or TableLoader. Generally the actual target name is not required, but you could also add a config option to be able to specify it.

I have a question on this point. Could a DL2 file have multiple targets, i.e. different pointings in the same file? Currently everything is designed more along the lines of current IACTs: one DL2 file = one run on a specific target.

@mdebony
Contributor Author

mdebony commented Mar 25, 2025

This should probably go into functions in a module like ctapipe.io.gadf.
In this case, should I move the whole dl3.py file that I added to handle the writing of DL3 to this module?

@kosack
Member

kosack commented Mar 26, 2025

I have a question on this point. Could a DL2 file have multiple targets, i.e. different pointings in the same file? Currently everything is designed more along the lines of current IACTs: one DL2 file = one run on a specific target.

The DL2 data format allows multiple OBs to be merged, but for CTAO we can probably just assume that we don't mix observations in the DL2 produced for observed data. Certainly right now, the GADF format assumes that we do not mix OBs. In fact, it will likely be the other way around: we will store multiple DL3 files for a single observation if there is more than one SOI, for example. And right now, also for different event types. My point was just "OB != science target", but you can assume one OB is one pointing, though the pointing could be fixed in ra/dec or alt/az ("drift mode"), since both are supported by ACADA.

Again, anything that goes into ctapipe should be as generic as possible (at the least, it should work for any IACT) and not assume exactly what CTAO will do; anything CTAO-specific should be developed outside ctapipe in a package in the datapipe gitlab space.

@mdebony
Contributor Author

mdebony commented Mar 26, 2025

The multiple pointing modes are handled. For multiple OBs in the same file, the current code should produce a DL3 file with correct GTI and pointing information, but some information, such as the obs id, will not represent everything. I also didn't handle at all the possibility of having different pointing modes in the same file.
One possibility would be to generate one DL3 file per obs id. It would require quite a lot of modification to this PR, but it's doable.
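
The "one DL3 file per obs id" option amounts to grouping events by `obs_id` before writing. The sketch below illustrates that step with plain dictionaries; it is not the PR's implementation, and the function name `split_by_obs_id` and the output naming scheme are hypothetical. With an astropy `Table`, `Table.group_by("obs_id")` would presumably play the same role.

```python
from collections import defaultdict


def split_by_obs_id(events):
    """Group event records by their ``obs_id`` field, preserving event order."""
    groups = defaultdict(list)
    for event in events:
        groups[event["obs_id"]].append(event)
    return dict(groups)


# Toy merged DL2 content: two observation blocks interleaved in one file.
events = [
    {"obs_id": 1, "event_id": 10},
    {"obs_id": 2, "event_id": 11},
    {"obs_id": 1, "event_id": 12},
]

for obs_id, obs_events in split_by_obs_id(events).items():
    output_name = f"dl3_obs_{obs_id}.fits"  # hypothetical naming scheme
    print(output_name, len(obs_events))
```

Each group can then be given its own GTI, pointing, and metadata, so the obs id in the header describes the whole file.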

@mdebony
Contributor Author

mdebony commented Mar 26, 2025

For x max, I currently export h max instead. Is there a simple library to convert h max into x max?

The ctapipe.atmosphere module lets you do that, but of course it only works if an atmosphere model is available. For simulations, that model is automatically available in the EventSource, but that is not yet implemented for EventSources that read real data. But generally, you can just use:

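from ctapipe.io import EventSource

# h_max and zenith_angle are assumed to be already defined (astropy quantities)
with EventSource(filename) as source:
    if source.atmosphere_density_profile is not None:
        x_max = source.atmosphere_density_profile.slant_depth_from_height(h_max, zenith_angle)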

That will work for any EventSource, but so far you need to test that the atmosphere_density_profile is not None; otherwise you cannot compute X_max. Probably we should add a TableLoader.read_atmosphere() method as well, since right now you have to construct an EventSource to get it, but that's not too bad.

The h max to x max conversion is now implemented but not properly tested, as the DL2 file I have on hand doesn't contain atmosphere profile information (or at least EventSource is not finding the atmosphere profile).
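
To make the h max to x max relation concrete when no atmosphere profile is at hand, here is a toy calculation. It is not ctapipe's implementation and not a site atmosphere: it assumes an isothermal exponential atmosphere with approximate constants and a flat-atmosphere 1/cos(zenith) correction, and the function name `toy_slant_depth` is hypothetical. It may be useful as a rough sanity check of the order of magnitude returned by the real profile.

```python
import math

SEA_LEVEL_DEPTH = 1030.0  # g / cm^2, approximate vertical depth at sea level
SCALE_HEIGHT = 8000.0     # m, approximate scale height of an isothermal atmosphere


def toy_slant_depth(height_m: float, zenith_rad: float) -> float:
    """Slant depth in g / cm^2 above ``height_m`` for a toy exponential atmosphere.

    The vertical depth falls off exponentially with height above sea level;
    the slant depth scales it by 1 / cos(zenith), which is only reasonable
    for moderate zenith angles.
    """
    vertical_depth = SEA_LEVEL_DEPTH * math.exp(-height_m / SCALE_HEIGHT)
    return vertical_depth / math.cos(zenith_rad)


# Example: shower maximum at 10 km above sea level, observed at 20 deg zenith.
x_max = toy_slant_depth(10_000.0, math.radians(20.0))
print(f"{x_max:.1f} g/cm^2")
```

A higher h max gives a smaller x max, and a larger zenith angle gives a larger one, which is the qualitative behaviour to expect from the real conversion as well.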

Comment thread src/ctapipe/irf/preprocessing.py Outdated
Comment thread src/ctapipe/irf/preprocessing.py
Comment thread src/ctapipe/irf/dl3.py Outdated
@TjarkMiener
Member

I have converted DL2 files of the performance paper from lstchain to ctapipe format. They can be found on the onsite cluster /fefs/aswg/workspace/tjark.miener/dl2_ctapipe/perfpaper_data/dl2_LST-1.Run0*. Maybe they can be helpful for testing and debugging.

"""

# Setting preprocessing for DL3
EventPreprocessor.irf_pre_processing = False
Member

Setting the class variables here is not how the config system is supposed to work.

You should either override the options, change the defaults by creating a new subclass, or just pass the required options as keyword arguments when instantiating the class.
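
The three alternatives can be sketched with plain traitlets, which the ctapipe config system builds on. The `EventPreprocessor` below is a stand-in defined only for this example, not the class from the PR, and `DL3EventPreprocessor` is a hypothetical name; only the `irf_pre_processing` option is taken from the diff.

```python
from traitlets import Bool
from traitlets.config import Config, Configurable


class EventPreprocessor(Configurable):
    """Stand-in for the real component, reduced to the option under discussion."""

    irf_pre_processing = Bool(
        default_value=True, help="Apply the IRF-specific preprocessing"
    ).tag(config=True)


# 1. Override the option through a Config object.
config = Config()
config.EventPreprocessor.irf_pre_processing = False
from_config = EventPreprocessor(config=config)


# 2. Change the default in a subclass (hypothetical name).
class DL3EventPreprocessor(EventPreprocessor):
    irf_pre_processing = Bool(
        default_value=False, help="Apply the IRF-specific preprocessing"
    ).tag(config=True)


from_subclass = DL3EventPreprocessor()

# 3. Pass the option as a keyword argument at construction time.
from_kwargs = EventPreprocessor(irf_pre_processing=False)

# The default of the original class is untouched in all three cases.
untouched = EventPreprocessor()
print(from_config.irf_pre_processing, untouched.irf_pre_processing)
```

Unlike assigning to the class attribute, none of these changes the behaviour of other users of `EventPreprocessor` in the same process.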

Comment thread src/ctapipe/irf/dl3.py

class DL3EventsWriter(Component):
"""
Base class for writing a DL3 file
Member

This should probably be an abstract interface class

Comment thread src/ctapipe/irf/dl3.py
self._target_information = None
self._software_information = None

@abstractmethod
Member

The definitions of all of these getters and setters look like overkill.

It also seems a bit weird to attach all of these attributes to a data writer. They should be set on the objects this class writes out to files, not on the writer itself.

Comment thread src/ctapipe/irf/dl3.py
self._obs_id = obs_id

@property
def events(self) -> QTable:
Member

@maxnoe maxnoe Dec 12, 2025

I would expect a DL3 writer API to look something like this:

@abstractmethod
def __call__(self, path, events: DL3Events, irf: IRF, metadata: DL3Metadata, ...):
    ...

and it writes the DL3 file to path.

The writer shouldn't have attributes related to the data itself.
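
A minimal, dependency-free sketch of that shape follows. Everything in it is illustrative: `DL3Writer`, `JSONDL3Writer`, and the reduced `DL3Metadata` dataclass are hypothetical names, the IRF argument is left out for brevity, and JSON stands in for FITS. In ctapipe the base class would presumably also derive from `Component`.

```python
import json
import tempfile
from abc import ABC, abstractmethod
from dataclasses import dataclass
from pathlib import Path


@dataclass
class DL3Metadata:
    """Hypothetical metadata container passed to the writer, not stored on it."""

    obs_id: int
    telescope: str = "unknown"


class DL3Writer(ABC):
    """Abstract interface: a writer is stateless with respect to the data."""

    @abstractmethod
    def __call__(self, path: Path, events: list, metadata: DL3Metadata) -> None:
        """Write ``events`` and ``metadata`` to ``path``."""


class JSONDL3Writer(DL3Writer):
    """Toy concrete writer producing JSON instead of a FITS file."""

    def __call__(self, path: Path, events: list, metadata: DL3Metadata) -> None:
        payload = {
            "obs_id": metadata.obs_id,
            "telescope": metadata.telescope,
            "n_events": len(events),
            "events": events,
        }
        Path(path).write_text(json.dumps(payload))


# Usage: the same writer instance can serve any number of observations.
writer = JSONDL3Writer()
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "obs_1.json"
    writer(out, events=[{"event_id": 10}], metadata=DL3Metadata(obs_id=1))
    written = json.loads(out.read_text())
print(written["obs_id"], written["n_events"])
```

Because all data arrive as call arguments, the getters and setters on the writer become unnecessary, and supporting another output format only requires another subclass.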

@mdebony mdebony mentioned this pull request Apr 2, 2026

4 participants