Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

58 emission cube cue #92

Closed
wants to merge 41 commits into from
Closed

58 emission cube cue #92

wants to merge 41 commits into from

Conversation

anschaible
Copy link
Collaborator

I thought this is straight forward, however, I have issues with the pipeline setup and I don't know how to fix them.

I implemented the gas lookup based on CUE and get the emission lines and the gas continuum. There are still two task to do: write pytests for the cue/grid.py class and norm the gas particle information for the cue lookup by the solar values. Both tasks I con do myself. The gas lookup works, if I implement the functions individual (see notebook cue_lookup.ipynb).

However I thought it is quite easy to set the calculate_spectra function in the pipeline setup just to the gas spectrum function in the cue core module, but here I get a key error and see not why this error occurs and how to fix it. The pipeline breaks with the gas function (see notebook rubix_pipeline_gas.ipynb). I do not know how to change the pipeline setup or the gas function to get rid of this issue. Maybe it is a very obvious small thing that has to be changed and it is just not obvious to me. This is not urgent. If someone at some free minute has time to have a look into it and maybe quickly give a hint, what may solve the issue, I would be happy, but as I said it is not very urgent.

…as cell values out of boundary for the cue model
@MaHaWo
Copy link
Collaborator

MaHaWo commented Oct 30, 2024

as per our discussion on monday I tried to run the pipeline in rubix_pipeline_gas.py to find out what's wrong with it. However, the following error occurs in this pipeline and in the tests:

ModuleNotFoundError: No module named 'rubix.spectra.cue.cue.src

There appears to be something missing here? Can you give me some pointers?

@anschaible

@anschaible
Copy link
Collaborator Author

Okay then the first problem may depend on how I implemented the other github repo. For the gas lookup we rely on the CUE model (https://arxiv.org/pdf/2405.04598). Therefore I cloned the CUE repo (https://github.com/yi-jia-li/cue) in the folder rubix.spectra.cue. In my local version it works, but if I want to have a look online on the github page into this code folder, this not possible, so something here is not right.

The notebook cue_lookup.ipynb has the individual steps to create the gas cube. @MaHaWo can you test if you can run it or if you get the same error?

@MaHaWo
Copy link
Collaborator

MaHaWo commented Oct 30, 2024

ok, wil have another look this afternoon

@MaHaWo
Copy link
Collaborator

MaHaWo commented Oct 30, 2024

can´t you pip install it as a normal dependency? It seems to be a normal package, so you can do something like
https://github.com/yi-jia-li/cue@main

@MaHaWo
Copy link
Collaborator

MaHaWo commented Oct 30, 2024

ok, so what happens with the pipeline seems to be the following:
in your pipeline_gas class, there is

def _get_pipeline_functions(self) -> list:
        """
        Sets up the pipeline functions.

        Returns
        -------
        list
            List of functions to be used in the pipeline.
        """
        self.logger.info("Setting up the pipeline...")
        self.logger.debug("Pipeline Configuration: %s", self.pipeline_config)

        # TODO: maybe there is a nicer way to load the functions from the yaml config?
        rotate_galaxy = get_galaxy_rotation(self.user_config)
        filter_particles = get_filter_particles(self.user_config)
        spaxel_assignment = get_spaxel_assignment(self.user_config)
        gas_emission = get_gas_emission(self.user_config)
        scale_spectrum_by_mass = get_scale_spectrum_by_mass(self.user_config)
        doppler_shift_and_resampling = get_doppler_shift_and_resampling(
            self.user_config
        )
        calculate_datacube = get_calculate_datacube(self.user_config)
        convolve_psf = get_convolve_psf(self.user_config)
        convolve_lsf = get_convolve_lsf(self.user_config)
        apply_noise = get_apply_noise(self.user_config)

        functions = [
            rotate_galaxy,
            filter_particles,
            spaxel_assignment,
            gas_emission,
            scale_spectrum_by_mass,
            doppler_shift_and_resampling,
            calculate_datacube,
            convolve_psf,
            convolve_lsf,
            apply_noise,
        ]

        return functions
        

however, in the get_gas_emssion function in cue.py, the function returned is named gas_emission, not calculate_spectra:

def get_gas_emission(config: dict):

    if "gas" not in config["data"]["args"]["particle_type"]:
        raise ValueError("No gas particle data in this galaxy")

    if "gas" not in config["data"]["args"]["cube_type"]:
        raise ValueError("Not specified in the config to calculate gas emission cube")

    logger = get_logger()

    def gas_emission(rubixdata: object) -> object:
        """Calculate gas emission lines and gas continuum emission."""
        logger.info("Calculation gas emission...")

        CueClass = CueGasLookup(config)

        rubixdata = CueClass.get_gas_emission_flux(rubixdata)
        rubixdata = jax.block_until_ready(rubixdata)

        logger.debug("Completed gas emission calculation: %s", rubixdata)
        return rubixdata

    return gas_emission

In turn, when the pipeline looks up the function name to register the functions for lookup, it does so by the __name__ attribute of the function it looks at, and saves it under this name, i.e., here, as gas_emission, not as calculate_spectra:

   def register_transformer(self, cls):
       """
       register_transformer Make a functtion available to the calling
       pipeline object. The registered function must be a pure functional
       function in order to be transformable with jax. The registered transformers
       are used to build a pipeline.
       Parameters
       ----------
       cls
           function object to register.

       Raises
       ------
       ValueError
           When the function is already registered  with the pipeline
       """
       if cls.__name__ in self.transformers:
           raise ValueError("A transformer of this name is already present")
       self.transformers[cls.__name__] = cls

Consequently, when calculate_spectra is looked up, it is not found, because the function's name attribute reads 'gas_emission'.

I wasn't aware of this being a source of instability, but maybe we should change that in some way. Should happen in a separate PR though. In the mean time, you should be able to get your pipeline to work by calling the thing 'gas_emssion' instead of calculate spectra or vice versa.

Edit: add code

@anschaible
Copy link
Collaborator Author

Thanks for looking into this problem. Your comments helped me a lot to get a better understanding of the. details of the pipeline structure. Now that I know that I have to name transformer exactly the same in the core module and in the yaml, this should be no issue for further individual pipelines. Now that I know this, I will write this in the documentation and then ist should hopefully be clear to anyone and not again a issue

@anschaible
Copy link
Collaborator Author

I have still to open questions in the software direction:

Installing cue via pip is propably thebetter way. However there is a small issue: They have in the cue pacage a version with 128 emission lines and a new version with 138 emission lines, but the two versions are a bit mixed and the current version in not working. I have to change in line.py in line163 "line_old" to "np.arange(138)". Therefore it is not working with just pip install or is there a way to work around?

I get an TracerArrayConversionError for cue, because it uses numpy:

TracerArrayConversionError: The numpy.ndarray conversion method array() was called on traced array with shape float32[1000,12].
The error occurred while tracing the function expr at /home/rubix/rubix/pipeline/linear_pipeline.py:152 for jit. This concrete value was not available in Python because it depends on the values of the arguments input[<flat index 0>][<flat index 2>], input[<flat index 2>][<flat index 0>], input[<flat index 2>][<flat index 2>], input[<flat index 2>][<flat index 3>], and input[<flat index 2>][<flat index 6>].
See https://jax.readthedocs.io/en/latest/errors.html#jax.errors.TracerArrayConversionError

I already tried this, but it does not work:
from functools import partial
@partial(jit, static_argnums=(0,))

@MaHaWo do you know a way to fix these two issues?

@MaHaWo
Copy link
Collaborator

MaHaWo commented Nov 4, 2024

hm, ok. seems that cue causes more trouble than I thought then. Will look into these things later, from the top of my head I'm not sure what to do. worst case scenario we might need to make a fork from which we can open PRs against the original cue to contribute a bit towards its usability.

@MaHaWo
Copy link
Collaborator

MaHaWo commented Nov 5, 2024

@anschaible maybe we can go through this step by step? I'm trying to run the rubix_pipeline_gas notebook. I have a few issues that have not come up in our discussion so far:

  • when I clone cue and install it via pip, I get an error about mixed tabs and spaces as indentends. can you confirm?
  • I renamed the gas_emission function to calculate_spectra to make the pipeline work. this, however, does result in the following error (relevant part being the last paragraph). Can you guide me a bit here? Am I missing steps?
TypeError                                 Traceback (most recent call last)
Cell In[1], line 65
      5 config = {
      6     \"pipeline\":{\"name\": \"calc_ifu_gas\"},
      7     
   (...)
     60     },        
     61 }
     63 pipe = RubixPipeline(config)
---> 65 data = pipe.run()
     67 #datacube = data.stars.datacube
     68 #img = datacube.sum(axis=2)
     69 #plt.imshow(img, origin=\"lower\")

File ~/Development/rubix/rubix/core/pipeline_gas.py:177, in RubixPipeline.run(self)
    175 # Running the pipeline
    176 self.logger.info(\"Running the pipeline on the input data...\")
--> 177 output = self.func(self.data)
    179 jax.block_until_ready(output)
    180 time_end = time.time()

    [... skipping hidden 11 frame]

File ~/Development/rubix/rubix/pipeline/linear_pipeline.py:151, in LinearTransformerPipeline.build_expression.<locals>.expr(input, pipeline)
    149 res = input
    150 for f in pipeline:
--> 151     res = f(res)
    152 return res

File ~/Development/rubix/rubix/core/rotation.py:81, in get_galaxy_rotation.<locals>.rotate_galaxy(rubixdata, type)
     78 halfmass_radius = rubixdata.galaxy.halfmassrad_stars
     80 # Rotate the galaxy
---> 81 coords, velocities = rotate_galaxy_core(
     82     positions=coords,
     83     velocities=velocities,
     84     masses=masses,
     85     halfmass_radius=halfmass_radius,
     86     alpha=alpha,
     87     beta=beta,
     88     gamma=gamma,
     89 )
     91 # Update the inputs
     92 #rubixdata.gas.coords = coords
     93 #rubixdata.gas.velocity = velocities
     94 setattr(rubixdata.gas, \"coords\", coords)

File ~/Development/rubix/rubix/galaxy/alignment.py:276, in rotate_galaxy(positions, velocities, masses, halfmass_radius, alpha, beta, gamma)
    244 def rotate_galaxy(positions, velocities, masses, halfmass_radius, alpha, beta, gamma):
    245     \"\"\"Orientate the galaxy by applying a rotation matrix to the positions of the particles.
    246 
    247     Parameters
   (...)
    273         The rotated positions aqnd velocities.
    274     \"\"\"
--> 276     I = moment_of_inertia_tensor(positions, masses, halfmass_radius)
    277     R = rotation_matrix_from_inertia_tensor(I)
    278     pos_rot = apply_init_rotation(positions, R)

File ~/Development/rubix/rubix/galaxy/alignment.py:89, in moment_of_inertia_tensor(positions, masses, halfmass_radius)
     69 def moment_of_inertia_tensor(positions, masses, halfmass_radius):
     70     \"\"\"Calculate the moment of inertia tensor for a given set of positions and masses within the half-light radius.
     71        Assumes the galaxy is already centered.
     72 
   (...)
     85         The moment of inertia tensor.
     86     \"\"\"
     88     distances = jnp.sqrt(
---> 89         jnp.sum(positions**2, axis=1)
     90     )  # Direct calculation since positions are already centered
     92     within_halfmass_radius = distances <= halfmass_radius
     94     # Ensure within_halfmass_radius is concrete

TypeError: unsupported operand type(s) for ** or pow(): 'NoneType' and 'int'"

Other issues you mentioned:

The problem arises because jax tries to trace a value that has an unknown shape at the time of tracing afais. The issue, however, seems to run a bit deeper.
It seems that you do all the cue lookup (and with it all the calls cue does internally) within the function that is eventually returned from the function factory:

def get_gas_emission(config: dict):
    if "gas" not in config["data"]["args"]["particle_type"]:
        raise ValueError("No gas particle data in this galaxy")

    if "gas" not in config["data"]["args"]["cube_type"]:
        raise ValueError("Not specified in the config to calculate gas emission cube")

    logger = get_logger()

    def calculate_spectra(rubixdata: object) -> object:
        """Calculate gas emission lines and gas continuum emission."""
        logger.info("Calculation gas emission...")

        CueClass = CueGasLookup(config)

        rubixdata = CueClass.get_gas_emission_flux(rubixdata)
        rubixdata = jax.block_until_ready(rubixdata)

        logger.debug("Completed gas emission calculation: %s", rubixdata)
        return rubixdata

    return calculate_spectra

This function will be just-in-time compiled, and with it all the data structures that it uses.
Does this have to be so? can you do this CueGasLookup stuff outside of calculate_spectra function and only pass in the final data, i.e., a version that has been converted to a jax array of fixed shape? The key idea behind the factory pattern implemented here is that it allows for a separation of pre-processing (should happen outside the returned function) and computation (happens within the returned function). From the looks of it we might be able to take advantage of this here?
Similarly, there seem to be other lookup things depending on the config in the computation function, e.g.,

        telescope = get_telescope(self.config)

in line 444, grid.py.

I'm not sure if I'm contradicting my own advice here though, I feel like we talked about this.

@MaHaWo
Copy link
Collaborator

MaHaWo commented Nov 5, 2024

I also created a fork of cue here: https://github.com/MaHaWo/cue. We can fix the stuff we need there and then make PRs against the original. It isn´t in a good state as far as I can see and it would perhaps be of use to the wider community to contribute where we can.

@anschaible
Copy link
Collaborator Author

To the TypeError, when running rubix_pipeline_gas.ipynb: If you have run the pipeline before for stellar spectra, then there exists already a rubix_galaxy.h5 file in notebooks/output/, but this file has then no gas, so try to remove in /data/ the galaxy-id-11.hdf5 file and in /output/ the rubix_galaxy.h5 file and run the pipeline again, then this error should disappear

@anschaible
Copy link
Collaborator Author

Before going through the other issues mentioned, a short comment to the cue github and fork:

  1. when I installed it, I got no error about mixed tabs and spaces
  2. I have no permission to push to MaHaWo/cue on GitHub. I want to open a new branch fixing the line number from 128 to 138. It is just chaning one line of code: In line.py in line 163: line_ind=line_old -> line_ind=np.arange(138)

@ufuk-cakir
Copy link
Owner

To the TypeError, when running rubix_pipeline_gas.ipynb: If you have run the pipeline before for stellar spectra, then there exists already a rubix_galaxy.h5 file in notebooks/output/, but this file has then no gas, so try to remove in /data/ the galaxy-id-11.hdf5 file and in /output/ the rubix_galaxy.h5 file and run the pipeline again, then this error should disappear

this issue sounds like something we should check during runtime, as the error message is not really descriptive. We should probably create a issue for that

@anschaible
Copy link
Collaborator Author

Yes, if there is a way to do the CueGasLookup outside the spectra_calculation function, I am more than happy to change it. I tried to rewrite the code to split the pre-processing and computation, but I am not fully sure, how to do it. @MaHaWo and @ufuk-cakir you have a better expertise on it, could you help to implement this or give more comments on how to do it?

@TobiBu
Copy link
Collaborator

TobiBu commented Nov 5, 2024

Yes, if there is a way to do the CueGasLookup outside the spectra_calculation function, I am more than happy to change it. I tried to rewrite the code to split the pre-processing and computation, but I am not fully sure, how to do it. @MaHaWo and @ufuk-cakir you have a better expertise on it, could you help to implement this or give more comments on how to do it?

Just chiming in here, this lookup sounds pretty much what we also do for the stellar spectra / the SSP templates. any solution for cue would apply to this as well or vice-versa. That said, we should check where and how we to the SSP lookup in the pipeline as this is also kind of pre-processing.

@anschaible
Copy link
Collaborator Author

Maybe I think to complicated, but isn't the large difference to ssp that we do not have a file that we load to do the lookup? we have to run cue for each particle?

@TobiBu
Copy link
Collaborator

TobiBu commented Nov 6, 2024

Maybe I think to complicated, but isn't the large difference to ssp that we do not have a file that we load to do the lookup? we have to run cue for each particle?

that depends. in case of Mastar models, which are tabulated this is true. there we have a simple look up. in case of asps we might actually have a function call much like in CUE, I need to check the details though.

@MaHaWo
Copy link
Collaborator

MaHaWo commented Nov 7, 2024

@anschaible you should have an invite for the cue fork now. That thing about tabs and spaces is weird. Maybe it's an editor thing? anyway, doesn´t seem particularly important 🤔

@anschaible anschaible closed this Nov 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Extra attention is needed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Emission cube
4 participants