Running anemoi-training example notebook #119

Open
krinchyman opened this issue Jan 31, 2025 · 0 comments
Labels
bug Something isn't working


krinchyman commented Jan 31, 2025

What happened?

First of all, I'm very new to the anemoi framework, so this may not be an issue at all, or it may be a known one.

I'm trying to run inference using the AIFS checkpoint from Hugging Face. I cloned the repository, installed the required packages and ran the inference example. The model does not fit on my GPU, so I asked for the inference to be done on the CPU by modifying the code as follows:

runner = SimpleRunner('aifs_single_v0.2.1.ckpt', device='cpu')

After that, running the inference:

for state in runner.run(input_state=input_state, lead_time=24):
    print_state(state)

This gives an error related to node_attributes, as follows:

AttributeError: 'AnemoiModelEncProcDec' object has no attribute 'node_attributes'

Longitude and latitude are not present in the example's input_state, unlike in the documentation's example, so I decided to calculate the N320 Gaussian grid longitudes and latitudes from the forcings that were added to the fields dictionary (cos/sin_longitude and cos/sin_latitude) after the first attempt, but no luck (the runner also has runner.checkpoint.longitudes/latitudes).

Am I missing the point?
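
For reference, the latitude/longitude reconstruction described above can be sketched roughly as follows. This is only an illustration: the key names `cos_latitude`, `sin_latitude`, `cos_longitude` and `sin_longitude` are assumed expansions of the cos/sin_longitude and cos/sin_latitude forcings mentioned above, the helper names are hypothetical, and nothing here is claimed to resolve the node_attributes error itself.

```python
import numpy as np


def angles_from_cos_sin(cos_values, sin_values):
    """Recover angles in degrees from their cosine and sine components."""
    # arctan2 uses the signs of both inputs, so the quadrant is preserved.
    return np.degrees(np.arctan2(sin_values, cos_values))


def lat_lon_from_forcings(fields):
    """Rebuild latitude/longitude arrays from cos/sin forcing fields.

    `fields` is assumed to be a dict of 1-D arrays keyed by the
    (assumed) names below; adjust the keys to match the real state.
    """
    latitudes = angles_from_cos_sin(fields["cos_latitude"], fields["sin_latitude"])
    # arctan2 returns values in [-180, 180]; wrap longitudes into [0, 360).
    longitudes = angles_from_cos_sin(fields["cos_longitude"], fields["sin_longitude"]) % 360.0
    return latitudes, longitudes
```

The resulting arrays can then be compared against runner.checkpoint.latitudes and runner.checkpoint.longitudes to check that both describe the same grid.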

What are the steps to reproduce the bug?

Simply changing this line:

runner = SimpleRunner('aifs_single_v0.2.1.ckpt', device='gpu')

to

runner = SimpleRunner('aifs_single_v0.2.1.ckpt', device='cpu')

Version

0.4.5
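
The version above does not say which package it refers to, and the traceback involves several (anemoi.inference, anemoi.models and torch). A minimal sketch for listing the installed versions, assuming the distribution names are `anemoi-inference`, `anemoi-models` and `torch`; the helper name `installed_versions` is hypothetical:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # Distribution names are assumptions based on the modules in the traceback.
    packages = ["anemoi-inference", "anemoi-models", "torch"]
    for name, ver in installed_versions(packages).items():
        print(f"{name}: {ver or 'not installed'}")
```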

Platform (OS and architecture)

Debian 11

Relevant log output

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[4], line 1
----> 1 for state in runner.run(input_state=input_state, lead_time=24):
      2     print_state(state)

File ~/.anaconda3/envs/ai-models/lib/python3.10/site-packages/anemoi/inference/runner.py:131, in Runner.run(self, input_state, lead_time)
    128 input_tensor = self.prepare_input_tensor(input_state)
    130 try:
--> 131     yield from self.postprocess(self.forecast(lead_time, input_tensor, input_state))
    132 except (TypeError, ModuleNotFoundError, AttributeError):
    133     if self.report_error:

File ~/.anaconda3/envs/ai-models/lib/python3.10/site-packages/anemoi/inference/postprocess.py:34, in Accumulator.__call__(self, source)
     33 def __call__(self, source):
---> 34     for state in source:
     35         for accumulation in self.accumulations:
     36             if accumulation in state["fields"]:

File ~/.anaconda3/envs/ai-models/lib/python3.10/site-packages/anemoi/inference/runner.py:294, in Runner.forecast(self, lead_time, input_tensor_numpy, input_state)
    292 # Predict next state of atmosphere
    293 with torch.autocast(device_type=self.device, dtype=self.autocast):
--> 294     y_pred = self.predict_step(self.model, input_tensor_torch, fcstep=s)
    296 # Detach tensor and squeeze (should we detach here?)
    297 output = np.squeeze(y_pred.cpu().numpy())  # shape: (values, variables)

File ~/.anaconda3/envs/ai-models/lib/python3.10/site-packages/anemoi/inference/runner.py:248, in Runner.predict_step(self, model, input_tensor_torch, fcstep, **kwargs)
    245 def predict_step(self, model, input_tensor_torch, fcstep, **kwargs):
    246     # extra args are only used in specific runners
    247     # TODO: move this to a Stepper class.
--> 248     return model.predict_step(input_tensor_torch)

File ~/.anaconda3/envs/ai-models/lib/python3.10/site-packages/anemoi/models/interface/__init__.py:121, in AnemoiModelInterface.predict_step(self, batch)
    117     # Dimensions are
    118     # batch, timesteps, horizonal space, variables
    119     x = batch[:, 0 : self.multi_step, None, ...]  # add dummy ensemble dimension as 3rd index
--> 121     y_hat = self(x)
    123 return self.post_processors(y_hat, in_place=False)

File ~/.anaconda3/envs/ai-models/lib/python3.10/site-packages/torch/nn/modules/module.py:1553, in Module._wrapped_call_impl(self, *args, **kwargs)
   1551     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1552 else:
-> 1553     return self._call_impl(*args, **kwargs)

File ~/.anaconda3/envs/ai-models/lib/python3.10/site-packages/torch/nn/modules/module.py:1562, in Module._call_impl(self, *args, **kwargs)
   1557 # If we don't have any hooks, we want to skip the rest of the logic in
   1558 # this function, and just call forward.
   1559 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1560         or _global_backward_pre_hooks or _global_backward_hooks
   1561         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1562     return forward_call(*args, **kwargs)
   1564 try:
   1565     result = None

File ~/.anaconda3/envs/ai-models/lib/python3.10/site-packages/anemoi/models/models/encoder_processor_decoder.py:176, in AnemoiModelEncProcDec.forward(self, x, model_comm_group)
    170 ensemble_size = x.shape[2]
    172 # add data positional info (lat/lon)
    173 x_data_latent = torch.cat(
    174     (
    175         einops.rearrange(x, "batch time ensemble grid vars -> (batch ensemble grid) (time vars)"),
--> 176         self.node_attributes(self._graph_name_data, batch_size=batch_size),
    177     ),
    178     dim=-1,  # feature dimension
    179 )
    181 x_hidden_latent = self.node_attributes(self._graph_name_hidden, batch_size=batch_size)
    183 # get shard shapes

File ~/.anaconda3/envs/ai-models/lib/python3.10/site-packages/torch/nn/modules/module.py:1729, in Module.__getattr__(self, name)
   1727     if name in modules:
   1728         return modules[name]
-> 1729 raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")

AttributeError: 'AnemoiModelEncProcDec' object has no attribute 'node_attributes'

Accompanying data

No response

Organisation

MeteoGalicia

@krinchyman krinchyman added the bug Something isn't working label Jan 31, 2025