Open
Labels
backend: Related to the backend - earthkit-workflows - earthkit-workflows-anemoi
bug: Something isn't working
Description
What happened?
After the recent updates to the install process, something is going wrong with the way venvs are created and used.
PR #201 and ecmwf/earthkit-workflows#152 are my attempts to fix it; at the moment neither should be merged.
Concrete logs of what is happening are as follows:
File "/tmp/cascade_runner_venv_vks9w3ce/lib/python3.12/site-packages/anemoi/inference/runner.py", line 511, in prepare_output_state
for state in output:
^^^^^^
File "/tmp/cascade_runner_venv_vks9w3ce/lib/python3.12/site-packages/anemoi/inference/runner.py", line 641, in forecast
self.model.eval()
^^^^^^^^^^
File "/Users/fiab/.local/share/uv/python/cpython-3.12.7-macos-aarch64-none/lib/python3.12/functools.py", line 993, in __get__
val = self.func(instance)
^^^^^^^^^^^^^^^^^^^
File "/tmp/cascade_runner_venv_vks9w3ce/lib/python3.12/site-packages/anemoi/inference/runner.py", line 554, in model
raise e
File "/tmp/cascade_runner_venv_vks9w3ce/lib/python3.12/site-packages/anemoi/inference/runner.py", line 546, in model
model = torch.load(self.checkpoint.path, map_location=self.device, weights_only=False).to(self.device)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/tmp/cascade_runner_venv_vks9w3ce/lib/python3.12/site-packages/torch/serialization.py", line 1475, in load
overall_storage=overall_storage,
ModuleNotFoundError: No module named 'torch.utils.serialization'
This ModuleNotFoundError also occurs for anemoi.models.
This particular import error is the result of a prior install of a particular torch version and invalidation of the caches:
- torch 2.6 does not have this module (this is the version suggested by the model)
- torch 2.8 does (this is the one running in the box)
Note: this shouldn't be an issue as no pickling is taking place
The odd thing is that this is not occurring on the dev server, which runs Rocky Linux.
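For anyone comparing the macOS box against the dev server, a minimal diagnostic sketch along these lines could be run inside the runner venv on both machines. It only uses the standard library; the helper name `probe` is hypothetical and not part of any of the packages involved, and the distribution names passed to it are assumptions based on the packages mentioned above.

```python
import importlib.metadata
import importlib.util


def probe(dist_name: str, module_name: str) -> dict:
    """Report a distribution's installed version and whether a module resolves."""
    try:
        version = importlib.metadata.version(dist_name)
    except importlib.metadata.PackageNotFoundError:
        version = None
    try:
        # find_spec imports the parent package first, so a missing or
        # non-package parent raises instead of returning None.
        spec = importlib.util.find_spec(module_name)
    except (ImportError, ValueError):
        spec = None
    return {
        "distribution": dist_name,
        "version": version,
        "module": module_name,
        "resolvable": spec is not None,
        "origin": getattr(spec, "origin", None),
    }


if __name__ == "__main__":
    # Names taken from the traceback above; adjust as needed.
    print(probe("torch", "torch.utils.serialization"))
    print(probe("anemoi-models", "anemoi.models"))
```

If the reported torch version differs between the two machines, or the `origin` path points outside the runner venv, that would support the stale-install theory.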
Steps that have been taken
- The PR to earthkit-workflows does seemingly resolve it for anemoi-models
- Removing torch from the ondemand install list resolves the torch issue
- Overriding the torch version on the box doesn't seem to help either, oddly
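To check whether a package is being picked up from the runner venv or from somewhere else, a rough sketch like the following may help. The helper name `located_under` is hypothetical; it simply compares the file a module would be loaded from against a given prefix such as `sys.prefix`, and it is an illustration rather than anything the cascade runner does itself.

```python
import importlib.util
import sys
from pathlib import Path
from typing import Optional


def located_under(module_name: str, prefix) -> Optional[bool]:
    """True/False if the module's file is (not) under prefix, None if it has no file."""
    try:
        spec = importlib.util.find_spec(module_name)
    except (ImportError, ValueError):
        return None
    # Built-in, frozen and namespace modules have no single file location.
    if spec is None or not spec.has_location or spec.origin is None:
        return None
    origin = Path(spec.origin).resolve()
    # Resolve both sides so symlinked temp directories compare correctly.
    return origin.is_relative_to(Path(prefix).resolve())


if __name__ == "__main__":
    for name in ("torch", "anemoi.models", "anemoi.inference"):
        print(name, located_under(name, sys.prefix))
```

Inside the runner venv, `sys.prefix` is the venv directory, so a `False` for `torch` would mean it is being resolved from outside that venv.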
What are the steps to reproduce the bug?
Run a forecast on macOS.
Version
latest
Platform (OS and architecture)
macOS, M3
Relevant log output
Accompanying data
No response
Organisation
No response