Skip to content

No module named "x" #202

@HCookie

Description

@HCookie

What happened?

After the recent updates to the install process, something is going wrong with the way venv's are made and used.
PR #201 and ecmwf/earthkit-workflows#152 are my attempts to fix it, at the moment neither should be merged.

Concrete logs of what is happening are as follows

  File "/tmp/cascade_runner_venv_vks9w3ce/lib/python3.12/site-packages/anemoi/inference/runner.py", line 511, in prepare_output_state
    for state in output:
                 ^^^^^^
  File "/tmp/cascade_runner_venv_vks9w3ce/lib/python3.12/site-packages/anemoi/inference/runner.py", line 641, in forecast
    self.model.eval()
    ^^^^^^^^^^
  File "/Users/fiab/.local/share/uv/python/cpython-3.12.7-macos-aarch64-none/lib/python3.12/functools.py", line 993, in __get__
    val = self.func(instance)
          ^^^^^^^^^^^^^^^^^^^
  File "/tmp/cascade_runner_venv_vks9w3ce/lib/python3.12/site-packages/anemoi/inference/runner.py", line 554, in model
    raise e
  File "/tmp/cascade_runner_venv_vks9w3ce/lib/python3.12/site-packages/anemoi/inference/runner.py", line 546, in model
    model = torch.load(self.checkpoint.path, map_location=self.device, weights_only=False).to(self.device)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/cascade_runner_venv_vks9w3ce/lib/python3.12/site-packages/torch/serialization.py", line 1475, in load
    overall_storage=overall_storage,
ModuleNotFoundError: No module named 'torch.utils.serialization'

This ModuleNotFoundError also occurs for anemoi.models

This particular import error is the result of the prior install of a particular torch version and invalidation of the caches,

  • torch 2.6 does not have this module (the model suggested version
  • torch 2.8 does (this is the one running in the box)

Note: this shouldn't be an issue as no pickling is taking place

The odd thing is that this is not occuring on the dev server, running Rocky Linux.

Steps that have been taken

  • The PR to earthkit-workflows does seemingly resolve it for anemoi-models
  • Removing torch from the ondemand install list resolves the torch issue
  • Overriding the torch version on the box doesn't seem to help either oddly

What are the steps to reproduce the bug?

Run a forecast on MacOS

Version

latest

Platform (OS and architecture)

MacOS m3

Relevant log output

Accompanying data

No response

Organisation

No response

Metadata

Metadata

Assignees

Labels

backendRelated to the backend - earthkit-workflows - earthkit-workflows-anemoibugSomething isn't working

Type

Projects

Status

No status

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions