Solution diverging while using GPU for running VEROS #687

Open
sb4233 opened this issue Dec 26, 2024 · 5 comments

@sb4233

sb4233 commented Dec 26, 2024

Opening a new issue for a problem first reported in #669, as it is still not solved.

When running VEROS on a GPU, the solution diverges:

RuntimeError: solution diverged at iteration 411577
srun: error: gpu1: task 0: Exited with exit code 1

I checked this with both the global_1deg.py and global_flexible.py setups.

@dionhaefner
Collaborator

How can I reproduce this? Are you running those setups unmodified?

@sb4233
Author

sb4233 commented Dec 28, 2024

> How can I reproduce this? Are you running those setups unmodified?

Yes, I am using the default setups, unmodified.
For the runs, I am doing a 1000-year simulation (10 x 100-year runs) with the veros resubmit method on a single GPU.

@dionhaefner
Collaborator

So how long can you run until it diverges?

@sb4233
Author

sb4233 commented Dec 28, 2024

> So how long can you run until it diverges?

It mostly diverges at the end of the first 100-year simulation.

@dionhaefner
Collaborator

Could you please try again with this environment variable set?

$ export VEROS_LINEAR_SOLVER=scipy

(I suggest you write restart files in regular intervals, e.g. every 5 years or so, which should come in handy for future debugging.)
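For anyone scripting these two suggestions, here is a minimal Python sketch. It makes several assumptions that are not stated in the thread: that the model calendar uses 360-day years, that the helper names `years_to_seconds` and `solver_env` are purely illustrative, and that the restart interval is configured in seconds through a setting such as `restart_frequency` (please verify the setting name against the Veros documentation for your version).

```python
import os

SECONDS_PER_DAY = 86_400
DAYS_PER_YEAR = 360  # assumption: 360-day model calendar; adjust if your setup differs


def years_to_seconds(years, days_per_year=DAYS_PER_YEAR):
    """Convert model years to seconds (hypothetical helper)."""
    return years * days_per_year * SECONDS_PER_DAY


def solver_env(base_env=None, solver="scipy"):
    """Return a copy of the environment with VEROS_LINEAR_SOLVER set.

    Hypothetical helper: equivalent to `export VEROS_LINEAR_SOLVER=scipy`
    for a process launched with this environment (e.g. via subprocess).
    """
    env = dict(os.environ if base_env is None else base_env)
    env["VEROS_LINEAR_SOLVER"] = solver
    return env


# Restart interval for "every 5 years or so", expressed in seconds.
restart_interval = years_to_seconds(5)

# Assumption: inside the setup's parameter section this value would be
# assigned to the restart interval setting, e.g.
#     settings.restart_frequency = restart_interval
env = solver_env()
print(restart_interval, env["VEROS_LINEAR_SOLVER"])
```

With restart files at this spacing, a run that diverges near the end of a 100-year segment can be resumed from a point at most 5 model years before the failure, instead of repeating the whole segment.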
