Create "pidfile" to prevent accidental restarts #13

@pinkwah

Two jobs started on the same data will conflict without warning.

We can write a .{case}.pid file containing the process ID of the mpirun application. When mpirun completes, the file is deleted.
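A minimal sketch of the write-then-delete lifecycle, assuming the launcher is written in Python and starts mpirun itself. The names `pidfile_path`, `pidfile` and `run_with_pidfile` are hypothetical and not part of any existing code.

```python
import contextlib
import subprocess
from pathlib import Path


def pidfile_path(case: str, directory: Path) -> Path:
    # ".{case}.pid" next to the case data, as proposed above.
    return directory / f".{case}.pid"


@contextlib.contextmanager
def pidfile(case: str, directory: Path, pid: int):
    """Write `pid` to the pidfile and remove the file when the block exits."""
    path = pidfile_path(case, directory)
    path.write_text(f"{pid}\n")
    try:
        yield path
    finally:
        # missing_ok covers the case where the file was removed by hand.
        path.unlink(missing_ok=True)


def run_with_pidfile(argv, case: str, directory: Path) -> int:
    """Start `argv` (e.g. the mpirun command line) and hold the pidfile while it runs."""
    proc = subprocess.Popen(argv)
    with pidfile(case, directory, proc.pid):
        return proc.wait()
```

The `finally` clause only runs if the launcher exits normally or through an exception. A hard crash or SIGKILL leaves the file behind, which is why a later run still needs to check whether the recorded process is alive.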

When starting a new run, we check whether this file exists and, if so, whether the recorded process still exists. On Linux, this can be done by checking for the /proc/{pid} directory. Bonus points if we also check /proc/{pid}/exe, which is a symlink to the executable. If the executable path starts with /prog/pflotran, we can assume it's some Cirrus process. If it points somewhere else, mpirun finished without us deleting the pidfile (e.g. we crashed), and the PID has since been reused by a different process.
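The check described above could look roughly like this. It is only a sketch: `read_pid` and `is_cirrus_running` are hypothetical names, the `proc_root` parameter exists only to make the function testable, and treating an unreadable `exe` link as "still running" is an assumption, not something the proposal specifies.

```python
import os
from pathlib import Path

CIRRUS_EXE_PREFIX = "/prog/pflotran"  # prefix named in this issue


def read_pid(pidfile: Path):
    """Return the PID stored in the pidfile, or None if it is absent or unparsable."""
    try:
        return int(pidfile.read_text().split()[0])
    except (FileNotFoundError, ValueError, IndexError):
        return None


def is_cirrus_running(pid: int, proc_root: Path = Path("/proc")) -> bool:
    """True if /proc/{pid}/exe points at an executable under CIRRUS_EXE_PREFIX."""
    exe_link = proc_root / str(pid) / "exe"
    try:
        exe = os.readlink(exe_link)
    except FileNotFoundError:
        # No /proc/{pid} entry: the process is gone and the pidfile is stale.
        return False
    except PermissionError:
        # Assumption: the process exists but belongs to another user,
        # so err on the side of reporting it as running.
        return True
    # A different executable means the PID was reused by an unrelated process.
    return exe.startswith(CIRRUS_EXE_PREFIX)
```

A caller would treat `read_pid(...) is None` or `is_cirrus_running(pid) is False` as permission to delete the stale pidfile and start the new run.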

We tell the user that the process is still running and either ask them whether they want to kill it, or give them instructions on what to do.

For LSF, we can record the job ID as LSF={jobid} in the .pid file. When runcirrus starts, it uses bjobs to check whether that job still exists. If it does, we tell the user and either ask them whether they want to bkill it or give instructions on how to do this.
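A rough sketch of the LSF variant. The helper names are hypothetical, and two things are assumptions rather than part of the proposal: that the installed LSF release supports `bjobs -noheader -o stat`, and that jobs reported as DONE or EXIT should count as no longer running (LSF can keep listing finished jobs for a while).

```python
import subprocess
from pathlib import Path

# Assumption: these LSF states mean the job has already finished.
FINISHED_STATES = {"DONE", "EXIT"}


def read_lsf_jobid(pidfile: Path):
    """Return the job ID from an 'LSF={jobid}' line, or None if there is none."""
    try:
        lines = pidfile.read_text().splitlines()
    except FileNotFoundError:
        return None
    for line in lines:
        key, sep, value = line.strip().partition("=")
        if sep and key == "LSF" and value.isdigit():
            return value
    return None


def bjobs_status(jobid: str) -> str:
    """Ask bjobs for the job state; empty string if it prints nothing on stdout."""
    # Assumes a bjobs that accepts -noheader and -o; older releases may not.
    result = subprocess.run(
        ["bjobs", "-noheader", "-o", "stat", jobid],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout.strip()


def lsf_job_active(jobid: str, status_of=bjobs_status) -> bool:
    """True if the job is known to LSF and not in a finished state."""
    status = status_of(jobid)
    return bool(status) and status not in FINISHED_STATES
```

The `status_of` parameter lets the check be exercised without an LSF installation; a PBS version would presumably swap in a different status function behind the same interface.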

Ditto for PBS.

Note that each RGS node has a different /proc directory (they're different computers). It's possible to trick this check by starting Cirrus on one node and then starting Cirrus on a different node. The PID is not valid on the other node, so we will end up allowing the second run. However, anyone funky enough to do this must know what they're doing.
