
added some support for tracking allocations #1904

Merged
merged 4 commits into openpmix:master from alloc_tracking on Feb 13, 2024

Conversation

HawkmoonEternal

Based on the discussions with @rhc54, @sonjahapp, and Isaias in the Dynamic Workflows WG and in email threads, I started implementing some support for tracking allocations in PRRTE. The required changes turned out to be more extensive than I expected, and more work will also be required for Session_control. I'm a bit tied up with other work in the coming weeks, so I thought I'd just share what I have so far.

As Ralph suggested, I created a prte_session_t struct that tracks the nodes, jobs, and child sessions associated with a session.

So far I have added:

  • support for answering allocation requests from remote daemons
  • a prte_session_t struct that tracks the nodes, jobs, and child sessions associated with a session (see the sketch after this list)
  • support for PMIX_ALLOC_ID in PMIx_Spawn
  • restriction of mapping to the nodes of a specific session
  • creation/adaptation of sessions based on PMIX_ALLOC_NEW/EXTEND/RELEASE
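For reference, here is a minimal sketch of what such a session-tracking struct could look like. The field names and container types below are illustrative assumptions based on the description in this PR, not the exact definition that was merged:

/* Illustrative sketch only: the exact prte_session_t merged by this PR
 * may differ in field names, types, and containers. */
typedef struct {
    pmix_object_t super;            /* class-system base object */
    uint64_t session_id;            /* numeric session identifier (PMIX_SESSION_ID) */
    char *alloc_id;                 /* string allocation identifier (PMIX_ALLOC_ID) */
    pmix_pointer_array_t *nodes;    /* nodes belonging to this session */
    pmix_pointer_array_t *jobs;     /* jobs running in this session */
    pmix_pointer_array_t *children; /* child sessions created via PMIX_ALLOC_NEW */
} prte_session_t;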

Some important assumptions:

  1. The PMIX_ALLOC_ID is a string representation of the uint64_t PMIX_SESSION_ID (see the sketch after this list).
  2. Jobs are only allowed to spawn into their own session or into a child session created with PMIX_ALLOC_NEW.
  3. The default session's node array IS the global node pool. Some care is required when releasing the session struct, but this way it is easier to track the nodes in the default session.
  4. By default, jobs run in the default session.
  5. The PRRTE DVM job runs in the default session.
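As a concrete illustration of assumptions 1 and 2, the following sketch shows how a client might turn a session ID into the PMIX_ALLOC_ID string and request a spawn into that session. It uses only the standard PMIx client API; the application name and the minimal error handling are placeholders, not code from this PR:

#include <inttypes.h>
#include <stdio.h>
#include <string.h>
#include <pmix.h>

/* Sketch: spawn one copy of "./my_app" into the session identified by
 * session_id, relying on assumption 1 that PMIX_ALLOC_ID is simply the
 * string form of the numeric session ID. */
static pmix_status_t spawn_into_session(uint64_t session_id)
{
    char alloc_id[32];
    pmix_info_t job_info[1];
    pmix_app_t app;
    pmix_nspace_t child;
    pmix_status_t rc;

    snprintf(alloc_id, sizeof(alloc_id), "%" PRIu64, session_id);
    PMIX_INFO_LOAD(&job_info[0], PMIX_ALLOC_ID, alloc_id, PMIX_STRING);

    PMIX_APP_CONSTRUCT(&app);
    app.cmd = strdup("./my_app");
    PMIX_ARGV_APPEND(rc, app.argv, "./my_app");
    app.maxprocs = 1;

    rc = PMIx_Spawn(job_info, 1, &app, 1, child);

    PMIX_APP_DESTRUCT(&app);
    PMIX_INFO_DESTRUCT(&job_info[0]);
    return rc;
}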

I did some testing with the following scripts:
https://github.com/HawkmoonEternal/allocation_tracking_test

The allocation tracking seems to work for these test cases, but I think some more extensive testing is required.
I also noticed that the default behavior of PRRTE is to abort all child jobs when the parent job completes. I was wondering whether we should introduce an attribute in the PMIx_Spawn call to indicate that a job should persist. I can imagine situations such as postprocessing, in-situ tasks, etc., where the child jobs are expected to outlive the parent job.

Looking forward to your feedback and discussing this in the next Working Group meeting.

@rhc54
Contributor

rhc54 commented Jan 6, 2024

Overall, this is impressive! Many thanks for doing it!

I see a couple of places where things need some adjusting, but the overall approach is sound. I'm working on job cleanup right now, as the current code is a little overly aggressive and removes the contact files and session directories of other tools sharing the node (e.g., the prte daemon removing the scheduler's contact info). I should have that done soon.

Committing that work will generate some conflicts here, but nothing too horrible. I'll pick this branch up and resolve those, plus address the adjustments, and then push it back up here.

I believe we do have an attribute to indicate that a job should persist beyond its parent:

#define PMIX_NOHUP   (bool)
Any processes started on behalf of the calling tool (or the
specified namespace, if such specification is included in the
list of attributes) should continue after the tool disconnects
from its server

Although originally intended for a tool calling spawn, you can see that it was extended to apply to anyone calling spawn. I thought I had included support for that in PRRTE, but I'd have to look.
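For illustration, passing that attribute as job-level info to PMIx_Spawn might look like the following sketch; this assumes only the standard PMIx client API and is not code from this PR (the application setup is elided):

#include <stdbool.h>
#include <pmix.h>

/* Sketch: request that the spawned job keep running after the caller
 * disconnects or terminates, via PMIX_NOHUP. */
static pmix_status_t spawn_persistent(pmix_app_t *apps, size_t napps)
{
    pmix_info_t persist;
    pmix_nspace_t child;
    bool flag = true;
    pmix_status_t rc;

    PMIX_INFO_LOAD(&persist, PMIX_NOHUP, &flag, PMIX_BOOL);
    rc = PMIx_Spawn(&persist, 1, apps, napps, child);
    PMIX_INFO_DESTRUCT(&persist);
    return rc;
}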

@rhc54
Contributor

rhc54 commented Feb 8, 2024

I am working my way through this change and found a few things that need addressing. In order to figure it all out, I've started adding documentation describing the structures being added/modified and how they are used, along with a "flow of control" for the entire scheduler interaction sequence.

I'd like to push the changes up here, but I had to rebase the branch to bring it up to date. I'll try to force push it and see if I can get the branch updated.

HawkmoonEternal added 2 commits February 13, 2024 08:49
Bring the branch up to date with PRRTE master. Add
documentation on scheduler integration, including
added and modified structures plus flow-of-control
for scheduler operations. Fix a few places that had
some errors. Fill in where some session identification
logic was missing.

Signed-off-by: Ralph Castain <[email protected]>
@rhc54 force-pushed the alloc_tracking branch 2 times, most recently from def43ed to 3dd9b1d on February 13, 2024 16:21
Release of prte_server_req_t does not free the info
array unless the "copy" flag has been set to true.

Signed-off-by: Ralph Castain <[email protected]>
@rhc54 merged commit c4b95f3 into openpmix:master on Feb 13, 2024
12 checks passed
@hppritcha
Contributor

This PR appears to have broken singleton support, at least in some cases. Being investigated.

@rhc54
Contributor

rhc54 commented Feb 22, 2024

@hppritcha I suspect you need something like the following:

diff --git a/src/tools/prte/prte.c b/src/tools/prte/prte.c
index 2bb9f571d5..ce69cc00ee 100644
--- a/src/tools/prte/prte.c
+++ b/src/tools/prte/prte.c
@@ -1401,6 +1401,7 @@ static int prep_singleton(const char *name)
     jdata = PMIX_NEW(prte_job_t);
     PMIX_LOAD_NSPACE(jdata->nspace, ptr);
     free(ptr);
+    jdata->session = prte_default_session;
     rc = prte_set_job_data_object(jdata);
     if (PRTE_SUCCESS != rc) {

@hppritcha
Contributor

yep and there's more!
