PU-level scheduling in resource & differing behavior from sched-simple #624
Comments
Seems a bit counterintuitive that Untested, but I think you can specialize
In theory, this should populate only nodes, PUs, and GPUs into the resource graph store. The jobspec, however, should specify "pu" as the requested resource, not "core". Presumably,
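To make "populate only nodes, PUs and GPUs" concrete, here is a minimal sketch, not Fluxion's actual code. It assumes a hypothetical hwloc-like tree (Vertex, with a node > socket > core > PU layout and PUs numbered core * threads + t) and a hypothetical filter_tree helper. Vertex types missing from the allowlist are dropped, and their surviving descendants are spliced into the parent.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Vertex:
    """One vertex of a hypothetical hwloc-like topology tree."""
    type: str
    id: int
    children: List["Vertex"] = field(default_factory=list)


def filter_tree(v: Vertex, allowed: Set[str]) -> List[Vertex]:
    """Return the vertices that survive filtering v's subtree.

    Allowed vertices are kept (with their filtered children). A vertex whose
    type is not allowed is dropped, and its surviving descendants are
    returned so that the caller attaches them to the nearest kept ancestor.
    """
    kept_children: List[Vertex] = []
    for child in v.children:
        kept_children.extend(filter_tree(child, allowed))
    if v.type in allowed:
        return [Vertex(v.type, v.id, kept_children)]
    return kept_children


def count(vertices: List[Vertex], type_: str) -> int:
    """Count vertices of the given type in a forest."""
    return sum((v.type == type_) + count(v.children, type_) for v in vertices)


def make_node(ncores: int, threads_per_core: int) -> Vertex:
    """Build one node: socket -> cores -> PUs (layout is an assumption)."""
    cores = []
    for c in range(ncores):
        pus = [Vertex("pu", c * threads_per_core + t) for t in range(threads_per_core)]
        cores.append(Vertex("core", c, pus))
    return Vertex("node", 0, [Vertex("socket", 0, cores)])


node = make_node(ncores=4, threads_per_core=2)
graph = filter_tree(node, {"node", "pu", "gpu"})
print(count(graph, "core"), count(graph, "pu"))  # 0 8
```

With "pu" on the allowlist and "core" off it, the node ends up with eight schedulable PU children and no core vertices. A jobspec that still asks for "core" would then match nothing, which is why the request type has to change along with the allowlist.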
I think sched-simple just treats core as "pu" (an hwloc term I'm not sure we should adopt into our jobspec, but that is a different issue) for testing, since jobspec v1 doesn't provide any way to specify threads in addition to cores. It is convenient to be able to run up to as many processes as the system reports CPUs. That is, there was no "sinister master plan" for sched-simple to behave differently. 😉 I agree we should eventually fix this, though maybe we wait until jobspec v2?
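The practical effect of counting PUs as cores can be shown with a little arithmetic. The sketch below is illustrative only: max_tasks is a hypothetical helper, and the 4-core, 2-way SMT node is an assumed example. It computes how many tasks requesting N "cores" fit on one node under each interpretation.

```python
import os


def max_tasks(ncores: int, threads_per_core: int, cores_per_task: int,
              pu_as_core: bool) -> int:
    """How many tasks requesting cores_per_task "cores" fit on one node.

    pu_as_core=True counts every hardware thread (PU) as a core, which
    models the sched-simple behavior described in this thread.
    """
    if cores_per_task < 1:
        raise ValueError("cores_per_task must be >= 1")
    schedulable = ncores * threads_per_core if pu_as_core else ncores
    return schedulable // cores_per_task


# A hypothetical node with 4 physical cores and 2-way SMT (8 PUs).
print(max_tasks(4, 2, 1, pu_as_core=True))   # 8
print(max_tasks(4, 2, 1, pu_as_core=False))  # 4

# os.cpu_count() reports logical CPUs, i.e. PUs on an SMT system, which is
# the "as many processes as the system reports CPUs" number.
print(os.cpu_count())
```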
Though maybe
I agree "pu" is probably a bad resource name.
Sure. We probably want to do this before the initial system instance deployment, though.
I agree! I think that is the behavior that I personally want 95% of the time.
Darn! I was hoping for a juicy conspiracy here... Guess I'll just go back to spreading pandemic-related conspiracies in my free time.
A WIP Jobspec V2 is already up, so we can discuss the jobspec-specific details there: flux-framework/rfc#229
"hw_thread" sounds good to me for the resource name. FWIW, it looks like Python's argparse supports multi-character short args, so
BTW, if Looking at
Ack! It seems dangerous to support grouping and multi-character short args at the same time! Could you support multiple long args
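The grouping hazard can be reproduced with plain argparse. In this sketch, the options -a, -b, and -ab are hypothetical and not real flux-mini flags. argparse checks for an exact option-string match before it tries to split grouped short flags, so "-ba" means "-b -a" while "-ab" means the multi-character option.

```python
import argparse

# Hypothetical options: two single-letter flags that users may group
# (-a -b written as -ab or -ba) plus a multi-character short option -ab.
parser = argparse.ArgumentParser(prog="demo")
parser.add_argument("-a", action="store_true", help="hypothetical flag A")
parser.add_argument("-b", action="store_true", help="hypothetical flag B")
parser.add_argument("-ab", type=int, dest="ab", help="hypothetical option taking a value")

# "-ba" has no exact match, so it is split into the grouped flags -b -a ...
grouped = parser.parse_args(["-ba"])
print(grouped.a, grouped.b, grouped.ab)  # True True None

# ... but "-ab" exactly matches the multi-character option, so it is NOT
# the same as "-a -b", and it consumes the next argument as its value.
multi = parser.parse_args(["-ab", "3"])
print(multi.a, multi.b, multi.ab)  # False False 3
```

The same two letters mean different things depending on their order, which is the kind of surprise that makes long-only aliases the safer choice.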
Good call! Yeah, that does seem possible:
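A minimal sketch of the "multiple long args" approach with argparse follows. The -c/--cores-per-task pair mirrors the -c option mentioned in this thread. The hardware-thread option and both of its long spellings are hypothetical names used only for illustration. Both spellings write to the same dest, so either one produces the same parsed result.

```python
import argparse

# Sketch of a flux-mini-like option set; not flux-mini's actual parser.
parser = argparse.ArgumentParser(prog="mini-sketch")
parser.add_argument("-c", "--cores-per-task", type=int, default=1,
                    metavar="N", help="cores per task")
# Hypothetical option: two long spellings, no multi-character short form.
parser.add_argument("--threads-per-task", "--hw-threads-per-task",
                    dest="threads_per_task", type=int, default=None,
                    metavar="N", help="hardware threads per task (hypothetical)")

a = parser.parse_args(["-c", "2", "--threads-per-task", "4"])
b = parser.parse_args(["--cores-per-task=2", "--hw-threads-per-task=4"])
print(a == b, a.cores_per_task, a.threads_per_task)  # True 2 4
```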
Opened a flux-core issue to track that: flux-framework/flux-core#2857
Coming back around to this due to recent issues in core: flux-framework/flux-core#2968. This seems to have become a real problem once CPU affinity support was added to the shell, at which point treating PUs as cores became an error. Unfortunately, I had forgotten that sched-simple operated this way, which ultimately caused the bugs. I'll fix the sched-simple issue (actually a
Thanks @grondo. BTW, will there be cases where users want to specialize their scheduling and schedule at the PU level (hyperthreading)? FYI, if this specialization is needed, I think we can do it at the Fluxion level (by setting the hwloc whitelist to include the PU resource type). However, we need to understand how the R information is used at the job shell level (we may still hit the same issue that you are working on, though). The job shell would understand a resource type named PU, right? What API are you using for CPU affinity?
I think we'll need R v2 for this to work. Currently, R contains only a "core" id list for the children of each rank, and the job shell does not know about resource type "PU", since only "core" and "gpu" are allowed types in the RFC20 execution section. The shell affinity plugin uses
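The limitation can be sketched as follows: R tells a rank which core ids it owns, and anything finer-grained has to come from topology knowledge outside R. The R document below is a trimmed, hand-written example in the R_lite shape (rank and child ids as idset strings). expand_idset, cores_for_rank, and pus_for_cores are hypothetical helpers. The core-to-PU numbering (core + k * ncores) is an ASSUMPTION that holds on many Linux systems but not all. Real code would query hwloc rather than hard-code it.

```python
import json
from typing import List


def expand_idset(s: str) -> List[int]:
    """Expand an idset string such as "0-3,6" into [0, 1, 2, 3, 6]."""
    ids: List[int] = []
    for part in filter(None, s.split(",")):
        lo, _, hi = part.partition("-")
        ids.extend(range(int(lo), int(hi or lo) + 1))
    return ids


def cores_for_rank(R: dict, rank: int) -> List[int]:
    """Return the core ids that R assigns to rank (R_lite entries only)."""
    for entry in R["execution"]["R_lite"]:
        if rank in expand_idset(entry["rank"]):
            return expand_idset(entry["children"].get("core", ""))
    return []


def pus_for_cores(cores: List[int], ncores: int, threads_per_core: int) -> List[int]:
    """Map core ids to PU (logical CPU) ids.

    ASSUMPTION: sibling threads are numbered core + k * ncores. This is a
    common Linux layout, not a universal one.
    """
    return sorted(c + k * ncores for c in cores for k in range(threads_per_core))


# Trimmed, hand-written R for illustration; real R carries more fields.
R = json.loads("""
{"version": 1,
 "execution": {"R_lite": [{"rank": "0-1", "children": {"core": "0-1"}}]}}
""")
cores = cores_for_rank(R, 1)
print(cores, pus_for_cores(cores, ncores=4, threads_per_core=2))  # [0, 1] [0, 1, 4, 5]
```

Binding to a core therefore means binding to all of its PUs, and there is no field in this shape of R to say "just PU 4". On Linux, a CPU set computed this way could be passed to os.sched_setaffinity, but that is only one possible mechanism and not necessarily what the shell plugin does.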
I think flux-core also
When running with sched-simple, it appears that specifying -c1 to flux-mini runs the process on a PU. That does not appear to be the case with resource. It seems wrong to me that the same jobspec has such different behavior under the two different schedulers. Do we have a way in flux-sched to enable PU-level scheduling? I'm wondering whether this is something that needs to be handled at the flux-mini level. Ultimately, I'm not sure I have many intelligent thoughts on this right now, but I wanted to at least document it.