
throughput of jobs slows down over time - part deux #6419

Closed
chu11 opened this issue Nov 7, 2024 · 5 comments · Fixed by #6422
chu11 (Member) commented Nov 7, 2024

While working on #6414, I noticed that job throughput degrades over time (similar to the old #3583).

Running a throughput test of 8192 jobs over and over again in an instance via this old script:

#!/bin/bash
# usage: ./test_throughput.sh <jobs-per-iteration> <iterations>
count=$1
n=$2
echo "Running throughput.py ${count} jobs per iteration, ${n} iterations"
for i in $(seq 1 "${n}")
do
    src/test/throughput.py -n "${count}" | grep throughput
done
>./test_throughput.sh 8192 15
throughput:     487.3 job/s (script: 475.7 job/s)
throughput:     405.4 job/s (script: 397.8 job/s)
throughput:     288.0 job/s (script: 284.2 job/s)
throughput:     307.4 job/s (script: 303.1 job/s)
throughput:     278.4 job/s (script: 274.9 job/s)
throughput:     276.4 job/s (script: 272.9 job/s)
throughput:     220.1 job/s (script: 217.9 job/s)
throughput:     210.3 job/s (script: 208.3 job/s)
throughput:     128.8 job/s (script: 128.0 job/s)
throughput:     83.6 job/s (script:  83.3 job/s)
throughput:     67.6 job/s (script:  67.4 job/s)
throughput:     57.1 job/s (script:  56.9 job/s)

(I broke out of the test after 12 iterations rather than wait around any longer.)

Some degradation is to be expected as memory gets used up and whatnot, but this is a bigger drop than I'd expect. In total we're still at < 100K jobs in the results above, so it shouldn't be hogging too much memory. Perhaps a round of perf analysis would be worthwhile; maybe we've got a bottleneck somewhere.

grondo (Contributor) commented Nov 8, 2024

Can you retry without the history jobtap plugin? i.e. flux jobtap remove .history

grondo (Contributor) commented Nov 8, 2024

Yeah, I think the history plugin is most of the issue here. Without it, I'm seeing more of a gradual, linear slowdown.

This makes sense, since I think the history plugin adds jobs to a list without bound using zlistx_insert(), which is not as lightweight as zlistx_add_end().
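
If someone wants to measure that directly, a rough microbenchmark sketch along these lines would do it (this calls the CZMQ zlistx API in isolation and is not the plugin code; the file name, comparator, and 100000-item count are made up for illustration):

/* Rough microbenchmark sketch: sorted zlistx_insert(), in both search
 * directions, versus a plain zlistx_add_end() append as the list grows.
 * Build (with CZMQ installed): gcc zlistx_bench.c -o zlistx_bench -lczmq
 */
#include <czmq.h>
#include <time.h>

/* sort descending, i.e. biggest value ("newest job") at the front */
static int cmp (const void *a, const void *b)
{
    long x = *(const long *)a;
    long y = *(const long *)b;
    return (x > y) ? -1 : (x < y) ? 1 : 0;
}

static double bench (bool sorted_insert, bool low_value, int count)
{
    zlistx_t *l = zlistx_new ();
    zlistx_set_comparator (l, cmp);
    long *vals = calloc (count, sizeof (long));
    struct timespec t0, t1;
    clock_gettime (CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < count; i++) {
        vals[i] = i;    /* monotonically increasing, like t_submit */
        if (sorted_insert)
            zlistx_insert (l, &vals[i], low_value);
        else
            zlistx_add_end (l, &vals[i]);
    }
    clock_gettime (CLOCK_MONOTONIC, &t1);
    zlistx_destroy (&l);
    free (vals);
    return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}

int main (void)
{
    int n = 100000;
    printf ("add_end:            %.3fs\n", bench (false, false, n));
    printf ("insert low_value=0: %.3fs\n", bench (true, false, n));
    printf ("insert low_value=1: %.3fs\n", bench (true, true, n));
    return 0;
}

With values that only grow, whichever search direction starts at the "old" end of the sorted list ends up walking the whole thing on every insert, while zlistx_add_end() is a constant-time append.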

grondo@fluke1:~/git/flux-core.git$ flux jobtap remove .history
grondo@fluke1:~/git/flux-core.git$ for i in {1..12}; do src/test/throughput.py -n 8192 | grep throughput; done
throughput:     372.5 job/s (script: 367.2 job/s)
throughput:     353.4 job/s (script: 349.5 job/s)
throughput:     323.9 job/s (script: 320.4 job/s)
throughput:     315.7 job/s (script: 311.9 job/s)
throughput:     308.3 job/s (script: 304.6 job/s)
throughput:     312.8 job/s (script: 309.0 job/s)
throughput:     308.1 job/s (script: 304.7 job/s)
throughput:     307.5 job/s (script: 303.4 job/s)
throughput:     300.3 job/s (script: 297.2 job/s)
throughput:     297.6 job/s (script: 294.7 job/s)
throughput:     296.0 job/s (script: 293.2 job/s)
throughput:     282.1 job/s (script: 279.5 job/s)

And actually, if I restart the test, the throughput is (mostly) restored. So the slowdown is probably 100% attributable to the history plugin:

$ for i in {1..12}; do src/test/throughput.py -n 8192 | grep throughput; done
throughput:     358.3 job/s (script: 354.0 job/s)
throughput:     319.9 job/s (script: 315.5 job/s)
throughput:     309.7 job/s (script: 306.0 job/s)
throughput:     302.5 job/s (script: 299.2 job/s)
throughput:     296.0 job/s (script: 292.2 job/s)
throughput:     295.6 job/s (script: 292.3 job/s)
throughput:     293.5 job/s (script: 290.5 job/s)
throughput:     292.4 job/s (script: 288.7 job/s)
throughput:     288.0 job/s (script: 285.3 job/s)
throughput:     280.2 job/s (script: 277.3 job/s)
throughput:     276.8 job/s (script: 274.1 job/s)
throughput:     277.4 job/s (script: 274.7 job/s)

chu11 (Member, Author) commented Nov 8, 2024

Good guess, and I'm thinking ...

    else if (streq (topic, "job.inactive-add")) {
        if (hola_list_find (hist->users, key, entry)) {
            job_entry_destroy (entry);
            return 0;
        }
        if (!hola_list_insert (hist->users, key, entry, false)) {
            job_entry_destroy (entry);
            return -1;
        }
    }

That hola_list_find() can't be good; it probably just scans a longer and longer list with every inactive job. Could add a hash to do a simple "is it in the list yet" check? That would at least make that part much faster.
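
Just to sketch the idea (hypothetical, not the plugin's code; it assumes CZMQ's zhashx with its default string keys, and job_seen()/job_mark_seen() are made-up names):

#include <czmq.h>
#include <assert.h>
#include <stdint.h>

/* keep a hash of job ids alongside the list so the "already in the
 * list?" check is O(1) instead of a list scan */
static bool job_seen (zhashx_t *seen, uint64_t id)
{
    char idkey[32];
    snprintf (idkey, sizeof (idkey), "%ju", (uintmax_t)id);
    return zhashx_lookup (seen, idkey) != NULL;
}

static int job_mark_seen (zhashx_t *seen, uint64_t id, void *entry)
{
    char idkey[32];
    snprintf (idkey, sizeof (idkey), "%ju", (uintmax_t)id);
    /* zhashx duplicates string keys by default and returns -1 if the
     * key is already present */
    return zhashx_insert (seen, idkey, entry);
}

int main (void)
{
    zhashx_t *seen = zhashx_new ();
    int entry = 42;    /* stand-in for a job entry */
    assert (!job_seen (seen, 123456));
    assert (job_mark_seen (seen, 123456, &entry) == 0);
    assert (job_seen (seen, 123456));
    zhashx_destroy (&seen);
    return 0;
}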

Also ... I'm wondering if that false for the low_value argument to hola_list_insert() is correct. I think the lists have newer entries first, meaning bigger t_submit, so on average newer inactive jobs should end up towards the front of the list. I could be getting confused by these comparators, but it's probably a good idea to mess around in here a bit.

grondo (Contributor) commented Nov 8, 2024

I probably wouldn't spend too much time on it, since in a high-throughput scenario you can just unload the plugin. flux job last is a nice-to-have, but probably not necessary when running very large workloads.

chu11 self-assigned this Nov 8, 2024
chu11 (Member, Author) commented Nov 8, 2024

Yeah, I think these insert flags are backwards:

diff --git a/src/modules/job-manager/plugins/history.c b/src/modules/job-manager/plugins/history.c
index 8d761ee55..1fc929a04 100644
--- a/src/modules/job-manager/plugins/history.c
+++ b/src/modules/job-manager/plugins/history.c
@@ -165,13 +165,13 @@ static int jobtap_cb (flux_plugin_t *p,
             job_entry_destroy (entry);
             return 0;
         }
-        if (!hola_list_insert (hist->users, key, entry, false)) {
+        if (!hola_list_insert (hist->users, key, entry, true)) {
             job_entry_destroy (entry);
             return -1;
         }
     }
     else if (streq (topic, "job.new")) {
-        if (!hola_list_insert (hist->users, key, entry, false)) {
+        if (!hola_list_insert (hist->users, key, entry, true)) {
             job_entry_destroy (entry);
             return -1;
         }

The lists are sorted with bigger t_submit earlier in the list, and low_value being false means the insert starts searching from the end of the list. For job.new we'd always expect the new job to go at the front of the list. For job.inactive-add, we'd expect it on average to land closer to the front than the end (especially for high throughput and after many, many jobs). (And that hola_list_find() should be fine; on average it should find the job towards the front of the list and not be too costly.)
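
To put rough numbers on that, here's a quick counting model of a single 8192-job iteration (just an illustration of the search direction, not the plugin code):

#include <stdio.h>

int main (void)
{
    long n = 8192;    /* jobs per throughput.py iteration */
    long from_front = 0;
    long from_back = 0;

    for (long i = 0; i < n; i++) {
        /* the list already holds i entries, newest (biggest t_submit)
         * first, and the job being inserted is newer than all of them */
        if (i > 0)
            from_front++;    /* the first comparison settles it */
        from_back += i;      /* walks past all i older entries */
    }
    printf ("comparisons searching from the front: %ld\n", from_front);
    printf ("comparisons searching from the back:  %ld\n", from_back);
    return 0;
}

That works out to roughly 8 thousand comparisons per iteration searching from the front versus tens of millions searching from the back, and the back-search cost keeps growing as the list carries over between iterations.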

Trying this out:

>./test_throughput.sh 8192 12
throughput:     450.9 job/s (script: 440.6 job/s)
throughput:     411.1 job/s (script: 403.5 job/s)
throughput:     391.2 job/s (script: 384.3 job/s)
throughput:     376.8 job/s (script: 370.3 job/s)
throughput:     349.4 job/s (script: 343.8 job/s)
throughput:     330.2 job/s (script: 325.1 job/s)
throughput:     376.8 job/s (script: 370.3 job/s)
throughput:     352.7 job/s (script: 347.0 job/s)
throughput:     340.7 job/s (script: 335.4 job/s)
throughput:     349.5 job/s (script: 344.1 job/s)
throughput:     345.1 job/s (script: 339.7 job/s)
throughput:     359.3 job/s (script: 353.4 job/s)

chu11 added a commit to chu11/flux-core that referenced this issue Nov 8, 2024
Problem: Internal job history lists are sorted with larger
t_submit values at the front of the list.  When new jobs
are inserted (i.e. jobs with bigger t_submit values), they
are inserted with a search starting at the back of the list.
This is the opposite of what should be done and leads to a
slowdown in job throughput over time.

Insert into the job history list starting at the front.

Fixes flux-framework#6419
mergify bot closed this as completed in #6422 (commit 722d786) on Nov 8, 2024