
throughput of jobs slows down over time - part deux #6419

Closed
chu11 opened this issue Nov 7, 2024 · 5 comments · Fixed by #6422
chu11 (Member) commented Nov 7, 2024

While working on #6414, I noticed that job throughput degrades over time (similar to the old #3583).

Running a throughput test of 8192 jobs over and over again in an instance via this old script:

#!/bin/bash
# usage: ./test_throughput.sh <jobs-per-iteration> <iterations>
count=$1
n=$2
echo "Running throughput.py ${count} jobs per iteration, ${n} iterations"
for i in $(seq 1 "${n}")
do
    src/test/throughput.py -n "${count}" | grep throughput
done
>./test_throughput.sh 8192 15
throughput:     487.3 job/s (script: 475.7 job/s)
throughput:     405.4 job/s (script: 397.8 job/s)
throughput:     288.0 job/s (script: 284.2 job/s)
throughput:     307.4 job/s (script: 303.1 job/s)
throughput:     278.4 job/s (script: 274.9 job/s)
throughput:     276.4 job/s (script: 272.9 job/s)
throughput:     220.1 job/s (script: 217.9 job/s)
throughput:     210.3 job/s (script: 208.3 job/s)
throughput:     128.8 job/s (script: 128.0 job/s)
throughput:     83.6 job/s (script:  83.3 job/s)
throughput:     67.6 job/s (script:  67.4 job/s)
throughput:     57.1 job/s (script:  56.9 job/s)

(I broke out of the test after 12 iterations rather than wait around any longer.)

Some degradation is to be expected as memory gets used up and whatnot, but this is a bigger drop than I'd expect. In total we're still at < 100K jobs in the results above, so it shouldn't be hogging too much memory. Perhaps a round of perf analysis would be worthwhile; maybe we've got a bottleneck somewhere.

grondo (Contributor) commented Nov 8, 2024

Can you retry without the history jobtap plugin? i.e. flux jobtap remove .history

grondo (Contributor) commented Nov 8, 2024

Yeah, I think the history plugin is most of the issue here. Without it, I'm seeing more of a gradual, linear slowdown.

This makes sense, since I think the history plugin adds jobs to a list without bound using zlistx_insert(), which is not as lightweight as zlistx_add_end().
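
If someone wants to measure that directly, a rough microbenchmark sketch along these lines would do it (this calls the CZMQ zlistx API in isolation and is not the plugin code; the file name, comparator, and 100000-item count are made up for illustration):

/* Rough microbenchmark sketch: sorted zlistx_insert(), in both search
 * directions, versus a plain zlistx_add_end() append as the list grows.
 * Build (with CZMQ installed): gcc zlistx_bench.c -o zlistx_bench -lczmq
 */
#include <czmq.h>
#include <time.h>

/* sort descending, i.e. biggest value ("newest job") at the front */
static int cmp (const void *a, const void *b)
{
    long x = *(const long *)a;
    long y = *(const long *)b;
    return (x > y) ? -1 : (x < y) ? 1 : 0;
}

static double bench (bool sorted_insert, bool low_value, int count)
{
    zlistx_t *l = zlistx_new ();
    zlistx_set_comparator (l, cmp);
    long *vals = calloc (count, sizeof (long));
    struct timespec t0, t1;
    clock_gettime (CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < count; i++) {
        vals[i] = i;    /* monotonically increasing, like t_submit */
        if (sorted_insert)
            zlistx_insert (l, &vals[i], low_value);
        else
            zlistx_add_end (l, &vals[i]);
    }
    clock_gettime (CLOCK_MONOTONIC, &t1);
    zlistx_destroy (&l);
    free (vals);
    return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}

int main (void)
{
    int n = 100000;
    printf ("add_end:            %.3fs\n", bench (false, false, n));
    printf ("insert low_value=0: %.3fs\n", bench (true, false, n));
    printf ("insert low_value=1: %.3fs\n", bench (true, true, n));
    return 0;
}

With values that only grow, whichever search direction starts at the "old" end of the sorted list ends up walking the whole thing on every insert, while zlistx_add_end() is a constant-time append.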

grondo@fluke1:~/git/flux-core.git$ flux jobtap remove .history
grondo@fluke1:~/git/flux-core.git$ for i in {1..12}; do src/test/throughput.py -n 8192 | grep throughput; done
throughput:     372.5 job/s (script: 367.2 job/s)
throughput:     353.4 job/s (script: 349.5 job/s)
throughput:     323.9 job/s (script: 320.4 job/s)
throughput:     315.7 job/s (script: 311.9 job/s)
throughput:     308.3 job/s (script: 304.6 job/s)
throughput:     312.8 job/s (script: 309.0 job/s)
throughput:     308.1 job/s (script: 304.7 job/s)
throughput:     307.5 job/s (script: 303.4 job/s)
throughput:     300.3 job/s (script: 297.2 job/s)
throughput:     297.6 job/s (script: 294.7 job/s)
throughput:     296.0 job/s (script: 293.2 job/s)
throughput:     282.1 job/s (script: 279.5 job/s)

And actually, if I restart the test, the throughput is (mostly) restored. So the slowdown is probably 100% attributable to the history plugin:

$ for i in {1..12}; do src/test/throughput.py -n 8192 | grep throughput; done
throughput:     358.3 job/s (script: 354.0 job/s)
throughput:     319.9 job/s (script: 315.5 job/s)
throughput:     309.7 job/s (script: 306.0 job/s)
throughput:     302.5 job/s (script: 299.2 job/s)
throughput:     296.0 job/s (script: 292.2 job/s)
throughput:     295.6 job/s (script: 292.3 job/s)
throughput:     293.5 job/s (script: 290.5 job/s)
throughput:     292.4 job/s (script: 288.7 job/s)
throughput:     288.0 job/s (script: 285.3 job/s)
throughput:     280.2 job/s (script: 277.3 job/s)
throughput:     276.8 job/s (script: 274.1 job/s)
throughput:     277.4 job/s (script: 274.7 job/s)

chu11 (Member, Author) commented Nov 8, 2024

Good guess, and I'm thinking ...

    else if (streq (topic, "job.inactive-add")) {
        if (hola_list_find (hist->users, key, entry)) {
            job_entry_destroy (entry);
            return 0;
        }
        if (!hola_list_insert (hist->users, key, entry, false)) {
            job_entry_destroy (entry);
            return -1;
        }
    }

That hola_list_find() can't be good; it probably just scans a longer and longer list with every inactive job. Could add a hash to do a simple "is it in the list yet" check? That would at least make that part much faster.
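
Just to sketch the idea (hypothetical, not the plugin's code; it assumes CZMQ's zhashx with its default string keys, and job_seen()/job_mark_seen() are made-up names):

#include <czmq.h>
#include <assert.h>
#include <stdint.h>

/* keep a hash of job ids alongside the list so the "already in the
 * list?" check is O(1) instead of a list scan */
static bool job_seen (zhashx_t *seen, uint64_t id)
{
    char idkey[32];
    snprintf (idkey, sizeof (idkey), "%ju", (uintmax_t)id);
    return zhashx_lookup (seen, idkey) != NULL;
}

static int job_mark_seen (zhashx_t *seen, uint64_t id, void *entry)
{
    char idkey[32];
    snprintf (idkey, sizeof (idkey), "%ju", (uintmax_t)id);
    /* zhashx duplicates string keys by default and returns -1 if the
     * key is already present */
    return zhashx_insert (seen, idkey, entry);
}

int main (void)
{
    zhashx_t *seen = zhashx_new ();
    int entry = 42;    /* stand-in for a job entry */
    assert (!job_seen (seen, 123456));
    assert (job_mark_seen (seen, 123456, &entry) == 0);
    assert (job_seen (seen, 123456));
    zhashx_destroy (&seen);
    return 0;
}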

Also ... I'm wondering if that false for the low_value argument to hola_list_insert() is correct. I think the lists have newer entries first, meaning bigger t_submit, so on average newer inactive jobs should end up towards the front of the list. I could be getting confused by these comparators, but it's probably a good idea to mess around in here a bit.

grondo (Contributor) commented Nov 8, 2024

I probably wouldn't spend too much time on it, since in a high-throughput scenario you can just unload the plugin. flux job last is a nice-to-have, but probably not necessary when running very large workloads.

chu11 self-assigned this Nov 8, 2024
chu11 (Member, Author) commented Nov 8, 2024

Yeah, I think these insert flags are backwards:

diff --git a/src/modules/job-manager/plugins/history.c b/src/modules/job-manager/plugins/history.c
index 8d761ee55..1fc929a04 100644
--- a/src/modules/job-manager/plugins/history.c
+++ b/src/modules/job-manager/plugins/history.c
@@ -165,13 +165,13 @@ static int jobtap_cb (flux_plugin_t *p,
             job_entry_destroy (entry);
             return 0;
         }
-        if (!hola_list_insert (hist->users, key, entry, false)) {
+        if (!hola_list_insert (hist->users, key, entry, true)) {
             job_entry_destroy (entry);
             return -1;
         }
     }
     else if (streq (topic, "job.new")) {
-        if (!hola_list_insert (hist->users, key, entry, false)) {
+        if (!hola_list_insert (hist->users, key, entry, true)) {
             job_entry_destroy (entry);
             return -1;
         }

The lists are sorted with bigger t_submit earlier in the list, and low_value being false means the insert starts searching from the end of the list. For job.new we'd always expect the new job to go at the front of the list. For job.inactive-add, we'd expect it on average to land closer to the front than the end (especially for high throughput and after many, many jobs). (And that hola_list_find() should be fine; on average it should find the job towards the front of the list and not be too costly.)
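
To put rough numbers on that, here's a quick counting model of a single 8192-job iteration (just an illustration of the search direction, not the plugin code):

#include <stdio.h>

int main (void)
{
    long n = 8192;    /* jobs per throughput.py iteration */
    long from_front = 0;
    long from_back = 0;

    for (long i = 0; i < n; i++) {
        /* the list already holds i entries, newest (biggest t_submit)
         * first, and the job being inserted is newer than all of them */
        if (i > 0)
            from_front++;    /* the first comparison settles it */
        from_back += i;      /* walks past all i older entries */
    }
    printf ("comparisons searching from the front: %ld\n", from_front);
    printf ("comparisons searching from the back:  %ld\n", from_back);
    return 0;
}

That works out to roughly 8 thousand comparisons per iteration searching from the front versus tens of millions searching from the back, and the back-search cost keeps growing as the list carries over between iterations.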

Trying this out:

>./test_throughput.sh 8192 12
throughput:     450.9 job/s (script: 440.6 job/s)
throughput:     411.1 job/s (script: 403.5 job/s)
throughput:     391.2 job/s (script: 384.3 job/s)
throughput:     376.8 job/s (script: 370.3 job/s)
throughput:     349.4 job/s (script: 343.8 job/s)
throughput:     330.2 job/s (script: 325.1 job/s)
throughput:     376.8 job/s (script: 370.3 job/s)
throughput:     352.7 job/s (script: 347.0 job/s)
throughput:     340.7 job/s (script: 335.4 job/s)
throughput:     349.5 job/s (script: 344.1 job/s)
throughput:     345.1 job/s (script: 339.7 job/s)
throughput:     359.3 job/s (script: 353.4 job/s)

chu11 added a commit to chu11/flux-core that referenced this issue Nov 8, 2024
Problem: Internal job history lists are sorted with larger
t_submit values at the front of the list.  When new jobs
are inserted (i.e. jobs with bigger t_submit values), they
are inserted with a search starting at the back of the list.
This is the opposite of what should be done and leads to a
slowdown in job throughput over time.

Insert into the job history list starting at the front.

Fixes flux-framework#6419
mergify bot closed this as completed in #6422 (commit 722d786) on Nov 8, 2024