netperf_stress: Support running netperf clients on systems with more than 256 CPUs #4062

Open
wants to merge 2 commits into
base: master

Conversation

bgartzi
Contributor

@bgartzi bgartzi commented Feb 7, 2025

The netperf client couldn't start on systems with more than 256 CPUs. This affects the ability to run netperf_stress tests, and possibly others, on those systems.

This patch tries to alleviate the issue by modifying the maximum number of CPUs supported by netperf prior to compiling the binary.

To do so, the patch is divided into two commits:

  1. To compile netperf, the source code first needs to be decompressed; the code is then compiled into a binary. Both commands were run through the same remote session. The first commit in this patch splits decompression and compilation into two separate steps, so that netperf's source code can be modified in between if needed.
  2. The second commit adds a mechanism to detect whether the system running netperf has more CPUs than the value defined in netperf's MAXCPUS. First, it finds the number of CPUs on the system. Then, it finds the source file in which MAXCPUS is defined, along with its value. If that value is smaller than the number of CPUs in the system, it updates the value by issuing a sed command. Otherwise, the value is left as-is. This happens prior to compilation.
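
The check described in the second commit can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: the function name `bump_max_cpus`, the sample header text, and the use of Python's `re` module in place of `sed` are all assumptions made for the example.

```python
import re

# Hypothetical pattern: matches a line such as "#define MAXCPUS 256".
MAXCPUS_RE = re.compile(r"^(#define\s+MAXCPUS\s+)(\d+)", re.MULTILINE)


def bump_max_cpus(header_text, n_cpus):
    """Return header_text with MAXCPUS raised to n_cpus when it is too small.

    The text is returned unchanged if MAXCPUS is not defined in it or if the
    current value already covers n_cpus.
    """
    match = MAXCPUS_RE.search(header_text)
    if match is None or int(match.group(2)) >= n_cpus:
        return header_text
    # Keep the "#define MAXCPUS " prefix and swap only the number.
    return MAXCPUS_RE.sub(
        lambda m: m.group(1) + str(n_cpus), header_text, count=1
    )


sample = "#define MAXCPUS 256\n"  # assumed header content, for illustration only
print(bump_max_cpus(sample, 384), end="")  # value is raised
print(bump_max_cpus(sample, 64), end="")   # value is left as-is
```

In the patch itself, the edit is applied to the unpacked source tree with sed, between the decompression and compilation steps.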

ID: 3365

When netperf is compiled, its source code is decompressed first and then
compiled. However, both of these steps happen under the same subprocess
command.

There could be some cases in which it would be valuable to separate
those steps. For example, if some further netperf source code
modifications were needed.

Signed-off-by: Beñat Gartzia Arruabarrena <[email protected]>

netperf supports systems with up to 256 CPUs by default. There are,
though, larger systems with more CPUs that netperf struggles to work on.
According to netperf's error traces, modifying the number of MAX CPUs
prior to compilation is encouraged.

This patch adds the possibility of modifying the source code if needed.
To do so, it locates the file defining the macro (although the default
is src/netlib.h) and the value assigned to it. Then, it checks the
number of CPUs on the system running netperf. If the number of CPUs
exceeds the configured value of MAXCPUS, then it modifies the value
prior to compilation.

Signed-off-by: Beñat Gartzia Arruabarrena <[email protected]>
@bgartzi bgartzi force-pushed the netperf-update_max_cpus branch from 1c5dac5 to 15c9fee Compare February 7, 2025 13:21
@bgartzi
Contributor Author

bgartzi commented Feb 7, 2025

@rh-jugraham could you have a look at these patches and confirm that they would help fix your issue?

@rh-jugraham
Contributor

@bgartzi This does address the issue!!

To test it out, since there are currently very few machines with a CPU count > 256, I tested to ensure this change didn't negatively affect the tests when the machine already had fewer than 256 CPUs - all tests passed. Then, I made a slight modification to have the code set MAXCPUS to a smaller value, which in turn caused the tests to fail as expected - this showed that MAXCPUS was successfully modified and it had the expected effect on the tests.

        max_cpus_file_path, current_max_cpus = self._get_current_max_cpus()
        if current_max_cpus >= n_cpus:
            LOG.debug("Bypassing netperf's MAXCPUS value modification")
            # return
        n_cpus = 32     # test that changing MAXCPUS affects the test
        LOG.info("Increasing netperf's MAXCPUS to %d" % n_cpus)
        sed_cmd = 'sed -i "s/^\(#define *MAXCPUS *\)[0-9]* /\\1%d/g" %s' % (
            n_cpus,
            max_cpus_file_path,
        )
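
To see what the substitution in the snippet above does, the same expression can be tried with Python's `re` module. This is only a sketch: the sample header line is an assumption, the helper name `set_max_cpus` is hypothetical, and `re.sub` stands in for `sed`. Note that the pattern, as quoted, expects a space after the digits, so the sample line carries a trailing comment.

```python
import re

# Python-regex rendering of the sed expression quoted above
# (sed's \( \) groups become plain parentheses here).
PATTERN = r"^(#define *MAXCPUS *)[0-9]* "


def set_max_cpus(line, n_cpus):
    """Replace the number after '#define MAXCPUS' with n_cpus."""
    # \g<1> re-inserts the captured "#define MAXCPUS " prefix.
    return re.sub(PATTERN, r"\g<1>%d" % n_cpus, line)


line = "#define MAXCPUS 256 /* assumed sample line */"
print(set_max_cpus(line, 32))
```

Lines that do not start with the `#define MAXCPUS` prefix are returned unchanged.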

@bgartzi
Contributor Author

bgartzi commented Feb 10, 2025

Thanks for confirming @rh-jugraham!
@yiqianwei, @yanglei-rh could you share your thoughts about this patch? It shouldn't introduce any regressions, but it could affect test cases based on netperf, so I think you might be interested too.
I'm concerned about Windows guests, although I couldn't find any similar issue reported for that scenario.

  • This issue is related to the number of CPUs, rather than to the system being ARM or x86.
  • If I understand correctly, Windows netperf testing is based on a precompiled .exe (see https://github.com/autotest/tp-qemu/blob/master/deps/netperf/netperf.exe). That is, it is not compiled right before the test, as happens with Linux guests, at least on tp-qemu.
  • So this patch wouldn't fix the issue for Windows guests with more than 256 CPUs.


2 participants