-
Notifications
You must be signed in to change notification settings - Fork 26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add NVTX support via DrHook #23
Conversation
Hi Philippe thanks for pushing this, looks like a great feature!!! A comment perhaps is to rename the module names and subroutines (C,Fortran) with prefix "my..." to something more unambiguous like "fiat_", or better "drhook_" ? Another suggestion is to keep this feature optional even when using the NVIDIA compiler: ecbuild_add_option( DR_HOOK_NVTX ... )
# or instead : ecbuild_add_option( NVTX ... ) The DEFAULT could be ON for this option when NVIDIA compiler is detected if you wish. The nvtx files should then be compiled only when this feature is ON (
And guard the specific code that is currently guarded with A slightly larger refactoring, but useful, however is that drhook is also to be used from C code. @ioanhadade , @marsdeno could you also review please? |
Hello Willem, I followed your advice and renamed my* to dr_nvtx, then added the DR_NVTX option and I now use the HAVE_DR_NVTX macro everywhere instead of __PGI. All of this does not work with nvhpc/21.9 (I tried it on our cluster), but it works with nvhpc/24.3; I tried changing that in ./.github/tools/install-nvhpc.sh, but github tests fail with "No space left on device". What should I do ? |
I have seen a similar issue with nvhpc in Github actions. For CLOUDSC we use this "hack": |
I followed Balthasar advice, but it did not work. So CI is failing for nvhpc builds and I do not know what to do about it. |
@pmarguinaud I will have a look at making this work. In the mean time, please, could you look into getting nvtx to work via drhook.c ? We could then e.g. use |
Hi @pmarguinaud , in the mean time I have updated the develop branch to use nvhpc-24.3 in github actions. It also sets "NVHPC_ROOT" environment variable which you need for the You can now rebase this branch, minus the added GitHub actions changes, on develop. |
I have merged develop and added a test case for NVTX. Instrumenting C code is not a priority for me, but you can do it yourself if it is important for you. |
Hi @Andrew-Beggs-ECMWF discussing offline with @ioanhadade, @ioanhadade suggested for you to have a look at the possibility of integrating this within drhook itself. |
ilast++; | ||
stack[ilast] = elem; | ||
elem->calls ++; | ||
if (elem->calls >= 11 && elem->elapsed < 0.0001) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @pmarguinaud, I've been doing some work on this patch to integrate it into the core of drhook so that it's faster, works with all support drhook languages, and is easier for us to maintain.
This condition is giving me a bit of confusion though. My understanding is that it's meant to eliminate high call but low cumulative runtime drhook regions from the nvtx trace, so as to reduce noise in the trace. However, the way it's implemented (particularly lines 72-75) means that once you go over 10 calls, it will always be skipped. This also adds a fun situation where the following recursive code won't close any nvtx regions it opens:
void foo(void) {
static int calls = 0;
calls++;
int skipstart = dr_hook_nvtx_map_start("foo");
std::cout<<"In call "<<calls<<" Skipped? "<<!skipstart<<std::endl;
if (calls < 11) {
foo();
}
int skipend = dr_hook_nvtx_map_stop();
std::cout<<"Out call "<<calls<<" Skipped? "<<!skipend<<std::endl;
calls--;
return;
}
This gave the following output:
In call 1 Skipped? 0
In call 2 Skipped? 0
In call 3 Skipped? 0
In call 4 Skipped? 0
In call 5 Skipped? 0
In call 6 Skipped? 0
In call 7 Skipped? 0
In call 8 Skipped? 0
In call 9 Skipped? 0
In call 10 Skipped? 0
In call 11 Skipped? 1
Out call 11 Skipped? 1
Out call 10 Skipped? 1
Out call 9 Skipped? 1
Out call 8 Skipped? 1
Out call 7 Skipped? 1
Out call 6 Skipped? 1
Out call 5 Skipped? 1
Out call 4 Skipped? 1
Out call 3 Skipped? 1
Out call 2 Skipped? 1
Out call 1 Skipped? 1
Am I correct in thinking this is not the intended behaviour and I should attempt to fix it in my patch?
I've now finished my patch and have it as a branch on my own fork of fiat. Do we want to have it merged into pmarguinaud's branch (which should then make it part of this pr once it's merged) or start a new pr directly to ecmwf-ifs:develop? The outcome is the same, just a matter of how it's merged : ) |
Let's make a new PR for it please. |
@pmarguinaud this functionality has been merged with PR #28 . I will close this PR now. If you have more feedback please let us know. |
This PR enables NVTX instrumentation using DrHook (PGI only). This is enabled with the two following environment variables :
This code has been provided courtesy of Louis Stuber (NVIDIA) and Lukas Mosimann (NVIDIA).
Example of an ARPEGE nsys profile with NVTX enabled :