Register observer & correct clock setting #242
Conversation
…ar one where the memory clock would always be seen as not-equal due to a rounding error
…goes to framework time instead of benchmark time
…ck on gpu-architecture compiler option, added gpu-architecture auto-adding to CuPy
…d tests for this function, removed setting --gpu-architecture for CuPy as it is already set internally
I happened to stumble upon this PR and out of curiosity had a look at the changes, and couldn't help adding some comments. My main 'complaint' is that some scope creep seems to have occurred regarding the L2 flushing. In my opinion, it should really be moved into a separate PR.
kernel_tuner/backends/cupy.py
Outdated
@@ -46,6 +47,7 @@ def __init__(self, device=0, iterations=7, compiler_options=None, observers=None
         self.devprops = dev.attributes
         self.cc = dev.compute_capability
         self.max_threads = self.devprops["MaxThreadsPerBlock"]
+        self.cache_size_L2 = self.devprops["L2CacheSize"]
After reading the PR text, it comes as a surprise to me that L2-related things have been changed.
kernel_tuner/backends/cupy.py
Outdated
@@ -124,6 +126,7 @@ def compile(self, kernel_instance):
         compiler_options = self.compiler_options
         if not any(["-std=" in opt for opt in self.compiler_options]):
             compiler_options = ["--std=c++11"] + self.compiler_options
+        # CuPy already sets the --gpu-architecture by itself, as per https://github.com/cupy/cupy/blob/20ccd63c0acc40969c851b1917dedeb032209e8b/cupy/cuda/compiler.py#L145
Is there a recommended version of CuPy for KernelTuner? If so, consider replacing the link with e.g. https://github.com/cupy/cupy/blob/v13/cupy/cuda/compiler.py#L145 for v13 of CuPy. Something like https://pypi.org/project/bump2version/ may be helpful to keep such version numbers up to date going forward.
Good point, it should indeed be version independent. I'll change it to `main` instead for now.
kernel_tuner/backends/opencl.py
Outdated
        # TODO the L2 cache size request fails
        # self.cache_size_L2 = self.ctx.devices[0].get_info(
        #     cl.device_affinity_domain.L2_CACHE
        # )
Open a new issue to keep track of this?
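For reference, a possible explanation: `cl.device_affinity_domain.L2_CACHE` is a device-partitioning enum, not a `device_info` query parameter, which would explain why the request fails. A minimal sketch of a workaround, assuming the standard global-memory-cache query is an acceptable stand-in for the L2 size (on most GPUs this cache is the L2):

```python
import pyopencl as cl

def get_cache_size_l2(device: cl.Device) -> int:
    # CL_DEVICE_GLOBAL_MEM_CACHE_SIZE is the closest standard OpenCL property;
    # the OpenCL spec has no dedicated L2-cache-size query.
    return device.get_info(cl.device_info.GLOBAL_MEM_CACHE_SIZE)
```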
kernel_tuner/runners/sequential.py
Outdated
@@ -100,7 +100,7 @@ def run(self, parameter_space, tuning_options):
                 params = process_metrics(params, tuning_options.metrics)

         # get the framework time by estimating based on other times
-        total_time = 1000 * (perf_counter() - self.start_time) - warmup_time
+        total_time = 1000 * ((perf_counter() - self.start_time) - warmup_time)  # TODO is it valid that we deduct the warmup time here?
It depends on when `self.start_time` is set (before or after the warm-up?) and on what `perf_counter()` returns. I don't know enough about these KernelTuner details to give a definitive answer, though.
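One way to read the diff: `perf_counter()` returns seconds, so moving `warmup_time` inside the `1000 *` factor is only correct if `warmup_time` is itself measured in seconds. A minimal sketch of that reading, with hypothetical variable names:

```python
from time import perf_counter

start_time = perf_counter()                  # seconds, set before the warm-up
# ... warm-up runs ...
warmup_time = perf_counter() - start_time    # seconds spent warming up
# ... benchmarking and framework work ...

# old: mixes units, subtracting seconds from milliseconds
total_time_old = 1000 * (perf_counter() - start_time) - warmup_time
# new: subtracts in seconds first, then converts the result to milliseconds
total_time_new = 1000 * ((perf_counter() - start_time) - warmup_time)
```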
kernel_tuner/util.py
Outdated
@@ -221,7 +221,7 @@ def check_block_size_names(block_size_names):
    if not isinstance(block_size_names, list):
        raise ValueError("block_size_names should be a list of strings!")
    if len(block_size_names) > 3:
-       raise ValueError("block_size_names should not contain more than 3 names!")
+       raise ValueError(f"block_size_names should not contain more than 3 names! ({block_size_names=})")
Useful for debugging, but should it be included in this PR?
kernel_tuner/util.py
Outdated
@@ -570,6 +570,24 @@ def get_total_timings(results, env, overhead_time):
    return env


def to_valid_nvrtc_gpu_arch_cc(compute_capability: str) -> str:
    """Returns a valid Compute Capability for NVRTC `--gpu-architecture=`, as per https://docs.nvidia.com/cuda/nvrtc/index.html#group__options."""
    valid_cc = ['50', '52', '53', '60', '61', '62', '70', '72', '75', '80', '87', '89', '90', '90a']  # must be in ascending order, when updating also update test_to_valid_nvrtc_gpu_arch_cc
Here you do have the Pascal, Volta and Turing architectures! Can't you put these in some global list and use them in the tests as well to avoid having this list duplicated?
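A sketch of that suggestion, assuming a module-level constant in `kernel_tuner/util.py` (the constant name is illustrative) that the tests import instead of duplicating the list:

```python
# kernel_tuner/util.py
# Valid NVRTC --gpu-architecture values, in ascending order; shared with the tests.
NVRTC_VALID_CC = ['50', '52', '53', '60', '61', '62', '70', '72', '75', '80', '87', '89', '90', '90a']

def to_valid_nvrtc_gpu_arch_cc(compute_capability: str) -> str:
    """Returns a valid Compute Capability for NVRTC --gpu-architecture=."""
    valid_cc = NVRTC_VALID_CC
    ...

# test/test_util.py would then use:
# from kernel_tuner.util import NVRTC_VALID_CC
```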
kernel_tuner/util.py
Outdated
    if len(compute_capability) < 2:
        raise ValueError(f"Compute capability '{compute_capability}' must be at least of length 2, is {len(compute_capability)}")
    if compute_capability in valid_cc:
        return compute_capability
    # if the compute capability does not match, scale down to the nearest matching
    subset_cc = [cc for cc in valid_cc if compute_capability[0] == cc[0]]
    if len(subset_cc) > 0:
        # get the next-highest valid CC
        highest_cc_index = max([i for i, cc in enumerate(subset_cc) if int(cc[1]) <= int(compute_capability[1])])
        return subset_cc[highest_cc_index]
This seems like a rather complicated way of trying to match the input `compute_capability` to something in `valid_cc`. I am sure there must be a way to make it simpler and more readable.
I agree, the problem is we can't use integer comparison because it's also possible to have e.g. `90a`, and we can only match within the major number. Any suggestions?
How about converting `90a` to `901` and `90` to `900`? I.e.: multiply the integer part of the number by 10 and add the ordinal value of the following letter (if any) to it.
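A minimal sketch of that scheme (hypothetical helper name; "ordinal value" read as the letter's position in the alphabet, so `a` maps to 1):

```python
def cc_key(cc: str) -> int:
    """Map a compute capability such as '90' to 900 and '90a' to 901."""
    digits = ''.join(ch for ch in cc if ch.isdigit())
    suffix = cc[len(digits):]
    key = int(digits) * 10
    if suffix:
        key += ord(suffix[0]) - ord('a') + 1  # 'a' -> 1, 'b' -> 2, ...
    return key

# Picking the nearest valid CC not above the input, within the same major
# version, then becomes a single expression:
# max((cc for cc in valid_cc
#      if cc[0] == compute_capability[0] and cc_key(cc) <= cc_key(compute_capability)),
#     key=cc_key, default=None)
```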
@csbnw good comments! There is indeed some feature creep in this PR due to ongoing research 😅 The L2 stuff was just in here so Ben could read along, but I'll indeed create a separate PR for it. Converting back to draft for now.
|
|
This pull request adds a built-in Register Observer. This observer works for the PyCUDA, CuPy, and CUDA-Python backends; on unsupported backends it raises a `NotImplementedError`. In addition, the pull request improves the efficiency with which clocks are set, and no longer counts the time spent doing so towards the benchmark time.
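A hypothetical usage sketch (the import path and observer name are assumptions based on the PR description, not confirmed by the diff):

```python
import numpy as np
from kernel_tuner import tune_kernel
from kernel_tuner.observers.register import RegisterObserver  # assumed location

kernel_string = """
__global__ void vector_add(float *c, const float *a, const float *b, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}
"""

size = 10_000_000
a = np.random.randn(size).astype(np.float32)
b = np.random.randn(size).astype(np.float32)
c = np.zeros_like(a)
n = np.int32(size)

# The observer reports per-thread register usage for each tuned configuration;
# on an unsupported backend it would raise NotImplementedError instead.
results, env = tune_kernel(
    "vector_add", kernel_string, size, [c, a, b, n],
    {"block_size_x": [128, 256, 512]},
    observers=[RegisterObserver()],
    lang="CUDA",  # PyCUDA backend, one of the supported ones
)
```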