HIP Example
The following example will introduce APEX using the HIP programming model.
APEX is integrated with the Roctracer, Rocprofiler, RocTX and ROCm-SMI libraries for HIP measurement support.
The example is a matrix transpose written with HIP.
It uses typical HIP API calls and adds RocTX instrumentation.
The apex_exec wrapper script has several options for supporting HIP programs:
--apex:hip enable HIP/ROCTracer measurement (default: off)
--apex:hip_metrics enable HIP/ROCProfiler metric support (default: off)
--apex:hip_driver enable HIP/ROCTracer KSA driver API callbacks (default: off)
--apex:hip_details enable per-kernel statistics where available (default: off)
--apex:monitor_gpu enable GPU monitoring services (CUDA NVML, ROCm SMI)
--apex:gpu_memory enable GPU memory wrapper support
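To enable basic HIP support, use the --apex:hip flag. Adding --apex:tasktree also writes a task tree file (apex_tasktree.csv) that apex-treesummary.py can summarize: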
[khuck@gilgamesh apex-tutorial]$ apex_exec --apex:hip --apex:tasktree ./build/bin/MatrixTranspose
___ ______ _______ __
/ _ \ | ___ \ ___\ \ / /
/ /_\ \| |_/ / |__ \ V /
| _ || __/| __| / \
| | | || | | |___/ /^\ \
\_| |_/\_| \____/\/ \/
APEX Version: v2.6.1-da0e52e-develop
Built on: 17:54:27 Feb 25 2023 (RelWithDebInfo)
C++ Language Standard version : 201402
Clang Compiler version : AMD Clang 14.0.0 (https://github.com/RadeonOpenCompute/llvm-project roc-5.2.0 22204 50d6d5d5b608d2abd6af44314abc6ad20036af3b)
Device name
System major 9
System minor 0
## Iteration (9) #################
PASSED!
## Iteration (8) #################
PASSED!
## Iteration (7) #################
PASSED!
## Iteration (6) #################
PASSED!
## Iteration (5) #################
PASSED!
## Iteration (4) #################
PASSED!
## Iteration (3) #################
PASSED!
## Iteration (2) #################
PASSED!
## Iteration (1) #################
PASSED!
## Iteration (0) #################
PASSED!
Start Date/Time: 26/02/2023 13:52:39
Elapsed time: 1.56751 seconds
Total processes detected: 1
HW Threads detected on rank 0: 96
Worker Threads observed on rank 0: 2
Available CPU time on rank 0: 3.13502 seconds
Available CPU time on all ranks: 3.13502 seconds
Counter : #samp | mean | max
--------------------------------------------------------------------------------
1 Minute Load average : 2 2.39 2.39
CPU Guest % : 1 0.00 0.00
CPU I/O Wait % : 1 0.00 0.00
CPU IRQ % : 1 0.04 0.04
CPU Idle % : 1 96.75 96.75
CPU Nice % : 1 0.00 0.00
CPU Steal % : 1 0.00 0.00
CPU System % : 1 0.79 0.79
CPU User % : 1 2.40 2.40
CPU soft IRQ % : 1 0.02 0.02
DRAM Energy : 1 1.00 1.00
GPU: Bytes Allocated: hipMalloc : 2 2.68e+08 2.68e+08
GPU: Bytes Freed: hipFree : 2 2.68e+08 2.68e+08
GPU: CopyDeviceToHost Bytes : 10 0.00 0.00
GPU: CopyHostToDevice Bytes : 10 0.00 0.00
GPU: Total Bytes Occupied on Device : 4 2.68e+08 5.37e+08
Package-0 Energy : 1 81.00 81.00
status:Threads : 2 3.00 4.00
status:VmData kB : 2 8.88e+05 1.36e+06
status:VmExe kB : 2 32.00 32.00
status:VmHWM kB : 2 3.98e+05 7.56e+05
status:VmLck kB : 2 0.00 0.00
status:VmLib kB : 2 1.37e+05 1.37e+05
status:VmPTE kB : 2 1070.00 1780.00
status:VmPeak kB : 2 5.74e+06 1.07e+07
status:VmPin kB : 2 0.00 0.00
status:VmRSS kB : 2 3.98e+05 7.56e+05
status:VmSize kB : 2 5.65e+06 1.06e+07
status:VmStk kB : 2 136.00 136.00
status:VmSwap kB : 2 0.00 0.00
status:nonvoluntary_ctxt_switches : 2 7.00 12.00
status:voluntary_ctxt_switches : 2 43.00 56.00
--------------------------------------------------------------------------------
GPU Timers : #calls| mean | total
--------------------------------------------------------------------------------
GPU: matrixTranspose(float*, float*, int) : 10 0.01 0.05
GPU: CopyDeviceToHost : 10 0.00 0.01
GPU: CopyHostToDevice : 10 0.00 0.01
--------------------------------------------------------------------------------
CPU Timers : #calls| mean | total
--------------------------------------------------------------------------------
APEX MAIN : 1 1.57 1.57
int apex_preload_main(int, char **, char **) : 1 1.52 1.52
Initialization : 1 0.79 0.79
matrixTransposeCPUReference : 1 0.76 0.76
While Loop range : 10 0.07 0.72
hipMemcpy : 20 0.02 0.48
Validation Step : 10 0.02 0.19
Memcpy wrapper : 10 0.02 0.17
LaunchKernel wrapper : 10 0.01 0.05
hipDeviceSynchronize : 31 0.00 0.05
Memory Free : 1 0.01 0.01
hipLaunchKernel : 10 0.00 0.00
hipFree : 2 0.00 0.00
hipMalloc : 2 0.00 0.00
hipGetDeviceProperties : 1 0.00 0.00
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Total timers : 140
Writing: .//apex_tasktree.csv
[khuck@gilgamesh apex-tutorial]$ apex-treesummary.py --ascii --dot
Reading tasktree...
Read 22 rows
Found 0 ranks, with max graph node index of 21 and depth of 5
building common tree...
Rank 0 ...
1-> 1.568 - 100.000% [1] {min=1.568, max=1.568, mean=1.568, threads=1} APEX MAIN
1 |-> 1.519 - 96.898% [1] {min=1.519, max=1.519, mean=1.519, threads=1} int apex_preload_main(int, char **, char **)
1 | |-> 0.788 - 50.250% [1] {min=0.788, max=0.788, mean=0.788, threads=1} Initialization
1 | | |-> 0.763 - 48.660% [1] {min=0.763, max=0.763, mean=0.763, threads=1} matrixTransposeCPUReference
1 | | |-> 0.000 - 0.011% [2] {min=0.000, max=0.000, mean=0.000, threads=1} hipMalloc
1 | |-> 0.723 - 46.105% [10] {min=0.723, max=0.723, mean=0.072, threads=1} While Loop range
1 | | |-> 0.308 - 19.657% [10] {min=0.308, max=0.308, mean=0.031, threads=1} hipMemcpy
1 | | | |-> 0.012 - 0.756% [10] {min=0.012, max=0.012, mean=0.001, threads=1} GPU: CopyHostToDevice
1 | | |-> 0.189 - 12.050% [10] {min=0.189, max=0.189, mean=0.019, threads=1} Validation Step
1 | | |-> 0.173 - 11.048% [10] {min=0.173, max=0.173, mean=0.017, threads=1} Memcpy wrapper
1 | | | |-> 0.173 - 11.040% [10] {min=0.173, max=0.173, mean=0.017, threads=1} hipMemcpy
1 | | | | |-> 0.012 - 0.763% [10] {min=0.012, max=0.012, mean=0.001, threads=1} GPU: CopyDeviceToHost
1 | | | |-> 0.000 - 0.001% [10] {min=0.000, max=0.000, mean=0.000, threads=1} hipDeviceSynchronize
1 | | |-> 0.052 - 3.307% [10] {min=0.052, max=0.052, mean=0.005, threads=1} LaunchKernel wrapper
1 | | | |-> 0.051 - 3.245% [10] {min=0.051, max=0.051, mean=0.005, threads=1} hipDeviceSynchronize
1 | | | |-> 0.001 - 0.049% [10] {min=0.001, max=0.001, mean=0.000, threads=1} hipLaunchKernel
1 | | | | |-> 0.051 - 3.230% [10] {min=0.051, max=0.051, mean=0.005, threads=1} GPU: matrixTranspose(float*, float*, int)
1 | | |-> 0.000 - 0.003% [10] {min=0.000, max=0.000, mean=0.000, threads=1} hipDeviceSynchronize
1 | |-> 0.007 - 0.426% [1] {min=0.007, max=0.007, mean=0.007, threads=1} Memory Free
1 | | |-> 0.000 - 0.013% [2] {min=0.000, max=0.000, mean=0.000, threads=1} hipFree
1 | | |-> 0.000 - 0.000% [1] {min=0.000, max=0.000, mean=0.000, threads=1} hipDeviceSynchronize
1 | |-> 0.000 - 0.001% [1] {min=0.000, max=0.000, mean=0.000, threads=1} hipGetDeviceProperties
23 total graph nodes
Task tree also written to tasktree.txt.
Computing new stats...
Building dot file
done.
[khuck@gilgamesh apex-tutorial]$ dot -Tsvg -O tasktree.dot
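The flat CPU Timers table printed by apex_exec can also be post-processed directly. The following is a minimal sketch, not part of APEX, that parses a few rows copied from the run above and expresses each timer's total as a share of APEX MAIN. The function name parse_timer_rows is hypothetical, and the row layout ("name : calls mean total") is assumed from the output shown here. The totals appear to be inclusive of nested timers, so the shares are not expected to sum to 100%.

```python
def parse_timer_rows(lines):
    """Parse 'name : calls mean total' rows; skip headers and separators."""
    rows = []
    for line in lines:
        # Timer names may contain ':' themselves, so split on the last one.
        name, sep, rest = line.rpartition(":")
        fields = rest.split()
        if not sep or len(fields) < 3:
            continue
        try:
            calls, mean, total = int(fields[0]), float(fields[1]), float(fields[2])
        except ValueError:
            continue  # e.g. the '#calls| mean | total' header row
        rows.append((name.strip(), calls, mean, total))
    return rows


# Rows copied from the CPU Timers table of the first run.
sample = """\
CPU Timers : #calls| mean | total
APEX MAIN : 1 1.57 1.57
int apex_preload_main(int, char **, char **) : 1 1.52 1.52
Initialization : 1 0.79 0.79
matrixTransposeCPUReference : 1 0.76 0.76
While Loop range : 10 0.07 0.72
hipMemcpy : 20 0.02 0.48
Total timers : 140
""".splitlines()

rows = parse_timer_rows(sample)
main_total = next(total for name, _, _, total in rows if name == "APEX MAIN")
shares = {name: 100.0 * total / main_total for name, _, _, total in rows}

for name, share in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{share:6.1f}%  {name}")
```

For per-call-path numbers, the tree printed by apex-treesummary.py is the better source, since it separates, for example, the hipMemcpy calls made directly in the loop from those made inside the Memcpy wrapper.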
Adding the --apex:hip_details flag captures launch details for each kernel invocation and reports them as additional counters (block dimensions, grid dimensions, and shared memory bytes):
[khuck@gilgamesh apex-tutorial]$ apex_exec --apex:hip --apex:hip_details ./build/bin/MatrixTranspose
___ ______ _______ __
/ _ \ | ___ \ ___\ \ / /
/ /_\ \| |_/ / |__ \ V /
| _ || __/| __| / \
| | | || | | |___/ /^\ \
\_| |_/\_| \____/\/ \/
APEX Version: v2.6.1-da0e52e-develop
Built on: 17:54:27 Feb 25 2023 (RelWithDebInfo)
C++ Language Standard version : 201402
Clang Compiler version : AMD Clang 14.0.0 (https://github.com/RadeonOpenCompute/llvm-project roc-5.2.0 22204 50d6d5d5b608d2abd6af44314abc6ad20036af3b)
Device name
System major 9
System minor 0
## Iteration (9) #################
PASSED!
## Iteration (8) #################
PASSED!
## Iteration (7) #################
PASSED!
## Iteration (6) #################
PASSED!
## Iteration (5) #################
PASSED!
## Iteration (4) #################
PASSED!
## Iteration (3) #################
PASSED!
## Iteration (2) #################
PASSED!
## Iteration (1) #################
PASSED!
## Iteration (0) #################
PASSED!
Start Date/Time: 26/02/2023 13:56:41
Elapsed time: 1.68719 seconds
Total processes detected: 1
HW Threads detected on rank 0: 96
Worker Threads observed on rank 0: 2
Available CPU time on rank 0: 3.37438 seconds
Available CPU time on all ranks: 3.37438 seconds
Counter : #samp | mean | max
--------------------------------------------------------------------------------
1 Minute Load average : 2 2.20 2.20
CPU Guest % : 1 0.00 0.00
CPU I/O Wait % : 1 0.00 0.00
CPU IRQ % : 1 0.02 0.02
CPU Idle % : 1 96.93 96.93
CPU Nice % : 1 0.00 0.00
CPU Steal % : 1 0.00 0.00
CPU System % : 1 0.46 0.46
CPU User % : 1 2.57 2.57
CPU soft IRQ % : 1 0.02 0.02
DRAM Energy : 1 0.00 0.00
GPU: Bytes Allocated: hipMalloc : 2 2.68e+08 2.68e+08
GPU: Bytes Freed: hipFree : 2 2.68e+08 2.68e+08
GPU: CopyDeviceToHost Bytes : 10 0.00 0.00
GPU: CopyHostToDevice Bytes : 10 0.00 0.00
GPU: Total Bytes Occupied on Device : 4 2.68e+08 5.37e+08
GPU: dimBlocks.X: matrixTranspose(float*, float*, i… : 10 4.00 4.00
GPU: dimBlocks.Y: matrixTranspose(float*, float*, i… : 10 4.00 4.00
GPU: dimBlocks.Z: matrixTranspose(float*, float*, i… : 10 1.00 1.00
GPU: numBlocks.X: matrixTranspose(float*, float*, i… : 10 2048.00 2048.00
GPU: numBlocks.Y: matrixTranspose(float*, float*, i… : 10 2048.00 2048.00
GPU: numBlocks.Z: matrixTranspose(float*, float*, i… : 10 1.00 1.00
GPU: sharedMemBytes: matrixTranspose(float*, float*… : 10 0.00 0.00
Package-0 Energy : 1 83.00 83.00
status:Threads : 2 3.00 4.00
status:VmData kB : 2 8.88e+05 1.36e+06
status:VmExe kB : 2 32.00 32.00
status:VmHWM kB : 2 3.99e+05 7.57e+05
status:VmLck kB : 2 0.00 0.00
status:VmLib kB : 2 1.37e+05 1.37e+05
status:VmPTE kB : 2 1078.00 1792.00
status:VmPeak kB : 2 5.74e+06 1.07e+07
status:VmPin kB : 2 0.00 0.00
status:VmRSS kB : 2 3.99e+05 7.57e+05
status:VmSize kB : 2 5.65e+06 1.06e+07
status:VmStk kB : 2 136.00 136.00
status:VmSwap kB : 2 0.00 0.00
status:nonvoluntary_ctxt_switches : 2 4.00 7.00
status:voluntary_ctxt_switches : 2 42.50 57.00
--------------------------------------------------------------------------------
GPU Timers : #calls| mean | total
--------------------------------------------------------------------------------
GPU: matrixTranspose(float*, float*, int) : 10 0.01 0.05
GPU: CopyDeviceToHost : 10 0.00 0.02
GPU: CopyHostToDevice : 10 0.00 0.01
--------------------------------------------------------------------------------
CPU Timers : #calls| mean | total
--------------------------------------------------------------------------------
APEX MAIN : 1 1.69 1.69
int apex_preload_main(int, char **, char **) : 1 1.64 1.64
While Loop range : 10 0.09 0.85
Initialization : 1 0.77 0.77
matrixTransposeCPUReference : 1 0.75 0.75
hipMemcpy : 20 0.02 0.48
Validation Step : 10 0.03 0.32
Memcpy wrapper : 10 0.02 0.17
LaunchKernel wrapper : 10 0.01 0.05
hipDeviceSynchronize : 31 0.00 0.05
Memory Free : 1 0.01 0.01
hipLaunchKernel : 10 0.00 0.00
hipFree : 2 0.00 0.00
hipMalloc : 2 0.00 0.00
hipGetDeviceProperties : 1 0.00 0.00
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Total timers : 140
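The launch counters can be cross-checked against the memory counters in the same report. The sketch below is a worked calculation, not APEX code. It assumes that dimBlocks is the number of threads per block, that numBlocks is the number of blocks in the grid, that the kernel handles one matrix element per thread, and that each element is a 4-byte float. None of these assumptions is stated in the output itself.

```python
# Values copied from the counter table above.
dim_blocks = (4, 4, 1)        # dimBlocks.X/Y/Z (assumed: threads per block)
num_blocks = (2048, 2048, 1)  # numBlocks.X/Y/Z (assumed: blocks per grid)

# Threads launched along each axis of the grid.
threads = tuple(d * n for d, n in zip(dim_blocks, num_blocks))
total_threads = threads[0] * threads[1] * threads[2]

# Assumption: one 4-byte float per thread, so this is the size of one matrix.
BYTES_PER_FLOAT = 4
bytes_per_matrix = total_threads * BYTES_PER_FLOAT

print(threads)                            # (8192, 8192, 1)
print(f"{bytes_per_matrix:.2e}")          # 2.68e+08
print(f"{2 * bytes_per_matrix:.2e}")      # 5.37e+08
```

Under these assumptions the result matches the report: each of the two hipMalloc calls allocates about 2.68e+08 bytes, and the maximum of "Total Bytes Occupied on Device" is about 5.37e+08 bytes, which is the size of two such matrices.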
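Adding the --apex:monitor_gpu flag enables ROCm-SMI support, which adds per-device counters such as device and memory utilization, memory used, power, temperature, and voltage to the counter report: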
[khuck@gilgamesh apex-tutorial]$ apex_exec --apex:hip --apex:monitor_gpu ./build/bin/MatrixTranspose
___ ______ _______ __
/ _ \ | ___ \ ___\ \ / /
/ /_\ \| |_/ / |__ \ V /
| _ || __/| __| / \
| | | || | | |___/ /^\ \
\_| |_/\_| \____/\/ \/
APEX Version: v2.6.1-da0e52e-develop
Built on: 17:54:27 Feb 25 2023 (RelWithDebInfo)
C++ Language Standard version : 201402
Clang Compiler version : AMD Clang 14.0.0 (https://github.com/RadeonOpenCompute/llvm-project roc-5.2.0 22204 50d6d5d5b608d2abd6af44314abc6ad20036af3b)
Device name
System major 9
System minor 0
## Iteration (9) #################
PASSED!
## Iteration (8) #################
PASSED!
## Iteration (7) #################
PASSED!
## Iteration (6) #################
PASSED!
## Iteration (5) #################
PASSED!
## Iteration (4) #################
PASSED!
## Iteration (3) #################
PASSED!
## Iteration (2) #################
PASSED!
## Iteration (1) #################
PASSED!
## Iteration (0) #################
PASSED!
Start Date/Time: 26/02/2023 14:01:49
Elapsed time: 2.87411 seconds
Total processes detected: 1
HW Threads detected on rank 0: 96
Worker Threads observed on rank 0: 2
Available CPU time on rank 0: 5.74821 seconds
Available CPU time on all ranks: 5.74821 seconds
Counter : #samp | mean | max
--------------------------------------------------------------------------------
1 Minute Load average : 3 79.69 79.69
CPU Guest % : 2 0.00 0.00
CPU I/O Wait % : 2 0.00 0.00
CPU IRQ % : 2 0.53 0.53
CPU Idle % : 2 2.10 2.33
CPU Nice % : 2 0.00 0.00
CPU Steal % : 2 0.00 0.00
CPU System % : 2 0.76 0.88
CPU User % : 2 96.59 96.94
CPU soft IRQ % : 2 0.04 0.04
DRAM Energy : 2 4.00 4.00
GPU: Bytes Allocated: hipMalloc : 2 2.68e+08 2.68e+08
GPU: Bytes Freed: hipFree : 2 2.68e+08 2.68e+08
GPU: CopyDeviceToHost Bytes : 10 0.00 0.00
GPU: CopyHostToDevice Bytes : 10 0.00 0.00
GPU: Device 0 Device Busy (%) : 3 0.00 0.00
GPU: Device 0 Memory Busy (%) : 3 0.00 0.00
GPU: Device 0 Memory Reserved Pages : 3 0.00 0.00
GPU: Device 0 Memory Used, GTT (GB) : 3 0.01 0.01
GPU: Device 0 Memory Used, VRAM (GB) : 3 0.19 0.55
GPU: Device 0 Memory Used, Vis. VRAM (GB) : 3 0.19 0.55
GPU: Device 0 Power (W) : 3 43.67 47.00
GPU: Device 0 Temperature (C) : 3 33.00 33.00
GPU: Device 0 Voltage (V) : 3 0.79 0.79
GPU: Total Bytes Occupied on Device : 4 2.68e+08 5.37e+08
Package-0 Energy : 2 170.00 171.00
status:Threads : 3 3.33 4.00
status:VmData kB : 3 1.00e+06 1.36e+06
status:VmExe kB : 3 32.00 32.00
status:VmHWM kB : 3 5.46e+05 1.02e+06
status:VmLck kB : 3 0.00 0.00
status:VmLib kB : 3 1.37e+05 1.37e+05
status:VmPTE kB : 3 1366.67 2300.00
status:VmPeak kB : 3 7.23e+06 1.07e+07
status:VmPin kB : 3 0.00 0.00
status:VmRSS kB : 3 5.46e+05 1.02e+06
status:VmSize kB : 3 7.08e+06 1.06e+07
status:VmStk kB : 3 141.33 152.00
status:VmSwap kB : 3 0.00 0.00
status:nonvoluntary_ctxt_switches : 3 35.67 71.00
status:voluntary_ctxt_switches : 3 57.33 111.00
--------------------------------------------------------------------------------
GPU Timers : #calls| mean | total
--------------------------------------------------------------------------------
GPU: matrixTranspose(float*, float*, int) : 10 0.01 0.05
GPU: CopyDeviceToHost : 10 0.00 0.01
GPU: CopyHostToDevice : 10 0.00 0.01
--------------------------------------------------------------------------------
CPU Timers : #calls| mean | total
--------------------------------------------------------------------------------
APEX MAIN : 1 2.87 2.87
int apex_preload_main(int, char **, char **) : 1 2.79 2.79
While Loop range : 10 0.17 1.68
hipMemcpy : 20 0.06 1.21
Initialization : 1 1.10 1.10
matrixTransposeCPUReference : 1 1.06 1.06
Memcpy wrapper : 10 0.04 0.41
Validation Step : 10 0.04 0.39
LaunchKernel wrapper : 10 0.01 0.07
hipDeviceSynchronize : 31 0.00 0.06
Memory Free : 1 0.02 0.02
hipLaunchKernel : 10 0.00 0.01
hipMalloc : 2 0.00 0.00
hipFree : 2 0.00 0.00
hipGetDeviceProperties : 1 0.00 0.00
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Total timers : 140
Adding the --apex:gpu_memory flag enables memory consumption and leak tracking for all hipMalloc* calls. The timer tables gain columns for allocations and frees (counts and bytes), and at the end of execution any memory leaks are reported in a text file (memory_report.0.txt in the run below).
[khuck@gilgamesh apex-tutorial]$ apex_exec --apex:hip --apex:gpu_memory ./build/bin/MatrixTranspose
___ ______ _______ __
/ _ \ | ___ \ ___\ \ / /
/ /_\ \| |_/ / |__ \ V /
| _ || __/| __| / \
| | | || | | |___/ /^\ \
\_| |_/\_| \____/\/ \/
APEX Version: v2.6.1-da0e52e-develop
Built on: 17:54:27 Feb 25 2023 (RelWithDebInfo)
C++ Language Standard version : 201402
Clang Compiler version : AMD Clang 14.0.0 (https://github.com/RadeonOpenCompute/llvm-project roc-5.2.0 22204 50d6d5d5b608d2abd6af44314abc6ad20036af3b)
Device name
System major 9
System minor 0
## Iteration (9) #################
PASSED!
## Iteration (8) #################
PASSED!
## Iteration (7) #################
PASSED!
## Iteration (6) #################
PASSED!
## Iteration (5) #################
PASSED!
## Iteration (4) #################
PASSED!
## Iteration (3) #################
PASSED!
## Iteration (2) #################
PASSED!
## Iteration (1) #################
PASSED!
## Iteration (0) #################
PASSED!
Start Date/Time: 26/02/2023 14:04:52
Elapsed time: 1.52486 seconds
Total processes detected: 1
HW Threads detected on rank 0: 96
Worker Threads observed on rank 0: 2
Available CPU time on rank 0: 3.04971 seconds
Available CPU time on all ranks: 3.04971 seconds
Counter : #samp | mean | max
--------------------------------------------------------------------------------
1 Minute Load average : 2 17.12 17.12
CPU Guest % : 1 0.00 0.00
CPU I/O Wait % : 1 0.00 0.00
CPU IRQ % : 1 0.02 0.02
CPU Idle % : 1 98.51 98.51
CPU Nice % : 1 0.07 0.07
CPU Steal % : 1 0.00 0.00
CPU System % : 1 0.39 0.39
CPU User % : 1 1.00 1.00
CPU soft IRQ % : 1 0.01 0.01
DRAM Energy : 1 0.00 0.00
GPU: Bytes Allocated: hipMalloc : 2 2.68e+08 2.68e+08
GPU: Bytes Freed: hipFree : 2 2.68e+08 2.68e+08
GPU: CopyDeviceToHost Bytes : 10 0.00 0.00
GPU: CopyHostToDevice Bytes : 10 0.00 0.00
GPU: Total Bytes Occupied on Device : 4 2.68e+08 5.37e+08
Package-0 Energy : 1 68.00 68.00
status:Threads : 2 3.00 4.00
status:VmData kB : 2 8.88e+05 1.36e+06
status:VmExe kB : 2 32.00 32.00
status:VmHWM kB : 2 3.99e+05 7.57e+05
status:VmLck kB : 2 0.00 0.00
status:VmLib kB : 2 1.37e+05 1.37e+05
status:VmPTE kB : 2 1074.00 1784.00
status:VmPeak kB : 2 5.74e+06 1.07e+07
status:VmPin kB : 2 0.00 0.00
status:VmRSS kB : 2 3.99e+05 7.57e+05
status:VmSize kB : 2 5.65e+06 1.06e+07
status:VmStk kB : 2 136.00 136.00
status:VmSwap kB : 2 0.00 0.00
status:nonvoluntary_ctxt_switches : 2 5.50 9.00
status:voluntary_ctxt_switches : 2 44.50 59.00
--------------------------------------------------------------------------------
GPU Timers : #calls| mean | total| allocs | (bytes) | frees | (bytes)
---------------------------------------------------------------------------------------------------------------------
GPU: matrixTranspose(float*, float*, int) [{/proc/s… : 10 0.01 0.05
GPU: CopyDeviceToHost : 10 0.00 0.01
GPU: CopyHostToDevice : 10 0.00 0.01
---------------------------------------------------------------------------------------------------------------------
CPU Timers : #calls| mean | total| allocs| (bytes)| frees | (bytes)
---------------------------------------------------------------------------------------------------------------------
APEX MAIN : 1 1.52 1.52 0 0 0 0
int apex_preload_main(int, char **, char **) : 1 1.48 1.48 0 0 0 0
Initialization : 1 0.80 0.80 2 5.37e+08 0 0
matrixTransposeCPUReference : 1 0.77 0.77 0 0 0 0
While Loop range : 10 0.07 0.68 0 0 0 0
hipMemcpy : 20 0.02 0.45 0 0 0 0
Validation Step : 10 0.02 0.18 0 0 0 0
Memcpy wrapper : 10 0.01 0.14 0 0 0 0
LaunchKernel wrapper : 10 0.01 0.05 0 0 0 0
hipDeviceSynchronize : 31 0.00 0.05 0 0 0 0
hipLaunchKernel : 10 0.00 0.00 0 0 0 0
Memory Free : 1 0.00 0.00 0 0 0 0
hipMalloc : 2 0.00 0.00 0 0 0 0
hipFree : 2 0.00 0.00 0 0 2 5.37e+08
hipGetDeviceProperties : 1 0.00 0.00 0 0 0 0
---------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------
Total timers : 140
APEX Memory Report: (see memory_report.0.txt)
sorting 0 leaks by size...
Aggregating leaks by task and writing report...
Reported 0 'actual' leaks.
Expect false positives if memory was freed after exit.
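The idea behind this kind of leak report can be illustrated with a small bookkeeping sketch. This is a conceptual toy, not how APEX implements its memory wrappers: the AllocationTracker class, its method names, and the addresses are all hypothetical, and the exact allocation size is assumed from the rounded 2.68e+08 shown in the counters. Each allocation is recorded by address, each free removes its record, and whatever remains at exit is reported as a leak.

```python
class AllocationTracker:
    """Toy bookkeeping for device allocations (hypothetical, not APEX code)."""

    def __init__(self):
        self.live = {}            # address -> (size in bytes, allocating task)
        self.bytes_allocated = 0
        self.bytes_freed = 0

    def on_alloc(self, address, size, task):
        self.live[address] = (size, task)
        self.bytes_allocated += size

    def on_free(self, address):
        # Frees of unknown pointers are ignored in this sketch.
        size, _task = self.live.pop(address, (0, None))
        self.bytes_freed += size

    def leaks(self):
        """Allocations that were never freed, largest first."""
        return sorted(
            ((size, task, addr) for addr, (size, task) in self.live.items()),
            reverse=True,
        )


MATRIX_BYTES = 268435456  # assumed exact size; the report shows 2.68e+08

tracker = AllocationTracker()
tracker.on_alloc(0x1000, MATRIX_BYTES, "Initialization")  # made-up addresses
tracker.on_alloc(0x2000, MATRIX_BYTES, "Initialization")
tracker.on_free(0x1000)
tracker.on_free(0x2000)

print(f"allocated {tracker.bytes_allocated:.2e} bytes, "
      f"freed {tracker.bytes_freed:.2e} bytes")
print(f"{len(tracker.leaks())} leaks")
```

With two allocations and two matching frees, the sketch reports 5.37e+08 bytes allocated and freed and no leaks, which mirrors the allocs and frees columns in the table above. An allocation released only after the tracker produces its report would still be listed, which is the kind of false positive the APEX output warns about.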
APEX tutorial, © Copyright 2023, University of Oregon. For more information on APEX, see https://github.com/UO-OACISS/apex