
Stream stats #71


Merged
merged 39 commits into dev on Aug 21, 2024

Conversation

@JRPan commented May 14, 2024

Collect stats on a per-stream basis.

99% ready. Creating a PR to trigger GitHub Actions to generate correlations.

Consult @JRPan before merging.

ShichenQiao and others added 25 commits October 12, 2022 23:30
code changes for adding per-stream stats.
Added an optional streamID argument to cache_stats::print_stats, cache_stats::print_fail_stats, and their upstream functions. When a streamID is specified, stats from that stream are printed; when it is not specified, all stats are printed (see the sketch below).

NOTE: the current implementation depends on a stream ID never being equal to -1
Per-stream stats tracking feature
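For context, here is a minimal, hedged sketch of what an optional streamID argument to a stats-printing routine could look like. The class and member names below are illustrative assumptions, not the exact gpgpu-sim code; -1 is used as the "all streams" sentinel to match the note above.

```cpp
#include <cstdio>
#include <map>

// Illustrative only: a stats container keyed by stream ID, with a print
// routine that filters on an optional streamID (an assumed sketch, not the
// PR's actual cache_stats implementation).
class per_stream_cache_stats {
 public:
  void inc_access(long long streamID) { ++m_accesses[streamID]; }

  // streamID == -1 means "print every stream", which is why the real
  // implementation assumes a valid stream ID never equals -1.
  void print_stats(long long streamID = -1) const {
    for (const auto &entry : m_accesses) {
      if (streamID != -1 && entry.first != streamID) continue;
      std::printf("stream %lld: accesses = %llu\n", entry.first, entry.second);
    }
  }

 private:
  std::map<long long, unsigned long long> m_accesses;
};
```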
@mattsinc

@JRPan let me know if you need something from UW here

@mattsinc

@JRPan just checking in again: do you need anything from us?

@JRPan (author) commented Aug 9, 2024

Sorry for the delay; this will be included in the next major release: accel-sim/accel-sim-framework#317

JRPan requested a review from FJShen on August 13, 2024 at 18:49
@JRPan (author) commented Aug 13, 2024

@FJShen Let's try to get this merged ASAP.

@FJShen commented Aug 14, 2024

I need some more description of this PR. E.g., What feature does it add? What bug does it fix? Does "stream" refer to CUDA stream?

@mattsinc

> I need some more description of this PR. E.g., What feature does it add? What bug does it fix? Does "stream" refer to CUDA stream?

@FJShen: @JRPan took the original commit my students submitted to Accel-Sim and packaged it into this PR. Yes, "stream" means "CUDA stream". We have a full tech report that describes the support we added: https://arxiv.org/abs/2304.11136

Ultimately, what these commits do is add support to Accel-Sim/GPGPU-Sim for tracking statistics at a per-CUDA-stream level.

@JRPan (author) commented Aug 14, 2024

Thank you, Matt!

@FJShen
Basically, an extra dimension is added to all stats. Before, all stats were collected and aggregated together. With this commit, stats are collected on a per-stream basis, and within each stream they are aggregated just as before. If there is only one stream, this behaves exactly as before. With multiple streams and concurrent kernel execution enabled, the old stats model does not make sense because the kernels run concurrently and their stats cannot be separated. This change addresses that. I needed this for my paper as well. Justin's (Shichen's) version here is more robust and better than mine.

For each kernel, a stream ID is always given (0 by default). The stats container is usually a std::map<stream_id, stat>. At the end of each kernel, print_stat prints the stats for that stream ID.

This mainly affects cache stats (L1 and L2) and cycles. Other stats such as occupancy, DRAM, and IPC are unchanged; it does not make sense to separate those per stream.

I correlated GPU ubench and Rodinia, and functionally this is correct. I assigned you as the reviewer to check the C++ side of this :)
Christin used this for his work as well. I worked with him and validated everything.

I did not correlate this with any workload that runs with concurrency enabled, though. Maybe we should write a ubench.
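As a rough illustration of the data structure described above, here is a minimal, hedged sketch of keying aggregated stats by stream ID. The class, fields, and method names are assumptions for illustration, not the PR's actual types.

```cpp
#include <cstdint>
#include <cstdio>
#include <map>

// Per-stream aggregate counters (illustrative fields only).
struct stream_bucket {
  unsigned long long l2_accesses = 0;
  unsigned long long cycles = 0;
};

// Stats keyed by CUDA stream ID; each stream aggregates exactly as the old
// global counters did, just in its own bucket.
class stream_stats {
 public:
  void inc_l2_access(uint64_t stream_id) { ++m_stats[stream_id].l2_accesses; }
  void add_cycles(uint64_t stream_id, unsigned long long n) {
    m_stats[stream_id].cycles += n;
  }

  // Called at kernel exit: print only the bucket for the stream that the
  // finished kernel ran on.
  void print_stat(uint64_t stream_id) const {
    auto it = m_stats.find(stream_id);
    if (it == m_stats.end()) return;
    std::printf("stream %llu: L2 accesses = %llu, cycles = %llu\n",
                static_cast<unsigned long long>(stream_id),
                it->second.l2_accesses, it->second.cycles);
  }

 private:
  std::map<uint64_t, stream_bucket> m_stats;  // stream ID -> aggregated stats
};
```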

@mattsinc

> I did not correlate this with any workload that runs with concurrency enabled, though. Maybe we should write a ubench.

I believe we did correlate with concurrent streams in the report I linked, FWIW.

@FJShen commented Aug 15, 2024

I found some possibly erroneous code (and some sub-optimal coding practices). Please address the review comments.

@JRPan (author) commented Aug 15, 2024

I only see one comment related to the config file. Is this correct?

@FJShen commented Aug 15, 2024

@JRPan I forgot to "submit" the reviews. I hope they are now visible.

@FJShen left a comment

Looks good to me.

JRPan added this pull request to the merge queue on Aug 21, 2024
Merged via the queue into dev with commit 38b4df5 on Aug 21, 2024
22 checks passed
JRPan added a commit to JRPan/gpgpu-sim_distribution that referenced this pull request Sep 30, 2024
* Temp commit for Justin and Cassie to sync on
code changes for adding per-stream stats.

* Resolved compile errors.

* Removed redundant parameter

* Passed cuda_stream_id from accelsim to gpgpusim

* Cleaned up unused changes

* Changed vector to map, having operator problems.

* StreamID defaults to zero

* Implemented streams to inc_stats and so on

* Fixed TOTAL_ACCESS counts

* Implemented GLOBAL_TIMER.

* Fixed m_shader->get_kernel SEGFAULT issue in shader.cc.

* Use warp_init to track streamID instead of issue_warp

* Removed temp debug print

* Modified cache_stats to only print data from latest finished stream

Added an optional streamID argument to cache_stats::print_stats, cache_stats::print_fail_stats, and their upstream functions. When a streamID is specified, stats from that stream are printed; when it is not specified, all stats are printed.

NOTE: the current implementation depends on a stream ID never being equal to -1

* Removed default arg values of streamID

* modified constructor of mem_fetch to pass in streamID

* changed get_streamid to get_streamID

* Added TODO to gpgpusim_entrypoint.cc and power_stat.cc

* Only collect power stats when enabled

* print last finished stream in PTX mode using last_streamID

* take out additional printf

* Add a field to baseline cache to indicate cache level

* save gpu object in cache

* Print stream ID only once per kernel

* rm test print

* use -1 for default stream id

* cleanup debug prints

* remove GLOBAL_TIMER

* Automated clang-format

* Should be correct to print everything in power model

* addressing concerns & errors

* Automated clang-format

* add m_stats_pw in operator+

* Automated Format

---------

Co-authored-by: Justin Qiao <[email protected]>
Co-authored-by: Justin Qiao <[email protected]>
Co-authored-by: Tim Rogers <[email protected]>
Co-authored-by: JRPan <[email protected]>
Co-authored-by: purdue-jenkins <[email protected]>
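Several of the commits above touch how per-stream stats are combined ("Changed vector to map, having operator problems", "add m_stats_pw in operator+"). As a hedged illustration only, merging two stream-keyed stat maps in an addition operator might look roughly like the following; the struct and field names are assumptions, not the repository's actual classes.

```cpp
#include <map>

// Illustrative sketch: both the cumulative and the per-window ("pw") maps
// are keyed by stream ID and must each be merged when stats are summed.
struct per_stream_counters {
  std::map<unsigned long long, unsigned long long> m_stats;     // cumulative
  std::map<unsigned long long, unsigned long long> m_stats_pw;  // per-window

  per_stream_counters operator+(const per_stream_counters &rhs) const {
    per_stream_counters out = *this;
    for (const auto &kv : rhs.m_stats) out.m_stats[kv.first] += kv.second;
    // Forgetting this second loop would silently drop the per-window
    // counters whenever stats objects are added together.
    for (const auto &kv : rhs.m_stats_pw) out.m_stats_pw[kv.first] += kv.second;
    return out;
  }
};
```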