
Stream stats #71


Merged
merged 39 commits into dev on Aug 21, 2024

Conversation

@JRPan commented May 14, 2024

Collect stats on a per-stream basis.

99% ready. Creating a PR to trigger GitHub Actions to generate correlations.

Consult @JRPan before merging.

ShichenQiao and others added 25 commits October 12, 2022 23:30
code changes for adding per-stream stats.
Added an optional streamID argument to cache_stats::print_stats, cache_stats::print_fail_stats, and their upstream functions. When a streamID is specified, stats from that stream are printed; when it is not specified, all stats are printed (see the sketch below).

NOTE: the current implementation depends on a stream ID never being equal to -1
Per-stream stats tracking feature
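For context, here is a minimal, hedged sketch of what an optional streamID argument to a stats-printing routine could look like. The class and member names below are illustrative assumptions, not the exact gpgpu-sim code; -1 is used as the "all streams" sentinel to match the note above.

```cpp
#include <cstdio>
#include <map>

// Illustrative only: a stats container keyed by stream ID, with a print
// routine that filters on an optional streamID (an assumed sketch, not the
// PR's actual cache_stats implementation).
class per_stream_cache_stats {
 public:
  void inc_access(long long streamID) { ++m_accesses[streamID]; }

  // streamID == -1 means "print every stream", which is why the real
  // implementation assumes a valid stream ID never equals -1.
  void print_stats(long long streamID = -1) const {
    for (const auto &entry : m_accesses) {
      if (streamID != -1 && entry.first != streamID) continue;
      std::printf("stream %lld: accesses = %llu\n", entry.first, entry.second);
    }
  }

 private:
  std::map<long long, unsigned long long> m_accesses;
};
```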
@mattsinc

@JRPan let me know if you need something from UW here

@mattsinc

@JRPan just checking in again: do you need anything from us?

@JRPan (author) commented Aug 9, 2024

Sorry for the delay; this will be included in the next major release: accel-sim/accel-sim-framework#317

JRPan requested a review from FJShen on August 13, 2024 at 18:49
@JRPan (author) commented Aug 13, 2024

@FJShen Let's try to get this merged ASAP.

@FJShen commented Aug 14, 2024

I need some more description of this PR. E.g., What feature does it add? What bug does it fix? Does "stream" refer to CUDA stream?

@mattsinc

> I need some more description of this PR. E.g., What feature does it add? What bug does it fix? Does "stream" refer to CUDA stream?

@FJShen: @JRPan took the original commit my students submitted to Accel-Sim and packaged it into this PR. Yes, "stream" means "CUDA stream". We have a full tech report that describes the support we added: https://arxiv.org/abs/2304.11136

Ultimately, what these commits do is add support to Accel-Sim/GPGPU-Sim for tracking statistics at a per-CUDA-stream level.

@JRPan (author) commented Aug 14, 2024

Thank you, Matt!

@FJShen
Basically, an extra dimension is added to all stats. Before, all stats were collected and aggregated together. With this commit, stats are collected on a per-stream basis, and within each stream they are aggregated just as before. If there is only one stream, this behaves exactly as before. With multiple streams and concurrent kernel execution enabled, the old stats model does not make sense because the kernels run concurrently and their stats cannot be separated. This change addresses that. I needed this for my paper as well. Justin's (Shichen's) version here is more robust and better than mine.

For each kernel, a stream ID is always given (0 by default). The stats container is usually a std::map<stream_id, stat>. At the end of each kernel, print_stat prints the stats for that stream ID.

This mainly affects cache stats (L1 and L2) and cycles. Other stats such as occupancy, DRAM, and IPC are unchanged; it does not make sense to separate those per stream.

I correlated GPU ubench and Rodinia, and functionally this is correct. I assigned you as the reviewer to check the C++ side of this :)
Christin used this for his work as well. I worked with him and validated everything.

I did not correlate this with any workload that runs with concurrency enabled, though. Maybe we should write a ubench.
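As a rough illustration of the data structure described above, here is a minimal, hedged sketch of keying aggregated stats by stream ID. The class, fields, and method names are assumptions for illustration, not the PR's actual types.

```cpp
#include <cstdint>
#include <cstdio>
#include <map>

// Per-stream aggregate counters (illustrative fields only).
struct stream_bucket {
  unsigned long long l2_accesses = 0;
  unsigned long long cycles = 0;
};

// Stats keyed by CUDA stream ID; each stream aggregates exactly as the old
// global counters did, just in its own bucket.
class stream_stats {
 public:
  void inc_l2_access(uint64_t stream_id) { ++m_stats[stream_id].l2_accesses; }
  void add_cycles(uint64_t stream_id, unsigned long long n) {
    m_stats[stream_id].cycles += n;
  }

  // Called at kernel exit: print only the bucket for the stream that the
  // finished kernel ran on.
  void print_stat(uint64_t stream_id) const {
    auto it = m_stats.find(stream_id);
    if (it == m_stats.end()) return;
    std::printf("stream %llu: L2 accesses = %llu, cycles = %llu\n",
                static_cast<unsigned long long>(stream_id),
                it->second.l2_accesses, it->second.cycles);
  }

 private:
  std::map<uint64_t, stream_bucket> m_stats;  // stream ID -> aggregated stats
};
```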

@mattsinc

> I did not correlate this with any workload that runs with concurrency enabled, though. Maybe we should write a ubench.

I believe we did correlate with concurrent streams in the report I linked, FWIW.

@FJShen commented Aug 15, 2024

I found some possibly erroneous code (and some sub-optimal coding practices). Please address the review comments.

@JRPan (author) commented Aug 15, 2024

I only see one comment related to the config file. Is this correct?

@FJShen commented Aug 15, 2024

@JRPan I forgot to "submit" the reviews. I hope they are now visible.

@FJShen left a comment

Looks good to me.

JRPan added this pull request to the merge queue on Aug 21, 2024
Merged via the queue into dev with commit 38b4df5 on Aug 21, 2024
22 checks passed
JRPan added a commit to JRPan/gpgpu-sim_distribution that referenced this pull request Sep 30, 2024
* Temp commit for Justin and Cassie to sync on
code changes for adding per-stream stats.

* Resolved compile errors.

* Removed redundant parameter

* Passed cuda_stream_id from accelsim to gpgpusim

* Cleaned up unused changes

* Changed vector to map, having operator problems.

* StreamID defaults to zero

* Implemented streams to inc_stats and so on

* Fixed TOTAL_ACCESS counts

* Implemented GLOBAL_TIMER.

* Fixed m_shader->get_kernel SEGFAULT issue in shader.cc.

* Use warp_init to track streamID instead of issue_warp

* Removed temp debug print

* Modified cache_stats to only print data from latest finished stream

Added an optional streamID argument to cache_stats::print_stats, cache_stats::print_fail_stats, and their upstream functions. When a streamID is specified, stats from that stream are printed; when it is not specified, all stats are printed.

NOTE: the current implementation depends on a stream ID never being equal to -1

* Removed default arg values of streamID

* modified constructor of mem_fetch to pass in streamID

* changed get_streamid to get_streamID

* Added TODO to gpgpusim_entrypoint.cc and power_stat.cc

* Only collect power stats when enabled

* print last finished stream in PTX mode using last_streamID

* take out additional printf

* Add a field to baseline cache to indicate cache level

* save gpu object in cache

* Print stream ID only once per kernel

* rm test print

* use -1 for default stream id

* cleanup debug prints

* remove GLOBAL_TIMER

* Automated clang-format

* Should be correct to print everything in power model

* addressing concerns & errors

* Automated clang-format

* add m_stats_pw in operator+

* Automated Format

---------

Co-authored-by: Justin Qiao <[email protected]>
Co-authored-by: Justin Qiao <[email protected]>
Co-authored-by: Tim Rogers <[email protected]>
Co-authored-by: JRPan <[email protected]>
Co-authored-by: purdue-jenkins <[email protected]>
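Several of the commits above touch how per-stream stats are combined ("Changed vector to map, having operator problems", "add m_stats_pw in operator+"). As a hedged illustration only, merging two stream-keyed stat maps in an addition operator might look roughly like the following; the struct and field names are assumptions, not the repository's actual classes.

```cpp
#include <map>

// Illustrative sketch: both the cumulative and the per-window ("pw") maps
// are keyed by stream ID and must each be merged when stats are summed.
struct per_stream_counters {
  std::map<unsigned long long, unsigned long long> m_stats;     // cumulative
  std::map<unsigned long long, unsigned long long> m_stats_pw;  // per-window

  per_stream_counters operator+(const per_stream_counters &rhs) const {
    per_stream_counters out = *this;
    for (const auto &kv : rhs.m_stats) out.m_stats[kv.first] += kv.second;
    // Forgetting this second loop would silently drop the per-window
    // counters whenever stats objects are added together.
    for (const auto &kv : rhs.m_stats_pw) out.m_stats_pw[kv.first] += kv.second;
    return out;
  }
};
```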