Fixes for atomic coalescing at L1, correlated with QV100 hardware #33
base: dev
Sub core & some minor bug fix
- best case coalescing of atomic operations
- full CAM based search
- integrated with DPRINTF with ATOMICS Flag
- replaced full CAM coalescing with common case coalescing
- correlated with QV100 GPU
- added ATOMICS_DETAIL trace flag
- made ATOMICS prints concise
- disabled tracing and restored default trace flags in QV100 tested-cfgs
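As a rough illustration of what a "full CAM based search" for atomic coalescing could look like, the sketch below compares each active lane's address against every transaction already formed for the warp and merges lanes that fall in the same aligned segment. This is not the code from this PR: the type `AtomicTransaction`, the function `coalesce_atomics_full_search`, the 32-lane warp, and the power-of-two segment size are all assumptions made for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record for one coalesced atomic transaction
// (illustrative only, not a GPGPU-Sim type).
struct AtomicTransaction {
    std::uint64_t block_addr;  // lane address aligned down to the segment size
    std::uint32_t lane_mask;   // bit i set => lane i is part of this transaction
};

// Groups the active lanes of one warp by the aligned segment they touch.
// Every lane is compared against all existing entries, which is the
// software analogue of a fully associative (CAM-style) lookup.
// Assumptions: at most 32 lanes, segment_bytes is a power of two.
inline std::vector<AtomicTransaction> coalesce_atomics_full_search(
    const std::vector<std::uint64_t>& lane_addrs,
    std::uint32_t active_mask,
    std::uint64_t segment_bytes) {
    std::vector<AtomicTransaction> transactions;
    for (std::size_t lane = 0; lane < lane_addrs.size() && lane < 32; ++lane) {
        const std::uint32_t lane_bit = 1u << lane;
        if ((active_mask & lane_bit) == 0) continue;  // skip inactive lanes

        const std::uint64_t block = lane_addrs[lane] & ~(segment_bytes - 1);

        // Search every transaction formed so far for a matching segment.
        bool merged = false;
        for (AtomicTransaction& t : transactions) {
            if (t.block_addr == block) {
                t.lane_mask |= lane_bit;
                merged = true;
                break;
            }
        }
        if (!merged) transactions.push_back({block, lane_bit});
    }
    return transactions;
}
```

The search is quadratic in the number of distinct segments per warp, which may be one reason to prefer a cheaper scheme when most warps touch only a few segments. How the hardware serializes lanes that hit the same address inside one segment is not modeled here.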
Correlation of the atomic ubenches does not look significantly better than on the latest (as of this review) dev branch of GPGPU-sim; atomic_add_bw_diverge is still off by a lot. The code makes sense, though.
Still waiting for the feedback of the other reviewers, and the original author.
This has been lingering a long time.
@abhaumick, @cesar-avalos3 can you update me on the current state here?
Several years ago, when we were checking the accuracy of the atomics, we had a set of uBenches that tested latency and BW when warps were updating both the same address and different addresses. From what I remember, the results were a disaster.
@abhaumick spent a non-trivial amount of time trying to fix this, and I think this PR is the result of that time.
Do we have uBenches in the regressions that test the atomics? I certainly remember a long email chain about our Volta atomic correlations.
I think this code was a good fix.
- fixed atomic coalescing at L1: warp_inst_t::memory_coalescing_arch_atomic()
- modified trace.h: DPRINTF_RAW() to allow prints without the gpu_sim_cycle or gpu_tot_sim_cycle variables used by DPRINTF
- added config option gpgpu_shmem_atomic_warp_parts
- added trace streams
- resolves mismatch reported in Meeting Minutes -- 3/20/20
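The point of DPRINTF_RAW() in the description above is that a trace print no longer has to reference the cycle counters, so it can be used where gpu_sim_cycle and gpu_tot_sim_cycle are not in scope. The sketch below shows that distinction with stand-in names only: `SKETCH_DPRINTF`, `SKETCH_DPRINTF_RAW`, `sketch_emit`, the two counters, and the prefix layout are all hypothetical and are not copied from trace.h. Stream-enable checks are omitted.

```cpp
#include <cstdarg>
#include <cstdio>
#include <string>

// Hypothetical stand-ins for the simulator's global cycle counters.
static unsigned long long sketch_sim_cycle = 0;
static unsigned long long sketch_tot_sim_cycle = 0;

// Trace output is collected in memory so the sketch is easy to inspect.
static std::string sketch_trace_log;

// printf-style helper that appends one formatted message to the log.
inline void sketch_emit(const char* fmt, ...) {
    char buf[256];
    va_list args;
    va_start(args, fmt);
    std::vsnprintf(buf, sizeof(buf), fmt, args);
    va_end(args);
    sketch_trace_log += buf;
}

// Prefixed form: reads the cycle counters, so they must be visible
// wherever the macro is expanded. The prefix layout is an assumption.
#define SKETCH_DPRINTF(stream, ...)                               \
    do {                                                          \
        sketch_emit("%11llu - %s: ",                              \
                    sketch_sim_cycle + sketch_tot_sim_cycle,      \
                    stream);                                      \
        sketch_emit(__VA_ARGS__);                                 \
    } while (0)

// Raw form: emits only the message and touches no cycle counters,
// so it also works in code that cannot see them.
#define SKETCH_DPRINTF_RAW(...)   \
    do {                          \
        sketch_emit(__VA_ARGS__); \
    } while (0)
```

A caller might use the prefixed form for the first line of a multi-line dump and the raw form for continuation lines, so that the cycle stamp appears once per event.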