Revised Unified Virtual Memory (UVM) support #107

yechen3 · 2025-03-07T11:03:24Z

Same as PR #85, but with a cleaner commit history.

- Full functional implementation; performance/timing simulation pending
- CUDA APIs for cudaMallocManaged, cudaMemPrefetchAsync, and cudaDeviceSynchronize
- Data structures to maintain the mapping between CPU-side and GPU-side memory during cudaMallocManaged
- Track the mapping during cudaSetupArgument to override the CPU memory pointer with the GPU-side memory pointer
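The pointer bookkeeping named in the last two bullets could look roughly like the following. This is a minimal C++ sketch under stated assumptions, not the code in this PR: `ManagedAllocation`, `ManagedMap`, `record`, and `translate` are hypothetical names, addresses are modeled as plain integers, and translating interior pointers by offset is an assumption that the description does not confirm.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

// Hypothetical record for one cudaMallocManaged allocation.
struct ManagedAllocation {
    std::uint64_t gpu_base;  // device-side address backing the allocation
    std::size_t size;        // allocation size in bytes
};

// Hypothetical host-to-device address map, keyed by host base address.
class ManagedMap {
public:
    // Remember that the host range [cpu_base, cpu_base + size) is backed
    // by device memory starting at gpu_base.
    void record(std::uint64_t cpu_base, std::uint64_t gpu_base, std::size_t size) {
        assert(size > 0);
        allocs_[cpu_base] = ManagedAllocation{gpu_base, size};
    }

    // Return the device address for a host address that falls inside a
    // managed allocation; return the input unchanged otherwise.
    std::uint64_t translate(std::uint64_t cpu_addr) const {
        auto it = allocs_.upper_bound(cpu_addr);  // first base > cpu_addr
        if (it == allocs_.begin()) return cpu_addr;
        --it;                                     // greatest base <= cpu_addr
        const std::uint64_t offset = cpu_addr - it->first;
        if (offset >= it->second.size) return cpu_addr;
        return it->second.gpu_base + offset;
    }

private:
    std::map<std::uint64_t, ManagedAllocation> allocs_;
};
```

In this sketch, a kernel-argument hook would call `translate` on each pointer-sized argument, so unmanaged pointers pass through untouched.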

tgrogers

@yechen3 - one weird side-effect of the integration changes is that you ended up re-adding a bunch of files that have since been deleted: Jenkinsfile, bitbucket*, and all the old configs in the old folders outside of tested and deprecated.

Please clean this up. Make sure you are only adding files that are actually needed for this to work. Otherwise, the directory structure should be the same as the mainline dev.

yechen3 · 2025-03-17T23:56:51Z

@tgrogers Got it. I've already cleaned things up, so everything should be in order now. Let me know if there's anything else that needs attention.

DebashisGanguly and others added 30 commits February 13, 2018 20:54

Initial commit

cc44bee

Starting point is the dev branch of original gpgpu-sim distribution.

2e65599

Making GPGPU-Sim work with CUDA 8.0

afe053a

Adding benchmark to add two vectors

17e99e5

- Same benchmark with cudaMalloc and cudaMemcpy, i.e., old APIs
- Same benchmark with cudaMallocManaged, i.e., new UVM API

Restructuring benchmark

b3355c2

Adding a few mini benchmarks from the Rodinia suite, need to add them to Managed set

f5f77d4

Adding input data files for the benchmarks

7d14fcd

backprop, kmeans and pathfinder are missing from Managed

161107a

Timing Simulation: Part 1

f708eb3

- Adding PCI-e latency as part of config (value subject to change based on architecture)
- Parsing logic of latency and setting clock domain
- Adding valid flag as part of page table implementation

Revert "Timing Simulation: Part 1"

84ab23e

This reverts commit f708eb3.

Syncing with dev branch from Dec 8, 2017 to today - on behalf of Ziyu Zhang

02bb217

Basic methods to implement page table

d03c23a

Differentiate managed and unmanaged page allocations and also enforce gddr size constraint

fa4babb

Timing simulation for on-demand paging for UVM

373e5b2

- TLB lookup
- Page table access/walk latency
- Multi-lane bidirectional PCI-E with far fetch latency
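One way to read the three latency components in this commit is as an additive cost per memory access. The sketch below is a simplification under stated assumptions, not the simulator's timing model: `UvmLatencyConfig`, `PageState`, and `translation_latency` are hypothetical names, all cycle counts are illustrative, and the multi-lane PCI-E queuing is collapsed into a single fixed far-fetch cost.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical latency parameters, all in simulator cycles.
struct UvmLatencyConfig {
    std::uint64_t tlb_lookup;  // cost of probing the TLB
    std::uint64_t page_walk;   // cost of a page table access/walk
    std::uint64_t far_fetch;   // cost of fetching one page over PCI-E
};

// Where the page was found for a given access.
enum class PageState { TlbHit, TlbMissResident, TlbMissNotResident };

// Sum the components an access pays, depending on where the page was found.
std::uint64_t translation_latency(const UvmLatencyConfig& cfg, PageState state) {
    switch (state) {
        case PageState::TlbHit:
            return cfg.tlb_lookup;
        case PageState::TlbMissResident:
            return cfg.tlb_lookup + cfg.page_walk;
        case PageState::TlbMissNotResident:
            return cfg.tlb_lookup + cfg.page_walk + cfg.far_fetch;
    }
    assert(false && "unhandled PageState");
    return 0;
}
```

The far-fetch term dominates whenever a page is not resident, which is why the later commits focus on eviction policy and on skipping idle cycles during transfers.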

Page eviction with LRU implementation

6115ba2

TLB size restriction and LRU replacement of TLB entries

b09ac52
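The size-restricted TLB with LRU replacement, and the LRU page eviction in the commit before it, follow a standard pattern that can be sketched as below. This is an illustrative C++ version, not the data structure used in this PR; `LruTlb`, `access`, and `contains` are hypothetical names, and entries are reduced to bare page numbers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <list>
#include <unordered_map>

// Fixed-capacity set of page numbers with least-recently-used replacement.
class LruTlb {
public:
    explicit LruTlb(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Returns true on a hit. On a miss the page is inserted, evicting the
    // least recently used entry first if the TLB is full.
    bool access(std::uint64_t page) {
        auto it = index_.find(page);
        if (it != index_.end()) {
            // Hit: move the entry to the most-recently-used position.
            order_.splice(order_.begin(), order_, it->second);
            return true;
        }
        if (order_.size() == capacity_) {
            index_.erase(order_.back());  // drop the LRU entry
            order_.pop_back();
        }
        order_.push_front(page);
        index_[page] = order_.begin();
        return false;
    }

    bool contains(std::uint64_t page) const { return index_.count(page) != 0; }

private:
    std::size_t capacity_;
    std::list<std::uint64_t> order_;  // front = most recently used
    std::unordered_map<std::uint64_t, std::list<std::uint64_t>::iterator> index_;
};
```

The list plus hash map combination keeps both lookup and replacement at constant time per access.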

Bug Fix: Unmanaged benchmarks stall forever

1c86565

- Don't process if warp instruction's access queue is empty

Bug Fix: TLB invalidation was not registered

8bd2a73

Benchmark changes:

392fef0

- Change input size for bfs, nw, pathfinder, srad_v2 etc. to finish in reasonable time
- Add pathfinder benchmark for managed code
- Fixing BFS managed code

Updating managed allocation to hold gpu memory pointer, allocation size, and a boolean flag to denote whether data is copied from cpu to gpu on first kernel launch

7f09ae7

Copy data initialized by CPU to GPU only at the first kernel launch

0f4af65

Writing back the data of a dirty page to the host upon eviction from device memory

7fc5791

Bug fixes and enabling output print in benchmarks

65fd105

During device synchronization copy only dirty pages for all managed allocations from GPU to CPU

c65b185
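The dirty-page handling described by the two write-back commits above (on eviction and on device synchronization) can be sketched with a per-page dirty bit. This is a hedged illustration rather than the PR's implementation: `PageEntry`, `DirtyTracker`, and its methods are hypothetical names, and the actual data copy is reduced to reporting which pages would need to be copied.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical per-page state.
struct PageEntry {
    bool valid = false;  // page is resident in device memory
    bool dirty = false;  // page was written on the device since the last copy-back
};

class DirtyTracker {
public:
    // Mark a page as resident on the device.
    void map_page(std::uint64_t page) { pages_[page].valid = true; }

    // Record a device-side write to a resident page.
    void write(std::uint64_t page) {
        auto it = pages_.find(page);
        assert(it != pages_.end() && it->second.valid);
        it->second.dirty = true;
    }

    // Device synchronize: list the dirty pages to copy back to the host,
    // in ascending page order, and clear their dirty bits.
    std::vector<std::uint64_t> synchronize() {
        std::vector<std::uint64_t> to_copy;
        for (auto& kv : pages_) {
            if (kv.second.valid && kv.second.dirty) {
                to_copy.push_back(kv.first);
                kv.second.dirty = false;
            }
        }
        return to_copy;
    }

    // Eviction: remove the page and report whether it needs a write-back.
    bool evict(std::uint64_t page) {
        auto it = pages_.find(page);
        if (it == pages_.end()) return false;
        const bool needs_write_back = it->second.valid && it->second.dirty;
        pages_.erase(it);
        return needs_write_back;
    }

private:
    std::map<std::uint64_t, PageEntry> pages_;
};
```

Clean pages cost nothing at synchronization in this scheme, which is the saving the commit title points at.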

Jump the simulator clock to the future when requests waiting in PCI-E queue are ready to complete. Determine based on whether dispatched warps from all SMs are stalling for PCI-E transfer and no progress can be made by any component of the simulator.

50ba173
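The clock jump described in this commit amounts to skipping idle cycles when nothing but a PCI-E transfer can make progress. A minimal sketch, assuming the stall condition has already been computed elsewhere: `next_cycle`, `all_stalled_on_pcie`, and `pcie_ready_cycles` are hypothetical names, not identifiers from the simulator.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Decide the next simulated cycle.
//   now                 - current cycle
//   all_stalled_on_pcie - true if every dispatched warp is waiting on a PCI-E
//                         transfer and no other component can make progress
//   pcie_ready_cycles   - completion cycles of the queued PCI-E requests
std::uint64_t next_cycle(std::uint64_t now,
                         bool all_stalled_on_pcie,
                         const std::vector<std::uint64_t>& pcie_ready_cycles) {
    // Normal case: advance one cycle at a time.
    if (!all_stalled_on_pcie || pcie_ready_cycles.empty()) return now + 1;

    // Idle case: jump straight to the earliest PCI-E completion.
    const std::uint64_t earliest =
        *std::min_element(pcie_ready_cycles.begin(), pcie_ready_cycles.end());
    const std::uint64_t next = std::max(earliest, now + 1);
    assert(next > now);  // the clock must always move forward
    return next;
}
```

Jumping only when every component is idle keeps the simulated timing identical to cycle-by-cycle stepping while cutting wall-clock time during long transfers.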

Bug Fix: while copying back dirty pages from GPU to CPU on page eviction write back or device synchronize, check for allocations which were copied in the first place during kernel launch from CPU to GPU

12d9ad0

Managed backprop benchmark

cdc3b88

Bug Fix: jump cycle code

fbbd3f9

Removing kmeans benchmark as it uses texture memory and cannot be translated to managed code

5c256c5

DebashisGanguly and others added 21 commits June 15, 2020 19:00

Adding 3 new managed benchmarks

3a69c3d

Data structures and primitives for access pattern detection

fd6b2a2

- kernel by kernel, data structure wise basic block access
- need to implement actual detection logic and policy engine

Complete implementation of pattern detection

e0a9af5

- remaining policy engine

Adding working set details for new benchmarks

985f920

Removing print statement and test code

2b36e3d

Removing unnecessary assert

967dd1b

Enabling policy making and adaptive memory management

32a5e88

Fixing minor index error with srad benchmark

2a4a8d0

Adding results (logs and extracted spreadsheet)

5601037

- from experiments on smart adaptive runtime with dynamic migration pattern detection and memory management
- along with validation results

Updating paper reference

cfd6f10

Misc changes: 1. update config, 2. removal of validation results

b80fb10

Minor fix to run on aalp cluster

778dbff

Formatting

25561e9

Remove content of lib

448ddfb

Updating README with initial DATE'21 acceptance notification

bafab5d

Fixed include sequence to run on cluster

698973a

Merge branch 'DebashisGanguly:master' into master

5f358e1

A big commit that works with latest gpgpu-sim 4.0

6b30568

Remove unnecessary benchmark folder

76a82f0

Merge remote-tracking branch 'uvmsmart/master' into merge-uvmsmart

b77cd21

Fix some bugs after merge

863b2d9

yechen3 requested a review from tgrogers March 7, 2025 11:03

Resolve merge conflicts in configs folder

1a0c1e7

tgrogers requested changes Mar 16, 2025

Remove stale files

70f0c29

yechen3 added 3 commits April 23, 2025 15:55

Change to UVA mode (always hit in TLB)

81e4d21

Fix the circular dependency issue

0500eab

Merge branch 'dev' into uvmsmart

514c71b

yechen3 requested a review from tgrogers April 23, 2025 20:25
