[FSU] Inference FSU with Shared memory #2969

Open
wants to merge 8 commits into base: main

Conversation

DonghakPark
Member

[FSU] Inference FSU with Shared memory

To reduce memory usage during inference by utilizing FSU, and to minimize speed degradation by loading weights during forwarding, this PR changes FSU to use shared memory. It also ensures that the existing swap in training mode still works as before.
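As a rough illustration of the shared-memory approach, here is a minimal sketch assuming a POSIX mmap of the weight bin file; the struct and function names below are hypothetical and are not nntrainer's actual API:

```cpp
// Illustrative only: map a weight region of the weight bin file into memory
// instead of reading it into a separately allocated buffer.
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

#include <cstddef>
#include <stdexcept>

struct MappedWeight {
  void *base;    // page-aligned address returned by mmap (pass to munmap)
  void *data;    // start of the requested weight inside the mapping
  size_t length; // total mapped length
};

MappedWeight mapWeightRegion(const char *weight_bin_path, off_t offset,
                             size_t size) {
  int fd = open(weight_bin_path, O_RDONLY);
  if (fd < 0)
    throw std::runtime_error("failed to open weight bin");

  // mmap needs a page-aligned file offset, so align down and keep the
  // in-page remainder to locate the actual weight data.
  const off_t page = sysconf(_SC_PAGESIZE);
  const off_t aligned = offset - (offset % page);
  const size_t pad = static_cast<size_t>(offset - aligned);

  void *base = mmap(nullptr, size + pad, PROT_READ, MAP_SHARED, fd, aligned);
  close(fd); // the mapping stays valid after the fd is closed
  if (base == MAP_FAILED)
    throw std::runtime_error("mmap failed");

  return {base, static_cast<char *>(base) + pad, size + pad};
}

void unmapWeightRegion(const MappedWeight &w) { munmap(w.base, w.length); }
```

In this scheme only inference takes the mapping route; training keeps the existing swap path, as described in the commits below.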

Commit 1 : [FSU] Update FSU Forwarding (Load) Logic

  • Change FSU forwarding logic (load weights with look-ahead)

Commit 2 : [FSU] Update swap device & cache element

  • Update the swap device's functions to support FSU (inference)

Commit 3 : [FSU] Update FSU mem allocate Logic

  • Update Memory Allocation to Shared Mem

Commit 4 : [FSU] add FSU file offset info

  • Add the weight bin file offset so that it can be passed to the swap device

Commit 5 : [FSU] Apply Shared Mem & FSU

  • Update Logic to support both Inference Mode & Training Mode

This PR includes #2957, #2927, and #2949, so I will close the previous PRs.

DonghakPark and others added 4 commits February 25, 2025 15:35
Update FSU forwarding logic
- FSU will handle look-ahead tensors inside the pool,
- so we don't need to call Loadtensor for f + i (a rough sketch follows below)

**Self evaluation:**
1. Build test:	 [X]Passed [ ]Failed [ ]Skipped
2. Run test:	 [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Donghak PARK <[email protected]>
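Hedged sketch of the look-ahead behaviour described in this commit (the class and callback names are assumptions, not the actual cache-pool interface): when layer `i` runs forward, the pool makes sure the weights for layers `i` through `i + lookahead` are resident, so forwarding no longer issues a separate load call per look-ahead tensor.

```cpp
// Illustrative only: a pool that keeps the next `lookahead` layers' weights
// loaded while forwarding. Names do not map to nntrainer classes.
#include <algorithm>
#include <cstddef>
#include <functional>

class LookaheadPool {
public:
  LookaheadPool(size_t num_layers, size_t lookahead,
                std::function<void(size_t)> load,
                std::function<void(size_t)> unload) :
    num_layers(num_layers), lookahead(lookahead), load(std::move(load)),
    unload(std::move(unload)) {}

  // Called once per layer during forwarding.
  void onForward(size_t layer) {
    const size_t last = std::min(layer + lookahead, num_layers - 1);
    for (size_t i = layer; i <= last; ++i)
      load(i); // expected to be a no-op when the weights are already resident
    if (layer > 0)
      unload(layer - 1); // the previous layer's weights are no longer needed
  }

private:
  size_t num_layers, lookahead;
  std::function<void(size_t)> load, unload;
};
```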
Add memory ptr for allocating shared mem
- add mem_ptr
- add an unmap array for managing unmapped ptrs (sketched below)

**Self evaluation:**
1. Build test:	 [X]Passed [ ]Failed [ ]Skipped
2. Run test:	 [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Donghak PARK <[email protected]>
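A minimal sketch of the kind of bookkeeping this commit describes: remembering each mapped pointer so it is unmapped exactly once (the class, member, and method names are assumptions, not the actual swap-device members).

```cpp
// Illustrative only: track mapped pointers and their lengths so a pointer
// is never munmap'ed twice.
#include <sys/mman.h>

#include <cstddef>
#include <unordered_map>

class SharedMemTracker {
public:
  // Record a mapping created elsewhere (e.g. by mmap).
  void remember(void *mem_ptr, size_t len) { mapped[mem_ptr] = len; }

  // Unmap if still mapped; later calls with the same pointer are no-ops.
  void release(void *mem_ptr) {
    auto it = mapped.find(mem_ptr);
    if (it == mapped.end())
      return; // already unmapped or never tracked
    munmap(it->first, it->second);
    mapped.erase(it);
  }

private:
  std::unordered_map<void *, size_t> mapped; // mem_ptr -> mapped length
};
```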
I have changed the method from using dynamic memory allocation to using static memory allocation.
In order to prevent multiple frees, I added a map to check whether the mem_address has already been processed. Previously, memory was allocated through buf, but now it is being allocated directly.

**Self evaluation:**
1. Build test:	 [X]Passed [ ]Failed [ ]Skipped
2. Run test:	 [X]Passed [ ]Failed [ ]Skipped

Co-authored-by: jijoong.moon <[email protected]>
Signed-off-by: Donghak PARK <[email protected]>
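The "map to check whether the mem_address has already been processed" mentioned above could look roughly like this sketch, under the assumption that memory is released with free(); the names are hypothetical, not the actual pool implementation.

```cpp
// Illustrative only: free an address at most once by tracking processed
// addresses in a map.
#include <cstdlib>
#include <mutex>
#include <unordered_map>

class FreeGuard {
public:
  void markAllocated(void *addr) {
    std::lock_guard<std::mutex> lock(mtx);
    processed[addr] = false; // allocated, not yet freed
  }

  // Frees addr only the first time; repeated calls are ignored.
  void freeOnce(void *addr) {
    std::lock_guard<std::mutex> lock(mtx);
    auto it = processed.find(addr);
    if (it == processed.end() || it->second)
      return; // unknown address or already freed
    std::free(addr);
    it->second = true;
  }

private:
  std::mutex mtx;
  std::unordered_map<void *, bool> processed; // addr -> freed?
};
```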
Make neuralnet pass the path and the weight offset (file offset) to the swap_device,
so that the weight file's offsets can be calculated (a rough sketch follows below).

**Self evaluation:**
1. Build test:	 [X]Passed [ ]Failed [ ]Skipped
2. Run test:	 [X]Passed [ ]Failed [ ]Skipped

Co-authored-by: hyeonseok <[email protected]>
Signed-off-by: Donghak PARK <[email protected]>
Apply Shared mem & FSU
- in inference mode: read from the weight bin (at the weight offset)
- in training mode: same logic as swap (see the sketch below)

**Self evaluation:**
1. Build test:	 [X]Passed [ ]Failed [ ]Skipped
2. Run test:	 [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Donghak PARK <[email protected]>
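Hedged sketch of the mode split this commit describes (the enum and loader callbacks are illustrative stand-ins): inference maps the weight straight out of the weight bin at its recorded offset, while training keeps the existing swap behaviour.

```cpp
// Illustrative only: choose between the weight-bin mapping path (inference)
// and the existing swap path (training).
#include <cstddef>
#include <functional>

enum class ExecMode { INFERENCE, TRAIN };

using Loader = std::function<void *(size_t /*weight index*/)>;

void *getWeightMemory(ExecMode mode, size_t idx,
                      const Loader &map_from_weight_bin, // e.g. via mmap
                      const Loader &read_from_swap) {    // existing swap path
  if (mode == ExecMode::INFERENCE)
    return map_from_weight_bin(idx); // shared, read-only mapping; no copy
  return read_from_swap(idx);        // training: unchanged swap logic
}
```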
Fix unittest failure in the training-case swap
- There was an issue in PutBuffer where the ptr could not be freed

**Self evaluation:**
1. Build test:	 [X]Passed [ ]Failed [ ]Skipped
2. Run test:	 [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Donghak PARK <[email protected]>
Apply clang-format to changed files

**Self evaluation:**
1. Build test:	 [X]Passed [ ]Failed [ ]Skipped
2. Run test:	 [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Donghak PARK <[email protected]>
Update FSU unittest
- For now, we should set the weight & input sizes to pagesize * N
- Later I will add a page-align algorithm (a rough sketch of the rounding follows below)

**Self evaluation:**
1. Build test:	 [X]Passed [ ]Failed [ ]Skipped
2. Run test:	 [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Donghak PARK <[email protected]>
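Until a proper page-align step lands, the unittest sizes must already be multiples of the page size; the eventual rounding could be as simple as this sketch (the function name is hypothetical).

```cpp
// Illustrative only: round a byte size up to the next multiple of the
// system page size.
#include <unistd.h>

#include <cstddef>

size_t alignToPage(size_t bytes) {
  const size_t page = static_cast<size_t>(sysconf(_SC_PAGESIZE)); // often 4096
  return ((bytes + page - 1) / page) * page;
}

// e.g. alignToPage(5000) == 8192 on a 4 KiB-page system.
```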