Description
The following describes the set of tasks and their context for supporting reconfiguration of DMAs.
OPTION 1 : Circular DmaCpyNd -> Non-circular DmaCpyNd
This means every conversion that currently targets the circular DmaCpyNdOp should instead be built around the non-circular DmaCpyNdOp.
A circular DmaCpyNdOp runs indefinitely, while a non-circular DmaCpyNdOp runs once for a given configuration and can then be made to run again with a different configuration. This is the reconfiguration option we are currently going ahead with because of an important "UNKNOWN" factor in option 2 (mentioned towards the end).
Current state of the IR so far with option 1 : IR log
Phoenix test's log : reprogram_runtime_zero_phoenix.txt
OPTION 2 : Include a new op to reset BD IDs in Memtile_Dma
Context for numIters:
In the lower-to-aie pass, numIters holds the canonicalized repeat count needed to execute a DMA block.
Multiplying it by the total number of bufferOps gives the total number of DMA blocks invoked as part of a DMA chain started by aie.dma_start.
-- (HP) Create a separate/common utility that computes numIters: we want to use this information in one of the earlier passes and then again in lower-to-aie.
-- (LP) Perhaps add an attribute to amdaie.npu.circular_dma_cpy_nd that carries this information through to lower-to-aie, instead of recomputing it there.
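To make the block-count arithmetic concrete, here is a minimal sketch in Python. The helper name `total_dma_blocks` and its parameters are hypothetical and are not part of lower-to-aie; the sketch only assumes that numIters is already known and that double buffering gives 2 bufferOps.

```python
def total_dma_blocks(num_iters: int, num_buffer_ops: int = 2) -> int:
    """Number of DMA blocks chained under one aie.dma_start.

    Hypothetical helper: num_iters stands for the canonicalized repeat
    count, num_buffer_ops for the number of bufferOps (2 when double
    buffering is used).
    """
    if num_iters < 1 or num_buffer_ops < 1:
        raise ValueError("num_iters and num_buffer_ops must be positive")
    return num_iters * num_buffer_ops


# Example: 3 iterations with double buffering give 6 DMA blocks.
blocks = total_dma_blocks(3)
```

A shared utility along these lines would let an earlier pass and lower-to-aie agree on the same count without recomputing it independently.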
Context of aie.memtile_dma + Transaction Binary: (TO BE USED IN OPTION 1 AS WELL)
aie.memtile_dma is used to create the DMA operations for L2 to/from L1 transfers. It contains DMA operation blocks, with aie.dma_start(MM2S/S2MM) marking the start of the corresponding to/from transfer.
Therefore, based on the previous context, the total number of DMA blocks under each aie.dma_start is numIters * total bufferOps (2, since we use double buffering).
The concept of "controlcode" exists in the IR only in the amdaie dialect, as amdaie.controlcode, which contains circular/non-circular DmaCpyNd ops that describe the DMA operations for L3<->L2<->L1 data transfers.
The controlcode-to-transaction pass creates and stores the Transaction binary from the amdaie.controlcode data.
-- (HP) Create amdaie.memtile_dma, amdaie.dma_start, amdaie.dma_bd and amdaie.next_bd ops.
-- (HP) Create amdaie.memtile_dma similar to how aie.memtile_dma is created.
-- (HP) Have a 1:1 mapping from amdaie.memtile_dma to aie.memtile_dma as part of lower-to-aie. (This means that at this stage we can perhaps rename createDMABlocks in lower-to-aie to createShimDMABlocks.)
Context of assign-bd-ids + introduction of bd_id_reset (discuss name) op:
assign-bd-ids walks the DMA blocks in aie.memtile_dma and assigns BD IDs incrementally.
Since the memtile has 24 BD IDs available and a given aie.dma_start DMA chain uses numIters * total bufferOps (2) DMA blocks, we need a way to ensure either:
1. numIters * total bufferOps (2) of a particular DMA chain (or of the combination with the DMA chain of a previous aie.dma_start) for a particular channel does NOT exceed 24,
2. OR we introduce/support a new operation bd_id_reset (discuss naming) that, for a particular tile and channel, RESETs the BD IDs so they can be used again.
We are going ahead with point 2 above.
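As a rough illustration of when a reset becomes necessary, the following Python sketch checks whether the DMA chains on one channel stay within the BD ID budget. The function name `chains_fit_in_budget` and the constant `MEMTILE_BD_IDS` are hypothetical; the limit of 24 and the factor of 2 bufferOps are taken from the description above.

```python
MEMTILE_BD_IDS = 24  # BD IDs available on the memtile, per the text above


def chains_fit_in_budget(num_iters_per_chain, num_buffer_ops=2,
                         budget=MEMTILE_BD_IDS):
    """Return True if all chains on one channel fit without reusing BD IDs.

    num_iters_per_chain: numIters of each aie.dma_start chain on the
    channel, in program order (hypothetical input format).
    """
    needed = sum(n * num_buffer_ops for n in num_iters_per_chain)
    return needed <= budget


# Two chains needing 8 + 16 = 24 BDs exactly fill the budget.
fits = chains_fit_in_budget([4, 8])
# Two chains needing 8 + 18 = 26 BDs would require a reset.
needs_reset = not chains_fit_in_budget([4, 9])
```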
-- (HP) Create bd_id_reset op.
-- (HP) Create a pass and devise an algorithm through which the bd_id_reset op can be inserted inside a DMA block when the BD IDs are about to be exhausted (UNKNOWN, but can perhaps be tackled based on how the AIE dialect handles it?)
Eg:
BEFORE:
============================================
%3 = aie.dma_start(MM2S, 1, ^bb10, ^bb26)
^bb10: // 2 preds: ^bb9, ^bb25
aie.use_lock(%lock_0_1_9, AcquireGreaterEqual, 1)
aie.dma_bd(%buffer_0_1_6 ...
aie.next_bd ^bb11
^bb11: // pred: ^bb10
aie.dma_bd(%buffer_0_1_6 ...
aie.next_bd ^bb12
^bb12: // pred: ^bb11 <---- ASSUME BLOCK ^bb12 IS WHERE BD ID GETS EXHAUSTED
aie.dma_bd(%buffer_0_1_6 ...
aie.next_bd ^bb13
AFTER:
=============================================
%3 = aie.dma_start(MM2S, 1, ^bb10, ^bb26)
^bb10: // 2 preds: ^bb9, ^bb25
aie.use_lock(%lock_0_1_9, AcquireGreaterEqual, 1)
aie.dma_bd(%buffer_0_1_6 ...
aie.next_bd ^bb11
^bb11: // pred: ^bb10 <---- WE INSERT BD_ID_RESET IN BLOCK ^bb11
aie.dma_bd(%buffer_0_1_6 ...
bd_id_reset(/*tile=*/%tile_0_1, /*channel=*/1) <--- ALTHOUGH TILE IS SAME AS MEMTILE BUT
PERHAPS THIS MAKES IT MORE DESCRIPTIVE
aie.next_bd ^bb12
^bb12: // pred: ^bb11
aie.dma_bd(%buffer_0_1_6 ...
aie.next_bd ^bb13
-- (HP) Support bd_id_reset as part of assign-bd-ids (and possibly in the lower stack, if needed *) <----- UNKNOWN
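One possible shape for the reset-placement algorithm, written as a Python sketch rather than as an MLIR pass. Everything here is an assumption for illustration: the function name `assign_bd_ids_with_reset`, the representation of blocks as strings, and the rule that the reset goes into the last block that still received a fresh BD ID (matching the BEFORE/AFTER example above). It does not address the open question of whether the hardware has finished with a BD before its ID is reused.

```python
def assign_bd_ids_with_reset(blocks, budget=24):
    """Assign BD IDs incrementally and record where a reset is needed.

    blocks: DMA block labels of one aie.dma_start chain, in order.
    Returns (ids, reset_in), where ids maps each block to its BD ID and
    reset_in lists the blocks that would carry a bd_id_reset op.
    """
    ids = {}
    reset_in = []
    next_id = 0
    for i, block in enumerate(blocks):
        ids[block] = next_id
        next_id += 1
        # The next block would find no free BD ID: place the reset in the
        # current block and start counting from 0 again.
        if next_id == budget and i + 1 < len(blocks):
            reset_in.append(block)
            next_id = 0
    return ids, reset_in


# Small budget of 2 to mirror the example: IDs run out at ^bb12,
# so the reset lands in ^bb11.
ids, reset_in = assign_bd_ids_with_reset(
    ["bb10", "bb11", "bb12", "bb13"], budget=2)
```

With a budget of 2, `reset_in` is `["bb11"]` and the IDs repeat as 0, 1, 0, 1, which corresponds to inserting bd_id_reset in ^bb11 in the AFTER listing.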
Glossary
HP : High Priority
LP : Low Priority