Support reconfiguration of DMAs #1287

@Abhishek-Varma

The following describes the set of tasks and their context for supporting reconfiguration of DMAs.

OPTION 1 : Circular DmaCpyNd -> Non-circular DmaCpyNd

This means all conversions currently built around the circular DmaCpyNdOp should instead be built around the non-circular DmaCpyNdOp.
A circular DmaCpyNdOp runs indefinitely, while a non-circular DmaCpyNdOp runs once for a given configuration and can then be made to run again with a different configuration. This is the reconfiguration option we are currently going ahead with, because of an important "UNKNOWN" factor in option 2 (mentioned towards the end).

Current state of the IR so far with option 1 : IR log

Phoenix test's log : reprogram_runtime_zero_phoenix.txt

OPTION 2 : Include a new op to reset BD IDs in Memtile_Dma

Context for numIters:

In the lower-to-aie pass, numIters holds the canonicalized repeat count needed to execute a DMA block.
Multiplying it by the total number of bufferOps gives the total number of DMA blocks that are invoked as part of a DMA chain started by aie.dma_start.
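
To make the arithmetic concrete, here is a minimal Python sketch of the block count described above. The function name and its parameters are hypothetical and only model the relationship; they are not part of lower-to-aie.

```python
def total_dma_blocks(num_iters: int, num_buffer_ops: int) -> int:
    """Model of the DMA block count under a single aie.dma_start.

    num_iters      -- canonicalized repeat count (numIters in the issue text)
    num_buffer_ops -- number of bufferOps (2 with double buffering)
    Both names are illustrative assumptions, not identifiers from the pass.
    """
    if num_iters < 1 or num_buffer_ops < 1:
        raise ValueError("both counts must be positive")
    return num_iters * num_buffer_ops


# With double buffering and a repeat count of 4, the chain has 8 DMA blocks.
print(total_dma_blocks(4, 2))  # 8
```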

-- (HP) Create a separate/common utility that computes numIters : we want to use this information in one of the earlier passes and then again at lower-to-aie.
-- (LP) Perhaps create an attribute as part of amdaie.npu.circular_dma_cpy_nd to make use of this information later at lower-to-aie instead of re-invoking/recomputing the same information.

Context of aie.memtile_dma + Transaction Binary: (TO BE USED IN OPTION 1 AS WELL)

aie.memtile_dma is used to create the DMA operations for transfers between L2 and L1. It contains DMA operation blocks, with aie.dma_start(MM2S/S2MM) marking the start of the transfer in the corresponding direction.
Therefore, based on the previous context, the total number of DMA blocks under each aie.dma_start is numIters * total bufferOps (2, since we use double buffering).

The concept of "controlcode" in the IR exists only in the amdaie dialect, as amdaie.controlcode, which contains circular/non-circular DmaCpyNd ops that describe the DMA operations for L3<->L2<->L1 data transfers.
The controlcode-to-transaction pass creates and stores the Transaction binary from the contents of amdaie.controlcode.

-- (HP) Create amdaie.memtile_dma, amdaie.dma_start, amdaie.dma_bd and amdaie.next_bd ops.
-- (HP) Create amdaie.memtile_dma similar to how aie.memtile_dma is created.
-- (HP) Have a 1:1 mapping from amdaie.memtile_dma to aie.memtile_dma as part of lower-to-aie. (This therefore means that at this stage we can perhaps rename createDMABlocks in lower-to-aie as createShimDMABlocks)

Context of assign-bd-ids + introduction of bd_id_reset (discuss name) op:

The assign-bd-ids pass goes through the DMA blocks in aie.memtile_dma and assigns BD IDs incrementally.
Since we know that a memtile has 24 BD IDs available AND a given aie.dma_start DMA chain contains numIters * total bufferOps (2) blocks, we need a way to ensure one of the following:

  1. numIters * total bufferOps (2) of a particular DMA chain (or that count combined with the DMA chain of a previous aie.dma_start) for a particular channel should NOT exceed 24.
  2. OR we should introduce/support a new operation bd_id_reset (discuss naming) that, for a particular tile and channel will RESET the BD IDs to be used again.

We are going ahead with point 2 above.
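
As a rough illustration of the constraint in point 1 (the condition that point 2 is meant to lift), the sketch below sums the block counts of consecutive DMA chains on one channel and compares them against the 24 memtile BD IDs. The function name and the (numIters, bufferOps) tuple encoding are assumptions made for this example only.

```python
MEMTILE_BD_IDS = 24  # number of BD IDs available on a memtile, per the text above


def exceeds_bd_budget(chains, budget=MEMTILE_BD_IDS):
    """Return True if the chains need more BD IDs than are available.

    chains -- list of (num_iters, num_buffer_ops) pairs, one per
              aie.dma_start chain on the same tile and channel
              (hypothetical encoding, for illustration only).
    """
    needed = sum(num_iters * num_buffer_ops for num_iters, num_buffer_ops in chains)
    return needed > budget


# One chain with numIters = 8 and double buffering needs 16 BD IDs: fits.
print(exceeds_bd_budget([(8, 2)]))          # False
# Adding a second chain with numIters = 5 needs 26 in total: does not fit.
print(exceeds_bd_budget([(8, 2), (5, 2)]))  # True
```

Whenever this check returns True, the chain would need something like bd_id_reset to keep going.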

-- (HP) Create bd_id_reset op.
-- (HP) Create a pass and devise an algorithm through which the bd_id_reset op can be inserted into a DMA block when BD IDs are about to be exhausted (UNKNOWN, but can perhaps be tackled based on how the AIE dialect handles it?)
E.g.:

BEFORE:
============================================
%3 = aie.dma_start(MM2S, 1, ^bb10, ^bb26)
^bb10:  // 2 preds: ^bb9, ^bb25
    aie.use_lock(%lock_0_1_9, AcquireGreaterEqual, 1)
    aie.dma_bd(%buffer_0_1_6 ...
    aie.next_bd ^bb11
^bb11:  // pred: ^bb10
    aie.dma_bd(%buffer_0_1_6 ...
    aie.next_bd ^bb12
^bb12:  // pred: ^bb11                  <---- ASSUME BLOCK ^bb12 IS WHERE BD ID GETS EXHAUSTED
    aie.dma_bd(%buffer_0_1_6 ...
    aie.next_bd ^bb13


AFTER:
=============================================
%3 = aie.dma_start(MM2S, 1, ^bb10, ^bb26)
^bb10:  // 2 preds: ^bb9, ^bb25
    aie.use_lock(%lock_0_1_9, AcquireGreaterEqual, 1)
    aie.dma_bd(%buffer_0_1_6 ...
    aie.next_bd ^bb11
^bb11:  // pred: ^bb10                   <---- WE INSERT BD_ID_RESET IN BLOCK ^bb11
    aie.dma_bd(%buffer_0_1_6 ...
    bd_id_reset(/*tile=*/%tile_0_1, /*channel=*/1)   <--- ALTHOUGH TILE IS SAME AS MEMTILE BUT 
                                                          PERHAPS THIS MAKES IT MORE DESCRIPTIVE
    aie.next_bd ^bb12
^bb12:  // pred: ^bb11                  
    aie.dma_bd(%buffer_0_1_6 ...
    aie.next_bd ^bb13

-- (HP) Support bd_id_reset as part of assign-bd-ids (and possibly in the lower stack, if needed *) <----- UNKNOWN
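
One possible shape for the placement algorithm, sketched in Python rather than as an MLIR pass. It assumes, as in the BEFORE/AFTER example, that the reset goes into the block immediately preceding the first block for which no BD ID is left, and that IDs restart from 0 after a reset. Whether the hardware and the lower stack permit this reuse is exactly the UNKNOWN noted above. All names here are hypothetical.

```python
def plan_bd_ids(block_names, budget=24):
    """Assign BD IDs incrementally and record where bd_id_reset would go.

    block_names -- DMA block labels of one chain, in next_bd order
    budget      -- BD IDs available for the tile/channel (24 on a memtile)
    Returns (assignments, reset_blocks): a list of (block, bd_id) pairs and
    the blocks that would carry a bd_id_reset op.
    """
    if budget < 1:
        raise ValueError("budget must be positive")
    assignments = []
    reset_blocks = []
    next_id = 0
    for index, name in enumerate(block_names):
        if next_id == budget:
            # No ID left for this block: put the reset in its predecessor
            # (assumed placement, mirroring ^bb11 in the example above).
            reset_blocks.append(block_names[index - 1])
            next_id = 0
        assignments.append((name, next_id))
        next_id += 1
    return assignments, reset_blocks


# Tiny budget of 2 so the example stays readable: ^bb12 is the first block
# with no ID left, so the reset lands in ^bb11.
ids, resets = plan_bd_ids(["bb10", "bb11", "bb12", "bb13"], budget=2)
print(ids)     # [('bb10', 0), ('bb11', 1), ('bb12', 0), ('bb13', 1)]
print(resets)  # ['bb11']
```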

Glossary

HP : High Priority
LP : Low Priority
