| 1 | +# Shader debugging via GPU shader patching |
| 2 | + |
| 3 | +Idea: instead of CPU emulation, do shader debugging directly on the GPU. |
| 4 | +Patch the shaders so they store their state at the points of interest. |
| 5 | + |
| 6 | +Assumptions: |
| 7 | +- shader/pipeline recompilation is fast, i.e. <1s |
| 8 | +- For the beginning, we can just assume buffer device address support. |
| 9 | + Saves us from the hassle with pipeline layouts, can pass the buffer dst |
| 10 | + address via specialization constant. |
| 11 | +- we need to know/support all pipeline extensions to make this work and |
| 12 | + allow proper recompilation. |
| 13 | + SPIRV extensions are only a minor concern but *might* cause issues when |
| 14 | + not supported. |
| 15 | + |
| 16 | +Flow: |
| 17 | +- user selects shader debugging in UI |
| 18 | + - the shader source code is shown |
| 19 | +- then, the user selects a line for a breakpoint. |
| 20 | + And can select the thread/vertex/pixel to be debugged. |
| 21 | + (could potentially be implicitly selected by "debug this vertex/pixel") |
| 22 | + - at this point, the UI installs a hook target |
| 23 | + - (future opt: already start shader patching, pipeline recompilation |
| 24 | + at this point, async) |
| 25 | +- work is submitted with hook: |
| 26 | + - when the target command is executed, instead use the patched pipeline. |
| 27 | + (block for the first time it is submitted? idk might be ok. |
| 28 | + But then need a caching mechanism. Hash by shader+breakpointLine or smth) |
| 29 | + - the hook state returns a buffer plus struct layout + names, |
| 30 | + or nullopt when the breakpoint was not hit. |
| 31 | + (future: Could make this work for multiple breakpoints at once, |
| 32 | + returning multiple such pairs) |
| 33 | + |
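The caching idea from the flow above (hash by shader + breakpointLine) could look roughly like the following sketch. All names here (`PatchKey`, `PatchCache`, `PatchedPipelineId`) are hypothetical and not part of the existing code; the shader hash is assumed to be computed elsewhere, and the pipeline handle is reduced to a plain integer id.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <optional>
#include <unordered_map>

// Hypothetical cache key: identifies one patched variant of a shader.
struct PatchKey {
	std::uint64_t shaderHash; // hash of the original SPIR-V, assumed precomputed
	std::uint32_t breakpointLine;

	bool operator==(const PatchKey& other) const {
		return shaderHash == other.shaderHash &&
			breakpointLine == other.breakpointLine;
	}
};

struct PatchKeyHasher {
	std::size_t operator()(const PatchKey& key) const {
		// Combine both field hashes (boost-style hash_combine).
		std::size_t h = std::hash<std::uint64_t>{}(key.shaderHash);
		h ^= std::hash<std::uint32_t>{}(key.breakpointLine) +
			0x9e3779b9u + (h << 6) + (h >> 2);
		return h;
	}
};

// Placeholder for whatever handle the patched pipeline ends up having.
using PatchedPipelineId = std::uint32_t;

class PatchCache {
public:
	// Returns the cached pipeline, or nullopt if this variant was never built.
	std::optional<PatchedPipelineId> find(const PatchKey& key) const {
		auto it = entries_.find(key);
		if(it == entries_.end()) {
			return std::nullopt;
		}
		return it->second;
	}

	void insert(const PatchKey& key, PatchedPipelineId id) {
		entries_[key] = id;
	}

private:
	std::unordered_map<PatchKey, PatchedPipelineId, PatchKeyHasher> entries_;
};
```

On a cache miss the hook would block (or kick off async compilation) and insert the result; later submissions with the same breakpoint hit the cache.
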
| 34 | +Shader patching: |
| 35 | +- Probably easiest just by hand, without a framework. |
| 36 | +- Remember some header information |
| 37 | +- Iterate over instructions |
| 38 | + - Remember the OpLine where the breakpoint is set. |
| 39 | + - Remember all OpVariables (later: named instructions? if anyone is using it) |
| 40 | + and their function owner? |
| 41 | + - ideally we build the CFG here and check which OpVariable stores |
| 42 | + came before the breakpoint line. But that is for later, just |
| 43 | + consider every variable in the function scope for now I think. |
| 44 | + - what about callstacks? for now, do not capture anything I guess. |
| 45 | + Later on: before any function call of the breakpoint function, |
| 46 | + write additional data (at least opLine, possibly also local state?). |
| 47 | + Repeat recursively for those functions, too. |
| 48 | + - hm just opline should be enough. If that call is selected in UI, |
| 49 | + just switch breakpoint to that position then. |
| 50 | +- Then, we know all variables at the breakpoint line. |
| 51 | + - build a buffer layout: just a linear list of all the (local?) variables |
| 52 | + known and accessible at the breakpoint position. Also capture global |
| 53 | + state, e.g. stage inputs? |
| 54 | + |
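A minimal sketch of the instruction iteration described above, done by hand on the raw SPIR-V words. The opcode and storage-class values are the ones from the SPIR-V specification; the function and struct names (`scanForBreakpoint`, `BreakpointScan`) are made up for illustration. It ignores the CFG and simply collects every Function-storage OpVariable of the function that contains the breakpoint OpLine.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Opcode and enum values from the SPIR-V specification.
constexpr std::uint32_t opLine = 8;
constexpr std::uint32_t opFunction = 54;
constexpr std::uint32_t opFunctionEnd = 56;
constexpr std::uint32_t opVariable = 59;
constexpr std::uint32_t storageClassFunction = 7;

struct BreakpointScan {
	std::size_t lineOffset; // word offset of the matching OpLine
	std::vector<std::uint32_t> localVars; // result ids of the function's OpVariables
};

// fileId is the result id of the OpString naming the source file.
std::optional<BreakpointScan> scanForBreakpoint(
		const std::vector<std::uint32_t>& spv,
		std::uint32_t fileId, std::uint32_t line) {
	std::vector<std::uint32_t> vars; // variables of the function being scanned
	std::optional<BreakpointScan> found;
	bool inFunction = false;

	std::size_t i = 5; // skip the 5-word module header
	while(i < spv.size()) {
		const std::uint32_t wordCount = spv[i] >> 16;
		const std::uint32_t opcode = spv[i] & 0xFFFFu;
		if(wordCount == 0 || i + wordCount > spv.size()) {
			return std::nullopt; // malformed module
		}

		if(opcode == opFunction) {
			inFunction = true;
			vars.clear();
		} else if(opcode == opFunctionEnd) {
			if(found) {
				found->localVars = vars;
				return found;
			}
			inFunction = false;
		} else if(inFunction && opcode == opVariable && wordCount >= 4 &&
				spv[i + 3] == storageClassFunction) {
			vars.push_back(spv[i + 2]); // word 2 is the result id
		} else if(inFunction && !found && opcode == opLine && wordCount == 4 &&
				spv[i + 1] == fileId && spv[i + 2] == line) {
			found = BreakpointScan{i, {}};
		}

		i += wordCount;
	}

	return std::nullopt; // breakpoint line not found in any function
}
```

The returned word offset is where the copy instructions would later be inserted; the variable ids are the candidates for the buffer layout.
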
| 55 | +- allocate a device_address buffer with the needed size (known via type layout) |
| 56 | + - that buffer is hard-connected then to the patched module, |
| 57 | + same lifetime |
| 58 | +- Patch shader |
| 59 | + - Insert constant global value with the device address |
| 60 | + - After breakpoint OpLine, insert ops to copy all variables into |
| 61 | + that buffer device address at their respective offsets |
| 62 | +- create the shader module, compile the pipeline, if needed |
| 63 | + - what about shader objects? |
| 64 | +- draw/dispatch/traceRays with new module/pipeline |
| 65 | + - afterwards, copy from the associated buffer to hook-specific, host_local |
| 66 | + one |
| 67 | + - we have to make sure that no two queues can use the shader buffer at the |
| 68 | + same time. But we disallow this shader debugging in local captures -> |
| 69 | + it can only be one queue. So shouldn't be an issue. |
| 70 | + |
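The linear buffer layout mentioned above (all variables at their respective offsets, total size known from the type layout) can be sketched as follows. This is an illustration only: `CapturedVar`, `LayoutEntry` and `buildLayout` are hypothetical names, and the per-variable size and alignment are assumed to come from the SPIR-V type information gathered during patching.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// One variable to capture; size and alignment in bytes are assumed inputs.
struct CapturedVar {
	std::string name;
	std::uint32_t size;
	std::uint32_t alignment;
};

struct LayoutEntry {
	std::string name;
	std::uint32_t offset; // byte offset inside the destination buffer
};

struct CaptureLayout {
	std::vector<LayoutEntry> entries;
	std::uint32_t totalSize = 0; // buffer size needed for the allocation
};

// Rounds value up to the next multiple of alignment (alignment > 0).
constexpr std::uint32_t alignUp(std::uint32_t value, std::uint32_t alignment) {
	return (value + alignment - 1) / alignment * alignment;
}

// Places the variables one after another, respecting each alignment.
CaptureLayout buildLayout(const std::vector<CapturedVar>& vars) {
	CaptureLayout layout;
	std::uint32_t offset = 0;
	for(const auto& var : vars) {
		offset = alignUp(offset, var.alignment);
		layout.entries.push_back({var.name, offset});
		offset += var.size;
	}
	layout.totalSize = offset;
	return layout;
}
```

The same offsets would be used twice: as Offset decorations on the patched struct and as the CPU-side layout for reading the copied buffer.
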
| 71 | +``` |
| 72 | +; declare the struct type holding all variables to save (members need Offset decorations) |
| 73 | +%DstStruct = OpTypeStruct ... ; the variable types to store |
| 74 | +; declare the PhysicalStorageBuffer pointer type for the struct |
| 75 | +%PhysicalBufferPointerDst = OpTypePointer PhysicalStorageBuffer %DstStruct |
| 76 | + |
| 77 | +; Create struct of variables to save |
| 78 | +%mem1 = OpLoad %type1 %var1 |
| 79 | +... |
| 80 | +%structSrc = OpCompositeConstruct %DstStruct %mem1 ... |
| 81 | + |
| 82 | +; Hard-coded buffer address as a 64-bit integer (%ulong = OpTypeInt 64 0), |
| 83 | +; since OpConstant cannot produce a pointer |
| 84 | +%bufAddress = OpConstant %ulong ... ; hard-coded address |
| 85 | +; Convert to a pointer; storing the whole struct needs no access chain |
| 86 | +%structDst = OpConvertUToPtr %PhysicalBufferPointerDst %bufAddress |
| 87 | + OpStore %structDst %structSrc Aligned 16 ; alignment value is an assumption |
| 88 | + |
| 89 | +``` |
| 90 | + |
| 91 | +I dislike the hardcoded address a bit. |
| 92 | +Maybe at least use the same (fixed size, idk, 64k) buffer for all patched |
| 93 | +modules? They can't be active at the same time anyway. |
| 94 | + |
| 95 | +--- |
| 96 | + |
| 97 | +WAIT, can we create a pointer with OpConstant? No: its result type must be a scalar int/float. Use a 64-bit integer constant + OpConvertUToPtr instead. |
| 98 | + |
| 99 | +--- |
| 100 | + |
| 101 | +I guess for cpu-side representation of the struct, we can use buffmt. |
| 102 | +It becomes just one vil::Type (and a LinearAllocator). |
| 103 | +So, we have in CommandHookState: |
| 104 | + |
| 105 | +``` |
| 106 | +struct CopiedShaderData { |
| 107 | + Type type; // always a struct |
| 108 | + OwnBuffer data; |
| 109 | +}; |
| 110 | +``` |
| 111 | + |
| 112 | +Later on, can add additional metadata (e.g. callstack) to dst data buffer. |
| 113 | +We build this Type during shader patching. |
| 114 | +For OpLine candidates for the breakpoint, remember the start of their |
| 115 | +function. When we have our selected candidate, evaluate all OpVariable |
| 116 | +values in that function (that came before the OpLine instruction itself). |