Dispatching a long-running compute shader causes system hang or abnormal behavior #6660
Could anyone follow up on this issue?
I can reproduce. Once we have had a timeout, it looks like all subsequent jobs are dead. @mairacanal, could you have a look at the timeout/reset code? Here are some of the complaints in dmesg:
(I happen to be on the 6.14 kernel, but the OP reported it on 6.6.)
I feel there should be a timeout; what is a sensible maximum?
In the desktop world, where the desktop is likely also using the 3D hardware, you don't want the timeout too high, as it will stall the GUI. With the current broken state of the timeout, the GUI will be dead anyway, so the longer the timeout the better. Even if the timeout code is fixed so that subsequent jobs continue to work, I wonder how many clients of this interface will be able to handle a timeout gracefully (resubmitting a job that timed out seems likely to time out again).
@popcornmix, I have already taken a look into this issue and can divide it into three separate issues.
Although providing a fix for (1) and (2) was quick, (3) involved some debugging and tracing to understand why it was failing on the RPi 5. In the end, we understood the issue and fixed it. I'll work on upstreaming the fixes in the next two days (and downstream as well, as I had to make some DTB changes). Thanks for the complete report of the issue and the easily reproducible example!
👍
Thanks Maira!
I sent the patches upstream [1] and opened a PR with the fixes. There is just one thing @notogawa asked for that is not included: increasing the hang limit. After some analysis, I decided not to increase it. This doesn't mean that CSD jobs longer than 500 ms will cause a GPU reset; it means that compute batches longer than 500 ms will, and it is quite unlikely for a single batch to take longer than 500 ms to run. Before resetting, the v3d driver checks whether the batches are making progress, and if they are, it skips the reset. The issue here is that your application uses a 1x1x1 workgroup, which is unusual. I tested other GPGPU applications, computing FFTs and matrix multiplications, and didn't have issues. My recommendation would be to split the work into smaller batches.
Thank you @mairacanal. I know that splitting the workload into smaller batches would relax the constraint, but there is another reason why this is difficult: the overhead of ioctl calls. Our compiler generates a single compute shader that fuses the entire deep learning computation graph, allowing us to run inference from start to finish with just one submit_csd call. The "usual" approach is to generate separate, smaller compute shaders for each layer (convolution, activation functions like ReLU, matrix multiplication, and so on) and then submit them one after another with submit_csd. However, this approach incurs significant overhead from the multiple submit_csd calls, which hurts inference performance. To get the best performance, we must reject the "usual" approach and accept the "unusual" one. With this technique, we have achieved the inference speed shown in this YouTube video on a Pi Zero with VC4 (though not on v3d). We have also applied the same technique to VC6 (with v3d), and we aim to do the same for VC7. Workloads offloaded to the GPU are becoming increasingly heavy, with LLMs being a prime example (even though, theoretically, the CPU on a Pi 5 is about twice as fast as the GPU), so a small hang limit becomes an even stricter constraint for such computations.
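The overhead argument above can be written as a simplified cost model. It assumes, hypothetically, that every submit_csd call adds a fixed overhead $t_{\text{submit}}$ and that layer $i$ takes GPU time $t_i$ however it is submitted; it ignores any overlap between CPU submission and GPU execution, as well as any effect of fusion on the compute time itself.

```latex
\begin{aligned}
T_{\text{per-layer}} &= \sum_{i=1}^{N} \left( t_{\text{submit}} + t_i \right)
                      = N\,t_{\text{submit}} + \sum_{i=1}^{N} t_i \\
T_{\text{fused}}     &= t_{\text{submit}} + \sum_{i=1}^{N} t_i \\
T_{\text{per-layer}} - T_{\text{fused}} &= (N - 1)\,t_{\text{submit}}
\end{aligned}
```

The model counts submissions, not batches: as the reply below points out, one CSD job can contain many batches while still costing a single ioctl.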
No, it would be just one ioctl. Please read the Mesa code to see the difference between jobs and batches [1]. You can use just one compute shader, but you will need to configure the CSD job differently.
I already checked that part five years ago, and I know how to implement a single-type computation, such as "one large matrix multiplication", using single
The reproducible code I provided is merely a simplified, minimal implementation. The way we actually use CSD in our app differs from this code.
Going through the repository you pointed to, I noticed that the number of batches isn't calculated as in Mesa [1]. Here is a snippet of the Mesa code:

```c
uint32_t batches_per_sg = DIV_ROUND_UP(wgs_per_sg * wg_size, 16);
uint32_t whole_sgs = num_wgs / wgs_per_sg;
uint32_t rem_wgs = num_wgs - whole_sgs * wgs_per_sg;
uint32_t num_batches = batches_per_sg * whole_sgs +
                       DIV_ROUND_UP(rem_wgs * wg_size, 16);
```

We improved CSD handling about 3 years ago (2021) [2], with good performance gains. Let me know if this snippet helps. If it doesn't, could you send me instructions on how to reproduce your application? That way I might be able to give a more precise answer on what could be done.

[2] https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10541
Yes, I also use the Mesa code only as a reference; in practice, I determine the best parameters for our case through experiments. The code that actually triggers the problem is a compute shader of approximately 60,000 instructions, which we cannot publish due to confidentiality. Therefore, I would like to ask a question using an example of what is possible with our code. For instance, if you were to execute the entire computation described in this paper [1] within a single
Are you saying that you can't create an example of a compute shader large enough to demonstrate the problem (and the code to launch it) that isn't confidential?
Yes. As is clear from the discussion so far, this is because "why the hang limit matters" is directly connected to "how to use the QPU(s) for maximum performance" for the type of computation described in #6660 (comment). Of course, if it were obvious how to specify workgroups, the number of batches, etc., to fit a computation like a neural network model architecture such as #6660 (comment) within only a single
I'm sorry @notogawa, but I don't have experience with neural networks, so I don't think I have the background needed to implement a compute shader for the paper you suggested. Also, because of confidentiality, I don't think I can help you configure the GPU in the best way for your case, as it would require knowledge about the GPU's inner workings and configuration. As a kernel maintainer of the upstream driver, I appreciate that you found the issues related to the GPU reset (for which I sent fixes), but I wouldn't be comfortable increasing the timeout for the CSD job (or making it a kernel parameter). Our kernel driver ensures that the GPU won't reset as long as the batches are making progress. From my point of view, your application is very specific [1], so I'd recommend changing the hang limit locally. But I'd appreciate hearing @pelwell's and @popcornmix's opinions. [1] And I can't provide useful help without seeing the code or configuration.
Thank you for your consideration. That's unfortunate, but I understand. Personally, I hope that in the future, compatibility-breaking changes (Pi 4/VC6 is also affected by this issue), such as preventing code that previously worked from user space from working, will not be introduced. This point could influence hardware or OS selection. If there is no further action to take, please close this issue on your end.
If you can come up with something more persuasive than expecting us to understand the implications of a research paper then you might have a chance, but until then it's a no from me. |
In addition to the standard reset controller, V3D 7.x requires configuring the V3D_SMS registers for proper power on/off and reset. Add the new registers to `v3d_regs.h` and ensure they are properly configured during device probing, removal, and reset. This change fixes GPU reset issues on the Raspberry Pi 5 (BCM2712). Without exposing these registers, a GPU reset causes the GPU to hang, stopping any further job execution and freezing the desktop GUI. The same issue occurs when unloading and loading the v3d driver. Link: raspberrypi#6660 Signed-off-by: Maíra Canal <[email protected]>
In addition to the standard reset controller, V3D 7.x requires configuring the V3D_SMS registers for proper power on/off and reset. Add the new registers to `v3d_regs.h` and ensure they are properly configured during device probing, removal, and reset. This change fixes GPU reset issues on the Raspberry Pi 5 (BCM2712). Without exposing these registers, a GPU reset causes the GPU to hang, stopping any further job execution and freezing the desktop GUI. The same issue occurs when unloading and loading the v3d driver. Link: raspberrypi/linux#6660 Reviewed-by: Iago Toral Quiroga <[email protected]> Signed-off-by: Maíra Canal <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Describe the bug
When we dispatch a shader program using ioctl(SUBMIT_CSD) on a Raspberry Pi 5, if the shader program’s execution time exceeds 500 ms, ioctl(WAIT_BO) returns "Timer expired" or the system hangs.
Once "Timer expired" occurs, even subsequent shader programs that should complete within 500 ms also result in "Timer expired."
When the system hangs, I can't do anything. Pressing the power button has no effect, and the LED stays green (on).
I suspect this line. Is there any difficulty in relaxing this limit? I think it is too tight for GPGPU.
Steps to reproduce the behaviour
This is an example program to reproduce the issue. In this example, the shader is a busy no-op loop.
Case 1: Normal
Case 2: Timer expired
Case 3: System hang
This example is a minimal reproducible program, so it’s just a no-op loop. In reality, however, we’re submitting programs like massive matrix–matrix multiplications.
Device(s)
Raspberry Pi 5
System
Logs
No response
Additional context
No response