
Dispatching a long-running compute shader causes system hang or abnormal behavior #6660

Closed
notogawa opened this issue Feb 10, 2025 · 18 comments · Fixed by #6692

Comments

@notogawa

Describe the bug

When we dispatch a shader program using ioctl(SUBMIT_CSD) on a Raspberry Pi 5, if the shader program's execution time exceeds 500 ms, ioctl(WAIT_BO) returns "Timer expired" or the system hangs.

Once "Timer expired" occurs, even subsequent shader programs that should complete within 500 ms also result in "Timer expired."

When the system hangs, I can't do anything: pressing the power button has no effect, and the LED stays green (on).

I suspect this line. Is there any difficulty in relaxing this limit? It seems too tight for GPGPU.

Steps to reproduce the behaviour

Here is an example program that reproduces the issue. In this example, the shader is just a busy nop loop.

$ git clone https://gist.github.com/notogawa/4dcebe6db14f5898dee85babb85f7d37
$ cd 4dcebe6db14f5898dee85babb85f7d37
$ gcc -o main main.c
$ ./main N    (N is the nop-loop count)

Case 1: Normal

$ ./main 1000000
[loop:1000000]
0.008614 sec
$ ./main 1000000
[loop:1000000]
0.008624 sec
$ ./main 64000000
[loop:64000000]
0.271148 sec

Case 2: Timer expired

$ ./main 128000000
[loop:128000000]
wait_bo: Timer expired    <- displayed after 10 sec
$ ./main 1000000
[loop:1000000]
wait_bo: Timer expired

Case 3: System hang

$ ./main 128000000
[loop:128000000]
(system hangs)

This example is a minimal reproducible program, so it’s just a no-op loop. In reality, however, we’re submitting programs like massive matrix–matrix multiplications.
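
For context, the reproducer's submit/wait pattern boils down to the two ioctls sketched below (a trimmed illustration using the UAPI structures from include/uapi/drm/v3d_drm.h; BO creation, shader upload, and the cfg[]/coef[] dispatch values are elided, and the BO handles are placeholders, so this is not a drop-in replacement for the gist's main.c):

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <drm/v3d_drm.h>   /* struct drm_v3d_submit_csd, struct drm_v3d_wait_bo */

int main(void)
{
    /* v3d render node; the exact node name can vary per system. */
    int fd = open("/dev/dri/renderD128", O_RDWR);
    if (fd < 0) { perror("open"); return 1; }

    /* Assume shader/uniform/result BOs were created with
     * DRM_IOCTL_V3D_CREATE_BO and filled in beforehand (not shown). */
    __u32 bo_handles[] = { 1, 2 };   /* placeholder BO handles */

    struct drm_v3d_submit_csd submit;
    memset(&submit, 0, sizeof(submit));
    /* submit.cfg[0..6] and submit.coef[0..3] carry the CSD dispatch
     * configuration (workgroup counts, number of batches, shader address,
     * uniforms address); the real values depend on the shader. */
    submit.bo_handles = (__u64)(uintptr_t)bo_handles;
    submit.bo_handle_count = 2;
    if (ioctl(fd, DRM_IOCTL_V3D_SUBMIT_CSD, &submit)) { perror("submit_csd"); return 1; }

    /* Block until the result BO is idle; the 10 s timeout matches the
     * "Timer expired" delay seen in Case 2 below. */
    struct drm_v3d_wait_bo wait = {
        .handle = bo_handles[0],
        .timeout_ns = 10ULL * 1000 * 1000 * 1000,
    };
    if (ioctl(fd, DRM_IOCTL_V3D_WAIT_BO, &wait)) { perror("wait_bo"); return 1; }
    return 0;
}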

Device(s)

Raspberry Pi 5

System

$ cat /etc/rpi-issue
Raspberry Pi reference 2024-11-19
Generated using pi-gen, https://github.com/RPi-Distro/pi-gen, 891df1e21ed2b6099a2e6a13e26c91dea44b34d4, stage2
$ vcgencmd version
2024/09/23 14:02:56
Copyright (c) 2012 Broadcom
version 26826259 (release) (embedded)
$ uname -a
Linux pi5 6.6.62+rpt-rpi-2712 #1 SMP PREEMPT Debian 1:6.6.62-1+rpt1 (2024-11-25) aarch64 GNU/Linux

Logs

No response

Additional context

No response

@notogawa
Author

Could anyone follow up on this issue?
Please let me know if you are unable to reproduce the problem.

@popcornmix
Collaborator

I can reproduce this. Once we have had a timeout, it looks like all subsequent jobs are dead.
So there are two issues:
1: a timeout should be recoverable, and future smaller jobs should still work
2: perhaps the timeout should be increased to allow larger jobs

@mairacanal could you have a look at the timeout/reset code? Here are some of the complaints from dmesg:

[62974.906781] v3d 1002000000.v3d: [drm:v3d_reset [v3d]] *ERROR* Resetting GPU for hang.
[62974.906789] v3d 1002000000.v3d: [drm:v3d_reset [v3d]] *ERROR* V3D_ERR_STAT: 0x00001000
[62975.154380] v3d 1002000000.v3d: MMUC flush wait idle failed
[62975.154384] v3d 1002000000.v3d: MMU flush timeout
[62975.418810] Unable to handle kernel NULL pointer dereference at virtual address 00000000000005c7
[62975.427600] Mem abort info:
[62975.430384]   ESR = 0x0000000096000005
[62975.434126]   EC = 0x25: DABT (current EL), IL = 32 bits
[62975.439432]   SET = 0, FnV = 0
[62975.442477]   EA = 0, S1PTW = 0
[62975.445609]   FSC = 0x05: level 1 translation fault
[62975.450479] Data abort info:
[62975.453352]   ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
[62975.458830]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[62975.463873]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[62975.469177] user pgtable: 16k pages, 47-bit VAs, pgdp=00000001c0e08000
[62975.475697] [00000000000005c7] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000
[62975.484395] Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP
[62975.490654] Modules linked in: snd_seq_dummy snd_hrtimer snd_seq snd_seq_device binfmt_misc spidev brcmfmac_wcc aes_ce_blk aes_ce_cipher ghash_ce gf128mul libaes sha2_ce sha256_arm64 vc4 sha1_ce brcmfmac sha1_generic brcmutil snd_soc_hdmi_codec drm_display_helper raspberrypi_hwmon cfg80211 cec drm_client_lib drm_dma_helper snd_soc_core v3d snd_compress snd_pcm_dmaengine i2c_brcmstb spi_bcm2835 gpu_sched rpivid_hevc(C) rfkill snd_pcm pisp_be v4l2_mem2mem snd_timer drm_shmem_helper videobuf2_dma_contig rp1_pio snd drm_kms_helper gpio_keys videobuf2_memops videobuf2_v4l2 videodev raspberrypi_gpiomem rp1_adc rp1 rp1_mailbox videobuf2_common mc nvmem_rmem uio_pdrv_genirq uio i2c_dev fuse drm drm_panel_orientation_quirks backlight dm_mod ip_tables x_tables ipv6
[62975.557569] CPU: 0 UID: 0 PID: 17494 Comm: kworker/0:1 Tainted: G         C         6.14.0-rc2-v8-16k #15
[62975.567128] Tainted: [C]=CRAP
[62975.570086] Hardware name: Raspberry Pi 5 Model B Rev 1.1 (DT)
[62975.575909] Workqueue: events drm_sched_job_timedout [gpu_sched]
[62975.581912] pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[62975.588865] pc : v3d_job_start_stats.isra.0+0x48/0xd8 [v3d]
[62975.594432] lr : v3d_job_start_stats.isra.0+0x2c/0xd8 [v3d]
[62975.599998] sp : ffffc00084fcbc40
[62975.603303] x29: ffffc00084fcbc40 x28: ffff800080ca0d68 x27: ffff800040a22b80
[62975.610430] x26: ffff800080ca0c78 x25: 00000000ffffffff x24: 00000000000005ef
[62975.617558] x23: 0000000000000001 x22: ffff800040a5fd80 x21: ffff800080ca0000
[62975.624684] x20: ffffffffffffffff x19: 0000000000000003 x18: 00000000ffffffff
[62975.631811] x17: 203a544154535f52 x16: ffffd06fce586a98 x15: 524f5252452a205d
[62975.638937] x14: 5d6433765b207465 x13: 3030303130303030 x12: 7830203a54415453
[62975.646064] x11: 5f5252455f443356 x10: ffffd06fcfc87160 x9 : ffffd06fae8c7afc
[62975.653191] x8 : ffff800040a5fe00 x7 : 0000000000000000 x6 : ffff800080ca0408
[62975.660317] x5 : 0000000000000002 x4 : 0000000002800001 x3 : ffff800003e863c0
[62975.667444] x2 : 0000000000000003 x1 : 000000000000005f x0 : 000039469b77b3a3
[62975.674572] Call trace:
[62975.677009]  v3d_job_start_stats.isra.0+0x48/0xd8 [v3d] (P)
[62975.682576]  v3d_csd_job_run+0xbc/0x2a8 [v3d]
[62975.686926]  drm_sched_resubmit_jobs+0x98/0x238 [gpu_sched]
[62975.692492]  v3d_gpu_reset_for_timeout+0x84/0xd8 [v3d]
[62975.697624]  v3d_csd_job_timedout+0x68/0x80 [v3d]
[62975.702321]  drm_sched_job_timedout+0x7c/0x120 [gpu_sched]
[62975.707799]  process_one_work+0x15c/0x3c0
[62975.711803]  worker_thread+0x2e4/0x3f0
[62975.715545]  kthread+0x138/0x1f0
[62975.718765]  ret_from_fork+0x10/0x20
[62975.722334] Code: b9000861 d37b7e61 2a1303e2 8b010281 (b9456824) 
[62975.728418] ---[ end trace 0000000000000000 ]---

(I happen to be on a 6.14 kernel, but the OP reported this on 6.6.)

@pelwell
Contributor

pelwell commented Feb 14, 2025

I feel there should be a timeout - what is a sensible maximum?

@popcornmix
Collaborator

I feel there should be a timeout - what is a sensible maximum?

In the desktop world, which is likely also using the 3D hardware, you don't want it too high, as it will stall the GUI.
For a non-desktop (or non-3D-accelerated desktop) system, a longer timeout seems more acceptable.
Possibly an override (cmdline.txt/sysfs) would be a reasonable compromise.

With the current broken state of the timeout, the GUI will be dead anyway, so the longer the timeout the better.

Even if the timeout code is fixed so that subsequent jobs continue to work, I wonder how many clients of this interface will be able to handle a timeout gracefully (resubmitting a job that timed out sounds likely to time out again the next time).

@mairacanal
Contributor

@popcornmix, I have already taken a look at this and can divide it into three separate issues:

  1. The scheduler loops, resubmitting the same guilty job repeatedly.
  2. The hang limit timeout is too small for long compute shader jobs.
  3. After the reset, the GPU hangs, making it impossible to complete jobs from any of the queues and freezing the GUI.

Although providing a fix for (1) and (2) was quick, (3) involved some debugging and tracing to understand why this fails on the RPi 5. In the end, we understood the issue and fixed it.

I'll work on upstreaming the fixes in the next two days (also downstream, as I had to make some DTB changes). Thanks for the complete report of the issue and the easily reproducible example!

@notogawa
Author

👍

@popcornmix
Collaborator

Thanks Maira!

@mairacanal
Contributor

I sent the patches upstream [1] and opened the PR with the fixes. There is just one thing that @notogawa might have missed: increasing the hang limit.

After some analysis, I decided not to increase the hang limit. This doesn't mean that CSD jobs longer than 500 ms will cause a GPU reset; it means that compute batches longer than 500 ms will cause a GPU reset, and it is pretty unlikely for a single batch to take longer than 500 ms to run. The v3d driver checks whether we are making progress through the batches before resetting, and if so, it skips the reset.

The issue here is that your application uses a 1x1x1 workgroup, which is unusual. I tested other GPGPU applications, computing FFTs and matrix multiplications, and I didn't run into issues. My recommendation would be to split the work into smaller batches.

[1] https://lore.kernel.org/dri-devel/[email protected]/T/
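
For reference, the progress check mentioned above works roughly like the sketch below (paraphrased from the CSD timeout handling in drivers/gpu/drm/v3d/v3d_sched.c; register macros and structure layout vary across kernel versions, so treat this as an illustration rather than the literal upstream code):

/* Sketch of the CSD timeout handler's batch-progress check (illustrative). */
static enum drm_gpu_sched_stat
v3d_csd_job_timedout(struct drm_sched_job *sched_job)
{
	struct v3d_csd_job *job = to_csd_job(sched_job);
	struct v3d_dev *v3d = job->base.v3d;
	u32 batches = V3D_CORE_READ(0, V3D_CSD_CURRENT_CFG4);

	/* If the batch counter has moved since the last timeout, the job is
	 * still making progress: skip the reset and let the timer re-arm. */
	if (job->timedout_batches != batches) {
		job->timedout_batches = batches;
		return DRM_GPU_SCHED_STAT_NOMINAL;
	}

	/* No progress: treat the job as hung and reset the GPU. */
	return v3d_gpu_reset_for_timeout(v3d, sched_job);
}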

@notogawa
Author

Thank you @mairacanal .

I know that splitting the workload into smaller batches would relax the constraints, but there is another reason why this is difficult: "the overhead of ioctl calls."

Our compiler generates a single compute shader that fuses the entire deep learning computation graph, allowing us to run inference from start to finish with just one submit_csd call. The "usual" approach is to generate separate, smaller batched compute shaders for each layer - convolution, activation functions like ReLU, matrix multiplication, and so on - and then submit them sequentially with submit_csd one after another. However, that approach incurs significant overhead from the multiple submit_csd calls, which hurts inference performance. For the best performance, we have to reject the "usual" approach and accept the "unusual" one.

With this technique, we have achieved the inference speed shown in this YouTube video on a Pi Zero with a VC4 (though not specifically on v3d). We have also applied the same technique to VC6 (with v3d), and we aim to do the same for VC7.

As workloads offloaded to the GPU become increasingly heavy - with LLMs being a prime example, even though, theoretically, the CPU on a Pi 5 is about twice as fast as the GPU - a small hang limit becomes an even stricter constraint for such computations.

@mairacanal
Contributor

I know that splitting the workload into smaller batches would relax the constraints, but there is another reason why this is difficult: "the overhead of ioctl calls."

No, it would be just one ioctl. Please read the Mesa code to see the difference between jobs and batches [1]. You can use just one compute shader, but you will need to configure the CSD job differently.

[1] https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/broadcom/vulkan/v3dv_cmd_buffer.c?ref_type=heads#L4309

@notogawa
Author

I already checked that part five years ago, and I know how to implement a single type of computation, such as "one large matrix multiplication", using a single submit_csd with the workgroup, number of batches, uniforms, threading flag, compute shader payload register files, and so on. What I need is the ability to implement a sequence of various types of computations - convolution, activation functions, matrix multiplication, resizing, transpose, ... - using a single submit_csd with the best performance.

The issue here is that your application uses a 1x1x1 workgroup, which is unusual.

The reproducible code I provided is merely a simplified minimal implementation. The way we actually use CSD in our app differs from this code.

@mairacanal
Contributor

Going through the repository you pointed to, I noticed that the number of batches isn't calculated as in Mesa [1]. Here is a snippet of the Mesa code:

   uint32_t batches_per_sg = DIV_ROUND_UP(wgs_per_sg * wg_size, 16);
   uint32_t whole_sgs = num_wgs / wgs_per_sg;
   uint32_t rem_wgs = num_wgs - whole_sgs * wgs_per_sg;
   uint32_t num_batches = batches_per_sg * whole_sgs +
                          DIV_ROUND_UP(rem_wgs * wg_size, 16);

We improved CSD handling about 3 years ago (2021) [2] with good performance improvements. Let me know if this snippet helps you. In case it doesn't, could you send me instructions on how to reproduce your application? This way I might provide a more precise answer on what could be done.

[1] https://github.com/Idein/py-videocore6/blob/f14c853f5bf4bd22420fdd56558e38ebb5a1c097/videocore6/driver.py#L126

[2] https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10541
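
To make the snippet concrete, here is a worked example with made-up numbers (the workgroup size and counts below are purely illustrative and not taken from either application):

#include <stdio.h>

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

int main(void)
{
    /* Hypothetical dispatch: 1000 workgroups of 8 invocations each,
     * packed 16 workgroups per supergroup. */
    unsigned wg_size = 8, wgs_per_sg = 16, num_wgs = 1000;

    unsigned batches_per_sg = DIV_ROUND_UP(wgs_per_sg * wg_size, 16); /* 8  */
    unsigned whole_sgs = num_wgs / wgs_per_sg;                        /* 62 */
    unsigned rem_wgs = num_wgs - whole_sgs * wgs_per_sg;              /* 8  */
    unsigned num_batches = batches_per_sg * whole_sgs +
                           DIV_ROUND_UP(rem_wgs * wg_size, 16);       /* 500 */

    printf("num_batches = %u\n", num_batches);
    return 0;
}

With a layout like this, a single SUBMIT_CSD job spans hundreds of batches, which is the granularity at which the driver's timeout progress check operates.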

@notogawa
Author

notogawa commented Mar 1, 2025

Yes, I also use the Mesa code only as a reference, and in practice, I determine the best parameters for us through experiments.

The code that reproduces our real workload is a compute shader with approximately 60,000 instructions, which we cannot publish due to confidentiality. Therefore, let me ask a question by way of an example of what our code can do. For instance, if you were to execute the entire computation described in this paper [1] within a single submit_csd, how would you configure the CSD parameters?

[1] https://arxiv.org/pdf/1905.02244

@pelwell
Contributor

pelwell commented Mar 2, 2025

Are you saying that you can't create an example of a compute shader large enough to demonstrate the problem (and the code to launch it) that isn't confidential?

@notogawa
Author

notogawa commented Mar 2, 2025

Yes. This should be clear from the discussion so far, but it's because "why the hang limit matters" is directly connected to "how to use the QPU(s) for maximum performance" in the type of computations described at #6660 (comment).

Of course, if it is obvious how to specify the workgroups, the number of batches, etc., so that a computation like a neural network model architecture such as #6660 (comment) fits within a single submit_csd, then I would appreciate your guidance. In that case, provided that the performance remains comparable, the hang limit might not need to be as long.

@mairacanal
Contributor

mairacanal commented Mar 2, 2025

I'm sorry, @notogawa, but I don't have experience with neural networks, so I don't think I have the background needed to implement a compute shader for the paper you suggested. Also, because of the confidentiality, I don't think I can help you configure the GPU in the best way for your case, as that would require knowledge of both the GPU's inner workings and your configuration.

As a kernel maintainer of the upstream driver, I appreciate that you found the issues related to the GPU reset (for which I sent fixes), but I wouldn't be comfortable increasing the timeout for CSD jobs (or adding a kernel parameter for it). Our kernel driver ensures that the GPU won't reset if we are making progress through the batches. From my point of view, your application is very specific [1], so I'd recommend you change the hang limit locally. But I'd appreciate hearing @pelwell's and @popcornmix's opinions.

[1] And I can't provide useful help without seeing the code or configuration.
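
For what it's worth, "change the hang limit locally" would come down to a small local kernel change along these lines (this assumes the limit in question is the 500 ms default in drivers/gpu/drm/v3d/v3d_sched.c, which is what the link in the original report appears to point at; the exact variable name and surrounding code may differ between kernel versions, so treat this as a sketch):

--- a/drivers/gpu/drm/v3d/v3d_sched.c
+++ b/drivers/gpu/drm/v3d/v3d_sched.c
@@ v3d_sched_init()
-	int hang_limit_ms = 500;
+	int hang_limit_ms = 10000;	/* example: tolerate ~10 s compute jobs */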

@notogawa
Author

notogawa commented Mar 3, 2025

Thank you for your consideration. That's unfortunate, but I understand.

Personally, I hope that compatibility-breaking changes - such as preventing code that was previously usable from user space from working (the Pi 4/VC6 is also affected by this issue) - will not be introduced in the future. This point could potentially influence hardware or OS selection.

If there is no further action to be taken, please close this issue on your end.

@pelwell
Contributor

pelwell commented Mar 3, 2025

If you can come up with something more persuasive than expecting us to understand the implications of a research paper, then you might have a chance; but until then, it's a no from me.

mairacanal added a commit to mairacanal/linux-rpi that referenced this issue Mar 6, 2025
In addition to the standard reset controller, V3D 7.x requires configuring
the V3D_SMS registers for proper power on/off and reset. Add the new
registers to `v3d_regs.h` and ensure they are properly configured during
device probing, removal, and reset.

This change fixes GPU reset issues on the Raspberry Pi 5 (BCM2712).
Without exposing these registers, a GPU reset causes the GPU to hang,
stopping any further job execution and freezing the desktop GUI. The same
issue occurs when unloading and loading the v3d driver.

Link: raspberrypi#6660
Signed-off-by: Maíra Canal <[email protected]>