CM5 without wifi hangs on reboot #6647

Open
nbuchwitz opened this issue Feb 4, 2025 · 39 comments

@nbuchwitz
Contributor

nbuchwitz commented Feb 4, 2025

Describe the bug

We stumbled over an issue where all CM5 modules without wifi seem to hang when rebooted. After some waiting the reboot completes, whereas all CM5 modules with wifi show no such delay (same base boards, same software). As the reboot even worked on a CM5 without wifi in some rare cases, I started to debug it further.

When reboot hangs:

Dec 01 13:28:51 RevPi systemd[1]: Shutting down.
Dec 01 13:28:51 RevPi systemd[1]: Using hardware watchdog 'Broadcom BCM2835 Watchdog timer', version 0, device /dev/watchdog0
Dec 01 13:28:51 RevPi systemd[1]: Watchdog running with a hardware timeout of 10min.
Dec 01 13:28:51 RevPi kernel: watchdog: watchdog0: watchdog did not stop!
Dec 01 13:28:51 RevPi systemd-shutdown[1]: Using hardware watchdog 'Broadcom BCM2835 Watchdog timer', version 0, device /dev/watchdog0
Dec 01 13:28:52 RevPi systemd-shutdown[1]: Watchdog running with a hardware timeout of 10min.
Dec 01 13:28:52 RevPi systemd-shutdown[1]: Syncing filesystems and block devices.
Dec 01 13:28:52 RevPi systemd-shutdown[1]: Sending SIGTERM to remaining processes...
Dec 01 13:28:52 RevPi systemd-journald[167]: Received SIGTERM from PID 1 (systemd-shutdow).
Dec 01 13:28:52 RevPi systemd-journald[167]: Journal stopped

When reboot works immediately:

Dec 01 13:29:57 RevPi136828 systemd[1]: Shutting down.
Dec 01 13:29:58 RevPi136828 systemd[1]: Using hardware watchdog 'Broadcom BCM2835 Watchdog timer', version 0, device /dev/watchdog0
Dec 01 13:29:58 RevPi136828 systemd[1]: Watchdog running with a hardware timeout of 10min.
Dec 01 13:29:58 RevPi136828 kernel: mmc1: Failed to initialize a non-removable card
Dec 01 13:29:58 RevPi136828 kernel: watchdog: watchdog0: watchdog did not stop!
Dec 01 13:29:58 RevPi136828 systemd-shutdown[1]: Using hardware watchdog 'Broadcom BCM2835 Watchdog timer', version 0, device /dev/watchdog0
Dec 01 13:29:58 RevPi136828 systemd-shutdown[1]: Watchdog running with a hardware timeout of 10min.
Dec 01 13:29:58 RevPi136828 systemd-shutdown[1]: Syncing filesystems and block devices.
Dec 01 13:29:58 RevPi136828 systemd-shutdown[1]: Sending SIGTERM to remaining processes...
Dec 01 13:29:58 RevPi136828 systemd-journald[174]: Received SIGTERM from PID 1 (systemd-shutdow).
Dec 01 13:29:58 RevPi136828 systemd-journald[174]: Journal stopped

The difference seems to be this line, which is always present when the reboot works:

Dec 01 13:29:58 RevPi136828 kernel: mmc1: Failed to initialize a non-removable card

So it looks like there might be an issue with the sdio/mmc1 interface, which is unused on the wifi-less variant of the CM5. To verify my suspicion I created a simple overlay which disables sdio2 completely:

[...]
       fragment@13 {
               target = <&sdio2>;
               __overlay__ {
                       status = "disabled";
               };
       };

With this the reboot has worked reliably in all tests so far. Even though it kind of works with a custom overlay, it looks wrong. It also is not a reliable solution for production, as during first boot only the cm5io dt loaded by the firmware is present and a subsequent reboot will fail very often.

Same works on CM4 with / without wifi (different overlay though, but should be irrelevant as it also happens with pure CM dt).

Any ideas / insights on this?

Steps to reproduce the behaviour

  1. Boot device with CM5 without wifi module
  2. sudo reboot

Device(s)

Raspberry Pi CM5

System

2024/09/23 14:02:56 
Copyright (c) 2012 Broadcom
version 26826259 (release) (embedded)

EEPROM release: 1727096576

Kernel: 6.6.74+rpt-rpi-v8

Logs

No response

Additional context

No response

@nbuchwitz
Contributor Author

nbuchwitz commented Feb 6, 2025

I did some further research and noticed that /sys/kernel/debug/mmc1/ios differs in good and bad cases:

pi@RevPi136828:~/debug$ diff --side-by-side working/mmc1_ios notworking/mmc1_ios 
clock:		0 Hz					      |	clock:		100000 Hz
vdd:		0 (invalid)				      |	actual clock:	100000 Hz
							      >	vdd:		21 (3.3 ~ 3.4 V)
bus mode:	2 (push-pull)					bus mode:	2 (push-pull)
chip select:	0 (don't care)					chip select:	0 (don't care)
power mode:	0 (off)					      |	power mode:	2 (on)
bus width:	0 (1 bits)					bus width:	0 (1 bits)
timing spec:	0 (legacy)					timing spec:	0 (legacy)
signal voltage:	0 (3.30 V)					signal voltage:	0 (3.30 V)
driver type:	0 (driver type B)				driver type:	0 (driver type B)

What could be the reason that power mode is set to on in the non-working (=hangs during reboot) case?

It also seems that if the power mode is set to on it is reset to off after approx. 53 seconds (see attached debug log, first line is date, then uptime in seconds and then mmc1_ios content)

debug.txt

After I performed a firmware update to 1737505011, the time after which the power mode is switched to off increased to ~83 seconds (~ +30 seconds; 1737983339 is about 10 seconds less).

debug-fw1737505011.txt

A downgrade to 1731427844 showed the same behavior as with 1727096576 (initial firmware on this compute module): power_mode is set to off after approx 53 seconds:

debug-fw1731427844.txt

Handover to the OS takes about 8-9 seconds, so I don't think the difference is caused by something like this.

So it seems to me that this might be a firmware related issue or at least it has some influence.

I also did some testing on a CM4 without wifi, and there /sys/kernel/debug/mmc1/ios shows that the interface is disabled correctly upon boot.
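
The ios dumps compared above can also be checked mechanically. The following is a minimal sketch, assuming the dump keeps the "power mode:" line format shown in the diff; the function name ios_power_mode is hypothetical, and the commented loop only imitates the layout of the attached debug logs (date, uptime, ios state).

```shell
#!/bin/sh
# Print the human-readable power mode ("on", "off", ...) from an mmc ios
# dump. The path is a parameter, so the function can be pointed at
# /sys/kernel/debug/mmc1/ios (needs root) or at a saved copy.
ios_power_mode() {
    # A matching line looks like: "power mode:<TAB>2 (on)"
    sed -n 's/^power mode:[[:space:]]*[0-9][0-9]*[[:space:]]*(\(.*\))[[:space:]]*$/\1/p' "$1"
}

# Hypothetical logging loop, one sample per second:
#   while sleep 1; do
#       printf '%s %s %s\n' "$(date +%T)" "$(cut -d' ' -f1 /proc/uptime)" \
#           "$(ios_power_mode /sys/kernel/debug/mmc1/ios)"
#   done
```

Run as root on the device, the loop would show roughly when the state flips from on to off.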

@pelwell
Contributor

pelwell commented Feb 10, 2025

Hi Nicolai, we'll look into disabling SDIO2 from the firmware for non-WiFi-enabled parts.

@nbuchwitz
Contributor Author

Thanks Phil for the update

@pelwell
Contributor

pelwell commented Feb 10, 2025

pieeprom_cm5nowifi.zip
Here's a trial build with a theoretical fix - it should disable sdio2 on a CM5 with no WiFi. I've tried it on a Pi 5 to confirm that it isn't completely broken, but I don't have a suitable CM5 to hand - the next task is to locate one.

@nbuchwitz
Contributor Author

Give me some minutes and I will test it, I have modules at hand ...

@nbuchwitz
Contributor Author

nbuchwitz commented Feb 10, 2025

I can confirm, mmc1 is gone with the test firmware:

pi@RevPi136828:~$ ls -d /sys/kernel/debug/mmc?
/sys/kernel/debug/mmc0
pi@RevPi136828:~$ rpi-eeprom-update 
BOOTLOADER: up to date
   CURRENT: Mon Feb 10 12:04:08 PM UTC 2025 (1739189048)
    LATEST: Wed Jan 22 12:16:51 AM UTC 2025 (1737505011)
   RELEASE: default (/usr/lib/firmware/raspberrypi/bootloader-2712/default)
            Use raspi-config to change the release.

Reboot is also working without hang / delay.

@pelwell
Contributor

pelwell commented Feb 10, 2025

Great. We'll get that merged, then into a release at some point.

@nbuchwitz
Contributor Author

Thanks. In the meantime I will do some thinking and come up with some tooling for our end of line tests, so we can update the modules in place.

@nbuchwitz
Contributor Author

Just a note for others which might need to work around the issue that the first reboot after firmware update still hangs (which is fine as we're still running the old firmware):

# set power to permanently on in order to avoid timeout of probe cycles
echo on | sudo tee /sys/class/mmc_host/mmc1/device/power/control

# unbind driver on mmc1
basename $(realpath /sys/class/mmc_host/mmc1/../..) | sudo tee /sys/bus/platform/drivers/sdhci-brcmstb/unbind
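
For scripted use (for example in an end-of-line test), the two commands above could be wrapped so that they do nothing once the fixed firmware has removed mmc1. This is only a sketch: the function names are hypothetical, and it assumes the usual sysfs layout where the host directory sits two levels below its platform device.

```shell
#!/bin/sh
# Resolve the platform device that backs an mmc host, e.g.
# /sys/class/mmc_host/mmc1 -> 1001100000.mmc. The host directory is a
# parameter so the lookup can be tried on a copy of the tree.
mmc_platform_device() {
    host_dir=$(readlink -f "$1") || return 1
    [ -d "$host_dir" ] || return 1
    # Assumed layout: <platform device>/mmc_host/mmcN, so go up two levels.
    basename "$(dirname "$(dirname "$host_dir")")"
}

# Hypothetical wrapper around the two commands above. It returns quietly
# when the host does not exist.
unbind_mmc_host() {
    host=/sys/class/mmc_host/$1
    dev=$(mmc_platform_device "$host") || return 0
    echo on | sudo tee "$host/device/power/control" >/dev/null
    echo "$dev" | sudo tee /sys/bus/platform/drivers/sdhci-brcmstb/unbind >/dev/null
}
```

Calling unbind_mmc_host mmc1 before the first reboot would then be safe on both old and new firmware.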

@pelwell
Contributor

pelwell commented Feb 11, 2025

It's odd that a non-WiFi CM5 is rebooting without issue for me. I've tried rebooting before the mmc1: Failed to initialize a non-removable card error message (which I don't always see), and I've tried afterwards. This is with the stock firmware 2024/09/23, and with the latest release (Wed 22 Jan 00:16:51 UTC 2025 (1737505011)). The worst I see is a stall of up to 40 seconds until the mmc driver gives up (mmc1: Failed to initialize a non-removable card).

The power mode difference is just an indicator of whether or not the kernel has given up on there being something on that SDIO bus - it turns off the power when it loses hope.

@nbuchwitz
Contributor Author

Yes, at some point the device is rebooting (after the driver gives up on mmc1). The issue (at least for us) is that this causes timeouts during the end-of-line test, as the system expects the DUT to reboot within a reasonable period. On CM5 this extra delay on reboot is (depending on how fast the provisioning of the HAT EEPROM was) up to 60 seconds, which will cause a timeout. Also noteworthy: on CM4 non-wifi variants this works without additional delay.

@pelwell
Contributor

pelwell commented Feb 11, 2025

The patch to disable sdio2 has been merged, so future EEPROM builds will include it. I do wonder though if the kernel retry mechanism can be adjusted to not take quite so long.

@nbuchwitz
Contributor Author

I do wonder though if the kernel retry mechanism can be adjusted to not take quite so long.

That was also what I was initially thinking when I raised this issue. I haven't had the time to dig deeper into what the differences between bcm2711 and bcm2712 are here, but from a first look they share at least the same driver for mmc1.

timg236 added a commit to timg236/rpi-eeprom that referenced this issue Feb 11, 2025

* recovery: Walk partitions to delete recovery.bin
  Previously, recovery.bin would fail to delete itself
  if the bootrom loaded recovery.bin where there are multiple FAT
  partitions and the first partition does not contain recovery.bin
  Update the rename code to walk the partition table to find
  the recovery.bin file to delete.
* pi5: Add config filter for simple boot variable expressions (experimental)
  Add support for a new bootloader/config.txt conditional filter
  which tests the partition, boot_count and boot_arg1 variables.
  Syntax (no spaces):
  ARG boot_arg1, boot_count or partition (EEPROM config stage only)
  [ARG=VALUE]      selected if (ARG == VALUE)
  [ARG&MASK]       selected if ((ARG & MASK) != 0)
  [ARG&MASK=VALUE] selected if ((ARG & MASK) == VALUE)
  [ARG<VALUE]      selected if (ARG < VALUE)
  [ARG>VALUE]      selected if (ARG > VALUE)
  where VALUE and MASK are unsigned integer constants and ARG
  corresponds to the value in the reset register before the
  config file is parsed.
* pi5: Add a boot-count bootloader variable (experimental)
  Store the boot-count in a reset register and increment just
  before the boot-order state-machine. The boot-count variable
  is visible via device-tree /proc/device-tree/chosen/bootloader/count
  and can be read/set via vcmailbox
  GET: sudo vcmailbox 0x0003008d 4 4 0
  SET to N: sudo vcmailbox 0x0003808d 4 4 N
* pi5: Add user-defined reboot argument (boot_arg1) (experimental)
  Add support for a user-defined boot parameter stored in a reset-safe
  scratch register on BCM2712.  This is visible via device-tree at
  /proc/device-tree/chosen/bootloader/arg1 and via vcmailboxes
  GET arg1: sudo vcmailbox 0x0003008c 8 8 1 0
  SET arg1 to 42: sudo vcmailbox 0x0003808c 8 8 1 42
  or via config.txt
  set_reboot_arg1=42
  The variable is NOT cleared automatically and will persist until
  a power-on-reset.
* Enable overriding of high partition numbers
  Previously, the PARTITION=N bootloader config setting would only
  be used at power on reset or if the partition number passed to
  reboot was zero.
  Change the behaviour so that the bootloader config PARTITION
  property can override the reboot partition number if the reboot
  parameter is > 31.
* Disable WiFi PMIC output on CM5 modules without WiFi
  Disable the 3.7V WiFi power supply on CM5 modules which do not have a
  WiFi module fitted. This fixes some stability issues where a CM5
  would shutdown due to a spurious over-voltage condition on the
  non-connected WiFi power supply.
* Add memory barrier to the mbox handler
  Firmware issue 1944 reports receiving kernel warnings about firmware
  requests where the status return code is 0. This should not be
  possible, as handle_mbox_property always sets the top bit of the return
  code, with the bottom bit indicating success or failure. If the firmware
  had died, the firmware driver would report a timeout due to the lack of
  a mailbox interrupt, and that isn't happening.
  See: raspberrypi/firmware#1944
* support dts files with size-cells of 2
  DTS files with a top-level #size-cells of 2 make a lot of sense for
  systems with a lot of RAM, but the firmware is currently inconsistent
  in its support for that. Fix up the other cases to honor #size-cells
  and #address-cells.
* Disable SDIO2 for CM5s without WiFi
  It has been observed that CM5s without WiFi hang on reboot. To prevent
  that, disable the sdio2 node on those devices.
  See: raspberrypi/linux#6647
* arm_dt: Use dtoverlay_enable_node
  Convert the open-coded DT node status changes to use the new dtoverlay
  method dtoverlay_enable_node.
* dtoverlay: Add dtoverlay_enable_node
  Add a helper function for setting the status of a node.
timg236 added a commit to raspberrypi/rpi-eeprom that referenced this issue Feb 11, 2025 (same commit message as above)
@pelwell
Contributor

pelwell commented Feb 11, 2025

The rescan code tries 3 different card types at 4 different clock frequencies. All of those tests involve timeouts of specific durations, so they shouldn't simply be shortened. The other approach would be to make the scanning interruptible at some granularity - at least between frequencies. There may be a way to mark that the interface is being shut down - perhaps using the rescan_disable flag - but it's not something I'd want to do hastily.
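
To make the shape of that idea concrete, here is a toy model of the scan, not driver code. The frequency values, the order of card types, and the function name are assumptions for illustration; it only counts how many probe attempts (each of which would cost a timeout) happen when a shutdown check sits between frequencies.

```shell
#!/bin/sh
# Toy model of the rescan loop: every frequency is tried with every card
# type, and a "shutting down" check sits between frequencies.
FREQS="400000 300000 200000 100000"   # placeholder values
CARD_TYPES="sdio sd mmc"              # placeholder order

# $1: number of completed frequencies after which shutdown is requested
#     (a value >= 4 models an uninterrupted scan).
# Prints the number of probe attempts made before the scan stopped.
rescan_attempts() {
    stop_after=$1
    done_freqs=0
    attempts=0
    for freq in $FREQS; do
        # Interruption point discussed above: bail out between frequencies.
        if [ "$done_freqs" -ge "$stop_after" ]; then
            break
        fi
        for card in $CARD_TYPES; do
            attempts=$((attempts + 1))   # one attempt, one timeout
        done
        done_freqs=$((done_freqs + 1))
    done
    echo "$attempts"
}
```

An uninterrupted scan makes 12 attempts in this model, while a shutdown request after the first frequency stops it at 3.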

@neonblind

same issue with Pi5

Feb 24 19:23:12 RaspberryPi5 systemd[1]: Using hardware watchdog 'Broadcom BCM2835 Watchdog timer', version 0, device /dev/watchdog0
Feb 24 19:23:12 RaspberryPi5 systemd[1]: Watchdog running with a hardware timeout of 10min.
Feb 24 19:23:12 RaspberryPi5 kernel: watchdog: watchdog0: watchdog did not stop!
Feb 24 19:23:12 RaspberryPi5 systemd-shutdown[1]: Using hardware watchdog 'Broadcom BCM2835 Watchdog timer', version 0, device /dev/watchdog0
Feb 24 19:23:12 RaspberryPi5 systemd-shutdown[1]: Watchdog running with a hardware timeout of 10min.
Feb 24 19:23:12 RaspberryPi5 systemd-shutdown[1]: Syncing filesystems and block devices.
Feb 24 19:23:12 RaspberryPi5 systemd-shutdown[1]: Sending SIGTERM to remaining processes...
Feb 24 19:23:12 RaspberryPi5 systemd-journald[292]: Received SIGTERM from PID 1 (systemd-shutdow).
Feb 24 19:23:12 RaspberryPi5 systemd-journald[292]: Journal stopped

@Muny

Muny commented Mar 17, 2025

This issue smells very similar to this: https://forums.raspberrypi.com/viewtopic.php?t=288866

Just a note for others which might need to work around the issue that the first reboot after firmware update still hangs (which is fine as we're still running the old firmware):

# set power to permanently on in order to avoid timeout of probe cycles
echo on | sudo tee /sys/class/mmc_host/mmc1/device/power/control

# unbind driver on mmc1
basename $(realpath /sys/class/mmc_host/mmc1/../..) | sudo tee /sys/bus/platform/drivers/sdhci-brcmstb/unbind

After running these two commands (with mmc0), I am able to shutdown my CM5Lite, booted off NVMe, no SD card inserted, with no hang. Though, the unbind takes ~24s intermittently (sometimes <100ms, sometimes 20-50s) which is not ideal.

My current workaround is to just disable the interface entirely with a dtoverlay...but it would be nice to be able to still have an SD card work.

/dts-v1/;
/plugin/;

/ {
    compatible = "brcm,bcm2712";

    fragment@0 {
        target = <&sdio1>;
        __overlay__ {
            status = "disabled";
        };
    };
};
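
For anyone who wants to try an overlay like this, a sketch of how the source could be generated and built follows. The function name and the file name disable-sdio1 are hypothetical; the dtc call and the overlays directory are the usual ones on Raspberry Pi OS, but check them against your own setup.

```shell
#!/bin/sh
# Emit overlay source that disables one node, given its device tree label
# (sdio1 and sdio2 are the labels used in this thread).
disable_node_overlay() {
    cat <<EOF
/dts-v1/;
/plugin/;

/ {
    compatible = "brcm,bcm2712";

    fragment@0 {
        target = <&$1>;
        __overlay__ {
            status = "disabled";
        };
    };
};
EOF
}

# Possible build and install steps (file name is hypothetical):
#   disable_node_overlay sdio1 > disable-sdio1.dts
#   dtc -@ -I dts -O dtb -o disable-sdio1.dtbo disable-sdio1.dts
#   sudo cp disable-sdio1.dtbo /boot/firmware/overlays/
#   then add "dtoverlay=disable-sdio1" to /boot/firmware/config.txt
```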

@nbuchwitz
Contributor Author

nbuchwitz commented Mar 18, 2025

There is already an overlay for this (it's called disable-wifi or wlan, I think). But this shouldn't be necessary with the firmware update. Did you already update the EEPROM on your CM5? If not: sudo rpi-eeprom-update -a

@hasan-akbulak

I am still having this problem with the Raspberry Pi Compute Module 5 with Linux 6.6.51, 6.6.74 and 6.12.19 from rpi-update. I also have the latest EEPROM from sudo rpi-eeprom-update -a. It works once after updating the Linux version, but after one more reboot it goes back to the same issue, where it is stuck on watchdog0 or systemd halt on both reboot and halt. This is a fresh install of Raspberry Pi OS Lite 64-bit from Raspberry Pi Imager. Can anybody help me with this issue?

@popcornmix
Collaborator

Report output of vcgencmd bootloader_version

@hasan-akbulak

root@raspberrypi:~# vcgencmd bootloader_version
2025/03/19 13:41:26
version cec1d3ae40f4a1cb24fe3c42d60153968695385b (release)
timestamp 1742391686
update-time 1742896386
capabilities 0x0000007f

@popcornmix
Collaborator

Okay, that should contain the fix referenced here.
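
A check like this can be scripted. The sketch below reads the timestamp line from vcgencmd bootloader_version output and compares it with the trial build tested earlier in this thread (1739189048). That cut-off is an assumption: an official release with the change may carry a different timestamp, and the function names are hypothetical.

```shell
#!/bin/sh
# Extract the build timestamp (seconds since the epoch) from
# `vcgencmd bootloader_version` output supplied on stdin.
bootloader_timestamp() {
    awk '$1 == "timestamp" { print $2 }'
}

# Assumption: any bootloader built at or after the trial build from this
# thread carries the sdio2 change.
FIX_EPOCH=1739189048

# Succeeds when the timestamp on stdin is at or after FIX_EPOCH.
bootloader_has_fix() {
    ts=$(bootloader_timestamp)
    [ -n "$ts" ] && [ "$ts" -ge "$FIX_EPOCH" ]
}

# Usage on a device:
#   vcgencmd bootloader_version | bootloader_has_fix && echo "fix expected"
```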

@hasan-akbulak

Whenever I try to shut down or reboot the Pi, it still stalls for between 20 and 50 seconds, and dmesg still reports mmc errors even when the SD card is disabled in config.txt. I don't know how to fix this issue, and I have also tried multiple IO boards with the same result.

@pelwell
Contributor

pelwell commented Mar 26, 2025

What does sudo vclog -m report?

@hasan-akbulak

tc@raspberrypi:~ $ sudo vclog -m
005414.426: Initial voltage 800000 temp 42226
005614.834: avs_2712: AVS pred 8945 894500 temp 42226
005618.442: vpred 894 mV +0
005632.134: FB framebuffer_swap 1
005651.534: Select resolution HDMI0/2 hotplug 1 max_mode 2
005667.959: HDMI0 edid block 0 offset 0
005670.339: 00ffffffffffff00410c55c17e7d0000
005676.011: 2a1e010380351e782a0565a756529c27
005681.684: 0f5054bfef00d1c0b300950081808140
005687.357: 81c001010101023a801871382d40582c
005693.030: 45000f282100001e2a4480a070382740
005698.703: 302035000f282100001a000000fc0050
005704.376: 484c2032343356370a202020000000fd
005710.049: 00324c1e5311000a2020202020200115
005728.097: HDMI0 edid block 1 offset 128
005730.654: 02031ef14b101f051404130312021101
005736.327: 230907078301000065030c0010008c0a
005742.000: d08a20e02d10103e96000f2821000018
005747.673: 011d007251d01e206e2855000f282100
005753.346: 001e8c0ad08a20e02d10103e96000f28
005759.018: 210000188c0ad090204031200c405500
005764.691: 0f282100001800000000000000000000
005770.364: 000000000000000000000000000000cd
005776.055: HDMI0: best-mode 2 (limit 2) 1920x1080 60 Hz CEA modes 3e001f80000000000000000000000000 extensions 1
005787.649: Select resolution HDMI1/2 hotplug 0 max_mode 2
005794.571: FB0 disp 0 max-fb 2 1920x1080 stride 3840 base 0x3f800000
006127.100: dtb_file 'bcm2712-rpi-cm5l-cm5io.dtb'
006204.752: Loaded overlay 'bcm2712d0'
006301.854: dtparam: i2c_arm=on
006318.480: dtparam: audio=on
006324.419: Unknown dtparam 'audio' - ignored
006353.119: Loaded overlay 'audioinjector-isolated-soundcard'
006459.728: Loaded overlay 'vc4-kms-v3d-pi5'
006570.091: Loaded overlay 'dwc2'
006571.952: dtparam: dr_mode=peripheral
006577.367: dtparam: pciex1_gen=3
006591.870: dtparam: uart0_console=true
006645.601: Loaded overlay 'disable-bt-pi5'
006666.529: Loaded overlay 'disable-wifi-pi5'
006669.455: dtparam: i2c_vc=on
006685.972: dtparam: i2c_arm=on
006759.391: Loaded overlay 'vc4-kms-v3d-pi5'
006906.559: Loaded overlay 'vc4-kms-dsi-waveshare-panel'
006910.434: dtparam: 7_0_inchC=true
006916.606: dtparam: i2c1=true
006920.790: dtparam: sd_poll_once=true
006929.620: Unknown dtparam 'sd_poll_once' - ignored
006933.151: dtparam: fan_temp0=40000
006943.016: dtparam: fan_temp0_hyst=5000
006950.342: dtparam: fan_temp0_speed=70
006971.193: dtparam: fan_temp1=50000
006978.208: dtparam: fan_temp1_hyst=5000
006985.578: dtparam: fan_temp1_speed=120
007006.353: dtparam: fan_temp2=60000
007013.405: dtparam: fan_temp2_hyst=5000
007020.806: dtparam: fan_temp2_speed=150
007041.643: dtparam: fan_temp3=75000
007048.726: dtparam: fan_temp3_hyst=5000
007056.213: dtparam: fan_temp3_speed=255
007077.000: dtparam: sd=off
007083.077: Unknown dtparam 'sd' - ignored
007442.377: RPM 9052, max RPM 9052
009190.107: Starting OS 9190 ms
009195.631: 00000040: -> 00000480
009197.484: 00000030: -> 00100080
009202.196: 00000034: -> 00100080
009206.909: 00000038: -> 00100080
009211.622: 0000003c: -> 00100080
009321.194: sdram: sdram refresh 2081->4162 (2)
069314.739: initial_turbo of 60 deactivated

@pelwell
Contributor

pelwell commented Mar 26, 2025

Thanks.

006645.601: Loaded overlay 'disable-bt-pi5'
006666.529: Loaded overlay 'disable-wifi-pi5'

These lines show that the firmware has detected your no-WiFi CM5 and disabled Bluetooth and WiFi (or at least attempted to).

The rest shows that you have several other overlays and parameters in there. Please remove them (or comment them out) for testing purposes.

@hasan-akbulak

hasan-akbulak commented Mar 26, 2025

I am sorry, these parameters were from a non-fresh install. Let me do a fresh install to remove any extra variables.

Everything is at default settings; I only did a sudo apt update and upgrade.

When rebooting, the issue persists. Here is my sudo vclog -m:

005410.141: Initial voltage 800000 temp 43875
005610.558: avs_2712: AVS pred 8945 894500 temp 44424
005614.166: vpred 894 mV +0
005627.756: FB framebuffer_swap 1
005647.141: Select resolution HDMI0/2 hotplug 1 max_mode 2
005663.564: HDMI0 edid block 0 offset 0
005665.944: 00ffffffffffff00410c55c17e7d0000
005671.617: 2a1e010380351e782a0565a756529c27
005677.290: 0f5054bfef00d1c0b300950081808140
005682.962: 81c001010101023a801871382d40582c
005688.635: 45000f282100001e2a4480a070382740
005694.308: 302035000f282100001a000000fc0050
005699.981: 484c2032343356370a202020000000fd
005705.654: 00324c1e5311000a2020202020200115
005723.702: HDMI0 edid block 1 offset 128
005726.260: 02031ef14b101f051404130312021101
005731.932: 230907078301000065030c0010008c0a
005737.605: d08a20e02d10103e96000f2821000018
005743.278: 011d007251d01e206e2855000f282100
005748.951: 001e8c0ad08a20e02d10103e96000f28
005754.624: 210000188c0ad090204031200c405500
005760.297: 0f282100001800000000000000000000
005765.969: 000000000000000000000000000000cd
005771.660: HDMI0: best-mode 2 (limit 2) 1920x1080 60 Hz CEA modes 3e001f80000000000000000000000000 extensions 1
005783.255: Select resolution HDMI1/2 hotplug 0 max_mode 2
005790.175: FB0 disp 0 max-fb 2 1920x1080 stride 3840 base 0x3f800000
006495.363: dtb_file 'bcm2712-rpi-cm5l-cm5io.dtb'
006576.053: Loaded overlay 'bcm2712d0'
006673.594: dtparam: audio=on
006682.439: Unknown dtparam 'audio' - ignored
006736.419: Loaded overlay 'vc4-kms-v3d-pi5'
006848.267: Loaded overlay 'dwc2'
006850.127: dtparam: dr_mode=host
007157.316: RPM 7824, max RPM 7824
008912.678: Starting OS 8912 ms
008918.200: 00000040: -> 00000480
008920.053: 00000030: -> 00100080
008924.765: 00000034: -> 00100080
008929.478: 00000038: -> 00100080
008934.191: 0000003c: -> 00100080
009043.765: sdram: sdram refresh 2081->4162 (2)
069008.455: initial_turbo of 60 deactivated

This is my version of raspberry pi os lite 64 bit:

tc@raspberrypi:~ $ sudo uname -a
Linux raspberrypi 6.6.74+rpt-rpi-2712 #1 SMP PREEMPT Debian 1:6.6.74-1+rpt1 (2025-01-27) aarch64 GNU/Linux

And here is my bootloader version:

tc@raspberrypi:~ $ vcgencmd bootloader_version
2025/03/19 13:41:26
version cec1d3ae40f4a1cb24fe3c42d60153968695385b (release)
timestamp 1742391686
update-time 1742896386
capabilities 0x0000007f

Fresh config.txt:

tc@raspberrypi:~ $ sudo nano /boot/firmware/config.txt
  GNU nano 7.2                              /boot/firmware/config.txt

camera_auto_detect=1

# Automatically load overlays for detected DSI displays
display_auto_detect=1

# Automatically load initramfs files, if found
auto_initramfs=1

# Enable DRM VC4 V3D driver
dtoverlay=vc4-kms-v3d
max_framebuffers=2

# Don't have the firmware create an initial video= setting in cmdline.txt.
# Use the kernel's default instead.
disable_fw_kms_setup=1

# Run in 64-bit mode
arm_64bit=1

# Disable compensation for displays with overscan
disable_overscan=1

# Run as fast as firmware / board allows
arm_boost=1

[cm4]
# Enable host mode on the 2711 built-in XHCI USB controller.
# This line should be removed if the legacy DWC2 controller is required
# (e.g. for USB device mode) or if USB support is not required.
otg_mode=1

[cm5]
dtoverlay=dwc2,dr_mode=host

[all]

Still hanging after watchdog0 (screenshot in the original issue).

1 out of 10 times it reboots instantly, but 9 out of 10 times it hangs for between 20 and 50 seconds. No SD card is inserted.
No idea what to do from here.

@hasan-akbulak

sorry i don't know how to fix the layout issue to make it more readable i am quite new to github

@hasan-akbulak

also just found this in dmesg:

[ 7.107845] Bluetooth: hci0: command 0xfc18 tx timeout
[ 7.107856] Bluetooth: hci0: BCM: failed to write update baudrate (-110)
[ 7.107858] Bluetooth: hci0: Failed to set baudrate
[ 9.123848] Bluetooth: hci0: command 0xfc18 tx timeout
[ 9.123859] Bluetooth: hci0: BCM: Reset failed (-110)

it isn't disabling bluetooth by default with my cm5 without wifi or bluetooth.

@pelwell
Contributor

pelwell commented Mar 26, 2025

What do these commands report?

$ od -An -tx4 --endian=big  /proc/device-tree/chosen/rpi-boardrev-ext
$ grep -a . /proc/device-tree/soc@107c000000/serial@7d50c000/status
$ grep -a . /proc/device-tree/axi/mmc@1100000/status

@pelwell
Contributor

pelwell commented Mar 26, 2025

[ I've added to the list of things to try ]

@hasan-akbulak

hasan-akbulak commented Mar 26, 2025

tc@raspberrypi:~ $ od -An -tx4 --endian=big  /proc/device-tree/chosen/rpi-boardrev-ext
 c0000000
tc@raspberrypi:~ $ grep -a . /proc/device-tree/soc@107c000000/serial@7d50c000/status
grep: /proc/device-tree/soc@107c000000/serial@7d50c000/status: No such file or directory
tc@raspberrypi:~ $ grep -a . /proc/device-tree/axi/mmc@1100000/status
okay
tc@raspberrypi:/proc/device-tree $ ls
'#address-cells'   cam_dummy_reg   cpus               memory@0        pwr_button       '#size-cells'
 aliases           chosen          dummy              memreserve      reserved-memory   soc
 arm-pmu           clk-108M        hvs@107c580000     model           rp1_firmware      __symbols__
 axi               clk-27M         i2c0if             name            rp1_vdd_3v3       system
 cam0_clk          clocks          i2c0mux            __overrides__   sd_io_1v8_reg     thermal-zones
 cam0_reg          compatible      interrupt-parent   phy             sd_vcc_reg        timer
 cam1_clk          cooling_fan     leds               psci            serial-number     wl_on_reg
tc@raspberrypi:/proc/device-tree $
tc@raspberrypi:~ $ grep -a . /proc/device-tree/soc/serial@7d50c000/status
okay
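
Since the node path differs between trees (soc@107c000000 versus soc above), a small helper can make these checks less error-prone. This is a sketch with a hypothetical function name. It relies on two device tree facts: property files under /proc/device-tree are NUL-terminated, and a node without a status property counts as enabled.

```shell
#!/bin/sh
# Print the effective status of a device tree node directory, e.g.
# /proc/device-tree/axi/mmc@1100000.
dt_node_status() {
    if [ ! -d "$1" ]; then
        # Wrong path, or the node was removed from the tree.
        echo "missing"
    elif [ -f "$1/status" ]; then
        # Strip the trailing NUL before printing.
        tr -d '\0' < "$1/status"
        echo
    else
        echo "okay (no status property)"
    fi
}

# Usage on a device:
#   dt_node_status /proc/device-tree/axi/mmc@1100000
#   dt_node_status /proc/device-tree/soc/serial@7d50c000
```

With the output shown above, the mmc@1100000 node would report okay.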

Do you know how I can entirely disable mmc0 for now? I have an SD card slot, which is nice, but I am using the NVMe drive for booting so I am currently not using the SD card slot. At least I think the problem is because of mmc0, because when an SD card is inserted the issue is gone, but without one it is there.

Also, it isn't disabling wifi as it did for the person who originally had the problem and updated the EEPROM of the Raspberry Pi.

tc@raspberrypi:~ $ ls -d /sys/kernel/debug/mmc?
/sys/kernel/debug/mmc0  /sys/kernel/debug/mmc1
tc@raspberrypi:~ $ sudo rpi-eeprom-update
BOOTLOADER: up to date
   CURRENT: Mon 10 Mar 17:10:37 UTC 2025 (1741626637)
    LATEST: Mon 10 Mar 17:10:37 UTC 2025 (1741626637)
   RELEASE: default (/usr/lib/firmware/raspberrypi/bootloader-2712/default)
            Use raspi-config to change the release.
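
The `No such file or directory` result above is ambiguous: a device-tree node with no `status` property is treated as enabled, while a mistyped node path (here `soc@107c000000` instead of `soc`) produces the same error. A minimal sketch of a helper that tells the two cases apart follows. The function name `dt_status` and the `DT_ROOT` override are hypothetical and exist only for illustration; the node paths are the ones used in this comment.

```shell
#!/bin/sh
# Print the effective status of a device-tree node, given its path
# relative to the device-tree root.
# DT_ROOT is a hypothetical override so the function can be tried
# against a directory other than the live /proc/device-tree.
dt_status() {
    node="${DT_ROOT:-/proc/device-tree}/$1"
    if [ ! -d "$node" ]; then
        # The node itself is missing, e.g. because of a wrong path.
        echo "absent"
    elif [ -f "$node/status" ]; then
        # Property values are NUL-terminated; strip the terminator.
        tr -d '\0' < "$node/status"
        echo
    else
        # A node without a status property counts as enabled.
        echo "okay"
    fi
}
```

For example, `dt_status axi/mmc@1100000` and `dt_status soc/serial@7d50c000` should report the state of the two nodes checked above, and `dt_status soc@107c000000/serial@7d50c000` should print `absent` on a tree where that path does not exist.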

@hasan-akbulak

hasan-akbulak commented Mar 26, 2025

When an SD card is inserted, dmesg shows this:

tc@raspberrypi:~ $ dmesg | grep mmc
[    3.117790] mmc1: CQHCI version 5.10
[    3.117855] mmc0: CQHCI version 5.10
[    3.164095] mmc0: SDHCI controller on 1000fff000.mmc [1000fff000.mmc] using ADMA 64-bit
[    3.274404] mmc0: new ultra high speed SDR104 SDXC card at address 0001
[    3.281482] mmcblk0: mmc0:0001 SD64G 58.2 GiB
[    3.286944]  mmcblk0:
[    3.289529] mmcblk0: mmc0:0001 SD64G 58.2 GiB
[    3.318350] mmc1: SDHCI controller on 1001100000.mmc [1001100000.mmc] using ADMA 64-bit

With this it reboots and shuts down properly :/

When testing with the card inserted at boot and removed before halting or rebooting, it hangs again:

tc@raspberrypi:~ $ dmesg | grep mmc
[    3.112370] mmc0: CQHCI version 5.10
[    3.116447] mmc1: CQHCI version 5.10
[    3.156154] mmc0: SDHCI controller on 1000fff000.mmc [1000fff000.mmc] using ADMA 64-bit
[    3.264926] mmc0: new ultra high speed SDR104 SDXC card at address 0001
[    3.272016] mmcblk0: mmc0:0001 SD64G 58.2 GiB
[    3.277525]  mmcblk0:
[    3.280112] mmcblk0: mmc0:0001 SD64G 58.2 GiB
[    3.310298] mmc1: SDHCI controller on 1001100000.mmc [1001100000.mmc] using ADMA 64-bit
[   18.627897] mmc0: card 0001 removed

When testing with the card removed at boot and inserted while rebooting, it also reboots and halts just fine.

I also noticed that it sometimes hung even with the SD card inserted. Adding dtoverlay=disable-wifi fixed that part of the issue, but I believe I still have issues with mmc0.
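
To compare captures like the ones above without reading every line, the card events can be reduced to one line per host. This is only a sketch: the function name `mmc_card_state` is made up for illustration, and the patterns are based solely on the dmesg lines quoted in this comment, so other kernels may word these messages differently.

```shell
#!/bin/sh
# Read kernel log text on stdin and print the last known card state
# ("present" or "removed") for each mmc host that reported a card event.
mmc_card_state() {
    awk '
        # Reduce "mmc0: new ... card" to just "mmc0".
        function host(s) { sub(/:.*/, "", s); return s }
        match($0, /mmc[0-9]+: new .* card at address/) {
            state[host(substr($0, RSTART))] = "present"
        }
        match($0, /mmc[0-9]+: card [0-9a-f]+ removed/) {
            state[host(substr($0, RSTART))] = "removed"
        }
        END { for (h in state) print h, state[h] }
    ' | sort
}
```

Usage: `dmesg | mmc_card_state`. An empty result means no host logged a card insertion since boot, which matches the case that hangs on reboot here.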

@pelwell
Contributor

pelwell commented Mar 27, 2025

I see you've gone back and added a lot of important information to your comments - this isn't great, because we don't get any notification that you've done so. The most significant addition is this one:

I am using the NVMe drive for booting, so I am currently not using the SD card slot. At least, I think the problem is caused by mmc0, because when an SD card is inserted the issue is gone, but without one it is there.

It seems as though your issue isn't really about the WiFi and BT interfaces any more, but rather it's the normal SD card that must be present in order to guarantee a prompt reboot. This is a variant of the same problem - that SD/MMC probing can be slow to interrupt - but the difference is that the firmware doesn't know you won't be using an SD card and therefore doesn't (and currently can't) disable the SD interface.

This isn't an issue on Pi 5 because there is a card detect signal, but not so on CM5, so some other approach is required. It would be simple to add a dtparam or dtoverlay to disable the mmc0 SD interface - getting the SD interface to give up quicker when rebooting is likely to be more difficult.

@hasan-akbulak

hasan-akbulak commented Mar 27, 2025

Yes, sorry for editing my posts and not making new ones; I was in the middle of testing and did not want to make 10 new posts under here. There should already exist a dtparam to disable the SD card, but it was not recognized on my system. Could you guide me in disabling the SD card, since it is non-essential? Should I make a new issue on this topic, or can we continue in this thread?

I also always notice that the problem goes away after a fresh install or after removing the WiFi module in config.txt, but it returns after at most two reboots. The Compute Module 5 should also have an SD detect pin wired directly to the SD card slot, so I am confused as to why this issue exists.

Also, I am currently testing with the Raspberry Pi CM5IO board and the Waveshare CM5IO PoE board. The results are the same on both.

pelwell added a commit to pelwell/linux that referenced this issue Mar 27, 2025
The CM5 lacks a card detect signal, so it can be useful to be able
to disable the external SD card slot (or onboard EMMC on a non-Lite
board).

See: raspberrypi#6647

Signed-off-by: Phil Elwell <phil@raspberrypi.com>
@pelwell
Contributor

pelwell commented Mar 27, 2025

The new (to CM5) dtparams sd and sd_poll_once are added by #6744. After about 40 minutes (once the build checks have completed) you'll be able to install a trial 6.12 kernel including the new dtparams using sudo rpi-update pulls/6744.

You would test it with dtparam=sd=off.
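
As a sketch of how that line could be added without duplicating it on repeated runs: the helper below appends a setting only if the exact line is not already present. The function name `add_config_line` is hypothetical, and the `/boot/firmware/config.txt` path in the usage note is an assumption about the installed image (some images keep the file at `/boot/config.txt`).

```shell
#!/bin/sh
# Append a line to a config file unless that exact line already exists.
#   $1: path to the config file
#   $2: line to add, e.g. dtparam=sd=off
add_config_line() {
    file="$1"
    line="$2"
    # -q: quiet, -x: match the whole line, -F: treat the pattern literally.
    # A missing file counts as "not present", so the line is then created.
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}
```

From a root shell this would be called as `add_config_line /boot/firmware/config.txt dtparam=sd=off`, followed by a reboot. Note that the line is appended at the end of the file, so if the file ends inside a conditional section it only applies under that filter; adding an `[all]` line first avoids that.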

@hasan-akbulak

Okay, I will prepare a fresh drive and try it on that. I will edit my post to say whether it works.

@hasan-akbulak

hasan-akbulak commented Mar 27, 2025

On a fresh install I keep getting this error after running sudo apt update (not upgrade) and then sudo rpi-update pulls/6744:

tc@raspberrypi:~ $ sudo rpi-update pulls/6674
 *** Raspberry Pi firmware updater by Hexxeh, enhanced by AndrewS and Dom
 *** Performing self-update
 *** Relaunching after update
 *** Raspberry Pi firmware updater by Hexxeh, enhanced by AndrewS and Dom
FW_REV:
BOOTLOADER_REV:7f66ffe483698e788858fe51000217849fa6331f
 *** We're running for the first time
 *** Backing up files (this will take a few minutes)
 *** Remove old firmware backup
 *** Backing up firmware
 *** Remove old modules backup
 *** Backing up modules 6.6.51+rpt-rpi-2712
WANT_32BIT:0 WANT_64BIT:1 WANT_64BIT_RT:0 WANT_PI4:1 WANT_PI5:1
Downloading bootloader tools
Downloading bootloader images
 *** Downloading specific artifact revision (this will take a few minutes)
curl  -L https://builds.raspberrypi.com/github/linux/f0cbcb9f227ce41cfcb487239e764a80491d07a2/bcmrpi | zcat | tar xf - -C //root/.rpi-firmware --strip-components=2
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100 24.6M  100 24.6M    0     0   517k      0  0:00:48  0:00:48 --:--:--  616k
curl  -L https://builds.raspberrypi.com/github/linux/f0cbcb9f227ce41cfcb487239e764a80491d07a2/bcm2709 | zcat | tar xf - -C //root/.rpi-firmware --strip-components=2
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100 25.3M  100 25.3M    0     0   567k      0  0:00:45  0:00:45 --:--:--  681k
curl  -L https://builds.raspberrypi.com/github/linux/f0cbcb9f227ce41cfcb487239e764a80491d07a2/bcm2711 | zcat | tar xf - -C //root/.rpi-firmware --strip-components=2
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100 26.7M  100 26.7M    0     0   746k      0  0:00:36  0:00:36 --:--:--  750k
curl  -L https://builds.raspberrypi.com/github/linux/f0cbcb9f227ce41cfcb487239e764a80491d07a2/bcm2711_arm64 | zcat | tar xf - -C //root/.rpi-firmware --strip-components=2
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100 28.8M  100 28.8M    0     0   681k      0  0:00:43  0:00:43 --:--:--  698k
curl  -L https://builds.raspberrypi.com/github/linux/f0cbcb9f227ce41cfcb487239e764a80491d07a2/bcm2711_rt | zcat | tar xf - -C //root/.rpi-firmware --strip-components=2
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0

gzip: stdin: unexpected end of file
tar: This does not look like a tar archive
tar: Exiting with failure status due to previous errors
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
Invalid artifact specified. Response: 404.
tc@raspberrypi:~ $

Edit: never mind, I noticed my mistake: I put the wrong number after pulls/. I should just copy it instead of typing it out.

@hasan-akbulak

Okay, after updating I am at kernel version 6.12.20-v8-16k+:

tc@raspberrypi:~ $ uname -a
Linux raspberrypi 6.12.20-v8-16k+ #1 SMP PREEMPT Thu Mar 27 15:21:01 UTC 2025 aarch64 GNU/Linux
tc@raspberrypi:~ $ dmesg | grepmmc
-bash: grepmmc: command not found
tc@raspberrypi:~ $ dmesg | grep mmc
[    3.288762] mmc0: CQHCI version 5.10
[    3.292862] mmc1: CQHCI version 5.10
[    3.331210] mmc0: SDHCI controller on 1000fff000.mmc [1000fff000.mmc] using ADMA 64-bit
[    3.485741] mmc1: SDHCI controller on 1001100000.mmc [1001100000.mmc] using ADMA 64-bit
[   65.436676] mmc1: Failed to initialize a non-removable card

WiFi is still not being disabled by default, which I find weird.

I am also getting a weird bug that I had earlier when running a regular sudo rpi-update: randomly, when running certain commands like nano, the CPU load spikes from 0 to 25%, stays there, and hangs my SSH connection.

Also, when booting it hangs at two spots for around 30 seconds:

Image

Image

I did confirm that using dtparam=sd=off removed mmc0 and that it reboots and shuts down without hanging. However, because of the instability and long boot time, I cannot currently use this build for regular use.

tc@raspberrypi:~ $ dmesg | grep mmc
[    3.293632] mmc1: CQHCI version 5.10
[    3.483757] mmc1: SDHCI controller on 1001100000.mmc [1001100000.mmc] using ADMA 64-bit
[   55.213158] mmc1: Failed to initialize a non-removable card

If this can be fixed in the next stable branch or patch update, or if you can tell me how to add it myself, that would be a huge help.

@hasan-akbulak
Copy link

hasan-akbulak commented Mar 27, 2025

It seems the hang after 3 seconds was fixed by disabling WiFi and Bluetooth. It still likes to hang randomly when using commands like dmesg and nano: when I run the command, it hangs and the CPU load goes to 25 percent. Probably something on the drive got corrupted after updating, which is why it is taking this long.

After idling for a while I get this in dmesg:

[  261.547233] nvme nvme0: I/O tag 662 (5296) opcode 0x1 (I/O Cmd) QID 2 timeout, aborting req_op:WRITE(1) size:8192
[  261.557722] nvme nvme0: Abort status: 0x0

So it does have something to do with the boot drive after updating.
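
To see whether these timeouts keep recurring, rather than catching them by eye, the matching lines can be counted. This is a rough sketch: the function name `nvme_timeout_count` is hypothetical, and the pattern is taken only from the single timeout line quoted above, so it may miss differently worded messages.

```shell
#!/bin/sh
# Read kernel log text on stdin and print how many NVMe I/O timeout
# messages it contains (0 if there are none).
nvme_timeout_count() {
    awk '/nvme[0-9]+: I\/O tag .* timeout/ { n++ } END { print n + 0 }'
}
```

Usage: `dmesg | nvme_timeout_count`, run once after boot and again after the system has idled for a while. A count that keeps growing would point at the drive or its link rather than at the SD interface.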

6 participants