@ycsin ycsin commented Nov 14, 2025

Add support for stacktrace in dummy thread which is used to run
the early system initialization code before the kernel switches
to the main thread.

On RISC-V, the dummy thread will be running temporarily on the
interrupt stack, but currently we do not initialize the stack
info for the dummy thread, hence check the address against the
interrupt stack.
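
The check described above can be pictured roughly as follows. This is a minimal, self-contained sketch, not the actual patch: `struct demo_thread`, the `is_dummy` flag, `demo_irq_stack`, and `demo_in_stack_bound()` are hypothetical stand-ins for the kernel's real types and symbols, which are not reproduced here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's thread bookkeeping. */
struct demo_stack_info {
	uintptr_t start; /* lowest address of the stack region */
	size_t size;     /* size of the region in bytes */
};

struct demo_thread {
	struct demo_stack_info stack_info;
	bool is_dummy; /* assumption: some way to recognise the dummy thread */
};

/* Hypothetical interrupt stack that the dummy thread borrows during init. */
static uint8_t demo_irq_stack[2048];

static bool demo_addr_in_region(uintptr_t addr, uintptr_t start, size_t size)
{
	return addr >= start && (addr - start) < size;
}

/*
 * Decide whether the unwinder may follow a frame pointer at `addr`.
 * The dummy thread has no stack info of its own, so the address is
 * checked against the interrupt stack instead.
 */
bool demo_in_stack_bound(const struct demo_thread *t, uintptr_t addr)
{
	if (t->is_dummy) {
		return demo_addr_in_region(addr, (uintptr_t)demo_irq_stack,
					   sizeof(demo_irq_stack));
	}
	return demo_addr_in_region(addr, t->stack_info.start, t->stack_info.size);
}
```

With a zero-initialized `stack_info`, the second branch would reject every address, which is consistent with the empty call trace in the "before" log.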


before:

ASSERTION FAIL [!k_is_pre_kernel()] @ ZEPHYR_BASE/include/zephyr/kernel.h:736
       k_current_get called pre-kernel
[0101_000000.000] <err> os: 
[0101_000000.000] <err> os:  mcause: 11, Environment call from M-mode
[0101_000000.000] <err> os:   mtval: 0
[0101_000000.000] <err> os:      a0: 0000000000000000    t0: 0000000000000000
[0101_000000.000] <err> os:      a1: 0000000000000000    t1: 0000000000000000
[0101_000000.000] <err> os:      a2: 0000000000000000    t2: 0000000000000000
[0101_000000.000] <err> os:      a3: 0000000000000000    t3: 0000000000000000
[0101_000000.000] <err> os:      a4: 0000000000000000    t4: 0000000000000000
[0101_000000.000] <err> os:      a5: 0000000000000000    t5: 0000000000000000
[0101_000000.000] <err> os:      a6: 0000000000000000    t6: 0000000000000000
[0101_000000.000] <err> os:      a7: 0000000000000000
[0101_000000.000] <err> os:      sp: 0000000000000000
[0101_000000.000] <err> os:      ra: 0000000000000000
[0101_000000.000] <err> os:    mepc: 0000000000000000
[0101_000000.000] <err> os: mstatus: 0000000000000000
[0101_000000.000] <err> os: 
[0101_000000.000] <err> os:      s0: 0000000000000000    s6: 0000000000000000
[0101_000000.000] <err> os:      s1: 0000000000000000    s7: 0000000000000000
[0101_000000.000] <err> os:      s2: 0000000000000000    s8: 0000000000000000
[0101_000000.000] <err> os:      s3: 0000000000000000    s9: 0000000000000000
[0101_000000.000] <err> os:      s4: 0000000000000000   s10: 0000000000000000
[0101_000000.000] <err> os:      s5: 0000000000000000   s11: 0000000000000000
[0101_000000.000] <err> os: 
[0101_000000.000] <err> os: call trace:
[0101_000000.000] <err> os: 
[0101_000000.000] <err> os: >>> ZEPHYR FATAL ERROR 4: Kernel panic on CPU 0
[0101_000000.000] <err> os: Current thread: 0x00000000 (unknown)
[0101_000000.000] <err> os:  mcause: 11, Environment call from M-mode
[0101_000000.000] <err> os: mdcause: 0, unknown
[0101_000000.000] <err> os:   mtval: 0
[0101_000000.000] <err> os: 
[0101_000000.000] <err> os: Halting system

after:

ASSERTION FAIL [!k_is_pre_kernel()] @ ZEPHYR_BASE/include/zephyr/kernel.h:736
       k_current_get called pre-kernel
[0101_000000.000] <err> os: 
[0101_000000.000] <err> os:  mcause: 11, Environment call from M-mode
[0101_000000.000] <err> os:   mtval: 0
[0101_000000.000] <err> os:      a0: 0000000000000000    t0: 0000000000000000
[0101_000000.000] <err> os:      a1: 0000000000000000    t1: 0000000000000000
[0101_000000.000] <err> os:      a2: 0000000000000000    t2: 0000000000000000
[0101_000000.000] <err> os:      a3: 0000000000000000    t3: 0000000000000000
[0101_000000.000] <err> os:      a4: 0000000000000000    t4: 0000000000000000
[0101_000000.000] <err> os:      a5: 0000000000000000    t5: 0000000000000000
[0101_000000.000] <err> os:      a6: 0000000000000000    t6: 0000000000000000
[0101_000000.000] <err> os:      a7: 0000000000000000
[0101_000000.000] <err> os:      sp: 0000000000000000
[0101_000000.000] <err> os:      ra: 0000000000000000
[0101_000000.000] <err> os:    mepc: 0000000000000000
[0101_000000.000] <err> os: mstatus: 0000000000000000
[0101_000000.000] <err> os: 
[0101_000000.000] <err> os:      s0: 0000000000000000    s6: 0000000000000000
[0101_000000.000] <err> os:      s1: 0000000000000000    s7: 0000000000000000
[0101_000000.000] <err> os:      s2: 0000000000000000    s8: 0000000000000000
[0101_000000.000] <err> os:      s3: 0000000000000000    s9: 0000000000000000
[0101_000000.000] <err> os:      s4: 0000000000000000   s10: 0000000000000000
[0101_000000.000] <err> os:      s5: 0000000000000000   s11: 0000000000000000
[0101_000000.000] <err> os: 
[0101_000000.000] <err> os: call trace:
[0101_000000.000] <err> os:       0: fp: 0000000000000000 ra: 0000000000000000 [assert_post_action+0x10]
[0101_000000.000] <err> os:       1: fp: 0000000000000000 ra: 0000000000000000 [k_current_get+0x4e]
[0101_000000.000] <err> os:       2: fp: 0000000000000000 ra: 0000000000000000 [k_current_get+0x4e]
[0101_000000.000] <err> os:       3: fp: 0000000000000000 ra: 0000000000000000 [sys_trace_thread_switched_out_user+0xc]
[0101_000000.000] <err> os:       4: fp: 0000000000000000 ra: 0000000000000000 [z_cstart+0x1e0]
[0101_000000.000] <err> os: 
[0101_000000.000] <err> os: >>> ZEPHYR FATAL ERROR 4: Kernel panic on CPU 0
[0101_000000.000] <err> os: Current thread: 0x00000000 (unknown)
[0101_000000.000] <err> os:  mcause: 11, Environment call from M-mode
[0101_000000.000] <err> os: mdcause: 0, unknown
[0101_000000.000] <err> os:   mtval: 0
[0101_000000.000] <err> os: 
[0101_000000.000] <err> os: Halting system

@ycsin ycsin marked this pull request as ready for review November 14, 2025 18:12
@zephyrbot zephyrbot added the area: RISCV RISCV Architecture (32-bit & 64-bit) label Nov 14, 2025
@npitre npitre left a comment

You might want to use is_thread_dummy(thread) instead.

Yet... this is legitimate only at boot time. Otherwise, after boot,
the dummy thread is masquerading as the real thread that is dying, and using
the IRQ stack isn't right in that case.

Why not simply initialize _thread_dummy.stack_info.start and
_thread_dummy.stack_info.size after the call to z_dummy_thread_init()
in kernel/init.c?
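
A rough sketch of that alternative, assuming nothing about the real kernel code: the types and the `init_` names below are hypothetical stand-ins, and `init_boot_stack` stands for whatever stack the architecture actually uses during early init.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins; not the kernel's real definitions. */
struct init_stack_info {
	uintptr_t start;
	size_t size;
};

struct init_thread {
	struct init_stack_info stack_info;
};

static uint8_t init_boot_stack[1024];
static struct init_thread init_dummy_thread;

/* Stand-in for the existing dummy thread setup: leaves stack info zeroed. */
static void init_dummy_thread_setup(struct init_thread *t)
{
	memset(t, 0, sizeof(*t));
}

/* Early init: set up the dummy thread, then record the stack it runs on. */
void init_early(void)
{
	init_dummy_thread_setup(&init_dummy_thread);
	init_dummy_thread.stack_info.start = (uintptr_t)init_boot_stack;
	init_dummy_thread.stack_info.size = sizeof(init_boot_stack);
}
```

With the stack info filled in once at boot, a generic bounds check would need no special case for the dummy thread.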

ycsin commented Nov 16, 2025

> You might want to use is_thread_dummy(thread) instead.

I didn't know that function existed. I removed the compiler guards so that it is always available.

> Yet... this is legitimate only at boot time. Otherwise, after boot,
> the dummy thread is masquerading as the real thread that is dying, and using
> the IRQ stack isn't right in that case.

Yeah, this PR basically special-cases the dummy thread during init. My understanding is that once we jump to the main thread, the dummy thread is dead and basically not used anymore, so this patch should be fine?

> Why not simply initialize _thread_dummy.stack_info.start and _thread_dummy.stack_info.size after the call to z_dummy_thread_init() in kernel/init.c?

Each architecture has a different name for the stack it uses for the dummy thread. I find handling that with #ifdefs in z_dummy_thread_init() kinda messy, unless I create new inline functions that every arch has to implement (i.e. z_dummy_thread_stack_init(uintptr_t start, size_t size)), which seems overkill, so in the end I decided to take this shortcut instead.
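
For illustration only, the per-arch hook idea mentioned above might look something like this. It is a sketch under assumptions: the `hook_` names are hypothetical, a single "arch" is shown, and this is not the shape of any existing kernel interface.

```c
#include <stddef.h>
#include <stdint.h>

struct hook_stack_region {
	uintptr_t start;
	size_t size;
};

struct hook_thread {
	struct hook_stack_region stack_info;
};

/* "Arch" side: every architecture would have to provide its own version. */
static uint8_t hook_arch_boot_stack[512];

static inline struct hook_stack_region hook_arch_boot_stack_region(void)
{
	struct hook_stack_region r = {
		.start = (uintptr_t)hook_arch_boot_stack,
		.size = sizeof(hook_arch_boot_stack),
	};
	return r;
}

/* Common side: no #ifdefs, but it depends on every arch having the hook. */
void hook_dummy_thread_stack_init(struct hook_thread *t)
{
	t->stack_info = hook_arch_boot_stack_region();
}
```

The common code stays free of #ifdefs, at the cost of a new function that each architecture must implement.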

Since these helper functions are read-only, mark the `thread`
arg as `const` so that we can pass const thread to it without
triggering warnings.

Signed-off-by: Yong Cong Sin <[email protected]>
Signed-off-by: Yong Cong Sin <[email protected]>
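
A small sketch of what the `const` change in this commit message enables. The struct, the flag value, and the `ro_` names are hypothetical stand-ins, not the kernel's definitions.

```c
#include <stdbool.h>

/* Hypothetical thread object and state flag. */
struct ro_thread {
	unsigned char thread_state;
};

#define RO_THREAD_DUMMY 0x01u

/* Read-only helper: takes a const pointer because it never modifies the thread. */
static inline bool ro_is_thread_dummy(const struct ro_thread *thread)
{
	return (thread->thread_state & RO_THREAD_DUMMY) != 0;
}

/*
 * A caller that only holds a const pointer can use the helper directly.
 * If the helper took a non-const pointer, this call would discard the
 * qualifier and draw a compiler diagnostic.
 */
bool ro_caller(const struct ro_thread *thread)
{
	return ro_is_thread_dummy(thread);
}
```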
@ycsin ycsin force-pushed the pr/riscv-stacktrace-dummy branch from ac8d2a4 to 4dc1f2d Compare November 16, 2025 18:07
@ycsin ycsin requested a review from npitre November 16, 2025 18:08
npitre commented Nov 17, 2025 via email

Add support for stacktrace in dummy thread which is used to run
the early system initialization code before the kernel switches
to the main thread.

On RISC-V, the dummy thread will be running temporarily on the
interrupt stack, but currently we do not initialize the stack
info for the dummy thread, hence check the address against the
interrupt stack.

Signed-off-by: Yong Cong Sin <[email protected]>
Signed-off-by: Yong Cong Sin <[email protected]>
@ycsin ycsin force-pushed the pr/riscv-stacktrace-dummy branch from 4dc1f2d to ee58a56 Compare November 17, 2025 17:38
@ycsin ycsin changed the title from "arch: riscv: stacktrace: support stacktrace in dummy thread" to "arch: riscv: stacktrace: support stacktrace in early system init" Nov 17, 2025
ycsin commented Nov 17, 2025

> Then it would be best if you made it explicit in the code with a comment.

Updated the patch with a comment and updated the commit message; hopefully that makes the intent of the code clear.
Thanks for the suggestions!

@nashif nashif merged commit 3c5807f into zephyrproject-rtos:main Nov 18, 2025
34 checks passed
@ycsin ycsin deleted the pr/riscv-stacktrace-dummy branch November 19, 2025 11:52
Labels

area: Kernel area: RISCV RISCV Architecture (32-bit & 64-bit)
