Illegal instruction when importing onnxruntime #23957

Open
feiticeir0 opened this issue Mar 9, 2025 · 18 comments · May be fixed by #23978
Labels
quantization issues related to quantization

Comments

@feiticeir0

Describe the issue

On a fresh 64-bit Raspberry Pi OS Bookworm install on a Raspberry Pi 4, after installing onnxruntime with pip in a virtual environment, importing it fails with:

(artStyle) pi@artsyStyle:~/artStyle $ python
Python 3.11.2 (main, Nov 30 2024, 21:22:50) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import onnxruntime
Illegal instruction

Here are the system specs:

Raspberry Pi OS:

cat /etc/os-release 
PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
NAME="Debian GNU/Linux"
VERSION_ID="12"
VERSION="12 (bookworm)"
VERSION_CODENAME=bookworm
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"

Python Version:

python -V
Python 3.11.2

ONNX Runtime version installed by pip:

onnxruntime-1.21.0-cp311-cp311-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl

To reproduce

  • Install a fresh copy of 64-bit Raspberry Pi OS Bookworm.
  • Create a virtual environment.
  • Install onnxruntime with pip.
  • Try to import it.
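
The last step can be checked without crashing the calling shell or script. This is a minimal sketch, not part of the original report: it runs the import in a child interpreter and reports whether the child was killed by SIGILL. The function name `import_crashes_with_sigill` is hypothetical.

```python
import signal
import subprocess
import sys


def import_crashes_with_sigill(module: str, python: str = sys.executable) -> bool:
    """Return True if importing `module` in a child interpreter dies with SIGILL."""
    # Run the import in a separate process so a crash cannot take down the caller.
    result = subprocess.run(
        [python, "-c", f"import {module}"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    # On POSIX, a return code of -N means the child was killed by signal N.
    return result.returncode == -signal.SIGILL
```

Calling `import_crashes_with_sigill("onnxruntime")` inside the virtual environment should return `True` on an affected setup. A missing package or any other failure gives `False`, so this only detects the specific crash described here.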

Urgency

I'm not saying it's urgent, but it's definitely an issue. Raspberry Pis are used at the edge all the time.

Platform

Linux

OS Version

Debian GNU/Linux 12 (bookworm)

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

onnxruntime-1.21.0-cp311-cp311-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl

ONNX Runtime API

Python

Architecture

ARM64

Execution Provider

Default CPU

Execution Provider Library Version

No response

@mikeesto

mikeesto commented Mar 9, 2025

I ran into this too. Seems like a regression with 1.21.0 as 1.20.1 works well.

If it helps anyone, this is the strace:

futex(0x7f9a633764, FUTEX_WAKE_PRIVATE, 2147483647) = 0
--- SIGILL {si_signo=SIGILL, si_code=ILL_ILLOPC, si_addr=0x7f9a7028f4} ---
+++ killed by SIGILL +++
Illegal instruction
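
Until a fix ships, a script can detect the reported combination before attempting the import. The sketch below is an illustration only. It assumes that 1.21.0 on aarch64 is the only affected combination, which is all this thread reports, and it cannot tell which aarch64 CPUs actually crash. `AFFECTED_VERSIONS` and both function names are hypothetical.

```python
import platform
from importlib import metadata
from typing import Optional

# Assumption: only the release reported in this thread is affected.
AFFECTED_VERSIONS = {"1.21.0"}


def is_possibly_affected(version: str, machine: str) -> bool:
    """Return True for the version/architecture combination reported here."""
    return version in AFFECTED_VERSIONS and machine == "aarch64"


def installed_onnxruntime_version() -> Optional[str]:
    """Read the installed version from package metadata, without importing it."""
    try:
        return metadata.version("onnxruntime")
    except metadata.PackageNotFoundError:
        return None


def should_skip_import() -> bool:
    """Combine the two checks above for the current machine."""
    version = installed_onnxruntime_version()
    return version is not None and is_possibly_affected(version, platform.machine())
```

Reading the version through `importlib.metadata` matters here because importing the package is exactly what crashes.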

@feiticeir0
Author

> I ran into this too. Seems like a regression with 1.21.0 as 1.20.1 works well.
>
> If it helps anyone, this is the strace:
>
> futex(0x7f9a633764, FUTEX_WAKE_PRIVATE, 2147483647) = 0
> --- SIGILL {si_signo=SIGILL, si_code=ILL_ILLOPC, si_addr=0x7f9a7028f4} ---
> +++ killed by SIGILL +++
> Illegal instruction

Thank you. The previous version works well!
No illegal instruction error.

@snnn
Member

snnn commented Mar 10, 2025

I uploaded a debug version to https://aiinfra.visualstudio.com/PublicPackages/_artifacts/feed/ORT-Nightly/PyPI/onnxruntime/overview/1.21.0 , could you please help me test it?

python3 -m pip install coloredlogs flatbuffers numpy packaging protobuf sympy 
python3 -m pip install -i https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ORT-Nightly/pypi/simple/ onnxruntime==1.21.0

Then please run your program with gdb and send me the stack trace.
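
For anyone unsure how to capture that stack trace, one possible approach is gdb's batch mode. The sketch below only builds the command line; it assumes gdb is installed, and the helper name `gdb_backtrace_command` is hypothetical.

```python
import shlex
import sys


def gdb_backtrace_command(python_code: str, python: str = sys.executable) -> list:
    """Build a gdb invocation that runs the code and prints a backtrace when it stops."""
    return [
        "gdb", "-batch",   # exit after the listed commands have run
        "-ex", "run",      # start the program under gdb
        "-ex", "bt",       # print the backtrace once the program stops on the signal
        "--args", python, "-c", python_code,
    ]


if __name__ == "__main__":
    # Print a copy-pasteable shell command for the failing import.
    print(shlex.join(gdb_backtrace_command("import onnxruntime")))
```

If the program exits normally, `bt` simply reports that there is no stack, so the same command works on both affected and unaffected builds.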

@snnn
Member

snnn commented Mar 10, 2025

The new package is about 440 MB.

@snnn
Member

snnn commented Mar 10, 2025

I have a Raspberry Pi 4, but it needs an HDMI adapter. I searched all over my house but couldn't find one.

@mikeesto

No problem. It looks like the version you uploaded works well:

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
Debug (cpuinfo): parsed kernel_max value of 255 from /sys/devices/system/cpu/kernel_max
Debug (cpuinfo): system maximum processors count: 256
Debug (cpuinfo): maximum possible processors count: 4
Debug (cpuinfo): maximum present processors count: 4
Debug (cpuinfo): parsed /proc/cpuinfo Revision = "b03114"
Debug (cpuinfo): unknown /proc/cpuinfo key: Model
Debug (cpuinfo): parsed processor 0 MIDR 0x410fd083
Debug (cpuinfo): parsed processor 1 MIDR 0x410fd083
Debug (cpuinfo): parsed processor 2 MIDR 0x410fd083
Debug (cpuinfo): parsed processor 3 MIDR 0x410fd083
Warning in cpuinfo: chipset detection failed: /proc/cpuinfo Hardware string did not match known signatures
Warning in cpuinfo: No SVE support on this machine
Debug (cpuinfo): parsed max frequency value of 1800000 KHz for logical processor 0 from /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq
Debug (cpuinfo): parsed min frequency value of 600000 KHz for logical processor 0 from /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_min_freq
Debug (cpuinfo): parsed package id value of 0 for logical processor 0 from /sys/devices/system/cpu/cpu0/topology/physical_package_id
Debug (cpuinfo): parsed max frequency value of 1800000 KHz for logical processor 1 from /sys/devices/system/cpu/cpu1/cpufreq/cpuinfo_max_freq
Debug (cpuinfo): parsed min frequency value of 600000 KHz for logical processor 1 from /sys/devices/system/cpu/cpu1/cpufreq/cpuinfo_min_freq
Debug (cpuinfo): parsed package id value of 0 for logical processor 1 from /sys/devices/system/cpu/cpu1/topology/physical_package_id
Debug (cpuinfo): parsed max frequency value of 1800000 KHz for logical processor 2 from /sys/devices/system/cpu/cpu2/cpufreq/cpuinfo_max_freq
Debug (cpuinfo): parsed min frequency value of 600000 KHz for logical processor 2 from /sys/devices/system/cpu/cpu2/cpufreq/cpuinfo_min_freq
Debug (cpuinfo): parsed package id value of 0 for logical processor 2 from /sys/devices/system/cpu/cpu2/topology/physical_package_id
Debug (cpuinfo): parsed max frequency value of 1800000 KHz for logical processor 3 from /sys/devices/system/cpu/cpu3/cpufreq/cpuinfo_max_freq
Debug (cpuinfo): parsed min frequency value of 600000 KHz for logical processor 3 from /sys/devices/system/cpu/cpu3/cpufreq/cpuinfo_min_freq
Debug (cpuinfo): parsed package id value of 0 for logical processor 3 from /sys/devices/system/cpu/cpu3/topology/physical_package_id
Debug (cpuinfo): processor 0 clustered with processor 0 as inferred from system siblings lists
Debug (cpuinfo): processor 1 clustered with processor 1 as inferred from system siblings lists
Debug (cpuinfo): processor 2 clustered with processor 2 as inferred from system siblings lists
Debug (cpuinfo): processor 3 clustered with processor 3 as inferred from system siblings lists
Debug (cpuinfo): detected 4 core clusters
Debug (cpuinfo): post-analysis processor 0: MIDR 410fd083 frequency 1800000
Debug (cpuinfo): post-analysis processor 1: MIDR 410fd083 frequency 1800000
Debug (cpuinfo): post-analysis processor 2: MIDR 410fd083 frequency 1800000
Debug (cpuinfo): post-analysis processor 3: MIDR 410fd083 frequency 1800000
Debug (cpuinfo): post-sort processor 0: system id 3 MIDR 410fd083 frequency 1800000
Debug (cpuinfo): post-sort processor 1: system id 2 MIDR 410fd083 frequency 1800000
Debug (cpuinfo): post-sort processor 2: system id 1 MIDR 410fd083 frequency 1800000
Debug (cpuinfo): post-sort processor 3: system id 0 MIDR 410fd083 frequency 1800000
[New Thread 0x7fe9aef180 (LWP 2129)]
[New Thread 0x7fe92df180 (LWP 2130)]
[New Thread 0x7fe8acf180 (LWP 2131)]
[Thread 0x7fe8acf180 (LWP 2131) exited]
[Thread 0x7fe92df180 (LWP 2130) exited]
[Thread 0x7fe9aef180 (LWP 2129) exited]
[Inferior 1 (process 2125) exited normally]
(gdb)
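
The MIDR value in the log above, `0x410fd083`, identifies the CPU. As a sketch of how to read it, the code below splits the value into the standard MIDR fields. The small part-number table covers only the cores mentioned in this thread, and the names `Midr`, `decode_midr`, and `ARM_PARTS` are hypothetical.

```python
from typing import NamedTuple


class Midr(NamedTuple):
    implementer: int
    variant: int
    architecture: int
    part: int
    revision: int


# Part numbers for Arm-implemented cores (implementer 0x41) named in this thread.
ARM_PARTS = {0xD03: "Cortex-A53", 0xD08: "Cortex-A72", 0xD0B: "Cortex-A76"}


def decode_midr(midr: int) -> Midr:
    """Split a 32-bit MIDR value into its architectural fields."""
    return Midr(
        implementer=(midr >> 24) & 0xFF,  # bits [31:24]
        variant=(midr >> 20) & 0xF,       # bits [23:20]
        architecture=(midr >> 16) & 0xF,  # bits [19:16]
        part=(midr >> 4) & 0xFFF,         # bits [15:4]
        revision=midr & 0xF,              # bits [3:0]
    )


fields = decode_midr(0x410FD083)
print(hex(fields.implementer), ARM_PARTS.get(fields.part, "unknown"))
```

For this log the implementer is `0x41` (Arm) and the part is `0xd08`, a Cortex-A72, which matches a Raspberry Pi 4.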

@snnn
Member

snnn commented Mar 10, 2025

I found a device that can reproduce the issue.

@snnn
Member

snnn commented Mar 10, 2025

Here is the stacktrace:

#0  0x0000ffffe9073df4 in _sub_I_65535_0.0 ()
   from /home/chasun/.local/lib/python3.11/site-packages/onnxruntime/capi/onnxruntime_pybind11_state.cpython-311-aarch64-linux-gnu.so
#1  0x0000fffff7fc2918 in call_init (env=0xfffffffff3a0, argv=0xfffffffff388, argc=2, l=<optimized out>)
    at dl-init.c:70
#2  call_init (l=<optimized out>, argc=2, argv=0xfffffffff388, env=0xfffffffff3a0) at dl-init.c:26
#3  0x0000fffff7fc2a1c in _dl_init (main_map=0xaaaaaab2c780, argc=2, argv=0xfffffffff388, env=0xfffffffff3a0)
    at dl-init.c:117
#4  0x0000fffff79d53e0 in _dl_catch_exception () from /lib64/libc.so.6
#5  0x0000fffff7fc8ad0 in dl_open_worker (a=0xffffffffd378) at dl-open.c:808
#6  0x0000fffff79d5384 in _dl_catch_exception () from /lib64/libc.so.6
#7  0x0000fffff7fc8e90 in _dl_open (
    file=0xffffea0ab1b0 "/home/chasun/.local/lib/python3.11/site-packages/onnxruntime/capi/onnxruntime_pybind11_state.cpython-311-aarch64-linux-gnu.so", mode=-2147483646, caller_dlopen=0xfffff7caacc4 <_PyImport_FindSharedFuncptr+148>,
    nsid=-2, argc=2, argv=0xfffffffff388, env=0xfffffffff3a0) at dl-open.c:884
#8  0x0000fffff7920cd8 in dlopen_doit () from /lib64/libc.so.6
#9  0x0000fffff79d5384 in _dl_catch_exception () from /lib64/libc.so.6
#10 0x0000fffff79d5454 in _dl_catch_error () from /lib64/libc.so.6
#11 0x0000fffff7920704 in _dlerror_run () from /lib64/libc.so.6
#12 0x0000fffff7920dcc in dlopen@GLIBC_2.17 () from /lib64/libc.so.6
#13 0x0000fffff7caacc4 in _PyImport_FindSharedFuncptr () from /lib64/libpython3.11.so.1.0
#14 0x0000fffff7ca8ac0 in _imp_create_dynamic () from /lib64/libpython3.11.so.1.0
#15 0x0000fffff7be6d8c in cfunction_vectorcall_FASTCALL () from /lib64/libpython3.11.so.1.0
#16 0x0000fffff7bca968 in _PyEval_EvalFrameDefault () from /lib64/libpython3.11.so.1.0
#17 0x0000fffff7bc085c in _PyEval_Vector () from /lib64/libpython3.11.so.1.0
#18 0x0000fffff7bdf57c in object_vacall () from /lib64/libpython3.11.so.1.0
#19 0x0000fffff7c0d77c in PyObject_CallMethodObjArgs () from /lib64/libpython3.11.so.1.0
#20 0x0000fffff7c0a1f0 in PyImport_ImportModuleLevelObject () from /lib64/libpython3.11.so.1.0
#21 0x0000fffff7bcbd80 in _PyEval_EvalFrameDefault () from /lib64/libpython3.11.so.1.0
#22 0x0000fffff7bc085c in _PyEval_Vector () from /lib64/libpython3.11.so.1.0
#23 0x0000fffff7c6755c in PyEval_EvalCode () from /lib64/libpython3.11.so.1.0
#24 0x0000fffff7c8a970 in builtin_exec () from /lib64/libpython3.11.so.1.0
#25 0x0000fffff7bd66a4 in cfunction_vectorcall_FASTCALL_KEYWORDS () from /lib64/libpython3.11.so.1.0
#26 0x0000fffff7bca968 in _PyEval_EvalFrameDefault () from /lib64/libpython3.11.so.1.0
#27 0x0000fffff7bc085c in _PyEval_Vector () from /lib64/libpython3.11.so.1.0
#28 0x0000fffff7bdf57c in object_vacall () from /lib64/libpython3.11.so.1.0
#29 0x0000fffff7c0d77c in PyObject_CallMethodObjArgs () from /lib64/libpython3.11.so.1.0
#30 0x0000fffff7c0a1f0 in PyImport_ImportModuleLevelObject () from /lib64/libpython3.11.so.1.0
#31 0x0000fffff7bcbd80 in _PyEval_EvalFrameDefault () from /lib64/libpython3.11.so.1.0
#32 0x0000fffff7bc085c in _PyEval_Vector () from /lib64/libpython3.11.so.1.0
#33 0x0000fffff7c6755c in PyEval_EvalCode () from /lib64/libpython3.11.so.1.0
#34 0x0000fffff7c8a970 in builtin_exec () from /lib64/libpython3.11.so.1.0
#35 0x0000fffff7bd66a4 in cfunction_vectorcall_FASTCALL_KEYWORDS () from /lib64/libpython3.11.so.1.0
#36 0x0000fffff7bca968 in _PyEval_EvalFrameDefault () from /lib64/libpython3.11.so.1.0
#37 0x0000fffff7bc085c in _PyEval_Vector () from /lib64/libpython3.11.so.1.0
#38 0x0000fffff7bdf57c in object_vacall () from /lib64/libpython3.11.so.1.0
#39 0x0000fffff7c0d77c in PyObject_CallMethodObjArgs () from /lib64/libpython3.11.so.1.0
#40 0x0000fffff7c0a1f0 in PyImport_ImportModuleLevelObject () from /lib64/libpython3.11.so.1.0
#41 0x0000fffff7bcbd80 in _PyEval_EvalFrameDefault () from /lib64/libpython3.11.so.1.0
#42 0x0000fffff7bc085c in _PyEval_Vector () from /lib64/libpython3.11.so.1.0
#43 0x0000fffff7c6755c in PyEval_EvalCode () from /lib64/libpython3.11.so.1.0
#44 0x0000fffff7c940e0 in run_eval_code_obj () from /lib64/libpython3.11.so.1.0
#45 0x0000fffff7c8eb88 in run_mod () from /lib64/libpython3.11.so.1.0
#46 0x0000fffff7cad5c0 in pyrun_file () from /lib64/libpython3.11.so.1.0
#47 0x0000fffff7cac800 in _PyRun_SimpleFileObject () from /lib64/libpython3.11.so.1.0
#48 0x0000fffff7cac334 in _PyRun_AnyFileObject () from /lib64/libpython3.11.so.1.0
#49 0x0000fffff7ca38c8 in Py_RunMain () from /lib64/libpython3.11.so.1.0
#50 0x0000fffff7c54240 in Py_BytesMain () from /lib64/libpython3.11.so.1.0

@snnn
Member

snnn commented Mar 10, 2025

I believe it's an fp16 instruction.
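
One way to check this hypothesis on a given board is to look at the `Features` line in `/proc/cpuinfo`. The sketch below assumes the faulting instruction is half-precision arithmetic, which this thread suggests but does not confirm. On Linux arm64 the relevant flags are `fphp` and `asimdhp`. The function names and the sample feature strings are illustrative, not taken from the reporters' machines.

```python
def cpu_features(cpuinfo_text: str) -> set:
    """Collect the flags from every 'Features' line of /proc/cpuinfo text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Features":
            flags.update(value.split())
    return flags


def has_fp16_arithmetic(cpuinfo_text: str) -> bool:
    """True if both scalar and Advanced SIMD half-precision flags are present."""
    return {"fphp", "asimdhp"} <= cpu_features(cpuinfo_text)


# Illustrative inputs: an ARMv8.0-style flag list and one that includes fp16.
without_fp16 = "processor\t: 0\nFeatures\t: fp asimd evtstrm crc32 cpuid\n"
with_fp16 = "processor\t: 0\nFeatures\t: fp asimd crc32 atomics fphp asimdhp cpuid\n"
print(has_fp16_arithmetic(without_fp16), has_fp16_arithmetic(with_fp16))
```

On a real system the input would come from `open("/proc/cpuinfo").read()`.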

@snnn
Member

snnn commented Mar 10, 2025

Most likely it was because of PR #23597

@jywu-msft
Member

@fajin-corp

@fajin-corp
Contributor

Here is the fix: #23978

@feiticeir0
Author

> I uploaded a debug version to https://aiinfra.visualstudio.com/PublicPackages/_artifacts/feed/ORT-Nightly/PyPI/onnxruntime/overview/1.21.0 , could you please help me test it?
>
> python3 -m pip install coloredlogs flatbuffers numpy packaging protobuf sympy
> python3 -m pip install -i https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ORT-Nightly/pypi/simple/ onnxruntime==1.21.0
>
> Then when please run your program with gdb, and send me the stacktrace.

I'm sorry for the late reply. I somehow missed these emails from GitHub.
I assume it's no longer necessary to try it...

@snnn
Member

snnn commented Mar 11, 2025

I uploaded a new version to the nightly feed:
https://aiinfra.visualstudio.com/PublicPackages/_artifacts/feed/ORT-Nightly/PyPI/onnxruntime/overview/1.22.0.dev20250310006

The version number is 1.22.0.dev20250310006. You may install it by using the following command:

python3 -m pip install -i https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ORT-Nightly/pypi/simple/ onnxruntime==1.22.0.dev20250310006

The package was built from @fajin-corp 's dev branch with the bug fix. @feiticeir0 / @mikeesto , would you please help verify it?

@feiticeir0
Author

> I updated a new version to the nightly feed: https://aiinfra.visualstudio.com/PublicPackages/_artifacts/feed/ORT-Nightly/PyPI/onnxruntime/overview/1.22.0.dev20250310006
>
> The version number is 1.22.0.dev20250310006. You may install it by using the following command:
>
> python3 -m pip install -i https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ORT-Nightly/pypi/simple/ onnxruntime==1.22.0.dev20250310006
>
> The package was built from @fajin-corp 's dev branch with the bug fix. @feiticeir0 / @mikeesto , would you please help verify it?

I tried to install it, but pip asks for a repository username when it tries to install numpy (I know because I canceled the prompt and this is the resulting error):

[Two screenshots attached to the original comment; not preserved here]

@snnn
Member

snnn commented Mar 11, 2025

Please run

python3 -m pip install coloredlogs flatbuffers numpy packaging protobuf sympy

before running that installation command.
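
The order matters because `-i` replaces pip's default index, so any dependency that is not already installed gets looked up on the nightly feed instead of PyPI. A minimal sketch of the two-step install follows. It only builds the commands; the package list and index URL are copied from this thread, and the helper name `install_commands` is hypothetical.

```python
import sys

# Index URL and dependency list as given earlier in this thread.
NIGHTLY_INDEX = (
    "https://aiinfra.pkgs.visualstudio.com/PublicPackages/"
    "_packaging/ORT-Nightly/pypi/simple/"
)
DEPENDENCIES = ["coloredlogs", "flatbuffers", "numpy", "packaging", "protobuf", "sympy"]


def install_commands(version: str, python: str = sys.executable) -> list:
    """Return the two pip invocations in the order they should run."""
    # Step 1: dependencies come from the default index (PyPI).
    deps = [python, "-m", "pip", "install", *DEPENDENCIES]
    # Step 2: only onnxruntime itself is requested from the nightly feed.
    ort = [python, "-m", "pip", "install", "-i", NIGHTLY_INDEX, f"onnxruntime=={version}"]
    return [deps, ort]
```

Each command could then be passed to `subprocess.run(cmd, check=True)`, stopping at the first failure.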

@feiticeir0
Author

feiticeir0 commented Mar 12, 2025

> Please run
>
> python3 -m pip install coloredlogs flatbuffers numpy packaging protobuf sympy
>
> before running that installation command

It works. I can import it without any errors.

@snnn
Member

snnn commented Mar 12, 2025

Thanks

snnn added the quantization (issues related to quantization) label on Mar 12, 2025
NeonDaniel added a commit to NeonGeckoCom/NeonCore that referenced this issue Mar 12, 2025
# Description
Update core_modules to validate latest alpha version compat.

# Issues
Blacklists an onnxruntime version to mitigate
microsoft/onnxruntime#23957

# Other Notes
Based on [this run from an old tag
failing](https://github.com/NeonGeckoCom/NeonCore/actions/runs/13816049514/job/38649389777),
it appears that the failures are related to some change in GitHub
Actions runners or some low-level dependency change.

Cause for Pi image failures is unknown, but updating to a newer CPU arch
in the shared action appears to resolve the issue. Note that the
previously spec'd A53 was used in the RPi3 and the now spec'd A76 is
used in the RPi5

Issue was traced back to a new onnxruntime release
5 participants