
@anutosh491
Collaborator

Description

Discovered this while moving to emscripten 4-x for xeus-cpp.

Problem: the following works in our native kernel but not in our wasm kernel:

  1. native: [screenshot]
  2. wasm: [screenshot]

Type of change

Please tick all options which are relevant.

  • Bug fix
  • New feature
  • Added/removed dependencies
  • Required documentation updates

@anutosh491
Collaborator Author

anutosh491 commented Nov 11, 2025

What's happening here?

On native builds, Clang discovers <xlocale.h> automatically through the system SDK or libc include paths.
In the Emscripten sysroot, however, legacy headers like <xlocale.h> live under sysroot/include/compat, which isn't part of Clang's default search list.
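To make the failure mode concrete, here is a minimal C++ sketch of angle-bracket header lookup: the search directories are tried in order and the first one containing the header wins. The helper name `find_angle_include` and the in-memory file set (standing in for real filesystem checks) are hypothetical and only illustrate the idea.

```cpp
#include <cassert>
#include <optional>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of angle-bracket (#include <...>) lookup.
// `existing_files` stands in for the filesystem: a set of full paths
// that "exist". Real Clang checks each candidate path on disk instead.
std::optional<std::string> find_angle_include(
    const std::vector<std::string>& search_dirs,
    const std::set<std::string>& existing_files,
    const std::string& header) {
  // Directories are tried strictly in order; the first hit wins.
  for (const std::string& dir : search_dirs) {
    const std::string candidate = dir + "/" + header;
    if (existing_files.count(candidate) != 0) {
      return candidate;
    }
  }
  // No directory had it: the "'xlocale.h' file not found" case.
  return std::nullopt;
}
```

With a search list containing only `<sysroot>/include`, looking up `xlocale.h` returns nothing; adding `<sysroot>/include/compat` to the list makes it resolve.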

How emcc does it

anutosh491@Anutoshs-MacBook-Air xeus-cpp % cat xlocale_test.c
#include <stdio.h>
#include <locale.h>
#include <xlocale.h>

int main(void) {
    locale_t loc = newlocale(LC_NUMERIC_MASK, "C", NULL);

    const char *s = "3.14159";
    long double v = strtold_l(s, NULL, loc);

    printf("Parsed value: %.5Lf\n", v);

    freelocale(loc);
    return 0;
}

anutosh491@Anutoshs-MacBook-Air xeus-cpp % emcc -v xlocale_test.c -o xlocale_test.html
 /Users/anutosh491/work/emsdk/upstream/bin/clang -target wasm32-unknown-emscripten -fignore-exceptions -mllvm -combiner-global-alias-analysis=false -mllvm -enable-emscripten-sjlj -mllvm -disable-lsr --sysroot=/Users/anutosh491/work/emsdk/upstream/emscripten/cache/sysroot -DEMSCRIPTEN -Xclang -iwithsysroot/include/fakesdl -Xclang -iwithsysroot/include/compat -v -c xlocale_test.c -o /var/folders/m1/cdn74f917994jd99d_2cpf440000gn/T/emscripten_temp_v4f2uvqa/xlocale_test.o
clang version 22.0.0git (https:/github.com/llvm/llvm-project c13ac9cadf1f9b4fa886b82d1e84a5feb0439023)
Target: wasm32-unknown-emscripten
Thread model: posix
InstalledDir: /Users/anutosh491/work/emsdk/upstream/bin
 (in-process)
 "/Users/anutosh491/work/emsdk/upstream/bin/clang-22" -cc1 -triple wasm32-unknown-emscripten -emit-obj -disable-free -clear-ast-before-backend -disable-llvm-verifier -discard-value-names -main-file-name xlocale_test.c -mrelocation-model static -mframe-pointer=none -ffp-contract=on -fno-rounding-math -mconstructor-aliases -target-cpu generic -fvisibility=hidden -debugger-tuning=gdb -fdebug-compilation-dir=/Users/anutosh491/work/xeus-cpp -v -fcoverage-compilation-dir=/Users/anutosh491/work/xeus-cpp -resource-dir /Users/anutosh491/work/emsdk/upstream/lib/clang/22 -D EMSCRIPTEN -isysroot /Users/anutosh491/work/emsdk/upstream/emscripten/cache/sysroot -internal-isystem /Users/anutosh491/work/emsdk/upstream/lib/clang/22/include -internal-isystem /Users/anutosh491/work/emsdk/upstream/emscripten/cache/sysroot/include/wasm32-emscripten -internal-isystem /Users/anutosh491/work/emsdk/upstream/emscripten/cache/sysroot/include -ferror-limit 19 -fmessage-length=141 -fgnuc-version=4.2.1 -fskip-odr-check-in-gmf -fignore-exceptions -fcolor-diagnostics -iwithsysroot/include/fakesdl -iwithsysroot/include/compat -mllvm -combiner-global-alias-analysis=false -mllvm -enable-emscripten-sjlj -mllvm -disable-lsr -o /var/folders/m1/cdn74f917994jd99d_2cpf440000gn/T/emscripten_temp_v4f2uvqa/xlocale_test.o -x c xlocale_test.c
clang -cc1 version 22.0.0git based upon LLVM 22.0.0git default target 
ignoring nonexistent directory "/Users/anutosh491/work/emsdk/upstream/emscripten/cache/sysroot/include/wasm32-emscripten"
#include "..." search starts here:
#include <...> search starts here:
 /Users/anutosh491/work/emsdk/upstream/emscripten/cache/sysroot/include/compat
 /Users/anutosh491/work/emsdk/upstream/lib/clang/22/include
 /Users/anutosh491/work/emsdk/upstream/emscripten/cache/sysroot/include
End of search list.

This shows three search paths:
i) sysroot/include and lib/clang/22/include (the resource dir), which we already handle
ii) sysroot/include/compat, which we don't!

emcc adds the compat directory through -Xclang -iwithsysroot/include/compat, as can be seen in the command above and here.
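A rough sketch of what an `-iwithsysroot<dir>` argument resolves to. Clang documents that an absolute `<dir>` is taken relative to the sysroot; the helper `resolve_iwithsysroot` is hypothetical, and treating a sysroot of `/` like an empty one is an assumption about Clang's internals that this thread does not confirm.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: map "-iwithsysroot<dir>" to the directory that
// ends up on the system include search list.
// Assumptions: the sysroot has no trailing '/', and a sysroot of "" or
// "/" means "no sysroot".
std::string resolve_iwithsysroot(const std::string& sysroot,
                                 const std::string& arg) {
  const std::string prefix = "-iwithsysroot";
  assert(arg.rfind(prefix, 0) == 0 && "expected an -iwithsysroot argument");
  const std::string dir = arg.substr(prefix.size());
  const bool has_sysroot = !(sysroot.empty() || sysroot == "/");
  // Absolute directories are re-rooted under the sysroot.
  if (has_sysroot && !dir.empty() && dir.front() == '/') {
    return sysroot + dir;
  }
  // Relative directories (or no sysroot) are used as given.
  return dir;
}
```

With the emsdk sysroot from the log above, `-iwithsysroot/include/compat` maps to `.../cache/sysroot/include/compat`; with no sysroot it stays `/include/compat`.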

This PR adds the same flag (see the right side of the screenshot below: the compat directory is now searched and xlocale.h is found).

[screenshot]

@codecov-commenter

codecov-commenter commented Nov 11, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 82.05%. Comparing base (5860e95) to head (25b61b9).


@@            Coverage Diff             @@
##             main     #409      +/-   ##
==========================================
- Coverage   82.40%   82.05%   -0.35%     
==========================================
  Files          21       21              
  Lines         858      858              
  Branches       89       89              
==========================================
- Hits          707      704       -3     
- Misses        151      154       +3     
Files with missing lines Coverage Δ
src/xinterpreter.cpp 90.85% <ø> (ø)

... and 1 file with indirect coverage changes


@anutosh491
Collaborator Author

anutosh491 commented Nov 11, 2025

cc @mcbarton

The same change will be needed in CppInterOp to run the Emscripten tests once we migrate, because iostream and other stream-based headers in emsdk 4-x now include xlocale.h.

Otherwise some tests fail with:

1: [ RUN      ] CppInterOpTest/InProcessJIT.ScopeReflectionTestIsBuiltin
1: In file included from <<< inputs >>>:1:
1: In file included from input_line_2:1:
1: In file included from /include/c++/v1/complex:273:
1: In file included from /include/c++/v1/sstream:323:
1: In file included from /include/c++/v1/__ostream/basic_ostream.h:20:
1: In file included from /include/c++/v1/__ostream/put_character_sequence.h:19:
1: In file included from /include/c++/v1/__locale_dir/pad_and_output.h:16:
1: In file included from /include/c++/v1/ios:223:
1: In file included from /include/c++/v1/__locale:17:
1: /include/c++/v1/__locale_dir/locale_base_api.h:131:16: fatal error: 'xlocale.h' file not found
1:   131 | #      include <xlocale.h>
1:       |                ^~~~~~~~~~~
1: wasm://wasm/13cfc366:1

I ran the CppInterOp tests (after building both CppInterOp and LLVM against the latest emsdk 4-x), which is where I caught the error above.

@github-actions
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@mcbarton
Collaborator

Thanks for the tip, @anutosh491. I will look into this. Did you run the LLVM Emscripten tests to see whether a change is needed to keep them passing?

@anutosh491
Collaborator Author

Did you run the llvm Emscripten tests to see if a change is needed to get them to still pass too?

Not yet!

But we probably won't need any change there, since this is the only real change I could spot. The LLVM Emscripten tests don't really process any #include directives, so the include-path search change doesn't affect them.


#if defined(XEUS_CPP_EMSCRIPTEN_WASM_BUILD)
TEST_CASE("headers found in sysroot/include/compat")
{
Collaborator Author


Added a simple test confirming that we can access headers from this location. It's only really needed for the wasm build, since the native kernel finds these headers without extra flags (so a native test is probably unnecessary).

The cc1 command shown in our tests now covers all three paths that emcc covers, as pointed out above:

clang version 20.1.8 (https://github.com/emscripten-forge/recipes 26347a2820b7a44b0329d3b9bd524c56ff3cdce7)
Failed to detect the resource-dir
exec: : Function not implemented
Target: wasm32-unknown-emscripten
Thread model: posix
InstalledDir: 
 "" -cc1 -triple wasm32-unknown-emscripten -emit-obj -disable-free -clear-ast-before-backend -disable-llvm-verifier -discard-value-names -main-file-name "<<< inputs >>>" -mrelocation-model static -mframe-pointer=none -ffp-contract=on -fno-rounding-math -mconstructor-aliases -target-cpu generic -debugger-tuning=gdb -fdebug-compilation-dir=/ -v -fcoverage-compilation-dir=/ -resource-dir /home/runner/micromamba/envs/xeus-cpp-wasm-host -internal-isystem /include/wasm32-emscripten/c++/v1 -internal-isystem /include/c++/v1 -internal-isystem /home/runner/micromamba/envs/xeus-cpp-wasm-host/include -internal-isystem /include/wasm32-emscripten -internal-isystem /include -std=c++14 -fdeprecated-macro -ferror-limit 19 -fvisibility=default -fgnuc-version=4.2.1 -fskip-odr-check-in-gmf -fcxx-exceptions -fexceptions -iwithsysroot/include/compat -fincremental-extensions -o "<<< inputs >>>.o" -x c++ "<<< inputs >>>"
clang -cc1 version 20.1.8 based upon LLVM 20.1.8 default target wasm32-unknown-emscripten
ignoring nonexistent directory "/include/wasm32-emscripten/c++/v1"
ignoring nonexistent directory "/home/runner/micromamba/envs/xeus-cpp-wasm-host/include"
ignoring nonexistent directory "/include/wasm32-emscripten"
#include "..." search starts here:
#include <...> search starts here:
 /include/compat
 /include/c++/v1
 /include
End of search list.
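Rather than eyeballing logs, a CI check could extract the `#include <...>` search list from `-v` output and assert that the compat directory is present. A minimal sketch; `parse_angle_search_list` is a hypothetical helper, and it assumes the plain one-leading-space entry format shown above (it does not strip annotations such as macOS's ` (framework directory)` suffix).

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: collect the directories listed between
// "#include <...> search starts here:" and "End of search list."
// in `clang -v` output.
std::vector<std::string> parse_angle_search_list(const std::string& verbose_output) {
  std::vector<std::string> dirs;
  std::istringstream in(verbose_output);
  std::string line;
  bool in_list = false;
  while (std::getline(in, line)) {
    if (line == "#include <...> search starts here:") {
      in_list = true;
      continue;
    }
    if (line == "End of search list.") {
      break;
    }
    if (in_list) {
      // Entries are printed with a leading space; drop it.
      const std::string::size_type first = line.find_first_not_of(' ');
      if (first != std::string::npos) {
        dirs.push_back(line.substr(first));
      }
    }
  }
  return dirs;
}
```

Fed the log above, this returns `/include/compat`, `/include/c++/v1` and `/include`, so a test only needs to check that `/include/compat` is in the result.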



#ifdef __EMSCRIPTEN__
ClangArgs.push_back("-Xclang");
ClangArgs.push_back("-iwithsysroot/include/compat");
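For readers outside the diff, a minimal sketch of the same change as a standalone helper. `add_emscripten_compat_includes` and its `targeting_emscripten` parameter are hypothetical stand-ins for the `#ifdef __EMSCRIPTEN__` block; only the two pushed arguments come from the PR.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper mirroring the PR: when targeting Emscripten, append
// the same flag pair emcc uses so <sysroot>/include/compat is searched.
// `targeting_emscripten` replaces the compile-time __EMSCRIPTEN__ check.
void add_emscripten_compat_includes(std::vector<std::string>& clang_args,
                                    bool targeting_emscripten) {
  if (!targeting_emscripten) {
    return;  // native builds don't need this path
  }
  // -Xclang forwards the next argument to the cc1 frontend, as emcc does.
  clang_args.push_back("-Xclang");
  clang_args.push_back("-iwithsysroot/include/compat");
}
```

Keeping the target check as a parameter makes the behaviour testable from a native build; the PR itself keeps it as a preprocessor guard.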
Contributor


Ideally, shouldn't that go into the Emscripten toolchain file in Clang?

Collaborator Author

@anutosh491 Nov 11, 2025


Hmm, not sure (possibly not?).

The compat/ headers are added by emcc via -Xclang -iwithsysroot/include/compat, only at the tool level, at the very end: https://github.com/emscripten-core/emscripten/blob/3622274db64d8ce706690197d1c24cd02715d937/tools/compile.py#L144

The upstream Clang WebAssembly toolchain (WebAssembly.cpp) looks target-generic, at least for the path-based utilities (e.g. AddClangSystemIncludeArgs, addLibCxxIncludePaths), and doesn't encode Emscripten SDK layout quirks like include/compat yet.

But I do see Emscripten-specific exception handling there (for example), so maybe Emscripten-specific logic could live there too?

Collaborator Author


Thinking about it a bit more, I think not:

  1. The Emscripten-specific exception-handling flags I mentioned (-enable-emscripten-*) are there so the compiler backend can reproduce Emscripten's codegen semantics.
  2. The file has no reason to know that Emscripten's sysroot layout has an /include/compat folder. Sysroot layout quirks like /include/compat belong in the Emscripten wrapper (emcc) or, for direct Clang users, must be added manually.

Something like:

# Path to the Emscripten sysroot from the SDK
SYSROOT=$HOME/work/emsdk/upstream/emscripten/cache/sysroot

/path/to/your/clang \
  --target=wasm32-unknown-emscripten \
  -isysroot $SYSROOT \
  -Xclang -iwithsysroot/include/compat \
  -c a.cpp -o a.o

Contributor


If that include path is required to compile Emscripten code, it should be part of Clang's Emscripten toolchain file.

Collaborator Author


Well yes, that's what I was pointing out: in the ToolChains folder there isn't an Emscripten-specific file, only a target-generic WebAssembly one:

https://github.com/llvm/llvm-project/blob/main/clang/lib/Driver/ToolChains/WebAssembly.cpp

So I'm not sure it is meant to handle Emscripten specifics, but let me ask around. Sam Clegg from Emscripten should be able to help here.
