Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clang + NVPTX reports ptxas fatal : Unresolved extern function '__assert_fail' #125782

Open
clellsolomon opened this issue Feb 4, 2025 · 5 comments
Labels
clang Clang issues not falling into any other category cuda

Comments

@clellsolomon
Copy link

I'm getting the following error when compiling a file without -DNDEBUG:

ptxas fatal   : Unresolved extern function '__assert_fail'
clang++: error: ptxas command failed with exit code 255 (use -v to see invocation)
clang version 21.0.0git (https://github.com/llvm/llvm-project.git d13940ee263ff50b7a71e21424913cc0266bf9d4)
Target: aarch64-unknown-linux-gnu
Thread model: posix
InstalledDir: /usr/projects/eap/users/csolomon/local/install/cos3/venado/clang-trunk/bin

I'm using CUDA 12.5 which i understand is marked only partially supported.

The file does compile when -DNDEBUG is defined on the compile line.

I cannot (currently) share the reproducer as the source is controlled. I've worked (briefly) at generating a reproducer without success.

@llvmbot llvmbot added the clang Clang issues not falling into any other category label Feb 4, 2025
@Artem-B
Copy link
Member

Artem-B commented Feb 5, 2025

__assert_fail() for the GPU side is provided by __clang_cuda_runtime_wrapper.h which is pre-included with every CUDA compilation done with clang:

__device__ static inline void __assert_fail(const char *__message,

Running a single GPU-side sub-compilation with --cuda-device-only -E -dD and checking what gets included or not will probably point you in the right direction towards figuring out why the function is missing.

Also, CUDA on ARM is something that is substantially less well tested. It works reasonably well with CUDA-12.8, but I can't tell what to expect with older CUDA versions.So, if you run into troubles with CUDA-12.5 try with a newer version.

@clellsolomon
Copy link
Author

Looking at the Preprocessed source file the compilation error generates, I can see that these lines seem to be getting included:

// __assertfail() used to have a `noreturn` attribute. Unfortunately that
// contributed to triggering the longstanding bug in ptxas when assert was used
// in sufficiently convoluted code. See
// https://bugs.llvm.org/show_bug.cgi?id=27738 for the details.
__device__ void __assertfail(const char *__message, const char *__file,
                             unsigned __line, const char *__function,
                             size_t __charSize);

// In order for standard assert() macro on linux to work we need to
// provide device-side __assert_fail()
__device__ static inline void __assert_fail(const char *__message,
                                            const char *__file, unsigned __line,
                                            const char *__function) {
  __assertfail(__message, __file, __line, __function, sizeof(char));
}

They are coming from the __clang_cuda_runtime_wrapper.h file.

@Artem-B
Copy link
Member

Artem-B commented Feb 6, 2025

I can see that these lines seem to be getting included:

Is that the verbatim output of compilation with "-E -dD" ? What exactly do you see for these two functions?

If you can reproduce your issue on godbolt.org that would help figuring out what's going on.

@clellsolomon
Copy link
Author

From using -D -dD I see this block

#undef __NV_TEX_SPARSE
# 389 "/usr/projects/eap/users/csolomon/local/install/cos3/venado/clang-trunk/lib/clang/21/include/__clang_cuda_runtime_wrapper.h" 2 3






#undef __CUDABE__
#define __CUDACC__ 

extern "C" {





__attribute__((device)) int vprintf(const char *, const char *);
__attribute__((device)) void free(void *) __attribute((nothrow));
__attribute__((device)) void *malloc(size_t) __attribute((nothrow)) __attribute__((malloc));





__attribute__((device)) void __assertfail(const char *__message, const char *__file,
                             unsigned __line, const char *__function,
                             size_t __charSize);



__attribute__((device)) static inline void __assert_fail(const char *__message,
                                            const char *__file, unsigned __line,
                                            const char *__function) {
  __assertfail(__message, __file, __line, __function, sizeof(char));
}



__attribute__((device)) int printf(const char *, ...);
}

coming from __clang_cuda_runtime_wrapper.h

I also see multiple places where <cassert> and ultimately assert.h is included which produces:

# 63 "/usr/include/assert.h" 3 4

extern "C" {


extern void __assert_fail (const char *__assertion, const char *__file,
      unsigned int __line, const char *__function)
     throw () __attribute__ ((__noreturn__));


extern void __assert_perror_fail (int __errnum, const char *__file,
      unsigned int __line, const char *__function)
     throw () __attribute__ ((__noreturn__));




extern void __assert (const char *__assertion, const char *__file, int __line)
     throw () __attribute__ ((__noreturn__));


}

Then I see multiple places where the assert macro is expanded, e.g.,

(static_cast <bool> (names.size() == space_dimensionality) ? void (0) : __assert_fail ("names.size() == space_dimensionality", "/users/csolomon/eap/users/csolomon/spack_instances/spack-develop_20230420-a639b22.4/opt/spack/linux-sles15-neoverse_n1/clang-trunk_gcc12-m/kokkos-4.3.01-qqgn6ik3i2vjr4upwn7zbs2ehtkpm7wb/include/Kokkos_Tuners.hpp", 339, __extension__ __PRETTY_FUNCTION__));

Note that I don't know that it is this specific instance that is causing the problem.

I cannot put the source into Compiler Explorer because it is controlled...I understand the complication this adds to helping resolve the issue.

@Artem-B
Copy link
Member

Artem-B commented Feb 6, 2025

So, the function itself does appear to be present in the source code. The question is -- why its definition does not make it to the PTX.

For what ti's worth, asserts are commonly used all over the place in CUDA code and are working. Most likely the issue is specific to your build. Unless you can reduce the resproducer to something you can make public, there's not much I can do.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang Clang issues not falling into any other category cuda
Projects
None yet
Development

No branches or pull requests

4 participants