- Install dependencies
- Building
- Running tests
- Formatting
- IDE integration
- Debugging
- Implementation notes
- Notes on Clang internals
- Notes on Windows
- Bazelisk: This handles Bazel versions transparently.
Bazel manages the C++ toolchain and other tool dependencies like formatters, so they don't need to be downloaded separately. (For unclear reasons, Bazel still requires a host toolchain to be present for configuring something but it will not be used for building the code in this project.)
(The dev config is for local development.)
# macOS
bazel build //... --spawn_strategy=local --config=dev
# Linux
bazel build //... --config=dev
The indexer binary will be placed at bazel-bin/indexer/scip-clang.
On macOS, --spawn_strategy=local provides a dramatic improvement in incremental build times (~10x) and is highly recommended. If you are more paranoid, instead use --experimental_reuse_sandbox_directories, which cuts down on build times by 2x-3x while maintaining sandboxing.
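The platform-specific flags above can be wrapped in a small helper. The following is a minimal sketch and not part of the repository: the function name build_flags is hypothetical, and it only prints the flags documented above based on the kernel name reported by uname -s.

```shell
# Hypothetical helper (not part of the repository): print the documented
# build flags for a kernel name, defaulting to the output of `uname -s`.
build_flags() {
  case "${1:-$(uname -s)}" in
    Darwin) echo "--spawn_strategy=local --config=dev" ;;
    *)      echo "--config=dev" ;;
  esac
}

# Print (rather than run) the build command for this machine.
echo "bazel build //... $(build_flags)"
```

Passing the kernel name explicitly, as in build_flags Darwin, makes it easy to check the mapping without being on that platform.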
Example invocation for a CMake project:
# This will generate a compilation database under build/
# See https://clang.llvm.org/docs/JSONCompilationDatabase.html
cmake -B build -DCMAKE_EXPORT_COMPILE_COMMANDS=ON <args>
# Invoke scip-clang from the project root (not the build root)
path/to/scip-clang --compdb-path build/compile_commands.json
Consult --help for user-facing flags, and --help-all for both user-facing and internal flags.
Run all tests:
bazel test //test --spawn_strategy=local --config=dev
Update snapshot tests:
bazel test //update --spawn_strategy=local --config=dev
Run ./tools/reformat.sh to reformat code and config files.
Run ./tools/regenerate-compdb.sh to generate a compilation database at the root of the repository. It will be automatically picked up by clangd-based editor extensions (you may need to reload the editor).
The default mode of UBSan will not print stack traces on failures. I recommend maintaining a parallel build of LLVM at the same commit as in fetch_deps.bzl. UBSan needs an llvm-symbolizer binary on PATH to print stack traces, which can be provided via the separate build.
PATH="$PWD/../llvm-project/build/bin:$PATH" UBSAN_OPTIONS=print_stacktrace=1 <scip-clang invocation>
Anecdotally, on macOS, this can take 10s+ the first time around, so don't hit Ctrl+C if UBSan seems to be stuck.
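Before starting a long run, it can help to confirm that llvm-symbolizer is actually reachable. This is a minimal sketch, assuming a POSIX shell; the helper name have_tool is hypothetical and only wraps the standard command -v lookup.

```shell
# Hypothetical pre-flight check: succeed if the named tool is found via PATH.
have_tool() {
  command -v "$1" >/dev/null 2>&1
}

if have_tool llvm-symbolizer; then
  echo "llvm-symbolizer found: UBSan can print symbolized stack traces"
else
  echo "llvm-symbolizer not on PATH: prepend your LLVM build's bin directory"
fi
```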
In the default mode of operation, the worker, which runs semantic analysis and emits the index, runs in a separate process and communicates with the driver over IPC. This makes using a debugger tedious.
If you want to attach a debugger, run the worker directly instead.
- First, run the original scip-clang invocation with --log-level=debug and a short timeout (say --receive-timeout-seconds=10). This will print job ids (<compdb-index>.<subtask-index>) around when a task is being processed.
- Subset out the original compilation database using jq or similar.
jq '[.[<compdb-index>]]' compile_commands.json > bad.json
- Run scip-clang --worker-mode=compdb --compdb-path bad.json (the original scip-clang invocation will have printed more arguments which were passed to the worker, but most of them should be unnecessary).
If you have not used LLDB before, check out this LLDB cheat sheet.
There is a VM setup script available to configure a GCP VM for building scip-clang. We recommend using Ubuntu 20.04+ with 16 cores or more.
Print the AST nodes:
clang -Xclang -ast-dump file.c
clang -Xclang -ast-dump=json file.c
Another option is to use clang-query (tutorial).
In case of a crash, it may be possible to automatically reduce it using C-Reduce.
Important: On macOS, use brew install --HEAD creduce, as the default version is very outdated.
There is a helper script tools/reduce.py which can coordinate scip-clang and creduce, since correctly handling different kinds of paths in a compilation database is a bit finicky in the general case. It can be invoked like so:
# Pre-conditions:
# 1. CWD is project root
# 2. bad.json points to a compilation database with a single entry
# known to cause the crash
/path/to/tools/reduce.py bad.json
After completion, a path to a reduced C++ file will be printed out which still reproduces the crash.
See the script's --help text for information about additional flags.
The LLVM monorepo contains a tool pp-trace which can be used to understand the preprocessor callbacks being invoked without having to resort to print debugging inside scip-clang itself.
First, build pp-trace from source in your LLVM checkout, making sure to include clang-tools-extra in LLVM_ENABLE_PROJECTS.
After that, it can be invoked like so:
/path/to/llvm-project/build/bin/pp-trace mytestfile.cpp --extra-arg="-isysroot" --extra-arg="$(xcrun --show-sdk-path)"
The -isysroot argument is particularly important, as pp-trace will not find standard library headers without it. See the pp-trace docs or the --help text for information about other supported flags.
Some useful non-indexer specific logic is adapted from the Sorbet codebase and is marked with a NOTE(ref: based-on-sorbet).
In particular, we reuse the infrastructure for ENFORCE macros, which are essentially assertions which are instrumented so that the cost can be measured easily. We could technically have used assert, but having a separate macro makes it easier to change the behavior in scip-clang exclusively, whereas there is a greater chance of mistakes if we want to separate out the cost of assertions in Clang itself vs in our code.
See docs/SourceLocation.md for information about how source locations are handled in Clang.
We have limited familiarity with Windows overall, so this section includes detailed steps to (try to) build the code on Windows.
- Spin up a Windows Server 2022 machine on GCP. This generally takes a bit more time than Linux machines.
- Install Microsoft Remote Desktop through the App Store.
- Run the GCP command (via RDP dropdown > View gcloud command to reset password):
gcloud compute reset-windows-password --zone "<your zone>" --project "<your project>" "<instance name>"
This will print a password.
- In the GCP UI, download the RDP file for remote login.
- Open the RDP file using Microsoft Remote Desktop.
- Enter the password printed by the gcloud command above.
- Start Powershell.exe as Admin and install Chocolatey
- Install Git for Windows.
- Run Git Bash as Admin and install Python and Bazelisk:
choco install -yv bazelisk python3
After this, you may need to restart Git Bash for Python to be found. After restarting, check whether python3 --version and python --version work. If python3 --version doesn't work, then copy over the binary:
cp "$(which python)" "$(dirname "$(which python)")/python3"
- Before invoking Bazel, make sure to run:
export MSYS2_ARG_CONV_EXCL="*"
for correctly handling // in Bazel targets.
After this, you should be able to run the build as usual.
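If the same shell profile is shared across platforms, the export can be guarded so that it only applies under MSYS2-based shells. This is a minimal sketch: the function name is_msys_shell is hypothetical, and it assumes that uname -s reports a name starting with MINGW or MSYS in such shells, which is typical for Git Bash.

```shell
# Hypothetical guard: detect MSYS2-based shells such as Git Bash from the
# kernel name, defaulting to the output of `uname -s`.
is_msys_shell() {
  case "${1:-$(uname -s)}" in
    MINGW*|MSYS*) return 0 ;;
    *)            return 1 ;;
  esac
}

if is_msys_shell; then
  # Stop MSYS2 from rewriting arguments such as //... into Windows paths.
  export MSYS2_ARG_CONV_EXCL="*"
fi
```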