Replace QDox with tree-sitter for Java source parsing #8349

@rkophs

👋 I'm posting this as an issue to trigger discussion to identify the best path forward. Adding a draft PR in case the community wishes to move in the direction of tree-sitter.

Problem

QDox fails to parse valid Java source files that use annotations in type parameter positions. For example, Guava's ImmutableMap.java (source):

public static <T extends @Nullable Object, K, V>
    Collector<T, ?, ImmutableMap<K, V>> toImmutableMap(

This throws a ParseException in QDox. Because QDox has no error recovery, a single failure like this means we lose all symbol information for the entire file. The pattern is common in well-maintained libraries (Guava, Error Prone, Checker Framework-annotated code).
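
For anyone who wants to reproduce the construct without pulling in Guava, here is a minimal sketch of the same shape. It assumes Java 9 or newer and declares its own `@Nullable` with `ElementType.TYPE_USE` as a stand-in for the annotation Guava uses; the class and method names are hypothetical and not taken from Guava or Metals.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Target;
import java.util.List;

// Hypothetical stand-in for a type-use @Nullable annotation.
@Target(ElementType.TYPE_USE)
@interface Nullable {}

class AnnotatedTypeParamDemo {
    // The annotation sits inside the type parameter bound,
    // which is the position described above as failing in QDox.
    static <T extends @Nullable Object> T firstOrNull(List<T> items) {
        return items.isEmpty() ? null : items.get(0);
    }

    public static void main(String[] args) {
        System.out.println(firstOrNull(List.of("a", "b")));
        System.out.println(firstOrNull(List.<String>of()));
    }
}
```

javac accepts this file, so a parser that rejects it is failing on valid source rather than on a syntax error.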

Additional Issues

QDox is officially end-of-life; see README. Outstanding bugs for modern Java features were closed as "Won't fix" on the same day:

  • qdox#182 — annotated type parameters (<T extends @Nullable Object>) — the bug that triggered this work
  • qdox#272 — sealed interfaces cause StackOverflowError

There will be no further QDox releases. A parser replacement is needed.

Alternatives considered

| Criterion | QDox 2.2.0 | JavaParser 3.28 | tree-sitter-ng 0.26 |
|---|---|---|---|
| Error recovery | None (throws) | Partial | Excellent (always produces tree) |
| <T extends @Nullable Object> | Fails | Mostly works | Full support |
| Relative speed | Fast | ~36x slower than tree-sitter | Fastest |
| Java version coverage | Incomplete | Java 1–25 | Tracks spec via grammar |
| Scope | Name extraction | Full AST (overkill) | Visit only declarations |
| Native dependency | No | No | Yes (bundled for x86_64/aarch64 on macOS/Linux/Windows) |

JavaParser was passed over because it builds a full AST including method bodies and expressions — work we don't need for extracting declaration names and positions. Its error recovery operates at the statement level inside method bodies, which doesn't help our use case.

tree-sitter is the best fit because:

  1. It was purpose-built as an incremental parser for editor experiences and is quickly becoming the de facto parsing library used by language servers and IDEs (VS Code, Neovim, Zed, Helix)
  2. Error recovery is a core design goal — invalid regions get ERROR nodes while surrounding declarations parse correctly with accurate positions
  3. O(n) C-based parser, critical for indexing thousands of JDK source files on project import
  4. Grammar explicitly models annotated_type at all type-use positions — no workarounds
  5. We only visit declaration nodes, skipping method bodies entirely
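
To make points 2 and 5 concrete, here is a rough sketch of a declaration-only walk. It deliberately does not use the tree-sitter bindings: `CstNode` is a hypothetical, simplified stand-in for a syntax tree node, and only the node type strings (`class_declaration`, `method_declaration`, `block`, `ERROR`) are meant to mirror the tree-sitter Java grammar. It assumes Java 17 or newer. Skipping `ERROR` subtrees entirely is a simplification made for this sketch; a real walker might still look inside them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified syntax tree node; NOT the tree-sitter binding API.
record CstNode(String type, String name, int line, List<CstNode> children) {}

class DeclarationWalker {
    // Collects "name@line" for declarations without descending into
    // method bodies ("block") or unparsable regions ("ERROR").
    static List<String> collect(CstNode root) {
        List<String> out = new ArrayList<>();
        walk(root, out);
        return out;
    }

    private static void walk(CstNode node, List<String> out) {
        switch (node.type()) {
            case "block", "ERROR" -> { return; }
            case "class_declaration", "interface_declaration", "method_declaration" ->
                out.add(node.name() + "@" + node.line());
            default -> { }
        }
        for (CstNode child : node.children()) {
            walk(child, out);
        }
    }

    private static CstNode node(String type, String name, int line, CstNode... children) {
        return new CstNode(type, name, line, List.of(children));
    }

    // A hand-built tree: class Foo with methods bar and baz and a broken region between them.
    static CstNode sample() {
        return node("program", "", 1,
            node("class_declaration", "Foo", 1,
                node("class_body", "", 1,
                    node("method_declaration", "bar", 2,
                        node("block", "", 2,
                            node("class_declaration", "LocalIgnored", 3))),
                    node("ERROR", "", 5),
                    node("method_declaration", "baz", 7))));
    }

    public static void main(String[] args) {
        System.out.println(collect(sample()));
    }
}
```

In this sketch the broken region between `bar` and `baz` costs nothing outside itself: both methods are still collected with their positions, and the local class inside the body of `bar` is never visited.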

The trade-off is a JNI/native library dependency. The io.github.bonede:tree-sitter bindings bundle natives for the platforms that cover >99% of developer workstations.

Proposed changes

  • Replace com.thoughtworks.qdox:qdox with io.github.bonede:tree-sitter + io.github.bonede:tree-sitter-java
  • Rewrite JavaMtags to walk the tree-sitter CST instead of using JavaProjectBuilder
  • Introduce lightweight case classes to decouple JavadocIndexer from the parser model
  • Add a JavadocParser utility for structured Javadoc comment extraction
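
As a rough illustration of the last two bullets, the sketch below splits a Javadoc comment into a description and block tags and returns them in a small immutable value type, so a consumer never touches parser-specific classes. Everything here is hypothetical (`ParsedJavadoc`, `JavadocSketch`, the map-of-lists shape) and is not the code in the draft PR. It uses a Java record as the analogue of a Scala case class and assumes Java 17 or newer.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical parser-independent model of a Javadoc comment.
record ParsedJavadoc(String description, Map<String, List<String>> tags) {}

class JavadocSketch {
    static ParsedJavadoc parse(String comment) {
        String body = comment.strip();
        if (body.startsWith("/**")) body = body.substring(3);
        if (body.endsWith("*/")) body = body.substring(0, body.length() - 2);

        StringBuilder description = new StringBuilder();
        Map<String, List<String>> tags = new LinkedHashMap<>();
        String currentTag = null; // null while we are still in the description

        for (String raw : body.split("\n")) {
            String line = raw.strip();
            if (line.startsWith("*")) line = line.substring(1).strip(); // drop the leading '*'
            if (line.isEmpty()) continue;

            if (line.startsWith("@")) {
                // Start of a block tag such as "@param items the list".
                int space = line.indexOf(' ');
                currentTag = space < 0 ? line.substring(1) : line.substring(1, space);
                String text = space < 0 ? "" : line.substring(space + 1).strip();
                tags.computeIfAbsent(currentTag, k -> new ArrayList<>()).add(text);
            } else if (currentTag == null) {
                if (description.length() > 0) description.append(' ');
                description.append(line);
            } else {
                // Continuation line: append to the most recent value of the current tag.
                List<String> values = tags.get(currentTag);
                int last = values.size() - 1;
                values.set(last, (values.get(last) + " " + line).strip());
            }
        }
        return new ParsedJavadoc(description.toString(), tags);
    }

    public static void main(String[] args) {
        String comment = "/**\n * Returns the first element.\n *\n * @param items the list\n * @return the first element\n */";
        System.out.println(parse(comment));
    }
}
```

Inline tags such as `{@link ...}` are left untouched as plain text here; handling them would be a separate step.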
