👋 I'm posting this as an issue to start a discussion about the best path forward. I've also added a draft PR in case the community wishes to move toward tree-sitter.
Problem
QDox fails to parse valid Java source files that use annotations in type parameter positions. For example, Guava's ImmutableMap.java (source):
public static <T extends @Nullable Object, K, V>
Collector<T, ?, ImmutableMap<K, V>> toImmutableMap(
This throws a ParseException in QDox, which has no error recovery — meaning we lose all symbol information for the entire file. This pattern is common in well-maintained libraries (Guava, Error Prone, Checker Framework-annotated code).
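For anyone who wants a self-contained input to reproduce with, here is a minimal sketch of the same construct outside Guava. The `Nullable` annotation and the `AnnotatedBoundRepro` class are hypothetical stand-ins, not the Guava or Checker Framework types. The reflection check is only there to show that `javac` accepts the declaration and records the annotation on the bound.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.AnnotatedType;
import java.lang.reflect.Method;
import java.lang.reflect.TypeVariable;

// Hypothetical stand-in for a type-use @Nullable annotation.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE_USE)
@interface Nullable {}

final class AnnotatedBoundRepro {
    // Same shape as the failing declaration: an annotation on a type-parameter bound.
    static <T extends @Nullable Object> T identity(T value) {
        return value;
    }

    // Returns true when the compiler recorded @Nullable on the bound of T.
    static boolean boundIsAnnotated() {
        try {
            Method m = AnnotatedBoundRepro.class.getDeclaredMethod("identity", Object.class);
            TypeVariable<Method> t = m.getTypeParameters()[0];
            AnnotatedType bound = t.getAnnotatedBounds()[0];
            return bound.isAnnotationPresent(Nullable.class);
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Any parser that handles type-use annotations should accept this file without errors.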
Additional Issues
QDox is officially end-of-life; see README. Outstanding bugs for modern Java features were closed as "Won't fix" on the same day:
- qdox#182: annotated type parameters (<T extends @Nullable Object>), the bug that triggered this work
- qdox#272: sealed interfaces cause StackOverflowError
There will be no further QDox releases. A parser replacement is needed.
Alternatives considered
| Criterion | QDox 2.2.0 | JavaParser 3.28 | tree-sitter-ng 0.26 |
|---|---|---|---|
| Error recovery | None (throws) | Partial | Excellent (always produces tree) |
| <T extends @Nullable Object> | Fails | Mostly works | Full support |
| Relative speed | Fast | ~36x slower than tree-sitter | Fastest |
| Java version coverage | Incomplete | Java 1–25 | Tracks spec via grammar |
| Scope | Name extraction | Full AST (overkill) | Visit only declarations |
| Native dependency | No | No | Yes (bundled for x86_64/aarch64 on macOS/Linux/Windows) |
JavaParser was passed over because it builds a full AST including method bodies and expressions — work we don't need for extracting declaration names and positions. Its error recovery operates at the statement level inside method bodies, which doesn't help our use case.
tree-sitter is the best fit because:
- It was purpose-built as an incremental parser for editor experiences and is quickly becoming the de facto parsing library used by language servers and IDEs (VS Code, Neovim, Zed, Helix)
- Error recovery is a core design goal: invalid regions get ERROR nodes while surrounding declarations parse correctly with accurate positions
- O(n) C-based parser, critical for indexing thousands of JDK source files on project import
- Grammar explicitly models annotated_type at all type-use positions, so no workarounds are needed
- We only visit declaration nodes, skipping method bodies entirely
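To make the "declarations only" traversal and the handling of ERROR nodes concrete, here is a rough sketch over a toy tree. It deliberately does not use the tree-sitter bindings: `Node` and `DeclarationWalker` are hypothetical stand-ins, the node type strings follow the tree-sitter-java grammar, and attaching a name directly to a node is a simplification (in the real CST the name is a child node). Skipping the whole ERROR subtree is one possible policy, not necessarily what the draft PR does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a CST node; the real bindings expose their own node type.
final class Node {
    final String type;
    final String name; // null for nodes that declare nothing
    final List<Node> children;

    private Node(String type, String name, List<Node> children) {
        this.type = type;
        this.name = name;
        this.children = children;
    }

    static Node of(String type, String name, Node... children) {
        return new Node(type, name, Arrays.asList(children));
    }
}

final class DeclarationWalker {
    // Collects names of declaration nodes in source order.
    static List<String> declaredNames(Node root) {
        List<String> names = new ArrayList<>();
        walk(root, names);
        return names;
    }

    private static void walk(Node node, List<String> out) {
        switch (node.type) {
            case "block": // method body: nothing to index, do not descend
            case "ERROR": // unparseable region: skip it, siblings are still visited
                return;
            case "class_declaration":
            case "interface_declaration":
            case "method_declaration":
            case "field_declaration":
                out.add(node.name);
                break;
            default:
                break;
        }
        for (Node child : node.children) {
            walk(child, out);
        }
    }
}
```

With this shape, a broken member costs only its own subtree, while the enclosing class and its other members are still indexed.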
The trade-off is a JNI/native library dependency. The io.github.bonede:tree-sitter bindings bundle natives for the platforms that cover >99% of developer workstations.
Proposed changes
- Replace com.thoughtworks.qdox:qdox with io.github.bonede:tree-sitter + io.github.bonede:tree-sitter-java
- Rewrite JavaMtags to walk the tree-sitter CST instead of using JavaProjectBuilder
- Introduce lightweight case classes to decouple JavadocIndexer from the parser model
- Add a JavadocParser utility for structured Javadoc comment extraction
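As an illustration of what structured Javadoc extraction into a parser-independent model could look like, here is a minimal sketch. It is written in Java rather than as Scala case classes, and every name in it (`DocTag`, `ParsedJavadoc`, `JavadocSketch`) is hypothetical and not taken from the draft PR. It assumes block tags start a line and ignores inline tags such as {@link}.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical parser-independent model: only what an indexer would need.
final class DocTag {
    final String name; // e.g. "param", without the leading '@'
    final String text;

    DocTag(String name, String text) {
        this.name = name;
        this.text = text;
    }
}

final class ParsedJavadoc {
    final String description;
    final List<DocTag> tags;

    ParsedJavadoc(String description, List<DocTag> tags) {
        this.description = description;
        this.tags = tags;
    }
}

final class JavadocSketch {
    // Splits a raw "/** ... */" comment into a description and its block tags.
    static ParsedJavadoc parse(String raw) {
        String body = raw.trim();
        if (body.startsWith("/**")) body = body.substring(3);
        if (body.endsWith("*/")) body = body.substring(0, body.length() - 2);

        StringBuilder description = new StringBuilder();
        List<DocTag> tags = new ArrayList<>();
        String tagName = null;
        StringBuilder current = description; // text accumulates here until a tag starts

        for (String line : body.split("\n")) {
            String text = line.trim();
            if (text.startsWith("*")) text = text.substring(1).trim(); // drop the gutter '*'
            if (text.startsWith("@")) {
                // A new block tag closes the previous one, if any.
                if (tagName != null) tags.add(new DocTag(tagName, current.toString().trim()));
                int space = text.indexOf(' ');
                tagName = space < 0 ? text.substring(1) : text.substring(1, space);
                current = new StringBuilder(space < 0 ? "" : text.substring(space + 1));
            } else if (!text.isEmpty()) {
                if (current.length() > 0) current.append(' ');
                current.append(text); // continuation of the description or the open tag
            }
        }
        if (tagName != null) tags.add(new DocTag(tagName, current.toString().trim()));
        return new ParsedJavadoc(description.toString().trim(), tags);
    }
}
```

Because the result holds only strings, an indexer consuming it would not need to know which parser produced the comment text.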