-
Notifications
You must be signed in to change notification settings - Fork 1.3k
Make code graph APIs SCIP-oriented #59470
Comments
Great write up @varungandhi-src! Adding a link to the customer discussion that prompted this so it's easier to find when they ask about it. |
Recently, we hit an issue https://github.com/sourcegraph/sourcegraph/issues/59733 where the hover doesn't show up despite a source range having precise code intelligence. Right now, the client-side code hard-codes a list of If the client-side code had access to the precise SCIP document itself, it could directly check the source range without having to check a separate "interactive" allowlist, avoiding the need to coordinate between different precise indexers and client-side code to make sure that they agree on which tokens can potentially have hovers. |
Thinking through this a little bit, and I had a couple of questions:
|
Right, I think this would be part of the API contract via an enum in GraphQL.
Generally, it should be fine to mix highlighting data with navigation data from a single source. Today, we already have this - the syntax highlighter sends a single SCIP document with information about both highlighting and code nav for locals & params (when available) based on Tree-sitter. However, for the general case, I think we want to have separate SCIP Documents for different data sources for code nav specifically, because otherwise, the client may not be able to identify which Occurrence came from which source (and we would not want to modify SCIP to track that on a per-occurrence basis)
Yes, this can happen in a few ways:
I think we shouldn't have any special handling for zero values. For example, if a file just has comments, then a precise data source will not populate any Occurrences or SymbolInformations in that file. In such a case, it would be incorrect to specially treat the empty array as an error case, and fall back to search-based code nav. If the new API silently converts errors into zero values, I would consider that as an API design bug. |
Related feature request: |
Status Quo
At the moment, the way precise code graph APIs work is based on source ranges. For example, when you do Find references, you end up calling something like this: https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/cmd/frontend/graphqlbackend/codeintel.codenav.graphql?L114-128
The consequences of this design centered around positions is that:
Proposed direction
From the start, the vision for SCIP has been to serve as part of the core vocabulary at Sourcegraph. We have already incorporated that to some extent with code navigation for locals, where the syntax highlighter can provide information about occurrences for locals and parameters for a growing set of languages, and the client-side code can retrieve a SCIP document1: SCIP document (1, 2), without having to make further requests to the server.
Integrating SCIP into the precise code graph APIs would involve adding support for:
With 1. available, when attempting to do Go to definition/Find references, the client-side code could surface a choice to the user when multiple symbols have occurrences for the same token. This would address #57347.
With 1. available, even when a source range only has an occurrence for a single symbol, the client-side code could use the SCIP symbol name (or a slightly cleaned up version of it), to form URLs. For example, a URL like:
would become something like:
So if there were changes to previous lines, e.g. new imports were added, but if the code continued to have precise code graph data, the URL would still work, because the client-side code could fetch the SCIP document, and locate the nearest source range to
L27:13-27:30
that has the matching symbol (and symbol names for top-level symbols themselves do not change based on source locations2), and (optionally) rewrite the URL. We should still probably maintain the source range in the URL so that the blob view can highlight the intended range in the source file.With 2. available, the client-side code could only fetch precise data for a single symbol instead of presenting a union.
Accommodating new SCIP data sources
When the work on batch indexing is complete (tentatively planned for Q1 FY25), we'll have a new source of code graph data which is technically not precise, but it will be SCIP-oriented. We should be designing APIs and UI changes (e.g. for the ref panel and URLs) taking this into mind. For example:
any
data source setting).Miscellaneous suggestions
Debugging
Footnotes
The field is unfortunately named
lsif
in the GraphQL API because SCIP was originally called 'LSIF Typed' before the public announcement. ↩The only major exception is the symbol naming scheme for macros in C and C++ ↩
The text was updated successfully, but these errors were encountered: