This repository was archived by the owner on Sep 30, 2024. It is now read-only.

Make code graph APIs SCIP-oriented #59470

Closed as not planned
@varungandhi-src

Status Quo

At the moment, the way precise code graph APIs work is based on source ranges. For example, when you do Find references, you end up calling something like this: https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/cmd/frontend/graphqlbackend/codeintel.codenav.graphql?L114-128

references(line: Int!, character: Int!, ...)

The consequences of this position-centered design are:

  • Ref panel URLs use source ranges. This means that ref panel URLs that are not permalinks (i.e. not pinned to a specific commit) can easily break if unrelated changes earlier in the file cause the range for the occurrence to change.
  • Client-side code cannot distinguish the cases where multiple semantically different symbols are present at the same source range, from the case where only one unique symbol is present. Related issues:

Proposed direction

From the start, the vision for SCIP has been to serve as part of the core vocabulary at Sourcegraph. We have already incorporated that to some extent with code navigation for locals, where the syntax highlighter can provide information about occurrences for locals and parameters for a growing set of languages, and the client-side code can retrieve a SCIP document (see footnote 1) without having to make further requests to the server.

Integrating SCIP into the precise code graph APIs would involve adding support for:

  1. Getting the precise SCIP Document corresponding to a file.
  2. Looking up defs/refs/impls etc. based on SCIP symbol names (or suffixes).

With 1. available, when attempting to do Go to definition/Find references, the client-side code could surface a choice to the user when multiple symbols have occurrences for the same token. This would address #57347.
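As a rough illustration of the disambiguation step, the sketch below collects the distinct symbols whose occurrences cover a cursor position. The `Pos`, `SourceRange` and `ScipOccurrence` shapes and the `symbolsAt` helper are hypothetical simplifications for this example (the real SCIP schema encodes ranges differently); they are not an existing Sourcegraph API.

```typescript
// Hypothetical, simplified shapes for illustration only.
interface Pos { line: number; character: number }
interface SourceRange { start: Pos; end: Pos }
interface ScipOccurrence { range: SourceRange; symbol: string }

// True if `pos` lies within `range` (start inclusive, end exclusive).
function contains(range: SourceRange, pos: Pos): boolean {
    const afterStart =
        pos.line > range.start.line ||
        (pos.line === range.start.line && pos.character >= range.start.character)
    const beforeEnd =
        pos.line < range.end.line ||
        (pos.line === range.end.line && pos.character < range.end.character)
    return afterStart && beforeEnd
}

// Returns the distinct symbols that have an occurrence covering `pos`,
// in the order they first appear in the document.
function symbolsAt(occurrences: ScipOccurrence[], pos: Pos): string[] {
    const symbols: string[] = []
    for (const occ of occurrences) {
        if (contains(occ.range, pos) && symbols.indexOf(occ.symbol) === -1) {
            symbols.push(occ.symbol)
        }
    }
    return symbols
}
```

If `symbolsAt` returns more than one symbol, the client could prompt the user to pick one before issuing the defs/refs request; with exactly one symbol it could proceed directly.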

With 1. available, even when a source range only has an occurrence for a single symbol, the client-side code could use the SCIP symbol name (or a slightly cleaned up version of it), to form URLs. For example, a URL like:

https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/client/web/src/auth.ts?L27:13-27:30#tab=references

would become something like:

https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/client/web/src/auth.ts?L27:13-27:30?$symbol=@sourcegraph\/web/1.10.1/src\/auth.ts\/AuthenticatedUser\##tab=references

So if earlier lines changed (e.g. new imports were added) but the code still had precise code graph data, the URL would still work: the client-side code could fetch the SCIP document, locate the source range nearest to L27:13-27:30 that has the matching symbol, and (optionally) rewrite the URL. This works because symbol names for top-level symbols do not change based on source locations (see footnote 2). We should probably still keep the source range in the URL so that the blob view can highlight the intended range in the source file.
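The relocation step could look roughly like the following sketch. The types and the `relocate` helper are hypothetical, and "nearest" is interpreted here as the smallest distance between range starts, with lines weighted far above columns; other definitions of nearness would be equally reasonable.

```typescript
// Hypothetical, simplified shapes for illustration only.
interface Pos { line: number; character: number }
interface SourceRange { start: Pos; end: Pos }
interface ScipOccurrence { range: SourceRange; symbol: string }

// Distance between range starts. Lines are weighted heavily so that a
// candidate on a closer line wins regardless of column, assuming lines
// are shorter than 100000 characters.
function distance(a: SourceRange, b: SourceRange): number {
    return (
        Math.abs(a.start.line - b.start.line) * 100000 +
        Math.abs(a.start.character - b.start.character)
    )
}

// Finds the occurrence of `symbol` closest to the (possibly stale) range
// taken from the URL. Returns undefined if the symbol no longer occurs.
function relocate(
    occurrences: ScipOccurrence[],
    symbol: string,
    stale: SourceRange
): SourceRange | undefined {
    let best: SourceRange | undefined
    for (const occ of occurrences) {
        if (occ.symbol !== symbol) {
            continue
        }
        if (best === undefined || distance(occ.range, stale) < distance(best, stale)) {
            best = occ.range
        }
    }
    return best
}
```

When `relocate` returns a range that differs from the one in the URL, the client could rewrite the URL to the new range; when it returns `undefined`, falling back to the original range is one option.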

With 2. available, the client-side code could fetch precise data for just a single symbol instead of presenting a union of the results for every symbol at the source range.
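A symbol-based lookup with suffix matching might behave like this minimal in-memory sketch. `SymbolOccurrence`, `matchesSymbol` and `occurrencesOf` are hypothetical names; a real implementation would query the server-side index rather than filter an array.

```typescript
// Hypothetical shape for illustration only.
interface SymbolOccurrence { symbol: string; path: string; line: number }

// True if `candidate` equals `query` or ends with it.
// An exact match is just the longest possible suffix match.
function matchesSymbol(candidate: string, query: string): boolean {
    return (
        candidate.length >= query.length &&
        candidate.slice(candidate.length - query.length) === query
    )
}

// Returns only the occurrences whose symbol matches `query`,
// rather than everything found at a source range.
function occurrencesOf(index: SymbolOccurrence[], query: string): SymbolOccurrence[] {
    return index.filter(occ => matchesSymbol(occ.symbol, query))
}
```

A longer suffix narrows the match: querying with only the final descriptor may hit same-named symbols in several packages, while including the file or package portion pins it to one.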

Accommodating new SCIP data sources

When the work on batch indexing is complete (tentatively planned for Q1 FY25), we'll have a new source of code graph data which is technically not precise, but it will be SCIP-oriented. We should design APIs and UI changes (e.g. for the ref panel and URLs) with this in mind. For example:

  • The client should request data with a setting for a "data source" (precise indexer? tree-sitter? any?), rather than having a tailored API specifically for precise data. This would apply both to the new APIs for fetching SCIP documents, as well as the APIs for fetching defs/refs/impls etc.
  • Results that are returned should include (or be extensible to include) accompanying information about the data source (this matters for the any data source setting).
  • The data source should be implied by URLs, so that link sharing shows consistent results. Right now, the "Mix search-based and precise" setting is a user setting which means that it doesn't work if you're logged out 🙃, and if you're logged in, you may see different results compared to your colleague.
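One way to picture the first two points is the sketch below, where the request carries a data source setting and every result is tagged with the source that produced it. The names `DataSource`, `DataSourceSetting`, `Provider` and `lookup`, and the `"precise"`/`"treeSitter"` values, are assumptions made for this example only.

```typescript
// Hypothetical data source names for illustration only.
type DataSource = "precise" | "treeSitter"
type DataSourceSetting = DataSource | "any"

interface CodeLocation { path: string; line: number }
interface TaggedLocation extends CodeLocation { source: DataSource }

// A provider resolves a symbol to locations using one data source.
type Provider = (symbol: string) => CodeLocation[]

// Queries the providers selected by `setting` and tags each result with
// its origin, so that the "any" setting remains transparent to the caller.
function lookup(
    providers: Record<DataSource, Provider>,
    setting: DataSourceSetting,
    symbol: string
): TaggedLocation[] {
    const sources: DataSource[] = setting === "any" ? ["precise", "treeSitter"] : [setting]
    const results: TaggedLocation[] = []
    for (const source of sources) {
        for (const loc of providers[source](symbol)) {
            results.push({ path: loc.path, line: loc.line, source })
        }
    }
    return results
}
```

Because the setting is an explicit argument rather than a user preference, it could be carried in the URL, which would give two people opening the same link the same results.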

Miscellaneous suggestions

  • Instead of having specialized APIs for defs/refs/impls etc., it might make sense to have a single API where the desired kind of output is specified as a string. This applies only where the output is a list of occurrences; it would not apply to potential features like call hierarchy, where the result would be a graph of occurrences. There seems to be an unnecessary amount of client-side complexity and code duplication because of the shape of the API.
  • It would be helpful to document guarantees related to duplicates, ordering etc. to avoid unnecessary client-side code related to sorting/uniquing/merging.
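Both suggestions can be sketched together: a single entry point parameterized by the kind of usage, with an explicit ordering and no-duplicates guarantee so that clients need not sort or merge. The `UsageKind` values and the `usages` function are hypothetical, and the chosen ordering (path, then line, then character) is only one possible contract.

```typescript
// Hypothetical kinds and shapes for illustration only.
type UsageKind = "definition" | "reference" | "implementation"

interface Usage {
    symbol: string
    kind: UsageKind
    path: string
    line: number
    character: number
}

// Orders usages by path, then line, then character.
function compareUsages(a: Usage, b: Usage): number {
    if (a.path !== b.path) {
        return a.path < b.path ? -1 : 1
    }
    if (a.line !== b.line) {
        return a.line - b.line
    }
    return a.character - b.character
}

// Single entry point for defs/refs/impls. Guarantees: results are sorted
// by (path, line, character) and contain no duplicate locations.
function usages(index: Usage[], symbol: string, kind: UsageKind): Usage[] {
    // filter() returns a new array, so sorting does not mutate `index`.
    const matching = index.filter(u => u.symbol === symbol && u.kind === kind)
    matching.sort(compareUsages)
    const unique: Usage[] = []
    for (const u of matching) {
        const last = unique.length > 0 ? unique[unique.length - 1] : undefined
        if (!last || compareUsages(last, u) !== 0) {
            unique.push(u)
        }
    }
    return unique
}
```

With the guarantees stated on the API itself, the client-side code for each panel reduces to a call with a different `kind`.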

Debugging

Footnotes

  1. The field is unfortunately named lsif in the GraphQL API because SCIP was originally called 'LSIF Typed' before the public announcement.

  2. The only major exception is the symbol naming scheme for macros in C and C++

Metadata

Labels

  • graph/api: GraphQL API parts owned by Team Graph
  • team/graph: Graph Team (previously Code Intel/Language Tools/Language Platform)
