diff --git a/.github/prompts/plan-scipIndexImport.prompt.md b/.github/prompts/plan-scipIndexImport.prompt.md new file mode 100644 index 000000000..69d840543 --- /dev/null +++ b/.github/prompts/plan-scipIndexImport.prompt.md @@ -0,0 +1,80 @@ +# Plan: `scip-index-import` Domain + +**TL;DR**: Create a standalone `domains/scip-index-import/` domain that (1) imports SCIP type-graph CSVs into Neo4j by splitting source files into single-statement `.cypher` files, (2) enriches the data for `projectionFunctions.sh` compatibility, and (3) creates SCIP-scoped structural nodes and SCIP-specific query variants for cyclic-deps and external-deps compatibility — without polluting shared labels. + +**Decided**: domain name = `scip-index-import`, always cleanup before import, pass `referenceCount` as `$dependencies_projection_weight_property` parameter, assume CSVs pre-placed in Neo4j import dir, reuse `cypher/Dependency_Enrichment/` language-agnostic queries directly, keep all SCIP nodes under SCIP-specific labels only. + +--- + +## Phase 1: Import Queries (split source files) + +Files in `domains/scip-index-import/queries/`: + +1. `Cleanup_SCIP_Type_Nodes.cypher`: copied from `getting-started-with-scip/type-graph/cleanup-code-unit-csv-from-neo4j.cypher`; single DETACH DELETE statement; strip semicolons per convention +2. `Create_SCIP_Type_Constraint.cypher`: Step 1 from `import-code-unit-csv-to-neo4j.cypher` (CREATE CONSTRAINT scip_type_symbol_unique) +3. `Import_SCIP_Type_Internal_Nodes.cypher`: Step 2 (LOAD CSV WHERE row.file <> '') +4. `Import_SCIP_Type_External_Nodes.cypher`: Step 3 (LOAD CSV WHERE row.file = '') +5. `Import_SCIP_Type_Edges.cypher`: Step 4 (LOAD CSV edges, MERGE DEPENDS_ON) + +*Cypher convention: strip all semicolons; first line = description comment; one statement per file.* + +## Phase 2: Projection-Compatibility Enrichment + +6. 
`Set_Incoming_SCIP_Type_Dependencies.cypher`: MATCH `(target:SCIPType)` WHERE `incomingDependencies IS NULL`; OPTIONAL MATCH source nodes; SET `incomingDependencies`, `incomingDependenciesWeight` +7. `Set_Outgoing_SCIP_Type_Dependencies.cypher`: mirror of above for outgoing +8. `Set_SCIP_Type_Test_Marker_Integer.cypher`: MATCH `(n:SCIPType)` WHERE `n.testMarkerInteger IS NULL`; SET `n.testMarkerInteger = CASE WHEN n.isTest THEN 1 ELSE 0 END` + +*Script reuses `cypher/Dependency_Enrichment/Set_Dependency_Degree.cypher` and `Set_Dependency_Degree_Rank.cypher` directly — no copies.* + +## Phase 3: Structural Node Enrichment (SCIP-scoped) + +Structural nodes carry only SCIP-specific labels to avoid collision with jQAssistant data. No `:Type`, `:Package`, `:Artifact`, or `:ExternalType` labels are added. + +9. `Create_SCIP_Artifact_Nodes.cypher`: MERGE `:SCIP:SCIPArtifact` nodes from unique `(module, version, packageManager)` on SCIPType; `fqn = module + ' ' + version`, `name = module`, `fileName = packageId` +10. `Create_SCIP_Module_Nodes_For_Internal_Types.cypher`: MERGE `:SCIP:SCIPModule` nodes from unique directory portion of `file` on `:SCIPInternalType` nodes; `fqn` = raw directory path (language-agnostic: `left(file, size(file) - size(split(file, '/')[-1]) - 1)`); no language-specific stripping +11. `Link_SCIP_Module_CONTAINS_SCIP_InternalType.cypher`: MATCH SCIPModule by `fqn` equal to the derived directory of `file`, MATCH SCIPInternalType, MERGE `(module)-[:CONTAINS]->(type)` +12. `Link_SCIP_Artifact_CONTAINS_SCIP_Module.cypher`: MATCH each SCIPModule together with its contained SCIPInternalType nodes, MATCH SCIPArtifact by the type's module+version, MERGE `(artifact)-[:CONTAINS]->(module)` +13. `Link_SCIP_Artifact_CONTAINS_SCIP_ExternalType.cypher`: External types have no package path; link SCIPArtifact directly to SCIPExternalType via CONTAINS + +## Phase 4: SCIP-specific Domain Query Variants + +Existing domain queries use `:Package`, `:Type`, `:Artifact`, `:ExternalType` — labels not present on SCIP nodes.
SCIP variants are placed in `domains/scip-index-import/queries/` for now (domain not yet integrated). + +14. `Cyclic_SCIP_Type_Dependencies.cypher`: adapted from `domains/cyclic-dependencies/queries/Cyclic_Dependencies.cypher`; replace `:Package` with `:SCIPModule`, `:Type` with `:SCIPType`, `:Artifact` with `:SCIPArtifact`; same logic and output columns +15. `External_SCIP_Type_Package_Usage_Overall.cypher`: adapted from `domains/external-dependencies/queries/External_package_usage_overall.cypher`; replace `:ExternalType` with `:SCIPExternalType`, `:Package` with `:SCIPModule`, `:Type` with `:SCIPType`; same logic and output columns + +## Phase 5: Entry-point Shell Script + +16. `domains/scip-index-import/importScipIndexData.sh`: + - Header: shebang, blank line, description comment, `set -o errexit -o pipefail -o nounset`, `IFS=$'\n\t'` + - Source `executeQueryFunctions.sh` (via SCRIPTS_DIR resolution pattern from `prepareAnalysis.sh`) + - QUERIES_DIR defined relative to script location + - DEPENDENCY_ENRICHMENT_CYPHER_DIR path to `cypher/Dependency_Enrichment/` + - Runs in sequence: cleanup → constraint → import nodes (2 queries) → import edges → incoming → outgoing → test marker → module nodes → artifact nodes → contains links (3 queries) → module test markers → Set_Dependency_Degree → Set_Dependency_Degree_Rank + - Log each step with echo prefix `importScipIndexData:` + +## Verification + +1. `shellcheck domains/scip-index-import/importScipIndexData.sh` +2. Copy test CSVs from `temp/simple-project-for-scip-java-comparision/import/` to Neo4j import dir; run script +3. Verify nodes exist: `MATCH (n:SCIPType) RETURN count(n)` +4. Verify projection readiness: run `Dependencies_0_Verify_Projectable.cypher` with params `dependencies_projection_node=SCIPType`, `dependencies_projection_weight_property=referenceCount` +5. Verify cyclic-deps SCIP variant: `domains/scip-index-import/queries/Cyclic_SCIP_Type_Dependencies.cypher` +6.
Verify external-deps SCIP variant: `domains/scip-index-import/queries/External_SCIP_Type_Package_Usage_Overall.cypher` + +## Gap Analysis + +✅ Enabled after this plan: +- `projectionFunctions.sh` with `SCIPType` node + `referenceCount` property (graph algorithms, anomaly detection, node embeddings) +- `Cyclic_SCIP_Type_Dependencies.cypher`: via SCIPModule/SCIPArtifact/CONTAINS + SCIPType +- `External_SCIP_Type_Package_Usage_Overall.cypher`: via SCIPExternalType + SCIPModule/SCIPType + +❌ Still missing (not in this plan): +- Internal-deps queries use `:Java:Package`, `:Java:Type` - SCIP-specific variants not planned here +- TypeScript-specific internal-deps queries - different schema entirely +- git-history domain - explicitly out of scope +- Queries using `globalFqn` on types, or expecting `fqn` to be a language-level qualified name - SCIP types store the raw SCIP `symbol` (also copied into `fqn`); `name` is the display name + +## Further Considerations + +1. **Weight property aliasing (optional future optimization)**: If a hardcoded `weight` property becomes necessary (e.g., for domain queries that don't parameterize the weight property), it can be added in Phase 1 during the LOAD CSV edges step (item 5) with a simple `SET r.weight = r.referenceCount` clause. Currently all projection usage is parameterized, so this is not needed. \ No newline at end of file diff --git a/domains/scip-index-import/README.md b/domains/scip-index-import/README.md new file mode 100644 index 000000000..cb8d841b9 --- /dev/null +++ b/domains/scip-index-import/README.md @@ -0,0 +1,108 @@ +# SCIP Index Import Domain + +Imports SCIP type-graph data from CSV into Neo4j and enriches it for analysis. +[SCIP](https://github.com/sourcegraph/scip) (Sourcegraph Code Intelligence Protocol) is a language-agnostic format for code indexes; the type dependency graph imported here is derived from SCIP indexes. + +Supported languages: Go, Java, TypeScript, Rust, C++, Ruby, Python, C#. + +## When to use + +Run this domain after generating `scip_type_nodes.csv` and `scip_type_edges.csv` and placing them in the Neo4j import directory.
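Before running the import, it can help to confirm that both CSV files are in place and carry the expected header row. The following is a minimal sketch, not part of this domain: the function name `check_scip_csv_header` is hypothetical, and it assumes an unquoted, comma-separated header row whose column names match the Prerequisites table.

```bash
# Hypothetical pre-flight check, not part of importScipIndexData.sh.
# Usage: check_scip_csv_header <csv-file> <expected-column>...
# Returns 0 if the file exists and its header row contains every expected column, 1 otherwise.
check_scip_csv_header() {
  local csv_file="$1"
  shift
  if [[ ! -f "${csv_file}" ]]; then
    echo "check_scip_csv_header: missing file ${csv_file}" >&2
    return 1
  fi
  local header=""
  # Read only the first line. read still assigns the line but returns non-zero
  # when the file lacks a trailing newline; "|| true" ignores that status.
  IFS= read -r header < "${csv_file}" || true
  # Drop a trailing carriage return (Windows line endings).
  header="${header%$'\r'}"
  local column
  for column in "$@"; do
    # Wrap both sides in commas so that "file" does not match inside "source_file".
    if [[ ",${header}," != *",${column},"* ]]; then
      echo "check_scip_csv_header: ${csv_file} has no column ${column}" >&2
      return 1
    fi
  done
  return 0
}
```

For example, calling it with `<neo4j-import-dir>/scip_type_edges.csv source_symbol target_symbol reference_count` returns a non-zero status and names the first missing column when the header does not match. Quoted header fields (`"symbol"`) are not handled by this sketch.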
+ +## Entry Point + +| Script | Purpose | +|--------|---------| +| [importScipIndexData.sh](./importScipIndexData.sh) | Full import and enrichment pipeline — run this directly | + +## Prerequisites + +Two CSV files must be present in the Neo4j import directory before running: + +| File | Columns | +|------|---------| +| `scip_type_nodes.csv` | `symbol`, `display_name`, `file`, `scheme`, `type_name`, `package_id`, `package_manager`, `version`, `module`, `is_abstract` | +| `scip_type_edges.csv` | `source_symbol`, `target_symbol`, `reference_count` | + +Internal types have a non-empty `file` column. External types have an empty `file` column. + +## Import Phases + +`importScipIndexData.sh` runs the following queries in order: + +### 1. Setup + +| Query | Purpose | +|-------|---------| +| [Cleanup_SCIP_Type_Nodes.cypher](./queries/Cleanup_SCIP_Type_Nodes.cypher) | Delete all existing SCIP nodes — clean slate before re-import | +| [Create_SCIP_Type_Constraint.cypher](./queries/Create_SCIP_Type_Constraint.cypher) | Create uniqueness constraint on `SCIPType.symbol` | + +### 2. Import + +| Query | Purpose | +|-------|---------| +| [Import_SCIP_Type_Internal_Nodes.cypher](./queries/Import_SCIP_Type_Internal_Nodes.cypher) | Import internal types (own source files); sets `isTest` from file path patterns | +| [Import_SCIP_Type_External_Nodes.cypher](./queries/Import_SCIP_Type_External_Nodes.cypher) | Import external types (library dependencies) | +| [Import_SCIP_Type_Edges.cypher](./queries/Import_SCIP_Type_Edges.cypher) | Import `DEPENDS_ON` relationships between types | + +### 3. 
Type Enrichment + +| Query | Purpose | +|-------|---------| +| [Set_Incoming_SCIP_Type_Dependencies.cypher](./queries/Set_Incoming_SCIP_Type_Dependencies.cypher) | Set `incomingDependencies` (number of distinct other types depending on it) and `incomingDependenciesWeight` (sum of `referenceCount`) on each type | +| [Set_Outgoing_SCIP_Type_Dependencies.cypher](./queries/Set_Outgoing_SCIP_Type_Dependencies.cypher) | Set `outgoingDependencies` (number of distinct other types it depends on) and `outgoingDependenciesWeight` (sum of `referenceCount`) on each type | +| [Set_SCIP_Type_Test_Marker_Integer.cypher](./queries/Set_SCIP_Type_Test_Marker_Integer.cypher) | Set `testMarkerInteger` (0/1) from `isTest` on all types | + +### 4. Structural Nodes and Links + +| Query | Purpose | +|-------|---------| +| [Create_SCIP_Module_Nodes_For_Internal_Types.cypher](./queries/Create_SCIP_Module_Nodes_For_Internal_Types.cypher) | Create `SCIPModule` nodes — one per unique source directory | +| [Create_SCIP_Artifact_Nodes.cypher](./queries/Create_SCIP_Artifact_Nodes.cypher) | Create `SCIPArtifact` nodes — one per unique module+version combination | +| [Link_SCIP_Module_CONTAINS_SCIP_InternalType.cypher](./queries/Link_SCIP_Module_CONTAINS_SCIP_InternalType.cypher) | `SCIPModule -[:CONTAINS]-> SCIPInternalType` | +| [Link_SCIP_Artifact_CONTAINS_SCIP_Module.cypher](./queries/Link_SCIP_Artifact_CONTAINS_SCIP_Module.cypher) | `SCIPArtifact -[:CONTAINS]-> SCIPModule` | +| [Link_SCIP_Artifact_CONTAINS_SCIP_ExternalType.cypher](./queries/Link_SCIP_Artifact_CONTAINS_SCIP_ExternalType.cypher) | `SCIPArtifact -[:CONTAINS]-> SCIPExternalType` | +| [Set_SCIP_Module_Is_Test_And_Marker_Integer.cypher](./queries/Set_SCIP_Module_Is_Test_And_Marker_Integer.cypher) | Set `isTest` and `testMarkerInteger` on modules — true if any contained type is a test | + +### 5.
Dependency Metrics + +Shared queries from [`cypher/Dependency_Enrichment/`](../../cypher/Dependency_Enrichment/): + +- `Set_Dependency_Degree.cypher` — combined in/out degree per node +- `Set_Dependency_Degree_Rank.cypher` — percentile rank of dependency degree + +## Graph Model + +### Nodes + +| Label | Description | +|-------|-------------| +| `SCIP:SCIPType:SCIPInternalType` | Type from own source code; has `isTest`, `testMarkerInteger`, `file` | +| `SCIP:SCIPType:SCIPExternalType` | Type from an external library; `isTest = false` | +| `SCIP:SCIPModule` | Source directory; has `isTest`, `testMarkerInteger` | +| `SCIP:SCIPArtifact` | Module + version package; groups types and modules | + +### Relationships + +| Relationship | From → To | Description | +|--------------|-----------|-------------| +| `DEPENDS_ON` | `SCIPType → SCIPType` | Type-level dependency with `referenceCount` | +| `CONTAINS` | `SCIPModule → SCIPInternalType` | Module contains its source types | +| `CONTAINS` | `SCIPArtifact → SCIPModule` | Artifact contains its modules | +| `CONTAINS` | `SCIPArtifact → SCIPExternalType` | Artifact contains its external types | + +### Key Properties + +| Property | Nodes | Description | +|----------|-------|-------------| +| `isTest` | `SCIPInternalType`, `SCIPModule` | `true` if the node is part of test code | +| `testMarkerInteger` | `SCIPType`, `SCIPModule` | `1` if `isTest`, `0` otherwise — used for graph projections | +| `language` | `SCIPType` | Detected language (e.g. `Java`, `TypeScript`, `Go`) | +| `incomingDependencies` | `SCIPType` | Number of types that depend on this type | +| `outgoingDependencies` | `SCIPType` | Number of types this type depends on | + +### Test Detection + +`isTest` is set on `SCIPInternalType` nodes during import by matching file path patterns (`/test/`, `/tests/`, `/spec/`, `__tests__`, `_test.go`, `.test.`, `.spec.`, Windows equivalents). 
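To preview which source files would be flagged before importing, the path patterns above can be mirrored in Bash. This is a hypothetical helper (`is_scip_test_path` is not part of the domain); it is meant to follow the `isTest` expression in `Import_SCIP_Type_Internal_Nodes.cypher` but is written by hand, so the two must be kept in sync manually.

```bash
# Hypothetical helper mirroring the isTest expression in Import_SCIP_Type_Internal_Nodes.cypher.
# Returns status 0 when the given file path would be flagged as test code, 1 otherwise.
is_scip_test_path() {
  local path="$1"
  # First group: forward-slash directory markers and file name conventions.
  # Second group: Windows-style backslash directories (quoted so the backslashes are literal).
  case "${path}" in
    */test/* | */tests/* | */spec/* | *__tests__* | *_test.go | *.test.* | *.spec.*) return 0 ;;
    *'\test\'* | *'\tests\'* | *'\spec\'*) return 0 ;;
  esac
  return 1
}
```

Like the Cypher expression, it only matches `/test/` with a leading slash, so a relative path such as `test/Foo.java` is not flagged.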
+ +`isTest` on `SCIPModule` nodes is derived from its contained types: a module is a test module if **any** of its `SCIPInternalType` nodes has `isTest = true`. diff --git a/domains/scip-index-import/importScipIndexData.sh b/domains/scip-index-import/importScipIndexData.sh new file mode 100755 index 000000000..b4c2dba25 --- /dev/null +++ b/domains/scip-index-import/importScipIndexData.sh @@ -0,0 +1,95 @@ +#!/usr/bin/env bash + +# Imports SCIP type-graph CSV data into Neo4j and enriches it for projection compatibility. +# Creates SCIPType, SCIPInternalType, SCIPExternalType, SCIPArtifact, and SCIPModule nodes. +# Also creates structural CONTAINS links between artifacts, modules, and types. +# Assumes scip_type_nodes.csv and scip_type_edges.csv are already placed in the Neo4j import directory. +# Requires executeQueryFunctions.sh + +# Fail on any error ("-e" = exit on first error, "-o pipefail" exit on errors within piped commands) +set -o errexit -o pipefail -o nounset +IFS=$'\n\t' + +## Get this "domains/scip-index-import" directory if not already set +# Even if $BASH_SOURCE is made for Bourne-like shells it is also supported by others and therefore here the preferred solution. +# CDPATH reduces the scope of the cd command to potentially prevent unintended directory changes. +# This way non-standard tools like readlink aren't needed. +SCIP_INDEX_IMPORT_SCRIPT_DIR=${SCIP_INDEX_IMPORT_SCRIPT_DIR:-$( CDPATH=. cd -- "$(dirname -- "${BASH_SOURCE[0]}")" && pwd -P )} +echo "importScipIndexData: SCIP_INDEX_IMPORT_SCRIPT_DIR=${SCIP_INDEX_IMPORT_SCRIPT_DIR}" + +# Get the "scripts" directory by navigating two levels up from this domain directory. 
+SCRIPTS_DIR=${SCRIPTS_DIR:-"${SCIP_INDEX_IMPORT_SCRIPT_DIR}/../../scripts"} + +# Cypher query directory within this domain +QUERIES_DIR="${SCIP_INDEX_IMPORT_SCRIPT_DIR}/queries" + +# Dependency enrichment queries in the shared cypher directory +DEPENDENCY_ENRICHMENT_CYPHER_DIR="${SCRIPTS_DIR}/../cypher/Dependency_Enrichment" + +# Define functions to execute a cypher query from within a given file like "execute_cypher" +source "${SCRIPTS_DIR}/executeQueryFunctions.sh" + +echo "importScipIndexData: $(date +'%Y-%m-%dT%H:%M:%S%z') Cleaning up existing SCIP type nodes..." +execute_cypher "${QUERIES_DIR}/Cleanup_SCIP_Type_Nodes.cypher" + +echo "importScipIndexData: $(date +'%Y-%m-%dT%H:%M:%S%z') Creating SCIP type uniqueness constraint..." +execute_cypher "${QUERIES_DIR}/Create_SCIP_Type_Constraint.cypher" + +echo "importScipIndexData: $(date +'%Y-%m-%dT%H:%M:%S%z') Importing SCIP internal type nodes..." +execute_cypher "${QUERIES_DIR}/Import_SCIP_Type_Internal_Nodes.cypher" + +echo "importScipIndexData: $(date +'%Y-%m-%dT%H:%M:%S%z') Importing SCIP external type nodes..." +execute_cypher "${QUERIES_DIR}/Import_SCIP_Type_External_Nodes.cypher" + +echo "importScipIndexData: $(date +'%Y-%m-%dT%H:%M:%S%z') Importing SCIP type dependency edges..." +execute_cypher "${QUERIES_DIR}/Import_SCIP_Type_Edges.cypher" + +echo "importScipIndexData: $(date +'%Y-%m-%dT%H:%M:%S%z') Setting incoming SCIP type dependencies..." +execute_cypher "${QUERIES_DIR}/Set_Incoming_SCIP_Type_Dependencies.cypher" + +echo "importScipIndexData: $(date +'%Y-%m-%dT%H:%M:%S%z') Setting outgoing SCIP type dependencies..." +execute_cypher "${QUERIES_DIR}/Set_Outgoing_SCIP_Type_Dependencies.cypher" + +echo "importScipIndexData: $(date +'%Y-%m-%dT%H:%M:%S%z') Setting SCIP type test marker integers..." +execute_cypher "${QUERIES_DIR}/Set_SCIP_Type_Test_Marker_Integer.cypher" + +echo "importScipIndexData: $(date +'%Y-%m-%dT%H:%M:%S%z') Creating SCIP module nodes..." 
+execute_cypher "${QUERIES_DIR}/Create_SCIP_Module_Nodes_For_Internal_Types.cypher" + +echo "importScipIndexData: $(date +'%Y-%m-%dT%H:%M:%S%z') Creating SCIP artifact nodes..." +execute_cypher "${QUERIES_DIR}/Create_SCIP_Artifact_Nodes.cypher" + +echo "importScipIndexData: $(date +'%Y-%m-%dT%H:%M:%S%z') Linking SCIP modules to their contained internal types..." +execute_cypher "${QUERIES_DIR}/Link_SCIP_Module_CONTAINS_SCIP_InternalType.cypher" + +echo "importScipIndexData: $(date +'%Y-%m-%dT%H:%M:%S%z') Linking SCIP artifacts to their contained modules..." +execute_cypher "${QUERIES_DIR}/Link_SCIP_Artifact_CONTAINS_SCIP_Module.cypher" + +echo "importScipIndexData: $(date +'%Y-%m-%dT%H:%M:%S%z') Linking SCIP artifacts to their contained external types..." +execute_cypher "${QUERIES_DIR}/Link_SCIP_Artifact_CONTAINS_SCIP_ExternalType.cypher" + +echo "importScipIndexData: $(date +'%Y-%m-%dT%H:%M:%S%z') Setting SCIP module test markers..." +execute_cypher "${QUERIES_DIR}/Set_SCIP_Module_Is_Test_And_Marker_Integer.cypher" + +echo "importScipIndexData: $(date +'%Y-%m-%dT%H:%M:%S%z') Setting dependency degree..." +execute_cypher "${DEPENDENCY_ENRICHMENT_CYPHER_DIR}/Set_Dependency_Degree.cypher" + +echo "importScipIndexData: $(date +'%Y-%m-%dT%H:%M:%S%z') Setting dependency degree rank..." +execute_cypher "${DEPENDENCY_ENRICHMENT_CYPHER_DIR}/Set_Dependency_Degree_Rank.cypher" + +echo "importScipIndexData: $(date +'%Y-%m-%dT%H:%M:%S%z') SCIP index import complete." diff --git a/domains/scip-index-import/queries/Cleanup_SCIP_Type_Nodes.cypher b/domains/scip-index-import/queries/Cleanup_SCIP_Type_Nodes.cypher new file mode 100644 index 000000000..80bce91dc --- /dev/null +++ b/domains/scip-index-import/queries/Cleanup_SCIP_Type_Nodes.cypher @@ -0,0 +1,4 @@ +// Remove all SCIP nodes (SCIPType, SCIPModule, SCIPArtifact) and their relationships from Neo4j. Run before re-importing to start with a clean slate. + +MATCH (node:SCIP) +DETACH DELETE node diff --git a/domains/scip-index-import/queries/Create_SCIP_Artifact_Nodes.cypher b/domains/scip-index-import/queries/Create_SCIP_Artifact_Nodes.cypher new file mode 100644 index 000000000..14e114970 --- /dev/null +++ b/domains/scip-index-import/queries/Create_SCIP_Artifact_Nodes.cypher @@ -0,0 +1,13 @@ +// Create SCIPArtifact nodes from unique module, version, and packageManager combinations on SCIPType nodes. Requires "Import_SCIP_Type_Internal_Nodes.cypher" and "Import_SCIP_Type_External_Nodes.cypher".
+ +MATCH (t:SCIPType) + WITH DISTINCT t.module AS module + ,t.version AS version + ,t.packageManager AS packageManager + ,t.packageId AS packageId +MERGE (a:SCIP:SCIPArtifact {fqn: module + ' ' + version}) + SET a.name = module + ,a.version = version + ,a.packageManager = packageManager + ,a.fileName = packageId +RETURN count(*) AS writtenNodes diff --git a/domains/scip-index-import/queries/Create_SCIP_Module_Nodes_For_Internal_Types.cypher b/domains/scip-index-import/queries/Create_SCIP_Module_Nodes_For_Internal_Types.cypher new file mode 100644 index 000000000..5b080d38f --- /dev/null +++ b/domains/scip-index-import/queries/Create_SCIP_Module_Nodes_For_Internal_Types.cypher @@ -0,0 +1,7 @@ +// Create SCIPModule nodes from unique directory portions of source file paths on SCIPInternalType nodes. Files without a directory part map to '.' because left() rejects a negative length. Requires "Import_SCIP_Type_Internal_Nodes.cypher". + +MATCH (t:SCIPInternalType) + WITH DISTINCT CASE WHEN t.file CONTAINS '/' THEN left(t.file, size(t.file) - size(split(t.file, '/')[-1]) - 1) ELSE '.' END AS directoryPath +MERGE (m:SCIP:SCIPModule {fqn: directoryPath}) + SET m.name = split(directoryPath, '/')[-1] +RETURN count(*) AS writtenNodes diff --git a/domains/scip-index-import/queries/Create_SCIP_Type_Constraint.cypher b/domains/scip-index-import/queries/Create_SCIP_Type_Constraint.cypher new file mode 100644 index 000000000..746c3413c --- /dev/null +++ b/domains/scip-index-import/queries/Create_SCIP_Type_Constraint.cypher @@ -0,0 +1,4 @@ +// Create uniqueness constraint on symbol property for SCIPType nodes. + +CREATE CONSTRAINT scip_type_symbol_unique IF NOT EXISTS +FOR (n:SCIPType) REQUIRE n.symbol IS UNIQUE diff --git a/domains/scip-index-import/queries/Cyclic_SCIP_Type_Dependencies.cypher b/domains/scip-index-import/queries/Cyclic_SCIP_Type_Dependencies.cypher new file mode 100644 index 000000000..3c488fd1a --- /dev/null +++ b/domains/scip-index-import/queries/Cyclic_SCIP_Type_Dependencies.cypher @@ -0,0 +1,35 @@ +// Cyclic SCIP Type Dependencies as List.
Requires "Link_SCIP_Module_CONTAINS_SCIP_InternalType.cypher" and "Link_SCIP_Artifact_CONTAINS_SCIP_Module.cypher". + +MATCH (module:SCIPModule)-[:CONTAINS]->(forwardSource:SCIPType)-[:DEPENDS_ON]->(forwardTarget:SCIPType)<-[:CONTAINS]-(dependentModule:SCIPModule) +MATCH (dependentModule)-[:CONTAINS]->(backwardSource:SCIPType)-[:DEPENDS_ON]->(backwardTarget:SCIPType)<-[:CONTAINS]-(module) +MATCH (artifact:SCIPArtifact)-[:CONTAINS]->(module) +MATCH (dependentArtifact:SCIPArtifact)-[:CONTAINS]->(dependentModule) +WHERE module.fqn <> dependentModule.fqn + WITH artifact.name AS artifactName + ,module.fqn AS moduleName + ,dependentArtifact.name AS dependentArtifactName + ,dependentModule.fqn AS dependentModuleName + ,collect(DISTINCT forwardSource.name + '->' + forwardTarget.name) AS forwardDependencies + ,collect(DISTINCT backwardSource.name + '->' + backwardTarget.name) AS backwardDependencies + WITH artifactName + ,moduleName + ,dependentArtifactName + ,dependentModuleName + ,forwardDependencies + ,backwardDependencies + ,size(forwardDependencies) AS numberOfForwardDependencies + ,size(backwardDependencies) AS numberOfBackwardDependencies + ,size(forwardDependencies) + size(backwardDependencies) AS numberOfAllCyclicDependencies +WHERE (size(forwardDependencies) > size(backwardDependencies) + OR (size(forwardDependencies) = size(backwardDependencies) + AND size(moduleName) >= size(dependentModuleName))) +RETURN artifactName + ,moduleName + ,dependentArtifactName + ,dependentModuleName + ,toFloat(ABS(numberOfForwardDependencies - numberOfBackwardDependencies)) / numberOfAllCyclicDependencies AS forwardToBackwardBalance + ,numberOfForwardDependencies AS numberForward + ,numberOfBackwardDependencies AS numberBackward + ,forwardDependencies[0..9] AS someForwardDependencies + ,backwardDependencies +ORDER BY forwardToBackwardBalance DESC, moduleName ASC diff --git a/domains/scip-index-import/queries/External_SCIP_Type_Package_Usage_Overall.cypher 
b/domains/scip-index-import/queries/External_SCIP_Type_Package_Usage_Overall.cypher new file mode 100644 index 000000000..8707dfb92 --- /dev/null +++ b/domains/scip-index-import/queries/External_SCIP_Type_Package_Usage_Overall.cypher @@ -0,0 +1,26 @@ +// External SCIP type package (module) usage overall. Requires "Link_SCIP_Module_CONTAINS_SCIP_InternalType.cypher" and "Import_SCIP_Type_Edges.cypher". + + MATCH (module:SCIPModule)-[:CONTAINS]->(type:SCIPInternalType) + WITH count(DISTINCT type.symbol) AS allTypes + ,count(DISTINCT module.fqn) AS allModules + ,collect(type) AS typeList +UNWIND typeList AS type + MATCH (type)-[externalDependency:DEPENDS_ON]->(externalType:SCIPExternalType) + MATCH (typeModule:SCIPModule)-[:CONTAINS]->(type) + WITH allTypes + ,allModules + ,externalType.module AS externalPackageName + ,count(DISTINCT typeModule.fqn) AS numberOfExternalCallerModules + ,count(DISTINCT type.symbol) AS numberOfExternalCallerTypes + ,count(externalDependency) AS numberOfExternalTypeCalls + ,sum(externalDependency.referenceCount) AS numberOfExternalTypeCallsWeighted + ,collect(DISTINCT externalType.name) AS externalTypeNames +RETURN externalPackageName + ,numberOfExternalCallerModules + ,numberOfExternalCallerTypes + ,numberOfExternalTypeCalls + ,numberOfExternalTypeCallsWeighted + ,allModules + ,allTypes + ,externalTypeNames[0..10] AS tenExternalTypeNames + ORDER BY numberOfExternalCallerModules DESC, externalPackageName ASC diff --git a/domains/scip-index-import/queries/Import_SCIP_Type_Edges.cypher b/domains/scip-index-import/queries/Import_SCIP_Type_Edges.cypher new file mode 100644 index 000000000..6188b8eb9 --- /dev/null +++ b/domains/scip-index-import/queries/Import_SCIP_Type_Edges.cypher @@ -0,0 +1,7 @@ +// Import SCIP type dependency edges from 'scip_type_edges.csv'. Requires "Import_SCIP_Type_Internal_Nodes.cypher" and "Import_SCIP_Type_External_Nodes.cypher".
+ +LOAD CSV WITH HEADERS FROM 'file:///scip_type_edges.csv' AS row +MATCH (source:SCIPType {symbol: row.source_symbol}) +MATCH (target:SCIPType {symbol: row.target_symbol}) +MERGE (source)-[relationship:DEPENDS_ON]->(target) +SET relationship.referenceCount = toInteger(row.reference_count) diff --git a/domains/scip-index-import/queries/Import_SCIP_Type_External_Nodes.cypher b/domains/scip-index-import/queries/Import_SCIP_Type_External_Nodes.cypher new file mode 100644 index 000000000..3a5a3beb3 --- /dev/null +++ b/domains/scip-index-import/queries/Import_SCIP_Type_External_Nodes.cypher @@ -0,0 +1,27 @@ +// Import external SCIP type nodes from 'scip_type_nodes.csv'. Requires "Create_SCIP_Type_Constraint.cypher". + +LOAD CSV WITH HEADERS FROM 'file:///scip_type_nodes.csv' AS row +WITH row WHERE row.file = '' +MERGE (node:SCIP:SCIPType:SCIPExternalType {symbol: row.symbol}) +SET node.fqn = row.symbol, + node.name = row.display_name, + node.scheme = row.scheme, + node.language = CASE row.scheme + WHEN 'scip-go' THEN 'Go' + WHEN 'semanticdb' THEN 'Java' + WHEN 'scip-typescript' THEN 'TypeScript' + WHEN 'rust-analyzer' THEN 'Rust' + WHEN 'cxx' THEN 'C++' + WHEN 'scip-ruby' THEN 'Ruby' + WHEN 'scip-python' THEN 'Python' + WHEN 'scip-dotnet' THEN 'C#' + ELSE toUpper(left(replace(row.scheme, 'scip-', ''), 1)) + substring(replace(row.scheme, 'scip-', ''), 1) + END, + node.typeName = row.type_name, + node.file = '', + node.packageId = row.package_id, + node.packageManager = row.package_manager, + node.version = row.version, + node.module = row.module, + node.isAbstract = (row.is_abstract = 'true'), + node.isTest = false diff --git a/domains/scip-index-import/queries/Import_SCIP_Type_Internal_Nodes.cypher b/domains/scip-index-import/queries/Import_SCIP_Type_Internal_Nodes.cypher new file mode 100644 index 000000000..c75abe7e2 --- /dev/null +++ b/domains/scip-index-import/queries/Import_SCIP_Type_Internal_Nodes.cypher @@ -0,0 +1,38 @@ +// Import internal SCIP type nodes from 
'scip_type_nodes.csv'. Requires "Create_SCIP_Type_Constraint.cypher". + +LOAD CSV WITH HEADERS FROM 'file:///scip_type_nodes.csv' AS row +WITH row WHERE row.file <> '' +MERGE (node:SCIP:SCIPType:SCIPInternalType {symbol: row.symbol}) +SET node.fqn = row.symbol, + node.name = row.display_name, + node.scheme = row.scheme, + node.language = CASE row.scheme + WHEN 'scip-go' THEN 'Go' + WHEN 'semanticdb' THEN 'Java' + WHEN 'scip-typescript' THEN 'TypeScript' + WHEN 'rust-analyzer' THEN 'Rust' + WHEN 'cxx' THEN 'C++' + WHEN 'scip-ruby' THEN 'Ruby' + WHEN 'scip-python' THEN 'Python' + WHEN 'scip-dotnet' THEN 'C#' + ELSE toUpper(left(replace(row.scheme, 'scip-', ''), 1)) + substring(replace(row.scheme, 'scip-', ''), 1) + END, + node.typeName = row.type_name, + node.file = row.file, + node.packageId = row.package_id, + node.packageManager = row.package_manager, + node.version = row.version, + node.module = row.module, + node.isAbstract = (row.is_abstract = 'true'), + node.isTest = ( + row.file CONTAINS '/test/' OR + row.file CONTAINS '/tests/' OR + row.file CONTAINS '/spec/' OR + row.file CONTAINS '__tests__' OR + row.file ENDS WITH '_test.go' OR + row.file CONTAINS '.test.' OR + row.file CONTAINS '.spec.' OR + row.file CONTAINS '\\test\\' OR + row.file CONTAINS '\\tests\\' OR + row.file CONTAINS '\\spec\\' + ) diff --git a/domains/scip-index-import/queries/Link_SCIP_Artifact_CONTAINS_SCIP_ExternalType.cypher b/domains/scip-index-import/queries/Link_SCIP_Artifact_CONTAINS_SCIP_ExternalType.cypher new file mode 100644 index 000000000..5c6c10cf3 --- /dev/null +++ b/domains/scip-index-import/queries/Link_SCIP_Artifact_CONTAINS_SCIP_ExternalType.cypher @@ -0,0 +1,6 @@ +// Link SCIPArtifact nodes to their contained SCIPExternalType nodes via CONTAINS. Requires "Create_SCIP_Artifact_Nodes.cypher" and "Import_SCIP_Type_External_Nodes.cypher". 
+ +MATCH (t:SCIPExternalType) +MATCH (a:SCIPArtifact {fqn: t.module + ' ' + t.version}) +MERGE (a)-[:CONTAINS]->(t) +RETURN count(*) AS writtenRelationships diff --git a/domains/scip-index-import/queries/Link_SCIP_Artifact_CONTAINS_SCIP_Module.cypher b/domains/scip-index-import/queries/Link_SCIP_Artifact_CONTAINS_SCIP_Module.cypher new file mode 100644 index 000000000..49fbbc8d8 --- /dev/null +++ b/domains/scip-index-import/queries/Link_SCIP_Artifact_CONTAINS_SCIP_Module.cypher @@ -0,0 +1,6 @@ +// Link SCIPArtifact nodes to their contained SCIPModule nodes via CONTAINS. Requires "Link_SCIP_Module_CONTAINS_SCIP_InternalType.cypher" and "Create_SCIP_Artifact_Nodes.cypher". + +MATCH (m:SCIPModule)-[:CONTAINS]->(t:SCIPInternalType) +MATCH (a:SCIPArtifact {fqn: t.module + ' ' + t.version}) +MERGE (a)-[:CONTAINS]->(m) +RETURN count(*) AS writtenRelationships diff --git a/domains/scip-index-import/queries/Link_SCIP_Module_CONTAINS_SCIP_InternalType.cypher b/domains/scip-index-import/queries/Link_SCIP_Module_CONTAINS_SCIP_InternalType.cypher new file mode 100644 index 000000000..84f28c250 --- /dev/null +++ b/domains/scip-index-import/queries/Link_SCIP_Module_CONTAINS_SCIP_InternalType.cypher @@ -0,0 +1,8 @@ +// Link SCIPModule nodes to their contained SCIPInternalType nodes via CONTAINS. Requires "Create_SCIP_Module_Nodes_For_Internal_Types.cypher" and "Import_SCIP_Type_Internal_Nodes.cypher". 
+ +MATCH (t:SCIPInternalType) + WITH t + ,CASE WHEN t.file CONTAINS '/' THEN left(t.file, size(t.file) - size(split(t.file, '/')[-1]) - 1) ELSE '.' END AS directoryPath +MATCH (m:SCIPModule {fqn: directoryPath}) +MERGE (m)-[:CONTAINS]->(t) +RETURN count(*) AS writtenRelationships diff --git a/domains/scip-index-import/queries/Set_Incoming_SCIP_Type_Dependencies.cypher b/domains/scip-index-import/queries/Set_Incoming_SCIP_Type_Dependencies.cypher new file mode 100644 index 000000000..baf13bda0 --- /dev/null +++ b/domains/scip-index-import/queries/Set_Incoming_SCIP_Type_Dependencies.cypher @@ -0,0 +1,13 @@ +// Set incoming SCIP type dependencies. Requires "Import_SCIP_Type_Edges.cypher". + + MATCH (n:SCIPType) + WHERE n.incomingDependencies IS NULL +OPTIONAL MATCH (n)<-[r:DEPENDS_ON]-(source:SCIPType) + WHERE n <> source + WITH n + ,count(DISTINCT source.symbol) AS incomingDependencies + ,sum(r.referenceCount) AS incomingDependenciesWeight + SET n.incomingDependencies = incomingDependencies + ,n.incomingDependenciesWeight = incomingDependenciesWeight + RETURN n.fqn AS symbol + ,incomingDependencies diff --git a/domains/scip-index-import/queries/Set_Outgoing_SCIP_Type_Dependencies.cypher b/domains/scip-index-import/queries/Set_Outgoing_SCIP_Type_Dependencies.cypher new file mode 100644 index 000000000..aa86a87eb --- /dev/null +++ b/domains/scip-index-import/queries/Set_Outgoing_SCIP_Type_Dependencies.cypher @@ -0,0 +1,13 @@ +// Set outgoing SCIP type dependencies. Requires "Import_SCIP_Type_Edges.cypher".
+ + MATCH (n:SCIPType) + WHERE n.outgoingDependencies IS NULL +OPTIONAL MATCH (n)-[r:DEPENDS_ON]->(target:SCIPType) + WHERE n <> target + WITH n + ,count(DISTINCT target.symbol) AS outgoingDependencies + ,sum(r.referenceCount) AS outgoingDependenciesWeight + SET n.outgoingDependencies = outgoingDependencies + ,n.outgoingDependenciesWeight = outgoingDependenciesWeight + RETURN n.fqn AS symbol + ,outgoingDependencies diff --git a/domains/scip-index-import/queries/Set_SCIP_Module_Is_Test_And_Marker_Integer.cypher b/domains/scip-index-import/queries/Set_SCIP_Module_Is_Test_And_Marker_Integer.cypher new file mode 100644 index 000000000..d93335b16 --- /dev/null +++ b/domains/scip-index-import/queries/Set_SCIP_Module_Is_Test_And_Marker_Integer.cypher @@ -0,0 +1,8 @@ +// Set isTest and testMarkerInteger on SCIPModule nodes based on whether any contained SCIPInternalType is a test. Requires "Link_SCIP_Module_CONTAINS_SCIP_InternalType.cypher". + +MATCH (m:SCIPModule) +OPTIONAL MATCH (m)-[:CONTAINS]->(t:SCIPInternalType) + WITH m, true IN collect(t.isTest) AS hasTestType + SET m.isTest = hasTestType, + m.testMarkerInteger = CASE WHEN hasTestType THEN 1 ELSE 0 END +RETURN count(*) AS writtenNodes diff --git a/domains/scip-index-import/queries/Set_SCIP_Type_Test_Marker_Integer.cypher b/domains/scip-index-import/queries/Set_SCIP_Type_Test_Marker_Integer.cypher new file mode 100644 index 000000000..11b165447 --- /dev/null +++ b/domains/scip-index-import/queries/Set_SCIP_Type_Test_Marker_Integer.cypher @@ -0,0 +1,6 @@ +// Set testMarkerInteger on SCIP type nodes based on the isTest property set during import. Requires "Import_SCIP_Type_Internal_Nodes.cypher". + +MATCH (n:SCIPType) +WHERE n.testMarkerInteger IS NULL + SET n.testMarkerInteger = CASE WHEN n.isTest THEN 1 ELSE 0 END +RETURN count(*) AS writtenNodes
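`Create_SCIP_Module_Nodes_For_Internal_Types.cypher` and `Link_SCIP_Module_CONTAINS_SCIP_InternalType.cypher` derive a module's `fqn` as the directory part of `file` (everything before the last `/`). A minimal Bash sketch of the same derivation can preview the modules that a list of file paths would produce. The function names are hypothetical, and mapping a path without any `/` to `.` (as `dirname` does) is an assumption of this sketch rather than something taken from the queries.

```bash
# Hypothetical helper: prints the SCIPModule fqn (directory part) for one source file path.
scip_module_fqn() {
  local file="$1"
  if [[ "${file}" == */* ]]; then
    # Strip the last "/" and everything after it; a root path like "/Foo.java" yields "/".
    local dir="${file%/*}"
    printf '%s\n' "${dir:-/}"
  else
    # Assumption of this sketch: a path without any "/" maps to ".", like dirname.
    printf '.\n'
  fi
}

# Hypothetical helper: reads file paths (one per line) from stdin and prints each distinct module fqn once.
list_scip_module_fqns() {
  local file
  while IFS= read -r file; do
    if [[ -n "${file}" ]]; then
      scip_module_fqn "${file}"
    fi
  done | sort -u
}
```

Piping the `file` column of internal types into `list_scip_module_fqns` gives the set of module paths to expect; parsing that column out of the CSV is left to the caller because quoted fields can contain commas.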