Add git history file overview treemap #323

Merged 9 commits on Mar 22, 2025
2 changes: 1 addition & 1 deletion COMMANDS.md
@@ -264,7 +264,7 @@ Here is the resulting schema:

#### Parameter

The optional parameter `--source directory-path-to-the-source-folder-containing-git-repositories` can be used to select a different directory for the repositories. By default, the `source` directory within the analysis workspace directory is used. This command only needs the git history to be present. Therefore, `git clone --bare` is sufficient. If the `source` directory is also used for code analysis (like for Typescript) then a full git clone is of course needed.
The optional parameter `--source directory-path-to-the-source-folder-containing-git-repositories` can be used to select a different directory for the repositories. By default, the `source` directory within the analysis workspace directory is used. This command only needs the git history to be present. Therefore, `git clone --bare` is sufficient. If the `source` directory is also used for code analysis (like for Typescript) then a full git clone is of course needed. Additionally, if you want to focus on a specific version or branch, clone with `--branch branch-name` to check out that branch and `--single-branch` to exclude other branches before importing the git log data.
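
The clone options mentioned above can be combined in a single `git clone` call. The following is a minimal sketch, not part of the pipeline: the helper name `build_git_clone_command`, the repository URL, and the target path are hypothetical and only illustrate how `--bare`, `--branch`, and `--single-branch` fit together.

```python
from typing import List, Optional


def build_git_clone_command(
    repository_url: str,
    target_directory: str,
    branch: Optional[str] = None,
    bare: bool = True,
) -> List[str]:
    """Assemble the argument list for a git clone (hypothetical helper)."""
    command = ["git", "clone"]
    if bare:
        # A bare clone contains the git history without a working tree.
        command.append("--bare")
    if branch is not None:
        # Select one branch and leave out the history of all other branches.
        command += ["--branch", branch, "--single-branch"]
    command += [repository_url, target_directory]
    return command


if __name__ == "__main__":
    # Hypothetical URL and path, for illustration only.
    arguments = build_git_clone_command(
        "https://example.com/my-project.git", "source/my-project", branch="main"
    )
    print(" ".join(arguments))
```

The returned list can be passed to `subprocess.run(arguments, check=True)`. Pass `bare=False` when the `source` directory is also used for code analysis and a full clone is required.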

#### Environment Variable

2 changes: 1 addition & 1 deletion GETTING_STARTED.md
@@ -66,7 +66,7 @@ Use these optional command line options as needed:

- If you want to analyze Typescript code, create a symbolic link inside the `source` directory that points to the Typescript project. Alternatively you can also copy the project into the `source` directory.

- If you want to include git data like changed files and authors, create a symbolic link inside the `source` directory that points to the repository or clone it into the `source` directory. If you already have your Typescript project in there, you of course don't have to do it twice. If you are analyzing Java artifacts (full source not needed), it is sufficient to use a bare clone that only contains the git history without the sources using `git clone --bare`.
- If you want to include git data like changed files and authors, create a symbolic link inside the `source` directory that points to the repository or clone it into the `source` directory. If you already have your Typescript project in there, you of course don't have to do it twice. If you are analyzing Java artifacts (full source not needed), it is sufficient to use a bare clone that only contains the git history without the sources using `git clone --bare`. If you want to focus on one branch, use `--branch branch-name` to check out the branch and `--single-branch` to only fetch the history of that branch.

- Alternatively to the steps above, run an already predefined download script

1 change: 1 addition & 0 deletions README.md
@@ -135,6 +135,7 @@ The [Code Structure Analysis Pipeline](./.github/workflows/internal-java-code-an
- [numpy](https://numpy.org)
- [pandas](https://pandas.pydata.org)
- [pip](https://pip.pypa.io/en/stable)
- [plotly](https://plotly.com/python)
- [monotonic](https://github.com/atdt/monotonic)
- [Neo4j Python Driver](https://neo4j.com/docs/api/python-driver)
- [openTSNE](https://github.com/pavlin-policar/openTSNE)
6 changes: 6 additions & 0 deletions cypher/GitLog/List_git_files_per_commit_distribution.cypher
@@ -0,0 +1,6 @@
// List how many git commits changed one file, how many changed two files, ...

MATCH (git_commit:Git:Commit)-[:CONTAINS_CHANGE]->(git_change:Git:Change)-[]->(git_file:Git:File)
WITH git_commit, count(DISTINCT git_file.relativePath) AS filesPerCommit
RETURN filesPerCommit, count(DISTINCT git_commit.sha) AS commitCount
ORDER BY filesPerCommit ASC
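
To make the aggregation in the query above easier to follow, here is a small in-memory sketch of the same two-step counting: first the number of distinct files per commit, then the number of commits per file count. The function name and the sample data are hypothetical, and the sketch only mirrors the query logic, not the graph access.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def files_per_commit_distribution(
    changed_files_by_commit: Dict[str, Iterable[str]],
) -> List[Tuple[int, int]]:
    """Return (filesPerCommit, commitCount) pairs ordered by filesPerCommit."""
    # Step 1: count distinct file paths per commit (count(DISTINCT ...) in the query).
    files_per_commit = {
        sha: len(set(paths)) for sha, paths in changed_files_by_commit.items()
    }
    # Commits without any changed file would not match the query pattern, so skip them.
    counts = [count for count in files_per_commit.values() if count > 0]
    # Step 2: count how many commits share the same number of changed files.
    return sorted(Counter(counts).items())


if __name__ == "__main__":
    # Hypothetical commit hashes and file paths.
    sample = {
        "a1": ["src/x.ts", "src/y.ts"],
        "b2": ["src/x.ts"],
        "c3": ["src/y.ts", "src/y.ts"],  # the same file twice counts once
        "d4": ["src/x.ts", "src/z.ts"],
    }
    print(files_per_commit_distribution(sample))
```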
@@ -0,0 +1,24 @@
// List git files with commit statistics

MATCH (git_file:File&Git&!Repository)
WHERE git_file.deletedAt IS NULL // filter out deleted files
WITH percentileDisc(git_file.createdAtEpoch, 0.5) AS medianCreatedAtEpoch
,percentileDisc(git_file.lastModificationAtEpoch, 0.5) AS medianLastModificationAtEpoch
,collect(git_file) AS git_files
UNWIND git_files AS git_file
WITH *
,datetime.fromepochMillis(coalesce(git_file.createdAtEpoch, medianCreatedAtEpoch)) AS fileCreatedAtTimestamp
,datetime.fromepochMillis(coalesce(git_file.lastModificationAtEpoch, git_file.createdAtEpoch, medianLastModificationAtEpoch)) AS fileLastModificationAtTimestamp
MATCH (git_repository:Git&Repository)-[:HAS_FILE]->(git_file)
MATCH (git_commit:Git&Commit)-[:CONTAINS_CHANGE]->(git_change:Git&Change)-->(old_files_included:Git&File&!Repository)-[:HAS_NEW_NAME*0..3]->(git_file)
RETURN git_repository.name + '/' + git_file.relativePath AS filePath
,split(git_commit.author, ' <')[0] AS author
,count(DISTINCT git_commit.sha) AS commitCount
,date(max(git_commit.date)) AS lastCommitDate
,max(date(fileCreatedAtTimestamp)) AS lastCreationDate
,max(date(fileLastModificationAtTimestamp)) AS lastModificationDate
,duration.inDays(date(max(git_commit.date)), date()).days AS daysSinceLastCommit
,duration.inDays(max(fileCreatedAtTimestamp), datetime()).days AS daysSinceLastCreation
,duration.inDays(max(fileLastModificationAtTimestamp), datetime()).days AS daysSinceLastModification
,max(git_commit.sha) AS maxCommitSha
ORDER BY filePath ASCENDING, commitCount DESCENDING
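
Two details of the query above are easy to miss: missing timestamps fall back to the median of the known ones, and the author name is taken from the part before ` <`. The sketch below illustrates both outside the database. It assumes that `statistics.median_low` is an acceptable stand-in for `percentileDisc(..., 0.5)`; the function names are hypothetical.

```python
from datetime import datetime, timezone
from statistics import median_low
from typing import List, Optional


def fill_missing_epochs(epoch_millis: List[Optional[int]]) -> List[datetime]:
    """Replace missing epoch milliseconds by the median of the known values."""
    known = [value for value in epoch_millis if value is not None]
    if not known:
        raise ValueError("at least one timestamp is needed to compute a median")
    # Assumption: median_low approximates percentileDisc(..., 0.5), since both
    # return a value that actually occurs in the data.
    fallback = median_low(known)
    return [
        datetime.fromtimestamp(
            (value if value is not None else fallback) / 1000, tz=timezone.utc
        )
        for value in epoch_millis
    ]


def author_name(author: str) -> str:
    """Mirror split(git_commit.author, ' <')[0]: drop the e-mail part."""
    return author.split(" <")[0]


if __name__ == "__main__":
    print(fill_missing_epochs([0, None, 2000, 4000]))
    print(author_name("Jane Doe <jane@example.com>"))  # hypothetical author
```

The median fallback keeps files without timestamps in the result instead of dropping them, at the cost of showing an estimated date for those files.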
6 changes: 6 additions & 0 deletions cypher/Validation/ValidateGitHistory.cypher
@@ -0,0 +1,6 @@
// Check if there is at least one Git:Commit pointing to a Git:Change containing a Git:File from a Git:Repository

MATCH (commit:Git:Commit)-[:CONTAINS_CHANGE]->(change:Git:Change)-->(file:Git:File)
MATCH (repository:Git:Repository)-[:HAS_FILE]->(file)
RETURN commit.sha AS commitSha
LIMIT 1
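
How the pipeline evaluates this validation query is not shown in this diff. A plausible reading is that a non-empty result means git history data is present. The sketch below illustrates that reading with an injected query runner so it can be tried without a database; `git_history_is_present` and `run_query` are hypothetical names.

```python
from typing import Any, Callable, Dict, List

# The validation query from above, repeated so that the example is self-contained.
VALIDATION_QUERY = """
MATCH (commit:Git:Commit)-[:CONTAINS_CHANGE]->(change:Git:Change)-->(file:Git:File)
MATCH (repository:Git:Repository)-[:HAS_FILE]->(file)
RETURN commit.sha AS commitSha
LIMIT 1
"""


def git_history_is_present(
    run_query: Callable[[str], List[Dict[str, Any]]],
) -> bool:
    """Assumption: at least one returned row means the git data was imported."""
    return len(run_query(VALIDATION_QUERY)) > 0


if __name__ == "__main__":
    # Stand-in runners instead of a real database connection.
    print(git_history_is_present(lambda query: [{"commitSha": "abc123"}]))
    print(git_history_is_present(lambda query: []))
```

With the Neo4j Python driver, `run_query` could wrap `driver.execute_query` and convert the returned records to dictionaries.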
19 changes: 6 additions & 13 deletions jupyter/ExternalDependenciesJava.ipynb
@@ -44,23 +44,16 @@
},
{
"cell_type": "code",
"execution_count": 235,
"execution_count": null,
"id": "c1db254b",
"metadata": {},
"outputs": [],
"source": [
"def get_cypher_query_from_file(filename):\n",
" with open(filename) as file:\n",
" return ' '.join(file.readlines())"
]
},
{
"cell_type": "code",
"execution_count": 236,
"id": "59310f6f",
"metadata": {},
"outputs": [],
"source": [
" return ' '.join(file.readlines())\n",
"\n",
"\n",
"def query_cypher_to_data_frame(filename):\n",
" records, summary, keys = driver.execute_query(get_cypher_query_from_file(filename))\n",
" return pd.DataFrame([r.values() for r in records], columns=keys)"
@@ -1735,7 +1728,7 @@
"celltoolbar": "Tags",
"code_graph_analysis_pipeline_data_validation": "ValidateJavaExternalDependencies",
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"display_name": "codegraph",
"language": "python",
"name": "python3"
},
@@ -1749,7 +1742,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.9"
"version": "3.12.9"
},
"title": "External Dependencies for Java"
},
17 changes: 5 additions & 12 deletions jupyter/ExternalDependenciesTypescript.ipynb
@@ -51,16 +51,9 @@
"source": [
"def get_cypher_query_from_file(filename):\n",
" with open(filename) as file:\n",
" return ' '.join(file.readlines())"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "59310f6f",
"metadata": {},
"outputs": [],
"source": [
" return ' '.join(file.readlines())\n",
"\n",
"\n",
"def query_cypher_to_data_frame(filename):\n",
" records, summary, keys = driver.execute_query(get_cypher_query_from_file(filename))\n",
" return pd.DataFrame([r.values() for r in records], columns=keys)"
@@ -1638,7 +1631,7 @@
"celltoolbar": "Tags",
"code_graph_analysis_pipeline_data_validation": "ValidateTypescriptModuleDependencies",
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"display_name": "codegraph",
"language": "python",
"name": "python3"
},
@@ -1652,7 +1645,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.9"
"version": "3.12.9"
},
"title": "External Dependencies for Typescript"
},