Commit 425ff5d

Merge pull request #323 from JohT/feature/add-git-history-resports-using-treemaps
Add git history file overview treemap
2 parents e677ecc + 0e7c645 commit 425ff5d

28 files changed: +1429 -321 lines changed

COMMANDS.md

Lines changed: 1 addition & 1 deletion
@@ -264,7 +264,7 @@ Here is the resulting schema:

 #### Parameter

-The optional parameter `--source directory-path-to-the-source-folder-containing-git-repositories` can be used to select a different directory for the repositories. By default, the `source` directory within the analysis workspace directory is used. This command only needs the git history to be present. Therefore, `git clone --bare` is sufficient. If the `source` directory is also used for code analysis (like for Typescript) then a full git clone is of course needed.
+The optional parameter `--source directory-path-to-the-source-folder-containing-git-repositories` can be used to select a different directory for the repositories. By default, the `source` directory within the analysis workspace directory is used. This command only needs the git history to be present. Therefore, `git clone --bare` is sufficient. If the `source` directory is also used for code analysis (like for Typescript) then a full git clone is of course needed. Additionally, if you want to focus on a specific version or branch, use `--branch branch-name` to check out the branch and `--single-branch` to exclude other branches before importing the git log data.

 #### Environment Variable
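The clone options described in the COMMANDS.md change above can be sketched as a small helper that assembles the `git clone` arguments. This is only an illustration: `build_clone_command`, the example URL, and the target path are hypothetical and not part of the pipeline; only the `git clone` flags `--bare`, `--branch`, and `--single-branch` come from the documentation text.

```python
from typing import List, Optional


def build_clone_command(repository_url: str,
                        target_directory: str,
                        branch: Optional[str] = None,
                        bare: bool = True) -> List[str]:
    """Assemble the argument list for `git clone` (hypothetical helper)."""
    command = ["git", "clone"]
    if bare:
        # A bare clone contains only the git history, no working tree.
        command.append("--bare")
    if branch is not None:
        # Restrict the clone to one branch and skip the history of all others.
        command += ["--branch", branch, "--single-branch"]
    command += [repository_url, target_directory]
    return command


# Assumed example values; replace them with a real repository and path.
example = build_clone_command(
    "https://example.com/some/repository.git",
    "source/repository.git",
    branch="main",
)
print(" ".join(example))
```

The resulting list could be passed to `subprocess.run(example, check=True)` to perform the clone.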

GETTING_STARTED.md

Lines changed: 1 addition & 1 deletion
@@ -66,7 +66,7 @@ Use these optional command line options as needed:

 - If you want to analyze Typescript code, create a symbolic link inside the `source` directory that points to the Typescript project. Alternatively you can also copy the project into the `source` directory.

-- If you want to include git data like changed files and authors, create a symbolic link inside the `source` directory that points to the repository or clone it into the `source` directory. If you already have your Typescript project in there, you of course don't have to do it twice. If you are analyzing Java artifacts (full source not needed), it is sufficient to use a bare clone that only contains the git history without the sources using `git clone --bare`.
+- If you want to include git data like changed files and authors, create a symbolic link inside the `source` directory that points to the repository or clone it into the `source` directory. If you already have your Typescript project in there, you of course don't have to do it twice. If you are analyzing Java artifacts (full source not needed), it is sufficient to use a bare clone that only contains the git history without the sources using `git clone --bare`. If you want to focus on one branch, use `--branch branch-name` to check out the branch and `--single-branch` to only fetch the history of that branch.

 - Alternatively to the steps above, run an already predefined download script
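The symbolic-link step from the GETTING_STARTED.md change above might look like the following minimal sketch. The helper `link_into_source` and the directory names are assumptions made for illustration; the pipeline itself does not provide such a function.

```python
import tempfile
from pathlib import Path


def link_into_source(source_directory: Path, repository: Path) -> Path:
    """Create source_directory/<repository name> as a symbolic link (hypothetical helper)."""
    source_directory.mkdir(parents=True, exist_ok=True)
    link = source_directory / repository.name
    # Only create the link if nothing with that name is there yet.
    if not (link.is_symlink() or link.exists()):
        link.symlink_to(repository.resolve(), target_is_directory=True)
    return link


# Demonstration inside a throwaway directory (assumed layout).
with tempfile.TemporaryDirectory() as workspace:
    repository = Path(workspace) / "my-project"
    repository.mkdir()
    link = link_into_source(Path(workspace) / "analysis" / "source", repository)
    print(link.is_symlink(), link.resolve() == repository.resolve())
```

On systems where creating symbolic links requires extra privileges, copying or cloning into the `source` directory, as the text describes, remains the alternative.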

README.md

Lines changed: 1 addition & 0 deletions
@@ -135,6 +135,7 @@ The [Code Structure Analysis Pipeline](./.github/workflows/internal-java-code-an
 - [numpy](https://numpy.org)
 - [pandas](https://pandas.pydata.org)
 - [pip](https://pip.pypa.io/en/stable)
+- [plotly](https://plotly.com/python)
 - [monotonic](https://github.com/atdt/monotonic)
 - [Neo4j Python Driver](https://neo4j.com/docs/api/python-driver)
 - [openTSNE](https://github.com/pavlin-policar/openTSNE)
Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+// List how many git commits changed one file, how many changed two files, and so on.
+
+MATCH (git_commit:Git:Commit)-[:CONTAINS_CHANGE]->(git_change:Git:Change)-[]->(git_file:Git:File)
+WITH git_commit, count(DISTINCT git_file.relativePath) AS filesPerCommit
+RETURN filesPerCommit, count(DISTINCT git_commit.sha) AS commitCount
+ORDER BY filesPerCommit ASC
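To make the two-step aggregation of the query above easier to follow, here is a plain-Python sketch of the same idea on in-memory data: first count the distinct files per commit, then count how many commits share each file count. The function name and the sample `(sha, path)` pairs are made up for illustration and do not come from the repository.

```python
from collections import Counter, defaultdict
from typing import Iterable, List, Tuple


def commits_per_file_count(changes: Iterable[Tuple[str, str]]) -> List[Tuple[int, int]]:
    """Return (filesPerCommit, commitCount) pairs sorted by filesPerCommit."""
    files_by_commit = defaultdict(set)
    for sha, path in changes:
        # A set mirrors count(DISTINCT git_file.relativePath).
        files_by_commit[sha].add(path)
    histogram = Counter(len(paths) for paths in files_by_commit.values())
    return sorted(histogram.items())


# Hypothetical change records: commit sha and relative file path.
changes = [
    ("a1", "README.md"),
    ("a1", "README.md"),
    ("b2", "README.md"),
    ("b2", "COMMANDS.md"),
    ("c3", "pom.xml"),
]
print(commits_per_file_count(changes))  # [(1, 2), (2, 1)]
```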
Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
+// List git files with commit statistics
+
+MATCH (git_file:File&Git&!Repository)
+WHERE git_file.deletedAt IS NULL // filter out deleted files
+WITH percentileDisc(git_file.createdAtEpoch, 0.5) AS medianCreatedAtEpoch
+    ,percentileDisc(git_file.lastModificationAtEpoch, 0.5) AS medianLastModificationAtEpoch
+    ,collect(git_file) AS git_files
+UNWIND git_files AS git_file
+WITH *
+    ,datetime.fromepochMillis(coalesce(git_file.createdAtEpoch, medianCreatedAtEpoch)) AS fileCreatedAtTimestamp
+    ,datetime.fromepochMillis(coalesce(git_file.lastModificationAtEpoch, git_file.createdAtEpoch, medianLastModificationAtEpoch)) AS fileLastModificationAtTimestamp
+MATCH (git_repository:Git&Repository)-[:HAS_FILE]->(git_file)
+MATCH (git_commit:Git&Commit)-[:CONTAINS_CHANGE]->(git_change:Git&Change)-->(old_files_included:Git&File&!Repository)-[:HAS_NEW_NAME*0..3]->(git_file)
+RETURN git_repository.name + '/' + git_file.relativePath AS filePath
+    ,split(git_commit.author, ' <')[0] AS author
+    ,count(DISTINCT git_commit.sha) AS commitCount
+    ,date(max(git_commit.date)) AS lastCommitDate
+    ,max(date(fileCreatedAtTimestamp)) AS lastCreationDate
+    ,max(date(fileLastModificationAtTimestamp)) AS lastModificationDate
+    ,duration.inDays(date(max(git_commit.date)), date()).days AS daysSinceLastCommit
+    ,duration.inDays(max(fileCreatedAtTimestamp), datetime()).days AS daysSinceLastCreation
+    ,duration.inDays(max(fileLastModificationAtTimestamp), datetime()).days AS daysSinceLastModification
+    ,max(git_commit.sha) AS maxCommitSha
+ORDER BY filePath ASCENDING, commitCount DESCENDING
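Two of the returned columns in the query above are simple derivations: `author` keeps only the part of the author string before `' <'`, and `daysSinceLastCommit` is the number of whole days between the last commit date and today. The sketch below mirrors both in Python. The function names and sample values are assumptions for illustration, and `today` is passed in explicitly so the result is reproducible.

```python
from datetime import date


def author_name(author: str) -> str:
    """Mirror split(git_commit.author, ' <')[0]: drop the trailing '<email>' part."""
    return author.split(" <")[0]


def days_since(last: date, today: date) -> int:
    """Mirror duration.inDays(last, today).days: whole days between two dates."""
    return (today - last).days


# Hypothetical sample values.
print(author_name("Jane Doe <jane@example.com>"))       # Jane Doe
print(days_since(date(2025, 1, 1), date(2025, 1, 31)))  # 30
```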
Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+// Check if there is at least one Git:Commit pointing to a Git:Change containing a Git:File from a Git:Repository
+
+MATCH (commit:Git:Commit)-[:CONTAINS_CHANGE]->(change:Git:Change)-->(file:Git:File)
+MATCH (repository:Git:Repository)-[:HAS_FILE]->(file)
+RETURN commit.sha AS commitSha
+LIMIT 1

jupyter/ExternalDependenciesJava.ipynb

Lines changed: 6 additions & 13 deletions
@@ -44,23 +44,16 @@
 },
 {
 "cell_type": "code",
-"execution_count": 235,
+"execution_count": null,
 "id": "c1db254b",
 "metadata": {},
 "outputs": [],
 "source": [
 "def get_cypher_query_from_file(filename):\n",
 "    with open(filename) as file:\n",
-"        return ' '.join(file.readlines())"
-]
-},
-{
-"cell_type": "code",
-"execution_count": 236,
-"id": "59310f6f",
-"metadata": {},
-"outputs": [],
-"source": [
+"        return ' '.join(file.readlines())\n",
+"\n",
+"\n",
 "def query_cypher_to_data_frame(filename):\n",
 "    records, summary, keys = driver.execute_query(get_cypher_query_from_file(filename))\n",
 "    return pd.DataFrame([r.values() for r in records], columns=keys)"
@@ -1735,7 +1728,7 @@
 "celltoolbar": "Tags",
 "code_graph_analysis_pipeline_data_validation": "ValidateJavaExternalDependencies",
 "kernelspec": {
-"display_name": "Python 3 (ipykernel)",
+"display_name": "codegraph",
 "language": "python",
 "name": "python3"
 },
@@ -1749,7 +1742,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.11.9"
+"version": "3.12.9"
 },
 "title": "External Dependencies for Java"
 },
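The helper `get_cypher_query_from_file` that the notebook change above merges into a single cell can be tried without a database. The sketch below copies the function as shown in the diff and runs it against a temporary file. The file content is a made-up example query, not one from the repository.

```python
import os
import tempfile


def get_cypher_query_from_file(filename):
    # Copied from the notebook cell: joins all lines with a single space.
    with open(filename) as file:
        return ' '.join(file.readlines())


# Write a small, hypothetical query file to read back.
with tempfile.NamedTemporaryFile("w", suffix=".cypher", delete=False) as query_file:
    query_file.write("MATCH (n)\nRETURN count(n) AS nodeCount\n")

try:
    query = get_cypher_query_from_file(query_file.name)
finally:
    os.remove(query_file.name)

print(repr(query))  # 'MATCH (n)\n RETURN count(n) AS nodeCount\n'
```

Because `readlines()` keeps each line ending, the joined string still contains the newlines, with one extra space at the start of every following line. Cypher ignores this additional whitespace between clauses.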

jupyter/ExternalDependenciesTypescript.ipynb

Lines changed: 5 additions & 12 deletions
@@ -51,16 +51,9 @@
 "source": [
 "def get_cypher_query_from_file(filename):\n",
 "    with open(filename) as file:\n",
-"        return ' '.join(file.readlines())"
-]
-},
-{
-"cell_type": "code",
-"execution_count": null,
-"id": "59310f6f",
-"metadata": {},
-"outputs": [],
-"source": [
+"        return ' '.join(file.readlines())\n",
+"\n",
+"\n",
 "def query_cypher_to_data_frame(filename):\n",
 "    records, summary, keys = driver.execute_query(get_cypher_query_from_file(filename))\n",
 "    return pd.DataFrame([r.values() for r in records], columns=keys)"
@@ -1638,7 +1631,7 @@
 "celltoolbar": "Tags",
 "code_graph_analysis_pipeline_data_validation": "ValidateTypescriptModuleDependencies",
 "kernelspec": {
-"display_name": "Python 3 (ipykernel)",
+"display_name": "codegraph",
 "language": "python",
 "name": "python3"
 },
@@ -1652,7 +1645,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.11.9"
+"version": "3.12.9"
 },
 "title": "External Dependencies for Typescript"
 },
