Pre-commit hooks are scripts that run before a commit is finalised. Depending on the result of a hook, the commit may be blocked. Hooks can be set up to validate code, check formatting, ensure that tests pass, and perform any other checks you define. Here we use them only to run a gitleaks check on the files included in a commit.
This guide is intended as a 'how-to' for aligning with the guidance in the DDaT playbook. See also our SECRETS page.
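To make the blocking behaviour concrete, here is a minimal sketch of how a hook's exit status decides whether a commit goes ahead. The function name run_hook_then_commit is hypothetical and git itself is not invoked; the sketch only mimics the rule that a zero exit status allows the commit and anything else blocks it.

```bash
#!/usr/bin/env bash
# Sketch only: mimics how git treats a pre-commit hook's exit status.
# run_hook_then_commit is a hypothetical helper, not part of git or pre-commit.
run_hook_then_commit() {
    # "$@" is the hook command to run before the (pretend) commit.
    if "$@"; then
        echo "hook passed: commit allowed"
    else
        echo "hook failed: commit blocked"
        return 1
    fi
}

# A hook that exits 0 lets the commit through.
run_hook_then_commit true

# A hook that exits non-zero stops the commit.
run_hook_then_commit false || echo "nothing was committed"
```

In the real setup the hook command is the gitleaks check, so a detected secret plays the role of the failing command here.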
- (Once only) Create a new conda environment with the pre-commit package. You can name it whatever you like; I use dev.
conda create -n dev -c conda-forge python=3.11 pre-commit
The following steps should be repeated on all repos.
- Activate the new environment.
conda activate dev
- Make pre-commit active. This tells git to run any defined pre-commit hooks on every commit. The hooks are defined in a YAML config called .pre-commit-config.yaml, which is placed in each git repo - see Configuration.
pre-commit install
- Deactivate the conda environment.
conda deactivate
You do not need to activate the environment for pre-commit to run in future. If you need to temporarily disable pre-commit (for example, to test that the GitHub action for gitleaks is working correctly), you can run
conda activate dev
pre-commit uninstall
then turn it back on with
pre-commit install
An alternative method is to have steps 2-4 done automatically whenever you create or clone a new repo.
- Ensure you are in the environment with the pre-commit package available.
conda activate dev
- Create a template git repo.
git config --global init.templateDir ~/.git-template
- Install pre-commit in the template repo.
pre-commit init-templatedir ~/.git-template
Note that this will not apply to existing repos, only new ones created by either git init or git clone. However, you can easily copy the hook from the template repo (make sure you are in the root of your repo first).
cp ~/.git-template/hooks/pre-commit .git/hooks/pre-commit
The result of the above is that pre-commit will run if there is a .pre-commit-config.yaml file in the repo. If there is not, it will silently allow commits. If you want to make sure you do not forget to add the config, add a warning.
- Open the template repo hook.
code ~/.git-template/hooks/pre-commit
- Add the following directly after the #!/usr/bin/env bash line. You can leave the exit 0 if you prefer to allow commits when there is no config file, or replace it with exit 1 to block them.
# --- START CUSTOM WARNING ---
if [ ! -f .pre-commit-config.yaml ]; then
    # Bold red text for visibility on light and dark themes
    echo -e "\033[1;31m[WARNING] No .pre-commit-config.yaml found in this repository.\033[0m"
    echo -e "\033[1;31m Pre-commit checks are NOT active.\033[0m"
    # Exit successfully (0) so the commit proceeds, or fail (1) if you want to block it
    exit 0
fi
# --- END CUSTOM WARNING ---
Getting pre-commit installed on a laptop is not as straightforward as on AVD.
- MS Store Python - The Microsoft Store version of Python is what is installed on laptops. This is a poor version for development. The issue here seems to be the 260 character limit on file paths. The solution was to use conda to create a dedicated virtual environment with its own Python (just as on the AVD).
- Installing conda - The above issue led to the need to install conda on my laptop. I tried several methods, but the one that finally worked was to install miniforge. Unfortunately, without further setup that I did not attempt (and which may not be possible as a non-admin), this means using the command line interface that comes with miniforge. This is worse to use than git bash, but it works for this use case, and is not needed other than for starting/stopping pre-commit.
Once you have some form of conda running, follow the same steps as for AVD.
- Go to the Gitleaks Releases Page.
- Download the latest windows archive (e.g., gitleaks_8.30.0_windows_x64.zip) - you may need to click the 'Show all assets' link.
- Extract the zip.
- Place the extracted folder in a location of your choice (e.g., C:\ProgramData).
- Add the full path to your user PATH environment variable (e.g. C:\ProgramData\gitleaks_8.30.0_windows_x64). You can do this by searching for "env" in the Windows search and choosing the "Edit environment variables for your account" option, then finding and clicking "Edit" for the PATH variable.
This will make gitleaks globally available for you.
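A quick way to confirm the PATH change has taken effect is to ask the shell where it finds the command. The sketch below is a minimal check, assuming a bash-compatible terminal (such as git bash) opened after the PATH edit; check_on_path is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch only: report whether a command can be found on PATH.
# check_on_path is a hypothetical helper name.
check_on_path() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1 found at $(command -v "$1")"
    else
        echo "$1 not found on PATH" >&2
        return 1
    fi
}

# Only ask gitleaks for its version if the shell can actually find it.
if check_on_path gitleaks; then
    gitleaks version
fi
```

If gitleaks is reported as not found, closing and reopening the terminal is worth trying first, since a running shell keeps the PATH it started with.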
Each repo in which you want the pre-commit hook to run requires a file named .pre-commit-config.yaml in its root. This file can contain any number of hooks. For our use case of running gitleaks we use
repos:
  - repo: local
    hooks:
      - id: gitleaks
        name: Detect hardcoded secrets
        description: Detect hardcoded secrets using Gitleaks with centralised NHSBSA config
        entry: >
          bash -c '
          mkdir -p .github-config &&
          curl -f -s https://raw.githubusercontent.com/nhsbsa-data-analytics/.github/main/gitleaks.toml -o .github-config/gitleaks.toml &&
          curl -f -s https://raw.githubusercontent.com/nhsbsa-data-analytics/.github/main/gitleaks-nhsbsa.toml -o .github-config/gitleaks-nhsbsa.toml &&
          gitleaks protect --config=".github-config/gitleaks.toml" --verbose --redact --staged;
          EXIT_CODE=$?;
          rm -rf .github-config;
          exit $EXIT_CODE'
        language: system
        pass_filenames: false
This can be copied from the repo_files folder of this repo: .pre-commit-config.yaml.
This hook will initiate the following actions every time you run git commit ...:
- Create a temporary folder .github-config.
- Download the gitleaks definition files (TOMLs) from our .github repo.
- Run gitleaks using those definitions.
- Delete the temporary folder.
- Finally, output the gitleaks messages.
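The steps above follow a common shell pattern: save the scan's exit status before cleaning up, then hand that status back. Here is a minimal sketch of the pattern, not the hook itself. scan_with_temp_config is a hypothetical helper, the scan command is passed in as arguments so it can be tried without gitleaks or network access, and a mktemp folder stands in for .github-config.

```bash
#!/usr/bin/env bash
# Sketch only: the create / scan / clean up / report pattern used by the hook entry.
# scan_with_temp_config is a hypothetical helper name.
scan_with_temp_config() {
    local config_dir exit_code
    config_dir=$(mktemp -d)     # stands in for the temporary .github-config folder
    # (The real hook downloads the TOML files into the folder at this point.)
    "$@"                        # stands in for the gitleaks command
    exit_code=$?                # save the result before any other command overwrites $?
    rm -rf "$config_dir"        # clean up whether or not leaks were found
    return "$exit_code"         # hand the scan result back to the caller
}

scan_with_temp_config true  && echo "no leaks: commit allowed"
scan_with_temp_config false || echo "leaks found: commit blocked"
```

Saving the status first matters because rm would otherwise replace it with its own (usually successful) status, and the commit would go ahead even when leaks were found.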
If the hook passes, it means no secrets were detected and the commit is allowed. If the hook fails, it means some secrets were discovered - gitleaks will output details of the leaks and also prevent the commit from actually happening. Resolve accordingly before trying to commit again.
The nice thing about pre-commit hooks is that, once set up, they run automatically, so you just use git as you usually would.
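If you want to check the setup without making a commit, pre-commit can also run a single hook on demand. The sketch below is one way to wrap that, assuming it is called from the root of the repo with the conda environment active; try_gitleaks_hook is a hypothetical helper name, while pre-commit run with a hook id is the standard pre-commit command.

```bash
#!/usr/bin/env bash
# Sketch only: run the gitleaks hook by hand from the root of a repo.
# try_gitleaks_hook is a hypothetical helper name.
try_gitleaks_hook() {
    if [ ! -f .pre-commit-config.yaml ]; then
        echo "No .pre-commit-config.yaml in $(pwd)" >&2
        return 1
    fi
    if ! command -v pre-commit >/dev/null 2>&1; then
        echo "pre-commit not on PATH (activate the conda environment first)" >&2
        return 1
    fi
    # Runs only the hook whose id is gitleaks, against the currently staged changes.
    pre-commit run gitleaks
}
```

After staging a file with git add, calling try_gitleaks_hook should give the same Passed or Failed line that a commit would.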
As a test, we add a file gitleaks_tests with the content
NHS number:
1234567890
DB creds:
DB_DALP_USERNAME = "ABCDE"
DB_DALP_PASSWORD = "qwertyuiop"
Commit it (-a stages all modified tracked files and -m supplies the message; a brand new file must first be staged with git add).
git commit -am "Expect to fail"
The output is
Detect hardcoded secrets.................................................Failed
- hook id: gitleaks
- exit code: 1
○
│╲
│ ○
○ ░
░ gitleaks
Finding: REDACTED
Secret: REDACTED
RuleID: nhs-number
Entropy: 3.321928
Tags: [secret pii nhs]
File: gitleaks_tests
Line: 4
Fingerprint: gitleaks_tests:nhs-number:4
Finding: DB_DALP_USERNAME = "REDACTED"
Secret: REDACTED
RuleID: database-connection-strings
Entropy: 2.321928
Tags: [secret credential database]
File: gitleaks_tests
Line: 19
Fingerprint: gitleaks_tests:database-connection-strings:19
Finding: DB_DALP_PASSWORD = "REDACTED"
Secret: REDACTED
RuleID: database-connection-strings
Entropy: 3.321928
Tags: [secret credential database]
File: gitleaks_tests
Line: 20
Fingerprint: gitleaks_tests:database-connection-strings:20
12:34PM INF 0 commits scanned.
12:34PM INF scanned ~88 bytes (88 bytes) in 121ms
12:34PM WRN leaks found: 3
This gives you details of all potential leaks, so you can easily see why they were flagged and where they are in your code. For how to allow a false positive, see Allowing false positives.
If you detect a secret, you must immediately follow the remediation plan from your risk assessment and take steps to remove the secret from wherever it is used. Additionally, where possible, a new rule should be added to the gitleaks TOML file.
When no potential secrets are found the output will show the check passed, along with the normal output from making a commit, since the commit was allowed.
Detect hardcoded secrets.................................................Passed
[gitleaks-update 2ca6879] Expect to pass
1 file changed, 3 insertions(+)
Because hooks should only check what is about to be committed, pre-commit temporarily stashes any unstaged changes, runs the check and then restores them. The output looks like
[WARNING] Unstaged files detected.
[INFO] Stashing unstaged files to C:\Users\MAMCP\.cache\pre-commit\patch1764593063-8128.
Detect hardcoded secrets.............................(no files to check)Skipped
[INFO] Restored changes from C:\Users\MAMCP\.cache\pre-commit\patch1764593063-8128.
[gitleaks-update 277fab8] Empty commit
It is possible that your code includes something that matches a gitleaks rule, but is not a leaked secret. These can be selectively allowed. You can use the #gitleaks:allow comment on the line.
fake_nhs_number = 1234567890 #gitleaks:allow
An alternative, but experimental, method is to use a .gitleaksignore file and add the Fingerprint from the gitleaks detection output.
These methods are best used for different cases. When introducing secrets detection with gitleaks, you should do a full history scan on existing commits (i.e. the full code base).
Add gitleaks.json and .github-config/ to the .gitignore file to prevent accidentally committing them. The commands below (you can copy and paste them all in one go and hit return on the final one) will temporarily download the centralised TOML files and run the full scan.
mkdir -p .github-config/gitleaks
curl -f -s https://raw.githubusercontent.com/nhsbsa-data-analytics/.github/main/gitleaks/gitleaks.toml -o .github-config/gitleaks/gitleaks.toml
curl -f -s https://raw.githubusercontent.com/nhsbsa-data-analytics/.github/main/gitleaks/gitleaks-nhsbsa.toml -o .github-config/gitleaks/gitleaks-nhsbsa.toml
gitleaks detect --config .github-config/gitleaks/gitleaks.toml --report-path gitleaks.json
rm -rf .github-config
Any detections will be output in gitleaks.json. False positives in the results can be ignored by adding their fingerprint to a .gitleaksignore file, like
b546cbaf5b7526dae0a2cfaf772000748f81e7b0:test/gitleaks_tests:nhs-number:4
b546cbaf5b7526dae0a2cfaf772000748f81e7b0:test/gitleaks_tests:nhs-number:8
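When a report contains several findings, the fingerprints can be listed rather than copied out by hand. The sketch below assumes each finding in the JSON report carries a "Fingerprint" field; list_fingerprints is a hypothetical helper that does simple text matching, and the sample report is a trimmed, made-up stand-in for a real gitleaks.json.

```bash
#!/usr/bin/env bash
# Sketch only: print every fingerprint found in a gitleaks JSON report.
# list_fingerprints is a hypothetical helper; it assumes each finding has a
# "Fingerprint": "..." field and uses text matching, not real JSON parsing.
list_fingerprints() {
    grep -o '"Fingerprint": *"[^"]*"' "$1" | sed 's/^"Fingerprint": *"//; s/"$//'
}

# Made-up, trimmed report used only to demonstrate the helper.
sample_report=$(mktemp)
cat > "$sample_report" <<'EOF'
[
  {
    "RuleID": "nhs-number",
    "Fingerprint": "b546cbaf5b7526dae0a2cfaf772000748f81e7b0:test/gitleaks_tests:nhs-number:4"
  },
  {
    "RuleID": "nhs-number",
    "Fingerprint": "b546cbaf5b7526dae0a2cfaf772000748f81e7b0:test/gitleaks_tests:nhs-number:8"
  }
]
EOF

# Review this list first; only confirmed false positives belong in .gitleaksignore.
list_fingerprints "$sample_report"
rm -f "$sample_report"
```

For a real scan, list_fingerprints gitleaks.json prints the candidates. Each one still needs checking by eye before it goes into .gitleaksignore, since anything listed there is no longer reported.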
If you detect a secret, you must immediately follow the remediation plan from your risk assessment and take steps to remove the secret from wherever it is used. Additionally, where possible, a new rule should be added to the gitleaks TOML file.
However, the ignore file method is brittle, as it references specific lines of code. For this reason it is best used only for existing code, at the point gitleaks is introduced in a repo. For any new code, after you do the initial history scan, use allow comments, as these move with the code when it moves.
Rule of thumb:
- for existing code (before gitleaks), use a .gitleaksignore file
- for new code (after gitleaks), use #gitleaks:allow comments