CheckConsents tool requires Python 3.12+.
CheckConsents tool requires convert from ImageMagick software.
$ convert --version
Version: ImageMagick 7.1.1-15 Q16-HDRI x86_64 21298 https://imagemagick.org
Copyright: (C) 1999 ImageMagick Studio LLC
License: https://imagemagick.org/script/license.php
Features: Cipher DPC HDRI Modules OpenMP(4.5)
Delegates (built-in): bzlib cairo djvu fftw fontconfig freetype gslib gvc heic jbig jng jp2 jpeg jxl lcms lqr ltdl lzma openexr pangocairo png ps raqm raw rsvg tiff webp wmf x xml zip zlib
Compiler: gcc (13.2)
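The check above can also be scripted before launching the tool. The following is a minimal sketch, not part of CheckConsents: the function names `find_convert` and `imagemagick_version` are hypothetical, and it only assumes that the executable is called `convert` and accepts `--version`, as shown above.

```python
import shutil
import subprocess
from typing import Optional


def find_convert(executable: str = "convert") -> Optional[str]:
    """Return the full path of the executable, or None if it is not on PATH."""
    return shutil.which(executable)


def imagemagick_version(executable: str = "convert") -> Optional[str]:
    """Return the first line printed by `<executable> --version`, or None."""
    path = find_convert(executable)
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else None
```

Calling `imagemagick_version()` returns `None` when `convert` is missing, which is a convenient signal to fall back to the container.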
CheckConsents tool uses Tesseract to detect the orientation of pages.
To work, it requires the osd.traineddata file to be available in the resources folder.
You can download it at: https://github.com/tesseract-ocr/tessdata/raw/3.04.00/osd.traineddata
You may need to set the TESSDATA_PREFIX environment variable.
export TESSDATA_PREFIX="/path/to/folder/which/contains/traineddata_files"
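Before running the tool, it can help to confirm that osd.traineddata is actually reachable. This is a minimal sketch under assumptions: the function name `find_osd_traineddata` is hypothetical, and the search order (TESSDATA_PREFIX first, then the resources folder) is an assumption, not the documented behaviour of CheckConsents or Tesseract.

```python
import os
from pathlib import Path
from typing import Optional


def find_osd_traineddata(default_dir: str = "resources") -> Optional[Path]:
    """Look for osd.traineddata in TESSDATA_PREFIX, then in default_dir.

    The search order is an assumption made for this sketch.
    """
    candidates = []
    prefix = os.environ.get("TESSDATA_PREFIX")
    if prefix:
        candidates.append(Path(prefix))
    candidates.append(Path(default_dir))
    for folder in candidates:
        candidate = folder / "osd.traineddata"
        if candidate.is_file():
            return candidate
    return None
```

A `None` result means the file still has to be downloaded or TESSDATA_PREFIX has to be adjusted.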
It is easier to use a container; see the Docker section.
A development image can be built from the provided Dockerfile.
This image is based on the python:3.12-slim image.
The Dockerfile sets up a suitable environment for the CheckConsents tool without installing any dependencies on your machine.
This image is not production ready. We recommend using it only for development and testing purposes.
$ docker build -t cad/checkconsents:1.0.0 .
$ docker run --rm cad/checkconsents:1.0.0
usage: checkconsents.py [-h] [-v] -i INPUT_FOLDER [-w WORKING_DIR]
[--configfile CONFIGFILE]
[--log_level {ERROR,error,WARNING,warning,INFO,info,DEBUG,debug}]
[--log_file LOG_FILE]
Check consent checkboxes into consent forms (pdf format)
options:
-h, --help show this help message and exit
-v, --version show program's version number and exit
Inputs:
-i INPUT_FOLDER, --input_folder INPUT_FOLDER
path of input directory (default: None)
-w WORKING_DIR, --working_dir WORKING_DIR
Directory use to generate intermediates (default:
/tmp)
Config:
--configfile CONFIGFILE
configfile filepath (default:
/code/consentforms/checkconsents_config.yml)
Logger:
--log_level {ERROR,error,WARNING,warning,INFO,info,DEBUG,debug}
log level (default: INFO)
--log_file LOG_FILE log file (use the stderr by default) (default: None)
Vulnerabilities scan of docker image
SBOM description of docker image
$ ./consentforms/checkconsents.py --help
usage: checkconsents.py [-h] [-v] -i INPUT_FOLDER [-w WORKING_DIR]
[--configfile CONFIGFILE]
[--log_level {ERROR,error,WARNING,warning,INFO,info,DEBUG,debug}]
[--log_file LOG_FILE]
Check consent checkboxes into consent forms (pdf format)
options:
-h, --help show this help message and exit
-v, --version show program's version number and exit
Inputs:
-i INPUT_FOLDER, --input_folder INPUT_FOLDER
path of input directory (default: None)
-w WORKING_DIR, --working_dir WORKING_DIR
Directory use to generate intermediates (default:
/tmp)
Config:
--configfile CONFIGFILE
configfile filepath (default:
./checkconsents_config.yml)
Logger:
--log_level {ERROR,error,WARNING,warning,INFO,info,DEBUG,debug}
log level (default: INFO)
--log_file LOG_FILE log file (use the stderr by default) (default: None)
$ mkdir target
$ ./consentforms/checkconsents.py --input_folder ./tests/resources/initial_data --configfile resources/checkconsents_config.yml -w ./target
$ docker run --rm \
-v /etc/localtime:/etc/localtime:ro \
-v ./target:/code/target \
-v ./resources:/code/resources \
-v ./tests/resources:/code/tests/resources cad/checkconsents:1.0.0 \
--input_folder ./tests/resources/initial_data \
--configfile resources/checkconsents_config.yml \
-w /code/target
The CheckConsents tool creates an output folder (such as CheckConsents_20231102-141811) in the working directory.
Inside this directory, the CheckConsents tool generates a json file with all results.
If some forms could not be resolved, a debugging image is generated for human review. In this case, CheckConsents exits with code 3.
The debug level, enabled by adding --log_level debug to the command line, increases the verbosity of the logs and generates all final debugging images by default.
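Since each run creates a new timestamped folder, a script that post-processes the results needs to find the most recent one. The sketch below assumes the folder name always follows the `CheckConsents_YYYYMMDD-HHMMSS` pattern inferred from the example above; the helper names `parse_run_timestamp` and `latest_output_dir` are hypothetical.

```python
import re
from datetime import datetime
from pathlib import Path
from typing import Optional

# Pattern inferred from the example folder name CheckConsents_20231102-141811.
OUTPUT_DIR_PATTERN = re.compile(r"^CheckConsents_(\d{8}-\d{6})$")


def parse_run_timestamp(name: str) -> Optional[datetime]:
    """Return the timestamp encoded in an output folder name, or None."""
    match = OUTPUT_DIR_PATTERN.match(name)
    if match is None:
        return None
    try:
        return datetime.strptime(match.group(1), "%Y%m%d-%H%M%S")
    except ValueError:
        # Digits matched the pattern but do not form a valid date.
        return None


def latest_output_dir(working_dir: str) -> Optional[Path]:
    """Return the most recent CheckConsents output folder in working_dir."""
    runs = [
        (stamp, path)
        for path in Path(working_dir).iterdir()
        if path.is_dir() and (stamp := parse_run_timestamp(path.name)) is not None
    ]
    return max(runs, key=lambda run: run[0])[1] if runs else None
```

For example, `latest_output_dir("./target")` would return the newest run folder when `./target` was passed as the working directory.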
You can use the config file to define all the templates to use and to add or change the values of some parameters.
An example is available here: [resources/checkconsents_config.yml](resources/checkconsents_config.yml).
| var | description | default value |
|---|---|---|
| intermediates | keep intermediate files | False |
| parsing_header_limit | maximum number of lines read (corresponds to the maximum size of the title). Must be an integer greater than 0. | 5 |
| | specific parameters for the pdf to png conversion | density = 300, opt_args = None |
| | dict describing all your templates | None |
| | patterns used to identify the right page to analyse | None |
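The constraints stated in the table can be checked before a run. The following is only a sketch of such a check, not the validation performed by CheckConsents itself: `validate_settings` is a hypothetical helper, and it covers only the two parameters whose names and defaults are given above.

```python
from typing import Any, Dict

# Defaults taken from the table above.
DEFAULTS: Dict[str, Any] = {"intermediates": False, "parsing_header_limit": 5}


def validate_settings(user_settings: Dict[str, Any]) -> Dict[str, Any]:
    """Merge user settings over the defaults and check their types."""
    settings = {**DEFAULTS, **user_settings}
    if not isinstance(settings["intermediates"], bool):
        raise ValueError("intermediates must be a boolean")
    limit = settings["parsing_header_limit"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(limit, bool) or not isinstance(limit, int) or limit <= 0:
        raise ValueError("parsing_header_limit must be an integer greater than 0")
    return settings
```

Calling `validate_settings({"parsing_header_limit": 0})` raises a `ValueError`, while an empty dict simply returns the defaults.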
code | description |
---|---|
0 | Success |
1 | input directory does not exist |
2 | error during pdf conversion to png |
3 | Some forms could not be resolved. See logs for further details |
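When CheckConsents is called from another script, the exit code is the simplest way to tell a clean run from one that needs human review. This is a minimal sketch: the wrapper names `describe_exit_code` and `run_checkconsents` are hypothetical, and the script path assumes the command is launched from the repository root.

```python
import subprocess
import sys
from typing import List

# Meaning of each exit code, as documented in the table above.
EXIT_CODES = {
    0: "success",
    1: "input directory does not exist",
    2: "error during pdf conversion to png",
    3: "some forms could not be resolved, see logs",
}


def describe_exit_code(code: int) -> str:
    """Return a human-readable description of a CheckConsents exit code."""
    return EXIT_CODES.get(code, f"unexpected exit code {code}")


def run_checkconsents(args: List[str]) -> int:
    """Run the tool with the given arguments and report its exit code."""
    completed = subprocess.run(
        [sys.executable, "consentforms/checkconsents.py", *args], check=False
    )
    print(describe_exit_code(completed.returncode), file=sys.stderr)
    return completed.returncode
```

A caller can then treat code 3 as "review the debugging images" rather than as a hard failure.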
Filename | "Research usage yes" checked | "Research usage no" checked | Use for research agreed | Result of the automatic detection |
---|---|---|---|---|
Consentement_1.pdf | yes | no | yes | yes |
Consentement_2.pdf | no | yes | no | no |
Consentement_3.pdf | yes | no | yes | undetermined |
Consentement_4.pdf | no | yes | no | no |
Consentement_5.pdf | yes | no | yes | yes |
Consentement_6.pdf | yes | no | yes | yes |
Consentement_7.pdf | no | yes | no | no |
Consentement_8.pdf | yes | no | yes | yes |
Consentement_9.pdf | yes | no | yes | yes |
Consentement_10.pdf | yes | no | yes | yes |
Consentement_11.pdf | yes | no | yes | yes |
Consentement_12.pdf | no | yes | no | no |
Consentement_13.pdf | no | yes | no | no |
Consentement_14.pdf | no | yes | no | no |
Consentement_15.pdf | yes | no | yes | yes |
Consentement_16.pdf | no | yes | no | no |
Consentement_17.pdf | no | yes | no | no |
Consentement_18.pdf | no | yes | no | no |
Consentement_19.pdf | no | yes | no | no |
Consentement_20.pdf | no | yes | no | no |
Consentement_21.pdf | yes | no | yes | yes |
Consentement_22.pdf | no | yes | no | no |
Consentement_23.pdf | no | yes | no | no |
Consentement_24.pdf | no | yes | no | no |
Consentement_25.pdf | yes | no | yes | yes |
Consentement_26.pdf | yes | no | yes | yes |
Consentement_27.pdf | yes | no | yes | yes |
Consentement_28.pdf | yes | no | yes | yes |
Consentement_29.pdf | yes | no | yes | yes |
Consentement_30.pdf | yes | no | yes | yes |
Consentement_31.pdf | yes | no | yes | yes |
Consentement_32.pdf | no | yes | no | no |
Consentement_33.pdf | yes | no | yes | yes |
Consentement_34.pdf | yes | no | yes | yes |
These predictions were obtained using default CheckConsents parameters for conversion of pdf into png with a density of 300 dpi, and no colorscale transformation.
Note
Modifying the current values of the CheckConsents parameters may decrease the file size of the png and therefore reduce execution time, but may also impact the accuracy of the detection.
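The table above can be summarised by comparing, for each file, the agreed value with the detected one. The sketch below hard-codes the file numbers from the table; the names `EXPECTED_YES`, `DETECTED_UNDETERMINED` and `classify` are introduced here for illustration only.

```python
from collections import Counter

# File numbers for which "Use for research agreed" is "yes" in the table above;
# every other file from 1 to 34 is "no".
EXPECTED_YES = {1, 3, 5, 6, 8, 9, 10, 11, 15, 21, 25, 26, 27, 28, 29, 30, 31, 33, 34}
# File numbers for which the automatic detection returned "undetermined".
DETECTED_UNDETERMINED = {3}
# File numbers for which the automatic detection returned "yes".
DETECTED_YES = EXPECTED_YES - DETECTED_UNDETERMINED


def classify(number: int) -> str:
    """Compare the expected and detected values for Consentement_<number>.pdf."""
    if number in DETECTED_UNDETERMINED:
        return "undetermined"
    expected = "yes" if number in EXPECTED_YES else "no"
    detected = "yes" if number in DETECTED_YES else "no"
    return "correct" if detected == expected else "wrong"


summary = Counter(classify(number) for number in range(1, 35))
print(dict(summary))
```

On this data set the summary is 33 correct detections, 1 undetermined form (Consentement_3.pdf) and no wrong detection.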
The json output from checkconsents can be parsed to generate other formats or combined with other information.
An example parser is provided, which generates a csv file from the consents.json file.
It creates a table with the following header:
"Input directory","Input filename","Result of the automatic detection","Debug image filename","Output directory"
You can use a local virtualenv or the docker compose file compose-dev.yml.
We recommend using docker, which makes the environment easier to manage.
$ python -m venv env
$ source env/bin/activate
(env) $ pip install -r requirements.txt
This requires Docker and Docker Compose to be installed.
All folders and files from the repository are available in the /code directory.
$ mkdir target # temp folder for unit test
$ touch .env
$ docker compose -f compose-dev.yml up -d
$ docker compose -f compose-dev.yml ps
$ docker exec -it checkconsents-app-1 /bin/bash
cad@853796383c04:/code$ source dev.docker.bashrc # load aliases (optional but recommended)
[cad@docker-853796383c04 /code] [07.11.2023 09:25:52] $
We use PyTest to define and run our tests.
Run all tests (in dev container):
[cad@docker-853796383c04 /code] [07.11.2023 09:25:52] $ pytest
Run the tests of a specific submodule (in dev container):
[cad@docker-853796383c04 /code] [07.11.2023 09:25:52] $ # pytest tests/test_<submodule name>.py
[cad@docker-853796383c04 /code] [07.11.2023 09:25:52] $ pytest tests/test_templates.py
# list data used for tests
[cad@docker-853796383c04 /code] [07.11.2023 09:35:42] $ ll /tmp/pytest-of-cad/pytest-current/datacurrent/
total 8984
drwx------. 1 cad cad 226 Nov 7 09:34 .
drwx------. 1 cad cad 140 Nov 7 09:34 ..
-rw-r--r--. 1 cad cad 2781876 Nov 7 09:34 functests_consent-0.png
-rw-r--r--. 1 cad cad 1354738 Nov 7 09:34 functests_consent-1.png
-rw-r--r--. 1 cad cad 1348930 Nov 7 09:34 functests_consent-2.png
-rw-r--r--. 1 cad cad 2130168 Nov 7 09:34 functests_consent-3.png
-rw-r--r--. 1 cad cad 1571076 Nov 7 09:34 functests_consent.pdf
Compute test coverage (in dev container):
[cad@docker-853796383c04 /code] [30.11.2023 14:53:29] $ pytest --cov=consentforms
Generate a test report (report.html, .coverage, htmlcov/index.html) (in dev container):
[cad@docker-853796383c04 /code] [30.11.2023 14:53:29] $ pytest --html=report.html
# with test coverage
[cad@docker-853796383c04 /code] [30.11.2023 14:53:29] $ pytest --html=report.html --cov=consentforms
# with test coverage report
[cad@docker-853796383c04 /code] [30.11.2023 14:53:29] $ pytest --html=report.html --cov=consentforms --cov-report html
Using local virtualenv:
(env) $ python consentforms/checkconsents.py --input_folder ./tests/resources --configfile resources/checkconsents_config.yml --log_level debug -w ./target
(env) $ python -m pdb consentforms/checkconsents.py --input_folder ./tests/resources --configfile resources/checkconsents_config.yml --log_level debug -w ./target
Analyse all example files from tests/resources/initial_data:
$ mkdir target
$ docker run --rm \
-v /etc/localtime:/etc/localtime:ro \
-v ./target:/code/target \
-v ./resources:/code/resources \
-v ./tests/resources:/code/tests/resources \
cad/checkconsents:1.0.0 \
--input_folder ./tests/resources/initial_data \
--configfile resources/checkconsents_config.yml \
--working_dir /code/target
# NB: if a timezone is set on your host, add the following docker option: -v /etc/timezone:/etc/timezone:ro
$ docker build -t cad/checkconsents:test .
$ docker run --rm \
-v /etc/localtime:/etc/localtime:ro \
-v ./target:/code/target \
-v ./consentforms:/code/consentforms \
-v ./resources:/code/resources \
-v ./tests/resources:/code/tests/resources \
cad/checkconsents:test \
--input_folder ./tests/resources \
--configfile resources/checkconsents_config.yml \
--log_level debug \
-w /code/target
# debug
$ docker run --rm -it --entrypoint python \
-v /etc/localtime:/etc/localtime:ro \
-v ./target:/code/target \
-v ./consentforms:/code/consentforms \
-v ./resources:/code/resources \
-v ./tests/resources:/code/tests/resources \
cad/checkconsents:test \
-m pdb /code/consentforms/checkconsents.py \
--input_folder ./tests/resources \
--configfile resources/checkconsents_config.yml \
--log_level debug \
-w /code/target
The CheckConsents project is developed by David Salgado and Adrien Josso Rigonato. The statement of needs and the test data were created by Cécile Meslier.
The project is supported by Collecteur Analyseur de Données (CAD).
GNU AFFERO General Public License v3
See Licence for further details.
Checkbox detection is based on the simpleomr script from RescueOMR.
Copyright(c) 2016-2017: Yuri D'Elia
Copyright(c) 2016-2017: EURAC, Institute of Genetic Medicine
Thanks to the work of Yuri D'Elia