Docker container for automatic scanner PDF processing. Optimizes black & white and color scans and uploads them directly to Paperless-ngx via API. Multi-page document assembly is triggered via a Home Assistant button.
See CHANGELOG.md for version history.
The container watches an input directory using inotifywait. When the scanner drops a file, it is processed immediately (single mode) or collected with other pages and processed after a trigger (multi mode).
The `/data` directory is mounted into the container as a bind mount. A typical setup backs it with an SMB share on the host, so the scanner can drop files directly into `/data/import` over the network. Mounting the share on the host happens outside the container; the container only watches the directory.
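The watch loop can be sketched roughly as follows. This is a hypothetical sketch, not the container's actual script. The inotify events (`close_write`, `moved_to`), the `.pdf` filter, and the function names `dispatch_events` and `watch_import_dir` are assumptions. Splitting the reader from `inotifywait` keeps the dispatch logic testable without inotify-tools.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the watcher; names and event choice are assumptions.

WATCH_DIR="${WATCH_DIR:-/data/import/}"

# Read one bare filename per line from stdin and pass each PDF, as a full
# path inside WATCH_DIR, to the handler named in $1.
dispatch_events() {
  local handler=$1 name
  while IFS= read -r name; do
    [[ $name == *.[pP][dD][fF] ]] || continue   # skip temp/partial files
    "$handler" "${WATCH_DIR%/}/$name"
  done
}

# Requires inotify-tools. close_write fires when the writer closes the file,
# moved_to when a file is renamed into the directory.
watch_import_dir() {
  inotifywait -m -q -e close_write -e moved_to --format '%f' "$WATCH_DIR" |
    dispatch_events "$1"
}

# Usage: watch_import_dir my_handler   (my_handler is a hypothetical function)
```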
- Black & white (`scan-bw*.pdf`): ImageMagick – deskew, normalize, posterize, LZW compression at `BW_DPI` DPI (default: 300)
- Color (everything else): Ghostscript – bicubic downsampling to 300 DPI, `/ebook` preset
Processing settings are optimized for the HP LaserJet MFP M130fw at a scan resolution of 600 DPI. Both pipelines are fully configurable via `BW_DPI`, `BW_PARAMS` and `COLOR_PARAMS` without rebuilding the image.
Default `BW_PARAMS` (ImageMagick, parameters between input and output):

```
-chop 5x5 -deskew 60% +repage -strip -interlace Plane -normalize -posterize 3 +dither -compress LZW
```

Full command: `magick -density $BW_DPI <input> $BW_PARAMS <output>`

Default `COLOR_PARAMS` (Ghostscript, parameters before `-sOutputFile`):

```
-q -dNOPAUSE -dBATCH -dSAFER -sDEVICE=pdfwrite -dPDFSETTINGS=/ebook -dColorImageDownsampleType=/Bicubic -dColorImageResolution=300 -dGrayImageDownsampleType=/Bicubic -dGrayImageResolution=300 -dFastWebView=false
```

Full command: `gs $COLOR_PARAMS -sOutputFile=<output> <input>`
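The following hypothetical bash sketch illustrates how the documented variables could combine into the two commands. The function names `process_pdf` and `run` and the `DRY_RUN` switch are assumptions, not part of the container. Reading the parameter strings into arrays with `read -r -a` splits them on whitespace without glob expansion.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: assemble the B&W / color commands from the documented
# environment variables. Not the container's actual script.

SW_PATTERN="${SW_PATTERN:-scan-bw}"
BW_DPI="${BW_DPI:-300}"
BW_PARAMS="${BW_PARAMS:--chop 5x5 -deskew 60% +repage -strip -interlace Plane -normalize -posterize 3 +dither -compress LZW}"
COLOR_PARAMS="${COLOR_PARAMS:--q -dNOPAUSE -dBATCH -dSAFER -sDEVICE=pdfwrite -dPDFSETTINGS=/ebook -dColorImageDownsampleType=/Bicubic -dColorImageResolution=300 -dGrayImageDownsampleType=/Bicubic -dGrayImageResolution=300 -dFastWebView=false}"

# Print the command when DRY_RUN=1 (the default here), otherwise execute it.
run() {
  if [[ ${DRY_RUN:-1} == 1 ]]; then printf '%s\n' "$*"; else "$@"; fi
}

# $1 = input PDF, $2 = output PDF. The filename prefix selects the pipeline.
process_pdf() {
  local input=$1 output=$2 name
  local -a params
  name=${input##*/}
  if [[ $name == "$SW_PATTERN"* ]]; then
    read -r -a params <<< "$BW_PARAMS"     # whitespace split, no globbing
    run magick -density "$BW_DPI" "$input" "${params[@]}" "$output"
  else
    read -r -a params <<< "$COLOR_PARAMS"
    run gs "${params[@]}" "-sOutputFile=$output" "$input"
  fi
}
```

With `DRY_RUN=0`, the same function would call `magick` or `gs` directly instead of printing the command line.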
Single mode: every incoming PDF is processed immediately on arrival. Use this if your printer can produce multi-page PDFs natively. In that case the entire multi-page document arrives as a single file, so no merging or trigger is needed. The Home Assistant integration and the `/trigger` endpoint can be ignored entirely in this mode.

Multi mode: use this if your printer creates one separate PDF per scanned page and cannot combine them into a single file itself. The first incoming scan starts a collection phase, and additional pages keep arriving as individual files. Once the Home Assistant button fires the trigger (or `BUTTON_PAUSE` seconds elapse), all collected PDFs are merged, processed, and uploaded to Paperless.
Scan page 1 → scan page 2 → ... → HA button → merge → process → upload to Paperless
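The wait at the end of the collection phase can be sketched as a poll loop with a timeout. This is a hypothetical sketch. It assumes the `/trigger` handler signals the collector by creating a flag file. The name `wait_for_trigger` and the flag path are illustrative and are not taken from the container.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: block until a trigger flag appears or BUTTON_PAUSE
# seconds pass. The flag-file mechanism is an assumption.

BUTTON_PAUSE="${BUTTON_PAUSE:-1800}"

# $1 = flag file, $2 = timeout in seconds (defaults to BUTTON_PAUSE).
# Prints "trigger" or "timeout"; removes the flag so it fires only once.
wait_for_trigger() {
  local flag=$1 timeout=${2:-$BUTTON_PAUSE} waited=0
  while (( waited < timeout )); do
    if [[ -e $flag ]]; then
      rm -f -- "$flag"
      echo trigger
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo timeout
}

# Usage: [[ $(wait_for_trigger /tmp/scan-trigger) == trigger ]] && echo "merging"
```

Either outcome leads to the same merge step. The return value only tells you whether a person pressed the button or the timeout kicked in.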
```bash
docker run -d \
  -v /path/to/data:/data \
  -p 8080:8080 \
  -e PAPERLESS_URL=http://paperless:8000 \
  -e PAPERLESS_TOKEN=your-api-token \
  scan-treatment:latest
```

All settings are controlled via environment variables.
| Variable | Default | Description |
|---|---|---|
| `WATCH_DIR` | `/data/import/` | Directory watched for incoming scans |
| `EXPORT_DIR` | `/data/export/` | Output directory when Paperless is not configured |
| `SW_PATTERN` | `scan-bw` | Filename prefix identifying B&W scans |
| `MULTI_PATTERN` | `multi` | Substring identifying multi-page scans |
| `DISABLE_MULTI` | `false` | `true` = ignore `MULTI_PATTERN` and process every file immediately |
| `BW_DPI` | `300` | DPI density for ImageMagick B&W processing |
| `BW_PARAMS` | (see above) | ImageMagick parameters for B&W processing |
| `COLOR_PARAMS` | (see above) | Ghostscript parameters for color processing |
| `PAPERLESS_URL` | – | Base URL of your Paperless-ngx instance |
| `PAPERLESS_TOKEN` | – | API token, found in the Django admin panel (`/admin`) under Tokens |
| `HTTP_PORT` | `8080` | Port for the HTTP endpoints (`/trigger`, `/health`, `/status`, `/scan/*`) |
| `BUTTON_PAUSE` | `1800` | Seconds to wait for the trigger before processing the batch automatically |
| `FAIL_PAUSE` | `60` | Seconds between Paperless upload retries |
| `TG_API_KEY` | – | Telegram bot token (optional) |
| `TG_CHAT_ID` | – | Telegram chat ID (optional) |
| `TG_NOTIFY_SUCCESS` | `false` | `true` = also send a Telegram notification on successful uploads |
| `PRINTER_NOTIFY` | `false` | `true` = show the batch status in the matching scan job's display name in the printer menu (multi mode only) |
| `PRINTER_IP` | – | Printer IP address. Required for `PRINTER_NOTIFY=true` and all `/scan/*` endpoints. eSCL scanning is disabled if this is not set. |
| `PRINTER_USER` | – | Optional filter: only update jobs whose display name contains this string. Useful when multiple containers share one printer. |
| `ESCL_BW_DPI` | `600` | DPI for B&W scans via `/scan/single/bw` and `/scan/multi/bw` |
| `ESCL_COLOR_DPI` | `600` | DPI for color scans via `/scan/single/color` and `/scan/multi/color` |
| `TZ` | `Europe/Berlin` | Timezone for log timestamps and output filenames (e.g. `Europe/London`, `America/New_York`) |
All endpoints accept GET and POST. The container exposes them on HTTP_PORT (default 8080).
| Path | Description |
|---|---|
| `/trigger` | Start multi-page processing |
| `/health` | Health check, returns `200 OK` |
| `/status` | Current scan state as JSON |
| `/scan/single/bw` | Scan one B&W page and process immediately (requires `PRINTER_IP`) |
| `/scan/single/color` | Scan one color page and process immediately (requires `PRINTER_IP`) |
| `/scan/multi/bw` | Add a B&W page to the current multi-page batch (requires `PRINTER_IP`) |
| `/scan/multi/color` | Add a color page to the current multi-page batch (requires `PRINTER_IP`) |
| `/scan/multi/next` | Add a page in the same mode as the current batch, auto-detects B&W or color (requires `PRINTER_IP`) |
```bash
# trigger manually
curl http://<host>:8080/trigger

# current state
curl http://<host>:8080/status

# scan single B&W page and upload to Paperless
curl http://<host>:8080/scan/single/bw

# add pages to a multi-page batch, then trigger processing
curl http://<host>:8080/scan/multi/bw
curl http://<host>:8080/scan/multi/next
curl http://<host>:8080/trigger
```

The `/scan/*` endpoints trigger the scanner directly via the eSCL (AirPrint Scan) protocol, so no SMB share or physical button press is required. The printer handles the scan and the container downloads the result automatically.
eSCL is only active when `PRINTER_IP` is set. Without it, all `/scan/*` endpoints return `503 Service Unavailable`. eSCL scanning cannot be disabled separately from `PRINTER_NOTIFY`. Removing `PRINTER_IP` disables the `/scan/*` endpoints, but `PRINTER_NOTIFY` also requires `PRINTER_IP`. If you want neither feature, remove `PRINTER_IP` and set `PRINTER_NOTIFY=false` as well.
eSCL must also be enabled on the printer itself. On HP printers this is typically found under EWS → Networking → eSCL (AirPrint Scan). For a list of printers known to support eSCL, see the sane-airscan tested devices list.
All /scan/* endpoints respond with 202 Accepted immediately and run the scan in the background, so Home Assistant rest_command calls do not time out.
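A single eSCL scan job comes down to two HTTP requests. First, POST the scan settings to `/eSCL/ScanJobs` and read the job URL from the `Location` header. Second, GET `<job>/NextDocument` to download the page. The sketch below is a hypothetical illustration, not the container's implementation. The function names are illustrative. Several values are assumptions: the port (80), `Platen` as input source, and the color-mode strings (`Grayscale8`, `RGB24`). Check them against your printer's `/eSCL/ScannerCapabilities` response.

```bash
#!/usr/bin/env bash
# Hypothetical eSCL sketch; verify values against /eSCL/ScannerCapabilities.

# Emit a ScanSettings document. $1 = color mode (e.g. Grayscale8, RGB24),
# $2 = resolution in DPI.
escl_settings() {
  cat <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<scan:ScanSettings xmlns:scan="http://schemas.hp.com/imaging/escl/2011/05/03"
                   xmlns:pwg="http://www.pwg.org/schemas/2010/12/sm">
  <pwg:Version>2.0</pwg:Version>
  <pwg:InputSource>Platen</pwg:InputSource>
  <scan:ColorMode>$1</scan:ColorMode>
  <scan:XResolution>$2</scan:XResolution>
  <scan:YResolution>$2</scan:YResolution>
  <pwg:DocumentFormat>application/pdf</pwg:DocumentFormat>
</scan:ScanSettings>
EOF
}

# Create a job on $PRINTER_IP and download the first page to $3.
escl_scan() {
  local job
  job=$(escl_settings "$1" "$2" |
    curl -s -o /dev/null -D - -X POST -H 'Content-Type: text/xml' \
      --data-binary @- "http://${PRINTER_IP}/eSCL/ScanJobs" |
    tr -d '\r' | awk 'tolower($1) == "location:" { print $2 }')
  [[ -n $job ]] || { echo "eSCL: no job created" >&2; return 1; }
  [[ $job == /* ]] && job="http://${PRINTER_IP}${job}"   # relative Location
  curl -s --fail -o "$3" "${job}/NextDocument"
}

# Usage: PRINTER_IP=192.168.1.x escl_scan Grayscale8 600 /tmp/page.pdf
```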
The `/status` endpoint returns a JSON object with the current container state:

```json
{"state":"collecting","mode":"bw","pages":3,"last_result":"ok","last_time":"14:32"}
```

| Field | Values | Description |
|---|---|---|
| `state` | `idle`, `scanning`, `collecting`, `processing` | Current activity |
| `mode` | `bw`, `color`, `null` | Active batch mode |
| `pages` | integer | Pages collected in current batch |
| `last_result` | `ok`, `err`, `null` | Result of last completed batch |
| `last_time` | `HH:MM`, `null` | Time of last completed batch |
Example: Home Assistant sensor

```yaml
# configuration.yaml
rest:
  - resource: http://192.168.1.x:8080/status
    scan_interval: 5
    sensor:
      - name: "Scanner Status"
        unique_id: scanner_status
        value_template: "{{ value_json.state }}"
        json_attributes:
          - mode
          - pages
          - last_result
          - last_time
```

Example: Home Assistant buttons

```yaml
# configuration.yaml
rest_command:
  scan_trigger:
    url: http://192.168.1.x:8080/trigger
    method: POST
  scan_single_bw:
    url: http://192.168.1.x:8080/scan/single/bw
    method: POST
  scan_multi_bw:
    url: http://192.168.1.x:8080/scan/multi/bw
    method: POST
  scan_multi_color:
    url: http://192.168.1.x:8080/scan/multi/color
    method: POST
```

The scanner profile name determines both the processing mode and the pipeline. Configure your scanner to use these prefixes:
| Scanner profile / filename | Mode | Pipeline |
|---|---|---|
| `scan-bw.pdf`, `scan-bw2.pdf`, ... | Single – process immediately | ImageMagick B&W |
| `scan-color.pdf`, `scan-color2.pdf`, ... | Single – process immediately | Ghostscript color |
| `scan-bw-multi.pdf`, `scan-bw-multi2.pdf`, ... | Multi – collect, wait for trigger, merge | ImageMagick B&W |
| `scan-color-multi.pdf`, `scan-color-multi2.pdf`, ... | Multi – collect, wait for trigger, merge | Ghostscript color |
Page numbering: The first scan has no number suffix (e.g. `scan-bw-multi.pdf`), and subsequent pages increment numerically (e.g. `scan-bw-multi2.pdf`, `scan-bw-multi3.pdf`, ...). All common numbering schemes are supported: no suffix, plain numbers (1, 2, 10) and zero-padded numbers (0001, 0002, 0010). Pages are always merged in natural numeric order (8 → 9 → 10 → 11), not lexicographic order (which would incorrectly produce 1 → 10 → 11 → 2 → 3).
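One way to get this ordering in a shell script is to extract the trailing page number and sort on it numerically. The sketch below is hypothetical, and the container may implement the ordering differently. The helper name `sort_pages` is illustrative. The sketch treats a name without a number as page 1. It also forces base 10, so zero-padded numbers like `0010` are not read as octal, and `0009` does not cause an error.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: print the given page files in natural page order.
sort_pages() {
  local f base num
  for f in "$@"; do
    base=${f##*/}                  # drop directory
    base=${base%.[pP][dD][fF]}     # drop extension
    if [[ $base =~ ([0-9]+)$ ]]; then
      num=$((10#${BASH_REMATCH[1]}))   # 10# keeps 0010 from being octal
    else
      num=1                            # first page has no number suffix
    fi
    printf '%d\t%s\n' "$num" "$f"
  done | sort -t $'\t' -s -n -k1,1 | cut -f2-
}

# Usage: mapfile -t pages < <(sort_pages /data/import/*multi*.pdf)
```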
Detection logic:

- Multi is detected by substring match: filename contains `MULTI_PATTERN` (`multi` by default)
- B&W is detected by prefix match: filename starts with `SW_PATTERN` (`scan-bw` by default)
- Both checks are independent, so `scan-bw-multi` correctly triggers multi mode and B&W processing.
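The two independent checks can be expressed as a small bash function. This is a hypothetical sketch of the documented rules, not the container's code. The function name `classify` and its `mode pipeline` output format are assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the documented detection rules.
SW_PATTERN="${SW_PATTERN:-scan-bw}"
MULTI_PATTERN="${MULTI_PATTERN:-multi}"
DISABLE_MULTI="${DISABLE_MULTI:-false}"

# Prints "<single|multi> <bw|color>" for a file path.
classify() {
  local name=${1##*/} mode=single pipeline=color
  # Prefix match decides the pipeline (the quoted part is matched literally).
  [[ $name == "$SW_PATTERN"* ]] && pipeline=bw
  # Substring match decides the mode, unless multi is disabled.
  [[ $DISABLE_MULTI != true && $name == *"$MULTI_PATTERN"* ]] && mode=multi
  printf '%s %s\n' "$mode" "$pipeline"
}
```

Because each check sets its own variable, neither check can override the other. That separation is what lets `scan-bw-multi` land in the multi B&W path.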
```yaml
services:
  scan-treatment:
    image: ghcr.io/diesteinhose/scan-treatment:latest
    container_name: scan-treatment
    restart: unless-stopped
    ports:
      - "8080:8080"
    volumes:
      - /mnt/scanner:/data
    environment:
      TZ: Europe/Berlin
      PAPERLESS_URL: http://paperless-ngx:8000
      PAPERLESS_TOKEN: your-api-token
      DISABLE_MULTI: "false"
      SW_PATTERN: scan-bw
      MULTI_PATTERN: multi
      BUTTON_PAUSE: "1800"
      BW_DPI: "300"
      # BW_PARAMS: "-chop 5x5 ..."        # optional: override ImageMagick parameters
      # COLOR_PARAMS: "-q ..."            # optional: override Ghostscript parameters
      # TG_API_KEY: "123456:ABC..."       # optional: Telegram bot token
      # TG_CHAT_ID: "123456789"           # optional: Telegram chat ID
      # TG_NOTIFY_SUCCESS: "false"        # optional: notify on successful uploads
      # PRINTER_NOTIFY: "true"            # optional: update printer scan-menu on batch completion
      # PRINTER_IP: "192.168.1.x"         # optional: required for PRINTER_NOTIFY and /scan/* endpoints
      # PRINTER_USER: "Alice"             # optional: filter to only update jobs with this name
      # ESCL_BW_DPI: "600"                # optional: DPI for /scan/*/bw endpoints
      # ESCL_COLOR_DPI: "600"             # optional: DPI for /scan/*/color endpoints
```

Open the Unraid terminal and run:
```bash
wget -O /boot/config/plugins/dockerMan/templates-user/scan-treatment.xml \
  https://raw.githubusercontent.com/diesteinhose/scan-treatment/main/unraid-template.xml
```

Then go to Docker → Add Container and select scan-treatment from the template list.
Pre-built images are available from the GitHub Container Registry:

```bash
docker pull ghcr.io/diesteinhose/scan-treatment:latest
```

Or build locally:

```bash
docker build -t scan-treatment .
```
- Without `PAPERLESS_URL` and `PAPERLESS_TOKEN`, processed files are kept in `EXPORT_DIR` and not uploaded.
- Failed Paperless uploads are retried indefinitely every `FAIL_PAUSE` seconds.
- Telegram notifications are fully optional. Nothing is sent if `TG_API_KEY` is unset.
- In multi mode, only one batch runs at a time. Files arriving while a batch is collecting are added to that batch. Files arriving while a batch is processing are queued for the next batch.
- Printer notifications (`PRINTER_NOTIFY=true`) work only in multi mode. The scan job's display name in the printer's scan menu reflects the current state:

  | State | Display |
  |---|---|
  | Collecting pages | `My Scan Job [scan 2]` |
  | Processing after trigger | `My Scan Job [proc...]` |
  | Done | `My Scan Job [OK 14:32]` or `[ERR 14:32]` |
  | Done, but next batch already collecting | `My Scan Job [OK scan 1]`, `[OK scan 2]`, ... |
  | 60 minutes after completion | `My Scan Job` (suffix cleared) |

  If pages for a new batch arrive while the previous one is still processing, the completion notification combines both states. `[OK scan N]` shows that the previous batch succeeded and that N pages are already queued for the next one. Further pages increment the count. Once the trigger fires for the new batch, the display transitions to `[proc...]` as normal. Status suffixes are also cleared on every container start. Tested with HP LaserJet MFP M130fw via the HP EWS web interface.
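The indefinite retry can be sketched as an `until` loop around the upload. This is hypothetical: `retry_forever` and `upload_to_paperless` are illustrative names. The upload targets Paperless-ngx's `POST /api/documents/post_document/` endpoint with token authentication, and the container's own request may differ.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the upload-with-retry behaviour described above.

FAIL_PAUSE="${FAIL_PAUSE:-60}"

# Run the given command until it succeeds, sleeping FAIL_PAUSE seconds
# between attempts.
retry_forever() {
  until "$@"; do
    echo "upload failed, retrying in ${FAIL_PAUSE}s" >&2
    sleep "$FAIL_PAUSE"
  done
}

# Upload one PDF; --fail makes curl return non-zero on HTTP errors.
upload_to_paperless() {
  curl --fail --silent --show-error \
    -H "Authorization: Token ${PAPERLESS_TOKEN}" \
    -F "document=@$1" \
    "${PAPERLESS_URL%/}/api/documents/post_document/" >/dev/null
}

# Usage: retry_forever upload_to_paperless /data/export/scan.pdf
```

Keeping the retry wrapper generic means the same loop could wrap any step that should never give up, not only the upload.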