Feat: Buildkit caching for Docker builds #56

Open: 2 commits to merge into base: main

13 changes: 11 additions & 2 deletions .github/workflows/publish.yml
@@ -32,13 +32,22 @@ jobs:
         id: buildx
         uses: docker/setup-buildx-action@v3

+      - name: Cache Parameters
+        id: cache_params
+        run: |
+          CACHE_SCOPE="cal-itp"
+          MAIN_BRANCH_REF="refs/heads/main"
+
+          echo "cache_from_args=type=gha,scope=${CACHE_SCOPE},ref=${MAIN_BRANCH_REF}" >> $GITHUB_OUTPUT
+          echo "cache_to_args=type=gha,scope=${CACHE_SCOPE},mode=max,ref=${MAIN_BRANCH_REF}" >> $GITHUB_OUTPUT
+
       - name: Build, tag, and push image to GitHub Container Registry
         uses: docker/build-push-action@v6
         with:
           builder: ${{ steps.buildx.outputs.name }}
           build-args: GIT-SHA=${{ github.sha }}
-          cache-from: type=gha,scope=cal-itp
-          cache-to: type=gha,scope=cal-itp,mode=max
+          cache-from: ${{ steps.cache_params.outputs.cache_from_args }}
+          cache-to: ${{ steps.cache_params.outputs.cache_to_args }}
           context: .
           platforms: linux/amd64,linux/arm64
           file: appcontainer/Dockerfile
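
As a rough local illustration of how the Cache Parameters step passes values to the build step: each `name=value` line appended to the file named by `$GITHUB_OUTPUT` becomes a step output. In this sketch a temporary file stands in for the file the Actions runner would provide (an assumption for running outside a workflow); the scope and ref values are copied from the diff above, and the `grep`/`cut` read-back is only a stand-in for what `steps.cache_params.outputs.*` would resolve to.

```shell
# Outside GitHub Actions there is no GITHUB_OUTPUT file, so use a temp file
# as a stand-in (assumption for local illustration only).
GITHUB_OUTPUT="$(mktemp)"

CACHE_SCOPE="cal-itp"
MAIN_BRANCH_REF="refs/heads/main"

echo "cache_from_args=type=gha,scope=${CACHE_SCOPE},ref=${MAIN_BRANCH_REF}" >> "$GITHUB_OUTPUT"
echo "cache_to_args=type=gha,scope=${CACHE_SCOPE},mode=max,ref=${MAIN_BRANCH_REF}" >> "$GITHUB_OUTPUT"

# Read each value back: the output value is everything after the first "=".
cache_from_args="$(grep '^cache_from_args=' "$GITHUB_OUTPUT" | cut -d= -f2-)"
cache_to_args="$(grep '^cache_to_args=' "$GITHUB_OUTPUT" | cut -d= -f2-)"

echo "cache-from: ${cache_from_args}"
echo "cache-to:   ${cache_to_args}"

rm -f "$GITHUB_OUTPUT"
```

Quoting `"$GITHUB_OUTPUT"` in the redirect is a defensive choice in this sketch; the diff uses the unquoted form.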
87 changes: 59 additions & 28 deletions appcontainer/Dockerfile
@@ -1,56 +1,87 @@
+# declare default build args for later stages
+ARG PYTHON_VERSION=3.12 \
+    PYTHONDONTWRITEBYTECODE=1 \
+    PYTHONUNBUFFERED=1 \
+    USER=calitp \
+    USER_UID=1000 \
+    USER_GID=1000
+
 FROM python:3.12

-ENV PYTHONDONTWRITEBYTECODE=1 \
-    PYTHONUNBUFFERED=1 \
-    USER=calitp
+# renew top-level args in this stage
+ARG PYTHON_VERSION \
+    PYTHONDONTWRITEBYTECODE \
+    PYTHONUNBUFFERED \
+    USER \
+    USER_UID \
+    USER_GID
+
+# set env vars for the user, including HOME
+ENV PYTHONUNBUFFERED=${PYTHONUNBUFFERED} \
+    PYTHONDONTWRITEBYTECODE=${PYTHONDONTWRITEBYTECODE} \
+    HOME=/home/${USER} \
+    USER=${USER} \
+    PATH="/home/${USER}/.local/bin:$PATH" \
+    # update env for local pip installs
+    # see https://docs.python.org/3/using/cmdline.html#envvar-PYTHONUSERBASE
+    # since all `pip install` commands are in the context of $USER
+    # $PYTHONUSERBASE is the location used by default
+    PYTHONUSERBASE="/home/${USER}/.local" \
+    # where to store the pip cache (use the default)
+    # https://pip.pypa.io/en/stable/cli/pip/#cmdoption-cache-dir
+    PIP_CACHE_DIR="/home/${USER}/.cache/pip" \

Member:
At first I thought PATH and PYTHONUSERBASE should just be set to /$USER/.local/bin and /$USER/.local respectively, but prefixing with home works too 👍

Member Author:
They in fact need to be in the /home directory!

Although we have the WORKDIR directly under /$USER for the final image, during build time pip was having trouble using a cache when it was not inside /home.

pip expects to do --user installs, well, inside the user's home directory! All the overriding in the world wasn't working, but this did 😀

Member:
Ah that's good to know! 👍
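
To illustrate the point about `--user` installs: Python takes the user install location from `PYTHONUSERBASE` when that variable is set, which is the variable the Dockerfile points inside the home directory. A minimal sketch, assuming `python3` is available on the PATH; the `/home/calitp` path mirrors the default `USER` build arg in the diff and does not need to exist for this check.

```shell
# Point the user base at the same location the Dockerfile uses.
export PYTHONUSERBASE="/home/calitp/.local"

# Ask Python where it would place `pip install --user` packages.
# `|| true` is here because `python -m site` exits non-zero when the user
# site directory is disabled (for example inside a virtual environment),
# even though it still prints the path.
user_base="$(python3 -m site --user-base || true)"

echo "user base:    ${user_base}"
echo "user scripts: ${user_base}/bin"
```

The `bin` directory under the user base is where console scripts from `--user` installs land on POSIX systems, which is why the Dockerfile also puts `/home/${USER}/.local/bin` on `PATH`.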

+    GUNICORN_CONF="/$USER/run/gunicorn.conf.py"

 EXPOSE 8000

 # create non-root $USER and home directory
-RUN useradd --create-home --shell /bin/bash $USER && \
+USER root
+# install apt packages using the archives and lists cache
+RUN --mount=type=cache,id=apt-archives,sharing=locked,target=/var/cache/apt/archives \
+    --mount=type=cache,id=apt-lists,sharing=locked,target=/var/lib/apt/lists \
+    groupadd --gid ${USER_GID} ${USER} 2>/dev/null || true && \
+    useradd --uid ${USER_UID} --gid ${USER_GID} --create-home --shell /bin/bash ${USER} && \
+    # pip cache dir must be created and owned by the user to work with BuildKit cache
+    mkdir -p ${PIP_CACHE_DIR} && \
+    # own the parent directory of PIP_CACHE_DIR
+    chown -R ${USER}:${USER} /home/${USER}/.cache && \

Member:
Maybe we could use Bash parameter expansion ${PIP_CACHE_DIR%/*} here instead of /home/${USER}/.cache. This line wouldn't be as clear as it is now, but on the other hand we'd guarantee that changes to PIP_CACHE_DIR would follow through (though I don't know how often that would happen 😅).

Member Author (@thekaveman, Jun 9, 2025):
I'm definitely a Bash noob! I was reading about parameter expansion and this section seems relevant to your comment:

${parameter%word}
${parameter%%word}

The word is expanded to produce a pattern and matched according to the rules described below (see Pattern Matching). If the pattern matches a trailing portion of the expanded value of parameter, then the result of the expansion is the value of parameter with the shortest matching pattern (the % case) or the longest matching pattern (the %% case) deleted...

Yikes, that is a mouthful!

I think I am understanding that your suggestion:

${PIP_CACHE_DIR%/*}

Follows the % case above, such that:

  • parameter: PIP_CACHE_DIR -- expands to the full /home/calitp/.cache/pip
  • word: /* -- expands to any subdirectory
  • and the "shortest matching pattern" (any subdirectory in this case) is the /pip portion

So the result is that /pip is deleted and we are left with... what I hardcoded /home/calitp/.cache 😅

Is this correct?

Member:
Yep, that's correct! That's what I also understood about how parameter expansion works. I used these other docs (the section under ${variable%suffix}) that also have an example, and running it on a local branch confirmed the behavior too.

But yep, we are basically left with what you had hardcoded 😅 I was thinking that parameter expansion could make the script a bit safer in case PIP_CACHE_DIR were to change, since the chown -R ${USER}:${USER} target would then automatically point to the right place. But again, PIP_CACHE_DIR may never change, so this isn't actually important. It's probably better to prioritize clarity and use the hardcoded string 👍

Member Author:
Oh I totally agree with you! I just didn't understand your bash-fu and wanted to make sure!

Member:
😄 that sounds good!

Member Author:
On second thought after discussion, we decided to leave it as-is to be more explicit.
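
The expansion discussed in this thread can be checked in a few lines. This is a small sketch using the default path from the diff (`/home/calitp/.cache/pip`); the variable names `parent` and `all_removed` are only for illustration, and the `%%` and `dirname` lines are added here for comparison rather than taken from the PR.

```shell
PIP_CACHE_DIR="/home/calitp/.cache/pip"

# % deletes the shortest trailing match of the pattern "/*", here "/pip".
parent="${PIP_CACHE_DIR%/*}"
echo "$parent"            # /home/calitp/.cache

# %% deletes the longest trailing match. For an absolute path the pattern
# "/*" matches from the very first slash, so nothing is left.
all_removed="${PIP_CACHE_DIR%%/*}"
echo "[${all_removed}]"   # []

# dirname yields the same parent directory for this value.
dirname "$PIP_CACHE_DIR"  # /home/calitp/.cache
```

So `${PIP_CACHE_DIR%/*}` and the hardcoded `/home/${USER}/.cache` name the same directory as long as `PIP_CACHE_DIR` keeps its current value, which matches the conclusion reached in the thread.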

     # setup $USER permissions for nginx
     mkdir -p /var/cache/nginx && \
-    chown -R $USER /var/cache/nginx && \
+    chown -R $USER:$USER /var/cache/nginx && \

Member:
Just curious what the purpose of this change is

Member Author:
This sets ownership for both the user=$USER (LHS of the :) and the group=$USER (RHS of the :). Maybe it isn't strictly necessary, but since I had to use groups in the cache mount, I figured better to be explicit.

The group happens to be the same name as the user.
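
The `user:group` form described here can be tried without root on a scratch directory. A minimal sketch, assuming a POSIX-like system; it uses the current user and primary group (via `id`) rather than the `calitp` user from the Dockerfile, and the temporary directory and file name are placeholders.

```shell
# Scratch directory standing in for something like /var/log/nginx.
workdir="$(mktemp -d)"
touch "$workdir/error.log"

owner="$(id -un)"   # current user name
group="$(id -gn)"   # current primary group name

# Left of the colon is the owning user, right of it the owning group.
# With only "chown -R $owner ..." the group would be left unchanged.
chown -R "$owner:$group" "$workdir"

# -O: file is owned by the effective user ID
# -G: file's group matches the effective group ID
if [ -O "$workdir/error.log" ] && [ -G "$workdir/error.log" ]; then
  ownership_ok=yes
else
  ownership_ok=no
fi
echo "ownership_ok=${ownership_ok}"

rm -rf "$workdir"
```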

     mkdir -p /var/lib/nginx && \
-    chown -R $USER /var/lib/nginx && \
+    chown -R $USER:$USER /var/lib/nginx && \
     mkdir -p /var/log/nginx && \
-    chown -R $USER /var/log/nginx && \
+    chown -R $USER:$USER /var/log/nginx && \
     touch /var/log/nginx/error.log && \
-    chown $USER /var/log/nginx/error.log && \
+    chown $USER:$USER /var/log/nginx/error.log && \
     touch /var/run/nginx.pid && \
-    chown -R $USER /var/run/nginx.pid && \
+    chown -R $USER:$USER /var/run/nginx.pid && \
     # setup directories and permissions for gunicorn, (eventual) app
     mkdir -p /$USER/app && \
     mkdir -p /$USER/run && \
-    chown -R $USER /$USER && \
+    chown -R $USER:$USER /$USER && \
     # install server components
     apt-get update && \
-    apt-get install -qq --no-install-recommends build-essential nginx gettext && \
-    python -m pip install --upgrade pip
+    apt-get install -y --no-install-recommends build-essential nginx gettext && \
+    # this cleanup is still important for the final image layer size
+    # remove lists from the image layer, but they remain in the BuildKit cache mount
+    rm -rf /var/lib/apt/lists/*

 # enter run (gunicorn) directory
 WORKDIR /$USER/run

+# copy gunicorn config file
+COPY appcontainer/gunicorn.conf.py gunicorn.conf.py
+# overwrite default nginx.conf
+COPY appcontainer/nginx.conf /etc/nginx/nginx.conf
+
 # switch to non-root $USER
 USER $USER

-# update env for local pip installs
-# see https://docs.python.org/3/using/cmdline.html#envvar-PYTHONUSERBASE
-# since all `pip install` commands are in the context of $USER
-# $PYTHONUSERBASE is the location used by default
-ENV PATH="$PATH:/$USER/.local/bin" \
-    PYTHONUSERBASE="/$USER/.local"
-
 # install python dependencies
 COPY appcontainer/requirements.txt requirements.txt
-RUN pip install --no-cache-dir -r requirements.txt
-
-# copy gunicorn config file
-COPY appcontainer/gunicorn.conf.py gunicorn.conf.py
-ENV GUNICORN_CONF "/$USER/run/gunicorn.conf.py"
-
-# overwrite default nginx.conf
-COPY appcontainer/nginx.conf /etc/nginx/nginx.conf
+RUN --mount=type=cache,id=pipcache,target=${PIP_CACHE_DIR},uid=${USER_UID},gid=${USER_GID} \
+    python -m pip install --user --upgrade pip && \
+    pip install --user -r requirements.txt

 # enter app directory
 WORKDIR /$USER/app