To simplify things, the base image used is jupyter/minimal-notebook:lab-3.2.8, which is maintained by the Jupyter team to provide a minimal environment for running JupyterLab interactively from inside the container.
Updating this base image tag can result in the CPython runtime changing in addition to other dependencies, so it should only be changed if the entire environment is going to be rebuilt.
The Python environment inside the Docker image is described at a high level by a requirements.txt file, and is fully described at the hash level by a requirements.lock file generated with pip-compile from pip-tools.
A developer should control the environment by describing the high level dependencies they need in requirements.txt.
These dependencies should be the packages that the intended users will actually import (e.g. pandas and matplotlib) and not the dependencies of those imports.
To make things more stable and reproducible, the dependencies in requirements.txt should all be fully pinned (i.e., for a SemVer package, fully specified down to the patch release).
While it may be possible for the intended code to work in an environment that is less completely specified, e.g. one using compatible release syntax (~=) to pin only down to the minor release, the computing environment should be viewed as an application: the goal is runtime stability for existing code.
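As a rough illustration of the pinning rule above, the following sketch flags any requirement line that is not pinned with `==` down to a three-part version. The function name, the regular expression, and the example package versions are all assumptions made for this example and are not part of the repository; the check is deliberately simplistic (it does not handle URLs, environment markers, or non-SemVer version schemes).

```python
import re

# Assumption: "fully pinned" means name==MAJOR.MINOR.PATCH, optionally with extras.
PINNED = re.compile(
    r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[A-Za-z0-9_,\s-]+\])?==\d+\.\d+\.\d+$"
)


def unpinned_requirements(text):
    """Return requirement lines that are not pinned down to the patch release."""
    problems = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments and whitespace
        if not line:
            continue  # skip blank and comment-only lines
        if not PINNED.match(line):
            problems.append(line)
    return problems


# Illustrative requirements.txt content (versions chosen only for the example).
example = """\
pandas==1.4.1
matplotlib~=3.5
numpy
scipy==1.8.0  # fully pinned
"""

print(unpinned_requirements(example))  # prints ['matplotlib~=3.5', 'numpy']
```

A check like this could be run before compiling the lock file to catch loosely specified entries early.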
To create a full description of the environment, the high level requirements.txt can be "compiled" with pip-tools into a requirements.lock lock file that lists all libraries and all of their dependencies pinned at the hash level.
The lock file is designed to make the computing environment as fully reproducible as possible. Given the complexity of the dependency solve that might be necessary to meet all dependency requirements, the lock file should not be edited by hand and should only be created or updated using pip-tools's pip-compile command.
The lock file should be kept under version control, though, so that any build of the Docker image's environment is fully reproducible in the future.
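To make "pinned at the hash level" concrete, here is a small sketch that scans lock file text and reports any requirement without a `--hash` option. It assumes the usual pip-compile layout in which hashes follow the requirement on backslash-continued lines; the function name, package versions, and placeholder hash values are invented for the example.

```python
def requirements_missing_hashes(lock_text):
    """Return names of pinned requirements that carry no --hash option."""
    missing = []
    # Join backslash-continued lines so each requirement is one logical line.
    logical = lock_text.replace("\\\n", " ")
    for raw in logical.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or line.startswith("-"):
            continue  # skip blanks, comments, and global options
        if "--hash=" not in line:
            missing.append(line.split("==")[0].strip())
    return missing


# Placeholder lock file content; the hash values are dummies, not real digests.
example_lock = "\n".join([
    "pandas==1.4.1 \\",
    "    --hash=sha256:" + "0" * 64 + " \\",
    "    --hash=sha256:" + "1" * 64,
    "    # via -r requirements.txt",
    "numpy==1.22.2",
])

print(requirements_missing_hashes(example_lock))  # prints ['numpy']
```

An empty result is what a lock file compiled with hashes should produce; anything else suggests the file was edited by hand or compiled without hash generation.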
The lock file is compiled from requirements.txt using pip-compile; see lines 10 to 13 of phys-398-mla-image/docker/compile_dependencies.sh at commit aa73f8b.
A helper script, compile_dependencies.sh, is provided inside the docker directory. It can be run inside a Python virtual environment with
bash compile_dependencies.sh
to produce the lock file that would be generated for that version of CPython.
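The contents of the helper script are not reproduced here. As a hedged illustration of the kind of pip-compile invocation such a script might wrap, the sketch below builds a command that compiles requirements.txt into requirements.lock with hashes. The function name and default file names are assumptions; `--generate-hashes` and `--output-file` are standard pip-compile options, but the real script may pass different or additional flags.

```python
import shlex


def pip_compile_command(source="requirements.txt", output="requirements.lock"):
    """Build (but do not run) a pip-compile command that records hashes."""
    return [
        "pip-compile",
        "--generate-hashes",      # record a hash for every pinned distribution
        "--output-file", output,  # write the lock file to this path
        source,
    ]


# Print the command rather than executing it, so the sketch has no side effects.
print(shlex.join(pip_compile_command()))
```

Passing the returned list to `subprocess.run(..., check=True)` inside the virtual environment would perform the actual compile.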
To update the Python environment, a developer should:
- Add to or revise the dependencies defined in the high level requirements.txt.
- Compile the requirements.lock lock file from the updated requirements.txt using compile_dependencies.sh.
- Commit both the updated requirements.txt and requirements.lock to version control.
- Rebuild the Docker image with the updated files.
The lock file provides the capability for a fully reproducible environment, but to truly ensure reproducibility it also needs to be installed in a secure manner.
A good way to do this is to implement Brett Cannon's "secure-install" procedure with pip.
The options that are enabled to have pip take advantage of the lock file are described in the following section taken from the GitHub Action implementation.
A few options are turned on for pip to make sure installations are secure and reproducible:
- A requirements file must be specified to make sure all dependencies are known statically for auditing purposes (-r).
- No dependency resolution is done to make sure the requirements file is complete (--no-deps).
- All requirements must have a hash provided to make sure the files have not been tampered with (--require-hashes).
- Only wheels are allowed to have reproducible installs (--only-binary :all:).
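Put together, the four options listed above correspond to a single pip invocation. The sketch below assembles that command for the lock file; the function name and the default lock file path are assumptions for illustration, and the command is printed rather than executed.

```python
import shlex
import sys


def secure_install_command(lock_file="requirements.lock"):
    """Build a pip install command using the secure-install options."""
    return [
        sys.executable, "-m", "pip", "install",
        "--no-deps",               # no dependency resolution; the lock file must be complete
        "--require-hashes",        # every requirement must carry a matching hash
        "--only-binary", ":all:",  # wheels only, no source builds
        "-r", lock_file,           # install strictly from the requirements file
    ]


# Show the command that would be run; sys.executable varies by environment.
print(shlex.join(secure_install_command()))
```

Using `sys.executable -m pip` rather than a bare `pip` is a design choice of this sketch: it ensures the packages are installed into the interpreter that is running the script.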
The Docker image will be built automatically and deployed to Docker Hub through CI/CD with GitHub Actions. To avoid accidentally overwriting a course tag with a new build, the tag procedure is manual for the time being.
- After a build has successfully finished and deployed to Docker Hub, pull it down locally
docker pull physicsillinois/phys-398-mla:latest
- Tag the pulled image with the course tag you want to distribute (e.g. spring-2022)
docker tag physicsillinois/phys-398-mla:latest physicsillinois/phys-398-mla:<new tag goes here>
- Push the new tag to Docker Hub. This will be very fast because the image layers already exist on Docker Hub, so it will recognize this and just associate the new tag with them instead of pushing the full image.
docker push physicsillinois/phys-398-mla:<new tag goes here>
It is recommended after this that you clean up your Docker image cache with
docker system prune
If you are working with Docker locally on a regular basis you probably want to do this at least daily.
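The manual tagging steps above can be summarized as three Docker commands. The sketch below generates them for a given course tag so they can be reviewed before being run by hand; the function name is hypothetical, and nothing is executed or pushed by this code.

```python
import shlex

IMAGE = "physicsillinois/phys-398-mla"


def retag_commands(new_tag, source_tag="latest"):
    """Return the pull, tag, and push commands for distributing a course tag."""
    src = f"{IMAGE}:{source_tag}"
    dst = f"{IMAGE}:{new_tag}"
    return [
        ["docker", "pull", src],      # fetch the freshly built image
        ["docker", "tag", src, dst],  # attach the course tag locally
        ["docker", "push", dst],      # publish the new tag to Docker Hub
    ]


# Print the commands for review instead of running them.
for cmd in retag_commands("spring-2022"):
    print(shlex.join(cmd))
```

Keeping the commands as printed output preserves the manual review step that the procedure relies on to avoid overwriting an existing course tag.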