Developing the Docker image

Base image

To keep things simple, the base image is jupyter/minimal-notebook:lab-3.2.8, which the Jupyter team maintains as a minimal environment for running Jupyter Lab interactively from inside the container. Changing this base image tag can change the CPython runtime as well as other dependencies, so it should only be changed when the entire environment is going to be rebuilt.
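Because a base image tag change can silently change the CPython runtime, it can help to check which interpreter a given tag ships. The following is a minimal sketch, not part of the repository: the `run` and `python_version_in_image` helpers are hypothetical names, and the script only prints the command unless `DRY_RUN=0` is set.

```shell
# Dry run by default: commands are printed, not executed. Set DRY_RUN=0 to run them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "+ $*"
    else
        "$@"
    fi
}

# Report the CPython version that ships in a given image (hypothetical helper).
python_version_in_image() {
    run docker run --rm "$1" python --version
}

python_version_in_image jupyter/minimal-notebook:lab-3.2.8
```

Running this against the current tag and a candidate tag, and comparing the two versions, shows whether the lock file would need to be recompiled for a different CPython.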

Python environment

The Python environment that exists inside of the Docker image is described at a high level by a requirements.txt file, and is fully described at the hash level by a requirements.lock generated with pip-compile from pip-tools.

High level requirements.txt

A developer should control the environment by describing the high level dependencies they need in the requirements.txt. These dependencies will be the things that will actually be imported by the intended users (e.g. pandas and matplotlib) and not the dependencies of those imports. To make things more stable and reproducible the dependencies in the requirements.txt should all be fully pinned (i.e., for a SemVer package fully specified down to the patch release).

While the intended code may work in an environment that is specified less completely, e.g. using compatible release syntax (~=) to pin only down to the minor release, the computing environment should be viewed as an application: the goal is runtime stability for existing code.
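The pinning rule above can be checked mechanically. The sketch below is an illustration only: `unpinned_requirements` is a hypothetical helper, the package versions in the demo file are made up for the example, and it assumes one requirement per line with no extras, environment markers, or pip options.

```shell
# Print every requirement that is NOT pinned as name==MAJOR.MINOR.PATCH.
# Assumes one requirement per line; '#' starts a comment.
unpinned_requirements() {
    # Strip comments and surrounding whitespace, drop empty lines,
    # then keep only the lines that do not look fully pinned.
    sed -e 's/#.*$//' -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' "$1" \
        | grep -v '^$' \
        | grep -Ev '^[A-Za-z0-9._-]+==[0-9]+\.[0-9]+\.[0-9]+$'
}

# Demo on a throwaway file with hypothetical contents.
demo="$(mktemp)"
cat > "$demo" <<'EOF'
# high level requirements (example values only)
pandas==1.4.1
matplotlib~=3.5
scipy>=1.8
EOF

# grep exits non-zero when nothing is unpinned, hence the "|| true".
unpinned_requirements "$demo" || true
```

In the demo only the `matplotlib` and `scipy` lines are reported, since they use `~=` and `>=` rather than a full `==` pin.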

Lock file

To create a full description of the environment, the high level requirements.txt can be "compiled" with pip-tools into a requirements.lock lock file that lists all libraries and all of their dependencies, pinned at the hash level. The lock file is designed to make the computing environment as fully reproducible as possible. Given the complexity of the dependency solve that might be necessary to meet all dependency requirements, the lock file should never be edited by hand and should only be created or updated with pip-tools's pip-compile command. The lock file should still be kept under version control so that any build of the Docker image's environment is fully reproducible in the future.
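One way to sanity-check that a lock file really is pinned at the hash level is to look for entries that carry no hash. This is a rough sketch under an assumption about the file layout: each requirement starts in the first column as `name==version`, with its `--hash=sha256:...` options on the same line or on indented continuation lines. The helper name `entries_without_hashes` is hypothetical.

```shell
# List lock-file entries that carry no --hash option.
entries_without_hashes() {
    awk '
        /^[A-Za-z0-9]/ {                  # a new requirement starts in column one
            if (name != "" && !hashed) print name
            name = $1; hashed = 0
        }
        /--hash=/ { hashed = 1 }          # hash on the same or a continuation line
        END { if (name != "" && !hashed) print name }
    ' "$1"
}

# Only runs when a lock file is present in the current directory.
if [ -f requirements.lock ]; then
    entries_without_hashes requirements.lock
fi
```

An empty result is what a lock file compiled with hashes should give. Any output points to an entry that pip would reject when hashes are required.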

Updating the environment

The lock file is compiled from the requirements.txt using pip-compile (phys-398-mla-image/docker/compile_dependencies.sh, lines 10 to 13 in aa73f8b):

    pip-compile \
        --generate-hashes \
        --output-file requirements.lock \
        requirements.txt

A helper script, compile_dependencies.sh, is provided in the docker directory. Run it inside a Python virtual environment

bash compile_dependencies.sh

to produce the lock file for that environment's version of CPython.

To update the Python environment, a developer should:

1. Add to or revise the dependencies defined in the high level requirements.txt.
2. Compile the requirements.lock lock file from the updated requirements.txt using compile_dependencies.sh.
3. Commit both the updated requirements.txt and requirements.lock to version control.
4. Rebuild the Docker image with the updated files.
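Once requirements.txt has been edited by hand, the last three steps above could be scripted roughly as follows. This is a sketch under assumptions, not the repository's tooling: it assumes the working directory is the docker directory and that the Dockerfile lives there too, the commit message is a placeholder, and the `run` and `update_environment` helpers are hypothetical. Commands are only printed unless `DRY_RUN=0` is set.

```shell
# Dry run by default: commands are printed, not executed. Set DRY_RUN=0 to run them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "+ $*"
    else
        "$@"
    fi
}

update_environment() {
    # Recompile the lock file from the edited requirements.txt.
    run bash compile_dependencies.sh || return 1
    # Commit both files together so they never drift apart.
    run git add requirements.txt requirements.lock || return 1
    run git commit -m "Update Python environment" || return 1
    # Rebuild the image locally (image name and build context are assumptions).
    run docker build -t physicsillinois/phys-398-mla:latest . || return 1
}

update_environment
```

Each step stops the function on failure, so a failed compile never results in a commit of a stale lock file.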

Installing the environment

The lock file makes a fully reproducible environment possible, but to truly ensure reproducibility it also needs to be installed in a secure manner. A good way to do this is to implement Brett Cannon's "secure-install" procedure with pip. The pip options that take advantage of the lock file are described in the following section, taken from the GitHub Action implementation.

Design

A few options are turned on for pip to make sure installations are secure and reproducible:

A requirements file must be specified to make sure all dependencies are known statically for auditing purposes (-r).

No dependency resolution is done to make sure the requirements file is complete (--no-deps).

All requirements must have a hash provided to make sure the files have not been tampered with (--require-hashes).

Only wheels are allowed to have reproducible installs (--only-binary :all:).
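Put together, the four options listed above correspond to a single pip invocation. The sketch below shows one way to write it. The `run` and `secure_install` helpers are hypothetical, the lock file name is the one used in this document, and the command is only printed unless `DRY_RUN=0` is set.

```shell
# Dry run by default: commands are printed, not executed. Set DRY_RUN=0 to run them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "+ $*"
    else
        "$@"
    fi
}

# Install exactly what the lock file lists, and nothing else.
secure_install() {
    # --no-deps:           no dependency resolution, the lock file must be complete
    # --require-hashes:    every requirement must match a recorded hash
    # --only-binary :all:  wheels only, no source builds
    # -r:                  requirements come from a file, not the command line
    run python -m pip install \
        --no-deps \
        --require-hashes \
        --only-binary :all: \
        -r "$1"
}

secure_install requirements.lock
```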

Updating the tagged Docker images

The Docker image will be built automatically and deployed to Docker Hub through CI/CD with GitHub Actions. To avoid accidentally overwriting a course tag with a new build, tagging is a manual procedure for the time being.

After a build has successfully finished and deployed to Docker Hub, pull it down locally

docker pull physicsillinois/phys-398-mla:latest

Tag the pulled image with the course tag you want to distribute (e.g. spring-2022)

docker tag physicsillinois/phys-398-mla:latest physicsillinois/phys-398-mla:<new tag goes here>

Push the new tag to Docker Hub. This will be very fast because the image layers already exist on Docker Hub: it recognizes their digests and just associates a new tag with them instead of pushing the full image.

docker push physicsillinois/phys-398-mla:<new tag goes here>

It is recommended after this that you clean up your Docker image cache with

docker system prune

If you are working with Docker locally on a regular basis you probably want to do this at least daily.
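The pull, tag, and push steps above can be wrapped in a small helper that rejects obviously wrong tags before touching Docker Hub. This is a sketch only: `run` and `publish_course_tag` are hypothetical helpers, and commands are only printed unless `DRY_RUN=0` is set. It does not check whether the tag already exists on Docker Hub, so it does not by itself prevent overwriting a course tag.

```shell
# Dry run by default: commands are printed, not executed. Set DRY_RUN=0 to run them.
DRY_RUN="${DRY_RUN:-1}"
IMAGE="physicsillinois/phys-398-mla"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "+ $*"
    else
        "$@"
    fi
}

publish_course_tag() {
    tag="$1"
    # Docker tags: up to 128 letters, digits, '_', '.', or '-',
    # and the first character may not be '.' or '-'.
    if ! printf '%s' "$tag" | grep -Eq '^[A-Za-z0-9_][A-Za-z0-9_.-]{0,127}$'; then
        echo "invalid tag: '$tag'" >&2
        return 1
    fi
    # Retagging latest onto itself is never what is wanted here.
    if [ "$tag" = "latest" ]; then
        echo "refusing to use 'latest' as a course tag" >&2
        return 1
    fi
    run docker pull "${IMAGE}:latest" || return 1
    run docker tag "${IMAGE}:latest" "${IMAGE}:${tag}" || return 1
    run docker push "${IMAGE}:${tag}" || return 1
}

publish_course_tag spring-2022
```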
