
Commit 64923df

update software install instructions
1 parent 478613b commit 64923df

14 files changed: +682 / -532 lines

content/dependencies.md

Lines changed: 365 additions & 0 deletions
@@ -0,0 +1,365 @@
# Reproducible environments and dependencies

:::{objectives}
- Few programs have no dependencies at all.
  How should we **deal with dependencies**?
- We will focus on installing and managing dependencies in Python when using packages from PyPI and Conda.
- We will not discuss how to distribute your code as a package.
:::

[This episode borrows from <https://coderefinery.github.io/reproducible-python/reusable/>
and <https://aaltoscicomp.github.io/python-for-scicomp/dependencies/>]

Essential XKCD comics:
- [xkcd - dependency](https://xkcd.com/2347/)
- [xkcd - superfund](https://xkcd.com/1987/)

## How to avoid: "It works on my machine &#129335;"

Use a **standard way** to list dependencies in your project:
- Python: `requirements.txt` or `environment.yml`
- R: `DESCRIPTION` or `renv.lock`
- Rust: `Cargo.lock`
- Julia: `Project.toml`
- C/C++/Fortran: `CMakeLists.txt` or `Makefile` or `spack.yaml` or the module
  system on clusters or containers
- Other languages: ...

## Two ecosystems: PyPI (The Python Package Index) and Conda

:::{admonition} PyPI
- **Installation tool:** `pip` or `uv` or similar
- Traditionally used for Python-only packages or
  for Python interfaces to external libraries. There are also packages
  that have bundled external libraries (such as numpy).
- **Pros:**
  - Easy to use
  - Package creation is easy
- **Cons:**
  - Installing packages that need external libraries can be complicated
:::

:::{admonition} Conda
- **Installation tool:** `conda` or `mamba` or similar
- Aims to be a more general package distribution tool:
  it provides not only the Python packages, but also the libraries
  and tools needed by the Python packages.
- **Pros:**
  - Quite easy to use
  - Easier to manage packages that need external libraries
  - Not only for Python
- **Cons:**
  - Package creation is harder
:::

## Conda ecosystem explained

- [Anaconda](https://www.anaconda.com) is a distribution of conda packages
  made by Anaconda Inc. When using Anaconda, remember to check that your
  situation complies with their licensing terms (see below).

- Anaconda has recently changed its **licensing terms**, which affects its
  use in a professional setting. This caused an uproar in academia,
  and Anaconda updated its position in
  [this article](https://www.anaconda.com/blog/update-on-anacondas-terms-of-service-for-academia-and-research).

  The main points of the article are:
  - conda (the installation tool) and community channels (e.g. conda-forge)
    are free to use.
  - The Anaconda repository and **Anaconda's channels in the community repository**
    are free for universities and companies with fewer than 200 employees.
    Non-university research institutions and national laboratories need
    licenses.
  - Miniconda is free, as long as it does not download Anaconda's packages.
  - Miniforge is not related to Anaconda, so it is free.

  To make environment files easy to share, we recommend using
  Miniforge to create the environments and using conda-forge as the main
  channel that provides software.

- Major repositories/channels:
  - [Anaconda Repository](https://repo.anaconda.com)
    houses Anaconda's own proprietary software channels.
  - Anaconda's proprietary channels: `main`, `r`, `msys2` and `anaconda`.
    These are sometimes called `defaults`.
  - [conda-forge](https://conda-forge.org) is the largest open source
    community channel. It has over 28k packages that include open-source
    versions of packages in Anaconda's channels.

## Tools and distributions for dependency management in Python

- [Poetry](https://python-poetry.org): Dependency management and packaging.
- [Pipenv](https://pipenv.pypa.io): Dependency management, alternative to Poetry.
- [pyenv](https://github.com/pyenv/pyenv): If you need different Python versions for different projects.
- [venv](https://docs.python.org/3/library/venv.html): Standard-library module to create isolated Python environments ("virtual environments") for PyPI packages.
- [micropipenv](https://github.com/thoth-station/micropipenv): Lightweight tool to "rule them all".
- [Conda](https://docs.conda.io): Package manager for Python and other languages maintained by Anaconda Inc.
- [Miniconda](https://docs.anaconda.com/miniconda/): A minimal installer for conda (a "miniature" version of the Anaconda distribution), maintained by Anaconda Inc. By default it uses
  Anaconda's channels. Check the licensing terms when using these packages.
- [Mamba](https://mamba.readthedocs.io): A drop-in replacement for conda.
  It used to be much faster than conda due to a better
  dependency solver, but nowadays conda
  [also uses the same solver](https://conda.org/blog/2023-11-06-conda-23-10-0-release/).
  It still has some user-interface improvements.
- [Micromamba](https://mamba.readthedocs.io/en/latest/user_guide/micromamba.html): Tiny version of the Mamba package manager.
- [Miniforge](https://github.com/conda-forge/miniforge): Open-source Miniconda alternative with
  conda-forge as the default channel; it also includes mamba.
- [Pixi](https://pixi.sh): Modern, super fast tool which can manage conda environments.
- [uv](https://docs.astral.sh/uv/): Modern, super fast replacement for pip,
  poetry, pyenv, and virtualenv. You can also switch between Python versions.

## Best practice: Install dependencies into isolated environments

- For each project, create a **separate environment**.
- Don't install dependencies globally for all projects. Sooner or later, different projects will have conflicting dependencies.
- Install them **from a file**, which documents them at the same time:
  first record them in `requirements.txt` or `environment.yml`, then install
  from these files. This way you have a trace (we will practice this below).
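To check from inside a script whether it really runs in an isolated environment, here is a minimal sketch using only the Python standard library. It assumes two conventions: a virtual environment is recognizable by `sys.prefix` differing from `sys.base_prefix`, and `conda activate` sets the `CONDA_PREFIX` environment variable. The function name `describe_environment` is our own choice.

```python
import os
import sys


def describe_environment():
    """Summarize which isolated environment (if any) runs this interpreter."""
    return {
        # In a venv/virtualenv, sys.prefix points into the environment while
        # sys.base_prefix points to the Python installation it was created from.
        "in_virtualenv": sys.prefix != sys.base_prefix,
        # `conda activate` exports CONDA_PREFIX; None if no conda env is active.
        "conda_prefix": os.environ.get("CONDA_PREFIX"),
        # Full path of the interpreter that is actually running this code.
        "python": sys.executable,
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If `conda_prefix` points to the `base` environment, you are not yet working in a project environment.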

:::{keypoints}
If somebody asks you what dependencies you have in your project, you should be
able to answer this question **with a file**.

In Python, the two most common ways to do this are:
- **requirements.txt** (for pip and virtual environments)
- **environment.yml** (for conda and similar)

You can export ("freeze") the dependencies from your current environment into these files:
```bash
# inside a conda environment
$ conda env export --from-history > environment.yml

# inside a virtual environment
$ pip freeze > requirements.txt
```
`--from-history` lists only the packages you explicitly requested, as you specified
them, whereas `pip freeze` lists every installed package with its exact version.
:::
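If you are curious what `pip freeze` roughly does, the following sketch builds a similar `name==version` list with `importlib.metadata` from the standard library. It is an illustration, not a replacement: `pip freeze` also handles editable installs and packages installed from URLs, so its output can differ. The helper names `format_requirements` and `installed_requirements` are hypothetical.

```python
from importlib.metadata import distributions


def format_requirements(pairs):
    """Turn (name, version) pairs into `name==version` lines sorted by name."""
    lines = {}
    for name, version in pairs:
        # If the same package is found twice, keep the first entry seen.
        lines.setdefault(name.lower(), f"{name}=={version}")
    return [lines[key] for key in sorted(lines)]


def installed_requirements():
    """List every distribution visible to the running interpreter."""
    pairs = (
        (dist.metadata.get("Name"), dist.version)
        for dist in distributions()
        if dist.metadata.get("Name")  # skip entries with broken metadata
    )
    return format_requirements(pairs)


if __name__ == "__main__":
    print("\n".join(installed_requirements()))
```

Redirecting the script's output into a file gives a list in a similar format to what `pip freeze > requirements.txt` produces.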

## How to communicate the dependencies as part of a report/thesis/publication

Each notebook, script, or project that depends on libraries should come with
either a `requirements.txt` or an `environment.yml`, unless you are creating
and distributing the project as a Python package.

- Attach a `requirements.txt` or an `environment.yml` to your thesis.
- Even better: Put `requirements.txt` or `environment.yml` in your Git repository alongside your code.
- Even better: Also [binderize](https://mybinder.org/) your analysis pipeline.
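In a thesis or publication it can also help to state the exact versions next to the results. Here is a minimal sketch, assuming the packages are installed in the current environment; the function name `version_report` and the package list in the example are hypothetical and should be replaced by the packages your analysis actually uses.

```python
import platform
from importlib.metadata import PackageNotFoundError, version


def version_report(packages):
    """Map "python" and each package name to the version installed here."""
    report = {"python": platform.python_version()}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    # Hypothetical package list: use the packages your analysis imports.
    for name, ver in version_report(["numpy", "pandas", "matplotlib"]).items():
        print(f"{name}: {ver}")
```

This complements, rather than replaces, the `requirements.txt` or `environment.yml` file.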

## Containers

- A container is like an **operating system inside a file**.
- "Building a container": Container definition file (recipe) -> Container image
- This can be done with, for example, [Apptainer](https://apptainer.org/)/
  [SingularityCE](https://sylabs.io/singularity/).

Containers offer the following advantages:
- **Reproducibility**: The same software environment can be recreated on
  different computers. Containers force you to know and **document all your dependencies**.
- **Portability**: The same software environment can be run on different computers.
- **Isolation**: The software environment is isolated from the host system.
- "**Time travel**":
  - You can run old/unmaintained software on new systems.
  - Code that needs new dependencies which are not available on old systems can
    still be run on old systems.

## How to install dependencies into environments

Now we understand a bit better why and how we installed dependencies
for this course in the {doc}`installation`.

We used **Miniforge**, and the long command was:
```console
$ mamba env create -n course -f https://raw.githubusercontent.com/coderefinery/python-progression/main/software/environment.yml
```

This command did two things:
- Created a new environment with the name "course" (specified by `-n`).
- Installed all dependencies listed in the `environment.yml` file (specified by
  `-f`), which we fetched directly from the web.
  You can browse the file
  [here](https://github.com/coderefinery/python-progression/blob/main/software/environment.yml).

For your own projects:
1. Start by writing an `environment.yml` or `requirements.txt` file. They look like this:
:::::{tabs}
::::{tab} environment.yml
:::{literalinclude} ../software/environment.yml
:language: yaml
:::
::::

::::{tab} requirements.txt
:::{literalinclude} ../software/requirements.txt
:::
::::
:::::

2. Then set up an isolated environment and install the dependencies from the file into it:
:::::{tabs}
::::{group-tab} Miniforge
- Create a new environment with the name "myenv" from `environment.yml`:
```console
$ conda env create -n myenv -f environment.yml
```
Or equivalently:
```console
$ mamba env create -n myenv -f environment.yml
```
- Activate the environment:
```console
$ conda activate myenv
```
- Run your code inside the activated environment:
```console
$ python example.py
```
::::

::::{group-tab} Pixi
- Create `pixi.toml` from `environment.yml`:
```console
$ pixi init --import environment.yml
```
- Run your code inside the environment:
```console
$ pixi run python example.py
```
::::

::::{group-tab} Virtual environment
- Create a virtual environment by running the following (the last `venv` is the
  name of the environment directory, and you can change it):
```console
$ python -m venv venv
```
- Activate the virtual environment (how precisely depends on your operating
  system and shell; for example `source venv/bin/activate` in bash or zsh on
  Linux and macOS, or `venv\Scripts\activate` in the Windows command prompt).
- Install the dependencies:
```console
$ python -m pip install -r requirements.txt
```
- Run your code inside the activated virtual environment:
```console
$ python example.py
```
::::

::::{group-tab} uv
- Create a virtual environment by running the following (the last `venv` is the
  name of the environment directory, and you can change it):
```console
$ uv venv venv
```
- Activate the virtual environment (how precisely depends on your operating
  system and shell; for example `source venv/bin/activate` in bash or zsh on
  Linux and macOS, or `venv\Scripts\activate` in the Windows command prompt).
- Install the dependencies:
```console
$ uv pip install -r requirements.txt
```
- Run your code inside the virtual environment:
```console
$ uv run python example.py
```
::::
:::::
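The virtual-environment steps can also be scripted. The sketch below uses the standard-library `venv` module to do roughly what `python -m venv venv` does, and then calls the environment's own interpreter, which avoids the shell-specific activation step. The function names are hypothetical, and we assume the usual layout (`bin` on Linux and macOS, `Scripts` on Windows).

```python
import os
import subprocess
import venv
from pathlib import Path


def env_python(env_dir):
    """Interpreter inside a venv: Scripts/python.exe on Windows, bin/python elsewhere."""
    env_dir = Path(env_dir)
    if os.name == "nt":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def create_env(env_dir, with_pip=True):
    """Roughly what `python -m venv <env_dir>` does; returns the env's interpreter."""
    # The command-line tool uses symlinks on POSIX systems and copies on Windows.
    venv.create(env_dir, with_pip=with_pip, symlinks=(os.name != "nt"))
    return env_python(env_dir)


def install_requirements(python, requirements="requirements.txt"):
    """Install from a requirements file with the environment's own pip.

    Calling the environment's interpreter directly means no activation is needed.
    """
    subprocess.run(
        [str(python), "-m", "pip", "install", "-r", str(requirements)],
        check=True,
    )
```

For example, `install_requirements(create_env("venv"))` creates `venv/` and installs from `requirements.txt`; afterwards `venv/bin/python example.py` runs your code on Linux and macOS without activating anything.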

## Updating environments

What if you forgot a dependency? Or during the development of your project
you realize that you need a new dependency? Or you don't need some dependency anymore?

1. Modify the `environment.yml` or `requirements.txt` file.
2. Either remove your environment and create a new one, or update the existing one:

:::::{tabs}
::::{group-tab} Miniforge
- Update the environment by running:
```console
$ conda env update -n myenv --file environment.yml
```
- Or equivalently:
```console
$ mamba env update -n myenv --file environment.yml
```
- Packages that you removed from the file stay installed. With conda, add
  `--prune` to remove them, or recreate the environment.
::::

::::{group-tab} Pixi
- Remove `pixi.toml`.
- Then create it again from the updated `environment.yml` by running:
```console
$ pixi init --import environment.yml
```
::::

::::{group-tab} Virtual environment
- Activate the virtual environment.
- Update the environment by running:
```console
$ python -m pip install -r requirements.txt
```
- This installs new and changed dependencies but does not uninstall packages
  that you removed from the file. To get rid of those, recreate the environment.
::::

::::{group-tab} uv
- Activate the virtual environment.
- Update the environment by running:
```console
$ uv pip install -r requirements.txt
```
- Like `pip install`, this does not uninstall packages that you removed from the
  file. `uv pip sync requirements.txt` makes the environment match the file
  exactly, but it expects a fully locked file (for example one generated by
  `uv pip compile`) because it does not install dependencies that are missing
  from the file.
::::
:::::
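To find out which entries of `requirements.txt` are not installed yet (for example after adding a forgotten dependency to the file), a simplified sketch could look like this. It only looks at the leading project name of each line and skips blank lines, comments, pip options such as `-r`, and URL requirements, so it is not a full requirements parser. The function name `missing_requirements` is hypothetical.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Leading project name of a requirement line, e.g. "numpy" in "numpy>=1.24".
_PROJECT_NAME = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*")


def missing_requirements(lines):
    """Return the project names in requirements-style `lines` that are not installed.

    Simplified on purpose: extras, version specifiers and environment markers
    are ignored, and pip options (`-r`, `-e`, ...) and URL lines are skipped.
    """
    missing = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-") or "://" in line:
            continue
        match = _PROJECT_NAME.match(line)
        if match is None:
            continue
        name = match.group(0)
        try:
            version(name)  # raises if no distribution with this name exists
        except PackageNotFoundError:
            missing.append(name)
    return missing
```

For example, `missing_requirements(open("requirements.txt").read().splitlines())` returns the names from the file that are not installed in the current environment. It does not check whether installed versions satisfy the specifiers.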

## Pinning package versions

Let us look at the
[environment.yml](https://github.com/coderefinery/python-progression/blob/main/software/environment.yml)
which we used to set up the environment for this progression course.
Dependencies are listed without version numbers. Should we **pin the
versions**?

- Both the `pip` and `conda` ecosystems and all the tools that we have
  mentioned support pinning versions.

- It is possible to define a range of versions instead of precise versions
  (for example `numpy>=1.24,<2` instead of `numpy==1.26.4`).

- While a project is still in progress, we often use the latest versions and do not pin them.

- When publishing the script or notebook, it is a good idea to pin the versions
  to ensure that the code can be run in the future.

- Remember that at some point you will face a situation where
  newer versions of the dependencies are no longer compatible with your
  software. At this point you'll have to update your software to use the newer
  versions or lock it to a point in time.
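When you are ready to publish, one way to turn an unpinned list of package names into pinned `name==version` lines is to ask the current environment which versions are installed. Here is a minimal sketch, assuming the names are PyPI distribution names (conda package names can differ); the function name `pin_versions` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def pin_versions(names):
    """Turn unpinned package names into `name==installed_version` lines.

    Names that are not installed are returned separately so they can be
    fixed before publishing.
    """
    pinned, not_installed = [], []
    for name in names:
        try:
            pinned.append(f"{name}=={version(name)}")
        except PackageNotFoundError:
            not_installed.append(name)
    return pinned, not_installed
```

For example, `pin_versions(["numpy", "pandas"])` returns the pinned lines for `requirements.txt` together with a list of names that are not installed and need attention.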

## Managing dependencies on a supercomputer

- Additional challenges:
  - Storage quotas: **Do not install dependencies in your home directory**. A conda environment can easily contain 100k files.
  - Network file systems struggle with many small files. Conda environments often contain many small files.
- Possible solutions:
  - Try [Pixi](https://pixi.sh/) (modern take on managing Conda environments) and
    [uv](https://docs.astral.sh/uv/) (modern take on managing virtual
    environments). Blog post: [Using Pixi and uv on a supercomputer](https://research-software.uit.no/blog/2025-pixi-and-uv/)
  - Install your environment on the fly into a scratch directory on local disk (**not** the network file system).
  - Install your environment on the fly into a RAM disk/drive.
  - Containerize your environment into a container image.
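To see how many files an environment actually contains (the number that matters for quotas and network file systems), a short sketch with `os.walk` is enough. It assumes that counting directory entries that are not directories is a good enough proxy for the file count; `count_files` is a hypothetical name.

```python
import os


def count_files(root):
    """Count non-directory entries below `root`.

    os.walk does not descend into symlinked directories by default, so
    files reachable only through such links are not counted.
    """
    total = 0
    for _dirpath, _dirnames, filenames in os.walk(root):
        total += len(filenames)
    return total
```

For example, after `import sys`, `count_files(sys.prefix)` counts the files of the environment that runs the current interpreter.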

---

:::{keypoints}
- Being able to communicate your dependencies is not only nice for others, but
  also for your future self or the next PhD student or post-doc.
- If you ask somebody to help you with your code, they will ask you for the
  dependencies.
:::

content/documentation.md

Lines changed: 1 addition & 1 deletion
@@ -139,7 +139,7 @@ Diátaxis is a systematic approach to technical documentation authoring.

:::{prereq} Preparation
In this episode we will use the following 5 packages which we installed
-previously as part of the {ref}`conda` or {ref}`venv`:
+previously as part of the {doc}`installation`:
```
myst-parser
sphinx