Commit 1495c78

adapt episode about the example project to classifying task
1 parent 672f387 commit 1495c78

6 files changed: +61 -61 lines changed

content/conf.py

Lines changed: 0 additions & 1 deletion
@@ -40,7 +40,6 @@
 # remove once sphinx_rtd_theme updated for contrast and accessibility:
 "sphinx_rtd_theme_ext_color_contrast",
 "sphinx_coderefinery_branding",
-"sphinxcontrib.video",
 ]
 
 # MyST extensions

content/example.md

Lines changed: 59 additions & 58 deletions
@@ -1,25 +1,29 @@
-(example-project)=
+# Example project: 2D classification task using a nearest-neighbor predictor
 
-# Example project: Simulating the motion of planets
+The [example code](https://github.com/workshop-material/classification-task)
+that we will study is a relatively simple nearest-neighbor predictor written in
+Python. It is not important or expected that we understand the code in detail.
 
-The [example code](https://github.com/workshop-material/planets) that we will study
-is a hopefully simple N-body simulation written in Python. It is not important
-or expected that we understand the code in any detail.
+The code will produce something like this:
 
-:::{video} video/animation.mp4
-:width: 600
+:::{figure} img/chart.svg
+:alt: Results of the classification task
+:width: 100%
+
+The bottom row shows the training data (two labels) and the top row shows the
+test data and whether the nearest-neighbor predictor classified their labels
+correctly.
 :::
 
-The **big picture** is that the code simulates the motion of a number of
-planets:
-- We can choose the number of planets.
-- Each planet starts with a random position, velocity, and mass.
-- At each time step, the code calculates the gravitational force between each
-pair of planets.
-- The forces accelerate each planet, the acceleration modifies the velocity,
-the velocity modifies the position of each planet.
-- We can choose the number of time steps.
-- The units were chosen to make numbers easy to read.
+The **big picture** of the code is as follows:
+- We can choose the number of samples (the example above has 50 samples).
+- The code will generate samples with two labels (0 and 1) in a 2D space.
+- One of the labels has a normal distribution and a circular distribution with
+some minimum and maximum radius.
+- The second label only has a circular distribution with a different radius.
+- Then we try to predict whether the test samples belong to label 0 or 1 based
+on the nearest neighbors in the training data. The number of neighbors can
+be adjusted and the code will take the label of the majority of the neighbors.
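
The majority-vote idea described in the last bullet can be illustrated with a minimal sketch. This is not the code from the linked repository: the function name `predict_label`, the use of Euclidean distance, and the tiny hand-made data set are all assumptions made for illustration.

```python
import math
from collections import Counter


def predict_label(point, training_points, training_labels, num_neighbors):
    """Return the majority label among the nearest training points."""
    # Sort training indices by Euclidean distance to the query point
    # (the distance metric is an assumption, not taken from the project).
    order = sorted(
        range(len(training_points)),
        key=lambda i: math.dist(point, training_points[i]),
    )
    nearest = [training_labels[i] for i in order[:num_neighbors]]
    # most_common(1) returns [(label, count)] for the most frequent label.
    return Counter(nearest).most_common(1)[0][0]


# Hand-made training set with illustrative values only.
train_xy = [(0.0, 0.0), (0.1, 0.2), (-0.2, 0.1), (3.0, 3.0), (3.1, 2.9), (2.8, 3.2)]
train_labels = [0, 0, 0, 1, 1, 1]

print(predict_label((0.05, 0.05), train_xy, train_labels, num_neighbors=3))  # 0
print(predict_label((2.9, 3.0), train_xy, train_labels, num_neighbors=3))  # 1
```

With two labels, an odd number of neighbors avoids ties in the vote.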
 
 
 ## Example run
@@ -29,57 +33,54 @@ The instructor demonstrates running the code on their computer.
 :::
 
 The code is written to accept **command-line arguments** to specify the number
-of planets and the number of time steps.
+of samples and file names. Later we will discuss advantages of this approach.
 
-We first generate starting data:
+Let us try to get the help text:
 ```console
-$ python generate-data.py --num-planets 10 --output-file initial.csv
-```
+$ python generate-data.py --help
+
+Usage: generate-data.py [OPTIONS]
+
+Program that generates a set of training and test samples for a non-linear
+classification task.
 
-The generated file (initial.csv) could look like this:
+Options:
+--num-samples INTEGER Number of samples for each class. [required]
+--training-data TEXT Training data is written to this file. [required]
+--test-data TEXT Test data is written to this file. [required]
+--help Show this message and exit.
 ```
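
A command-line interface with the three required options from the help text can be sketched with `argparse` from the Python standard library. This is an illustration only: the actual script may be built with a different library, and the function name `build_parser` is hypothetical.

```python
import argparse


def build_parser():
    """Build a parser with the three required options shown in the help text."""
    parser = argparse.ArgumentParser(
        prog="generate-data.py",
        description="Generate training and test samples for a classification task.",
    )
    parser.add_argument(
        "--num-samples", type=int, required=True,
        help="Number of samples for each class.",
    )
    parser.add_argument(
        "--training-data", required=True,
        help="Training data is written to this file.",
    )
    parser.add_argument(
        "--test-data", required=True,
        help="Test data is written to this file.",
    )
    return parser


# Passing a list instead of reading sys.argv keeps the sketch self-contained.
args = build_parser().parse_args(
    ["--num-samples", "50", "--training-data", "train.csv", "--test-data", "test.csv"]
)
print(args.num_samples, args.training_data, args.test_data)
```

`argparse` converts dashes in option names to underscores, so `--num-samples` becomes `args.num_samples`.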
-px,py,pz,vx,vy,vz,mass
--46.88,-42.51,88.33,-0.86,-0.18,0.55,6.70
--5.29,17.09,-96.13,0.66,0.45,-0.17,3.51
-83.53,-92.83,-68.77,-0.26,-0.48,0.24,6.84
--36.31,25.48,64.16,0.85,0.75,-0.56,1.53
--68.38,-17.21,-97.07,0.60,0.26,0.69,6.63
--48.37,-48.74,3.92,-0.92,-0.33,-0.93,8.60
-40.53,-75.50,44.18,-0.62,-0.31,-0.53,8.04
--27.21,10.78,-78.82,-0.09,-0.55,-0.03,5.35
-88.42,-74.95,-45.85,0.81,0.68,0.56,5.36
-39.09,53.12,-59.54,-0.54,0.56,0.07,8.98
+
+We first generate the training and test data:
+```console
+$ python generate-data.py --num-samples 50 --training-data train.csv --test-data test.csv
+
+Generated 50 training samples (train.csv) and test samples (test.csv).
 ```
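
To give a feeling for what "a normal distribution and a circular distribution" could mean for the generated data, here is a rough sketch using only the standard library. Every number in it (radii, standard deviation, the half-and-half mix for label 0, the seed) is an assumption for illustration and is not taken from the project; `ring_sample` and `generate_samples` are hypothetical names.

```python
import math
import random


def ring_sample(rng, r_min, r_max):
    """Draw one point whose radius lies between r_min and r_max."""
    radius = rng.uniform(r_min, r_max)
    angle = rng.uniform(0.0, 2.0 * math.pi)
    return (radius * math.cos(angle), radius * math.sin(angle))


def generate_samples(num_samples, seed=0):
    """Return (x, y, label) tuples with num_samples points per label."""
    rng = random.Random(seed)
    samples = []
    for i in range(num_samples):
        # Label 0 (assumed mix): alternate between a normal blob around
        # the origin and an outer ring.
        if i % 2 == 0:
            x, y = rng.gauss(0.0, 0.5), rng.gauss(0.0, 0.5)
        else:
            x, y = ring_sample(rng, 4.0, 5.0)
        samples.append((x, y, 0))
        # Label 1 (assumed): a single ring with a different radius.
        x, y = ring_sample(rng, 2.0, 3.0)
        samples.append((x, y, 1))
    return samples


samples = generate_samples(50)
print(len(samples))  # 100 (50 per label)
```

Because one label surrounds the other, no straight line separates the two classes, which is one way to read "non-linear classification task" in the help text.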
 
-Then we can simulate their motion (in this case for 20 steps):
+In a second step we generate predictions for the test data:
 ```console
-$ python simulate.py --num-steps 20 \
---input-file initial.csv \
---output-file final.csv
+$ python generate-predictions.py --num-neighbors 7 --training-data train.csv --test-data test.csv --predictions predictions.csv
+
+Predictions saved to predictions.csv
 ```
 
-The `--output-file` (final.csv) is again a CSV file (comma-separated values)
-and contains the final positions of all planets.
-
-It is possible to run on **multiple cores** and to **animate** the result.
-Here is an example with 100 planets:
-```{code-block} console
----
-emphasize-lines: 7,11
----
-$ python generate-data.py --num-planets 100 --output-file initial.csv
-
-$ python simulate.py --num-steps 50 \
---input-file initial.csv \
---output-file final.csv \
---trajectories-file trajectories.npz \
---num-cores 8
-
-$ python animate.py --initial-file initial.csv \
---trajectories-file trajectories.npz \
---output-file animation.mp4
+Finally, we can plot the results:
+```console
+$ python plot-results.py --training-data train.csv --predictions predictions.csv --output-chart chart.svg
+
+Accuracy: 0.94
+Saved chart to chart.svg
 ```
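
The reported accuracy is presumably the fraction of test samples whose predicted label matches the true label. A minimal sketch under that assumption follows; the function name `accuracy` and the label lists are made up for illustration, and how the real script reads its labels is not shown in the source.

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of positions where the predicted label equals the true label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Four of the five illustrative predictions match the true labels.
print(accuracy([0, 1, 1, 0, 1], [0, 1, 0, 0, 1]))  # 0.8
```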
 
+
+## Discussion and goals
+
+:::{discussion}
+- Together we look at the generated files (train.csv, test.csv, predictions.csv, chart.svg).
+- We browse and discuss the [example code behind these scripts](https://github.com/workshop-material/classification-task).
+:::
+
 :::{admonition} Learning goals
 - What are the most important steps to make this code **reusable by others**
 and **our future selves**?
@@ -90,6 +91,6 @@ $ python animate.py --initial-file initial.csv \
 - ... how the code works internally in detail.
 - ... whether this is the most efficient algorithm.
 - ... whether the code is numerically stable.
-- ... how to code scales with the number of cores.
+- ... how the code scales with system size.
 - ... whether it is portable to other operating systems (we will discuss this later).
 :::

content/img/chart.svg

Lines changed: 1 addition & 0 deletions

content/index.md

Lines changed: 1 addition & 1 deletion
@@ -30,7 +30,7 @@ them to own projects**.
 - 13:00-13:30 - **Welcome and introduction**
 - Practical information (tools, communication, breaks, etc.)
 - Motivation (reproducibility, robustness, distribution, improvement, trust, etc.)
-- {ref}`example-project`
+- {doc}`example`
 
 - 13:30-14:45 - {ref}`version-control` (1/2)
 - {ref}`version-control-motivation` (15 min)

content/video/animation.mp4

-205 KB
Binary file not shown.

requirements.txt

Lines changed: 0 additions & 1 deletion
@@ -4,4 +4,3 @@ sphinx_rtd_theme_ext_color_contrast
 myst_nb
 sphinx-lesson
 https://github.com/coderefinery/sphinx-coderefinery-branding/archive/master.zip
-sphinxcontrib-video
