- (example-project)=
+ # Example project: 2D classification task using a nearest-neighbor predictor

- # Example project: Simulating the motion of planets
+ The [example code](https://github.com/workshop-material/classification-task)
+ that we will study is a relatively simple nearest-neighbor predictor written in
+ Python. It is not important or expected that we understand the code in detail.

- The [example code](https://github.com/workshop-material/planets) that we will study
- is a hopefully simple N-body simulation written in Python. It is not important
- or expected that we understand the code in any detail.
+ The code will produce something like this:

- :::{video} video/animation.mp4
- :width: 600
+ :::{figure} img/chart.svg
+ :alt: Results of the classification task
+ :width: 100%
+
+ The bottom row shows the training data (two labels) and the top row shows the
+ test data and whether the nearest-neighbor predictor classified their labels
+ correctly.
  :::

- The **big picture** is that the code simulates the motion of a number of
- planets:
- - We can choose the number of planets.
- - Each planet starts with a random position, velocity, and mass.
- - At each time step, the code calculates the gravitational force between each
-   pair of planets.
- - The forces accelerate each planet, the acceleration modifies the velocity,
-   the velocity modifies the position of each planet.
- - We can choose the number of time steps.
- - The units were chosen to make numbers easy to read.
+ The **big picture** of the code is as follows:
+ - We can choose the number of samples (the example above has 50 samples).
+ - The code will generate samples with two labels (0 and 1) in a 2D space.
+ - One of the labels has a normal distribution and a circular distribution with
+   some minimum and maximum radius.
+ - The second label only has a circular distribution with a different radius.
+ - Then we try to predict whether the test samples belong to label 0 or 1 based
+   on the nearest neighbors in the training data. The number of neighbors can
+   be adjusted and the code will take the label of the majority of the neighbors.
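
The majority-vote step described in the last bullet can be sketched as follows. This is a minimal illustration and is not taken from the example repository: the function name `predict_label`, the `(x, y, label)` tuple layout, and the use of Euclidean distance are all assumptions made for this sketch.

```python
from collections import Counter
import math


def predict_label(point, training_samples, num_neighbors):
    """Return the majority label among the nearest training samples.

    `training_samples` is a list of (x, y, label) tuples. This layout is a
    hypothetical choice for the sketch, not the format used by the scripts.
    """
    # Sort the training samples by Euclidean distance to the query point.
    by_distance = sorted(
        training_samples,
        key=lambda sample: math.dist(point, sample[:2]),
    )
    # Keep only the labels of the closest `num_neighbors` samples.
    nearest_labels = [label for _, _, label in by_distance[:num_neighbors]]
    # The most common label among those neighbors wins.
    return Counter(nearest_labels).most_common(1)[0][0]


# Tiny hand-made training set: label 0 near the origin, label 1 near (2, 2).
training = [
    (0.0, 0.0, 0),
    (0.1, 0.2, 0),
    (0.2, 0.1, 0),
    (2.0, 2.0, 1),
    (2.1, 1.9, 1),
    (1.9, 2.2, 1),
]

print(predict_label((0.1, 0.1), training, num_neighbors=3))
print(predict_label((2.0, 2.1), training, num_neighbors=3))
```

With two labels, choosing an odd number of neighbors (such as the 7 used in the example run) avoids ties in the vote, as long as the training set contains at least that many samples.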


  ## Example run
@@ -29,57 +33,54 @@ The instructor demonstrates running the code on their computer.
  :::

  The code is written to accept **command-line arguments** to specify the number
- of planets and the number of time steps.
+ of samples and file names. Later we will discuss the advantages of this approach.

- We first generate starting data:
+ Let us try to get the help text:
  ```console
- $ python generate-data.py --num-planets 10 --output-file initial.csv
- ```
+ $ python generate-data.py --help
+
+ Usage: generate-data.py [OPTIONS]
+
+   Program that generates a set of training and test samples for a non-linear
+   classification task.

- The generated file (initial.csv) could look like this:
+ Options:
+   --num-samples INTEGER  Number of samples for each class.  [required]
+   --training-data TEXT   Training data is written to this file.  [required]
+   --test-data TEXT       Test data is written to this file.  [required]
+   --help                 Show this message and exit.
  ```
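
As a rough sketch of how a script could declare the three required options shown in the help text, here is a version using `argparse` from the Python standard library. This is an assumption for illustration only: the example repository may use a different command-line library, and `build_parser` is a hypothetical helper name.

```python
import argparse


def build_parser():
    """Build a parser with the three required options from the help text."""
    parser = argparse.ArgumentParser(
        prog="generate-data.py",
        description=(
            "Program that generates a set of training and test samples "
            "for a non-linear classification task."
        ),
    )
    parser.add_argument(
        "--num-samples",
        type=int,
        required=True,
        help="Number of samples for each class.",
    )
    parser.add_argument(
        "--training-data",
        required=True,
        help="Training data is written to this file.",
    )
    parser.add_argument(
        "--test-data",
        required=True,
        help="Test data is written to this file.",
    )
    return parser


# Parse an explicit argument list instead of sys.argv so that the sketch
# can be run as-is.
args = build_parser().parse_args(
    ["--num-samples", "50", "--training-data", "train.csv", "--test-data", "test.csv"]
)
print(args.num_samples, args.training_data, args.test_data)
```

Note that `argparse` exposes `--num-samples` as the attribute `args.num_samples`: dashes in option names become underscores.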
- px,py,pz,vx,vy,vz,mass
- -46.88,-42.51,88.33,-0.86,-0.18,0.55,6.70
- -5.29,17.09,-96.13,0.66,0.45,-0.17,3.51
- 83.53,-92.83,-68.77,-0.26,-0.48,0.24,6.84
- -36.31,25.48,64.16,0.85,0.75,-0.56,1.53
- -68.38,-17.21,-97.07,0.60,0.26,0.69,6.63
- -48.37,-48.74,3.92,-0.92,-0.33,-0.93,8.60
- 40.53,-75.50,44.18,-0.62,-0.31,-0.53,8.04
- -27.21,10.78,-78.82,-0.09,-0.55,-0.03,5.35
- 88.42,-74.95,-45.85,0.81,0.68,0.56,5.36
- 39.09,53.12,-59.54,-0.54,0.56,0.07,8.98
+
+ We first generate the training and test data:
+ ```console
+ $ python generate-data.py --num-samples 50 --training-data train.csv --test-data test.csv
+
+ Generated 50 training samples (train.csv) and test samples (test.csv).
  ```

- Then we can simulate their motion (in this case for 20 steps):
+ In a second step we generate predictions for the test data:
  ```console
- $ python simulate.py --num-steps 20 \
-                      --input-file initial.csv \
-                      --output-file final.csv
+ $ python generate-predictions.py --num-neighbors 7 --training-data train.csv --test-data test.csv --predictions predictions.csv
+
+ Predictions saved to predictions.csv
  ```

- The `--output-file` (final.csv) is again a CSV file (comma-separated values)
- and contains the final positions of all planets.
-
- It is possible to run on **multiple cores** and to **animate** the result.
- Here is an example with 100 planets:
- ```{code-block} console
- ---
- emphasize-lines: 7,11
- ---
- $ python generate-data.py --num-planets 100 --output-file initial.csv
-
- $ python simulate.py --num-steps 50 \
-                      --input-file initial.csv \
-                      --output-file final.csv \
-                      --trajectories-file trajectories.npz \
-                      --num-cores 8
-
- $ python animate.py --initial-file initial.csv \
-                     --trajectories-file trajectories.npz \
-                     --output-file animation.mp4
+ Finally, we can plot the results:
+ ```console
+ $ python plot-results.py --training-data train.csv --predictions predictions.csv --output-chart chart.svg
+
+ Accuracy: 0.94
+ Saved chart to chart.svg
  ```
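
The reported accuracy is presumably the fraction of test samples whose predicted label matches the true label. A minimal sketch under that assumption follows; the function name `accuracy` and the toy labels are hypothetical and not taken from the example repository.

```python
def accuracy(true_labels, predicted_labels):
    """Return the fraction of predictions that match the true labels."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("both label lists must have the same length")
    if not true_labels:
        raise ValueError("cannot compute accuracy for zero samples")
    # Count the positions where prediction and truth agree.
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Toy data: four of the five predictions are correct.
true_labels = [0, 0, 1, 1, 1]
predicted = [0, 1, 1, 1, 1]
print(f"Accuracy: {accuracy(true_labels, predicted):.2f}")
```

For the toy data above this prints `Accuracy: 0.80`.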

+
+ ## Discussion and goals
+
+ :::{discussion}
+ - Together we look at the generated files (train.csv, test.csv, predictions.csv, chart.svg).
+ - We browse and discuss the [example code behind these scripts](https://github.com/workshop-material/classification-task).
+ :::
+
  :::{admonition} Learning goals
  - What are the most important steps to make this code **reusable by others**
    and **our future selves**?
@@ -90,6 +91,6 @@ $ python animate.py --initial-file initial.csv \
  - ... how the code works internally in detail.
  - ... whether this is the most efficient algorithm.
  - ... whether the code is numerically stable.
- - ... how the code scales with the number of cores.
+ - ... how the code scales with system size.
  - ... whether it is portable to other operating systems (we will discuss this later).
  :::