DataScience x Logistic Regression - School-42 project
- Learn how to read a dataset, to visualize it in different ways, to select and clean unnecessary information from your data.
- Implement one-vs-all logistic regression to solve a classification problem.
Look at subject.pdf for more information
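As a rough illustration of the one-vs-all idea, the sketch below trains one binary logistic classifier per class and predicts the class whose classifier is most confident. It is not the code from this repository: it uses NumPy instead of PyTorch, plain full-batch gradient descent, and made-up toy data. The names `train_one_vs_all` and `predict_one_vs_all` and the values of `lr` and `epochs` are assumptions chosen for the example.

```python
import numpy as np


def sigmoid(z):
    """Logistic function mapping real values to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def train_one_vs_all(x, y, n_classes, lr=0.1, epochs=500):
    """Fit one binary logistic classifier per class with full-batch GD.

    x: (n_samples, n_features) floats, y: (n_samples,) integer labels.
    Returns weights of shape (n_classes, n_features + 1); last column is the bias.
    """
    xb = np.hstack([x, np.ones((x.shape[0], 1))])  # append a bias column
    weights = np.zeros((n_classes, xb.shape[1]))
    for c in range(n_classes):
        target = (y == c).astype(float)  # 1 for class c, 0 for every other class
        for _ in range(epochs):
            pred = sigmoid(xb @ weights[c])
            grad = xb.T @ (pred - target) / len(target)  # gradient of mean log loss
            weights[c] -= lr * grad
    return weights


def predict_one_vs_all(x, weights):
    """Pick the class whose classifier gives the highest probability."""
    xb = np.hstack([x, np.ones((x.shape[0], 1))])
    return np.argmax(sigmoid(xb @ weights.T), axis=1)


# Toy data: three well-separated 2D clusters (invented for illustration only).
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
y_toy = np.repeat(np.arange(3), 30)
x_toy = centers[y_toy] + 0.3 * rng.standard_normal((90, 2))

w = train_one_vs_all(x_toy, y_toy, n_classes=3)
accuracy = np.mean(predict_one_vs_all(x_toy, w) == y_toy)
print(f"train accuracy: {accuracy:.2f}")
```

Each class gets its own weight row, so adding a class only adds one more binary problem; the final prediction is an argmax over the per-class probabilities.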
Python 3
PyTorch
NumPy
Pandas
Matplotlib
git clone https://github.com/Gleonett/DSLR.git
cd DSLR
pip3 install -r requirements.txt
Each executable has a -h flag that describes its arguments.
These are some visualization utilities for the dataset:
histogram.py | scatter_plot.py | clusters_3d_plot.py |
---|---|---|
![]() | ![]() | ![]() |
Show course marks distribution | Show values for two courses using Cartesian coordinates | 3D scatter plot of clusters |
pair_plot.py is scatter_plot.py + histogram.py for all courses |
---|
![]() |
describe.py is an implementation of pandas.DataFrame.describe |
---|
![]() |
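
For reference, the statistics that pandas.DataFrame.describe reports for a numeric column can be computed by hand as in the sketch below. This is not the code of describe.py; the helper names `percentile` and `describe` are assumptions for the example. It follows the pandas defaults: sample standard deviation (dividing by count - 1) and percentiles with linear interpolation. It assumes at least two valid values per column.

```python
import math


def percentile(sorted_values, q):
    """Percentile with linear interpolation between closest ranks (q in [0, 1])."""
    pos = (len(sorted_values) - 1) * q
    lower = math.floor(pos)
    upper = math.ceil(pos)
    frac = pos - lower
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * frac


def describe(values):
    """Summary statistics for one numeric column, skipping None and NaN."""
    clean = sorted(v for v in values if v is not None and not math.isnan(v))
    count = len(clean)
    mean = sum(clean) / count
    # Sample standard deviation (divides by count - 1), the pandas default.
    std = math.sqrt(sum((v - mean) ** 2 for v in clean) / (count - 1))
    return {
        "count": count,
        "mean": mean,
        "std": std,
        "min": clean[0],
        "25%": percentile(clean, 0.25),
        "50%": percentile(clean, 0.50),
        "75%": percentile(clean, 0.75),
        "max": clean[-1],
    }


stats = describe([1.0, 2.0, 3.0, 4.0, float("nan")])
for name, value in stats.items():
    print(f"{name:>5} {value:.6f}")
```

Missing values are dropped before counting, so `count` reflects only the valid entries in the column.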
Accuracy with standard parameters is 0.99
Parameters for training, including batch size, are stored in config.yaml
- logreg_train.py saves data/weights.pt (use the -v flag for loss history visualization)
stochastic GD | batch GD | GD |
---|---|---|
![]() | ![]() | ![]() |
- logreg_predict.py takes data/weights.pt and saves data/houses.csv
- evaluate.py - logreg_train.py + logreg_predict.py, followed by evaluation on dataset_truth.csv
- random_evaluate.py - training and evaluation on a randomly split dataset_train.csv
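
The three gradient descent variants and the random split can be pictured with the sketch below. It is a minimal illustration, not the repository's code: `random_split`, `iterate_batches`, `accuracy`, the 0.3 test fraction and the seeds are all assumptions, and "batch GD" is assumed here to mean mini-batch gradient descent. The only thing that changes between the variants is the batch size: 1 for stochastic GD, a small number for mini-batch GD, and the whole training set for plain GD.

```python
import numpy as np


def random_split(n_samples, test_fraction=0.3, seed=0):
    """Shuffle sample indices and split them into train and test parts."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_samples)
    n_test = int(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


def iterate_batches(n_samples, batch_size, rng):
    """Yield shuffled index batches covering every sample once per epoch."""
    indices = rng.permutation(n_samples)
    for start in range(0, n_samples, batch_size):
        yield indices[start:start + batch_size]


def accuracy(predicted, truth):
    """Fraction of predictions equal to the true labels."""
    return float(np.mean(np.asarray(predicted) == np.asarray(truth)))


train_idx, test_idx = random_split(10, test_fraction=0.3)
rng = np.random.default_rng(1)
variants = [("stochastic GD", 1), ("mini-batch GD", 4), ("full GD", len(train_idx))]
for name, batch_size in variants:
    # One weight update would be performed per yielded batch.
    n_updates = sum(1 for _ in iterate_batches(len(train_idx), batch_size, rng))
    print(f"{name}: {n_updates} weight updates per epoch")
```

Smaller batches give more, noisier updates per epoch, which is the usual reason the loss histories of the three variants look different.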