
Plot training calibration, PR, and ROC curves; logging of label breakdown and number of epochs #332 #340

Closed
wants to merge 14 commits

Conversation

kathwy
Contributor

@kathwy kathwy commented Jun 23, 2020

resolves #332
resolves #216

Added argument plot_train_curves (defaults to False) to plot calibration, PR, and ROC curves for training set.
Reports the label breakdown for train/valid/test at the end of train mode
Reports the number of epochs completed
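The flag described above could be exposed through the command-line arguments. A minimal sketch, assuming standard argparse wiring (the actual ml4cvd arguments module differs):

```python
import argparse

# Hypothetical sketch of how the new flag could be wired into an argument
# parser; only the flag name and its False default come from the PR.
parser = argparse.ArgumentParser()
parser.add_argument(
    '--plot_train_curves',
    action='store_true',  # defaults to False, as described above
    help='Also plot calibration, PR, and ROC curves for the training set.',
)

print(parser.parse_args([]).plot_train_curves)                     # False by default
args = parser.parse_args(['--plot_train_curves'])
print(args.plot_train_curves)                                      # True when the flag is passed
```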

@kathwy kathwy requested review from erikr and StevenSong June 23, 2020 06:09
@erikr erikr added the enhancement New feature or request label Jun 23, 2020
@erikr erikr linked an issue Jun 23, 2020 that may be closed by this pull request
Collaborator

@StevenSong StevenSong left a comment

Comments are mostly about the handling of the generator workers; we should be consistent and handle both the training and validation generators. Also, let's get a review from @lucidtronix or @ndiamant.

ml4cvd/models.py Outdated
@@ -1016,6 +1016,7 @@ def train_model_from_generators(
inspect_show_labels: bool,
return_history: bool = False,
plot: bool = True,
plot_train_curves: bool = False
Collaborator

Adding the argument here as plot_train_curves implies this function will do the plotting or have some functionality related to it. Instead, it just defers the worker management to the caller of the function. Maybe we can rename this argument for train_model_from_generators to defer_worker_halt or something similar?

out_path = os.path.join(args.output_folder, args.id + '/')
test_data, test_labels, test_paths = big_batch_from_minibatch_generator(generate_test, args.test_steps)
train_data, train_labels = big_batch_from_minibatch_generator(generate_train, args.training_steps)
Collaborator

What is the size of the big_batch returned here? training_steps is usually a lot larger than test_steps.

Contributor Author

train big_batch shape = (25600, 2500, 12), as opposed to (2048, 2500, 12) for test
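The size concern can be illustrated with a minimal sketch (not the ml4cvd implementation) of assembling a "big batch" by draining `steps` minibatches from a generator and concatenating them; the batch size of 32 below is an assumption inferred from the reported shapes.

```python
import numpy as np

# Sketch: drain `steps` minibatches and stack them into one array.
def big_batch_from_minibatch_generator(generator, steps):
    minibatches = [next(generator) for _ in range(steps)]
    return np.concatenate(minibatches, axis=0)

def minibatch_generator(batch_size=4, sample_shape=(5, 3)):
    while True:
        yield np.zeros((batch_size,) + sample_shape)

big = big_batch_from_minibatch_generator(minibatch_generator(), steps=10)
print(big.shape)  # (40, 5, 3): steps * batch_size rows

# If the batch size were 32, the shapes reported above would correspond to
# 800 training steps -> (25600, 2500, 12) vs 64 test steps -> (2048, 2500, 12),
# which is why the training big batch can be much larger (and memory-hungry).
```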


is there a need to return train_paths?

Contributor Author

I don't think we need to return it since it isn't used in plotting?

Collaborator

It is optional; if provided, it will be used to label outliers.

Collaborator

@StevenSong StevenSong left a comment

awesome, please remove the 2 extra files though and also request review from a Broadie!

._.DS_Store
ml4cvd/._models.py

@kathwy kathwy requested a review from ndiamant June 25, 2020 02:56
@erikr erikr removed their request for review June 25, 2020 03:35
@erikr

erikr commented Jun 25, 2020

awesome, please remove the 2 extra files though and also request review from a Broadie!

._.DS_Store
ml4cvd/._models.py

This repo has a .gitignore but it does not seem to be working.

Also, ensure ml4cvd/._arguments.py isn't being committed.

@StevenSong
Collaborator

awesome, please remove the 2 extra files though and also request review from a Broadie!

._.DS_Store
ml4cvd/._models.py

This repo has a .gitignore but it does not seem to be working.

Also, ensure ml4cvd/._arguments.py isn't being committed.

Those aren't currently in the codebase: https://github.com/broadinstitute/ml/blob/106a1cca25a1e2f68cc4db99658ff38d4f5a94ce/.gitignore

I wonder what's generating the ._ files?

@kathwy
Contributor Author

kathwy commented Jun 25, 2020

It seems like they're metadata files created by macOS that get separated out when I push (I can't see them on my computer, but I see them on the GitHub website)?

@StevenSong
Collaborator

It seems like they're metadata files created by macOS that get separated out when I push (I can't see them on my computer, but I see them on the GitHub website)?

I'd recommend doing ls -a from your Mac terminal window and rm-ing them.

Before adding and committing to git, you can also run git status to see which files are untracked.

We could also just add ._* to .gitignore.
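The cleanup suggested above can also be scripted. A small sketch that finds and deletes the macOS AppleDouble "._*" files under a directory, roughly equivalent to `find . -name '._*' -delete`:

```python
from pathlib import Path

def remove_appledouble_files(root='.'):
    """Delete macOS '._*' AppleDouble files under `root`; return what was removed."""
    removed = []
    for p in Path(root).rglob('._*'):
        if p.is_file():
            p.unlink()
            removed.append(str(p))
    return removed

# Run from the repo root before `git add`; prints the paths it deleted.
print(remove_appledouble_files())
```

Adding `._*` to .gitignore, as suggested, still helps in case the files reappear.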

@kathwy kathwy changed the title Ky plot train roc #332 Plot training calibration, PR, and ROC curves; logging of label breakdown and number of epochs #332 Jul 4, 2020
@kathwy
Contributor Author

kathwy commented Jul 6, 2020

Closing this PR for now. I've merged the changes into a separate fork (aguirre-lab/ml) for sts-ecg modeling purposes. If anyone finds this capability useful for broadinstitute/ml as well, I'm happy to reopen this PR and merge.

@kathwy kathwy closed this Jul 6, 2020
@StevenSong StevenSong deleted the ky_plot_train_roc branch August 12, 2020 21:04
Labels
enhancement New feature or request
Development

Successfully merging this pull request may close these issues.

Produce ROC and PR curves for both train and test sets
Improve clarity of logfile contents
4 participants