I'm trying to run the benchmark but it crashes on the dcase2016_task2 task. After training for what seems like 229 epochs, at the prediction stage, I get a KeyError trying to access the postprocessing parameters at epoch 240:
```
predict - dcase2016_task2 - 2024-08-01 09:19:18,874 - 874 - result: [0.1666666716337204, 29, {"batch_size": 1024, "check_val_every_n_epoch": 10, "dropout": 0.1, "embedding_norm": "<class 'torch.nn.modules.linear.Identity'>", "hidden_dim": 1024, "hidden_layers": 2, "hidden_norm": "<class 'torch.nn.modules.batchnorm.BatchNorm1d'>", "initialization": "<function xavier_uniform_ at 0x7fd89f389830>", "lr": 0.0032, "max_epochs": 500, "norm_after_activation": false, "optim": "<class 'torch.optim.adam.Adam'>", "patience": 20}, [["median_filter_ms", 250], ["min_duration", 125]]]
Grid Point Summary: [0.19771863520145416, 39, {"batch_size": 1024, "check_val_every_n_epoch": 10, "dropout": 0.1, "embedding_norm": "<class 'torch.nn.modules.linear.Identity'>", "hidden_dim": 1024, "hidden_layers": 2, "hidden_norm": "<class 'torch.nn.modules.batchnorm.BatchNorm1d'>", "initialization": "<function xavier_normal_ at 0x7fd89f3898c0>", "lr": 0.00032, "max_epochs": 500, "norm_after_activation": false, "optim": "<class 'torch.optim.adam.Adam'>", "patience": 20}, [["median_filter_ms", 250], ["min_duration", 125]]]
Grid Point Summary: [0.19354838132858276, 59, {"batch_size": 1024, "check_val_every_n_epoch": 10, "dropout": 0.1, "embedding_norm": "<class 'torch.nn.modules.linear.Identity'>", "hidden_dim": 1024, "hidden_layers": 2, "hidden_norm": "<class 'torch.nn.modules.batchnorm.BatchNorm1d'>", "initialization": "<function xavier_uniform_ at 0x7fd89f389830>", "lr": 0.00032, "max_epochs": 500, "norm_after_activation": false, "optim": "<class 'torch.optim.adam.Adam'>", "patience": 20}, [["median_filter_ms", 250], ["min_duration", 125]]]
Grid Point Summary: [0.1901140660047531, 269, {"batch_size": 1024, "check_val_every_n_epoch": 10, "dropout": 0.1, "embedding_norm": "<class 'torch.nn.modules.linear.Identity'>", "hidden_dim": 1024, "hidden_layers": 1, "hidden_norm": "<class 'torch.nn.modules.batchnorm.BatchNorm1d'>", "initialization": "<function xavier_normal_ at 0x7fd89f3898c0>", "lr": 0.00032, "max_epochs": 500, "norm_after_activation": false, "optim": "<class 'torch.optim.adam.Adam'>", "patience": 20}, [["median_filter_ms", 250], ["min_duration", 125]]]
Grid Point Summary: [0.18285714089870453, 139, {"batch_size": 1024, "check_val_every_n_epoch": 10, "dropout": 0.1, "embedding_norm": "<class 'torch.nn.modules.linear.Identity'>", "hidden_dim": 1024, "hidden_layers": 1, "hidden_norm": "<class 'torch.nn.modules.batchnorm.BatchNorm1d'>", "initialization": "<function xavier_normal_ at 0x7fd89f3898c0>", "lr": 0.0001, "max_epochs": 500, "norm_after_activation": false, "optim": "<class 'torch.optim.adam.Adam'>", "patience": 20}, [["median_filter_ms", 250], ["min_duration", 125]]]
Grid Point Summary: [0.1807909607887268, 69, {"batch_size": 1024, "check_val_every_n_epoch": 10, "dropout": 0.1, "embedding_norm": "<class 'torch.nn.modules.linear.Identity'>", "hidden_dim": 1024, "hidden_layers": 1, "hidden_norm": "<class 'torch.nn.modules.batchnorm.BatchNorm1d'>", "initialization": "<function xavier_normal_ at 0x7fd89f3898c0>", "lr": 0.001, "max_epochs": 500, "norm_after_activation": false, "optim": "<class 'torch.optim.adam.Adam'>", "patience": 20}, [["median_filter_ms", 250], ["min_duration", 125]]]
Grid Point Summary: [0.1732580065727234, 29, {"batch_size": 1024, "check_val_every_n_epoch": 10, "dropout": 0.1, "embedding_norm": "<class 'torch.nn.modules.linear.Identity'>", "hidden_dim": 1024, "hidden_layers": 2, "hidden_norm": "<class 'torch.nn.modules.batchnorm.BatchNorm1d'>", "initialization": "<function xavier_uniform_ at 0x7fd89f389830>", "lr": 0.001, "max_epochs": 500, "norm_after_activation": false, "optim": "<class 'torch.optim.adam.Adam'>", "patience": 20}, [["median_filter_ms", 250], ["min_duration", 125]]]
Grid Point Summary: [0.1666666716337204, 29, {"batch_size": 1024, "check_val_every_n_epoch": 10, "dropout": 0.1, "embedding_norm": "<class 'torch.nn.modules.linear.Identity'>", "hidden_dim": 1024, "hidden_layers": 2, "hidden_norm": "<class 'torch.nn.modules.batchnorm.BatchNorm1d'>", "initialization": "<function xavier_uniform_ at 0x7fd89f389830>", "lr": 0.0032, "max_epochs": 500, "norm_after_activation": false, "optim": "<class 'torch.optim.adam.Adam'>", "patience": 20}, [["median_filter_ms", 250], ["min_duration", 125]]]
Grid Point Summary: [0.16030533611774445, 19, {"batch_size": 1024, "check_val_every_n_epoch": 10, "dropout": 0.1, "embedding_norm": "<class 'torch.nn.modules.linear.Identity'>", "hidden_dim": 1024, "hidden_layers": 2, "hidden_norm": "<class 'torch.nn.modules.batchnorm.BatchNorm1d'>", "initialization": "<function xavier_normal_ at 0x7fd89f3898c0>", "lr": 0.0032, "max_epochs": 500, "norm_after_activation": false, "optim": "<class 'torch.optim.adam.Adam'>", "patience": 20}, [["median_filter_ms", 250], ["min_duration", 125]]]
grid: 8it [1:59:58, 899.87s/it]
predict - dcase2016_task2 - 2024-08-01 09:19:18,874 - 874 - Best Grid Point Validation Score: 0.19771863520145416 Grid Point HyperParams: {'batch_size': 1024, 'check_val_every_n_epoch': 10, 'dropout': 0.1, 'embedding_norm': <class 'torch.nn.modules.linear.Identity'>, 'hidden_dim': 1024, 'hidden_layers': 2, 'hidden_norm': <class 'torch.nn.modules.batchnorm.BatchNorm1d'>, 'initialization': <function xavier_normal_ at 0x7fd89f3898c0>, 'lr': 0.00032, 'max_epochs': 500, 'norm_after_activation': False, 'optim': <class 'torch.optim.adam.Adam'>, 'patience': 20}
split: 0it [00:00, ?it/s]
100%|██████████| 84000/84000 [00:00<00:00, 140181.00it/s]
100%|██████████| 84000/84000 [00:01<00:00, 59876.66it/s]
Getting embeddings for split ['test'], which has 84000 instances.
You are using a CUDA device ('NVIDIA RTX A6000') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
Restoring states from the checkpoint path at logs/embeddings/mymodel/dcase2016_task2-hear2021-full/lightning_logs/version_4/checkpoints/epoch=39-step=10320.ckpt
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [3]
Loaded model weights from checkpoint at logs/embeddings/mymodel/dcase2016_task2-hear2021-full/lightning_logs/version_4/checkpoints/epoch=39-step=10320.ckpt
/home/ondrej/mambaforge/envs/heareval/lib/python3.7/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:229: PossibleUserWarning: The dataloader, test_dataloader 0, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument (try 48 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
  category=PossibleUserWarning,
0%|          | 0/6 [2:00:03<?, ?it/s]
```
```
Traceback (most recent call last):
  File "/home/ondrej/mambaforge/envs/heareval/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/ondrej/mambaforge/envs/heareval/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/ondrej/proj/sandbox/heareval/src/heareval/heareval/predictions/runner.py", line 181, in <module>
    runner()
  File "/home/ondrej/mambaforge/envs/heareval/lib/python3.7/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/home/ondrej/mambaforge/envs/heareval/lib/python3.7/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/home/ondrej/mambaforge/envs/heareval/lib/python3.7/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/ondrej/mambaforge/envs/heareval/lib/python3.7/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/home/ondrej/proj/sandbox/heareval/src/heareval/heareval/predictions/runner.py", line 148, in runner
    logger=logger,
  File "/home/ondrej/proj/sandbox/heareval/src/heareval/heareval/predictions/task_predictions.py", line 1411, in task_predictions
    in_memory=in_memory,
  File "/home/ondrej/proj/sandbox/heareval/src/heareval/heareval/predictions/task_predictions.py", line 1106, in task_predictions_test
    ckpt_path=grid_point.model_path, dataloaders=test_dataloader
  File "/home/ondrej/mambaforge/envs/heareval/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 795, in test
    self, self._test_impl, model, dataloaders, ckpt_path, verbose, datamodule
  File "/home/ondrej/mambaforge/envs/heareval/lib/python3.7/site-packages/pytorch_lightning/trainer/call.py", line 38, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/home/ondrej/mambaforge/envs/heareval/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 842, in _test_impl
    results = self._run(model, ckpt_path=self.ckpt_path)
  File "/home/ondrej/mambaforge/envs/heareval/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1112, in _run
    results = self._run_stage()
  File "/home/ondrej/mambaforge/envs/heareval/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1188, in _run_stage
    return self._run_evaluate()
  File "/home/ondrej/mambaforge/envs/heareval/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1228, in _run_evaluate
    eval_loop_results = self._evaluation_loop.run()
  File "/home/ondrej/mambaforge/envs/heareval/lib/python3.7/site-packages/pytorch_lightning/loops/loop.py", line 206, in run
    output = self.on_run_end()
  File "/home/ondrej/mambaforge/envs/heareval/lib/python3.7/site-packages/pytorch_lightning/loops/dataloader/evaluation_loop.py", line 180, in on_run_end
    self._evaluation_epoch_end(self._outputs)
  File "/home/ondrej/mambaforge/envs/heareval/lib/python3.7/site-packages/pytorch_lightning/loops/dataloader/evaluation_loop.py", line 288, in _evaluation_epoch_end
    self.trainer._call_lightning_module_hook(hook_name, output_or_outputs)
  File "/home/ondrej/mambaforge/envs/heareval/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1356, in _call_lightning_module_hook
    output = fn(*args, **kwargs)
  File "/home/ondrej/proj/sandbox/heareval/src/heareval/heareval/predictions/task_predictions.py", line 305, in test_epoch_end
    self._score_epoch_end("test", outputs)
  File "/home/ondrej/proj/sandbox/heareval/src/heareval/heareval/predictions/task_predictions.py", line 467, in _score_epoch_end
    postprocessing_cached = self.epoch_best_postprocessing_or_default(epoch)
  File "/home/ondrej/proj/sandbox/heareval/src/heareval/heareval/predictions/task_predictions.py", line 431, in epoch_best_postprocessing_or_default
    return self.epoch_best_postprocessing[epoch]
KeyError: 240
Testing DataLoader 0: 100%|██████████| 83/83 [00:02<00:00, 34.43it/s]
```
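The failure mode can be reproduced in isolation: judging from the traceback, `epoch_best_postprocessing` is a dict keyed by the epochs at which validation actually ran, and the test-time lookup uses the trainer's epoch counter, which was never validated. A minimal sketch with hypothetical keys (not the actual heareval data structure):

```python
# Postprocessing parameters are cached only at epochs where validation ran
# (every check_val_every_n_epoch = 10 epochs in the config above).
# The keys and values below are hypothetical examples.
epoch_best_postprocessing = {
    9: [["median_filter_ms", 250], ["min_duration", 125]],
    19: [["median_filter_ms", 250], ["min_duration", 125]],
    29: [["median_filter_ms", 250], ["min_duration", 125]],
}

# At test time the trainer reports an epoch (240) that never had a
# validation pass, so the plain dict lookup raises, as in the traceback.
try:
    epoch_best_postprocessing[240]
except KeyError as e:
    print(f"KeyError: {e}")
```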
I'm using a conda environment. I have pytorch-lightning==1.9.5, torch==1.13.1 and scikit-learn==1.0.2.
To get the wanted outcome, change this line to: `trainer.fit_loop.epoch_progress.current.completed = grid_point.epoch`.
This changes the value returned by `self.current_epoch` inside `_score_epoch_end` (line 464), so the lookup into `epoch_best_postprocessing` uses the grid point's epoch rather than the trainer's internal epoch counter.
Another option would be to set a new attribute on the trainer and then read that attribute back wherever the epoch is needed, without touching Lightning's loop internals.
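A third, more defensive option would be to make the lookup itself tolerant of missing epochs, falling back to the nearest epoch that actually has cached postprocessing. This is a standalone sketch, not the actual heareval implementation; the function name mirrors the traceback, but the dict-argument signature and fallback behavior are my own:

```python
# Hedged sketch: a fallback variant of epoch_best_postprocessing_or_default.
# If the requested epoch has no cached postprocessing (validation only ran
# every check_val_every_n_epoch epochs), return the parameters from the
# closest recorded epoch instead of raising KeyError.
def epoch_best_postprocessing_or_default(epoch_best_postprocessing, epoch):
    if epoch in epoch_best_postprocessing:
        return epoch_best_postprocessing[epoch]
    # Closest epoch that actually has cached postprocessing parameters.
    nearest = min(epoch_best_postprocessing, key=lambda e: abs(e - epoch))
    return epoch_best_postprocessing[nearest]

# Hypothetical cache with a single validated epoch, as in the logs above.
cached = {29: [["median_filter_ms", 250], ["min_duration", 125]]}
print(epoch_best_postprocessing_or_default(cached, 240))
```

This avoids the crash regardless of what epoch the trainer reports, at the cost of silently using postprocessing parameters tuned at a different epoch.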