Upgrade UK to data sampler #276

dfulu · 2024-11-13T09:55:17Z

This PR upgrades the UK part of the datamodule to use ocf-data-sampler. This works towards #273.

Notes:

- The datamodules for wind and pvsite have not been updated.
- This breaks PVNet for wind and pvsites, due to a change in the target variable shape and in the time units coming from data-sampler.
- I have also stripped out the exponentially weighted losses. We haven't used them in a very long time, if ever.
- I also stripped out parts of the model training test routine, since we only ever use train and validate.
- There are also a couple of minor bug fixes and clean-ups that struck me along the way.
- Update tests


Sukh-P · 2024-11-15T11:55:04Z

More of a general question: looking through this, am I right in thinking that (for the UK regional stuff at least) we are moving from saving batches (batch_size samples saved together) to saving each individual sample in a separate .pt file? Do the extra writes (one .pt write per sample) and reads significantly affect batch/sample creation time or training time, or is the impact negligible?

Sukh-P

Really great work, thanks for adding this in!

dfulu · 2024-11-15T12:57:40Z

> More of a general question, looking through am I right in thinking that we are (for the UK regional stuff at least) moving from saving batches (batch_size number of samples saved together) to saving each individual sample in separate .pt files, does the extra writes (a write to .pt for each sample now) and reads impact batch creation time / training times significantly or is it negligible?

Yes, I changed it so that we save individual samples. I honestly don't know how it affects speed because I haven't measured it. I can only say I haven't noticed much difference between how long it took before and how long it takes now.

Saving samples does cut RAM usage a lot. Previously we loaded a bunch of batches, split them into samples, shuffled the samples, and recombined them into batches on the fly, so that the same samples weren't always in a batch together. That meant many more samples had to be held in memory at any given time.

I do think saving individual samples makes more sense. Performance is hard to evaluate because it depends on the hardware, and since we plan to change the data structure we save to disk, any measurement would soon be outdated. So I can't speak to speed, but the lower RAM usage might allow us to use more workers for a given machine size.
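As a rough illustration of the per-sample approach described above, here is a minimal standard-library sketch. It is not the PVNet or ocf-data-sampler implementation: `pickle` stands in for `torch.save`/`torch.load`, and the function names, file naming scheme, and sample layout are all hypothetical. It only shows why shuffling file paths keeps at most one batch of samples in memory at a time.

```python
import pickle
import random
import tempfile
from pathlib import Path


def save_samples(samples, directory):
    """Write each sample to its own file (pickle is a stand-in for torch.save)."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    for i, sample in enumerate(samples):
        # Hypothetical naming scheme: zero-padded index per sample
        with open(directory / f"{i:08d}.pkl", "wb") as f:
            pickle.dump(sample, f)


def iter_batches(directory, batch_size, seed=None):
    """Yield batches drawn from a shuffled list of sample files.

    Only the file paths are shuffled up front, so at most `batch_size`
    samples are loaded into memory at once.
    """
    paths = sorted(Path(directory).glob("*.pkl"))
    random.Random(seed).shuffle(paths)
    for start in range(0, len(paths), batch_size):
        batch = []
        for path in paths[start:start + batch_size]:
            with open(path, "rb") as f:
                batch.append(pickle.load(f))
        yield batch


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Toy samples; real samples would hold model inputs and targets
        toy = [{"id": i, "target": [float(i)] * 3} for i in range(10)]
        save_samples(toy, tmp)
        for batch in iter_batches(tmp, batch_size=4, seed=0):
            print([sample["id"] for sample in batch])
```

With pre-saved batches, getting a different batch composition each epoch requires holding several batches' worth of samples in a shuffle buffer. With one file per sample, reshuffling the path list gives the same effect at the cost of more, smaller reads.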

Sukh-P · 2024-11-20T15:08:22Z

I've started using this code to test out the Site Torch Dataset pipeline. While doing that, I thought it would be good to update the example config parameters in the datamodule folder that mention "batch" to say "samples", to align with the new save_samples.py script.

In fact, more generally, that example config folder needs updating to match the Configuration data model now in ocf-data-sampler.

peterdudfield · 2024-11-25T09:34:13Z

Is this good to merge? And then make sure @Sukh-P is working from that branch?

dfulu · 2024-11-25T10:21:26Z

Well, there are still the example configs to update, but yeah, then we could merge into the dev-data-sampler branch.

dfulu added 4 commits August 9, 2024 12:49

e668a34 hot fix for checkpoint upload
a281271 update batch save script to ocf-data-sampler
07dece5 add dataloader and clean
292ad9d set dask default

dfulu commented Nov 13, 2024

pvnet/data/datamodule.py (outdated, resolved)
pvnet/models/multimodal/multimodal.py (outdated, resolved)
pvnet/utils.py (outdated, resolved)
scripts/checkpoint_to_huggingface.py (outdated, resolved)

dfulu and others added 12 commits November 13, 2024 10:11

4f0a3dd Merge branch 'dev-data-sampler' into data_sampler
db81147 [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
87d5718 tidy
465e518 Merge branch 'data_sampler' of https://github.com/openclimatefix/PVNet into data_sampler
1c27bd6 [pre-commit.ci] auto fixes from pre-commit.com hooks
50c5552 update reqs
ee232f7 Merge branch 'data_sampler' of https://github.com/openclimatefix/PVNet into data_sampler
d17eaac typo
bfdf112 linting
4d69524 update tests
85c32a3 [pre-commit.ci] auto fixes from pre-commit.com hooks
a17f8dd Merge branch 'dev-data-sampler' into data_sampler

dfulu requested a review from Sukh-P November 13, 2024 13:25

Sukh-P reviewed Nov 14, 2024

pvnet/data/datamodule.py (outdated, resolved)

dfulu and others added 3 commits November 14, 2024 13:05

0e618fc Add type hint (Co-authored-by: Sukhil Patel <[email protected]>)
1644f87 Add missing import
fc0cfb9 [pre-commit.ci] auto fixes from pre-commit.com hooks

Sukh-P reviewed Nov 15, 2024

tests/test_data/presaved_samples/data_configuration.yaml (resolved)

Sukh-P reviewed Nov 15, 2024

pvnet/models/base_model.py (resolved)

Sukh-P reviewed Nov 15, 2024

pvnet/models/base_model.py (resolved)

Sukh-P approved these changes Nov 15, 2024

Sukh-P merged commit 82b6009 into dev-data-sampler Dec 17, 2024
3 checks passed

Sukh-P deleted the data_sampler branch December 18, 2024 13:19
