Process Met Office extended NWP data #4
Appears we are still missing lots of Met Office UK Extended NWP data (using init times from …). @devsjc, for when you are back, is there another location where this data might be kept? I was under the impression that it was downloaded, but maybe we still need to pull it.
A Dagster partition fill was run on the 17th of December to download the 2024 data, but it failed, possibly due to the storage_b disk becoming full.
Just to highlight this further, a few errors are notable when executing save_samples.py. Brief structural overview of both the 2022 and 2023 data (worth noting that certain keys are also not present for some dates):
Regarding the specific errors:

- `ValueError: num_samples=0` (2023 validation error)
- `KeyError: "not all values found in index 'step'"` (cross-year error)

Step indexing seems to fail when trying to handle different temporal structures across years. I think that, fundamentally, this structural break is causing the dataloader to fail when trying to work across effectively different data formats.
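For illustration, here is a minimal, self-contained sketch of how a mismatch in the `step` coordinate between years can produce exactly this `KeyError` in xarray. This is not the repo's dataloader code; the variable name `t2m` and the particular step spacings are assumptions made purely for the example.

```python
# Sketch: a "step" coordinate that differs between years makes .sel() fail
# with KeyError ("not all values found in index 'step'") for one of the years.
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical 2022-style data: 3-hourly steps out to 12 hours.
ds_2022 = xr.Dataset(
    {"t2m": ("step", np.random.rand(5))},
    coords={"step": pd.to_timedelta(np.arange(0, 13, 3), unit="h")},
)

# Hypothetical 2023-style data: hourly steps out to 12 hours.
ds_2023 = xr.Dataset(
    {"t2m": ("step", np.random.rand(13))},
    coords={"step": pd.to_timedelta(np.arange(0, 13, 1), unit="h")},
)

wanted_steps = pd.to_timedelta([1, 2, 3], unit="h")

print(ds_2023.sel(step=wanted_steps))  # works: all hourly steps exist in 2023

try:
    ds_2022.sel(step=wanted_steps)  # fails: 1h and 2h are not in the 2022 index
except KeyError as err:
    print("2022 selection failed:", err)

# One defensive option: only request steps that are actually present.
common = ds_2022.step.values[np.isin(ds_2022.step.values, wanted_steps)]
print(ds_2022.sel(step=common))
```

If the structural break is real, either the dataloader needs to restrict itself to steps common to all years, or the zarrs need to be re-processed onto a consistent `step` grid.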
Previously, the Met Office extended NWP data was processed for the purpose of running the 2023 backtest, so that year was prioritised and processed before the earlier years had finished downloading.
The 2017-2024 data needs to be re-processed and uploaded to a GCP disk so it can be used for training a DA PVNet model. (Additional init times may have been downloaded since, so it is likely worth re-processing again.)
Steps:

1. Unzip the raw data with `unzip_mo.py`.
2. Combine the processed zarrs with `combine_proc_zarrs_mo.py`.
3. Upload the combined data to `gs://solar-pv-nowcasting-data/NWP/UK_Met_Office` with the name `UKV_extended_v2` (see the sketch after this list).
4. Duplicate the GCP disk `uk-all-inputs-v2`; the duplicate can be called `uk-all-inputs-v3`. Given we are adding possibly 8 TB of new data, the disk size will need to be increased to avoid running out of storage space when transferring data.

Steps 1-3 are done on Leonardo, where the raw data is located.
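A minimal sketch of the upload in step 3, assuming the output of `combine_proc_zarrs_mo.py` is a single local zarr store and that `gcsfs` is installed with credentials that can write to the bucket. The local path and the `.zarr` suffix on the destination are assumptions for the example, not the repo's actual naming.

```python
# Hedged sketch: write the combined zarr straight to the GCS bucket.
# Assumes gcsfs is installed and GCP write access is configured.
import xarray as xr

# Hypothetical local output of combine_proc_zarrs_mo.py.
ds = xr.open_zarr("UKV_extended_combined.zarr")

# Upload under the agreed name in the bucket (suffix assumed here).
ds.to_zarr(
    "gs://solar-pv-nowcasting-data/NWP/UK_Met_Office/UKV_extended_v2.zarr",
    mode="w",
    consolidated=True,
)
```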
Possible Errors and Issues along the way
See "Issues and Important Considerations for NWP Processing" in the README of this repo for more. It's worth noting that due to the file sizes, this process can take a long time.