Detailed Description
Following on from #1 I wanted to write this issue.
We currently have lots of NWP data and PVLive data. It's too much to fit into memory, so we have to cut it down ready for ML experiments. The way we've done this in the past is to create samples of data. These are smaller chunks of the data that contain specific data for a certain time (and space). They then get batched up in a dataloader and the ML model can train from them.
So we want to build a pipeline for making these samples (most of the work is done in ocf-data-sampler).
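To make the idea of a sample concrete, here is a minimal sketch in plain NumPy. It is not how ocf-data-sampler does it; the function name `make_samples`, the dictionary keys, and the window lengths are all made up for illustration. It only shows the general idea of cutting a long time series into small windows that each fit in memory.

```python
import numpy as np


def make_samples(data, history_steps, forecast_steps, stride=1):
    """Cut an array whose first axis is time into fixed-length windows.

    Hypothetical helper for illustration only, not part of ocf-data-sampler.
    Each sample holds `history_steps` of past data and `forecast_steps`
    of future data around a reference index t0.
    """
    window = history_steps + forecast_steps
    samples = []
    for start in range(0, data.shape[0] - window + 1, stride):
        chunk = data[start : start + window]
        samples.append(
            {
                "t0_index": start + history_steps,
                "history": chunk[:history_steps],
                "target": chunk[history_steps:],
            }
        )
    return samples


# Stand-in for a long generation time series (48 time steps of fake data)
series = np.arange(48, dtype=np.float32)
samples = make_samples(series, history_steps=4, forecast_steps=16, stride=4)
```

Each element of `samples` is small enough to be batched by a dataloader, which is the property we care about here.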
Context
- NWP = numerical weather prediction data
- PVLive = national solar generation data
- We have GFS on S3 and we have been collecting Metoffice Global data
- @jcamier @siddharth7113 and others have been working on this already
- ocf-data-sampler is a Python library used to create samples from large datasets.
Possible Implementation
There are lots of ways to do this, but the suggestion is to use ocf-data-sampler.
You start with a data configuration (see below) that tells ocf-data-sampler what to load and other specific bits.
It would be really great to:
1. Create a configuration for this project, perhaps starting with GFS and PVLive and adding Metoffice later.
2. Using this configuration, run ocf-data-sampler and make some samples.
3. Make a script for step 2 so that others can use it.
4. Save the samples, maybe in S3, so others can use them.
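As a rough sketch of what the script in the list above could look like, the snippet below writes samples from any indexable dataset to `.npz` files. It deliberately does not call ocf-data-sampler: `ToyDataset`, `save_samples`, the file naming, and the `nwp`/`gsp` keys are all hypothetical, and the real dataset class would be swapped in where `ToyDataset` is used. Copying the output directory to S3 is left out.

```python
import tempfile
from pathlib import Path

import numpy as np


def save_samples(dataset, out_dir, num_samples):
    """Write the first `num_samples` items of `dataset` to .npz files.

    Assumes each item is a dict mapping string keys to NumPy arrays.
    This is an illustrative helper, not an ocf-data-sampler function.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for i in range(min(num_samples, len(dataset))):
        sample = dataset[i]
        path = out_dir / f"sample_{i:06d}.npz"
        np.savez(path, **sample)
        paths.append(path)
    return paths


class ToyDataset:
    """Stand-in for the real dataset; returns random arrays."""

    def __len__(self):
        return 10

    def __getitem__(self, idx):
        rng = np.random.default_rng(idx)
        return {
            "nwp": rng.normal(size=(2, 4, 4)),  # fake (channel, y, x) field
            "gsp": rng.uniform(size=(8,)),  # fake generation values
        }


# Save three fake samples into a temporary directory
demo_dir = tempfile.mkdtemp()
paths = save_samples(ToyDataset(), demo_dir, num_samples=3)
```

Saving one file per sample keeps the script simple and lets others load individual samples without reading everything.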
The ocf-data-sampler class we recommend using is PVNetUKRegionalDataset, but there are a few things that might need adding, like:
- GFS normalisation constants
- A loading file for GFS data in ocf-data-sampler (see "Support for GFS Data in ocf-data-sampler", ocf-data-sampler#188)
- Make it work for gsp_id=0 (this is national)
- Setting the correct values for the three required configuration files: https://github.com/openclimatefix/PVNet/blob/main/configs.example/config.yaml, https://github.com/openclimatefix/PVNet/blob/main/configs.example/datamodule/configuration/example_configuration.yaml and https://github.com/openclimatefix/PVNet/blob/main/configs.example/datamodule/streamed_batches.yaml
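For the GFS normalisation constants mentioned in the list above, one possible approach is sketched below. It assumes the constants are a per-channel mean and standard deviation, which this issue does not spell out. The array layout `(time, channel, y, x)`, the function names, and the two fake channels are illustrative only.

```python
import numpy as np


def channel_stats(nwp):
    """Return per-channel mean and std for an array shaped (time, channel, y, x)."""
    mean = nwp.mean(axis=(0, 2, 3))
    std = nwp.std(axis=(0, 2, 3))
    return mean, std


def normalise(nwp, mean, std):
    """Standardise each channel using the supplied constants."""
    return (nwp - mean[None, :, None, None]) / std[None, :, None, None]


# Two made-up channels with very different scales
rng = np.random.default_rng(0)
nwp = np.stack(
    [
        rng.normal(280.0, 10.0, size=(24, 4, 4)),
        rng.normal(5.0, 2.0, size=(24, 4, 4)),
    ],
    axis=1,
)  # shape (24, 2, 4, 4)

mean, std = channel_stats(nwp)
normed = normalise(nwp, mean, std)
```

In practice the statistics would be computed once over a representative slice of the GFS archive and stored as constants, rather than recomputed per sample.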