ML pipeline to make samples

## Detailed Description
Following on from #1 I wanted to write this issue. 

We currently have lots of NWP data and PVLive data. It's too much to fit into memory, so we have to cut it down ready for ML experiments. The way we've done this in the past is to create samples of data. These are smaller chunks of the data that contain specific data for a certain time (and space). These then get batched up in a dataloader, and the ML model can then train on them.
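To make the idea of a "sample" concrete, here is a minimal sketch using only the Python standard library. It is not how ocf-data-sampler does it; the function names (`make_sample`, `make_batches`), the window lengths and the toy half-hourly series are all made up for illustration.

```python
from datetime import datetime, timedelta


def make_sample(times, values, t0, history, forecast):
    """Cut out the part of a long series that lies in [t0 - history, t0 + forecast]."""
    start, end = t0 - history, t0 + forecast
    idx = [i for i, t in enumerate(times) if start <= t <= end]
    return {
        "t0": t0,
        "times": [times[i] for i in idx],
        "values": [values[i] for i in idx],
    }


def make_batches(samples, batch_size):
    """Group samples into lists of at most batch_size, like a very simple dataloader."""
    return [samples[i:i + batch_size] for i in range(0, len(samples), batch_size)]


# Toy half-hourly series covering one day (48 steps). The values are made up.
times = [datetime(2024, 1, 1) + timedelta(minutes=30 * i) for i in range(48)]
values = [float(i) for i in range(48)]

# Pick a few t0s and cut a window of 2 hours either side of each one.
t0s = times[4:44:8]
samples = [
    make_sample(times, values, t0, timedelta(hours=2), timedelta(hours=2))
    for t0 in t0s
]
batches = make_batches(samples, batch_size=2)
```

Each sample here is a small dict centred on one `t0`, so only the windows that are needed have to be held in memory at once. A real sample would also be cut down in space and would hold NWP channels alongside the PVLive values.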

So we want to build a pipeline for making these samples (most of the work is done in ocf-data-sampler).

## Context
- NWP = numerical weather prediction data
- PVLive = national solar generation data
- We have GFS on S3, and we have been collecting Met Office Global data
- @jcamier @siddharth7113 and others have been working on this already
- ocf-data-sampler is a Python library used to create samples from large datasets.

## Possible Implementation
There are lots of ways to do this, but there's a suggestion to use ocf-data-sampler.

You start with a data configuration (see below) that tells ocf-data-sampler what to load and other specific bits. 
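As a rough picture of what a data configuration carries, here is a sketch in plain Python. The field names (`path`, `history_minutes`, `forecast_minutes`, `time_resolution_minutes`), the paths and the numbers are hypothetical placeholders, not the real ocf-data-sampler schema; the example configuration files linked in this issue are the source of truth.

```python
from dataclasses import dataclass


@dataclass
class SourceConfig:
    """Hypothetical per-source settings: where the data lives and how big a window to cut."""

    path: str
    history_minutes: int
    forecast_minutes: int
    time_resolution_minutes: int

    def __post_init__(self):
        # The window has to be a whole number of time steps.
        window = self.history_minutes + self.forecast_minutes
        if window % self.time_resolution_minutes != 0:
            raise ValueError(
                f"window of {window} minutes is not a multiple of "
                f"{self.time_resolution_minutes} minutes"
            )

    def num_steps(self):
        """Number of time steps in one sample, counting both ends of the window."""
        window = self.history_minutes + self.forecast_minutes
        return window // self.time_resolution_minutes + 1


# Illustrative values only: start with GFS and PVLive, add other sources later.
config = {
    "pvlive": SourceConfig(
        path="path/to/pvlive.zarr",
        history_minutes=120,
        forecast_minutes=480,
        time_resolution_minutes=30,
    ),
    "gfs": SourceConfig(
        path="path/to/gfs.zarr",
        history_minutes=180,
        forecast_minutes=540,
        time_resolution_minutes=180,
    ),
}
```

The point is only that the configuration pins down, per data source, where the data is and how much time each sample covers, so that everyone who runs the pipeline gets samples of the same shape.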

It would be really great to:
1. Create a configuration for this project, perhaps starting with GFS and PVLive and then adding Met Office later.
2. Using this configuration, run ocf-data-sampler and make some samples.
3. Make a script for step 2 so that others can use it.
4. Save samples, maybe in S3, so others can use them.
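Steps 2 to 4 could be sketched roughly like this. `ToyDataset` is a hypothetical stand-in so the example runs on its own; in the real script it would be swapped for the ocf-data-sampler dataset built from the configuration. The file naming and the choice of pickle are assumptions too, and uploading the output folder to S3 is left out.

```python
import pickle
import tempfile
from pathlib import Path


class ToyDataset:
    """Hypothetical stand-in for the real dataset: indexable, returns one sample per index."""

    def __init__(self, n):
        self.n = n

    def __len__(self):
        return self.n

    def __getitem__(self, i):
        if not 0 <= i < self.n:
            raise IndexError(i)
        # gsp_id=0 is the national total; the values are made up.
        return {"index": i, "gsp_id": 0, "values": [i * 0.5] * 4}


def save_samples(dataset, output_dir, num_samples):
    """Write up to num_samples samples to output_dir, one file per sample."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i in range(min(num_samples, len(dataset))):
        path = out / f"sample_{i:06d}.pkl"
        with path.open("wb") as f:
            pickle.dump(dataset[i], f)
        paths.append(path)
    return paths


def load_sample(path):
    """Read one saved sample back from disk."""
    with Path(path).open("rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        saved = save_samples(ToyDataset(10), tmp, num_samples=3)
        print(f"saved {len(saved)} samples, first one: {load_sample(saved[0])}")
```

Writing one file per sample keeps the script simple and lets others pull down just a few samples to try things out. The format itself is a judgment call and should match whatever the training code expects to load.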

The ocf-data-sampler class we recommend using is [PVNetUKRegionalDataset](https://github.com/openclimatefix/ocf-data-sampler/blob/main/ocf_data_sampler/torch_datasets/datasets/pvnet_uk.py#L172), but there are a few things that might need adding, like:
- [x] GFS normalisation constants
- [x] A loading file for GFS data that is in ocf-data-sampler https://github.com/openclimatefix/ocf-data-sampler/issues/188 
- [x] Make it work for gsp_id=0 (this is national)
- [x] Setting the correct values for the three configuration files that are required:
  - https://github.com/openclimatefix/PVNet/blob/main/configs.example/config.yaml
  - https://github.com/openclimatefix/PVNet/blob/main/configs.example/datamodule/configuration/example_configuration.yaml
  - https://github.com/openclimatefix/PVNet/blob/main/configs.example/datamodule/streamed_batches.yaml
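
On the normalisation constants item: this is assumed here to mean a mean and standard deviation per NWP channel, used to rescale the raw values before training. A minimal sketch, with a hypothetical channel name and placeholder numbers rather than real GFS statistics:

```python
# Placeholder constants: the channel name "t" and both numbers are made up.
GFS_STATS = {
    "t": {"mean": 280.0, "std": 10.0},
}


def normalise(values, mean, std):
    """Rescale values to roughly zero mean and unit spread: (value - mean) / std."""
    if std <= 0:
        raise ValueError("std must be positive")
    return [(v - mean) / std for v in values]


raw = [270.0, 280.0, 290.0]
scaled = normalise(raw, **GFS_STATS["t"])
```

The real constants would need to be computed from the GFS data itself, over the training period, for each channel that is used.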

