
Commit 1bf3550

Update README.md
1 parent 69705a0 commit 1bf3550

File tree

1 file changed

+42
-170
lines changed


README.md

Lines changed: 42 additions & 170 deletions
@@ -1,13 +1,13 @@
# PVNet 2.1

-[![test-release](https://github.com/openclimatefix/PVNet/actions/workflows/test-release.yml/badge.svg)](https://github.com/openclimatefix/PVNet/actions/workflows/test-release.yml)
+[![Python Bump Version & release](https://github.com/openclimatefix/PVNet/actions/workflows/release.yml/badge.svg)](https://github.com/openclimatefix/PVNet/actions/workflows/release.yml)

-This project is used for training PVNet and running PVnet on live data.
+This project is used for training PVNet and running PVNet on live data.

PVNet2 is a multi-modal late-fusion model that largely inherits the same architecture from
-[PVNet1.0](https://github.com/openclimatefix/predict_pv_yield). The NWP and
+[PVNet1.0](https://github.com/openclimatefix/predict_pv_yield). The NWP (Numerical Weather Prediction) and
satellite data are sent through some neural network which encodes them down to
-1D intermediate representations. These are concatenated together with the GSP
+1D intermediate representations. These are concatenated together with the GSP (Grid Supply Point)
output history, the calculated solar coordinates (azimuth and elevation) and the
GSP ID which has been put through an embedding layer. This 1D concatenated
feature vector is put through an output network which outputs predictions of the
@@ -56,7 +56,7 @@ pip install ".[dev]"

## Getting started with running PVNet

-Before running any code in within PVNet, copy the example configuration to a
+Before running any code in PVNet, copy the example configuration to a
configs directory:

```
@@ -74,14 +74,14 @@ suggested locations for downloading such datasets below:

**GSP (Grid Supply Point)** - Regional PV generation data\
The University of Sheffield provides API access to download this data:
-https://www.solar.sheffield.ac.uk/pvlive/api/
+https://www.solar.sheffield.ac.uk/api/

Documentation for querying generation data aggregated by GSP region can be found
here:
https://docs.google.com/document/d/e/2PACX-1vSDFb-6dJ2kIFZnsl-pBQvcH4inNQCA4lYL9cwo80bEHQeTK8fONLOgDf6Wm4ze_fxonqK3EVBVoAIz/pub#h.9d97iox3wzmd

**NWP (Numerical weather predictions)**\
-OCF maintains a Zarr formatted version the German Weather Service's (DWD)
+OCF maintains a Zarr formatted version of the German Weather Service's (DWD)
ICON-EU NWP model here:
https://huggingface.co/datasets/openclimatefix/dwd-icon-eu which includes the UK

@@ -121,212 +121,84 @@ cp -r configs.example configs

### Set up and config example for batch creation

-We will use the example of creating batches using data from gcp:
-`/PVNet/configs/datamodule/configuration/gcp_configuration.yaml`
-Ensure that the file paths are set to the correct locations in
-`gcp_configuration.yaml`.
+We will use the following example config file for creating batches: `/PVNet/configs/datamodule/configuration/example_configuration.yaml`. Ensure that the file paths are set to the correct locations in `example_configuration.yaml`: search for `PLACEHOLDER` to find where to input the location of the files. You will need to comment out or delete the parts of `example_configuration.yaml` pertaining to the data you are not using.
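As an editorial aside on the workflow the new text describes: the `PLACEHOLDER` markers can be found mechanically before running batch creation. A minimal sketch, assuming a plain-text config file; the helper name `find_placeholders` is ours, not part of PVNet:

```python
import re
from pathlib import Path

def find_placeholders(config_path):
    """Return (line_number, line) pairs still containing a PLACEHOLDER marker."""
    hits = []
    for lineno, line in enumerate(Path(config_path).read_text().splitlines(), start=1):
        if re.search(r"\bPLACEHOLDER\b", line):
            hits.append((lineno, line.strip()))
    return hits
```

Running this over your copied config before batch creation surfaces any paths you forgot to fill in.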

-`PLACEHOLDER` is used to indcate where to input the location of the files.

-For OCF use cases, file locations can be found in `template_configuration.yaml` located alongside `gcp_configuration.yaml`.
+When creating batches, an additional datamodule config located in `PVNet/configs/datamodule` is passed into the batch creation script: `streamed_batches.yaml`. Like before, a placeholder variable is used when specifying which configuration to use:

-In these configurations you can update the train, val & test periods to cover the data you have access to.
-
-With your configuration in place, you can proceed to create batches. PVNet uses
-[hydra](https://hydra.cc/) which enables us to pass variables via the command
-line that will override the configuration defined in the `./configs` directory.
-
-When creating batches, an additional config is used which is passed into the batch creation script. This is the datamodule config located `PVNet/configs/datamodule`.
-
-For this example we will be using the `streamed_batches.yaml` config. Like before, a placeholder variable is used when specifing which configuration to use:
-
-`configuration: "PLACEHOLDER.yaml"`
+```yaml
+configuration: "PLACEHOLDER.yaml"
+```

-This should be given the whole path to the config on your local machine, such as for our example it should be changed to:
+This should be given the whole path to the config on your local machine, for example:

-`configuration: "/FULL-PATH-TO-REPO/PVNet/configs/datamodule/configuration/gcp_configuration.yaml"`
-`
+```yaml
+configuration: "/FULL-PATH-TO-REPO/PVNet/configs/datamodule/configuration/example_configuration.yaml"
+```

Where `FULL-PATH-TO-REPO` represents the whole path to the PVNet repo on your local machine.

+This is also where you can update the train, val & test periods to cover the data you have access to.
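The constraint implied here, that the train, val & test periods you choose must lie inside the date range your input data actually covers, can be checked with plain `datetime` comparisons. An illustrative sketch; the function and variable names are ours, not PVNet API:

```python
from datetime import date

def period_within(period, data_range):
    """True if the (start, end) period lies fully inside the (start, end) data range."""
    return data_range[0] <= period[0] and period[1] <= data_range[1]

# Hypothetical example: data on disk covers 2020, train period is a subset of it.
data_range = (date(2020, 1, 1), date(2020, 12, 31))
train_period = (date(2020, 1, 1), date(2020, 10, 31))
print(period_within(train_period, data_range))  # -> True
```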
### Running the batch creation script

-Run the save_batches.py script to create batches if setting parameters in the datamodule config (`streamed_batches.yaml` in this example):
+Run the `save_batches.py` script to create batches with the parameters specified in the datamodule config (`streamed_batches.yaml` in this example):

-```
+```bash
python scripts/save_batches.py
```

-or with the following example arguments to override config:
+PVNet uses
+[hydra](https://hydra.cc/) which enables us to pass variables via the command
+line that will override the configuration defined in the `./configs` directory, like this:

-```
+```bash
python scripts/save_batches.py datamodule=streamed_batches datamodule.batch_output_dir="./output" datamodule.num_train_batches=10 datamodule.num_val_batches=5
```

-In this function the datamodule argument looks for a config under `PVNet/configs/datamodule`. The examples here are either to use "premade_batches" or "streamed_batches".
-
-Its important that the dates set for the training, validation and testing in the datamodule (`streamed_batches.yaml`) config are within the ranges of the dates set for the input features in the configuration (`gcp_configuration.yaml`).
+`scripts/save_batches.py` needs a config under `PVNet/configs/datamodule`. You can adapt `streamed_batches.yaml` or create your own in the same folder.
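To illustrate what the hydra overrides above do: each `a.b=value` argument addresses a nested key in the composed config. A toy stand-in (not hydra itself; in this simplified version every value arrives as a string):

```python
def apply_overrides(config, overrides):
    """Set nested dict keys from hydra-style 'a.b=value' strings (toy version)."""
    for item in overrides:
        dotted_key, value = item.split("=", 1)
        *parents, leaf = dotted_key.split(".")
        node = config
        for key in parents:
            node = node.setdefault(key, {})  # walk/create the nested sections
        node[leaf] = value
    return config

config = {"datamodule": {"num_train_batches": 250}}
apply_overrides(config, ["datamodule.num_train_batches=10", "datamodule.batch_output_dir=./output"])
```

Real hydra additionally type-converts values and validates keys against the config schema; this sketch only shows the dotted-key addressing.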

-If downloading private data from a gcp bucket make sure to authenticate gcloud (the public satellite data does not need authentication):
+If downloading private data from a GCP bucket make sure to authenticate gcloud (the public satellite data does not need authentication):

```
gcloud auth login
```

-For files stored in multiple locations they can be added as list. For example from the gcp_configuration.yaml file we can change from satellite data stored on a bucket:
+Files stored in multiple locations can be added as a list. For example, in the `example_configuration.yaml` file we can supply a path to satellite data stored on a bucket:

```yaml
satellite:
  satellite_zarr_path: gs://solar-pv-nowcasting-data/satellite/EUMETSAT/SEVIRI_RSS/v4/2020_nonhrv.zarr
```

-To satellite data hosted by Google:
+Or to satellite data hosted by Google:

```yaml
satellite:
  satellite_zarr_paths:
    - "gs://public-datasets-eumetsat-solar-forecasting/satellite/EUMETSAT/SEVIRI_RSS/v4/2020_nonhrv.zarr"
    - "gs://public-datasets-eumetsat-solar-forecasting/satellite/EUMETSAT/SEVIRI_RSS/v4/2021_nonhrv.zarr"
```
-Datapipes is currently set up to use 11 channels from the satellite data, the 12th of which is HRV and is not included in these.
+
+Datapipes are currently set up to use 11 channels from the satellite data, the 12th of which is HRV and is not included in these.
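The multi-path form above is just a YAML list of one Zarr store per year, so it can be generated rather than typed out. A small sketch; the bucket prefix is copied from the example above, and the helper name is ours:

```python
def yearly_zarr_paths(prefix, years):
    """Build one satellite Zarr path per year, matching the list form shown above."""
    return [f"{prefix}/{year}_nonhrv.zarr" for year in years]

paths = yearly_zarr_paths(
    "gs://public-datasets-eumetsat-solar-forecasting/satellite/EUMETSAT/SEVIRI_RSS/v4",
    [2020, 2021],
)
```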


### Training PVNet

How PVNet is run is determined by the extensive configuration in the config
-files. The following configs have been tested to work using batches of data
-created using the steps and batch creation config mentioned above.
-
-You should create the following configs before trying to train a model locally,
-as so:
-
-In `configs/datamodule/local_premade_batches.yaml`:
-
-```yaml
-_target_: pvnet.data.datamodule.DataModule
-configuration: null
-batch_dir: "./output" # where the batches are saved
-num_workers: 20
-prefetch_factor: 2
-batch_size: 8
-```
-
-In `configs/model/local_multimodal.yaml`:
+files. The configs stored in `PVNet/configs.example` should work with batches created using the steps and batch creation config mentioned above.

-
-```yaml
-_target_: pvnet.models.multimodal.multimodal.Model
-
-output_quantiles: [0.02, 0.1, 0.25, 0.5, 0.75, 0.9, 0.98]
-
-#--------------------------------------------
-# NWP encoder
-#--------------------------------------------
-
-nwp_encoders_dict:
-  ukv:
-    _target_: pvnet.models.multimodal.encoders.encoders3d.DefaultPVNet
-    _partial_: True
-    in_channels: 10
-    out_features: 256
-    number_of_conv3d_layers: 6
-    conv3d_channels: 32
-    image_size_pixels: 24
-
-#--------------------------------------------
-# Sat encoder settings
-#--------------------------------------------
-
-# Ignored as premade batches were created without satellite data
-# sat_encoder:
-#   _target_: pvnet.models.multimodal.encoders.encoders3d.DefaultPVNet
-#   _partial_: True
-#   in_channels: 11
-#   out_features: 256
-#   number_of_conv3d_layers: 6
-#   conv3d_channels: 32
-#   image_size_pixels: 24
-
-add_image_embedding_channel: False
-
-#--------------------------------------------
-# PV encoder settings
-#--------------------------------------------
-
-pv_encoder:
-  _target_: pvnet.models.multimodal.site_encoders.encoders.SingleAttentionNetwork
-  _partial_: True
-  num_sites: 349
-  out_features: 40
-  num_heads: 4
-  kdim: 40
-  pv_id_embed_dim: 20
-
-#--------------------------------------------
-# Tabular network settings
-#--------------------------------------------
-
-output_network:
-  _target_: pvnet.models.multimodal.linear_networks.networks.ResFCNet2
-  _partial_: True
-  fc_hidden_features: 128
-  n_res_blocks: 6
-  res_block_layers: 2
-  dropout_frac: 0.0
-
-embedding_dim: 16
-include_sun: True
-include_gsp_yield_history: False
-
-#--------------------------------------------
-# Times
-#--------------------------------------------
-
-# Foreast and time settings
-history_minutes: 60
-forecast_minutes: 120
-
-min_sat_delay_minutes: 60
-
-sat_history_minutes: 90
-pv_history_minutes: 60
-
-# These must be set for each NWP encoder
-nwp_history_minutes:
-  ukv: 60
-nwp_forecast_minutes:
-  ukv: 120
-
-# ----------------------------------------------
-# Optimizer
-# ----------------------------------------------
-optimizer:
-  _target_: pvnet.optimizers.EmbAdamWReduceLROnPlateau
-  lr: 0.0001
-  weight_decay: 0.01
-  amsgrad: True
-  patience: 5
-  factor: 0.1
-  threshold: 0.002
-```
+
+Make sure to update the following config files before training your model:

-In `configs/local_trainer.yaml`:
-
-```yaml
-_target_: lightning.pytorch.trainer.trainer.Trainer
-
-accelerator: cpu # Important if running on a system without a supported GPU
-devices: auto
-
-min_epochs: null
-max_epochs: null
-reload_dataloaders_every_n_epochs: 0
-num_sanity_val_steps: 8
-fast_dev_run: false
-accumulate_grad_batches: 4
-log_every_n_steps: 50
-```
+1. In `configs/datamodule/local_premade_batches.yaml`:
+   - update `batch_dir` to point to the directory you stored your batches in during batch creation
+2. In `configs/model/local_multimodal.yaml`:
+   - update the list of encoders to reflect the data sources you are using. If you are using different NWP sources, the encoders for these should follow the same structure with two important updates:
+     - `in_channels`: the number of variables your NWP source supplies
+     - `image_size_pixels`: the spatial crop of your NWP data. It depends on the spatial resolution of your NWP; it should match `nwp_image_size_pixels_height` and/or `nwp_image_size_pixels_width` in `datamodule/example_configs.yaml`, unless transformations such as coarsening were applied (e.g. as for ECMWF data)
+3. In `configs/local_trainer.yaml`:
+   - set `accelerator: cpu` if running on a system without a supported GPU
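The consistency requirement in point 2, that the model encoder's `image_size_pixels` should match the datamodule's `nwp_image_size_pixels_height`/`_width` when no coarsening is applied, can be sanity-checked once both configs are loaded as dicts. An illustrative sketch with hard-coded stand-in fragments (not PVNet code; real configs would be loaded from the YAML files):

```python
def nwp_image_sizes_consistent(model_cfg, data_cfg):
    """Check the model's NWP crop size matches the datamodule's NWP image size."""
    size = model_cfg["nwp_encoders_dict"]["ukv"]["image_size_pixels"]
    return (size == data_cfg["nwp_image_size_pixels_height"]
            and size == data_cfg["nwp_image_size_pixels_width"])

# Stand-in fragments mirroring the configs discussed above
model_cfg = {"nwp_encoders_dict": {"ukv": {"image_size_pixels": 24}}}
data_cfg = {"nwp_image_size_pixels_height": 24, "nwp_image_size_pixels_width": 24}
print(nwp_image_sizes_consistent(model_cfg, data_cfg))  # -> True
```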

-And finally update `defaults` in the main `./configs/config.yaml` file to use
+If creating copies of the config files instead of modifying existing ones, update `defaults` in the main `./configs/config.yaml` file to use
your customised config files:

```yaml
@@ -350,7 +222,7 @@ python run.py

## Backtest

-If you have succesfully trained a PVNet model and have a saved model checkpoint you can create a backtest using this, e.g. forecasts on historical data to evaluate forecast accuracy/skill. This can be done by running one of the scripts in this repo such as [the UK gsp backtest script](scripts/backtest_uk_gsp.py) or the [the pv site backtest script](scripts/backtest_sites.py), further info on how to run these are in each backtest file.
+If you have successfully trained a PVNet model and have a saved model checkpoint you can create a backtest using this, e.g. forecasts on historical data to evaluate forecast accuracy/skill. This can be done by running one of the scripts in this repo, such as the [UK GSP backtest script](scripts/backtest_uk_gsp.py) or the [PV site backtest script](scripts/backtest_sites.py); further info on how to run these is in each backtest file.


## Testing
