# PVNet 2.1

<!-- ALL-CONTRIBUTORS-BADGE:START - Do not remove or modify this section -->

[![All Contributors](https://img.shields.io/badge/all_contributors-8-orange.svg?style=flat-square)](#contributors-)

<!-- ALL-CONTRIBUTORS-BADGE:END -->

[![Python Bump Version & release](https://github.com/openclimatefix/PVNet/actions/workflows/release.yml/badge.svg)](https://github.com/openclimatefix/PVNet/actions/workflows/release.yml) [![ease of contribution: hard](https://img.shields.io/badge/ease%20of%20contribution:%20hard-bb2629)](https://github.com/openclimatefix/ocf-meta-repo?tab=readme-ov-file#overview-of-ocfs-nowcasting-repositories)

This project is used for training PVNet and running PVNet on live data.

PVNet2 is a multi-modal late-fusion model that largely inherits the same architecture from the original PVNet. Data from each input source is passed through its own encoder, and the encoded representations are fused into a single feature vector. This feature vector is put through an output network which outputs predictions of the
future GSP yield. National forecasts are made by adding all the GSP forecasts
together.
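
The late-fusion idea can be sketched in a few lines of PyTorch (a toy illustration only; the real encoders, dimensions, and module names in this repo differ):

```python
import torch
import torch.nn as nn

class LateFusionSketch(nn.Module):
    """Toy late-fusion model: one encoder per data source, fused by concatenation."""

    def __init__(self, out_steps: int = 16):
        super().__init__()
        # Each data source gets its own encoder producing a fixed-size vector.
        self.nwp_encoder = nn.Sequential(nn.Flatten(), nn.LazyLinear(128), nn.ReLU())
        self.sat_encoder = nn.Sequential(nn.Flatten(), nn.LazyLinear(128), nn.ReLU())
        # The fused feature vector is mapped to a forecast of future GSP yield.
        self.output_network = nn.Sequential(
            nn.Linear(256, 128), nn.ReLU(), nn.Linear(128, out_steps)
        )

    def forward(self, nwp: torch.Tensor, sat: torch.Tensor) -> torch.Tensor:
        # Late fusion: encode each modality separately, then concatenate.
        features = torch.cat([self.nwp_encoder(nwp), self.sat_encoder(sat)], dim=1)
        return self.output_network(features)
```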

## Experiments

Our paper based on this repo was accepted into the Tackling Climate Change with Machine Learning workshop at ICLR 2024 and can be viewed [here](https://www.climatechange.ai/papers/iclr2024/46).
Some slightly more structured notes on deliberate experiments we have performed with this model are also available.
Some very rough, early working notes on this model are
[here](https://docs.google.com/document/d/1fbkfkBzp16WbnCg7RDuRDvgzInA6XQu3xh4NCjV-WDA). These are now somewhat out of date.

## Setup / Installation

```bash
git clone https://github.com/openclimatefix/PVNet.git
cd PVNet
pip install .
```

The commit history is extensive. To save download time, use a depth of 1:

```bash
git clone --depth 1 https://github.com/openclimatefix/PVNet.git
```

This means only the latest commit and its associated files will be downloaded.

Next, in the PVNet repo, install PVNet as an editable package:
```bash
pip install -e .
# or, to also include development dependencies:
pip install ".[dev]"
```

## Getting started with running PVNet

Before running any code in PVNet, copy the example configuration to a new `configs` directory:
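
```bash
cp -r configs.example configs
```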
As a minimum, in order to create batches of data/run PVNet, you will need to
supply paths to NWP and GSP data. PV data can also be used. We list some
suggested locations for downloading such datasets below:

**GSP (Grid Supply Point)** - Regional PV generation data
The University of Sheffield provides API access to download this data:
https://www.solar.sheffield.ac.uk/api/

Documentation for querying generation data aggregated by GSP region can be found
here:
https://docs.google.com/document/d/e/2PACX-1vSDFb-6dJ2kIFZnsl-pBQvcH4inNQCA4lYL9cwo80bEHQeTK8fONLOgDf6Wm4ze_fxonqK3EVBVoAIz/pub#h.9d97iox3wzmd

**NWP (Numerical weather predictions)**
OCF maintains a Zarr formatted version of the German Weather Service's (DWD)
ICON-EU NWP model, which includes the UK, here:
https://huggingface.co/datasets/openclimatefix/dwd-icon-eu

**PV**
OCF maintains a dataset of PV generation from 1311 private PV installations
here: https://huggingface.co/datasets/openclimatefix/uk_pv

### Connecting with ocf-data-sampler for batch creation

Outside the PVNet repo, exit the conda environment created for PVNet and clone the ocf-data-sampler repo: https://github.com/openclimatefix/ocf-data-sampler

```bash
git clone https://github.com/openclimatefix/ocf-data-sampler.git
conda create -n ocf-data-sampler python=3.11
```

Then exit this environment, and enter back into the pvnet conda environment and install `ocf-data-sampler` in editable mode:

```bash
pip install -e <PATH-TO-ocf-data-sampler-REPO>
```
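
For reference, the environment switch in that step might look like this (assuming the conda environment you created for PVNet is named `pvnet`; adjust to your own environment name):

```bash
conda deactivate      # leave the ocf-data-sampler environment
conda activate pvnet  # re-enter the environment PVNet was installed into
```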

If the local version of `ocf-data-sampler` you install is more recent than the version specified in PVNet, you might see a warning, but it should still function correctly.
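
To check which version was actually installed, you can run:

```bash
pip show ocf-data-sampler
```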

## Generating pre-made batches of data for training/validation of PVNet

PVNet contains a script for generating batches of data suitable for training the PVNet models. To run the script, you will need to make some modifications to the datamodule configuration.

Make sure you have copied the example configs (as already stated above):

```bash
cp -r configs.example configs
```

We will use the following example config file for creating batches: `/PVNet/configs/datamodule/configuration/example_configuration.yaml`. Ensure that the file paths are set to the correct locations in `example_configuration.yaml`: search for `PLACEHOLDER` to find where to input the location of the files. You will need to comment out or delete the parts of `example_configuration.yaml` pertaining to the data you are not using.
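
One quick way to locate every placeholder (a standard `grep` search run from the repo root):

```bash
grep -rn "PLACEHOLDER" configs/datamodule/configuration/
```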

When creating batches, an additional datamodule config located in `PVNet/configs/datamodule` is passed into the batch creation script: `streamed_batches.yaml`. Like before, a placeholder variable is used when specifying which configuration to use:

```yaml
configuration: "PLACEHOLDER.yaml" # path to the data configuration you edited above
```

Run the `save_samples.py` script to create batches with the parameters specified:

```bash
python scripts/save_samples.py
```

PVNet uses
[hydra](https://hydra.cc/) which enables us to pass variables via the command
line that will override the configuration defined in the `./configs` directory, like this:
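
```bash
# An illustrative override; the exact keys available depend on your configs
python scripts/save_samples.py datamodule=streamed_batches
```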

ocf-data-sampler is currently set up to use 11 channels from the satellite data; the 12th channel, HRV, is not included.
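
For reference, the non-HRV channel list might look like the sketch below, assuming the standard EUMETSAT SEVIRI channel names (the key layout here is illustrative; check your data configuration for the list actually used):

```yaml
satellite:
  channels:
    - IR_016
    - IR_039
    - IR_087
    - IR_097
    - IR_108
    - IR_120
    - IR_134
    - VIS006
    - VIS008
    - WV_062
    - WV_073
```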

### Training PVNet

How PVNet is run is determined by the extensive configuration in the config
files. The configs stored in `PVNet/configs.example` should work with batches created as described above.
Make sure to update the following config files before training your model:

1. In `configs/datamodule/local_premade_batches.yaml`:
   - update `batch_dir` to point to the directory you stored your batches in during batch creation
2. In `configs/model/local_multimodal.yaml`:
   - update the list of encoders to reflect the data sources you are using (see the sketch after this list). If you are using different NWP sources, the encoders for these should follow the same structure, with two important updates:
     - `in_channels`: the number of variables your NWP source supplies
     - `image_size_pixels`: the spatial crop of your NWP data. This depends on the spatial resolution of your NWP and should match `image_size_pixels_height` and/or `image_size_pixels_width` in `datamodule/configuration/site_example_configuration.yaml` for the NWP, unless transformations such as coarsening were applied (e.g. as for ECMWF data)
3. In `configs/local_trainer.yaml`:
   - set `accelerator: 0` if running on a system without a supported GPU
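
As an illustration, a single NWP encoder entry might look roughly like this (a sketch with invented names and values; the exact keys and encoder class come from your copy of `local_multimodal.yaml`):

```yaml
nwp_encoders_dict:
  ukv:                     # hypothetical NWP source name
    in_channels: 11        # number of variables your NWP source supplies
    image_size_pixels: 24  # spatial crop, matching the datamodule configuration
```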

If creating copies of the config files instead of modifying existing ones, update `defaults` in the main `./configs/config.yaml` file to use
your customised config files:
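
Assuming your customised copies keep the example names, the `defaults` list might look like this sketch (substitute your own file names):

```yaml
defaults:
  - trainer: local_trainer
  - model: local_multimodal
  - datamodule: local_premade_batches
```

Training can then be started with:

```bash
python run.py
```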

If you have successfully trained a PVNet model and have a saved model checkpoint, you can use it to create a backtest, i.e. forecasts on historical data to evaluate forecast accuracy/skill. This can be done by running one of the scripts in this repo, such as [the UK GSP backtest script](scripts/backtest_uk_gsp.py) or [the PV site backtest script](scripts/backtest_sites.py); further info on how to run these is in each backtest file.

## Testing

You can use `python -m pytest tests` to run the tests.
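
Standard `pytest` options work here too, e.g. filtering tests by name or stopping at the first failure (the filter string below is just an example):

```bash
python -m pytest tests -k "multimodal" -x
```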
## Contributors ✨

Thanks goes to these wonderful people ([emoji key](https://allcontributors.org/docs/en/emoji-key)):

<!-- ALL-CONTRIBUTORS-LIST:START - Do not remove or modify this section -->

<!-- prettier-ignore-start -->

<!-- markdownlint-disable -->

<table>
<tbody>
<tr>
    </tr>
  </tbody>
</table>

<!-- markdownlint-restore -->

<!-- prettier-ignore-end -->

<!-- ALL-CONTRIBUTORS-LIST:END -->