Commit 36f42cd

extended readme; added max_scaling_factor to variance scaling method
1 parent 9a52b98 commit 36f42cd

File tree: 4 files changed, +84/-62 lines changed


README.md

Lines changed: 57 additions & 29 deletions
```diff
@@ -1,6 +1,6 @@
 # Bias adjustment/correction procedures for climatic reasearch

-<div style="text-align: center">
+<div align="center">

 [![GitHub](https://badgen.net/badge/icon/github?icon=github&label)](https://github.com/btschwertfeger/Bias-Adjustment-Python)
 [![Generic badge](https://img.shields.io/badge/python-3.7+-green.svg)](https://shields.io/)
```
```diff
@@ -9,12 +9,26 @@

 </div>

-Collection of different scale- and distribution-based bias adjustment techniques for climatic research (see `examples.ipynb` for help).
+This Python module contains a collection of different scale- and distribution-based bias adjustment techniques for climatic research (see `examples.ipynb` for help).

-Bias adjustment procedures in Python are very slow, so they should not be used on large data sets.
-A C++ implementation that works way faster can be found [here](https://github.com/btschwertfeger/Bias-Adjustment-Cpp).
+Since the Python programming language is very slow and bias adjustments are complex statistical transformations, it is recommended to use the C++ implementation on large data sets. This can be found [here](https://github.com/btschwertfeger/Bias-Adjustment-Cpp).

-## About
+---
+
+## Table of Contents
+
+1. [ About ](#about)
+2. [ Available Methods ](#methods)
+3. [ Installation ](#installation)
+4. [ Usage and Examples ](#examples)
+5. [ Notes ](#notes)
+6. [ References ](#references)
+
+---
+
+<a name="about"></a>
+
+## 1. About

 These programs and data structures are designed to help minimize discrepancies between modeled and observed climate data. Data from past periods are used to adjust variables from current and future time series so that their distributional properties approximate possible actual values.

```
````diff
@@ -33,31 +47,41 @@ In this way, for example, modeled data, which on average represent values that a
     src="images/dm-doy-plot.png?raw=true"
     alt="Temperature per day of year in modeled, observed and bias-adjusted climate data"
     style="background-color: white; border-radius: 7px">
-  <figcaption>Figure 2: Temperature per day of year in modeled, observed and bias-adjusted climate data</figcaption>
+  <figcaption>Figure 2: Temperature per day of year in observed, modeled and bias-adjusted climate data</figcaption>
 </figure>

 ---

-## Available methods:
+<a name="methods"></a>
+
+## 2. Available methods:
+
+All methods except the `adjust_3d` function requires the application on one time series.

-- Linear Scaling (additive and multiplicative)
-- Variance Scaling (additive)
-- Delta (Change) Method (additive and multiplicative)
-- Quantile Mapping (additive)
-- Detrended Quantile Mapping (additive and multiplicative)
-- Quantile Delta Mapping (additive and multuplicative)
+| Function name            | Description                                                                                  |
+| ------------------------ | -------------------------------------------------------------------------------------------- |
+| `linear_scaling`         | Linear Scaling (additive and multiplicative)                                                 |
+| `variance_scaling`       | Variance Scaling (additive)                                                                  |
+| `delta_method`           | Delta (Change) Method (additive and multiplicative)                                          |
+| `quantile_mapping`       | Quantile Mapping (additive) and Detrended Quantile Mapping (additive and multiplicative)     |
+| `quantile_delta_mapping` | Quantile Delta Mapping (additive and multiplicative)                                         |
+| `adjust_3d`              | requires a method name and the respective parameters to adjust all time series of a data set |

 ---

-## Usage
+<a name="installation"></a>

-### Installation
+## 3. Installation

 ```bash
 python3 -m pip install python-cmethods
 ```

-### Import and application
+---
+
+<a name="examples"></a>
+
+## 4. Usage and Examples

 ```python
 import xarray as xr
````
````diff
@@ -91,15 +115,13 @@ qdm_result = cm.adjust_3d( # 3d = 2 spatial and 1 time dimension
 Notes:

 - When using the `adjust_3d` method you have to specify the method by name.
-- For the multiplicative linear scaling and delta method is a maximum scaling factor of 10 set. This can be changed by the `max_scaling_factor` parameter.
-
----
+- For the multiplicative linear scaling and the delta method as well as the variance scaling method a maximum scaling factor of 10 is defined. This can be changed by the parameter `max_scaling_factor`.

 ## Examples (see repository on [GitHub](https://github.com/btschwertfeger/Bias-Adjustment-Python))

-`/examples/examples.ipynb`: Notebook containing different methods and plots
+Notebook with different methods and plots: `/examples/examples.ipynb`

-`/examples/do_bias_correction.py`: Example script for adjusting climate data
+Example script for adjusting climate data: `/examples/do_bias_correction.py`

 ```bash
 python3 do_bias_correction.py \
````
````diff
@@ -109,26 +131,32 @@ python3 do_bias_correction.py \
     --method linear_scaling \
     --variable tas \
     --unit '°C' \
-    --group time.month \
+    --group 'time.month' \
     --kind +
 ```

 - Linear and variance, as well as delta change method require `--group time.month` as argument.
-- Adjustment methods that apply changes in distributional biasses (QM, QDM, DQM; EQM, ...) need the `--nquantiles` argument set to some integer.
-- Data sets should have the same spatial resolutions.
+- Adjustment methods that apply changes in distributional biasses (QM, QDM, DQM, ...) need the `--nquantiles` argument set to some integer.
+- Data sets must have the same spatial resolutions.

 ---

-## Notes
+<a name="notes"></a>

-- Computation in Python takes some time, so this is only for demonstration. When adjusting large datasets, its best to the C++ implementation mentioned above.
+## 5. Notes
+
+- Computation in Python takes some time, so this is only for demonstration. When adjusting large datasets, its best to use the C++ implementation mentioned above.
 - Formulas and references can be found in the implementations of the corresponding functions.

-## Space for improvements
+### Space for improvements:
+
+Since the scaling methods implemented so far scale by default over the mean values of the respective months, unrealistic long-term mean values may occur at the month transitions. This can be prevented either by selecting `group='time.dayofyear'`. Alternatively, it is possible not to scale using long-term mean values, but using a 31-day interval, which takes the 31 surrounding values over all years as the basis for calculating the mean values. This is not yet implemented in this module, but is available in the C++ implementation [here](https://github.com/btschwertfeger/Bias-Adjustment-Cpp).
+
+---

-Since the scaling methods implemented so far scale by default over the mean values of the respective months, unrealistic long-term mean values may occur at the month transitions. This can be prevented either by selecting `group='time.dayofyear`. Alternatively, it is possible not to scale using long-term mean values, but using a 30-day interval, which takes the 30 surrounding values over all years as the basis for calculating the mean values. This is not yet implemented in this module, but is available in the C++ implementation [here](https://github.com/btschwertfeger/Bias-Adjustment-Cpp).
+<a name="references"></a>

-## References
+## 6. References

 - Schwertfeger, Benjamin Thomas (2022) The influence of bias corrections on variability, distribution, and correlation of temperatures in comparison to observed and modeled climate data in Europe (https://epic.awi.de/id/eprint/56689/)
 - Linear Scaling and Variance Scaling based on: Teutschbein, Claudia and Seibert, Jan (2012) Bias correction of regional climate model simulations for hydrological climate-change impact studies: Review and evaluation of different methods (https://doi.org/10.1016/j.jhydrol.2012.05.052)
````
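The additive linear scaling listed in the README's method table reduces to a simple mean shift (Eq. 1 in `cmethods/CMethods.py`). A minimal pure-Python sketch of that formula; the function name and sample values here are illustrative, not part of the module:

```python
from statistics import mean

def linear_scaling_additive(obs, simh, simp):
    # Eq. 1: shift the scenario series simp by the mean bias
    # between observations (obs) and the modeled historical run (simh)
    bias = mean(obs) - mean(simh)
    return [x + bias for x in simp]

obs  = [10.0, 12.0, 14.0]  # observed reference period (hypothetical values)
simh = [11.0, 13.0, 15.0]  # modeled, historical period
simp = [12.0, 14.0, 16.0]  # modeled, scenario period
print(linear_scaling_additive(obs, simh, simp))  # [11.0, 13.0, 15.0]
```

In the module the same shift is applied group-wise (e.g. per `time.month`), which is why the notes above insist on the `--group` argument for the scaling methods.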

cmethods/CMethods.py

Lines changed: 24 additions & 30 deletions
```diff
@@ -272,7 +272,7 @@ def linear_scaling(cls,
     > obs=obs[variable],
     > simh=simh[variable],
     > simp=simp[variable],
-    > group='time.month' # optional
+    > group='time.month' # optional, this is default here
     >)

     ----- E Q U A T I O N S -----
```
```diff
@@ -292,13 +292,11 @@ def linear_scaling(cls,
         else:
             if kind in cls.ADDITIVE: return np.array(simp) + (np.nanmean(obs) - np.nanmean(simh)) # Eq. 1
             elif kind in cls.MULTIPLICATIVE:
-                scaling_factor = (np.nanmean(obs) / np.nanmean(simh))
-                if scaling_factor > 0 and scaling_factor > abs(kwargs.get('max_scaling_factor', cls.MAX_SCALING_FACTOR)):
-                    return np.array(simp) * abs(kwargs.get('max_scaling_factor', cls.MAX_SCALING_FACTOR))
-                elif scaling_factor < 0 and scaling_factor < -abs(kwargs.get('max_scaling_factor', cls.MAX_SCALING_FACTOR)):
-                    return np.array(simp) * -abs(kwargs.get('max_scaling_factor', cls.MAX_SCALING_FACTOR))
-                else:
-                    return np.array(simp) * scaling_factor # Eq. 2
+                adj_scaling_factor = cls.get_adjusted_scaling_factor(
+                    np.nanmean(obs) / np.nanmean(simh),
+                    kwargs.get('max_scaling_factor', cls.MAX_SCALING_FACTOR)
+                )
+                return np.array(simp) * adj_scaling_factor # Eq. 2
             else: raise ValueError('Scaling type invalid. Valid options for param kind: "+" and "*"')

     # ? -----========= V A R I A N C E - S C A L I N G =========------
```
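The refactored branch above computes the ratio of means and clamps it via the new `get_adjusted_scaling_factor` helper. A pure-Python sketch of the same behavior, with the clamp written as an equivalent min/max; names and sample values are illustrative:

```python
from statistics import mean

MAX_SCALING_FACTOR = 10  # module-wide default, per the commit message

def linear_scaling_multiplicative(obs, simh, simp, max_scaling_factor=MAX_SCALING_FACTOR):
    # Eq. 2: scale the scenario series by the ratio of means, clamped into
    # [-|max|, +|max|] (same result as get_adjusted_scaling_factor)
    cap = abs(max_scaling_factor)
    factor = min(max(mean(obs) / mean(simh), -cap), cap)
    return [x * factor for x in simp]

# A near-zero historical mean would give a raw factor of about 50;
# the cap limits it to 10, so the scenario values are scaled by 10.
print(linear_scaling_multiplicative([5.0, 5.0], [0.1, 0.1], [1.0, 2.0]))  # [10.0, 20.0]
```

The cap matters exactly in this situation: multiplicative scaling on variables such as precipitation can explode when the historical mean is close to zero.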
```diff
@@ -360,8 +358,12 @@ def variance_scaling(cls,
             VS_1_simh = LS_simh - np.nanmean(LS_simh) # Eq. 3
             VS_1_simp = LS_simp - np.nanmean(LS_simp) # Eq. 4

-            VS_2_simp = VS_1_simp * (np.std(obs) / np.std(VS_1_simh)) # Eq. 5
-
+            adj_scaling_factor = cls.get_adjusted_scaling_factor(
+                np.std(obs) / np.std(VS_1_simh),
+                kwargs.get('max_scaling_factor', cls.MAX_SCALING_FACTOR)
+            )
+
+            VS_2_simp = VS_1_simp * adj_scaling_factor # Eq. 5
             return VS_2_simp + np.nanmean(LS_simp) # Eq. 6

     # ? -----========= D E L T A - M E T H O D =========------
```
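Eq. 3-6 above make up the variance step of variance scaling. A pure-Python sketch of those four equations, assuming `LS_simh` / `LS_simp` are the already linear-scaled series produced by the step outside this hunk (names follow the diff; sample values are hypothetical):

```python
from statistics import mean, pstdev  # pstdev matches np.std (population std)

def variance_scaling_core(obs, LS_simh, LS_simp):
    # Eq. 3/4: remove the mean of each linear-scaled series
    VS_1_simh = [x - mean(LS_simh) for x in LS_simh]
    VS_1_simp = [x - mean(LS_simp) for x in LS_simp]
    # Eq. 5: rescale the anomalies to the observed standard deviation
    # (this is the ratio the commit now clamps via get_adjusted_scaling_factor)
    factor = pstdev(obs) / pstdev(VS_1_simh)
    VS_2_simp = [x * factor for x in VS_1_simp]
    # Eq. 6: add the scenario mean back on top of the rescaled anomalies
    return [x + mean(LS_simp) for x in VS_2_simp]

# Observed spread is half the modeled spread, so anomalies shrink by 0.5.
print(variance_scaling_core([9.0, 10.0, 11.0], [8.0, 10.0, 12.0], [18.0, 20.0, 22.0]))
# [19.0, 20.0, 21.0]
```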
```diff
@@ -414,13 +416,11 @@ def delta_method(cls,
         else:
             if kind in cls.ADDITIVE: return np.array(obs) + (np.nanmean(simp) - np.nanmean(simh)) # Eq. 1
             elif kind in cls.MULTIPLICATIVE:
-                scaling_factor = (np.nanmean(simp) / np.nanmean(simh))
-                if scaling_factor > 0 and scaling_factor > abs(kwargs.get('max_scaling_factor', cls.MAX_SCALING_FACTOR)):
-                    return np.array(obs) * abs(kwargs.get('max_scaling_factor', cls.MAX_SCALING_FACTOR))
-                elif scaling_factor < 0 and scaling_factor < -abs(kwargs.get('max_scaling_factor', cls.MAX_SCALING_FACTOR)):
-                    return np.array(obs) * -abs(kwargs.get('max_scaling_factor', cls.MAX_SCALING_FACTOR))
-                else:
-                    return np.array(obs) * scaling_factor # Eq. 2
+                adj_scaling_factor = cls.get_adjusted_scaling_factor(
+                    np.nanmean(simp) / np.nanmean(simh),
+                    kwargs.get('max_scaling_factor', cls.MAX_SCALING_FACTOR)
+                )
+                return np.array(obs) * adj_scaling_factor # Eq. 2
             else: raise ValueError(f'{kind} not implemented! Use "+" or "*" instead.')


```
```diff
@@ -689,16 +689,10 @@ def get_inverse_of_cdf(base_cdf, insert_cdf, xbins) -> np.array:
         return np.interp(insert_cdf, base_cdf, xbins)

     @staticmethod
-    def load_data(
-        obs_fpath: str,
-        simh_fpath: str,
-        simp_fpath: str,
-        use_cftime: bool=False,
-        chunks=None
-    ) -> (xr.core.dataarray.Dataset, xr.core.dataarray.Dataset, xr.core.dataarray.Dataset):
-        '''Load and return loaded netcdf datasets'''
-        obs = xr.open_dataset(obs_fpath, use_cftime=use_cftime, chunks=chunks)
-        simh = xr.open_dataset(simh_fpath, use_cftime=use_cftime, chunks=chunks)
-        simp = xr.open_dataset(simp_fpath, use_cftime=use_cftime, chunks=chunks)
-
-        return obs, simh, simp
+    def get_adjusted_scaling_factor(factor: float, max_scaling_factor: float) -> float:
+        if factor > 0 and factor > abs(max_scaling_factor):
+            return abs(max_scaling_factor)
+        elif factor < 0 and factor < -abs(max_scaling_factor):
+            return -abs(max_scaling_factor)
+        else:
+            return factor
```
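This hunk drops the `load_data` convenience wrapper and introduces the `get_adjusted_scaling_factor` helper that the scaling methods above now share. The function body below is copied out of the diff so its boundary behavior can be checked standalone:

```python
def get_adjusted_scaling_factor(factor: float, max_scaling_factor: float) -> float:
    # Clamp factor into [-|max_scaling_factor|, +|max_scaling_factor|],
    # preserving its sign (body copied from the diff above)
    if factor > 0 and factor > abs(max_scaling_factor):
        return abs(max_scaling_factor)
    elif factor < 0 and factor < -abs(max_scaling_factor):
        return -abs(max_scaling_factor)
    else:
        return factor

print(get_adjusted_scaling_factor(50.0, 10))   # 10 (capped)
print(get_adjusted_scaling_factor(-50.0, 10))  # -10 (capped, sign kept)
print(get_adjusted_scaling_factor(2.5, 10))    # 2.5 (within bounds)
```

Because `abs()` is applied to the limit, a negative `max_scaling_factor` argument behaves the same as a positive one.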

examples/examples.ipynb

Lines changed: 2 additions & 2 deletions
```diff
@@ -716,7 +716,7 @@
   ],
   "metadata": {
   "kernelspec": {
-   "display_name": "Python 3.7.3 64-bit",
+   "display_name": "Python 3 (ipykernel)",
    "language": "python",
    "name": "python3"
   },
```
```diff
@@ -730,7 +730,7 @@
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
-  "version": "3.7.3"
+  "version": "3.9.13"
  },
  "vscode": {
   "interpreter": {
```

setup.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -17,7 +17,7 @@

 # What packages are required for this module to be executed?
 REQUIRED = [
-    'xarray', 'numpy', 'tqdm', 'netCDF4' # <- always conflicts on install with tqdm on test.pypi..
+    'xarray>=2022.11.0','netCDF4>=1.6.1', 'numpy', 'tqdm',
 ]

 # What packages are optional?
```
