glm

Introduction of the problem

This is an example of how to downscale via generalized linear models (GLM). The example will be done on the NCEP/NCAR Reanalysis 1 dataset, for the Igueldo station (San Sebastian - Spain) of the VALUE station dataset.

First we call the libraries.

library(loadeR)
library(transformeR)
library(downscaleR)

Predictands

With the purpose of running simple examples and illustrating the functionalities of downscaleR, several R-data objects are available in the package. This data is originally loaded with the transformeR package (an R package for climate data access and manipulation) that can be installed also from its (for more information go to the wiki of transformeR). In this example we will downscale precipitation and daily mean temperature for the Igueldo station, which has station ID = "000234". The data record for this stations ranges from 1982-12-01 to 2002-02-28 GMT and only for the winter season.

# precipitation
yp.igueldo <- subsetGrid(VALUE_Iberia_pr,station.id = "000234")
# temperature
yt.igueldo <- subsetGrid(VALUE_Iberia_tas,station.id = "000234")

Predictors

The transformeR package has a pre-loaded version of the NCEP reanalysis of the Iberian peninsula for illustrating purposes. This version contains the humidity and temperature at 850hPa, and the sea level pressure (SLP) at the same time period as the VALUE_Iberia data. The predictors used for downscaling both temperature and precipitation are the same.

x <- makeMultiGrid(NCEP_Iberia_hus850, NCEP_Iberia_ta850, NCEP_Iberia_psl)

The climatology of the NCEP humidity at 850hPa with the Igueldo location are plotted in the next figure

spatialPlot(climatology(subsetDimension(x,dimension = "var",indices = 1)),backdrop.theme = "coastline")
points(yp.igueldo$xyCoords[,1], yp.igueldo$xyCoords[,2],  cex = 2, pch = 17)

Once we have loaded the NCEP variables, we split the data into a train and test set performing a 1 year leave-one-out cross-validation.

xT <- subsetGrid(x, years = 1983:1998)
xt <- subsetGrid(x, years = 1999:2000)
yT.p <- subsetGrid(yp.igueldo,years = 1983:1998)
yt.p <- subsetGrid(yp.igueldo,years = 1999:2000)
yT.t <- subsetGrid(yt.igueldo,years = 1983:1998)
yt.t <- subsetGrid(yt.igueldo,years = 1999:2000)

In the example about the analogs method we worked with standardize raw data. For this reason, to illustrate another way to face the problem we will compress the information by using principal components and retaining those that explain the 95% of the variance. By this way, we eliminate both redundancy and irrelevancy, typically present in atmospheric variables.

# precipitation
gridT.p <- prepare_predictors(xT,yT.p,PCA = list(v.exp = 0.95, which.combine = getVarNames(x)))
gridt.p <- prepare_newdata(xt,gridT.p)
# temperature
gridT.t <- prepare_predictors(xT,yT.t,PCA = list(v.exp = 0.95, which.combine = getVarNames(x)))
gridt.t <- prepare_newdata(xt,gridT.t)

Generalized linear models (GLM)

To downscale we use the functions of the package downscaleR: downscale.train and downscale.predict. Among the available methods we have the GLM. The optional parameters for the GLM are:

-Fitting: A string indicating the types of objective functions and how to fit the linear model. Options are NULL (simple linear regression), L1 (l1 penalty), L2 (l2 penalty), L1L2 (elastic-net), gLASSO (group lasso).

-Family: indicates the family of the generalized linear model. Options are gaussian, binomial, Gamma, inverse.gaussian, poisson, quasi, quasibinomial, quasipoisson. For more information about the available parameters please type help("glm.train") or help("downscale.train").

Better downscaling can be attained if we build a model for the ocurrence of precipitation (0 for no rain and 1 for rain) and another for the amount of precipitation, filtering the days where there was no rain by the parameter filt = ">0". W have considered no rain the days where it rained fewer than 0.1mm. The models built are a logistic regression for the ocurrence of precipitation and a gamma family with link logarithmic for the amount of precipitation.

# precipitation - ocurrence
y.ocu <- convert2bin(yT.p,threshold = 0.1)
gridT.p.ocu <- gridT.p
gridT.p.ocu$y <- y.ocu
model.ocu <- downscale.train(gridT.p.ocu,singlesite = TRUE, method = "GLM", family = "binomial", prediction = TRUE)
pred.ocu  <- downscale.predict(gridt.p, model.ocu)
pred.ocu <- convert2bin(pred.ocu,ref.obs = y.ocu, ref.pred = model.ocu$pred)
# precipitation - amount
model.amo <- downscale.train(gridT.p,singlesite = TRUE, filt = ">0",method = "GLM", family = Gamma(link = "log"), prediction = TRUE)
pred.amo  <- downscale.predict(gridt.p, model.amo)

The complete serie is the product of the two (ocurrence and amount) downscaled series.

# precipitation
pred.p <- yt.p
pred.p$Data <- pred.ocu$Data * pred.amo$Data

In the case of the temperature, a normal linear regression model with gaussian family is built.

# temperature
model.t <- downscale.train(gridT.p,singlesite = TRUE, method = "GLM", family = "gaussian", prediction = TRUE)
pred.t  <- downscale.predict(gridt.p, model.t)

To evaluate the goodness of a model we can build a linear model among the predictions and the observations for the test period. The following code compares the prediction and the observation for the chosen station, Igueldo. The parameter r2 serves us as a metric of the goodness of the model.

# precipitation
df <- cbind.data.frame("predicted" = pred.p$Data,
                       "observed" = yt.p$Data)
plot(df$observed, df$predicted, ylab = "Predicho", xlab = "observado")
grid()
modelo.reg <- lm(predicted ~ observed, data = df)
abline(reg = modelo.reg, col = "red")
title(pred.t$Metadata$name)
mtext(paste("r2 =", round(summary(modelo.reg)$adj.r.squared, 2)))

# temperature
df <- cbind.data.frame("predicted" = pred.t$Data,
                       "observed" = yt.t$Data)
plot(df$observed, df$predicted, ylab = "Predicho", xlab = "observado")
grid()
modelo.reg <- lm(predicted ~ observed, data = df)
abline(reg = modelo.reg, col = "red")
title(pred.t$Metadata$name)
mtext(paste("r2 =", round(summary(modelo.reg)$adj.r.squared, 2)))

downscaleR - Santander MetGroup (Univ. Cantabria - CSIC)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

glm

Introduction of the problem

Predictands

Predictors

Generalized linear models (GLM)

Clone this wiki locally