% Darwin Harbour Sediment Sampling Design - Middle Arm
% Murray Logan
% 22-03-2019
Synopsis
==========
The purpose of this work was to inform the spatial component of the
Darwin Harbour Sediment Monitoring sampling design. In particular, the
aim was to determine the 'best' locations for 100 sediment monitoring
sites in the Middle Arm (including West Arm) sections of Darwin Harbour.
To help inform this process, there were two broad sources of data
available:
1. Hydrodynamic modelling of the entire Darwin Harbour. These data
(provided as GeoTIFFs) broadly comprise tidal bed shear and velocity
as well as wave driven forces at 10 m resolution, and will be used to
isolate areas likely to experience deposition (rather than erosion) of
sediments.
2. Munksgaard sediment chemical survey from 2012 provided by Lynda
Radke (as an Excel workbook). These data provided background
information that was used to predict the full spatial distribution
of a range of sediment chemicals. These spatial distributions
then helped tune and evaluate a range of sampling designs.
The basic procedure involved the following steps:
1. Read in and process the data sources.
2. Fit a barrier spatial model to each of the Munksgaard sediment
chemicals and predict/develop spatial layers for the Middle Arm section.
3. Develop masks out of the hydrodynamic model data and use them to
exclude areas of likely erosion from the chemical spatial layers.
4. Use spatial layers representing shipping channels, ports and other
exclusion zones to establish additional masks to apply alongside
hydrodynamic modelling masks to further restrict the sampling domains
and prevent sampling configurations containing sites in the exclusion
zones.
5. Explore five different sample generation routines for a range of
sample sizes to establish an optimal sampling design. The five
routines will be:
a) Using the masked chemical spatial layers to inform Conditioned Latin Hypercube Sampling - this will
generate samples of nominated sizes that are located in a manner that best represents the underlying patterns
in the chemical spatial layers.
b) Completely random sampling - this will select the nominated number of samples from within the masked area
and is completely naive to any underlying spatial patterns (and hence is only likely to be representative of the underlying patterns when
the number of samples is large).
c) A regular sampling grid - this will select approximately the nominated number of samples configured in a
regular grid within the masked area. Like the completely random sampling, the regular sampling grid is completely
naive to the underlying spatial patterns, yet it does guarantee a more even spatial coverage.
d) A spatially balanced design - this will yield a spatially balanced design in which sampling sites are spread out throughout the
spatial domain.
e) A high dimensional spatially balanced design - this will yield a design in which sampling sites are spread in multiple
dimensions (spatial and according to patterns in the underlying chemical distributions).
In addition to the 100 long-term monitoring sites, there are 20
designated sites. These sites are to be sampled more regularly and are
for the purpose of compliance monitoring in specific areas. Although
these sites are additional to the long-term samples, they do form part
of the overall design and thus need to be accounted for when
evaluating candidate configurations.
All code for this project is available on github <https://github.com/AIMS/darwin-harbour-sampling.git>
Data processing
=================
## GIS data
A shapefile of Darwin Harbour (see Figure @fig:Map1) was utilized in
order to define the initial sampling domain(s). This project focused
on the Middle Arm. For the purpose of the sediment
monitoring program, Middle Arm was defined as Middle Arm, West Arm and Blackmore River.
{#fig:Map1 width=70%}
<!-- the above figure is generated in DHS_readData_MA.R -->
## Munksgaard 2012 chemical sediment data
Munksgaard 2012 chemical sediment data were provided by Lynda Radke in
the form of an Excel workbook. These data were consolidated
into a single flat csv text file to support analyses. The spatial
configuration of the Munksgaard sediment sampling sites is
illustrated in Figure @fig:Map2 (circular points). Primarily only
the sites within the Middle Arm will be used to inform
the current exploration of future sampling
designs.
## Designated sampling sites
In addition to the long-term monitoring sites, a number of more
regularly sampled designated (priority) sites were provided by Lynda Radke in the
form of an Excel workbook. These sites are additional, yet they
needed to be considered in the formulation of site configurations
and are illustrated in Figure @fig:Map2 as red points. Note: none of
the designated sites fall in the defined Middle Arm
region and thus they were not considered further.
{#fig:Map2 width=80%}
<!-- the above figure is generated in DHS_readData.R -->
\newpage
## Hydrodynamic and wave modelling
When designing a sediment monitoring program, it is important to
consider the erosive, transportation and deposition forces operating
on the seabed. Ideally all, or at least most, of the sampling sites should
be located in areas that are more likely to experience sediment
deposition than erosion or transportation.
Various hydrodynamic and wave modelling products were made available
by Dr. Richard Brinkman (AIMS) that provide estimates of erosive,
transportation and deposition likelihoods and include:
- current velocity (50th and 75th percentiles) - these were calculated
from a 30 day characteristic spring-neap tidal cycle. Higher
velocity equates to higher likelihood of sediment transport and
erosion and thus lower probability of sediment deposition.
- seabed shear stress (50th and 75th percentiles) - these were derived
from the current velocity and are a measure of the shear forces
likely to be experienced on the sea bed due to tidal movement.
Higher bed shear stress equates to higher likelihood of sediment
transport and erosion and thus lower probability of sediment
deposition.
- wave derived orbital velocity magnitude at seabed - a shallow water
wave model was applied using a 10 m/s wind forcing from a range of
directions (0, 90, 140, 270 and 315 degrees) to simulate the likely
orbital velocity experienced by the seabed. Higher orbital velocity
equates to a higher likelihood of sediment transport and erosion and
thus lower probability of sediment deposition.
- wave derived seabed shear stress - the same shallow water wave model
was expressed in terms of wave bed shear stress. Higher shear stress
equates to higher likelihood of sediment transport and erosion and
thus lower probability of sediment deposition.
Visual representations of each of the above products within the Middle Arm focal area are depicted in Figures
@fig:hydro_middle_arm -- @fig:waves_middle_arm.
{#fig:hydro_middle_arm width=100%}
{#fig:waves_middle_arm width=100%}
<!-- the above figures are generated in DHS_readData_MA.R -->
### Masks
The seabed shear stress products provided spatial modelling of the
expected forces acting on the sea bed during a typical spring-neap
tidal cycle. In so doing, they provided proxies for the likelihood
for sediment erosion, transportation and deposition. These products
were used to create masks that exclude areas of high erosive or
transportation potential.
To establish a mask (focusing only on deposition areas), thresholds
needed to be established for the critical values below which
deposition is likely. Figures @fig:Hjulstroms and @fig:shearStress
provide these for a range of sediment particle sizes, for water
velocity and seabed shear stress respectively.
{#fig:Hjulstroms width=100%}
<!-- the above figure came from Hemerson -->
{#fig:shearStress width=60%}
#### Middle Arm sampling domain
Figure @fig:particle_sizes_MA illustrates the distribution of percentage
abundance of a range of particle size classes from the
Munksgaard sampling program. On average, particles in the size
classes 4-62$\mu$m (silt) and 62-250$\mu$m (fine sand) made up 25.4
and 43.4 percent of the sediment samples respectively. Hence, it was
important that future Middle Arm sampling designs focus on sites that
will ensure deposition of particles of these sizes. According to
Figures @fig:Hjulstroms and @fig:shearStress, particles at or above
29$\mu$m (middle of the silt range) correspond to critical deposition
values of approximately 0.2 m/s velocity and 0.1 seabed shear stress.
The hydrodynamic modelling seabed shear stress products represent the
50th and 75th percentile values. In the case of the 50th percentile,
this means that 50 percent of the time, seabed shear stress was
estimated to be above this value and 50 percent of the time below.
If a mask was based on setting the threshold to correspond to the 50th
percentile, then the masked layer represents areas where sediment
deposition is likely to occur more regularly than erosion and
transport.
Nevertheless, it was also important to establish the distribution of
seabed shear stress values across the seabed. Figures @fig:hist_hydro_middle_arm and
@fig:hist_waves_middle_arm illustrate the frequency of seabed shear stresses
(and velocity) for the Middle Arm area. The 50th percentiles for seabed
shear stress appear to drop off from the peak at around 0.2 m/s, hence this
appears to be a sensible threshold value.
{#fig:particle_sizes_MA width=80%}
<!-- the above figure is generated in DHS_readData.R -->
{#fig:hist_hydro_middle_arm width=80%}
{#fig:hist_waves_middle_arm width=80%}
<!-- the above two figures were generated in DHS_thresholdLayers_MA.R -->
Both current velocity and seabed shear stress were derived from the
same model and were thus closely correlated. Similarly, both wave
derived orbital velocity and wave derived seabed shear stress were
correlated. In each case, the shear stress proxies are intended to be
expressions of the forces that are likely to be acting on the seabed
(from tides and waves respectively). Hence only the seabed shear
stress versions of the hydrodynamic and wave models were used as the
proxy estimates of tidal and wave forces.
Figure @fig:masks_MA illustrates the resulting masks for the Middle
Arm area using a threshold of 0.2 m/s (or 0.3 m/s for the 75th
percentiles). The wave derived seabed shear masks were added
together and then joined with the 50th percentile hydrodynamic seabed
shear mask. The resulting mask is illustrated in Figure @fig:mask_middle_arm.
This mask represents (in blue shading) the areas most
likely to experience more deposition than erosion and thus suitable
areas for sediment monitoring. When we compared this mask to the
spatial extent of the Munksgaard 2012 sediment monitoring site
configuration, it was evident that while they broadly overlap, there
are potentially suitable areas that were not sampled in the Munksgaard
monitoring program. Moreover, there may also have been areas that were
sampled that may not have been net deposition areas.
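The following sketch illustrates how such threshold masks might be
constructed and combined with the `raster` package. The file names
are illustrative assumptions, and the wave layers are combined here
via their cell-wise maximum (equivalent in spirit to adding the
individual masks) rather than via the project's actual code.

```r
library(raster)

## Illustrative file names only - the products were supplied as GeoTIFFs
bedshear_p50 <- raster("bedshear_p50.tif")                        # tidal seabed shear (50th percentile)
wave_shear <- stack(list.files(pattern = "waveshear_.*\\.tif$"))  # one layer per wind direction

## Cells below the threshold are candidate deposition areas (1); others NA
threshold <- 0.2
tidal_mask <- calc(bedshear_p50, function(x) ifelse(x < threshold, 1, NA))

## Worst-case wave shear across wind directions, then threshold
wave_max  <- max(wave_shear)
wave_mask <- calc(wave_max, function(x) ifelse(x < threshold, 1, NA))

## Join the two masks - a cell survives only if it passes both
deposition_mask <- tidal_mask * wave_mask
```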
<!--
Alarmingly,
most of the area is masked out on the basis of hydrodynamically derived seabed shear stress (bedshear* and velocity*), and what remains is virtually all then
masked out by the wave model derived seabed shear stresses. Consultation about the appropriate threshold values will be required.
-->
{#fig:masks_MA width=90%}
{#fig:mask_middle_arm width=80%}
## Exclusion zone masks
In addition to using hydrodynamic modelling masks to exclude areas
that might be considered unsuitable for sediment monitoring, there are
areas throughout Darwin Harbour that are not practical or appropriate
for monitoring. These include areas with a high probability of hard
seabed [@Siwabessy-2019], shipping channel and port exclusion
zones, as well as other exclusion zones associated with major
infrastructure or military bases. Hence additional masks were
developed from spatial
layers provided by Lynda Radke.
Figure @fig:Map3 illustrates the sampling masks
associated with Middle Arm.
{#fig:Map3 width=70%}
\clearpage
Spatial Model fitting
===============
A target or set of targets is required against which the effectiveness
and accuracy of candidate sampling designs can be tuned or gauged.
This target should represent the full underlying conditions and in
essence represent a saturated sampling design - a sampling design in
which all possible locations/sites are sampled. Whilst this is
logistically not possible, given an adequate set of baseline data,
statistical spatial models can be generated to estimate the underlying
patterns. The resulting predicted layers can be used to represent the
targets.
Spatial models are complex statistical models that attempt to recreate
the full feature space from which a limited set of samples were
collected. In so doing, they attempt to incorporate two-dimensional
patterns and correlations to allow prediction to areas in between
samples.
In the simplest cases, simple surfaces can be derived by linear
interpolation between all the sampling points. However, when samples
are distributed unevenly, there are strong spatial dependencies and/or
the bounding domain is not a simple rectangle, more complex
methodologies are required.
Ecological and environmental processes are often correlated through
space. To account for these spatial dependencies within a spatial
model it is useful to incorporate a Gaussian Random Field (GRF) which
specifies a spatially dependent covariance structure in which
locations that are closer to one another in space will in general be
more highly correlated to one another than locations that are further
apart.
Large or complex spatial models soon become intractable within a
traditional frequentist modelling framework. Similarly, equivalent
fully Bayesian models are typically very computationally expensive.
Integrated Nested Laplace Approximation [INLA: @Rue-2009-319] is a
Bayesian approximation framework that offers the philosophical
advantages of a full Bayesian approach, yet with the computational
efficiency of frequentist approaches.
We can consider a GRF to be stationary if the degree of correlation
between two points is dependent only on the distance between the
points, or non-stationary if we allow the correlation function to vary
over the landscape. An extreme form of non-stationary model occurs
when there are physical barriers that disrupt or block the flow of
contagious processes. In such cases, the fact that two locations are
in close proximity does not necessarily mean that they will be highly
correlated. Consider a simple example of the diffusion of a dye in
water throughout a tank. The dye will spread out from the source and
gradually disperse throughout the tank. Consequently, the correlation
between the concentration of dye at any two locations during
dispersion will be dependent on the distance between the two
locations. If however, the tank had a partial barrier that restricted
the flow of dye molecules, then two locations either side of the
barrier might have very different dye concentrations despite being in
close proximity. Barrier models are able to account for these
obstructions.
$$
\begin{align}
y_i &\sim{} \Gamma(\mu_i, \alpha)\\
\log(\mu_i) &= \beta_0 + u(s_i) + \varepsilon_i\\[1em]
\beta_0 &\sim{} N(0,1000)\\
u(s_i) &\sim{} GRF(r, \sigma_u)\\
\varepsilon_i &\sim{} N(0, \sigma_e^2)\\[1em]
\pi(\sigma_e) &\sim{} \lambda_e e^{-\lambda_e \sigma_e}\\
\pi(\sigma_u) &\sim{} \lambda_0 e^{-\lambda_0 \sigma_u}\\
\pi\left(\frac{1}{r}\right) &\sim{} \lambda_1 e^{-\lambda_1 \frac{1}{r}}
\end{align}
$$
where $y_i$ is the $i$th observation of the target chemical variable,
$\varepsilon_i$ is an independent random effect used to model the very
short range erratic dependencies and $u(s_i)$ is the Gaussian Random
Field used to model the long-range structural (autocorrelation)
dependencies. A diffuse prior is applied to the intercept ($\beta_0$)
and $\varepsilon_i$ is a vector of independent Gaussians. The Matern
family spatial random effect ($u(s_i)$) has a covariance defined by
two parameters: the range ($r$: representing the length scale of the
spatial dependencies) and the standard deviation ($\sigma_u$:
representing the ratio of spatial to independent noise). The smaller
the range, the lower the correlation between two proximal locations.
In a barrier model, two different range parameters ($r$) are applied.
One of the range parameters is applied to the barrier area (in this
case land) and the other to the focal area (in this case the
Harbour). By setting the barrier range much smaller than the focal
area range (and close to zero), the dependency structure across
barriers is disrupted and approaches zero.
Hyper-parameter priors for the random effects ($\sigma_e$, $\sigma_u$)
as well as $r$ are defined in terms of a $\lambda$ parameter which is
determined by the scale of the response (on the log scale in this
case). Both $\lambda_e$ and $\lambda_0$ were set to the median value
of the target response (on a log scale), and the focal area $r$ was
set to half the width of the spatial domain, following
@Bakka-2016-1608.
The random field was approximated via a Constrained Refined Delaunay
Triangulation (CRDT) mesh. The mesh comprised an inner region that
surrounded all the Munksgaard sediment monitoring sites as well as an
outer region that provides a buffer to help ensure estimates close to
the inner mesh boundary are robust. The maximum permissible triangle
edge length for the inner and outer mesh regions was set to 0.01 and
0.04 (units of the latlong projection) respectively; the smaller the
values, the finer the mesh. This mesh was then projected onto the
locations of the observed samples.
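To make this specification concrete, the following is a minimal
sketch of how such a barrier model might be fit with the `INLA`
package. The data objects (`coords`, `y`, `barrier.tri`) and the PC
prior values are illustrative assumptions, not the project's actual
code.

```r
library(INLA)

## `coords` (sample locations), `y` (one chemical's concentrations) and
## `barrier.tri` (indices of mesh triangles falling on land) are assumed
## to have been prepared already; names are illustrative only.
mesh <- inla.mesh.2d(loc = coords, max.edge = c(0.01, 0.04))  # fine inner, coarser outer mesh

## PC-prior parameterised barrier (Matern) field after Bakka et al.
## The prior values shown are placeholders.
barrier.model <- inla.barrier.pcmatern(mesh, barrier.triangles = barrier.tri,
                                       prior.range = c(0.5, 0.5),
                                       prior.sigma = c(1, 0.5))

## Project the mesh onto the observed sample locations
A <- inla.spde.make.A(mesh, loc = coords)

stk <- inla.stack(data = list(y = y),
                  A = list(A, 1),
                  effects = list(s = 1:mesh$n, intercept = rep(1, length(y))))

fit <- inla(y ~ 0 + intercept + f(s, model = barrier.model),
            data = inla.stack.data(stk),
            control.predictor = list(A = inla.stack.A(stk)),
            family = "gamma")
```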
The above models were fit for each of the sediment chemicals recorded
in the Munksgaard 2012 sediment sampling program. Figure
@fig:inla_model_middle.arm_Zn provides an example of the major elements
of one of the chemical (Zinc) spatial models. Equivalent figures for
the other chemicals are presented in Appendix A.
Figure {@fig:inla_model_middle.arm_Zn}a depicts the random field mesh
with the Munksgaard 2012 sampling sites and Harbour boundary overlaid.
Figure {@fig:inla_model_middle.arm_Zn}b illustrates the boundary used
for the barrier in the spatial model and Figure
{@fig:inla_model_middle.arm_Zn}c illustrates the final predicted spatial
layer (with original sample data overlaid) for Zinc within the Middle
Arm area. For comparison, both predicted (model output) and observed
(Munksgaard samples) are presented on the same colour scale. Figure
{@fig:inla_model_middle.arm_Zn}d illustrates the final predicted spatial
layer for Zinc within the Middle Arm area, where the colour scale is
based purely on the range of predicted values.
{#fig:inla_model_middle.arm_Zn width=80%}
Although the shrinkage in models is a little high (there is a tendency
for the magnitude of change over space to be dampened), generally the
models do a very good job of depicting the relative patterns of change
in space. For this application, the absolute scale of the changes in
patterns are not important (only the relative patterns), since in
order to gauge the accuracy of any candidate sampling designs, it is
necessary to standardize the patterns anyway. Hence it is more
important that the model depict the general patterns in the observed
data than the exact values of the observed data.
\clearpage
Sampling
=============
Ideally, a good sampling design should comprise a configuration of
sites that collectively represent the broader area as efficiently as
possible. In this context, efficiency is a compromise between
complete representation (resulting from saturating the spatial domain
with sites) and minimizing sampling effort.
There are numerous approaches for generating candidate sampling
configurations. Irrespective of the approach, there must be a metric
by which the suitability of the configuration can be gauged. If we
assume that full saturation must provide maximum representation, then
all other configurations can be gauged relative to the full
saturation. Hence a measure of how well a configuration is likely to
represent the full spatial domain is the difference between some
empirical property calculated from the candidate configuration and
full saturation. For example, we could calculate the difference
between the estimated mean Magnesium from a candidate configuration
and the equivalent mean calculated from the full saturation. The
magnitude of this difference is thus a measure of the inaccuracy, and
hence the suitability, of the candidate configuration.
In the current application, there are numerous sediment chemicals that
can be used to describe the underlying conditions within the spatial
domain. Consequently, the metric needs to incorporate the differences
across each of the chemicals. The following metrics will be adopted.
$$
\text{Mean Error} = \frac{1}{n} \sum_{i=1}^{n}{\frac{\lvert\bar{c}_i - \bar{s}_i\rvert}{r_i}}
$$
$$
\text{Max Error} = \max_{i=1,\ldots,n}{\frac{\lvert\bar{c}_i - \bar{s}_i\rvert}{r_i}}
$$
$$
\text{Min Error} = \min_{i=1,\ldots,n}{\frac{\lvert\bar{c}_i - \bar{s}_i\rvert}{r_i}}
$$
where $\bar{c}_i$ and $\bar{s}_i$ are the estimated domain means of
the $i$th chemical (out of $n$) from the candidate and full saturation
configurations respectively, and the normalizing constant ($r_i$) is
the difference between the maximum and minimum predicted values for
the $i$th chemical.
Three metrics are presented so as to capture three broad
characteristics of the 'accuracy' of the candidate sampling designs:
- MeanError - this is a measure of the average deviation between the estimated zone mean (from the candidate model) and target mean from across all chemical species.
- MaxError - this is a measure of the maximum deviation between the estimated zone mean (from the candidate model) and the target mean from across all chemical species.
As a maximum, it can be used to compare the worst performing aspects of each candidate and thus acts as a worst case scenario.
- MinError - this is a measure of the minimum deviation between the estimated zone mean (from the candidate model) and the target mean from across all chemical species.
As a minimum, it can be used to compare the best performing aspects of each candidate and thus acts as a best case scenario.
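As a concrete illustration, the three metrics might be computed as in
the following sketch, where `candidate` and `saturated` are assumed to
be data frames of predicted chemical values (one column per chemical)
at the candidate sites and across the full domain respectively; the
names are illustrative only.

```r
## A minimal sketch of the three error metrics
error_metrics <- function(candidate, saturated) {
  c_bar <- colMeans(candidate)                              # candidate domain means
  s_bar <- colMeans(saturated)                              # full saturation domain means
  r     <- apply(saturated, 2, function(x) diff(range(x)))  # normalising ranges
  err   <- abs(c_bar - s_bar) / r
  c(MeanError = mean(err), MaxError = max(err), MinError = min(err))
}
```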
#### Random sampling
The simplest approach to generating a spatial sampling design is to
repeatedly simulate sampling from the spatial domain with a range of
sample sizes and use the above metric to help determine the optimum
sampling size and configuration. Given a sufficiently large sample
size, random sampling should provide an unbiased and representative
sampling design. However, it is highly likely that at low sample
sizes this approach will not yield highly representative samples (high
$Error$). Yet increasing the sample size should result (on average)
in lower $Error$ (= greater accuracy). To counter the natural
stochasticity associated with simulation, we repeated each sample size
five times.
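A minimal sketch of one such set of replicate random draws is given
below, assuming the `deposition_mask` raster constructed earlier;
`raster::sampleRandom` draws only from non-NA (unmasked) cells.

```r
library(raster)

## Five replicate random draws of 50 sites from the unmasked cells
random_draws <- lapply(1:5, function(i) {
  sampleRandom(deposition_mask, size = 50, xy = TRUE)
})
```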
#### Sampling on a regular grid
In the simple random sampling approach above, each simulated random
draw was independent of all other draws. As a result, all
configurations are possible - even those in which multiple samples are
aggregated together in close proximity. In the absence of any prior
knowledge about the underlying environmental conditions and
heterogeneity, an even and regular spread of samples can ensure that
the sampling design does offer general representativeness. Grids of
increasing sample size offer progressively finer granularity and thus
the ability to detect smaller scale perturbations in space.
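A corresponding regular grid draw might look like the sketch below
(again assuming the `deposition_mask` raster); grid cells that fall in
masked (NA) areas are dropped, so the realised sample size can be
smaller than requested.

```r
library(raster)

## Approximately 50 sites on a regular grid; masked cells are then removed
grid_sites <- sampleRegular(deposition_mask, size = 50, xy = TRUE)
grid_sites <- grid_sites[!is.na(grid_sites[, 3]), ]  # column 3 holds the mask values
```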
#### Conditioned Latin Hypercube Sampling
In drawing both random samples and regular grid samples, the process
is completely naive to the underlying environmental conditions.
Consequently, they are only likely to be representative once a large
number of samples have been collected. Conditioned Latin Hypercube
Sampling (cLHS) is a robust and efficient statistical sampling
procedure that generates samples based on the variability within a
multidimensional covariate feature space [@Minasny-2006-1378]. Since
the sampling process is supervised by the underlying conditions (or
proxies thereof), for a given sample size, the resulting candidate
sampling configurations are more likely to yield samples that are
representative of the complex underlying conditions.
In introducing the procedure, @Minasny-2006-1378 provided a real world
example of a soil mapping project in which relatively small sample
sizes generated by cLHS closely represented the original distribution
of a set of environmental covariates. Furthermore, @Minasny-2006-153
found cLHS superior (in terms of accuracy) to a range of alternative
sampling procedures including random and stratified random sampling.
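A minimal sketch of generating a cLHS design with the `clhs` package
is given below; `chem_stack` (a RasterStack of the masked predicted
chemical layers) is an illustrative name.

```r
library(raster)
library(clhs)  # conditioned Latin hypercube sampling

## Candidate cells with their coordinates and chemical values
cand <- as.data.frame(rasterToPoints(chem_stack))  # columns: x, y, then one per chemical

## Select 100 sites conditioned on the chemical covariates only
idx <- clhs(cand[, -(1:2)], size = 100, simple = TRUE)
clhs_sites <- cand[idx, c("x", "y")]
```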
#### Spatially balanced design
Whilst regular grid sampling designs do space samples out throughout
the spatial domain, they do have the potential to introduce biases due
to any underlying systematic processes that might align with the
design (albeit unintentionally) and do not necessarily provide good
representation. On the other hand, random sampling designs offer some
degree of protection from those biases. However, the sheer nature of
randomization means that it is possible that some sampling locations
can be clustered in very close proximity. When this is the case, not
only does it waste valuable sampling effort (as there is no need to
provide multiple estimates of the same location), it introduces
another bias. Any resulting estimates from the sampling design will
be biased towards the clustered sample conditions, as those conditions
are effectively weighted higher by virtue of the greater sampling
effort.
The ideal design would be a random configuration that still prevents
the clustering of samples: in effect, a way to generate random
locations whose probability of selection is proportional to the
distance from all already-selected sites. This is the inspiration
behind **Spatially balanced designs**.
There are numerous spatially balanced design techniques. Some, such
as Generalized Random-Tessellation Stratified (GRTS), map the full two
dimensional spatial domain into a single dimension (in a way that
preserves the spatial ordering) before applying a systematic $\pi$ps
sampling technique to ensure a balanced distribution of samples
throughout the spatial domain. @Grafstrom-2012 introduced an
alternative technique in which unequal inclusion probabilities are
generated via a pivotal method.
A further (and perhaps alternative) ideal is to be able to have a
balanced design not only based on spatial proximity, but also on the
basis of dissimilarity of underlying conditions. Spatial areas that
are homogeneous with respect to some underlying sampling conditions
require fewer sampling locations to characterise the underlying
patterns than areas that are relatively heterogeneous. The spatially
balanced design via pivotal method allows any number of dimensions to
determine the inclusion probabilities.
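Both variants can be sketched with the local pivotal method
implemented in the `BalancedSampling` package [after @Grafstrom-2012];
the candidate data frame `cand` is assumed to be as in the cLHS sketch
above.

```r
library(BalancedSampling)  # local pivotal method of Grafstrom et al.

n <- 100
p <- rep(n / nrow(cand), nrow(cand))  # equal inclusion probabilities

## 2D spatially balanced design: spread sites in geographic space only
sb_2d <- lpm2(p, as.matrix(cand[, c("x", "y")]))

## nD spatially balanced design: spread sites in space and in the
## (scaled) chemical covariate space simultaneously
sb_nd <- lpm2(p, scale(as.matrix(cand)))
```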
A key determinant in selecting which of the above techniques to adopt
is the purpose of the sampling design. If
for example, the purpose is to characterise the overall condition
mean, then a 2D spatially balanced design is arguably most appropriate
as it should represent the general underlying conditions. If on the
other hand, the purpose is to be able to model the underlying patterns
and understand where any changes in these patterns occur, then a
design that has been optimised around the underlying conditions (such
as an n-dimensional spatially balanced design or conditioned Latin
hypercube sampling technique) is arguably more appropriate.
## Middle Arm
For a range of sample sizes (5, 10, 20, 30, 40, 50, 60, 70, 80, 90,
100, 120, 200, 1000), for Middle Arm, each of the above sampling
approaches was repeated five times. For each run, the $Error$ metric
was calculated. The results are presented in Figures
@fig:samplingEffort (mean error) and @fig:samplingEffortMax (max
error). As expected, as the sample sizes increase, the error
declines. The regular sampling design performs worst. The
random sampling is better than the regular grid sampling. Whilst
clusters of samples might be appropriate for representing conditions
when the conditions cluster correspondingly, totally random samples
are highly unlikely to resemble the correct cluster configuration.
The regular grid sampling suffered from a lack of available locations
from which to sample (once masking from the hydrodynamic modelling,
shipping channels etc. was taken into account). Even a sample size of
10 was too many for a grid. Hence, although a range of sample sizes
was requested, in reality only a small number were actually realised.
The non-uniform distribution of cLHS, on the other hand, was directly
due to the clustering patterns in the underlying conditions and thus
it was not surprising that this technique had the least error.
Interestingly, the reduction in error after a sample size of 50 is
relatively mild (notwithstanding that the figure is presented on a
log-y scale). For each of sample sizes 50, 60, 70, 80 and 100, the
best (based on lowest error) cLHS configuration is presented in
Figure @fig:SamplingDesign_clhs. Comma delimited text files are also
available with the latitudes and longitudes of these sites.
{#fig:samplingEffort width=80%}
{#fig:samplingEffortMax width=80%}
{#fig:samplingEffortMin width=80%}
On the basis of Figures @fig:samplingEffort, @fig:samplingEffortMax
and @fig:samplingEffortMin we could conclude that a sample size of 100
within Middle Arm is a sound choice, although it is likely that as few
as 50 could still potentially yield similar overall patterns. The
sample size of 100 also accommodates a buffer against sample
loss. Nevertheless, this entire simulation process is contingent on a
number of unverifiable assumptions:
1. that the Munksgaard 2012 sediment sampling data are representative of the typical conditions and spatial patterns.
2. all Munksgaard 2012 sediment chemicals are equally useful and informative.
3. the INLA models are able to fully represent the true underlying conditions.
4. the costs and logistics of sampling are equal irrespective of location.
The conditioned Latin hypercube sampling technique consistently
outperformed the other techniques. Interestingly, there was very
little difference between the 2D and nD spatially balanced designs.
This suggests that either the sediment chemicals were relatively
homogeneous over space or else patterns in one chemical species were
countered by patterns in another chemical species.
The conditioned Latin hypercube sampling technique appeared to be able
to tune its design to the underlying sediment chemical patterns better
than the spatially balanced designs. Thus if the above assumptions
are reasonable and the main intention of the sampling was to be able
to describe the patterns in the sediment chemicals, then the sampling
design derived from this technique seems most appropriate.
If, however, the purpose of the sampling design was to provide an
unbiased representative sample of the general conditions across the
spatial domain, then it could be argued that the 2D spatially balanced
design was most appropriate - particularly if there was any doubt in
the above assumptions.
{#fig:SamplingDesign_clhs width=100%}
{#fig:SamplingDesign_regular width=100%}
{#fig:SamplingDesignspatiallybalanced2 width=100%}
{#fig:SamplingDesignspatiallybalanced width=100%}
{#fig:SamplingDesign_spatiallybalanced2_final width=100%}
Conclusions
==============
Overall, 2D spatially balanced sampling designs for Middle Arm would
seem most appropriate. These designs are robust to any uncertainty in
previous data and spatial modelling and should yield well balanced
spatial configurations. A total of 100 samples collected from Middle
Arm is likely to yield representative samples from which to construct
a variety of spatio-temporal models into the future.
Appendix A
===========
{#fig:inla_model_middle.arm_V width=80%}
{#fig:inla_model_middle.arm_Cr width=80%}
{#fig:inla_model_middle.arm_Co width=80%}
{#fig:inla_model_middle.arm_Ni width=80%}
{#fig:inla_model_middle.arm_Cu width=80%}
{#fig:inla_model_middle.arm_Zn width=80%}
{#fig:inla_model_middle.arm_As width=80%}
{#fig:inla_model_middle.arm_Se width=80%}
{#fig:inla_model_middle.arm_Mo width=80%}
{#fig:inla_model_middle.arm_Cd width=80%}
{#fig:inla_model_middle.arm_Pb width=80%}
This document was produced from markdown using knitr on `r version$version.string` on a `r version$platform` system.
```{r, results='markdown'}
sessionInfo()
```
References
====================