% Darwin Harbour Sediment Sampling Design
% Murray Logan
% 22-03-2019
Synopsis
==========
The purpose of this work was to inform the spatial component of the
Darwin Harbour Sediment Monitoring sampling design. In particular, to
determine the 'best' location for 100 sediment monitoring sites in the
East Arm and Outer Harbour sections of Darwin Harbour.
To help inform this process, there were three broad sources of data
available:
1. Hydrodynamic modelling of the entire Darwin Harbour. These data
(provided as geoTiffs) broadly comprise tidal bed shear and velocity
as well as wave driven forces at 10m resolution and will be used to
isolate areas likely to experience deposition (rather than erosion) of
sediments.
2. Munksgaard sediment chemical survey from 2012 provided by Lynda
Radke (as an Excel workbook). These data provided background
information that was used to predict the full spatial distribution
of a range of sediment chemicals. These spatial distributions
then helped tune and evaluate a range of sampling designs.
3. Offset shallow Outer Harbour sediment survey provided by Lynda
Radke (as an Excel workbook). Similar to the Munksgaard data,
these data provided background information for the Outer
Harbour.
The basic procedure involved the following steps:
1. Read in and process the data sources.
2. Fit a barrier spatial model to each of the Munksgaard sediment
chemicals and predict/develop spatial layers for the East Arm section.
3. Fit a barrier spatial model to each of the Offset shallow Outer
Harbour sediment chemicals and predict/develop spatial layers for
the Outer Harbour.
4. Develop masks out of the hydrodynamic model data and use them to
exclude areas of likely erosion from the chemical spatial layers.
5. Use spatial layers representing shipping channels, ports and other
exclusion zones to establish additional masks to apply alongside
hydrodynamic modelling masks to further restrict the sampling domains
and prevent sampling configurations containing sites in the exclusion
zones.
6. Explore five different sample generation routines for a range of
sample sizes to establish an optimal sampling design. The five
routines will be:
a) Using the masked chemical spatial layers to inform Conditioned Latin Hypercube Sampling - this will
generate samples of nominated sizes that are located in a manner that most represents the underlying patterns
in the chemical spatial layers.
b) Completely random sampling - this will select the nominated number of samples from within the masked area
and is completely naive to any underlying spatial patterns (and hence is only likely to be representative of the underlying patterns when
the number of samples is large).
c) A regular sampling grid - this will select approximately the nominated number of samples configured in a
regular grid within the masked area. Like the completely random sampling, the regular sampling grid is completely
naive to the underlying spatial patterns, yet it does guarantee a more even spatial coverage.
d) A spatially balanced design - this will yield a spatially balanced design in which sampling sites are spread out throughout the
spatial domain.
e) A high dimensional spatially balanced design - this will yield a design in which sampling sites are spread in multiple
dimensions (spatial and according to patterns in the underlying chemical distributions).
In addition to the 100 long-term monitoring sites, there are 20
designated sites. These sites are to be sampled more regularly and are
intended for compliance monitoring of specific areas. Although
these sites are additional to the long-term samples, they do form part
of the overall design and thus need to be taken into account when
evaluating candidate configurations.
All code for this project is available on GitHub at <https://github.com/AIMS/darwin-harbour-sampling.git>.
Data processing
=================
## GIS data
A shapefile of Darwin Harbour (see Figure @fig:Map1) was utilized in
order to define the initial sampling domain(s). This project focused
on the Outer Harbour and East Arm. For the purpose of the sediment
monitoring program, East Arm was defined as East Arm, Elizabeth River
and a section of the Middle Harbour adjacent to the city of Darwin. The
Outer Harbour was defined as Outer Harbour and Shoal Bay.
{#fig:Map1 width=70%}
<!-- the above figure is generated in DHS_readData.R -->
## Munksgaard 2012 chemical sediment data
Munksgaard 2012 chemical sediment data were provided by Lynda Radke in
the form of an Excel workbook. These data were consolidated together
into a single flat csv text file to support analyses. The spatial
configuration of the Munksgaard sediment sampling sites is
illustrated in Figure @fig:Map2 (circular points). Primarily, only
the sites within the Outer Harbour and East Arm will be used to inform
the current exploration of future sampling
designs.
Note: while the coverage of East Arm sites was extensive, the Outer
Harbour sites were clustered together in the south east corner of the
Outer Harbour (see Figure @fig:Map2). The use of these Munksgaard
2012 Outer Harbour sediment data to estimate the underlying patterns
throughout the entire Outer Harbour was therefore not appropriate:
modelled patterns are only reliable within the spatial bounds of the
available data, and any attempt to extrapolate to a broader area
(e.g. the rest of the Outer Harbour) is not appropriate.
Consequently, and unfortunately, the Munksgaard sediment data were of
little utility for designing a sampling program for the Outer Harbour.
## Offset Outer Harbour sediment monitoring data
Offset Outer Harbour sediment monitoring data were provided by Lynda
Radke in the form of an Excel workbook. These data were consolidated
together into a single flat csv text file to support analyses. The
spatial configuration of the Offset Outer Harbour sediment sampling
sites is illustrated in Figure @fig:Map2 (square points). These data
were used to inform the selection of Outer Harbour sites.
## Designated sampling sites
In addition to the long-term monitoring sites, a number of more
regularly sampled designated sites were provided by Lynda Radke in the
form of an Excel workbook. These sites are additional to the
long-term sites, yet they needed to be considered in the formulation
of site configurations; they are illustrated in Figure @fig:Map2 as
red points. Note: none of the designated sites fall within the
defined Outer Harbour and East Arm regions and thus they were not
considered further.
{#fig:Map2 width=80%}
<!-- the above figure is generated in DHS_readData.R -->
\newpage
## Hydrodynamic and wave modelling
When designing a sediment monitoring program, it is important to
consider the erosive, transportation and deposition forces operating
on the seabed. Ideally all, if not most, of the sampling sites should
be located in areas that are more likely to experience sediment
deposition than erosion or transportation.
Various hydrodynamic and wave modelling products were made available
by Dr. Richard Brinkman (AIMS) that provide estimates of erosive,
transportation and deposition likelihoods and include:
- current velocity (50th and 75th percentiles) - these were calculated
from a 30 day characteristic spring-neap tidal cycle. Higher
velocity equates to higher likelihood of sediment transport and
erosion and thus lower probability of sediment deposition.
- seabed shear stress (50th and 75th percentiles) - these were derived
from the current velocity and are a measure of the shear forces
likely to be experienced on the sea bed due to tidal movement.
Higher bed shear stress equates to higher likelihood of sediment
transport and erosion and thus lower probability of sediment
deposition.
- wave derived orbital velocity magnitude at seabed - a shallow water
wave model was applied using a 10 m/s wind forcing from a range of
directions (0, 90, 140, 270 and 315 degrees) to simulate the likely
orbital velocity experienced by the seabed. Higher orbital velocity
equates to a higher likelihood of sediment transport and erosion and
thus lower probability of sediment deposition.
- wave derived seabed shear stress - the same shallow water wave model
was expressed in terms of wave bed shear stress. Higher shear stress
equates to higher likelihood of sediment transport and erosion and
thus lower probability of sediment deposition.
Visual representations of each of the above products within the Outer
Harbour and East Arm focal areas are depicted in Figures
@fig:hydro_outer -- @fig:waves_east.
{#fig:hydro_outer width=100%}
{#fig:hydro_east width=100%}
{#fig:waves_outer width=100%}
{#fig:waves_east width=100%}
<!-- the above figures are generated in DHS_readData.R -->
### Masks
The seabed shear stress products provided spatial modelling of the
expected forces acting on the sea bed during a typical spring-neap
tidal cycle. In so doing, they provided proxies for the likelihood
for sediment erosion, transportation and deposition. These products
were used to create masks that exclude areas of high erosive or
transportation potential.
To establish a mask (to focus only on deposition areas), thresholds
need to be established that represent the critical values below
which deposition is likely. Figures @fig:Hjulstroms and
@fig:shearStress provide these for a range of sediment particle sizes
for water velocity and sea bed stress respectively.
{#fig:Hjulstroms width=100%}
<!-- the above figure came from Hemerson -->
{#fig:shearStress width=60%}
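To make this step concrete, the following is a minimal sketch (the
geoTiff file name and `raster` workflow are assumptions for
illustration; the actual code lives in the repository scripts) of
deriving a deposition mask from a shear stress layer by thresholding:

```{r, eval=FALSE}
library(raster)

## hypothetical file name: one of the 10m resolution hydrodynamic geoTiffs
bedshear <- raster("bedshear_p50.tif")

## retain only cells at or below the critical deposition threshold
## (see the threshold discussion below); all other cells become NA
threshold <- 0.2
deposition_mask <- calc(bedshear, function(x) ifelse(x <= threshold, 1, NA))
```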
#### East Arm sampling domain
Figure @fig:particle_sizes illustrates the distribution of percentage
abundance of a range of particle size class categories from the
Munksgaard sampling program. On average, particles in the size
classes 4-62$\mu$m (silt) and 62-250$\mu$m (fine sand) made up 25.4
and 43.4 percent of the sediment samples respectively. Hence, it was
important that future East Arm sampling designs focus on sites that
will ensure deposition of particles of these sizes. According to
Figures @fig:Hjulstroms and @fig:shearStress, particles at or above
29$\mu$m (middle of the silt range) correspond to critical deposition
values of approximately 0.2 m/s velocity and 0.1 seabed shear stress.
The hydrodynamic modelling seabed shear stress products represent the
50th and 75th percentile values. In the case of the 50th percentile,
this means that 50 percent of the time, seabed shear stress was
estimated to be above this value and 50 percent of the time below.
If a mask was based on setting the threshold to correspond to the 50th
percentile, then the masked layer represents areas where sediment
deposition is likely to occur more regularly than erosion and
transport.
Nevertheless, it was important to also examine the distribution of
seabed shear stress across the seabed in order to better understand
the spread of values. Figures @fig:hist_hydro_east and
@fig:hist_waves_east illustrate the frequency of seabed shear stresses
(and velocity) for the East Arm area. The 50th percentile seabed
shear stress drops off its peak at around 0.2 m/s, hence this
appears to be a sensible threshold value.
{#fig:particle_sizes width=80%}
<!-- the above figure is generated in DHS_readData.R -->
{#fig:hist_hydro_east width=80%}
{#fig:hist_waves_east width=80%}
<!-- the above two figures were generated in DHS_thresholdLayers.R -->
Both current velocity and seabed shear stress were derived from the
same model and were thus closely correlated. Similarly, both wave
derived orbital velocity and wave derived seabed shear stress were
correlated. In each case, the shear stress proxies are intended to be
expressions of the forces that are likely to be acting on the seabed
(from tides and waves respectively). Hence only the seabed shear
stress versions of the hydrodynamic and wave models were used as the
proxy estimates of tidal and wave forces.
Figure @fig:masks illustrates the resulting masks for the East Arm
area using a threshold of 0.2 m/s (or 0.3 m/s for the 75th
percentiles). Each of the wave derived seabed shear masks were added
together and then joined with the 50th percentile hydrodynamic seabed
shear mask. The resulting mask is illustrated in Figure
@fig:mask_east. This mask represents (in blue shading) the areas most
likely to experience more deposition than erosion and thus suitable
areas for sediment monitoring. When we compared this mask to the
spatial extent of the Munksgaard 2012 sediment monitoring site
configuration, it was evident that while they broadly overlap, there
was substantially less suitable area in the region adjacent to the
City and substantially more in the East Arm reaches.
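Assuming each individual mask is a binary layer (1 = likely
deposition, 0 = likely erosion/transport; all object names here are
illustrative), the combination described above might be sketched as
follows, with the exact joining rule deferring to the repository
scripts:

```{r, eval=FALSE}
## wave derived shear masks for each wind direction
wave_masks <- stack(wave_000, wave_090, wave_140, wave_270, wave_315)

## add the wave masks together, then join with the 50th percentile tidal
## shear mask: retain only cells flagged as deposition by every layer
combined <- (sum(wave_masks) == nlayers(wave_masks)) & (hydro_p50 == 1)
combined[combined == 0] <- NA
```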
<!--
Alarmingly,
most of the area is masked out on the basis of hydrodynamically derived seabed shear stress (bedshear* and velocity*), and what remains is virtually all then
masked out by the wave model derived seabed shear stresses. Consultation about the appropriate threshold values will be required.
-->
{#fig:masks width=90%}
{#fig:mask_east width=80%}
## Outer Harbour sampling domain
Figure @fig:particle_sizes.outer illustrates the distribution of
percentage abundance of a range of particle size class categories
from the Offset Outer Harbour sampling program. The majority of
particles were classified as sand. Based on Figures @fig:Hjulstroms
and @fig:shearStress, it is likely that the majority of the sediment
particles were between 0.1mm and 1mm and that this corresponds to
critical deposition values of approximately 1 m/s to 5 m/s and 0.2-0.3
seabed shear stress.
{#fig:particle_sizes.outer width=80%}
<!-- the above figure is generated in DHS_readData.R -->
The distribution of hydrodynamic seabed shear stress and velocity (Figure @fig:hist_hydro_outer) had an initial valley at 0.3.
The distribution of wave derived seabed shear stress and velocity
(Figure @fig:hist_waves_outer), however, suggests that the majority of
the Outer Harbour was unaffected by high erosive wave shear forces.
{#fig:hist_hydro_outer width=80%}
{#fig:hist_waves_outer width=80%}
<!-- the above two figures were generated in DHS_thresholdLayers.R -->
{#fig:masks_outer width=90%}
A single combined mask, incorporating the 50th percentile seabed shear
stress and each of the wave derived seabed shear stresses is
illustrated in Figure @fig:mask_outer for Outer Harbour.
{#fig:mask_outer width=80%}
## Exclusion zone masks
In addition to using hydrodynamic modelling masks to exclude areas
that might be considered unsuitable for sediment monitoring, there are
areas throughout Darwin Harbour that are not practical or appropriate
for monitoring. These include areas with a high probability of hard
seabed [@Siwabessy-2019], shipping channel and port exclusion
zones, as well as other exclusion zones associated with major
infrastructure or military bases. Hence additional masks were
developed from spatial layers provided by Lynda Radke.
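As a rough sketch of this step (the shapefile name and objects are
assumptions), exclusion zones can be removed from the deposition mask
with the `raster` package:

```{r, eval=FALSE}
library(raster)

## hypothetical layer of exclusion zones (channels, ports, hard seabed)
exclusions <- shapefile("exclusion_zones.shp")

## drop all cells of the deposition mask inside any exclusion polygon
sampling_domain <- mask(deposition_mask, exclusions, inverse = TRUE)
```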
Figures @fig:Map3a and @fig:Map3b illustrate the sampling masks
associated with East Arm and Outer Harbour respectively.
{#fig:Map3a width=70%}
{#fig:Map3b width=70%}
\clearpage
Spatial model fitting
===============
A target or set of targets is required against which the effectiveness
and accuracy of candidate sampling designs can be tuned or gauged.
This target should represent the full underlying conditions and in
essence represent a saturated sampling design - a sampling design in
which all possible locations/sites are sampled. Whilst this is
logistically not possible, given an adequate set of baseline data,
statistical spatial models can be generated to estimate the underlying
patterns. The resulting predicted layers can be used to represent the
targets.
Spatial models are complex statistical models that attempt to recreate
the full feature space from which a limited set of samples were
collected. In so doing, they attempt to incorporate two-dimensional
patterns and correlations to allow prediction to areas in between
samples.
In the simplest cases, simple surfaces can be derived by linear
interpolation between all the sampling points. However, when samples
are distributed unevenly, there are strong spatial dependencies and/or
the bounding domain is not a simple rectangle, more complex
methodologies are required.
Ecological and environmental processes are often correlated through
space. To account for these spatial dependencies within a spatial
model it is useful to incorporate a Gaussian Random Field (GRF) which
specifies a spatially dependent covariance structure in which
locations that are closer to one another in space will in general be
more highly correlated to one another than locations that are further
apart.
Large or complex spatial models soon become intractable using a
traditional frequentist modelling framework, and equivalent fully
Bayesian models (e.g. fit via MCMC) are typically very
computationally expensive. Integrated Nested Laplace Approximation
[INLA: @Rue-2009-319] is a Bayesian approximation framework that
offers the philosophical advantages of a full Bayesian approach, yet
with the computational efficiency of frequentist approaches.
We can consider a GRF to be stationary if the degree of correlation
between two points is dependent only on the distance between the
points, or non-stationary if we allow the correlation function to vary
over the landscape. An extreme form of non-stationary model occurs
when there are physical barriers that disrupt or block the flow of
contagious processes. In such cases, just because two locations are
in close proximity, does not necessarily mean that they will be highly
correlated. Consider a simple example of the diffusion of a dye in
water throughout a tank. The dye will spread out from the source and
gradually disperse throughout the tank. Consequently, the correlation
between the concentration of dye at any two locations during
dispersion will be dependent on the distance between the two
locations. If however, the tank had a partial barrier that restricted
the flow of dye molecules, then two locations either side of the
barrier might have very different dye concentrations despite being in
close proximity. Barrier models are able to account for these
obstructions.
$$
\begin{align}
y_i &\sim{} \Gamma(\mu_i, \alpha)\\
\log(\mu_i) &= \beta_0 + u(s_i) + \varepsilon_i\\[1em]
\beta_0 &\sim{} N(0,1000)\\
u(s_i) &\sim{} GRF(r, \sigma_u)\\
\varepsilon_i &\sim{} N(0, \sigma_e^2)\\[1em]
\pi(\sigma_e) &= \lambda_e e^{-\lambda_e \sigma_e}\\
\pi(\sigma_u) &= \lambda_0 e^{-\lambda_0 \sigma_u}\\
\pi\left(\frac{1}{r}\right) &= \lambda_1 e^{-\lambda_1 \frac{1}{r}}
\end{align}
$$
where $y_i$ is the $i$th observation of the target chemical variable,
$\varepsilon_i$ is the independent, individual-level random effect
used to model the very short range erratic dependencies and
$u(s_i)$ is the Gaussian Random Field used to model the
long-range structural (autocorrelation) dependencies. A diffuse prior
is applied to the intercept ($\beta_0$) and $\varepsilon_i$ is a vector
of independent Gaussians. The Matern covariance of the spatial random
effect ($u(s_i)$) is defined by two parameters: the range
($r$: representing the length scale of the spatial dependencies) and the
standard deviation ($\sigma_u$: representing the magnitude of the
spatial relative to the independent noise). The smaller the range, the
lower the correlation between two proximal locations.
In a barrier model, two different range parameters ($r$) are applied:
one to the barrier area (in this case land) and the other to the focal
area (in this case the Harbour). By setting the barrier range much
smaller than the focal area range (close to zero), the dependency
structure across barriers is disrupted and approaches zero.
Hyper-parameter priors for the random effects ($\sigma_e$, $\sigma_u$)
as well as $r$ are defined in terms of a $\lambda$ parameter which is
determined by the scale of the response (on the log scale in this
case). Both $\lambda_e$ and $\lambda_0$ were set to the median value
of the target response (on a log scale), and the focal area $r$ was
set to half the width of the spatial domain after [@Bakka-2016-1608].
The random field was approximated via a Constrained Refined Delaunay
Triangulation (CRDT) mesh. The mesh comprised an inner region that
surrounded all the Munksgaard sediment monitoring sites as well as an
outer region that provides a buffer to help ensure estimates close to
the inner mesh boundary are robust. The maximum permissible triangle
edge length for the inner and outer mesh regions was set to 0.01 and
0.04 (degrees, in the lat/long projection) respectively; the smaller
the values, the finer the mesh. This mesh was then projected onto the
locations of the observed samples.
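The following is a minimal sketch (object names such as `coords`,
`land_triangles`, `range0` and `sigma0` are placeholders; the
authoritative code is in the repository scripts) of how such a mesh
and barrier SPDE model can be constructed with the R-INLA package:

```{r, eval=FALSE}
library(INLA)

## CRDT mesh: fine inner region around the sampling sites (coords),
## coarser outer buffer (max.edge in degrees, as above)
mesh <- inla.mesh.2d(loc = coords, max.edge = c(0.01, 0.04))

## barrier Matern SPDE: land_triangles indexes the mesh triangles that
## fall on land; their effective range is shrunk towards zero
spde <- inla.barrier.pcmatern(mesh,
                              barrier.triangles = land_triangles,
                              prior.range = c(range0, 0.5),
                              prior.sigma = c(sigma0, 0.01))
```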
The above models were fit for each of the sediment chemicals recorded
in the Munksgaard 2012 sediment sampling program. Figure
@fig:inla_model_east.arm_Zn provides an example of the major elements
of one of the chemical (Zinc) spatial models. Equivalent figures for
the other chemicals are presented in Appendix A.
Figure {@fig:inla_model_east.arm_Zn}a depicts the random field mesh
with the Munksgaard 2012 sampling sites and Harbour boundary overlayed.
Figure {@fig:inla_model_east.arm_Zn}b illustrates the boundary used
for the barrier in the spatial model and Figure
{@fig:inla_model_east.arm_Zn}c illustrates the final predicted spatial
layer (with original sample data overlayed) for Zinc within the East
Arm area. For comparison, both predicted (model output) and observed
(Munksgaard samples) are presented on the same colour scale. Figure
{@fig:inla_model_east.arm_Zn}d illustrates the final predicted spatial
layer for Zinc within the East Arm area where the colour scale is
based purely on the range of predicted values.
{#fig:inla_model_east.arm_Zn width=80%}
Although the shrinkage in the models is a little high (there is a
tendency for the magnitude of change over space to be dampened), the
models generally do a very good job of depicting the relative patterns
of change in space. For this application, the absolute scale of the
changes is not important (only the relative patterns), since in order
to gauge the accuracy of any candidate sampling design, it is
necessary to standardize the patterns anyway. Hence it is more
important that the models depict the general patterns in the observed
data than reproduce the exact observed values.
\clearpage
Sampling
=============
Ideally, a good sampling design should comprise a configuration of
sites that collectively represent the broader area as efficiently as
possible. In this context, efficiency is a compromise between
complete representation (resulting from saturating the spatial domain
with sites) and minimizing sampling effort.
There are numerous approaches for generating candidate sampling
configurations. Irrespective of the approach, there must be a metric
by which the suitability of the configuration can be gauged. If we
assume that full saturation must provide maximum representation, then
all other configurations can be gauged relative to the full
saturation. Hence a measure of how well a configuration is likely to
represent the full spatial domain is the difference between some
empirical property calculated from the candidate configuration and
full saturation. For example, we could calculate the difference
between the estimated mean Magnesium from a candidate configuration
and the equivalent mean calculated from the full saturation. The
magnitude of this difference is thus a measure of the inaccuracy and
thus suitability of the candidate configuration.
In the current application, there are numerous sediment chemicals that
can be used to describe the underlying conditions within the spatial
domain. Consequently, the metric needs to incorporate the differences
across each of the chemicals. The following metric will be adopted.
$$
\text{Mean Error} = \frac{1}{n} \sum_{i=1}^{n}{\frac{\left|\bar{c}_i - \bar{s}_i\right|}{r_i}}
$$
$$
\text{Max Error} = \max_{i=1,\dots,n}{\frac{\left|\bar{c}_i - \bar{s}_i\right|}{r_i}}
$$
$$
\text{Min Error} = \min_{i=1,\dots,n}{\frac{\left|\bar{c}_i - \bar{s}_i\right|}{r_i}}
$$
where $\bar{c}_i$ and $\bar{s}_i$ are the estimated domain means of
the $i$th chemical (out of $n$) from the candidate and full saturation
configurations respectively, and the normalizing constant ($r_i$) is
the difference between the maximum and minimum predicted values for
the $i$th chemical.
There are three metrics presented to capture three broad
characteristics of the 'accuracy' of the candidate sampling designs:
- MeanError - this is a measure of the average deviation between the estimated zone mean (from the candidate model) and target mean from across all chemical species.
- MaxError - this is a measure of the maximum deviation between the estimated zone mean (from the candidate model) and the target mean from across all chemical species.
As a maximum, it can be used to compare the worst performing aspects of each candidate and thus acts as a worst case scenario.
- MinError - this is a measure of the minimum deviation between the estimated zone mean (from the candidate model) and the target mean from across all chemical species.
As a minimum, it can be used to compare the best performing aspects of each candidate and thus acts as a best case scenario.
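A minimal sketch of these metrics as an R helper (an illustrative
function, not part of the repository), where `cand` and `full` are the
per-chemical domain means from the candidate and saturated designs and
`rng` holds each chemical's predicted range (max minus min):

```{r, eval=FALSE}
design_error <- function(cand, full, rng) {
  dev <- abs(cand - full) / rng  # normalised absolute deviation per chemical
  c(MeanError = mean(dev),
    MaxError  = max(dev),
    MinError  = min(dev))
}
```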
#### Random sampling
The simplest approach to generating a spatial sampling design is to
repeatedly simulate sampling from the spatial domain with a range of
sample sizes and use the above metrics to help determine the optimum
sample size and configuration. Given a sufficiently large sample
size, random sampling should provide an unbiased and representative
sampling design. However, it is highly likely that at low sample
sizes this approach will not yield highly representative samples (high
$Error$). Yet increasing the sample size should result (on average)
in lower $Error$ (= greater accuracy). To counter the natural
stochasticity associated with simulation, each sample size was
repeated five times.
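A single random draw might be sketched as follows (assuming `domain`
is the masked raster stack of predicted chemical layers; names are
illustrative):

```{r, eval=FALSE}
library(raster)

## draw n random cells from the unmasked portion of the domain,
## returning coordinates alongside the predicted chemical values
candidate <- sampleRandom(domain, size = 50, xy = TRUE, na.rm = TRUE)
```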
#### Sampling on a regular grid
In the simple random sampling approach above, each simulated random
draw was independent of all other draws. As a result, all
configurations are possible - even those in which multiple samples are
aggregated together in close proximity. In the absence of any prior
knowledge about the underlying environmental conditions and
heterogeneity, an even and regular spread of samples can ensure that
the sampling design does offer general representativeness. Grids of
increasing sample size offer progressively finer granularity and thus
the ability to detect smaller scale perturbations in space.
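A regular grid draw might be sketched as (same assumed `domain`
object as above):

```{r, eval=FALSE}
## approximately n cells on a regular grid across the domain; cells that
## fall in masked (NA) areas are subsequently dropped, hence 'approximately'
grid_pts <- sampleRegular(domain, size = 50, xy = TRUE)
grid_pts <- grid_pts[complete.cases(grid_pts), ]
```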
#### Conditioned Latin Hypercube Sampling
In drawing both random samples and regular grid samples, the process
is completely naive to the underlying environmental conditions.
Consequently, they were only likely to be representative once a large
number of samples had been collected. Conditioned Latin Hypercube
Sampling (cLHS) is a robust and efficient statistical sampling
procedure that generates samples based on the variability within a
multidimensional covariate feature space [@Minasny-2006-1378]. Since
the sampling process is supervised by the underlying conditions (or
proxies thereof), for a given sample size, the resulting candidate
sampling configurations are more likely to yield samples that are
representative of the complex underlying conditions.
In introducing the procedure, @Minasny-2006-1378 provided a real world
example of a soil mapping project in which relatively small samples
sizes generated by cLHS closely represented the original distribution
of a set of environmental covariates. Furthermore, @Minasny-2006-153
found cLHS superior (in terms of accuracy) to a range of alternative
sampling procedures including random and stratified random sampling.
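A cLHS draw might be sketched with the clhs package (assuming
`domain_df` is a data frame of candidate cell coordinates plus their
predicted chemical values, with `chem_vars` naming the chemical
columns; both names are illustrative):

```{r, eval=FALSE}
library(clhs)

## indices of the cells selected by conditioned Latin hypercube sampling;
## the chemical covariates supervise the stratification
idx <- clhs(domain_df[, chem_vars], size = 50)
candidate <- domain_df[idx, c("x", "y")]
```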
#### Spatially balanced design
Whilst regular grid sampling designs do space samples out throughout
the spatial domain, they do have the potential to introduce biases due
to any underlying systematic processes that might align with the
design (albeit unintentionally) and do not necessarily provide good
representation. On the other hand, random sampling designs offer some
degree of protection from those biases. However, the sheer nature of
randomization means that it is possible for some sampling locations to
be clustered in very close proximity. When this is the case, not
only does it waste valuable sampling effort (as there is no need to
provide multiple estimates of the same location), it introduces
another bias: any resulting estimates from the sampling design will
be biased towards the clustered sample conditions, as those conditions
are effectively weighted higher by virtue of the greater sampling
effort.
The ideal design would be a random configuration that
still prevents the clustering of samples - in effect, a way to
generate random locations whose probability of selection is
proportional to the distance from all already selected sites. This is
the inspiration behind **Spatially balanced designs**.
There are numerous spatially balanced designed techniques. Some such
as Generalized Random-Tessellation Stratified (GRTS), map the full two
dimensional spatial domain into a single dimension (in a way that
preserves the spatial ordering) before applying a systematic $\pi$ps
sampling technique to ensure a balanced distribution of samples
throughout the spatial domain. @Grafstrom-2012 introduced an
alternative technique in which unequal inclusion probabilities are
generated via a pivotal method.
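A 2D spatially balanced draw via the local pivotal method might be
sketched with the BalancedSampling package (object names as in the
cLHS sketch above):

```{r, eval=FALSE}
library(BalancedSampling)

## equal inclusion probabilities summing to the sample size n
n <- 50
N <- nrow(domain_df)
p <- rep(n / N, N)

## spread the sample in geographic (2D) space
idx <- lpm2(p, as.matrix(domain_df[, c("x", "y")]))
candidate <- domain_df[idx, ]
```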
A further (and perhaps alternative) ideal is to be able to have a
balanced design not only based on spatial proximity, but also on the
basis of dissimilarity of underlying conditions. Spatial areas that
are homogeneous with respect to some underlying sampling conditions
require fewer sampling locations to characterise the underlying
patterns than areas that are relatively heterogeneous. The spatially
balanced design via pivotal method allows any number of dimensions to
determine the inclusion probabilities.
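For the high dimensional variant, the spreading matrix can simply be
augmented with the (scaled) chemical covariates so that the sample is
balanced in geographic and covariate space simultaneously (a sketch
under the same assumptions as above):

```{r, eval=FALSE}
## spread in space AND across the underlying chemical gradients
X <- cbind(scale(domain_df[, c("x", "y")]),
           scale(domain_df[, chem_vars]))
idx_nd <- lpm2(p, X)
```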
A key determinant in selecting which of the above techniques to adopt
is based on an evaluation of the purpose of the sampling design. If
for example, the purpose is to characterise the overall condition
mean, then a 2D spatially balanced design is arguably most appropriate
as it should represent the general underlying conditions. If on the
other hand, the purpose is to be able to model the underlying patterns
and understand where any changes in these patterns occur, then
a design that has been optimised around the underlying
conditions (such as an n-dimensional spatially balanced design or the
conditioned Latin hypercube sampling technique) is arguably more
appropriate.
## East Arm
For a range of sample sizes (5, 10, 20, 30, 40, 50, 60, 70, 80, 90,
100, 120, 200, 1000), for East Arm, each of the above sampling
approaches was repeated five times. For each run, the $Error$ metric
was calculated. The results are presented in Figures
@fig:samplingEffort (mean error) and @fig:samplingEffortMax (max
error). As expected, as the sample size increased, the error
declined. The simple random sampling design performed worst, and the
regular grid sampling was better than the random sampling. Whilst
clusters of samples might be appropriate for representing conditions
when the conditions cluster correspondingly, totally random samples
are highly unlikely to resemble the correct cluster configuration.
The non uniform distribution of cLHS on the other hand was directly
due to the clustering patterns in the underlying conditions and thus
it was not surprising that this technique had the least error.
Interestingly, the reduction in error beyond a sample size of 50 was
relatively mild (notwithstanding that the figure is presented on a
log-y scale). For each of sample sizes 50, 60, 70, 80 and 100, the
best (based on lowest error) cLHS configuration is presented in
Figure @fig:SamplingDesign_clhs. Comma delimited text files are also
available with the latitudes and longitudes of these sites.
{#fig:samplingEffort width=80%}
{#fig:samplingEffortMax width=80%}
{#fig:samplingEffortMin width=80%}
On the basis of Figures @fig:samplingEffort, @fig:samplingEffortMax
and @fig:samplingEffortMin we could conclude that a sample size of 100
within East Arm is a sound choice, although it is likely that as few
as 50 could still potentially yield similar overall patterns. The
sample size of 100 also accommodates a buffer against sample
loss. Nevertheless, this entire simulation process is contingent on a
number of unverifiable assumptions:
1. that the Munksgaard 2012 sediment sampling data are representative of the typical conditions and spatial patterns.
2. all Munksgaard 2012 sediment chemicals are equally useful and informative.
3. the INLA models are able to fully represent the true underlying conditions.
4. the costs and logistics of sampling are equal irrespective of location.
The conditioned Latin hypercube sampling technique consistently
outperformed the other techniques. Interestingly, there was very
little difference between the 2D and nD Spatially balanced designs.
This suggests that either the sediment chemicals were relatively
homogeneous over space or else patterns in one chemical species were
countered by patterns in another chemical species.
The conditioned Latin hypercube sampling technique appeared to be able
to tune its design on the underlying sediment chemical patterns better
than the spatially balanced designs. Thus if the above assumptions
are reasonable and the main intention of the sampling was to be able
to describe the patterns in the sediment chemicals, then the sampling
design derived from this technique seems most appropriate.
If however, the purpose of the sampling design was to provide an
unbiased representative sample of the general conditions across the
spatial domain, then it could be argued that the 2D spatially balanced
design was most appropriate - particularly if there was any doubt in
the above assumptions.
{#fig:SamplingDesign_clhs width=100%}
{#fig:SamplingDesign_regular width=100%}
{#fig:SamplingDesignspatiallybalanced2 width=100%}
{#fig:SamplingDesignspatiallybalanced width=100%}
{#fig:SamplingDesign_spatiallybalanced2_final width=100%}
## Outer Harbour
For a range of sample sizes (5, 10, 20, 30, 40, 50, 60, 70, 80, 90,
100, 120, 200, 1000), for Outer Harbour, each of the above sampling
approaches was repeated five times. For each run, the $Error$ metric
was calculated. The results are presented in Figures
@fig:samplingEffort_outer (mean error) and
@fig:samplingEffortMax_outer (max error). As expected, as the sample
size increased, the error declined. The simple random sampling design
performed worst. The regular grid sampling was better than the random
sampling. Whilst clusters of samples might be appropriate for
representing conditions when the conditions cluster correspondingly,
totally random samples are highly unlikely to resemble the correct
cluster configuration. The non uniform distribution of cLHS on the
other hand was directly due to the clustering patterns in the
underlying conditions and thus it is not surprising that this
technique had the least error.
Interestingly, the reduction in error beyond a sample size of 50 was
relatively mild (notwithstanding that the figure is presented on a
log-y scale). For each of sample sizes 50, 60, 70, 80 and 100, the
best (based on lowest error) cLHS configuration is presented in
Figure @fig:SamplingDesign_clhs_outer. Comma delimited text files are
also available with the latitudes and longitudes of these sites.
{#fig:samplingEffort_outer width=80%}
{#fig:samplingEffortMax_outer width=80%}
{#fig:samplingEffortMin_outer width=80%}
On the basis of Figures @fig:samplingEffort_outer,
@fig:samplingEffortMax_outer and @fig:samplingEffortMin_outer we could
conclude that a sample size of 100 within Outer Harbour was a sound
choice, although it is likely that as few as 50 could still
potentially yield similar overall patterns. The sample size of 100
also accommodates a buffer against sample loss. Nevertheless, this
entire simulation process is contingent on a number of unverifiable
assumptions:
1. that the Offset Outer Harbour sediment sampling data are representative of the typical conditions and spatial patterns.
2. all Offset Outer Harbour sediment chemicals are equally useful and informative.
3. the INLA models are able to fully represent the true underlying conditions.
4. the costs and logistics of sampling are equal irrespective of location.
Contrary to the situation for the East Arm area, the conditioned Latin
hypercube sampling technique only outperformed the other techniques at
very low sample sizes. After a sample size of approximately 30, the
2D spatially balanced design had better Minimum and Maximum Error.
Also of interest is the finding that the multidimensional spatially
balanced design was consistently worse than both a regular grid and 2D
spatially balanced design and on par with a totally random design.
This might suggest that there were fewer distinct spatial patterns in
the sediment chemical data observed in the Offset Outer Harbour
sampling program.
Again, if however, the purpose of the sampling design was to provide an
unbiased representative sample of the general conditions across the
spatial domain, then it could be argued that the 2D spatially balanced
design was most appropriate - particularly if there was any doubt in
the above assumptions.
{#fig:SamplingDesign_clhs_outer width=100%}
{#fig:SamplingDesign_regular_outer width=100%}
{#fig:SamplingDesign_spatiallybalanced2_outer width=100%}
{#fig:SamplingDesign_spatiallybalanced_outer width=100%}
{#fig:SamplingDesign_spatiallybalanced2_final_outer width=100%}
<!--
The Outer Harbour presented two issues that required modifications to the procedure listed above for East Arm.
Firstly, the absence of more extensive spatial coverage in the Munksgaard 2012 sediment monitoring program from which to develop
predictive layers precludes the use of cLHS. As a result, only random sampling and regular grid sampling can be considered.
Secondly, the combined mask derived from the hydrodynamic and wave models leaves virtually no suitable sampling area in the Outer Harbour.
As a result, this second of the investigation will be postponed until further consultation.
-->
<!--
Candidate sampling configurations are generated in a manner that minimizes deviations from the original
latin hypercube and thus yields sampling locations that are most likely to represent the complex underlying conditions.
Whist originally devised by McKay 1979 to improve the speed of computationally expensive operations (such as
Monte-Carlo integration), more recent modifications by Minasny and McBratney 2006 have provided a
Latin Hypercube Sampling (LHS) is a statistical method devised for computational experiments for generating near random samples
from multidimensional feature spaces such that each variable is maximally stratified (Mckay, 1979).
Conditional Latin Hypercube Sampling (cLHS) is a modification of LHS in which the sampling is guaranteed to yield samples
that actually occur.
- Minasny and McBratney (2006)
- efficient method for sampling from the variability of the feature space of multiple environmental covariates
- based on the concept of Latin Hypercube Sampling (LHS) where a sample is drawn from the covariates such that each variable is maximally stratified.
- Conditioned Latin Hypercube Sampling adds the condition that the sample chosen must actually occur on the landscape
- Minasny and McBratney (2006) showed that cLHS closely represented the original distribution of the environmental covariates with relatively small sample sizes in a digital soil mapping project in the Hunter Valley of New South Wales, Australia
- robust sample allocation tool that utilizes a stack of continuous and/or categorical covariates
- The cLHS method is a stratified random procedure that picks samples based on the covariates' distributions.
- This is achieved using an optimisation routine, which minimizes an energy function that describes how well the candidate sampling scheme represents a Latin hypercube of the covariates' distributions
- A set P = {p1, ..., pN} forms a Latin hypercube of an (N x N x P) matrix if there is exactly one point pk occurring in each row of every dimension
- (LHS) is a statistical method for generating a near-random sample of parameter values from a multidimensional distribution.
- In the context of statistical sampling, a square grid containing sample positions is a Latin square if (and only if) there is only one sample in each row and each column.
- A Latin hypercube is the generalisation of this concept to an arbitrary number of dimensions, whereby each sample is the only one in each axis-aligned hyperplane containing it.
-->
Conclusions
==============
Overall, 2D spatially balanced sampling designs for both Outer Harbour
and East Arm would seem most appropriate. These designs are immune to
any uncertainty in previous data and spatial modelling and should
yield well balanced spatial configurations. A total of 100 samples
collected from each of East Arm and Outer Harbour is likely to yield
representative samples from which to construct a variety of
spatio-temporal models into the future.
Appendix A
===========
{#fig:inla_model_east.arm_Al width=80%}
{#fig:inla_model_east.arm_P width=80%}
{#fig:inla_model_east.arm_S width=80%}
{#fig:inla_model_east.arm_Ca width=80%}
{#fig:inla_model_east.arm_V width=80%}
{#fig:inla_model_east.arm_Cr width=80%}
{#fig:inla_model_east.arm_Fe width=80%}
{#fig:inla_model_east.arm_Mn width=80%}
{#fig:inla_model_east.arm_Co width=80%}
{#fig:inla_model_east.arm_Ni width=80%}
{#fig:inla_model_east.arm_Cu width=80%}
{#fig:inla_model_east.arm_Zn width=80%}
{#fig:inla_model_east.arm_Ga width=80%}
{#fig:inla_model_east.arm_As width=80%}
{#fig:inla_model_east.arm_Se width=80%}
{#fig:inla_model_east.arm_Mo width=80%}
{#fig:inla_model_east.arm_Pb width=80%}
Appendix B
===========
{#fig:inla_model_outer_Al width=80%}
{#fig:inla_model_outer_As width=80%}
{#fig:inla_model_outer_Ce width=80%}
{#fig:inla_model_outer_Cr width=80%}
{#fig:inla_model_outer_Cu width=80%}
{#fig:inla_model_outer_Fe width=80%}
{#fig:inla_model_outer_Mn width=80%}
{#fig:inla_model_outer_Ni width=80%}
{#fig:inla_model_outer_Pb width=80%}
{#fig:inla_model_outer_Sb width=80%}
{#fig:inla_model_outer_V width=80%}
{#fig:inla_model_outer_Zn width=80%}
{#fig:inla_model_outer_TOC width=80%}
{#fig:inla_model_outer_TN width=80%}
This document was produced from markdown using knitr on `r version$version.string` on a `r version$platform` system.
```{r, results='markdown'}
sessionInfo()
```
References
====================