The OpenQuake engine uses (Monte Carlo) sampling strategies for propagating epistemic uncertainty at various stages in a calculation. The sampling is based on numpy's pseudo-random number generator. Setting a 'seed' is useful for controlling the initialization of the random number generator, and repeating a calculation using the same seed should result in identical random numbers being generated each time.
Three different seeds are currently recognized and used by the OpenQuake engine:

- `random_seed` is the seed that controls the sampling of branches from both the source model logic tree and the ground motion model logic tree, when the parameter `number_of_logic_tree_samples` is non-zero. It affects both classical calculations and event based calculations.
- `ses_seed` is used to generate the seeds for the ruptures involved in a scenario or event based calculation. In an event based calculation the generation of ruptures is also affected by the `random_seed`, unless full enumeration of the logic tree is used, for the reasons mentioned in the previous paragraph. For both event based and scenario calculations the rupture seeds are used for sampling ground motion values/intensities from a GMPE/IPE, when the parameter `truncation_level` is non-zero. NB: before engine 3.11, sampling ground motion values/intensities from a GMPE/IPE in a scenario calculation was incorrectly controlled by the `random_seed` and not the `ses_seed`.
- `master_seed` is used when generating the epsilons in a calculation involving vulnerability functions with non-zero coefficients of variation. This is a purely risk-related seed, while the previous two are hazard-related seeds.
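Since the sampling relies on numpy's pseudo-random number generator, the effect of fixing a seed can be illustrated outside the engine. The following is a minimal sketch, not the engine's internal seeding code, which is more elaborate:

```python
import numpy as np

# Two generators initialized with the same seed produce identical draws.
# This is why repeating a calculation with the same seeds reproduces the
# same logic-tree samples, rupture seeds and epsilons.
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)
print(np.array_equal(rng_a.random(5), rng_b.random(5)))  # True

# A different seed gives a different, but equally reproducible, sequence.
rng_c = np.random.default_rng(43)
print(np.array_equal(np.random.default_rng(42).random(5), rng_c.random(5)))  # False
```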
What values should I use for `investigation_time`, `ses_per_logic_tree_path`, and `number_of_logic_tree_samples` in my calculation? And what does the `risk_investigation_time` parameter for risk calculations do?
Setting the `number_of_logic_tree_samples` is relatively straightforward. This parameter controls the method used for propagation of the epistemic uncertainty represented in the logic-tree structure and the calculation of statistics such as the mean, median, and quantiles of key results.

`number_of_logic_tree_samples = 0` implies that the engine will perform a so-called 'full enumeration' of the logic-tree, i.e., it will compute the requested results for every end-branch, or 'path', in the logic-tree. Statistics are then computed with consideration of the relative weights assigned to each end-branch.
For models that have complex logic-trees containing thousands, or even millions, of end-branches, a full-enumeration calculation is computationally infeasible. In such cases, a sampling strategy may be preferable and much more tractable. Setting, for instance, `number_of_logic_tree_samples = 100` implies that the engine will randomly choose (i.e., 'sample') 100 end-branches from the complete logic-tree based on the weight assignments. The requested results will be computed for each of these 100 sampled end-branches. Statistics are then computed using the results from the 100 sampled end-branches, where the sampled end-branches are considered to be equi-weighted (1/100 weight for each). Note that once the end-branches have been chosen for the calculation, the initial weights assigned in the logic-tree files have no further role to play in the computation of the statistics of the requested results. As mentioned in the previous section, changing the `random_seed` will result in a different set of paths or end-branches being sampled.
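To make the procedure concrete, here is a minimal, purely illustrative sketch of weighted sampling of end-branches followed by equi-weighted statistics. The branch names, weights, and results are made up, and the engine's actual implementation differs:

```python
import numpy as np

# Hypothetical end-branches and their logic-tree weights (summing to 1).
branches = ["b1", "b2", "b3", "b4"]
weights = [0.4, 0.3, 0.2, 0.1]

rng = np.random.default_rng(123)  # the seed plays the role of random_seed

# Sample 100 end-branches according to their weights (with replacement).
samples = rng.choice(branches, size=100, p=weights)

# Hypothetical per-branch results; the statistic over the samples is
# equi-weighted: each sampled end-branch contributes 1/100 to the mean,
# and the original logic-tree weights play no further role.
results = {"b1": 1.0, "b2": 2.0, "b3": 3.0, "b4": 4.0}
print(np.mean([results[b] for b in samples]))
```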
The `risk_investigation_time` parameter is also fairly straightforward. It affects only the risk part of the computation and does not affect the hazard calculations or results. Two of the most common risk metrics are (1) the time-averaged risk value (damages, losses, fatalities) for a specified time-window, and (2) the risk values (damages, losses, fatalities) corresponding to a set of return periods. The `risk_investigation_time` parameter controls the time-window used for computing the former category of risk metrics. Specifically, setting `risk_investigation_time = 1` will produce average annual risk values, such as average annual collapses, average annual losses, and average annual fatalities. This parameter does not affect the computation of the latter category of risk metrics. For example, the loss exceedance curves will remain the same irrespective of the value set for `risk_investigation_time`, provided all other parameters are kept the same.
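As a rough illustration of the time-averaging (a simplified sketch, not the engine's actual computation), the average risk value can be thought of as the total simulated loss divided by the effective investigation time, scaled by the `risk_investigation_time`:

```python
import numpy as np

# Hypothetical event loss table, accumulated over an effective
# investigation time of 10,000 years
# (investigation_time = 1 x ses_per_logic_tree_path = 10000).
event_losses = np.array([1.2e6, 3.4e5, 7.8e6, 2.1e5])
eff_time = 1 * 10_000

# With risk_investigation_time = 1, the result is an average annual loss.
risk_investigation_time = 1
avg_loss = event_losses.sum() / eff_time * risk_investigation_time
print(avg_loss)
```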
Next, we come to the two parameters `investigation_time` and `ses_per_logic_tree_path`.
If the hazard model includes time-dependent sources, the choice of the `investigation_time` will most likely be dictated by the source model(s), and the engine will raise an error unless you set the value to that required by the source model(s). In this case, the `ses_per_logic_tree_path` parameter can be used to control the effective length of the stochastic event-set (or event catalog) for each end-branch, or 'path', for both full-enumeration and sampling-based calculations. As an example, suppose that the hazard model requires you to set `investigation_time = 1`, because the source model defines 1-year occurrence probabilities for the seismic sources. Further, suppose you have decided to sample 100 branches from the complete logic-tree as your strategy to propagate epistemic uncertainty. Now, setting `ses_per_logic_tree_path = 10000` implies that the engine will generate 10,000 'event-sets' for each of the 100 sampled branches, where each 'event-set' spans 1 year. Note that some of these 1-year event-sets could be empty, implying that no events were generated in those particular 1-year intervals.
On the other hand, if the hazard model contains only time-independent sources, there is no hard constraint on the `investigation_time` parameter. In this case, the `ses_per_logic_tree_path` parameter can be used in conjunction with the `investigation_time` to control the effective length of the stochastic event-set (or event catalog) for each end-branch, or 'path', for both full-enumeration and sampling-based calculations. For instance, the following three calculation settings would produce statistically equivalent risk results:
Calculation 1
number_of_logic_tree_samples = 0
investigation_time = 1
ses_per_logic_tree_path = 10000
risk_investigation_time = 1
Calculation 2
number_of_logic_tree_samples = 0
investigation_time = 50
ses_per_logic_tree_path = 200
risk_investigation_time = 1
Calculation 3
number_of_logic_tree_samples = 0
investigation_time = 10000
ses_per_logic_tree_path = 1
risk_investigation_time = 1
The effective catalog length per branch in such cases is `investigation_time × ses_per_logic_tree_path`; in each of the three calculations above it is 10,000 years (1 × 10,000 = 50 × 200 = 10,000 × 1). The choice of how to split the effective catalog length between the two parameters is up to the modeller/analyst's preference, and there are no performance implications for preferring particular choices.
Note that if you were also computing hazard curves and maps in the above example calculations, the hazard curves output by the first calculation would provide probabilities of exceedance in 1 year, whereas those output by the second calculation would provide probabilities of exceedance in 50 years. All risk results for the three calculations will be statistically equivalent.
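Assuming Poissonian (time-independent) occurrence, probabilities of exceedance for different investigation times are related by a simple transformation, so the two sets of hazard curves carry the same information. A quick illustrative check:

```python
# Under the Poissonian assumption, a probability of exceedance poe_1yr
# over 1 year corresponds to 1 - (1 - poe_1yr)**50 over 50 years.
poe_1yr = 0.002
poe_50yr = 1 - (1 - poe_1yr) ** 50
print(poe_50yr)  # ~0.0953
```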
Starting from engine 3.10, you can get a summary of the total losses across your portfolio of assets arising from each seismic source, over the effective investigation time. For instance, run the event based risk demo as follows:
$ oq engine --run job.ini
and export the output "Source Loss Table". You should see a table like the one below:
| source | loss_type | loss_value |
|--------|-----------|-------------|
| 231 | nonstructural | 1.07658E+10 |
| 231 | structural | 1.63773E+10 |
| 386 | nonstructural | 3.82246E+07 |
| 386 | structural | 6.18172E+07 |
| 238 | nonstructural | 2.75016E+08 |
| 238 | structural | 4.58682E+08 |
| 239 | nonstructural | 4.51321E+05 |
| 239 | structural | 7.62048E+05 |
| 240 | nonstructural | 9.49753E+04 |
| 240 | structural | 1.58884E+05 |
| 280 | nonstructural | 6.44677E+03 |
| 280 | structural | 1.14898E+04 |
| 374 | nonstructural | 8.14875E+07 |
| 374 | structural | 1.35158E+08 |
| ⋮ | ⋮ | ⋮ |
From this table one can infer the sources causing the highest total losses for the portfolio of assets within the specified effective investigation time.
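Assuming the table has been exported to a CSV file (the file name below is hypothetical, and the actual export may use slightly different headers than the column names shown above), the sources can be ranked by total loss with a few lines of pandas:

```python
import pandas as pd

# Hypothetical path: use the actual location of the exported
# "Source Loss Table" output.
slt = pd.read_csv("src_loss_table.csv")

# Total loss per source across loss types, largest first.
totals = (slt.groupby("source")["loss_value"]
             .sum()
             .sort_values(ascending=False))
print(totals.head())
```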
The probable maximum loss (PML) for a given return period is built from the losses in the event loss table, depending on the effective investigation time. The algorithm used is documented in detail in the advanced manual, at the end of the section about risk calculations. That section also explains why the PML or the loss curves sometimes contain NaN values (the effective investigation time is too short compared to the return period).
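The following sketch illustrates the rank-based idea behind such curves in simplified form (the engine's documented algorithm differs in its details): the i-th largest loss is assigned a return period of `eff_time / i`, so return periods longer than the effective investigation time cannot be resolved, hence the NaN values:

```python
import numpy as np

def losses_by_return_period(event_losses, eff_time, return_periods):
    """Simplified rank-based loss curve: the i-th largest loss (i = 1, 2, ...)
    is assigned a return period of eff_time / i; requested return periods
    longer than eff_time yield NaN."""
    losses = np.sort(event_losses)[::-1]            # largest first
    periods = eff_time / np.arange(1, len(losses) + 1)
    out = []
    for rp in return_periods:
        if rp > eff_time:
            out.append(np.nan)                      # catalog too short
        else:
            # largest loss whose assigned return period does not exceed rp
            out.append(losses[periods <= rp][0])
    return out

# Synthetic event losses over a 10,000-year effective investigation time;
# the 50,000-year return period cannot be resolved and comes out as NaN.
losses = np.random.default_rng(0).lognormal(13, 1, size=5000)
print(losses_by_return_period(losses, eff_time=10_000,
                              return_periods=[100, 1000, 50_000]))
```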