SHRIMP: Sub EqnInterp
[The following is a generalised function, used and reused by SQUID 2.50 on many different occasions during a 'typical' data-reduction. As such, the inputs and outputs of the functions have generalised names. The identities of the specific inputs and outputs are defined when the function is called.]
The function takes a text-string written as an 'Excel-formatted' equation, including a prescribed format for referencing 'ratios of interest' (square brackets surrounding double-quotes: e.g. ["206/270"] if the specified equation makes use of that ratio). It then calculates EITHER the error-weighted linear regression of Dodson-interpolated nuclide ratios (or Task equations) vs time, with the regression error calculated at analysis midtime, OR the error-weighted mean of Dodson-interpolated nuclide ratios (or Task equations) and its error, independent of time. The function is intended specifically for 'Excel-formatted' equations that DO utilise one or more 'ratios of interest': ordinary Excel equations that do not reference any 'ratio of interest' are handled elsewhere.
Note that in BOTH cases, the error-weighting relates solely to the y-errors (x-errors associated with timestamps, if applicable, are assumed zero throughout), and that in BOTH cases, the calculations use index-adjacent ratio error-correlations as described by Ludwig (2009; http://dx.doi.org/10.1016/j.chemgeo.2009.07.004). Much of the process is similar to that outlined in Step 4 (and many of the same subroutines are reused), but with modifications to reflect the potential for multiple 'ratios of interest' to be used in the text-string equation. The EqnInterp subroutine also uses an 'all-purpose' approximation of the index-adjacent ratio error-correlations, again to deal with the potential for multiple 'ratios of interest'.
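For orientation only, here is a minimal sketch (in generalised least-squares form) of an error-weighted mean in which index-adjacent values are assumed to have correlation rho and all other covariances are zero. It is not the literal arithmetic of Ludwig's WtdLinCorr routine (documented separately), merely the standard expressions that such a calculation instantiates, with y_i the double-interpolated values and sigma_i their absolute errors:

\Omega_{ii} = \sigma_i^2, \qquad
\Omega_{i,i+1} = \Omega_{i+1,i} = \rho\,\sigma_i\,\sigma_{i+1}, \qquad
\Omega_{ij} = 0 \ \text{otherwise}

\bar{y} = \frac{\mathbf{1}^{\mathsf{T}} \Omega^{-1} \mathbf{y}}{\mathbf{1}^{\mathsf{T}} \Omega^{-1} \mathbf{1}}, \qquad
\sigma_{\bar{y}}^{2} = \frac{1}{\mathbf{1}^{\mathsf{T}} \Omega^{-1} \mathbf{1}}, \qquad
\mathrm{MSWD} = \frac{(\mathbf{y} - \bar{y}\mathbf{1})^{\mathsf{T}} \Omega^{-1} (\mathbf{y} - \bar{y}\mathbf{1})}{n - 1}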
Once the value, its error, and its MSWD and probability of fit have been determined, they are tested against a threshold for statistical coherence (probability of fit > 0.1). If this test fails, each of the constituent data-points is deleted in turn (sequentially, one at a time) and the effect of each rejection on the MSWD derived from the remaining points is assessed. If a deletion decreases the MSWD of the remaining points by a specified factor (which depends on the number of data-points in the input, and the type of calculation specified) AND statistical coherence is attained, then the (single) rejection is permitted. If the number of data-points in the input is relatively large (> 8 for the weighted average, > 9 for the linear regression), then a search for a second rejection is permitted.
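To illustrate the shape of that rejection scan (and only its shape: this is not Ludwig's WtdAvCorr/WtdLinCorr logic, which is documented separately), here is a minimal Java sketch for the weighted-average case. It uses plain 1/sigma^2 weights and ignores the index-adjacent correlations; RejectionScanSketch, wtdMeanAndMswd, probabilityOfFit and improvementFactor are illustrative stand-ins (the real improvement factor depends on the number of data-points and the calculation type, as noted above), and the chi-squared upper-tail probability is taken from Apache Commons Math:

import org.apache.commons.math3.distribution.ChiSquaredDistribution;

public final class RejectionScanSketch {

    // Hypothetical helper: {weighted mean, MSWD} of the given points,
    // using simple 1/sigma^2 weights (no adjacent-error correlations).
    static double[] wtdMeanAndMswd(double[] values, double[] absErrs) {
        double sumW = 0.0, sumWY = 0.0;
        for (int i = 0; i < values.length; i++) {
            double w = 1.0 / (absErrs[i] * absErrs[i]);
            sumW += w;
            sumWY += w * values[i];
        }
        double mean = sumWY / sumW;
        double chi2 = 0.0;
        for (int i = 0; i < values.length; i++) {
            double resid = (values[i] - mean) / absErrs[i];
            chi2 += resid * resid;
        }
        double mswd = (values.length > 1) ? chi2 / (values.length - 1) : 0.0;
        return new double[]{mean, mswd};
    }

    // Upper-tail chi-squared probability of fit for n points and one fitted parameter.
    static double probabilityOfFit(double mswd, int n) {
        int df = n - 1;
        if (df < 1) {
            return 1.0;
        }
        ChiSquaredDistribution chiSq = new ChiSquaredDistribution(df);
        return 1.0 - chiSq.cumulativeProbability(mswd * df);
    }

    // Returns the index of a permissible single rejection, or -1 if none qualifies.
    static int scanForSingleRejection(double[] values, double[] absErrs,
                                      double improvementFactor) {
        int n = values.length;
        double mswdAll = wtdMeanAndMswd(values, absErrs)[1];
        if (probabilityOfFit(mswdAll, n) > 0.1) {
            return -1;  // already statistically coherent: no rejection needed
        }
        for (int reject = 0; reject < n; reject++) {
            double[] v = new double[n - 1];
            double[] e = new double[n - 1];
            for (int i = 0, j = 0; i < n; i++) {
                if (i != reject) {
                    v[j] = values[i];
                    e[j] = absErrs[i];
                    j++;
                }
            }
            double mswdSub = wtdMeanAndMswd(v, e)[1];
            boolean improved = mswdSub * improvementFactor < mswdAll;
            boolean coherent = probabilityOfFit(mswdSub, n - 1) > 0.1;
            if (improved && coherent) {
                return reject;  // the (single) rejection is permitted
            }
        }
        return -1;
    }
}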
EqnInterp(Formula, EqNum, MeanEq, EqValFerr, SqrMinMSWD, WLrej, NoRatSht)
Formula: This is a text-string specifying the equation to be evaluated, in 'Excel format' and with specific syntax wherever a 'ratio of interest' is to be employed (e.g. ["206/238"] when the isotopic ratio 206/238 is included in the equation; see the parsing sketch following the parameter descriptions below). Specific examples of text-strings representing Excel-format equations to be evaluated include:
ln(["254/238"])
ln(["206/238"])
["206/238"]/["254/238"]^2
["238/196"]/["254/238"]^0.66
(0.03446*["254/238"]+0.868)*["248/254"]
EqNum: Integer defining the index-number of the equation to be evaluated, as an input. In 'U-Pb geochronology' mode, the candidate values span -4 to -1 ('Special U-Pb equations'), and then 1 to 50 ('General equations'). Negative indices correspond to specific 'Special U-Pb equations' as follows:
-4 = expression for element concentration (optional, and our initial 2016-17 SQUID replication caters for U only, with no option for Th);
-3 = expression for 232Th/238U (optional, although this is the only way 232Th/238U can be defined in our initial 2016-17 SQUID replication, because we will not be covering dual 206Pb/238U-208Pb/232Th calibration in the initial instance);
-2 = expression for 208Pb/232Th if evaluated directly via calibration (this equation is not applicable in our initial 2016-17 SQUID replication);
-1 = expression for 206Pb/238U (mandatory in our initial 2016-17 SQUID replication).
Positive EqNum values refer to the 'Index' numbers used in the user-defined Task equations. For context, the five specific examples above are re-cited below with their real EqNum values:
ln(["254/238"]) has EqNum = 1
ln(["206/238"]) has EqNum = 2
["206/238"]/["254/238"]^2 has EqNum = -1
["238/196"]/["254/238"]^0.66 has EqNum = -4
(0.03446*["254/238"]+0.868)*["248/254"] has EqNum = -3
MeanEq: The calculated 'mean' y-value (EITHER the error-weighted linear regression evaluated at analysis midtime (NOT the y-intercept at x = 0!) OR the error-weighted average independent of time), as an output.
EqValFerr: The calculated 1-sigma fractional error of the MeanEq value, as an output.
SqrMinMSWD: The calculated Mean Square of Weighted Deviates associated with the Intercept, as an output.
WLrej: Integer (= 0 at start of each equation and each analysis) incremented each time a valid rejection is identified.
NoRatSht: A Boolean dictating whether the EqnInterp output is to be placed on the 'Within-Spot Ratios' sheet (FALSE) or not (TRUE). Its universal default value is FALSE, and FALSE is also the case of interest in the initial 2016-17 SQUID replication.
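Returning to the Formula syntax: the square-bracketed, double-quoted tokens are easy to recognise mechanically. A minimal Java sketch (not the actual SQUID/Squid formula parser) that extracts the 'ratios of interest' invoked by a Formula string, counting duplicate occurrences separately, might look like this; the class and method names are illustrative only:

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class RatioTokenSketch {

    // Matches tokens of the form ["206/238"], capturing the ratio text.
    private static final Pattern RATIO_TOKEN = Pattern.compile("\\[\"([^\"]+)\"\\]");

    // Returns every 'ratio of interest' invoked in the formula, duplicates included.
    static List<String> ratiosOfInterest(String formula) {
        List<String> ratios = new ArrayList<>();
        Matcher m = RATIO_TOKEN.matcher(formula);
        while (m.find()) {
            ratios.add(m.group(1));
        }
        return ratios;
    }

    public static void main(String[] args) {
        // For the fifth example formula above, this prints [254/238, 248/254]:
        System.out.println(ratiosOfInterest("(0.03446*[\"254/238\"]+0.868)*[\"248/254\"]"));
    }
}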
Values of type Boolean
IsPkOrder, Bad, Singlescan
Values of type Integer
i, NumRatPks, InterpEqCt, NumPksInclDupes, NumRej, sIndx, ScanNum, RatNum, TotalNumRatios, Indx, Num1Denom2, PkOrder, UndupedPk
Values of type Double
InterpTime, InterpTimeSpan, PkFractErr, ScanPkCts, TotRatTime, RedPk1Ht, RedPk2Ht, NetPkCps, EqValTmp, EqFerr, MSWD, PkF1, PkF2, PkTdelt, MidTime, Intercept, SigmaIntercept, CovSlopeInter, MeanTime, MeanEqSig, Slope, SigmaSlope, Probfit, FractInterpTime, FractLessInterpTime
Vector of length Nspecies (addressed [1 to Nspecies], with Nspecies defined in Step 2) comprising values of type Double
PkFSig
Arrays of size Nscans x Nspecies (addressed [1 to Nscans, 1 to Nspecies], with Nscans defined in Step 2) comprising values of type Double
ReducedPkHt, ReducedPkHtFerr
Other arrays of size specified below, comprising values of type Double
PkInterp, PkInterpFerr, SigRho, EqVal, EqTime, AbsErr, FractErr
EqnInterp is intended specifically for the evaluation of 'Excel-formatted' equations that DO invoke at least one 'ratio of interest' as part of their formulation. In the below code-block, EqnRats is a vector array with length equal to the total number of equations defined for the Task, and for each equation, EqnRats contains an integer value corresponding to the total number of 'ratios of interest' invoked in that equation (i.e. all duplicate occurrences of a 'ratio of interest' are counted separately). If the equation is of the 'ordinary Excel' variety (i.e. no 'ratio of interest' invoked), then EqnInterp should not be used. The code also contains a check that Nscans >= 1:
TotalNumRatios = EqnRats[EqNum] --in our initial example, TotalNumRatios = 1
If (TotalNumRatios = 0 Or Nscans < 1)
MeanEq = _[SQUID error-value]_
Exit Sub
End If
Once a positive value for TotalNumRatios has been established, it is doubled to define the total number of 'ratio-peaks'. The next step is to define how many double-interpolated values there will be (i.e. 1 when Nscans = 1, and (Nscans - 1) when Nscans > 1):
NumRatPks = 2 * TotalNumRatios
Indx = Nscans - 1
If Nscans = 1
Singlescan = TRUE
sIndx = 1
Else
Singlescan = FALSE
sIndx = Indx
End If
(I know that code-blocks like this are significantly more long-winded than necessary, but I am keen to keep my documentation of the code as explicit and human-readable as possible, particularly with respect to Booleans. Ludwig made extensive use of abbreviated, implicit syntax for Booleans, and the result is that sections of the VBA code are very difficult to understand, particularly in combination with logic tests.)
With sIndx defined, the sizes of some arrays can now be defined accordingly:
ReDim PkInterpFerr[1 To sIndx, 1 To Nspecies], PkInterp[1 To sIndx, 1 To Nspecies]
The next step is to convert SBM peak-heights, calculate uncertainties, and assign 'working' peak-height values ("ReducedPkHt") to both SBM-normalised data and data not normalised to SBM.
In the below code-block, EqPkOrd is a three-dimensional array with a 'depth' of 2, with one row for each equation defined for the Task, and one column for each occurrence of a 'ratio of interest' used in that equation (i.e. all duplicate occurrences of a 'ratio of interest' are counted separately). The third dimension of the array corresponds to numerators when 1, and denominators when 2. All values within this array are integers, corresponding to the indices of the numerators and denominators (within the list [1 to Nspecies]: see text at the beginning of Step 3 for a reminder):
For RatNum = 1 To TotalNumRatios --in our initial example, TotalNumRatios = 1
For ScanNum = 1 To Nscans
For Num1Denom2 = 1 To 2
PkOrder = EqPkOrd[EqNum, RatNum, Num1Denom2]
--in first formula, PkOrder = 7 for denominator 238, and 9 for numerator 254
If PkOrder > 0
NetPkCps = PkNetCps[ScanNum, PkOrder] --note that PkNetCps comes from Step 2!
If NetPkCps = _[SQUID error-value]_
ReducedPkHt[ScanNum, PkOrder] = _[SQUID error-value]_
Exit For
End If
ScanPkCts = NetPkCps * count_time_sec[PkOrder] --counts for this scan
If ScanPkCts <= 0 And ScanPkCts > 16 --verbatim, seems nonsensical
ReducedPkHt[ScanNum, PkOrder] = _[SQUID error-value]_
Exit For
End If
If SBMCps[ScanNum, PkOrder] <= 0 --note that SBMCps comes from Step 2!
SbmNorm = FALSE --looks universal, no recourse to reset to TRUE later?
End If
If SbmNorm = TRUE --Normalise to SBM counts for the on-peak time
ScanPkCts = ScanPkCts / SBMCps[ScanNum, PkOrder]
End If
PkFractErr = PkFerr[ScanNum, PkOrder] --note that PkFerr comes from Step 2!
If SbmNorm = TRUE
PkFractErr = sqrt( PkFractErr ^ 2 +
1 / SBMCps[ScanNum, PkOrder] / count_time_sec[PkOrder] )
End If
ReducedPkHt[ScanNum, PkOrder] = ScanPkCts / count_time_sec[PkOrder]
ReducedPkHtFerr[ScanNum, PkOrder] = PkFractErr
End If
Next Num1Denom2
Next ScanNum
Next RatNum
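For a single scan and peak, the arithmetic in the loop above reduces to a few lines. A minimal Java sketch of the 'working' peak-height and its fractional error, assuming SBM normalisation is in effect (SbmNorm = TRUE); names are illustrative, not Squid's:

public final class ReducedPkHtSketch {

    // Returns {ReducedPkHt, ReducedPkHtFerr} for one scan/peak, given the
    // net peak counts/sec and fractional error from Step 2, the SBM counts/sec,
    // and the on-peak count time in seconds.
    static double[] reducedPkHtAndFerr(double netPkCps, double pkFerr,
                                       double sbmCps, double countTimeSec) {
        double scanPkCts = netPkCps * countTimeSec;  // counts for this scan
        scanPkCts /= sbmCps;                         // normalise to SBM counts for the on-peak time
        // Add the SBM counting-statistics term in quadrature:
        double pkFractErr = Math.sqrt(pkFerr * pkFerr + 1.0 / (sbmCps * countTimeSec));
        double reducedPkHt = scanPkCts / countTimeSec;
        return new double[]{reducedPkHt, pkFractErr};
    }
}

(When SbmNorm = FALSE, the sketch collapses to ReducedPkHt = NetPkCps and ReducedPkHtFerr = PkFerr, as in the pseudocode.)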
The next step is to calculate the time of interpolation (if needed), followed by the interpolated peak-heights:
For ScanNum = 1 To sIndx
If Singlescan = FALSE
InterpTimeSpan = 0
For RatNum = 1 To TotalNumRatios --calculate mean time of interpolation
For Num1Denom2 = 1 To 2
PkOrder = EqPkOrd[EqNum, RatNum, Num1Denom2]
If PkOrder > 0
InterpTimeSpan = InterpTimeSpan + time_stamp_sec[ScanNum, PkOrder]
+ time_stamp_sec[1 + ScanNum, PkOrder]
End If
Next Num1Denom2
Next RatNum
InterpTime = InterpTimeSpan / NumRatPks / 2 --time of interpolation
End If
For RatNum = 1 To TotalNumRatios --calculate the interpolated peak-heights
For Num1Denom2 = 1 To 2
PkOrder = EqPkOrd[EqNum, RatNum, Num1Denom2]
If PkOrder > 0
If Singlescan = FALSE
PkInterp[ScanNum, PkOrder] = _[SQUID error-value]_
--Set to error-value unless/until overwritten
PkTdelt = time_stamp_sec[1 + ScanNum, PkOrder] -
time_stamp_sec[ScanNum, PkOrder]
If PkTdelt <= 0
GoTo [Bailout 1]
End If
FractInterpTime = (InterpTime - time_stamp_sec[ScanNum, PkOrder]) / PkTdelt
FractLessInterpTime = 1 - FractInterpTime
RedPk2Ht = ReducedPkHt[1 + ScanNum, PkOrder]
End If
RedPk1Ht = ReducedPkHt[ScanNum, PkOrder]
If RedPk1Ht = _[SQUID error-value]_ Or RedPk2Ht = _[SQUID error-value]_
GoTo [Bailout 1]
End If
PkF1 = ReducedPkHtFerr[ScanNum, PkOrder]
If Singlescan = TRUE
PkInterp[ScanNum, PkOrder] = RedPk1Ht
PkInterpFerr[ScanNum, PkOrder] = PkF1
Else
PkInterp[ScanNum, PkOrder] = (FractLessInterpTime * RedPk1Ht) +
(FractInterpTime * RedPk2Ht)
PkF2 = ReducedPkHtFerr[1 + ScanNum, PkOrder]
PkInterpFerr[ScanNum, PkOrder] = sqrt( (FractLessInterpTime * PkF1) ^ 2 +
(FractInterpTime * PkF2) ^ 2 )
End If
End If
Next Num1Denom2
Next RatNum
Bailout 1:
Next ScanNum
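The interpolation itself is just a straight line between the same peak on two index-adjacent scans, evaluated at InterpTime, with the two fractional errors combined in quadrature. A minimal Java sketch of that single step (illustrative names; the Singlescan case simply copies the scan-1 values, as above):

public final class PkInterpSketch {

    // Returns {PkInterp, PkInterpFerr} for one peak, interpolated between
    // scans n and n+1 at interpTime; returns null where the pseudocode
    // bails out (non-increasing time-stamps).
    static double[] interpolatePeak(double t1, double t2,            // time-stamps, scan n and n+1
                                    double pkHt1, double pkHt2,      // ReducedPkHt, scan n and n+1
                                    double pkFerr1, double pkFerr2,  // fractional errors, scan n and n+1
                                    double interpTime) {
        double pkTdelt = t2 - t1;
        if (pkTdelt <= 0) {
            return null;  // corresponds to [Bailout 1]
        }
        double fractInterpTime = (interpTime - t1) / pkTdelt;
        double fractLessInterpTime = 1.0 - fractInterpTime;
        double pkInterp = fractLessInterpTime * pkHt1 + fractInterpTime * pkHt2;
        double pkInterpFerr = Math.sqrt(Math.pow(fractLessInterpTime * pkFerr1, 2)
                                      + Math.pow(fractInterpTime * pkFerr2, 2));
        return new double[]{pkInterp, pkInterpFerr};
    }
}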
The next step is to evaluate the equation ('FormulaEval', documented separately), and approximate the uncertainties:
InterpEqCt = 0
For ScanNum = 1 To sIndx
FormulaEval Formula, EqNum, ScanNum, PkInterp, PkInterpFerr, EqValTmp, EqFerr
--to be described and documented separately
If (EqFerr > 0 And EqValTmp <> _[SQUID error-value]_) --very long 'if'
InterpEqCt = InterpEqCt + 1
--define and redefine the following vectors, preserving their contents:
ReDim Preserve EqVal[1 To InterpEqCt], FractErr[1 To InterpEqCt],
AbsErr[1 To InterpEqCt], EqTime[1 To InterpEqCt]
EqVal[InterpEqCt] = EqValTmp
--Ludwig describes the following as 'a kluge to approximate effect of interpolation
--combining peaks'. On 2 April 2010, he commented out the following line:
--If piNscans > 2 And Not Singlescan Then EqFerr = EqFerr * 1.2
AbsErr[InterpEqCt] = Abs(EqFerr * EqValTmp)
FractErr[InterpEqCt] = EqFerr
If AbsErr[InterpEqCt] = 0
AbsErr[InterpEqCt] = 1E-32
FractErr[InterpEqCt] = 1E-32
End If
NumPksInclDupes = 0
TotRatTime = 0
For UndupedPk = 1 To NoDupePkN[EqNum]
--NoDupePkN is (I believe) a vector array with length equal to the total
--number of equations defined for the Task, and for each equation, NoDupePkN
--contains an integer value corresponding to the number of *distinct* species
--used in the 'ratios of interest' invoked in the equation.
PkOrder = EqPkUndupeOrd[EqNum, UndupedPk]
--EqPkUndupeOrd is a two-dimensional analogue of EqPkOrd, with one row
--for each equation defined in the Task, and one column for distinct species
--used in that equation, containing integer values corresponding to the
--indices of those species, in order of measurement. See the FormulaEval
--documentation for more information and specific examples.
IsPkOrder = FALSE
For RatNum = 1 To TotalNumRatios
For Num1Denom2 = 1 To 2
If EqPkOrd[EqNum, RatNum, Num1Denom2] = PkOrder
TotRatTime = TotRatTime + time_stamp_sec[ScanNum, PkOrder]
NumPksInclDupes = NumPksInclDupes + 1
If Singlescan = FALSE
TotRatTime = TotRatTime + time_stamp_sec[1 + ScanNum, PkOrder]
NumPksInclDupes = NumPksInclDupes + 1
End If
IsPkOrder = TRUE
Exit For --verbatim, but not sure why?
End If
Next Num1Denom2
If IsPkOrder = TRUE
Exit For
End If
Next RatNum
Next UndupedPk
EqTime[InterpEqCt] = TotRatTime / NumPksInclDupes
End If
Bailout 3: --doesn't seem to be used, could probably be deleted
Next ScanNum
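As written, the Exit For statements mean each distinct peak contributes its time-stamp once per scan (not once per duplicate occurrence), so EqTime is simply the mean acquisition time of the distinct peaks used by the equation's ratios, over the two scans spanned by the interpolation. A minimal Java sketch of that intent, with the de-duplication bookkeeping already done (names illustrative):

public final class EqTimeSketch {

    // Mean time-stamp of the distinct peaks invoked by the equation's ratios,
    // over scans scanNum and scanNum + 1 (or scanNum alone when singleScan).
    // timeStampSec is indexed [scan][peak]; distinctPkOrders lists the distinct
    // peak indices used by the equation (entries <= 0 are skipped).
    static double meanRatioTime(double[][] timeStampSec, int[] distinctPkOrders,
                                int scanNum, boolean singleScan) {
        double totRatTime = 0.0;
        int numPks = 0;
        for (int pkOrder : distinctPkOrders) {
            if (pkOrder <= 0) {
                continue;  // peak not used in this equation
            }
            totRatTime += timeStampSec[scanNum][pkOrder];
            numPks++;
            if (!singleScan) {
                totRatTime += timeStampSec[scanNum + 1][pkOrder];
                numPks++;
            }
        }
        return totRatTime / numPks;
    }
}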
The final step is to assemble outputs EqTime, EqVal and AbsErr (which are analogous to the 'Step 3' outputs in the foregoing 'ratio of interest' calculations), to define SigRho (as input for the use of subroutine WtdLinCorr and its sub-subroutines), and to calculate MeanEq and EqValFerr (which are analogous to the 'Step 4' outputs in the foregoing 'ratio of interest' calculations):
If InterpEqCt > 0
ReDim Preserve EqTime[1 To InterpEqCt], EqVal[1 To InterpEqCt], AbsErr[1 To InterpEqCt]
ReDim SigRho[1 To InterpEqCt, 1 To InterpEqCt]
For i = 1 To InterpEqCt
SigRho[i, i] = AbsErr[i]
If i > 1 --Approximate kluge for correlation coefficients
SigRho[i, i - 1] = 0.25
SigRho[i - 1, i] = 0.25
End If
Next i
If InterpEqCt = 1
MeanEq = EqVal[1]
EqValFerr = FractErr[1]
ElseIf InterpEqCt > 3 And UserLinFits = TRUE
WtdLinCorr 2, InterpEqCt, EqVal, SigRho, MSWD, Probfit, NumRej, Intercept,
SigmaIntercept, Bad, Slope, SigmaSlope, CovSlopeInter, EqTime
If Bad = TRUE
MeanEq = _[SQUID error-value]_
EqValFerr = _[SQUID error-value]_
Exit Sub
End If
MidTime = ( time_stamp_sec[Nscans, Nspecies] + time_stamp_sec[1, 1] ) / 2
MeanEq = Slope * MidTime + Intercept
MeanEqSig = sqrt( (MidTime * SigmaSlope) ^ 2 + (SigmaIntercept ^ 2) +
2 * MidTime * CovSlopeInter)
Else
WtdLinCorr 1, InterpEqCt, EqVal, SigRho, MSWD, Probfit, NumRej, MeanEq,
MeanEqSig, Bad
If Bad = TRUE
MeanEq = _[SQUID error-value]_
EqValFerr = _[SQUID error-value]_
Exit Sub
End If
End If
If MeanEq = 0
EqValFerr = 1
Else
EqValFerr = Abs(MeanEqSig / MeanEq)
End If
End If
--Bodorkos 2017-04-04: next 2 lines to force VBA-Java match
MeanEq = SigDigStrDbl( MeanEq, 12 ) --12 sig. digits
EqValFerr = SigDigStrDbl( EqValFerr, 12 ) --12 sig. digits
End Sub
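For the linear-fit branch, once WtdLinCorr has returned the slope, intercept, their uncertainties and covariance, the reported value and its 1-sigma fractional error follow from evaluating the fit at the analysis mid-time. A minimal Java sketch of just that post-processing (not of WtdLinCorr itself; names illustrative):

public final class MidTimeEvalSketch {

    // Returns {MeanEq, EqValFerr}: the fitted value at the analysis mid-time and
    // its 1-sigma fractional error, propagating sigma(slope), sigma(intercept)
    // and their covariance as in the pseudocode above.
    static double[] meanEqAtMidTime(double slope, double sigmaSlope,
                                    double intercept, double sigmaIntercept,
                                    double covSlopeInter,
                                    double firstTimeStampSec, double lastTimeStampSec) {
        double midTime = (firstTimeStampSec + lastTimeStampSec) / 2.0;
        double meanEq = slope * midTime + intercept;
        double meanEqSig = Math.sqrt((midTime * sigmaSlope) * (midTime * sigmaSlope)
                                   + sigmaIntercept * sigmaIntercept
                                   + 2.0 * midTime * covSlopeInter);
        double eqValFerr = (meanEq == 0.0) ? 1.0 : Math.abs(meanEqSig / meanEq);
        return new double[]{meanEq, eqValFerr};
    }
}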
Sanity checks for EqnInterp: There are two associated with the output of EqnInterp: one that is analogous to that previously used for Step 3 ('SQUID_03...') output, and one that is analogous to that previously used for Step 4 ('SQUID_04...') output.
Step 3-style output: The structure here is completely analogous to that previously used for Step 3 ('SQUID_03...') output. The rows are exactly the same, so the new columns could just be added to the right-hand end of SQUID_03... This is what SQUID does.
For each analysis/fraction, there will be InterpEqCt (mostly equivalent to sIndx = Ndod = Nscans - 1) rows, so for the demo XML, there will be 114 analyses x (Ndod = 5) rows per analysis = 570 rows total. Four columns at left: (1) spot name, (2) date/time, (3) 'scan number' (i.e. InterpEqCt), and (4) 'analysis type' should identify whether the analysis has been identified as being of the 206Pb/238U reference zircon (prefix 'T' in the demo XML), or not ('unknown', all other prefixes in the XML).
The new data-columns (15 in total) will constitute three sets of values for each of the 5 'example equations'. Each set of values is a vector of length InterpEqCt, as follows:
- EqTime (label "Time")
- EqVal (custom labels to begin with: "U Conc Const" when EqNum = -4; "232/238" when EqNum = -3; "206/238 Calib Const" when EqNum = -1; "Ln254/238" when EqNum = 1; "Ln206/238" when EqNum = 2)
- AbsErr (label "+/- 1sig absolute")
Finally, we'll need some exotic sorting, to facilitate comparison with the SQUID-book:
- 'analysis type' alphabetical ascending (or whatever is needed to ensure that all 'ref materials' precede all 'unknowns')
- 'date/time' ascending (so within the 'analysis type' subdivision, analyses appear in chronological order, as per Step 1 sanity-check)
- 'scan (InterpEqCt) number' ascending (so within each analysis, interpolation-rows appear in chronological order)
When the rows are sorted, the contents of the 15 columns will be closely comparable to the 15 columns (starting at AF) in worksheet 'Within-Spot Ratios' in SQUID-book 100142_G6147_original_frozen.xls.
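For what it's worth, that three-key ordering is straightforward to express as a comparator. A minimal Java sketch (the row class and field names are illustrative, not Squid's; it assumes 'analysis type' labels that sort 'ref materials' before 'unknowns' alphabetically):

import java.time.LocalDateTime;
import java.util.Comparator;

final class InterpRow {
    String analysisType;     // e.g. "ref mat" sorts before "unknown" alphabetically
    LocalDateTime dateTime;  // analysis date/time read from the XML
    int scanNumber;          // 1..InterpEqCt within the analysis
    // ...plus the 15 data columns
}

final class InterpRowOrderSketch {
    // Primary: analysis type; secondary: date/time; tertiary: scan number.
    static final Comparator<InterpRow> SANITY_CHECK_ORDER =
            Comparator.comparing((InterpRow r) -> r.analysisType)
                      .thenComparing(r -> r.dateTime)
                      .thenComparingInt(r -> r.scanNumber);
}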
Step 4-style output: Mean value and percentage error, per equation, per analysis
The structure here is completely analogous to that previously used for Step 4 ('SQUID_04...') output. The rows are exactly the same, so the new columns could just be added to the right-hand end of SQUID_04... This is what SQUID does.
This array will have one row per analysis, and two columns per equation (mean value and percentage error). For the demo XML, the array will have 114 rows of data (one per analysis). Three 'left-hand' columns to allow the rows to be identified and sorted:
- Title = analysis-specific text-string read from XML
- Date = analysis-specific date read from XML, to be expressed as YYYY-MM-DD HH24:MI:SS
- Type = 'ref mat' or 'unknown'; analyses with prefix 'T.' to be labelled 'ref mat', all others 'unknown'
The new data-columns (10 in total) will constitute two values for each of the 5 'example equations', as follows:
- MeanEq (custom labels to begin with: "U Conc Const" when EqNum = -4; "232/238" when EqNum = -3; "206/238 Calib Const" when EqNum = -1; "Ln254/238" when EqNum = 1; "Ln206/238" when EqNum = 2)
- EqValFerr (label "1sigma %err"), as a percentage, with a maximum (display) value of 9999.
Sorting: Primary criterion = Type (ascending; 'ref mat' before 'unknown', so alphabetical would do), secondary criterion = Date (ascending).