
SHRIMP: Sub EqnInterp

sbodorkos edited this page Apr 18, 2017 · 11 revisions

SQUID 2.50 Sub: EqnInterp

[The following is a generalised function, used and reused by SQUID 2.50 on many different occasions during a 'typical' data-reduction. As such, the inputs and outputs of the function have generalised names. The identities of the specific inputs and outputs are defined when the function is called.]

The function takes a text-string written as an 'Excel-formatted' equation, including a prescribed format for referencing 'ratios of interest' (square brackets surrounding double-quotes: e.g. ["206/270"] if the specified equation intends to make use of that ratio). It then calculates EITHER the error-weighted linear regression of Dodson-interpolated nuclide ratios (or Task equations) vs time, with the regression-error calculated at analysis midtime, OR the error-weighted mean of Dodson-interpolated nuclide ratios (or Task equations) and its error, independent of time. The function is intended specifically for 'Excel-formatted' equations that DO utilise one or more 'ratios of interest'; ordinary Excel equations that do not reference any 'ratio of interest' are handled elsewhere.

Note that in BOTH cases, the error-weighting relates solely to the y-errors (x-errors associated with timestamps, if applicable, are assumed zero throughout), and that in BOTH cases, the calculations use index-adjacent ratio error-correlations as described by Ludwig (2009; http://dx.doi.org/10.1016/j.chemgeo.2009.07.004). Much of the process is similar to that outlined in Step 4 (and many of the same subroutines are reused), but with modifications to reflect the potential for multiple 'ratios of interest' to be used in the text-string equation. The EqnInterp subroutine also uses an 'all-purpose' approximation of the index-adjacent ratio error-correlations, again to deal with the potential for multiple 'ratios of interest'.

Once the value, its error, and its MSWD and probability of fit have been determined, they are tested against a threshold for statistical coherence (probability of fit > 0.1). If this test fails, each of the constituent data-points is deleted in turn (one at a time), and the effect of each rejection on the MSWD derived from the remaining points is assessed. If a deletion decreases the MSWD of the remaining points by a specified factor (which depends on the number of data-points in the input and the type of calculation specified) AND statistical coherence is attained, then the (single) rejection is permitted. If the number of data-points in the input is relatively large (> 8 for weighted average, > 9 for linear regression), then a search for a second rejection is permitted.
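The rejection logic can be sketched in Python as follows. This is a minimal illustration only: it assumes a simple uncorrelated error-weighted mean, and the hypothetical `mswd_ratio` threshold stands in for SQUID's N-dependent factor (the real test also re-checks probability of fit, which is omitted here):

```python
def weighted_mean(vals, errs):
    """Error-weighted mean and MSWD (y-errors only; uncorrelated sketch)."""
    w = [1.0 / e**2 for e in errs]
    mean = sum(wi * v for wi, v in zip(w, vals)) / sum(w)
    mswd = sum(wi * (v - mean)**2 for wi, v in zip(w, vals)) / (len(vals) - 1)
    return mean, mswd

def try_single_rejection(vals, errs, mswd_ratio=3.0):
    """Delete each point in turn; permit the rejection that most reduces the
    MSWD of the remaining points, provided the reduction factor reaches the
    (hypothetical) threshold `mswd_ratio`."""
    _, mswd_all = weighted_mean(vals, errs)
    best = None  # (index, MSWD of the remaining points)
    for i in range(len(vals)):
        sub_v = vals[:i] + vals[i+1:]
        sub_e = errs[:i] + errs[i+1:]
        _, mswd_sub = weighted_mean(sub_v, sub_e)
        if best is None or mswd_sub < best[1]:
            best = (i, mswd_sub)
    if best is not None:
        idx, mswd_best = best
        if mswd_best == 0 or mswd_all / mswd_best >= mswd_ratio:
            return idx  # index of the permitted (single) rejection
    return None

vals = [1.00, 1.02, 0.99, 1.01, 1.50]   # last point is an outlier
errs = [0.01] * 5
print(try_single_rejection(vals, errs))  # → 4 (the outlier is rejected)
```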

Usage

EqnInterp(Formula, EqNum, MeanEq, EqValFerr, SqrMinMSWD, WLrej, NoRatSht)

Mandatory variables

Formula: This is a text-string specifying the equation to be evaluated, in 'Excel format' and with specific syntax wherever a 'ratio of interest' is to be employed (e.g. ["206/238"] when the isotopic ratio 206/238 is included in the equation). Specific examples of text-strings representing Excel-format equations to be evaluated include:

ln(["254/238"])  
ln(["206/238"])  
["206/238"]/["254/238"]^2
["238/196"]/["254/238"]^0.66  
(0.03446*["254/238"]+0.868)*["248/254"]  
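As an illustration of the prescribed syntax, the ["num/den"] tokens can be extracted from such a text-string with a short regular expression. This is a hypothetical sketch, not SQUID's own parser; note that duplicate occurrences of a 'ratio of interest' are counted separately, matching the behaviour of EqnRats/EqPkOrd described later on this page:

```python
import re

# Matches the prescribed ["num/den"] syntax for a 'ratio of interest'
RATIO_TOKEN = re.compile(r'\["(\d+)/(\d+)"\]')

def ratios_of_interest(formula):
    """Return (numerator, denominator) string pairs, duplicates included."""
    return RATIO_TOKEN.findall(formula)

print(ratios_of_interest('["206/238"]/["254/238"]^2'))
# → [('206', '238'), ('254', '238')]
```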

EqNum: Integer defining the index-number of the equation to be evaluated, as an input. In 'U-Pb geochronology' mode, the candidate values span -4 to -1 ('Special U-Pb equations'), and then 1 to 50 ('General equations'). Negative indices correspond to specific 'Special U-Pb equations' as follows:

-4 = expression for element concentration (optional, and our initial 2016-17 SQUID replication caters for U only, with no option for Th);
-3 = expression for 232Th/238U (optional, although this is the only way 232Th/238U can be defined in our initial 2016-17 SQUID replication, because we will not be covering dual 206Pb/238U-208Pb/232Th calibration in the initial instance);
-2 = expression for 208Pb/232Th if evaluated directly via calibration (this equation is not applicable in our initial 2016-17 SQUID replication);
-1 = expression for 206Pb/238U (mandatory in our initial 2016-17 SQUID replication).

Positive EqNum values refer to the 'Index' numbers used in the user-defined Task equations. For context, the five specific examples above are re-cited below with their real EqNum values:

ln(["254/238"]) has EqNum = 1  
ln(["206/238"]) has EqNum = 2  
["206/238"]/["254/238"]^2 has EqNum = -1
["238/196"]/["254/238"]^0.66 has EqNum = -4  
(0.03446*["254/238"]+0.868)*["248/254"] has EqNum = -3

MeanEq: The calculated 'mean' y-value (EITHER the error-weighted linear regression evaluated at analysis midtime (NOT the y-intercept at x = 0!), OR the error-weighted average independent of time), as an output.

EqValFerr: The calculated 1-sigma fractional error of the MeanEq value, as an output.

SqrMinMSWD: The calculated Mean Square of Weighted Deviates associated with the Intercept, as an output.

WLrej: Integer (= 0 at start of each equation and each analysis) incremented each time a valid rejection is identified.

Optional variables

NoRatSht: A Boolean dictating whether the EqnInterp output is to be placed on the 'Within-Spot Ratios' sheet (FALSE) or not (TRUE). Its universal default value is FALSE, and FALSE is also the case of interest in the initial 2016-17 SQUID replication.


Definition of variables

Values of type Boolean
IsPkOrder, Bad, Singlescan

Values of type Integer
i, NumRatPks, InterpEqCt, NumPksInclDupes, NumRej, sIndx, ScanNum, RatNum, TotalNumRatios, Indx, Num1Denom2, PkOrder, UndupedPk

Values of type Double
InterpTime, InterpTimeSpan, PkFractErr, ScanPkCts, TotRatTime, RedPk1Ht, RedPk2Ht, NetPkCps, EqValTmp, EqFerr, MSWD, PkF1, PkF2, PkTdelt, MidTime, Intercept, SigmaIntercept, CovSlopeInter, MeanTime, MeanEqSig, Slope, SigmaSlope, Probfit, FractInterpTime, FractLessInterpTime

Vector of length Nspecies (addressed [1 to Nspecies], with Nspecies defined in Step 2) comprising values of type Double
PkFSig

Arrays of size Nscans x Nspecies (addressed [1 to Nscans, 1 to Nspecies], with Nscans defined in Step 2) comprising values of type Double
ReducedPkHt, ReducedPkHtFerr

Other arrays of size specified below, comprising values of type Double
PkInterp, PkInterpFerr, SigRho, EqVal, EqTime, AbsErr, FractErr


'Entry test'

EqnInterp is intended specifically for the evaluation of 'Excel-formatted' equations that DO invoke at least one 'ratio of interest' as part of their formulation. In the below code-block, EqnRats is a vector array with length equal to the total number of equations defined for the Task, and for each equation, EqnRats contains an integer value corresponding to the total number of 'ratios of interest' invoked in that equation (i.e. all duplicate occurrences of a 'ratio of interest' are counted separately). If the equation is of the 'ordinary Excel' variety (i.e. no 'ratio of interest' invoked), then EqnInterp should not be used. The code also contains a check that Nscans >= 1:

TotalNumRatios = EqnRats[EqNum]  --in our initial example, TotalNumRatios = 1
  
If (TotalNumRatios = 0 Or Nscans < 1)  
  MeanEq = _[SQUID error-value]_  
  Exit Sub  
End If   

EqnInterp

Once a positive value for TotalNumRatios has been established, it is doubled to define the total number of 'ratio-peaks'. The next step is to define how many double-interpolated values there will be (i.e. 1 when Nscans = 1, and (Nscans - 1) when Nscans > 1):

NumRatPks = 2 * TotalNumRatios    
Indx = Nscans - 1  

If Nscans = 1
  Singlescan = TRUE  
  sIndx = 1
Else  
  Singlescan = FALSE  
  sIndx = Indx
End If

(I know that code-blocks like this are significantly more long-winded than necessary, but I am keen to keep my documentation of the code as explicit and human-readable as possible, particularly with respect to Booleans. Ludwig made extensive use of abbreviated, implicit syntax for Booleans, and the result is that sections of the VBA code are very difficult to understand, particularly in combination with logic tests.)

With sIndx defined, the sizes of some arrays can now be defined accordingly:

ReDim PkInterpFerr[1 To sIndx, 1 To Nspecies], PkInterp[1 To sIndx, 1 To Nspecies]

The next step is to convert SBM peak-heights, calculate uncertainties, and assign 'working' peak-height ("ReducedPkHt") values, both for SBM-normalised data and for data not normalised to SBM.

In the below code-block, EqPkOrd is a three-dimensional array with a 'depth' of 2, with one row for each equation defined for the Task, and one column for each occurrence of a 'ratio of interest' used in that equation (i.e. all duplicate occurrences of a 'ratio of interest' are counted separately). The third dimension of the array corresponds to numerators when 1, and denominators when 2. All values within this array are integers, corresponding to the indices of the numerators and denominators (within the list [1 to Nspecies]: see text at the beginning of Step 3 for a reminder):

For RatNum = 1 To TotalNumRatios --in our initial example, TotalNumRatios = 1   
  
  For ScanNum = 1 To Nscans
    
    For Num1Denom2 = 1 To 2   
      PkOrder = EqPkOrd[EqNum, RatNum, Num1Denom2]  
      --in first formula, PkOrder = 7 for denominator 238, and 9 for numerator 254  

      If PkOrder > 0
        NetPkCps = PkNetCps[ScanNum, PkOrder] --note that PkNetCps comes from Step 2!

        If NetPkCps = _[SQUID error-value]_  
          ReducedPkHt[ScanNum, PkOrder] = _[SQUID error-value]_  
          Exit For
        End If
    
        ScanPkCts = NetPkCps * count_time_sec[PkOrder] --counts for this scan
    
        If ScanPkCts <= 0 And ScanPkCts > 16 --verbatim, seems nonsensical
          ReducedPkHt[ScanNum, PkOrder] = _[SQUID error-value]_
          Exit For
        End If
    
        If SBMCps[ScanNum, PkOrder] <= 0 --note that SBMCps comes from Step 2!
          SbmNorm = FALSE  --looks universal, no recourse to reset to TRUE later?
        End If
    
        If SbmNorm = TRUE --Normalise to SBM counts for the on-peak time
          ScanPkCts = ScanPkCts / SBMCps[ScanNum, PkOrder]
        End If
    
        PkFractErr = PkFerr[ScanNum, PkOrder] --note that PkFerr comes from Step 2!
    
        If SbmNorm = TRUE  
          PkFractErr = sqrt( PkFractErr ^ 2 + 
                         1 / SBMCps[ScanNum, PkOrder] / count_time_sec[PkOrder] )  
        End If
    
        ReducedPkHt[ScanNum, PkOrder] = ScanPkCts / count_time_sec[PkOrder]
        ReducedPkHtFerr[ScanNum, PkOrder] = PkFractErr
      End If
    Next Num1Denom2
    
  Next ScanNum

Next RatNum   
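For a single scan and peak, the code-block above amounts to the following Python sketch. Variable names are illustrative (not SQUID's own), and the verbatim `ScanPkCts <= 0 And ScanPkCts > 16` test is omitted:

```python
import math

def reduced_peak(net_cps, count_time_sec, pk_ferr, sbm_cps, sbm_norm=True):
    """Sketch of the ReducedPkHt / ReducedPkHtFerr calculation for one
    scan and one peak, following the pseudocode above."""
    scan_pk_cts = net_cps * count_time_sec          # total counts this scan
    if sbm_cps <= 0:
        sbm_norm = False                            # cannot normalise to SBM
    if sbm_norm:
        scan_pk_cts /= sbm_cps                      # normalise to SBM counts
        # add the SBM counting-statistics term in quadrature
        pk_ferr = math.sqrt(pk_ferr**2 + 1.0 / (sbm_cps * count_time_sec))
    reduced_ht = scan_pk_cts / count_time_sec
    return reduced_ht, pk_ferr

ht, ferr = reduced_peak(net_cps=1.0e4, count_time_sec=10.0,
                        pk_ferr=0.01, sbm_cps=2.0e5)
```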

The next step is to calculate the time of interpolation (if needed), followed by the interpolated peak-heights:

For ScanNum = 1 To sIndx

  If Singlescan = FALSE  
    InterpTimeSpan = 0

    For RatNum = 1 To TotalNumRatios --calculate mean time of interpolation  

      For Num1Denom2 = 1 To 2
        PkOrder = EqPkOrd[EqNum, RatNum, Num1Denom2]  

        If PkOrder > 0  
          InterpTimeSpan = InterpTimeSpan + time_stamp_sec[ScanNum, PkOrder]
                       + time_stamp_sec[1 + ScanNum, PkOrder]  
        End If  
      Next Num1Denom2
  
    Next RatNum

    InterpTime = InterpTimeSpan / NumRatPks / 2 --time of interpolation
  End If
  
  For RatNum = 1 To TotalNumRatios --calculate the interpolated peak-heights  

    For Num1Denom2 = 1 To 2
      PkOrder = EqPkOrd[EqNum, RatNum, Num1Denom2]
  
      If PkOrder > 0  
  
        If Singlescan = FALSE  
          PkInterp[ScanNum, PkOrder] = _[SQUID error-value]_
          --Set to error-value unless/until overwritten  
          
          PkTdelt = time_stamp_sec[1 + ScanNum, PkOrder] -  
                     time_stamp_sec[ScanNum, PkOrder]  

          If PkTdelt <= 0  
            GoTo [Bailout 1]  
          End If  
 
          FractInterpTime = (InterpTime - time_stamp_sec[ScanNum, PkOrder]) / PkTdelt
          FractLessInterpTime = 1 - FractInterpTime
          RedPk2Ht = ReducedPkHt[1 + ScanNum, PkOrder]
        End If
    
        RedPk1Ht = ReducedPkHt[ScanNum, PkOrder]  

        If RedPk1Ht = _[SQUID error-value]_ Or RedPk2Ht = _[SQUID error-value]_  
          GoTo [Bailout 1]  
        End If  

        PkF1 = ReducedPkHtFerr[ScanNum, PkOrder]  
    
        If Singlescan = TRUE  
          PkInterp[ScanNum, PkOrder] = RedPk1Ht
          PkInterpFerr[ScanNum, PkOrder] = PkF1
        Else
          PkInterp[ScanNum, PkOrder] = (FractLessInterpTime * RedPk1Ht) +  
                                       (FractInterpTime * RedPk2Ht)
          PkF2 = ReducedPkHtFerr[1 + ScanNum, PkOrder]
          PkInterpFerr[ScanNum, PkOrder] = sqrt( (FractLessInterpTime * PkF1) ^ 2 +  
                                                 (FractInterpTime * PkF2) ^ 2 )  
        End If
    
      End If
  
    Next Num1Denom2  

  Next RatNum  

Bailout 1:

Next ScanNum
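For one peak bracketed by two scans, the interpolation and its error propagation reduce to the following sketch (names illustrative; the 'Bailout 1' path is represented here by an exception):

```python
import math

def interp_peak(t_interp, t1, t2, ht1, ht2, ferr1, ferr2):
    """Sketch of the interpolated peak-height and its fractional error,
    following the PkInterp / PkInterpFerr pseudocode above."""
    dt = t2 - t1                          # PkTdelt
    if dt <= 0:
        raise ValueError("non-increasing timestamps")  # cf. 'Bailout 1'
    f = (t_interp - t1) / dt              # FractInterpTime
    g = 1.0 - f                           # FractLessInterpTime
    value = g * ht1 + f * ht2
    ferr = math.sqrt((g * ferr1)**2 + (f * ferr2)**2)
    return value, ferr

v, fe = interp_peak(t_interp=15.0, t1=10.0, t2=20.0,
                    ht1=100.0, ht2=110.0, ferr1=0.02, ferr2=0.02)
print(v)   # → 105.0 (midpoint average of the two scans)
```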

The next step is to evaluate the equation ('FormulaEval', documented separately), and approximate the uncertainties:

InterpEqCt = 0

For ScanNum = 1 To sIndx
  FormulaEval Formula, EqNum, ScanNum, PkInterp, PkInterpFerr, EqValTmp, EqFerr  
  --to be described and documented separately
  
  If (EqFerr > 0 And EqValTmp <> _[SQUID error-value]_)  --very long 'if'
    InterpEqCt = InterpEqCt + 1  
    --define and redefine the following vectors, preserving their contents:
    ReDim Preserve EqVal[1 To InterpEqCt], FractErr[1 To InterpEqCt],  
    AbsErr[1 To InterpEqCt], EqTime[1 To InterpEqCt] 
           
    EqVal[InterpEqCt] = EqValTmp

    --Ludwig describes the following as 'a kluge to approximate effect of interpolation  
    --combining peaks'. On 2 April 2010, he commented out the following line:  
    --If piNscans > 2 And Not Singlescan Then EqFerr = EqFerr * 1.2  

    AbsErr[InterpEqCt] = Abs(EqFerr * EqValTmp)
    FractErr[InterpEqCt] = EqFerr

    If AbsErr[InterpEqCt] = 0  
      AbsErr[InterpEqCt] = 1E-32
      FractErr[InterpEqCt] = 1E-32
    End If

    NumPksInclDupes = 0  
    TotRatTime = 0

    For UndupedPk = 1 To NoDupePkN[EqNum]
    --NoDupePkN is (I believe) a vector array with length equal to the total  
    --number of equations defined for the Task, and for each equation, NoDupePkN  
    --contains an integer value corresponding to the number of *distinct* species    
    --used in the 'ratios of interest' invoked in the equation.  
  
      PkOrder = EqPkUndupeOrd[EqNum, UndupedPk]  
      --EqPkUndupeOrd is a two-dimensional analogue of EqPkOrd, with one row  
      --for each equation defined in the Task, and one column for distinct species  
      --used in that equation, containing integer values corresponding to the  
      --indices of those species, in order of measurement. See the FormulaEval  
      --documentation for more information and specific examples.  
      
      IsPkOrder = FALSE  
    
      For RatNum = 1 To TotalNumRatios
    
        For Num1Denom2 = 1 To 2
      
          If EqPkOrd[EqNum, RatNum, Num1Denom2] = PkOrder  
            TotRatTime = TotRatTime + time_stamp_sec[ScanNum, PkOrder]
            NumPksInclDupes = NumPksInclDupes + 1  
           
            If Singlescan = FALSE  
              TotRatTime = TotRatTime + time_stamp_sec[1 + ScanNum, PkOrder]
              NumPksInclDupes = NumPksInclDupes + 1
            End If
          
            IsPkOrder = TRUE  
            Exit For --verbatim, but not sure why?
          End If
      
        Next Num1Denom2
    
        If IsPkOrder = TRUE  
          Exit For  
        End If  
     
      Next RatNum
  
    Next UndupedPk  
  
    EqTime[InterpEqCt] = TotRatTime / NumPksInclDupes
  End If

Bailout 3: --doesn't seem to be used, could probably be deleted
Next ScanNum
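Stripped of the duplicate-matching loops (which, via the two Exit For statements, count each distinct peak only once), the EqTime bookkeeping above amounts to averaging the timestamps of the distinct peaks over the bracketing scan-pair. A sketch, with hypothetical 0-based indexing (SQUID's arrays are 1-based):

```python
def eq_time(scan, timestamps, distinct_peaks, single_scan=False):
    """Sketch of the EqTime calculation above: average the timestamps of
    the distinct peaks used in the equation's ratios (cf. EqPkUndupeOrd),
    including both bracketing scans when interpolating."""
    total, n = 0.0, 0
    for pk in distinct_peaks:
        total += timestamps[scan][pk]
        n += 1
        if not single_scan:              # also count the bracketing scan
            total += timestamps[scan + 1][pk]
            n += 1
    return total / n

ts = [[10.0, 12.0],   # scan 0: timestamps of peaks 0 and 1
      [40.0, 42.0]]   # scan 1
print(eq_time(0, ts, [0, 1]))  # → 26.0
```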

The final step is to assemble outputs EqTime, EqVal and AbsErr (which are analogous to the 'Step 3' outputs in the foregoing 'ratio of interest' calculations), to define SigRho (as input for the use of subroutine WtdLinCorr and its sub-subroutines), and to calculate MeanEq and EqValFerr (which are analogous to the 'Step 4' outputs in the foregoing 'ratio of interest' calculations):

If InterpEqCt > 0  
  ReDim Preserve EqTime[1 To InterpEqCt], EqVal[1 To InterpEqCt], AbsErr[1 To InterpEqCt]
  ReDim SigRho[1 To InterpEqCt, 1 To InterpEqCt]

  For i = 1 To InterpEqCt
    SigRho[i, i] = AbsErr[i]

    If i > 1 --Approximate kluge for correlation coefficients
      SigRho[i, i - 1] = 0.25  
      SigRho[i - 1, i] = 0.25  
    End If  
  Next i

  If InterpEqCt = 1  
    MeanEq = EqVal[1]  
    EqValFerr = FractErr[1]  
  
  ElseIf InterpEqCt > 3 And UserLinFits = TRUE    
    WtdLinCorr 2, InterpEqCt, EqVal, SigRho, MSWD, Probfit, NumRej, Intercept,  
               SigmaIntercept, Bad, Slope, SigmaSlope, CovSlopeInter, EqTime
  
    If Bad = TRUE  
      MeanEq = _[SQUID error-value]_  
      EqValFerr = _[SQUID error-value]_  
      Exit Sub
    End If

    MidTime = ( time_stamp_sec[Nscans, Nspecies] + time_stamp_sec[1, 1] ) / 2
    MeanEq = Slope * MidTime + Intercept
    MeanEqSig = sqrt( (MidTime * SigmaSlope) ^ 2 + (SigmaIntercept ^ 2) +  
                     2 * MidTime * CovSlopeInter)

  Else
    WtdLinCorr 1, InterpEqCt, EqVal, SigRho, MSWD, Probfit, NumRej, MeanEq,  
               MeanEqSig, Bad  
           
    If Bad = TRUE  
      MeanEq = _[SQUID error-value]_  
      EqValFerr = _[SQUID error-value]_  
      Exit Sub
    End If

  End If

  If MeanEq = 0  
    EqValFerr = 1
  Else
    EqValFerr = Abs(MeanEqSig / MeanEq)
  End If

End If

--Bodorkos 2017-04-04: next 2 lines to force VBA-Java match  
MeanEq = SigDigStrDbl( MeanEq, 12 ) --12 sig. digits
EqValFerr = SigDigStrDbl( EqValFerr, 12 ) --12 sig. digits  

End Sub
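The final stage (SigRho construction plus the error-weighted mean) can be sketched as a generalised least-squares problem, assuming the banded structure described above: the absolute error of each value on the diagonal, and a fixed correlation of 0.25 between index-adjacent values. All names here are illustrative; WtdLinCorr itself (documented separately) also handles regression, rejection, MSWD and probability of fit, which this sketch omits:

```python
import math

def wtd_mean_acorr(vals, abs_errs, rho=0.25):
    """Error-weighted mean and 1-sigma error using a banded covariance
    matrix: sigma_i^2 on the diagonal, correlation `rho` between
    index-adjacent values, zero elsewhere."""
    n = len(vals)
    cov = [[0.0] * n for _ in range(n)]
    for i in range(n):
        cov[i][i] = abs_errs[i] ** 2
        if i > 0:
            cov[i][i - 1] = cov[i - 1][i] = rho * abs_errs[i] * abs_errs[i - 1]
    # GLS mean: m = (1' C^-1 y) / (1' C^-1 1); solve C x = y and C z = 1
    # simultaneously by Gauss-Jordan elimination (no pivoting; the banded
    # positive-definite matrix keeps the pivots safely nonzero)
    aug = [row[:] + [vals[i], 1.0] for i, row in enumerate(cov)]
    for col in range(n):
        piv = aug[col][col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / piv
                for c in range(col, n + 2):
                    aug[r][c] -= f * aug[col][c]
    x = [aug[i][n] / aug[i][i] for i in range(n)]       # C^-1 y
    z = [aug[i][n + 1] / aug[i][i] for i in range(n)]   # C^-1 1
    mean = sum(x) / sum(z)
    sigma = math.sqrt(1.0 / sum(z))
    return mean, sigma
```

With rho = 0 this collapses to the ordinary error-weighted mean; with rho = 0.25 the adjacent correlations inflate the error of the mean, as expected for partially shared information between index-adjacent interpolated values.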

Sanity checks for EqnInterp: There are two associated with the output of EqnInterp: one that is analogous to that previously used for Step 3 ('SQUID_03...') output, and one that is analogous to that previously used for Step 4 ('SQUID_04...') output.

Step 3-style output: The structure here is completely analogous to that previously used for Step 3 ('SQUID_03...') output. The rows are exactly the same, so the new columns could just be added to the right-hand end of SQUID_03... This is what SQUID does.

For each analysis/fraction, there will be InterpEqCt (mostly equivalent to sIndx = Ndod = Nscans - 1) rows, so for the demo XML, there will be 114 analyses x (Ndod = 5) rows per analysis = 570 rows total. Four columns at left: (1) spot name, (2) date/time, (3) 'scan number' (i.e. InterpEqCt), and (4) 'analysis type' should identify whether the analysis has been identified as being of the 206Pb/238U reference zircon (prefix 'T' in the demo XML), or not ('unknown', all other prefixes in the XML).

The new data-columns (15 in total) will constitute three sets of values for each of the 5 'example equations'. Each set of values is a vector of length InterpEqCt, as follows:

  1. EqTime (label "Time")
  2. EqVal (custom labels to begin with: "U Conc Const" when EqNum = -4; "232/238" when EqNum = -3; "206/238 Calib Const" when EqNum = -1; "Ln254/238" when EqNum = 1; "Ln206/238" when EqNum = 2)
  3. AbsErr (label "+/- 1sig absolute")

Finally, we'll need some exotic sorting, to facilitate comparison with the SQUID-book:

  1. 'analysis type' alphabetical ascending (or whatever is needed to ensure that all 'ref materials' precede all 'unknowns')
  2. 'date/time' ascending (so within the 'analysis type' subdivision, analyses appear in chronological order, as per Step 1 sanity-check)
  3. 'scan (InterpEqCt) number' ascending (so within each analysis, interpolation-rows appear in chronological order)

When the rows are sorted, the contents of the 15 columns will be closely comparable to the 15 columns (starting at AF) in worksheet 'Within-Spot Ratios' in SQUID-book 100142_G6147_original_frozen.xls.

Sanity check: Mean ratio and percentage error, per 'ratio of interest', per analysis
The structure here is completely analogous to that previously used for Step 4 ('SQUID_04...') output. The rows are exactly the same, so the new columns could just be added to the right-hand end of SQUID_04... This is what SQUID does.

This array will have one row per analysis, and two columns per 'ratio of interest' (mean value and percentage error). For the demo XML, the array will have 114 rows of data (one per analysis). Three 'left-hand' columns to allow the rows to be identified and sorted:

  • Title = analysis-specific text-string read from XML
  • Date = analysis-specific date read from XML, to be expressed as YYYY-MM-DD HH24:MI:SS
  • Type = 'ref mat' or 'unknown'; analyses with prefix 'T.' to be labelled 'ref mat', all others 'unknown'

The new data-columns (10 in total) will constitute two values for each of the 5 'example equations', as follows:

  1. MeanEq (custom labels to begin with: "U Conc Const" when EqNum = -4; "232/238" when EqNum = -3; "206/238 Calib Const" when EqNum = -1; "Ln254/238" when EqNum = 1; "Ln206/238" when EqNum = 2)
  2. EqValFerr (label "1sigma %err"), as a percentage, with a maximum (display) value of 9999.

Sorting: Primary criterion = Type (ascending; 'ref mat' before 'unknown', so alphabetical would do), secondary criterion = Date (ascending).
