
Commit f63f810

committed
stats fail
1 parent ea01aa8 commit f63f810

10 files changed: +5100 -36 lines changed

cdms.html: +544 (large diff not rendered by default)

ctt.html: +493 (large diff not rendered by default)

docs/estimation.html: +28 -24
@@ -446,60 +446,63 @@ <h3>Convergence</h3>
446446
<p>(or use <span class="math inline">\(-KL(P(Z|X,\theta^t)||P(Z|X,\theta^{t+1}))\leq 0\)</span>)</p>
447447
</div>
448448
</div>
449-
<div id="fisher-information" class="section level2">
450-
<h2>Fisher Information</h2>
449+
<div id="computerized-adaptive-testingcat" class="section level2">
450+
<h2>Computerized Adaptive Testing (CAT)</h2>
451+
<p>This section introduces computerized adaptive testing (CAT) and the item selection methods it uses.</p>
452+
<div id="fisher-information" class="section level3">
453+
<h3>Fisher Information</h3>
451454
<p>The maximum Fisher information (MFI) method was introduced by Lord (Lord, 1980; Thissen &amp; Mislevy, 2000), and it was the most widespread item selection strategy (ISS) in the early days of CAT.</p>
452455
<p>Fisher information is a measure of the amount of information about the unknown ability <span class="math inline">\(\theta\)</span> carried by the response pattern (Davier et al., 2019).</p>
453-
<div id="definition-1" class="section level3">
454-
<h3>Definition</h3>
456+
<div id="definition-1" class="section level4">
457+
<h4>Definition</h4>
455458
<p>Following Davier et al. (2019), we first define the score function as the first derivative of the log-likelihood function:</p>
456459
<p><span class="math display">\[S(X;\theta)=\sum_{i=1}^n\frac{d\log f(X_i;\theta)}{d\theta}\]</span></p>
457460
<p>where <span class="math inline">\(f(X_i;\theta)\)</span> refers to the likelihood function, <span class="math inline">\(\theta\)</span> is the underlying latent trait, and <span class="math inline">\(X\)</span> represents the observed response pattern.</p>
458461
<p>Fisher information is the second moment of this score function:</p>
459462
<p><span class="math display">\[I(\theta)=E[S(X;\theta)^2]\]</span> where <span class="math inline">\(I(\theta)\)</span> is the Fisher information.</p>
460463
</div>
461-
<div id="mathematical-meanings" class="section level3">
462-
<h3>Mathematical meanings</h3>
464+
<div id="mathematical-meanings" class="section level4">
465+
<h4>Mathematical meanings</h4>
463466
<p>According to Davier et al. (2019), ① it equals the variance of the score function: since <span class="math inline">\(E[S(X;\theta)]=0\)</span>, we get <span class="math display">\[I(\theta)=E[S(X;\theta)^2]-E[S(X;\theta)]^2=Var[S(X;\theta)]\]</span> ② it is the expectation of the negative second-order derivative of the log-likelihood at the true value of the parameter:</p>
464467
<p><span class="math display">\[I(\theta)=-E[l&#39;&#39;(x|\theta)]=-\int\frac{\partial^2\log f(x|\theta)}{\partial\theta^2}f(x|\theta)\,dx\]</span></p>
465468
<p>③ Fisher information reflects the accuracy of the parameter estimate: the larger it is, the more accurate the estimate of <span class="math inline">\(\theta\)</span>, i.e., the more information the responses provide.</p>
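<p>As a quick sanity check of these identities (our own worked example, not taken from the cited sources), consider a single Bernoulli observation with success probability <span class="math inline">\(\theta\)</span>, so that <span class="math inline">\(\log f(x;\theta)=x\log\theta+(1-x)\log(1-\theta)\)</span> and <span class="math inline">\(S(x;\theta)=\frac{x}{\theta}-\frac{1-x}{1-\theta}\)</span>. Then <span class="math inline">\(E[S(X;\theta)]=\frac{\theta}{\theta}-\frac{1-\theta}{1-\theta}=0\)</span>, and both routes give the same information:</p>
<p><span class="math display">\[I(\theta)=E[S(X;\theta)^2]=\theta\cdot\frac{1}{\theta^2}+(1-\theta)\cdot\frac{1}{(1-\theta)^2}=\frac{1}{\theta(1-\theta)},\qquad -E[l&#39;&#39;(x|\theta)]=\frac{\theta}{\theta^2}+\frac{1-\theta}{(1-\theta)^2}=\frac{1}{\theta(1-\theta)}.\]</span></p>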
466469
</div>
467-
<div id="application" class="section level3">
468-
<h3>Application</h3>
470+
<div id="application" class="section level4">
471+
<h4>Application</h4>
469472
<p>According to Davier et al. (2019), the Fisher information of item k is given by <span class="math inline">\(I_k(\theta)=\frac{[P_k&#39;(\theta)]^2}{P_k(\theta)Q_k(\theta)}\)</span>, where <span class="math inline">\(P_k(\theta)\)</span> is the item response function for item k specified by the selected IRT model, <span class="math inline">\(Q_k(\theta)=1-P_k(\theta)\)</span>, and <span class="math inline">\(P_k&#39;(\theta)\)</span> is the first derivative of the item response function with respect to <span class="math inline">\(\theta\)</span>.</p>
470473
<p>Assuming local independence, the test information <span class="math inline">\(I(\theta)\)</span> is additive in the item information, that is, <span class="math inline">\(I(\theta)=\sum_k I_k(\theta)\)</span>.</p>
471474
<p>For the three-parameter logistic (3PL) model, <span class="math inline">\(P_k(\theta)\)</span> is given by <span class="math display">\[P_k(\theta)=c_k+(1-c_k)\frac{e^{a_k(\theta-b_k)}}{1+e^{a_k(\theta-b_k)}}\]</span></p>
472475
<p>where <span class="math inline">\(a_k\)</span>, <span class="math inline">\(b_k\)</span> and <span class="math inline">\(c_k\)</span> respectively refer to the discrimination, difficulty, and guessing parameters for the kth item.</p>
473476
<p>If the MFI method is applied to item selection, then, given the current estimate of <span class="math inline">\(\theta\)</span>, the eligible item in the bank with the largest Fisher information is selected as the next item to be administered.</p>
474477
<p>Since the asymptotic variance of <span class="math inline">\(\hat\theta^{ML}\)</span>, i.e. the maximum likelihood estimate of <span class="math inline">\(\theta\)</span>, is inversely proportional to the test information, the MFI method is widely regarded as a method that minimizes the asymptotic variance of the <span class="math inline">\(\theta\)</span> estimate, that is, one that asymptotically maximizes measurement precision.</p>
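<p>The following minimal sketch illustrates the MFI rule for a 3PL item bank. It is our own illustration, not code from the cited sources: the function names and the NumPy arrays <code>a</code>, <code>b</code>, <code>c</code> holding the calibrated item parameters are assumptions.</p>
<pre><code>import numpy as np

def p_3pl(theta, a, b, c):
    # 3PL item response function P_k(theta)
    return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))

def fisher_info_3pl(theta, a, b, c):
    # Item information I_k(theta) = [P_k'(theta)]^2 / (P_k(theta) Q_k(theta)),
    # using P_k'(theta) = a_k (P_k - c_k) Q_k / (1 - c_k) for the 3PL model.
    p = p_3pl(theta, a, b, c)
    q = 1.0 - p
    dp = a * (p - c) * q / (1.0 - c)
    return dp ** 2 / (p * q)

def select_next_item_mfi(theta_hat, a, b, c, administered):
    # MFI rule: among eligible items, pick the one with the largest
    # Fisher information evaluated at the current ability estimate.
    info = fisher_info_3pl(theta_hat, a, b, c)
    info[list(administered)] = -np.inf   # mask items already given
    return int(np.argmax(info))</code></pre>
<p>Because test information is additive under local independence, the same <code>fisher_info_3pl</code> values can simply be summed over the administered items to track the precision of the current <span class="math inline">\(\theta\)</span> estimate.</p>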
475478
</div>
476-
<div id="drawbacks" class="section level3">
477-
<h3>Drawbacks</h3>
479+
<div id="drawbacks" class="section level4">
480+
<h4>Drawbacks</h4>
478481
<p>Firstly, Fisher information does not naturally apply to cognitive diagnosis because it is by definition based on a continuous variable. In the early phases of CAT, the ability estimate may not yet be accurate, and maximizing information on the basis of an inaccurate and erratic estimate of <span class="math inline">\(\theta\)</span> can be described as “capitalization on chance” (van der Linden &amp; Glas, 2000). Thus, using the MFI in the early stages of a CAT program may not be ideal.</p>
479482
<p>Secondly, the MFI method tends to pick items with large discrimination parameters and rarely uses items with smaller discrimination parameters. This means that some of the items in the item pool may be underutilized. At the same time, the excessive exposure of a small number of highly discriminating items can be a critical threat to the security of the test (Chang, 2015; Chang &amp; Ying, 1999).</p>
480483
<p>In addition, the numbers of items from various content areas or sub-areas often need to be balanced in order to maintain the face and content validity of the CAT (Cheng, Chang, &amp; Yi, 2007; Yi &amp; Chang, 2003).</p>
481484
</div>
482-
<div id="improvement" class="section level3">
483-
<h3>Improvement</h3>
485+
<div id="improvement" class="section level4">
486+
<h4>Improvement</h4>
484487
<p>The global information method was put forward by Chang and Ying (1996); it uses KL distance, or KL information, rather than Fisher information in item selection. They demonstrated that global information is more robust for addressing the instability of ability estimation in the early stage of CAT.</p>
485488
</div>
486489
</div>
487-
<div id="kl-algorithm" class="section level2">
488-
<h2>KL Algorithm</h2>
490+
<div id="kl-algorithm" class="section level3">
491+
<h3>KL Algorithm</h3>
489492
<p>Chang &amp; Ying (1996) proposed the global information method which utilized the KL distance or information instead of Fisher information in item selection. Being more robust, global information could be used to combat the instability of ability estimation in the early stage of CAT.</p>
490493
<p>Fisher information is defined on a continuous variable; when the latent variable is discrete, the KL algorithm is preferred.</p>
491494
<p>The Kullback–Leibler distance (KL-distance) is defined as a natural distance function from a “true” probability distribution, p, to a “target” probability distribution, q. It can be interpreted as the expected extra message-length per datum incurred by using a code based on the wrong (target) distribution rather than a code based on the true distribution.</p>
492-
<div id="definition-2" class="section level3">
493-
<h3>Definition</h3>
495+
<div id="definition-2" class="section level4">
496+
<h4>Definition</h4>
494497
<p>For discrete (not necessarily finite) probability distributions, p={p1, …, pn} and q={q1, …, qn}, the KL-distance is defined to be</p>
495498
<p><span class="math display">\[D_{KL}(P||Q)=\sum_i P(i)ln(\frac {P(i)}{Q(i)})\]</span></p>
496499
<p>For continuous probability densities, <span class="math display">\[D_{KL}(P||Q)=\int_{-\infty}^{\infty} P(x)\ln\left(\frac {P(x)}{Q(x)}\right)dx\]</span></p>
497500
<p>Xu et al.’s (2005) KL Algorithm:</p>
498501
<p>According to Cover &amp; Thomas (1991), KL information is a measure of “distance” between two probability distributions, which can be defined as: <span class="math display">\[d[f,g]=E_f\left[\log \frac{f(x)}{g(x)}\right]\]</span> where f(x) and g(x) are two probability distributions.</p>
499502
<p>However, because d[f, g] and d[g, f] are not symmetric, KL information is not a true distance measure. The KL distance is still used because of its interpretation: the larger d[f, g] is, the easier it is to distinguish statistically between the two probability distributions f(x) and g(x) (Henson &amp; Douglas, 2005).</p>
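<p>A small numerical sketch of the discrete definition and of the asymmetry just mentioned (our own illustration; the function name is ours):</p>
<pre><code>import numpy as np

def kl_distance(p, q):
    # Discrete KL distance D(P||Q) = sum_i P(i) * ln(P(i)/Q(i));
    # terms with P(i) = 0 contribute nothing by convention.
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p != 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

p = [0.5, 0.5]   # fair coin
q = [0.9, 0.1]   # heavily biased coin
print(kl_distance(p, q))   # about 0.511
print(kl_distance(q, p))   # about 0.368 -- D(p||q) and D(q||p) differ</code></pre>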
500503
</div>
501-
<div id="the-kl-algorithm-based-on-kullbackleibler-information-cheng-2009" class="section level3">
502-
<h3>The KL Algorithm Based on Kullback–Leibler Information (Cheng, 2009)</h3>
504+
<div id="the-kl-algorithm-based-on-kullbackleibler-information-cheng-2009" class="section level4">
505+
<h4>The KL Algorithm Based on Kullback–Leibler Information (Cheng, 2009)</h4>
503506
<p>Suppose t items have been administered, and the items still available in the pool form a set <span class="math inline">\(R^{(t)}\)</span> at this stage. Consider item h in <span class="math inline">\(R^{(t)}\)</span>. In cognitive diagnosis, the quantity of interest is the conditional distribution of person i’s item response <span class="math inline">\(U_{ih}\)</span> given his or her latent state, or cognitive profile, <span class="math inline">\(\alpha_i\)</span>. Following the notation of McGlohen and Chang (2008), <span class="math inline">\(\alpha_{i}=(\alpha_{i1},\alpha_{i2},...,\alpha_{ik},...,\alpha_{iK})&#39;\)</span>.</p>
504507
<p>Here <span class="math inline">\(\alpha_{ik}\)</span> = 0 indicates that the ith examinee has not mastered the kth attribute, and <span class="math inline">\(\alpha_{ik}\)</span> = 1 indicates mastery. An attribute is a task, cognitive process, or skill involved in answering an item.</p>
505508
<p>Because the true state is unknown, a global measure of discrimination can be constructed on the basis of the KL distance between the distribution of <span class="math inline">\(U_{ih}\)</span> given the current estimate of person i’s latent cognitive state, i.e., <span class="math inline">\(f(U_{ih}|\hat \alpha_i^{(t)})\)</span>, and the distribution of <span class="math inline">\(U_{ih}\)</span> given other states.</p>
@@ -508,22 +511,23 @@ <h3>The KL Algorithm Based on Kullback–Leibler Information (Cheng, 2009)</h3>
508511
<p>Xu et al. (2003) suggested using the straight sum of the KL distances between <span class="math inline">\(f(U_{ih}|\hat \alpha_i^{(t)})\)</span> and all the <span class="math inline">\(f(U_{ih}|\alpha_c)\)</span>, c = 1, 2, …, <span class="math inline">\(2^K\)</span> (when there are K attributes, there are <span class="math inline">\(2^K\)</span> possible latent cognitive states): <span class="math display">\[KL_h(\hat \alpha_i^{(t)})=\sum_{c=1}^{2^K}D_h(\hat \alpha_i^{(t)}||\alpha_c)\]</span></p>
509512
<p>Then the (t + 1)th item for the ith examinee is the item in <span class="math inline">\(R^{(t)}\)</span> that maximizes <span class="math inline">\(KL_h(\hat \alpha_i^{(t)})\)</span>. This is referred to as the KL algorithm. The items selected using this algorithm are, on average, the most powerful in distinguishing the current latent class estimate from all other possible latent classes.</p>
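<p>A minimal sketch of the KL algorithm for dichotomous items (our own illustration): <code>response_prob(h, alpha)</code> is a placeholder for <span class="math inline">\(P(U_{ih}=1|\alpha)\)</span> under whatever cognitive diagnosis model has been calibrated, and the other names are assumptions.</p>
<pre><code>import numpy as np
from itertools import product

def bernoulli_kl(p, q):
    # KL distance between two Bernoulli response distributions
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

def kl_index(h, alpha_hat, response_prob, all_states):
    # KL_h(alpha_hat) = sum over all 2^K latent states alpha_c of
    # D_h(alpha_hat || alpha_c).
    # response_prob(h, alpha) is a placeholder for P(U_h = 1 | alpha)
    # under the calibrated cognitive diagnosis model.
    p_hat = response_prob(h, alpha_hat)
    return sum(bernoulli_kl(p_hat, response_prob(h, alpha_c))
               for alpha_c in all_states)

def select_next_item_kl(alpha_hat, remaining_items, response_prob, K):
    # Pick the item h in R(t) that maximizes KL_h(alpha_hat)
    all_states = list(product([0, 1], repeat=K))   # the 2^K possible profiles
    return max(remaining_items,
               key=lambda h: kl_index(h, alpha_hat, response_prob, all_states))</code></pre>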
510513
</div>
511-
<div id="use-of-kl-distance" class="section level3">
512-
<h3>Use of KL Distance</h3>
514+
<div id="use-of-kl-distance" class="section level4">
515+
<h4>Use of KL Distance</h4>
513516
<p>KL distance is also helpful for choosing an optimal parameter. For instance, if p(x) is unknown, a parametric family <span class="math inline">\(q(x|\theta)\)</span> can be constructed to approximate p(x). To estimate <span class="math inline">\(\theta\)</span>, draw N samples from p(x) and construct the function:</p>
514517
<p><span class="math display">\[D_{KL}(p||q)=\sum_{i=1}^N p(x_i)\left(\log p(x_i)-\log q(x_i|\theta)\right)\]</span></p>
515518
<p>Then <span class="math inline">\(\theta\)</span> can be estimated by MLE.</p>
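<p>To make the link to MLE explicit (a standard argument, not specific to the sources cited above): the first term does not depend on <span class="math inline">\(\theta\)</span>, so minimizing the KL distance amounts to maximizing the weighted log-likelihood</p>
<p><span class="math display">\[\hat\theta=\arg\min_\theta D_{KL}(p||q)=\arg\max_\theta \sum_{i=1}^N p(x_i)\log q(x_i|\theta),\]</span></p>
<p>which, when p is approximated by the empirical distribution of the N samples, is exactly the maximum likelihood estimate.</p>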
516519
</div>
517520
</div>
518-
<div id="shannon-entropy" class="section level2">
519-
<h2>Shannon entropy</h2>
521+
<div id="shannon-entropy" class="section level3">
522+
<h3>Shannon entropy</h3>
520523
<p>It is often necessary to quantify the uncertainty of a random variable, and Shannon entropy is a good candidate for measuring this uncertainty. Cheng (2009) gives an example: a fair coin has an entropy of one unit, while an unfair coin has lower entropy because there is less uncertainty when guessing its outcome.</p>
521-
<div id="definition-3" class="section level3">
522-
<h3>Definition</h3>
524+
<div id="definition-3" class="section level4">
525+
<h4>Definition</h4>
523526
<p>For a discrete random variable X which takes values among <span class="math inline">\(x_1,x_2,...,x_n\)</span>, the Shannon entropy is defined as:</p>
524527
<p><span class="math display">\[H(X)=-\sum_{i=1}^np(x_i)\log_b p(x_i)\]</span></p>
525528
<p>In the definition, <span class="math inline">\(p(x_i)\)</span> is the probability that X = <span class="math inline">\(x_i\)</span>. H(X) can also be written as H(P) or H(<span class="math inline">\(p_1,p_2,...,p_n\)</span>). From the formula we can conclude that independent uncertainties are additive. b is the base of the logarithm, which typically takes the value 2, e, or 10; the choice of b only changes the unit of entropy. For b = 2 the unit is the bit; for b = e, the nat; for b = 10, the dit or digit.</p>
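<p>A quick worked version of the coin example above (our own numbers, with b = 2 so the unit is bits): for a fair coin, <span class="math inline">\(H=-(0.5\log_2 0.5+0.5\log_2 0.5)=1\)</span> bit, whereas for a coin with <span class="math inline">\(p(\text{heads})=0.9\)</span>, <span class="math inline">\(H=-(0.9\log_2 0.9+0.1\log_2 0.1)\approx 0.47\)</span> bits, confirming that the less uncertain, unfair coin has lower entropy.</p>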
526529
</div>
530+
</div>
527531
<div id="properties" class="section level3">
528532
<h3>Properties</h3>
529533
<p>The choice of b does not influence the properties of Shannon entropy, so we do not need to care about the value of b in this part.</p>
