October 19, 2004 --- Class 15 --- Biased Estimators and the Jackknife Method I
Activities:
Biased Estimators Using Mathematica
We again used Mathematica to do the multiple integrals analytically
for the problem we are considering. Without
waiting too long, we were able to get up to a sample size of 6;
however, this result still shows significant bias. In
~sg/oct19_2004.nb you will find a Mathematica notebook showing how to
time the command for doing a four-dimensional integral and the analytic
and numerical results.
We then worked through the latter part of section H of the Mathematica
notes. This involves reformulating the denominator in our
integrand as an integral over a variable that goes from 0 to
infinity. When that is done, we can do all the integrals over the
x variables and are left with a 1-dimensional integral that can
be done numerically. (Earlier versions of Mathematica required that
we cut the integral off near 0, although the integral is well defined.
Fortunately, this no longer seems to be necessary.)
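Concretely, the trick uses (for S > 0)
1/S = Integral_0^infinity e^{-lam S} dlam.
If, as the form of the integrand below suggests (the notes do not
state this explicitly), the x variables are uniform on [0,1] and the
statistic is n/(x_1 + ... + x_n), then each x integral factorizes:
E[ n/(x_1 + ... + x_n) ]
  = n Integral_0^infinity Product_{i=1}^{n} ( Integral_0^1 e^{-lam x_i} dx_i ) dlam
  = Integral_0^infinity n ( (1 - e^-lam)/lam )^n dlam,
which is the 1-dimensional integral that remains.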
Using this approach, we can easily do our integral with a sample
size of up to 50. Plotting the result versus 1/n, we find a linear
approach to the limiting value for large sample sizes.
f[lam_, n_] := n ((1 - E^(-lam))/lam)^n
intlist = Table[ NIntegrate[f[x, n], {x, 0, Infinity}], {n, 2, 50}]
{2.77259, 2.35462, 2.22918, 2.16965, 2.13485, 2.11198, 2.09578, 2.0837,
2.07433, 2.06685, 2.06074, 2.05566, 2.05137, 2.04769, 2.0445, 2.04171,
2.03925, 2.03707, 2.03511, 2.03336, 2.03177, 2.03032, 2.029, 2.02779,
2.02668, 2.02565, 2.0247, 2.02382, 2.023, 2.02223, 2.02151, 2.02084,
2.02021, 2.01961, 2.01905, 2.01852, 2.01802, 2.01755, 2.0171, 2.01667,
2.01626, 2.01588, 2.01551, 2.01515, 2.01482, 2.01449, 2.01419,
2.01389, 2.01361}
A Monte Carlo Approach --- Introduction to the Jackknife Method
Another approach is to select samples of the desired size from the
distribution under study. This can easily be done in either C or
Mathematica.
We considered the jackknife approach to statistical analysis and
how it can be used to reduce sample size bias. In the next class,
we will apply this method.
Assuming that the error goes like 1/n, where n is the sample size,
S_n = A + e/n
where A is the limit as n -> infinity and e determines the size of
the error. For sample sizes n and n-1 we have n S_n = n A + e and
(n-1) S_{n-1} = (n-1) A + e; subtracting the two eliminates e and gives
A = n S_n - (n-1) S_{n-1}
If we have results for the two different sample sizes, we can determine
A. If we are using a Monte Carlo approach, then once we have generated
our sample of size n, it is easy to generate samples of size n-1 by
throwing out one value from the sample of size n.
Let S'_{n-1,i} denote our statistic calculated on the sample of size
n-1 with the ith element removed. Instead of using just one of
these values, we take the average over all n samples that result
from leaving out a single element. So the jackknife formula becomes:
A = n S_n - ((n-1)/n) Sum_{i=1}^{n} S'_{n-1,i}
We looked at the C code ~sg/jackknife/jackknife.c that implements this
idea.