October 19, 2004 --- Class 15 --- Biased Estimators and the Jackknife Method I

Activities:

Biased Estimators Using Mathematica

We again had Mathematica do the multiple integrals analytically for the problem we are considering. Without waiting too long, we were able to get up to a sample size of 6; however, this result still shows significant bias. In ~sg/oct19_2004.nb you will find a Mathematica notebook showing how to time the command for doing a four-dimensional integral, along with the analytic and numerical results.

We then worked through the latter part of section H of the Mathematica notes. This involves rewriting the denominator in our integrand as an integral over a variable that goes from 0 to infinity. When that is done, we can do all the integrals over the x variables and are left with a 1-dimensional integral that can be done numerically. (Earlier versions of Mathematica required that we cut the integral off near 0, although the integral is well defined. Fortunately, this no longer seems to be necessary.) Using this approach, we can easily do our integral for sample sizes up to 50. Plotting the result versus 1/n, we find a linear approach to the limit for large sample sizes. The commands and their output are:

    f[lam_, n_] := n( (1 - E^-lam)/lam)^n
    intlist = Table[ NIntegrate[f[x, n], {x, 0, Infinity}], {n, 2, 50}]

    {2.77259, 2.35462, 2.22918, 2.16965, 2.13485, 2.11198, 2.09578,
     2.0837, 2.07433, 2.06685, 2.06074, 2.05566, 2.05137, 2.04769,
     2.0445, 2.04171, 2.03925, 2.03707, 2.03511, 2.03336, 2.03177,
     2.03032, 2.029, 2.02779, 2.02668, 2.02565, 2.0247, 2.02382,
     2.023, 2.02223, 2.02151, 2.02084, 2.02021, 2.01961, 2.01905,
     2.01852, 2.01802, 2.01755, 2.0171, 2.01667, 2.01626, 2.01588,
     2.01551, 2.01515, 2.01482, 2.01449, 2.01419, 2.01389, 2.01361}

A Monte Carlo Approach --- Introduction to the Jackknife Method

Another approach is to draw random samples of the desired size from the distribution under study and evaluate the statistic on each sample. This can easily be done in either C or Mathematica.
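The sampling approach just described might be sketched in C roughly as follows. This is only an illustration under stated assumptions: the form of f[lam, n] above suggests that the statistic is the reciprocal of the sample mean of uniform deviates on (0,1), but the notes do not say so explicitly. The function names (uniform01, inverse_mean, mc_average) are made up for this sketch, and the standard rand() generator is used only to keep it short.

```c
#include <assert.h>
#include <stdlib.h>

/* Uniform deviate strictly inside (0,1), built on the standard rand(). */
static double uniform01(void)
{
    return (rand() + 1.0) / ((double)RAND_MAX + 2.0);
}

/* ASSUMPTION: the statistic under study is 1/(sample mean).
   This is inferred from the integrand above, not stated in the notes. */
double inverse_mean(const double *x, int n)
{
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += x[i];
    return n / sum;
}

/* Average the statistic over `trials` independent samples of size n. */
double mc_average(int n, long trials, unsigned seed)
{
    assert(n > 0 && trials > 0);

    double *x = malloc((size_t)n * sizeof *x);
    if (x == NULL)
        abort();

    srand(seed);
    double total = 0.0;
    for (long t = 0; t < trials; t++) {
        for (int i = 0; i < n; i++)
            x[i] = uniform01();      /* draw one sample of size n */
        total += inverse_mean(x, n); /* statistic on that sample  */
    }
    free(x);
    return total / trials;
}
```

For example, the value returned by mc_average(10, 200000, 1u) can be compared with the n = 10 entry of the list above (2.07433); agreement should be expected only to within the statistical error of the run. Very small n is a poor test case, since for n = 2 the variance of this statistic is infinite and the average converges slowly.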
We considered the jackknife approach to statistical analysis and how it can be used to reduce sample-size bias. In the next class, we will apply this method.

Assume that the error goes like 1/n, where n is the sample size:

    S_n = A + e/n

where A is the limit of S_n as n -> infinity and e determines the size of the error. Considering sample sizes n and n-1, it is easy to solve for A: n S_n = n A + e and (n-1) S_{n-1} = (n-1) A + e, so subtracting eliminates e and gives

    A = n S_n - (n-1) S_{n-1}

If we have results for the two different sample sizes, we can determine A. If we are using a Monte Carlo approach, then once we have generated our sample of size n, it is easy to generate samples of size n-1 by throwing out one value from the sample of size n. S'_{n-1,i} represents our statistic calculated on the sample of size n-1 with the ith element removed. Instead of using just one of these values, we take the average over all n samples that result from leaving out a single element. So the jackknife formula becomes:

    A = n S_n - (n-1)/n Sum_{i=1}^{n} S'_{n-1,i}

We looked at the C code ~sg/jackknife/jackknife.c that implements this idea.
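As a rough illustration of the jackknife formula, here is a minimal C sketch. It is not the course's jackknife.c, whose contents are not reproduced in these notes. The names jackknife_estimate, statistic_fn and inverse_mean are hypothetical, and the reciprocal of the sample mean is used as the example statistic only by assumption.

```c
#include <assert.h>
#include <stdlib.h>

/* Any statistic computed from a sample of n values. */
typedef double (*statistic_fn)(const double *x, int n);

/* ASSUMPTION: example statistic, 1/(sample mean); not from the notes. */
double inverse_mean(const double *x, int n)
{
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += x[i];
    return n / sum;
}

/* Jackknife estimate  A = n S_n - (n-1)/n * Sum_i S'_{n-1,i},
   where S'_{n-1,i} is the statistic with the ith element left out. */
double jackknife_estimate(const double *x, int n, statistic_fn stat)
{
    assert(n >= 2);

    double *scratch = malloc((size_t)(n - 1) * sizeof *scratch);
    if (scratch == NULL)
        abort();

    double s_n = stat(x, n);      /* statistic on the full sample */
    double loo_sum = 0.0;         /* sum of leave-one-out values  */

    for (int i = 0; i < n; i++) {
        int k = 0;
        for (int j = 0; j < n; j++)
            if (j != i)
                scratch[k++] = x[j];   /* copy all but element i */
        loo_sum += stat(scratch, n - 1);
    }
    free(scratch);

    return n * s_n - (n - 1.0) / n * loo_sum;
}
```

For the sample {1, 2, 3, 4} with inverse_mean, S_n = 0.4 and the leave-one-out values are 3/9, 3/8, 3/7 and 3/6, so the function returns 1.6 - 0.75 (275/168), about 0.3723. Passing the statistic as a function pointer keeps the leave-one-out loop independent of the particular estimator, at the cost of recomputing the statistic n times.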