October 21, 2004 --- Class 16 --- Hist, Application of the Jackknife Method
Activities:
Biased Estimators and the Jackknife method
We reviewed the idea of the jackknife method and then we started
working with it. A program ~sg/jackknife.c implements the method
for our problem. The jackknife program can take up to two
arguments. The first is the sample size, the second, optional
argument, is the number of samples to create. On each line of
output is the sample estimate of 1/(sum of x_i), and the jackknife
value (which is corrected for bias).
To analyze this method, we pick a bunch of samples sizes to study
and generate a large number of samples for each size.
To see the difference between the naive estimate and the bias
corrected values, two programs hist and blockerr are useful.
(Another simple technique is to use axis to create a scatterplot
using the m 0 option so the points are not connected. A diagonal line
can easily be added to the plot. We see that the points in the
scatterplot are mostly on one side of the diagonal. The jackknife
values tend to be smaller than the naive value. This is most dramatic
for small sample sizes.)
hist is a program for making histograms and may be found in ~sg/src
for the source (hist.c) and the executable is in bin. The options
for hist are described in the comments at the top of hist.c:
/* Make histogram of a list of data *
* option -n specifies number of bins *
* option -s specifies size of bin *
* option -x specifies lower and upper limits of histogram *
* option -g specifies output in form suitable for "graph" *
* or "axis" *
* if -g 1 we get a full bar for each bin *
* if -g 2 the bars don't go down to the origin *
* option -m specifies the line type for use with "axis" *
hist takes it input from the stardard input. It will also take
10 bins by default. So, if you have file of number called file,
hist samples_$i
echo -n Naive $i " " >>! results
awk '{print $1}' samples_$i | blockerr 1 |awk '{print $8, $10}' >>results
echo -n Jackknife $i " " >>! results
awk '{print $2}' samples_$i | blockerr 1 |awk '{print $8, $10}' >>results
end
(Students spent time creating and running such a shell script for
themselves. With sufficient practice, this should only take a few
minutes. If it takes you considerably longer, you need more practice
writing shell scripts.)
The results file can easily be displayed with the command:
awk '{print $2,$3,$4}' results |axis e |xplot
The upper set of values is from the naive estimate, and the lower
is the jackknife corrected values. One immediately sees that the
jackknife values much more quickly approach 2, the asymptotic
value.
The file ~/jackknife/jackknife.ax, prepared before class, includes
both the above results and the analytic results derived in the last
class. The crosses are from doing the integral numerically with
Mathematica. One can see excellent agreement between the
statistical approach here and the analytic results.
Other approaches to removing the sample bias include using the
basic idea behind the jackknife bias reduction of the Mathematica
generated results, and applying a "second order" jacknife technique
on the jackknifed results to remove the 1/n^2 errors.
(Assuming there are error terms e/n + e'/n^2, derive a formula
involving S_n, S_(n-1) and S_(n-2) that removes both error terms to
get a second order bias reduced result.)
Applying the jackknife to the Mathematica results works
quite nicely as you can see by the following command:
cat jackknife.ax j.math.ax |axis |xplot
The lower crosses are the results from Mathematica. The commands
for doing the analysis can be found in ~sg/jackknife/bias2.math