February 23, 2017 --- Class 14 --- Energy Conservation Tests,
Jackknife Bias Reduction, Hist, Blockerr, Jackknife Error,
Introduction to Chaos and the Logistic Map
Activities:
Energy Conservation Tests
I placed several axis files in ~sg/energy_conservation_test. Algorithms
are Euler, Euler-Cromer, Euler-Richardson and Runge-Kutta 4. We
saw that E-R and RK4 are higher order algorithms since as we reduce
the step size by and order of magnitude the improvement in energy
conservation is several orders of magnitude. We also saw that we
can't do better than double precision accuracy.
Jackknife Method Via C Programming
A program ~sg/jackknife/jackknife.c implements the jackknife method
of bias reduction. The jackknife program can take up to two
arguments. The first is the sample size, the second, optional
argument, is the number of samples to create. On each line of
output is the sample estimate of 1/(sum of x_i), and the jackknife
value (which is corrected for bias). We examined the code an how
it implements the formulae displayed on the board.
We want to test the idea of bias reduction by doing a numerical
experiment. In the last class, we discussed a program that uses
pseudorandom numbers to simulate the process of measuring the
velocities unformly distributed between zero and one.
For sample size five, I ran the program to repeat the experiment a
few times and we saw that the number varied, I then ran
jackknife 5 100
to produce 100 instances of the experiment with sample size 5. The
class spent some time discussing how we would look at the data.
We know that for infinite sample size, the correct answer is 2, but
that for finite sample size the answer is not even 2 on average. We
also want to know if the bias reduction really works.
[sw246-00:~/jackknife] sg% jackknife 5 10
3.000586 2.913277
2.216128 2.177672
1.885894 1.680463
1.866940 1.760818
1.968507 1.794492
1.777420 1.644192
1.891794 1.675787
1.708738 1.600792
1.493175 1.391724
2.159595 1.920740
Here are ways to analyze the data:
1) Create a scatter plot and see if the 2nd value is closer to 2 on
average.
2) Produce a histogram of the first column and histogram of the 2nd
column. Compare them.
3) Compute the average value of the first column and the average
value of the second column. Compare them.
A scatter plot is very easy with axis, you just need the argument -m 0 .
It was also easy to add a diagonal line and to see that most points
fell below the diagonal, i.e., the first number is usually larger than
the second. Since we know from the integral approach that on average
sample size 5 gives 2.1696, we know the unbiased value is too big.
The fact that the bias reduced values are smaller is encouraging.
I love to make histograms and I have a great program to do that which
you are welcome to copy and use. The program is called hist. Here are
some comments that explain how to use it.
Hist for Making Histograms
hist is a program for making histograms and may be found in
~sg//src/misc
for the source (hist.c) and the executable is in ~sg/bin. The options
for hist are described in the comments at the top of hist.c:
/* Make histogram of a list of data *
* option -n specifies number of bins *
* option -s specifies size of bin *
* option -x specifies lower and upper limits of histogram *
* option -g specifies output in form suitable for "graph" *
* or "axis" *
* if -g 1 we get a full bar for each bin *
* if -g 2 the bars don't go down to the origin *
* option -m specifies the line type for use with "axis" *
hist takes it input from the stardard input. It will also take
10 bins by default. So, if you have file of number called file,
hist samples_$i
echo -n Naive $i " " >>! results
awk '{print $1}' samples_$i | blockerr 1 |awk '{print $8, $10}' >>results
echo -n Jackknife $i " " >>! results
awk '{print $2}' samples_$i | blockerr 1 |awk '{print $8, $10}' >>results
end
This is another example of how we can figure out a useful procedure
interactively and then we find that being able to automate it for
different cases is a great advantage.
The results file can easily be displayed with the command:
awk '{print $2,$3,$4}' results |axis e |xplot
The upper set of values is from the naive estimate, and the lower
is the jackknife corrected values. One immediately sees that the
jackknife values much more quickly approach 2, the asymptotic
value.
The file ~sg/jackknife/jackknife.ax on Swain cluster, was
prepared before class, and includes both the above results
and the analytic results derived in the last
class. The diamonds are from doing the integral numerically with
Mathematica. One can see excellent agreement between the
statistical approach here and the analytic results.
Jackknife Error
In addition to using the jackknife method for bias reduction, you
can use it to calculate the sample error. The sample error is
related to variance of the jackknife samples. Because these
samples are so correlated (only one variable changes in each
sample), the variance of the jackknife values must be multiplied by
(n-1):
(\sigma_J)^2 = (n-1)/n \sum (S'_i - S)^2
where S'_i is the jackknife statistic with the i'th measurement
removed and S is the statistic on the entire set of n measurements.
Notice that in the simple case that the statistic is the mean value
of the sample, the naive variance would be given by:
\sigma^2 = 1/[n(n-1)] \sum (x_i - x_avg)^2
In this case, it is easy to show S'_i - S = (x_avg - x_i)/(n-1).
Here is a deriviation:
(n-1) S'_i + x_i = sum of all x_i = n(S) = n x_avg = (n-1)S + x_avg
since S and x_avg are the same in this example. Bring the terms
involving S to the left, the terms involving x to the right and
divide by (n-1) to complete the derivation.
You should look at the distribution of values and distibution
of the jackknife samples. The dispersion of the latter is very small.
This verifies why we need the factor n in the numerator for the
jackknife error.
Use Mathematica to create a sample of size 20 and to compare the
histogram and variance of the values in the sample and the jackknife
samples. You should find that the jackknife samples hav3 a variance
exactly (19)^2 times smaller than that of the original sample.
All of this looks much nicer in the introductory mathematica handout.
Chaos and the Logistic Map
We introduced the logistic map from
consideration of an insect population model.
P_{n+1} = P_n (A - B P_n)
Upon scaling the population, the population is
replaced by x which lies in the range [0,1]. The next generation
is given by
x <- 4 r x * (1-x) .
A program ~sg/map.c allows you to follow the time history of a
population. Map takes three command line arguments: r, the
parameter that appears above in the definition of the map; x_0, the
initial value of the population; and iterations, the total number
of generations to calculate. The executable is ~sg/bin/map.
We saw what happens after a long time if we use r=0.1.
The population dies out if we start with x=0.1
map 0.1 0.1 100
A shell script was written to see what happens with different values
of the starting value x_0. Here is a sample script:
#!/bin/csh
set r = 0.1
foreach x (0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9)
map $r $x 100 >>! temp
end
When r is sufficiently small, no matter what the starting
population, after a long time the population approaches 0.
With r a little greater than 0.25, after many generations the
population approaches a nonzero constant for any starting value
other than 0 or 1.
With larger r values, we see that the final value is increasing.
With r=0.78, we find something very curious. The
population oscillates in time between two different values
5.475744e-01 and 7.729384e-01. You should verify that this does
not depend on the starting value (as long as it is not 0 or 1).
This result is called having an attractor of period 2.
Now, we are getting the idea that there can be different long time
behaviors. For small r, we go to zero; for larger r, we approach a
non-zero constant; for yet larger r, we oscillate between two
values.
In our next class, we will increase r and find more curious behavior.