November 28, 2006---Class 24--- Shell Scripts for HW, Monte Carlo Error
Analysis (continued), Multiprecision Library
Activities:
Shell scripts for homework
--------------------------
In ~sg/metropolis you will find some shell and awk scripts that you
can use to go from plotting the autocorrelation as a function of
the lag for fixed delta in the Gaussian random walk, to plotting,
for fixed lag, the autocorrelation as a function of delta. This is
useful for finding the optimal delta as you are requested to do for
the homework. The way these scripts work was discussed in class.
We ran makeauto.csh and then made a graph from the
autocorrelations file using aug_vs_delta.csh.
Monte Carlo Error Analysis (continued)
-----------------------------------------
In the last class, we were trying to show that there is a big
difference between correlated and uncorrelated sampling when
doing Monte Carlo integration. We did this by comparing what
happens when we block data before averaging it. We expected to
find that for the correlated data the error would grow with
blocksize, but that with the uncorrelated data the error would
not depend on the blocksize. Unfortunately, with a large input
file, the error was not correctly calculated because of
round-off error.
We looked at two solutions to this problem.
The first was to keep each file small enough for double
precision. I ran details for 100,000 steps, but then instead of
averaging the whole sample, I used split to break the file up
into ten files of 10,000 samples each.
Here is a shell script to analyze the split files:
#!/bin/csh
# For each split file x*, pull out one column with awk and pipe it
# through blockerr for a range of block sizes.  Column 1 is the
# correlated random-walk data, column 2 the uncorrelated normals.
echo "rwalk"
foreach dat (x*)
    foreach bl (1 2 4 8 16 32 64 128)
        awk '{print $1}' $dat | blockerr $bl
    end
end
echo "normal"
foreach dat (x*)
    foreach bl (1 2 4 8 16 32 64 128)
        awk '{print $2}' $dat | blockerr $bl
    end
end
The output consists of many lines that look like this:
rwalk
10000 items, block size = 1 average= 3.861012e-01 +/- 3.172200e-03
10000 items, block size = 2 average= 3.861012e-01 +/- 4.476223e-03
10000 items, block size = 4 average= 3.861012e-01 +/- 6.309403e-03
10000 items, block size = 8 average= 3.861012e-01 +/- 8.872789e-03
10000 items, block size = 16 average= 3.861012e-01 +/- 1.240090e-02
10000 items, block size = 32 average= 3.861012e-01 +/- 1.711713e-02
10000 items, block size = 64 average= 3.861012e-01 +/- 2.309630e-02
10000 items, block size = 128 average= 3.861012e-01 +/- 3.018832e-02
(repeated 9 times for the other files with 10,000 lines)
normal
10000 items, block size = 1 average= 5.009796e-01 +/- 3.535679e-03
10000 items, block size = 2 average= 5.009796e-01 +/- 3.513706e-03
10000 items, block size = 4 average= 5.009796e-01 +/- 3.530357e-03
10000 items, block size = 8 average= 5.009796e-01 +/- 3.560046e-03
10000 items, block size = 16 average= 5.009796e-01 +/- 3.674959e-03
10000 items, block size = 32 average= 5.009796e-01 +/- 3.711299e-03
10000 items, block size = 64 average= 5.009796e-01 +/- 3.786352e-03
10000 items, block size = 128 average= 5.009796e-01 +/- 4.015603e-03
(repeated 9 times for the other files with 10,000 lines)
For the rwalk (correlated) case, the error changes by an order
of magnitude when the block size is increased from 1 to 128.
There is no evidence yet that the error has saturated. For the
normal (uncorrelated) case, the error hardly changes at all.
Also note that the average value varies greatly from file to
file in the rwalk case:
  3.861012e-01  5.300028e-01  5.591833e-01  5.538783e-01  4.552062e-01
  3.288843e-01  5.996343e-01  4.868478e-01  5.384931e-01  4.757377e-01
For the uncorrelated case, we have:
  5.009796e-01  5.088782e-01  4.979054e-01  4.992321e-01  4.985086e-01
  4.917234e-01  5.026108e-01  4.995261e-01  5.046086e-01  4.991509e-01
In summary, for the uncorrelated case, the error is independent
of the block size and the variation of the mean values is
consistent with the reported size of the error. For the
correlated case, the error is highly dependent on the block size
and the variation of the mean values is much larger than the
error as reported with a block size of 128. We need to consider
block sizes even greater than 128.
A second approach is based on the GNU multiprecision arithmetic
library (gmp). It allows one to keep more bits of precision than
the computer hardware normally handles.
Here is a simple test program that uses the gmp library:
#include <stdio.h>
#include <gmp.h>

int main() {
    mpf_t x;
    mpf_set_default_prec( (unsigned long int) 96 );  /* ask for 96 bits */
    mpf_init( x );                    /* must come after setting the prec */
    mpf_set_d(x, 3.1415926535);
    mpf_mul(x, x, x);                 /* x = x*x */
    mpf_out_str(stdout, 10, 24, x);   /* print 24 digits in base 10 */
    printf("\n");
    mpf_clear(x);
    return 0;
}
We discussed in detail how this code works. We then went over
how I modified two routines in statistics.c that are needed by
blockerr.c. You can find gmp_test.c, blockerr.c and
statistics.c on Nations in ~sg/nov21. To compile, you need to
be on a machine with the gmp library. My linux desktop has it
installed as part of the RedHat distro.
cc -o gmp_test gmp_test.c -lgmp
cc -o blockerr blockerr.c statistics.c -lgmp -lm
should compile the code if the libraries are available.
Otherwise, you will have to first compile the library. Use
Google to search for "gnu multiple precision" to find the library.
The script multiprecision.csh is set up to run the new version of
blockerr with an input set of a million points in file
tt_million. I ran it on my desktop to produce the file
multiprecision.out. This file contains some of the details of
the differences between double precision and 96 bit precision.
We see there can be big differences in the calculation of the
variance. The multiprecision errors follow exactly the pattern
we expect. Note that the errors are about a factor of 10 smaller
than for the files with 10^4 lines, as expected for a sample 100
times as large, since the error scales as 1/sqrt(N).