November 28, 2006---Class 24--- Shell Scripts for HW, Monte Carlo Error Analysis (continued), Multiprecision Library Activities: Shell scripts for homework -------------------------- In ~sg/metropolis you will find some shell and awk scripts that you can use to go from plotting the autocorrelation as a function of the lag for fixed delta in the gaussian random walk, to plotting for fixed lag, the autocorrelation as a function of delta. This is useful for finding the optimal delta as you are requested to do for the homework. The way these scripts work was discussed in class. We ran makeauto.csh and then made a graph from the autocorrelatons file using aug_vs_delta.csh. Monte Carlo Error Analysis (continued) ----------------------------------------- In the last class, we were trying to show that there is a big difference between correlated and uncorrelated sampling when doing Monte Carlo integration. We did this by comparing what happens when we block data before averaging it. We expected to find that for the correlated data the error would grow with blocksize, but that with the uncorrelated data the error would not depend on the blocksize. Unfortunately, with a large input file, the error was not correctly calculated because of round off error. We looked at two solutions to this problem. I ran details for 100,000 steps, but then instead of averaging the whole sample, I used split to break the file up into blocks of 10,000 samples. Here is a shell script out analyze the split files: #!/bin/csh echo "rwalk" foreach dat (x*) foreach bl (1 2 4 8 16 32 64 128) awk '{print $1}' $dat |blockerr $bl end end echo "normal" foreach dat (x*) foreach bl (1 2 4 8 16 32 64 128) awk '{print $2}' $dat |blockerr $bl end end The output consists of many lines that look like this: rwalk 10000 items, block size = 1 average= 3.861012e-01 +/- 3.172200e-03 10000 items, block size = 2 average= 3.861012e-01 +/- 4.476223e-03 10000 items, block size = 4 average= 3.861012e-01 +/- 6.309403e-03 10000 items, block size = 8 average= 3.861012e-01 +/- 8.872789e-03 10000 items, block size = 16 average= 3.861012e-01 +/- 1.240090e-02 10000 items, block size = 32 average= 3.861012e-01 +/- 1.711713e-02 10000 items, block size = 64 average= 3.861012e-01 +/- 2.309630e-02 10000 items, block size = 128 average= 3.861012e-01 +/- 3.018832e-02 repeated 9 times for the other files with 10,000 lines normal 10000 items, block size = 1 average= 5.009796e-01 +/- 3.535679e-03 10000 items, block size = 2 average= 5.009796e-01 +/- 3.513706e-03 10000 items, block size = 4 average= 5.009796e-01 +/- 3.530357e-03 10000 items, block size = 8 average= 5.009796e-01 +/- 3.560046e-03 10000 items, block size = 16 average= 5.009796e-01 +/- 3.674959e-03 10000 items, block size = 32 average= 5.009796e-01 +/- 3.711299e-03 10000 items, block size = 64 average= 5.009796e-01 +/- 3.786352e-03 10000 items, block size = 128 average= 5.009796e-01 +/- 4.015603e-03 repeated 9 times for the other files with 10,000 lines For the rwalk (correlated) case, the error changes by an order of magnitude when the block size is increased from 1 to 128. There is no evidence yet that the error has saturated. For the normal (uncorrelated) case, the error hardly changes at all. Also note that the average value varies greatly for the rwalk case: 3.861012e-01 5.300028e-01 5.591833e-01 5.538783e-01 4.552062e-01 3.288843e-01 5.996343e-01 4.868478e-01 5.384931e-01 4.757377e-01 For the uncorrelated case, we have: 5.009796e-01 5.088782e-01 4.979054e-01 4.992321e-01 4.985086e-01 4.917234e-01 5.026108e-01 4.995261e-01 5.046086e-01 4.991509e-01 In summary, for the uncorrelated case, the error is independent of the block size and the variation of the mean values is consistent with the reported size of the error. For the correlated case, the error is highly dependent on the block size and the variation of the mean values is much larger than the error as reported with a block size of 128. We need to consider block sizes even greater than 128. A second approach is based on the gnu multiprecision arithmetic library. It allows one to keep more bits of precision that the computer hardware normally handles. Here is a simple test program that uses the gmp library: #include #include main () { mpf_t x; mpf_set_default_prec( (unsigned long int) 96 ); mpf_init( x ); mpf_set_d(x, 3.1415926535); mpf_mul(x,x,x); mpf_out_str(stdout, 10,24,x); mpf_clear(x); } We discussed in detail how this code works. We then went over how I modified two routines in statistics.c that are neede by blockerr.c. You can find gmp_test.c, blockerr.c and statistics.c on Nations in ~sg/nov21. To compile, you need to be on a machine with the gmp library. My linux desktop has it installed as part of the RedHat distro. cc -o gmp_test gmp_test.c -lgmp cc -o blockerr blockerr.c statistics.c -lgmp -lm should compile the code if the libraries are available. Otherwise, you will have to first compile the library. Use google to search for gnu multiple precision to find the library. The script multiprecision.csh is set up run the new version of blockerr with an input set of a million points in file tt_million. I ran it on my desktop to produce the file multiprecision.out. This file contains some of the details of the differences between double precision and 96 bit precision. We see there can be big differences in the calculation of the variance. The multiprecision errors follow exactly the pattern we expect. Not that the errors are about a factor of 10 smaller than for the files with 10^4 lines.