March 27, 2013---Class 22---Correlations, Metropolis Method, Autocorrelations
Activities:
Correlations
------------
In our last class, we briefly touched on correlations between variables
and the autocorrelation. Using Google and Wikipedia, we found
information about covariance and correlation. A relevant URL is
http://en.wikipedia.org/wiki/Covariance_and_correlation
Covariance is a dimensionful quantity: its dimension is the product of
the dimensions of the two variables whose covariance is calculated.
Correlation, on the other hand, is dimensionless and defined so that it
varies between -1 and 1.
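
To make the distinction concrete, here is a minimal Python sketch of the two
quantities. It is not the correlation program in ~sg/src/misc (whose source
is not shown here); the function names are my own, and I assume the
population (divide by n) convention for the averages.

```python
import math

def covariance(xs, ys):
    """Population covariance <(x - <x>)(y - <y>)>; carries units of x times y."""
    n = len(xs)
    x_avg = sum(xs) / n
    y_avg = sum(ys) / n
    return sum((x - x_avg) * (y - y_avg) for x, y in zip(xs, ys)) / n

def correlation(xs, ys):
    """Covariance divided by both standard deviations; dimensionless, in [-1, 1]."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))
```

Rescaling one variable (say, metres to centimetres) multiplies the
covariance by the scale factor but leaves the correlation unchanged.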
In ~sg/src/misc, there is code to calculate the correlation between two
variables. You will also find a file gaussian.dat that has Gaussian
random numbers, one per line. An awk script twolines.a can be used to
get two gaussian numbers per line. We find that the averages and
correlation are all very small, as expected.
[sw246-01:~/src/misc] sg% awk -f twolines.a gaussian.dat | correlation
1250000 items, x_avg= -4.839094e-04 y_avg= 9.334131e-04 correlation= 1.454055e-03
A scatter plot of the data looks like a circular distribution.
I prepared an awk script to take linear combinations of the two
variables. The script is called mix.a. Here is an example:
[sw246-01:~/src/misc] sg% cat mix.a
BEGIN{m11=1.; m22=1.; m12=0.5; m21=0.5}
NR%2==1{x1=$1}
NR%2==0{x2=$1; print (m11*x1+m12*x2),(m21*x1+m22*x2)}
[sw246-01:~/src/misc] sg% awk -f mix.a gaussian.dat |correlation
1250000 items, x_avg= -1.720255e-05 y_avg= 6.914574e-04 correlation= 9.992058e-01
The correlation is very close to 1. We can adjust mix.a so that m22 is smaller:
[sw246-01:~/src/misc] sg% cat mix.a
BEGIN{m11=1.; m22=.8; m12=0.5; m21=0.5}
NR%2==1{x1=$1}
NR%2==0{x2=$1; print (m11*x1+m12*x2),(m21*x1+m22*x2)}
[sw246-01:~/src/misc] sg% awk -f mix.a gaussian.dat | correlation
1250000 items, x_avg= -1.720255e-05 y_avg= 5.047756e-04 correlation= 8.991208e-01
You might like to make scatterplots of the data to see what different
correlations look like.
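
As a cross-check on the mixing experiment, the expected covariance and
normalized correlation of the two mixed variables can be worked out by hand.
The sketch below does the arithmetic, assuming the two input numbers are
independent with zero mean and unit variance (an assumption about
gaussian.dat, not something stated in these notes). The function name
mixed_moments is hypothetical.

```python
import math

def mixed_moments(m11, m12, m21, m22):
    """Return (covariance, correlation) of
         u = m11*x1 + m12*x2   and   v = m21*x1 + m22*x2,
    ASSUMING x1 and x2 are independent with zero mean and unit variance."""
    cov = m11 * m21 + m12 * m22          # <u v>
    var_u = m11 ** 2 + m12 ** 2          # <u u>
    var_v = m21 ** 2 + m22 ** 2          # <v v>
    return cov, cov / math.sqrt(var_u * var_v)

# The two versions of mix.a used above.
print(mixed_moments(1.0, 0.5, 0.5, 1.0))
print(mixed_moments(1.0, 0.5, 0.5, 0.8))
```

Under that assumption the first mix gives covariance 1.0 and correlation
0.8, and the second gives covariance 0.9 and correlation near 0.85. The
printed values 9.992058e-01 and 8.991208e-01 sit closer to the covariance
figures, so it may be worth checking whether the correlation program
divides by the two standard deviations.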
Metropolis Method
-----------------
Another interesting method was developed by Metropolis, Rosenbluth,
Rosenbluth, Teller and Teller. It can be used to generate a random
walk that has the desired probability distribution. The key idea is
that we take the current value of x, take a random trial step, and then
test the new point's probability. If it is more probable, we accept
the new point. If it is less probable, we accept it with probability
p(x_trial)/p(x); otherwise we stay at x and count the old point again.
I showed the class a C code that accomplishes this.
The method is discussed in CSM section 11.7, starting on page 435.
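
The accept/reject rule can be sketched in a few lines of Python. This is
not the C code shown in class; it is a minimal illustration for a Gaussian
target, and the names metropolis_gaussian, max_step, sigma, x0 and seed are
my own. I assume the trial step is drawn uniformly from
[-max_step, +max_step].

```python
import math
import random

def metropolis_gaussian(n_steps, max_step, sigma=1.0, x0=0.0, seed=None):
    """Random walk whose visited points follow p(x) ~ exp(-x^2 / (2 sigma^2)).

    Returns the list of visited points and the fraction of accepted steps.
    """
    rng = random.Random(seed)
    x = x0
    samples = []
    accepted = 0
    for _ in range(n_steps):
        # Symmetric trial step (assumed uniform in [-max_step, +max_step]).
        x_trial = x + rng.uniform(-max_step, max_step)
        # log of p(x_trial)/p(x); working with the log avoids overflow.
        log_ratio = (x * x - x_trial * x_trial) / (2.0 * sigma * sigma)
        # Always accept a more probable point; otherwise accept with
        # probability p(x_trial)/p(x).
        if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
            x = x_trial
            accepted += 1
        # A rejected step records the old point again.
        samples.append(x)
    return samples, accepted / n_steps
```

Only the ratio p(x_trial)/p(x) enters, so the normalization of p never has
to be known. Successive points are strongly correlated when max_step is
small compared with sigma.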
The Metropolis method is regarded as one of the 10 greatest numerical
methods developed in the 20th century. Its use is quite widespread.
It is not a particularly good method for generating Gaussian random
numbers, but we can learn a lot about the method, its efficiency and
data analysis by studying the method in this very simple context.
I displayed the code to implement the Metropolis method. Read the
appendix in CSM to see how detailed balance is achieved by the
algorithm. Because this is such an important method, it is important
to me that you implement it yourself on the next homework assignment.
Let's explore the efficiency of this method, which often
goes by the name Monte Carlo because of the prominence of random
numbers.
Metropolis Optimization
-----------------------
Time history, histogram
Create some numbers with the command
metropolis 1 5000 0.2 >data
This should create a run of 5000 numbers drawn from a Gaussian with
width 1. A maximum step size of +- 0.2 is used.
Create the time history with the command
axis a >! autocorr.dat
end
The shell will give you continuation prompts for the second and
third lines since you have started a foreach loop. The >>! will
make sure the shell does not complain about appending to a file
that does not yet exist, if (like me) you like to set the noclobber
variable. Your file autocorr.dat will now have 210 lines in it from
each of the ten runs of the autocorr program.
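
For reference, one common way to estimate a normalized autocorrelation
from a time history is sketched below in Python. The autocorr program used
in class is not listed in these notes, so its normalization and output
format may differ; the function name and the max_lag parameter are my own
choices.

```python
def autocorrelation(xs, max_lag):
    """Normalized autocorrelation C(lag)/C(0) for lag = 0 .. max_lag.

    C(lag) is the average of (x_i - <x>)(x_{i+lag} - <x>) over all pairs
    that fit in the series. Requires max_lag < len(xs).
    """
    n = len(xs)
    mean = sum(xs) / n
    dev = [x - mean for x in xs]
    c0 = sum(d * d for d in dev) / n
    result = []
    for lag in range(max_lag + 1):
        c = sum(dev[i] * dev[i + lag] for i in range(n - lag)) / (n - lag)
        result.append(c / c0)
    return result
```

The value at lag 0 is 1 by construction. For a Metropolis run the curve
typically decays with lag, and a slower decay means fewer effectively
independent numbers in the run.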
If you type
axis y l x 0 19