March 24, 2016 --- Class 20 --- Inverse Transform Method (continued),
Inverse Transform Method applied to exponential distribution,
Box-Muller Method, Covariance and Correlation
Inverse Transform Method (continued)
Sometimes we are interested in generating random numbers with a non
uniform probability density. There are a number of ways of doing
this and we shall study a few. This is covered in CSM section
11.5, starting on page 429. The first method we will study is
called inverse transform. You may be used to the idea of the
probability density function. For a real variable, we have
$\int_{-\Infinity}^{+\Infinity} p(x) dx = 1 $.
We can define the monotonically increasing function called the
cumulative probability distribution by
$ P(x) = \int_{-\Infinity}^{x} p(x') dx' $.
Of course, $ d P(x)/dx = p(x)$. P(x) is a number between 0 and 1.
If we can solve the equation $ P(x)=r$ for $x(r)$, then if we
generate uniform random numbers r, then the numbers x(r) have the
distribution we want. So, the key is we start with p(x), and
the method only is useful if we can find P(x) and then solve
P(x) = r
for x as a function of r.
A very simple example is a constant probablility for x to be between
two numbers a and b. In this case, we easily showed that
P(x) = (x-a)/(b-a) for ab. Solving P(x)=r, we find
x = a + (b-a)r
This simple relation is exactly what one would intitutively select for
generating numbers uniformly distributed between a and b.
We will study other distributions in our next class.
Inverse Transform Method for Exponential Distribution
An interesting and useful example of the inverse tranform method
involves an exponential distribution. Here
p(x)= exp(-x/lambda)/lambda.
We were able to show that
P(x) = 1 - exp(-x/lambda)
Setting P(x)=r and solving for r, we find
x = - lambda * ln(1-r)
Since 1-r and r are both uniformly distributed between 0 and 1,
we can simplify the above and replace ln(1-r) by ln(r).
Thus, to generate numbers with an exponential distribution, we
first generate a uniform random variable between 0 and 1. Then
x = -lambda * ln(r)
has an exponential distribution. We used Mathematica to generate
such numbers.
v = Table[ -Log[ Random[] ], {1000}] ;
We used ListPlot to look at the numbers.
We can also make use of the Mathematica Histogram function:
Histogram[v]
I find it easy to use my histogram package and axis to make a
histogram as a semilog plot. This is useful because an exponential
distribution turns into a straight line on a semilog plot. First we
need to get the data out of Mathematica with the Export function.
Note the use of double quotes in the first and third arguments.
Export["exponential.dat", v, "List"]
This will put the vector v in a file called exponential.dat.
Then use hist to plot the distribution. (If you want to use a
logarithmic y axis, make sure to use -g 2 option for hist and don't
let axis try to plot down to 0.)
We can use hist to make a histogram of the numbers. If we use the
-g 2 option to hist, and the logarithmic option to axis, we find
a plot that looks rather linear, as we would expect with an
exponential distribution.
hist -n 20 -g 2 < exponential2.dat |axis y l 1 500 |xplot
Another way to get output from Mathematica is with the Write function.
Do[ Write[ "exponential2.dat", v[[i]] ], {i,Length[v]} ]
will do the trick. The first argument of the Write function is
the name of the file. It must be included in double quotes.
You may have additional arguments to specify what to write. Here
we only want a single element of the vector v. Note the use of
the function Length to determine the length of the vector v. We
thus don't need to explicitly remember the length of v in the
iterator expression.
A Mathematica notebook can be found in ~sg/Documents/November6_2014.nb
which is also useful for the next topic.
Box-Muller Method
The Box-Muller method is a clever way to generate random
numbers with a Gaussian Distribution. It is described on page
432 of CSM. For homework you will get to try out this method and
verify that the distribution is Gaussian. You can also find
this method described in Numerical Recipes. We derived the Box-Muller
method.
Making Gaussian Random Numbers in Mathematica
Here is an implementation of the Box-Muller method with sigma=1:
gaus[] := Module[ {theta,rho}, theta=2.*Pi*Random[];
rho = -Log[Random[]];
N[{Sqrt[2 rho]Cos[theta],Sqrt[2 rho]Sin[theta]}] ]
gaus is now a function with no argument. With the above definition,
you can just type
gaus[]
and you will get a list with two Gaussian random numbers in it.
Of course, we want a lot to test the distribution, so try
v = Table[ gaus[], {10000} ];
Don't forget the semicolon at the end of the command! Otherwise
you are going to be looking at a lot of output.
Check some values:
v[[4]]
A Mathematica notebook ~sg/Documents/March25_2013_boxmuller.nb
records some of what was done in class. Most of the rest of these
notes were not followed in class, but can be used to examine the
Gaussian random numbers outside of Mathematica.
Now export v to a file:
Export["myfile", v, "Table"]
This will result in a pair of numbers on each line of the file myfile.
axis m 0 myfile.2
To take a file with one number per line and then print pairs of numbers
is also pretty easy.
awk '{ if( NR % 2 == 1) x=$1; else print x,$1}' myfile.2 >myfile.3
In Mathematica we could instead have used
Do[Write["myfile", v[[i,1]], " ", v[[i,2]]], {i, Length[v]}]
to print two words per line. Note that we really do need the
string of blanks in double quotes or the two numbers will have
no space between them. How annoying! It is good to learn the options
of the Export function.
The file that has two words per line will work for either the
scatter plot or histogram. However, it is useful to know these simple
awk commands.
We prepared one graph with the x and y
numbers from the Box-Muller method histogrammed separately. here are
appropriate commands:
#!/bin/csh
(awk '{print $1}' myfile | hist -n 40 -x -4 4 -g 2 -m 1;\
awk '{print $2}' myfile | hist -n 40 -x -4 4 -g 2 -m 2) | axis |
xplot
Covariance & Correlation
Using google and wikipedia, we found
information about covariance and correlation. A relevant URL is
http://en.wikipedia.org/wiki/Covariance_and_correlation
Covariance is a dimensionful quantity that takes the dimension of
the two variables whose covariance is calculated. Correlation, on the
other hand, is dimensionless and defined so that it varies between
-1 and 1.
In ~sg/src/misc, there is code to calculate the covariance between two
variables. You will also find a file gaussian.dat that has gaussian
random numbers, one per line. An awk script twolines.a can be used to
get two gaussian numbers per line. We find that the averages and
covariance are all very small, as expected.
[sw246-01:~/src/misc] sg% awk -f twolines.a gaussian.dat | covariance
1250000 items, x_avg= -4.839094e-04 y_avg= 9.334131e-04 covariance= 1.454055e-03
A scatter plot of the data looks like a circular distribution.
I prepared an awk script to take linear combinations of the two
variables. The script is called mix.a. Here is an example:
[sw246-01:~/src/misc] sg% cat mix.a
BEGIN{m11=1.; m22=1.; m12=0.5; m21=0.5}
NR%2==1{x1=$1}
NR%2==0{x2=$1; print (m11*x1+m12*x2),(m21*x1+m22*x2)}
[sw246-01:~/src/misc] sg% awk -f mix.a gaussian.dat |covariance
1250000 items, x_avg= -1.720255e-05 y_avg= 6.914574e-04 covariance= 9.992058e-01
The covariance is very close to 1. We can adjust mix.a so that m22 is smaller:
cat mix.a
BEGIN{m11=1.; m22=.8; m12=0.5; m21=0.5}
NR%2==1{x1=$1}
NR%2==0{x2=$1; print (m11*x1+m12*x2),(m21*x1+m22*x2)}
[sw246-01:~/src/misc] sg% awk -f mix.a gaussian.dat | covariance
1250000 items, x_avg= -1.720255e-05 y_avg= 5.047756e-04 covariance= 8.991208e-01
You might like to make scatterplots of the data to see what different
correlations look like. You also might like to adjust the parameters
in mix.a and see what happens.