March 21, 2017 --- Class 19 --- Higher Dimensional Integrals,
Testing Hit or Miss Method, Aver
We spent a bit of time reviewing what was done at the end of the last
class.
Higher Dimensional Integrals
Monte Carlo methods like hit or miss and the sample mean methods
have errors that fall like n^{-1/2}. This is not very good. To
cut the error in half you need to do 4 times a much work. For
Simpson's rule the error goes like n^{-4}, so if you do twice as
much work, the error falls by a factor of 16. However, for
multidimensional integrals, a method with an error like h^a,
where h is the spacing between grid points, will have an error like
n^{-a/d} because the number of points on the grid goes like h^{-d}.
As an example, if a=4 and d=10, a/d=0.4 so the error is falling
more slowly than n^{-1/2}. So, for high dimension integrals,
Monte Carlo methods may be bad, but other methods are worse.
Hit-or-Miss and Sample Mean Methods
The Hit-or-Miss method and Sample Mean methods are described in CSM
Sec. 11.2. I introduced the hit-or-miss method first. We will
consider the sample mean method in a future class.
Testing Hit-or-Miss Method
Program ~sg/src/misc/pi3.c implements the hit-or-miss method using a
feedback shift register random number generator. Compile it with
this command:
cc -o pi3 pi3.c newfsr.c
You have to tell the program how many "pebbles you want to throw."
This is the first command line argument. You may also have it
repeat the experiment multiple times. This is the optional second
argument. Look at the code to see how this is done if you are
unfamiliar with this type of feature. Also note how the random
number generator gets its seed through a call to pid(). This is
the "process id" of the calling routine. It is essentially a
random number. If you like, you can modify the code so that you
specify the seed yourself. This can be very useful when you are
trying to debug the code and want to see if results are
reproducible. Sometimes I set up my code so that it asks for a
seed, but if the seed I give is negative, it calls pid() to get a
random seed. That way, I can do many runs without having to think
up new seeds, and when I want to reproduce runs with a known seed I
can also do that.
I ran the pi3 code a few times with this command
pi3 10
The output ranged from 2.4 to 3.6. Clearly the output is a random
variable. I used the second argument to repeat the experiment
multiple time and made historgrams of the result. With larger sample
sizes, the histograms were narrower.
We have to answer several questions.
Do we get the right answer on average?
Can we estimate the error?
How does the error depend on n, the number of trials?
We expect the error to fall like 1/sqrt(n), but can we show that?
Students were asked to write a shell script to get results for
different values of num.
I displayed a shell script ~sg/runpi3.csh with a loop over num.
It pipes the output of each run of pi3 to a new program called aver.
(It also multiplies the error by 100 using awk.)
[sw246-01:~] sg% runpi3.csh
1.0000000000e+01 3.1430400000e+00 0.518381
5.0000000000e+01 3.1412960000e+00 0.230508
1.0000000000e+02 3.1401240000e+00 0.166837
2.5000000000e+02 3.1409952000e+00 0.103798
1.0000000000e+03 3.1424588000e+00 0.0522629
2.5000000000e+03 3.1413792000e+00 0.0330492
5.0000000000e+03 3.1419865600e+00 0.0229892
1.0000000000e+04 3.1417096800e+00 0.0162435
Aver is a very useful code you may copy from ~sg/src/misc
Aver
For our purposes you only have to know that if you have a file
with x and y on each line, if you type
aver file
then for any lines that have the same x-value aver will average all
the corresponding y-values. It will print x y_average y_error.
This can be very useful when you want averages. It is also why I
arranged the output of pi3 so that the number of pebbles thrown was the
first word of output on each line.
In ~sg/src/pi_test.ax you can find the output of my average value
of pi and the error in the mean of 100 trails. I have also drawn
a horizontal line at pi. For most, but not every point, the line
crosses the error bar.
How Big Should the Errors Be?
We analyze the errors in terms of a random process.
Let p be the probablity of a hit and q = 1-p the probability of a miss.
F_N(p,q) = (p+q)^N
is a polynomial that represents all the possible outcomes of N tosses.
In terms of combinatorial factors
F_N(p,q) = \Sum_{i=0}^N p^i q^{N-i} ( N )
( i )
where (N) = N!/( i! (N-i)! )
(i)
If we want to find the average number of hits = *, we can
apply the operator p X partial derivative w.r.t. p
We can also get ** by applying that operator twice. After applying
the differential operators, we can set p+q = 1
We found ** = N p.
We also found that
** = N p + N(N-1) p^2 .
So the variance ** - **^2 = N p (1-p)
Our pi3 program takes i and multiplies it by 4/N to get a number
that should be Pi. Taking these factors into account, the error
in pi, proportional to the square root of the variance should be
4 \sqrt( p(1-p)/N )
Each student wrote a script to study the error and reproduce
data similar to that in my pi_test.ax file. We then went on to
look at the error and compare it with the theoretical prediction.
Agreement was perfect. The script ~sg/src/run.csh makes 10000
trials for batch sizes from 10 to 5000. The error that comes
out of aver is multiplied by 100 because aver prints the
error in the mean which is proportional to
1/sqrt(number of measurements averaged).
So, we need to multiply by sqrt(10000) to get the variance of a single
measurement. Using the fact that p=pi/4 above, we found that
the numerical result agrees prefectly with the prediction based on the
generating function F_N(p,q).
The following discussion is optional:
You can also write an awk script to print the absolute value of the
error for each trial. Here is the awk commaand:
{ err=3.1415926-$2; if(err<0) err= (-err); print $1,err}
Now you many apply this command to your file before sending it to
aver and then plot the error vs size. Use logarithmic scale for
the x and y axes. Can you verify that the error falls like
1/sqrt(size) ?
*