October 30, 2013 --- Class 19 --- Numerical Integration Examples,
Higher Dimensional Integrals, Testing Hit or Miss Method, Aver
Testing Trapezoid, Simpson and Romberg Methods
In ~sg/integration_test/double_precision, you will find source code
and output files. The source code has been modified so we can see
how the methods converge. We went over the source code in detail.
In particular, we discussed how static variables are used in the
evaluation of the trapezoid rule to reduce the bin width and only
evaluate the function at new points.
The integral of cos(x) from x=0 to x=pi/2 is calculated by xqsimp2.
Here is the output:
1 7.853981633974e-01 3.333333333333e+29
2 9.480594489685e-01 1.002279877492e+00
3 9.871158009728e-01 1.000134584974e+00
4 9.967851718862e-01 1.000008295524e+00
5 9.991966804851e-01 1.000000516685e+00
6 9.997991943200e-01 1.000000032265e+00
7 9.999498000921e-01 1.000000002016e+00
8 9.999874501175e-01 1.000000000126e+00
9 9.999968625353e-01 1.000000000008e+00
10 9.999992156342e-01 1.000000000000e+00
The first column is the level at which we call the routine, i.e., with
2^(j-1) intervals in the integration range. The second column is the
trapezoid rule and the third column is for Simpson's rule. Note that
the j=1 value for Simpson's rule is bogus since we need to go at least
to j=2 to use Simpson's rule. In cos.ax, you will find the error
plotted vs. j. The crosses are for the trapezoid rule, the diamonds
are for Simpson's rule. You will also find three points from
the Romberg routine.
We found that from j=1 to j=10 the error for the trapezoid rule
falls by more than 5 orders of magnitude. This corresponds to
increasing the number of intervals by 2^9 = 512. Since the error for
the trapezoid rule is supposed to fall like n^{-2}, and 512^2 is
about 2.6 x 10^5, this makes perfect sense.
Simpson's rule is seen to converge much more quickly, its error
falling by roughly ten orders of magnitude. This is the expected
n^{-4} behavior. With the
Romberg method, the error is already about 10^{-12} for j=5 and
drops below 10^{-15} by j=7. Quite an impressive improvement!
A Second Example
A second example integrates sqrt(1-x^2) from 0 to 1. Curiously, it
looks very different. This is because of the singularity in the
derivative of the integrand at x=1. Simpson's rule does not converge
with a higher power of j. Plot sqrt_1minus_xsqr.ax or plot both graphs
together. We recalled that the expected error for the trapezoid rule
depends on the average value of the second derivative of the function.
A Third Example
In the third example, we again integrate sqrt(1-x^2), but from 0 to 0.5.
This avoids the region where the derivative is large, and the result
follows the pattern of the first example, where Simpson's rule is much
better than the trapezoid rule. Look at the newexample.ax file.
Higher Dimensional Integrals
Monte Carlo methods like hit or miss and the sample mean methods
have errors that fall like n^{-1/2}. This is not very good. To
cut the error in half you need to do 4 times as much work. For
Simpson's rule the error goes like n^{-4}, so if you do twice as
much work, the error falls by a factor of 16. However, for
multidimensional integrals, a method with an error like h^a,
where h is the spacing between grid points, will have an error like
n^{-a/d} because the number of points on the grid goes like h^{-d}.
As an example, if a=4 and d=10, a/d=0.4 so the error is falling
more slowly than n^{-1/2}. So, for high dimension integrals,
Monte Carlo methods may be bad, but other methods are worse.
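The scaling argument above in one line:

```latex
\mathrm{error} \sim h^{a}, \qquad n \sim h^{-d}
\;\Longrightarrow\; h \sim n^{-1/d}
\;\Longrightarrow\; \mathrm{error} \sim n^{-a/d}
```

so for a = 4 and d = 10 the grid method's error falls only like
n^{-0.4}, slower than the Monte Carlo n^{-1/2}.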
Hit-or-Miss and Sample Mean Methods
The Hit-or-Miss method and Sample Mean methods are described in CSM
Sec. 11.2. I introduced the hit-or-miss method first. We will
consider the sample mean method in a future class.
Testing Hit-or-Miss Method
Program ~sg/src/misc/pi3.c implements the hit-or-miss method using a
feedback shift register random number generator. Compile it with
this command:
cc -o pi3 pi3.c newfsr.c
You have to tell the program how many "pebbles you want to throw."
This is the first command line argument. You may also have it
repeat the experiment multiple times. This is the optional second
argument. Look at the code to see how this is done if you are
unfamiliar with this type of feature. Also note how the random
number generator gets its seed through a call to getpid(). This is
the "process id" of the calling routine. It is essentially a
random number. If you like, you can modify the code so that you
specify the seed yourself. This can be very useful when you are
trying to debug the code and want to see if results are
reproducible. Sometimes I set up my code so that it asks for a
seed, but if the seed I give is negative, it calls getpid() to get a
random seed. That way, I can do many runs without having to think
up new seeds, and when I want to reproduce runs with a known seed I
can also do that.
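A minimal sketch of the hit-or-miss estimate, with the negative-seed
convention described above. pi3.c itself uses a feedback shift
register generator (newfsr.c); rand() here is just a stand-in:

```c
#include <stdlib.h>
#include <unistd.h>

/* Hit-or-miss estimate of pi: throw n "pebbles" uniformly into the
   unit square and count those landing inside the quarter circle.
   A negative seed asks getpid() for a quasi-random one, as described
   above.  (Illustrative stand-in for pi3.c, which uses a feedback
   shift register generator instead of rand().) */
double hit_or_miss_pi(long n, long seed)
{
    long hits = 0;
    srand(seed < 0 ? (unsigned) getpid() : (unsigned) seed);
    for (long i = 0; i < n; i++) {
        double x = rand() / (RAND_MAX + 1.0);
        double y = rand() / (RAND_MAX + 1.0);
        if (x * x + y * y < 1.0)
            hits++;
    }
    return 4.0 * hits / n;   /* i * 4/N, as in pi3 */
}
```

With a fixed non-negative seed the result is reproducible, which is
exactly why a user-specified seed is handy for debugging.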
I ran the pi3 code a few times with this command
pi3 10
The output ranged from 2.4 to 3.6. Clearly the output is a random
variable. I used the second argument to repeat the experiment
multiple times and made histograms of the results. With larger sample
sizes, the histograms were narrower.
We have to answer several questions.
Do we get the right answer on average?
Can we estimate the error?
How does the error depend on n, the number of trials?
We expect the error to fall like 1/sqrt(n), but can we show that?
Students were asked to write a shell script to get results for
different values of num.
I displayed a shell script ~sg/runpi3.csh with a loop over num.
It pipes the output of each run of pi3 to a new program called aver.
(It also multiplies the error by 100 using awk.)
[sw246-01:~] sg% runpi3.csh
1.0000000000e+01 3.1430400000e+00 0.518381
5.0000000000e+01 3.1412960000e+00 0.230508
1.0000000000e+02 3.1401240000e+00 0.166837
2.5000000000e+02 3.1409952000e+00 0.103798
1.0000000000e+03 3.1424588000e+00 0.0522629
2.5000000000e+03 3.1413792000e+00 0.0330492
5.0000000000e+03 3.1419865600e+00 0.0229892
1.0000000000e+04 3.1417096800e+00 0.0162435
Aver is a very useful code you may copy from ~sg/src/misc
Aver
For our purposes you only have to know that if you have a file
with x and y on each line, if you type
aver file
then for any lines that have the same x-value aver will average all
the corresponding y-values. It will print x y_average y_error.
This can be very useful when you want averages. It is also why I
arranged the output of pi3 so that the number of pebbles thrown was the
first word of output on each line.
In ~sg/src/pi_test.ax you can find the output of my average value
of pi and the error in the mean of 100 trials. I have also drawn
a horizontal line at pi. For most, but not every, point the line
crosses the error bar.
How Big Should the Errors Be?
We analyze the errors in terms of a random process.
Let p be the probability of a hit and q = 1-p the probability of a miss.
F_N(p,q) = (p+q)^N
is a polynomial that represents all the possible outcomes of N tosses.
In terms of combinatorial factors
F_N(p,q) = \sum_{i=0}^N \binom{N}{i} p^i q^{N-i}
where \binom{N}{i} = N!/( i! (N-i)! ) is the binomial coefficient.
If we want to find the average number of hits, <i>, we can
apply the operator p \partial/\partial p to F_N. We can also get
<i^2> by applying that operator twice. After applying the
differential operators, we can set p+q = 1.
We find <i> = N p and <i^2> = N p + N(N-1) p^2. So the variance is
<i^2> - <i>^2 = N p (1-p).
Our pi3 program takes i and multiplies it by 4/N to get a number
that should be pi. Taking these factors into account, the error
in pi, proportional to the square root of the variance, should be
4 \sqrt( p(1-p)/N )
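Written out, with p+q set to 1 only after differentiating:

```latex
\langle i \rangle
  = \Bigl(p\,\partial_p\Bigr)(p+q)^N \Big|_{p+q=1}
  = N p (p+q)^{N-1} \Big|_{p+q=1} = N p,
\qquad
\langle i^2 \rangle
  = \Bigl(p\,\partial_p\Bigr)^2 (p+q)^N \Big|_{p+q=1}
  = N p + N(N-1) p^2,
```
```latex
\sigma_i^2 = \langle i^2 \rangle - \langle i \rangle^2 = N p (1-p),
\qquad
\sigma_\pi = \frac{4}{N}\,\sigma_i = 4\sqrt{\frac{p(1-p)}{N}}
  \;\xrightarrow{\;p=\pi/4\;}\;
  \sqrt{\frac{\pi(4-\pi)}{N}} \approx \frac{1.642}{\sqrt{N}}.
```

At N=10 this predicts an error of about 0.519, right on top of the
first line of the runpi3.csh table above.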
Each student wrote a script to study the error and reproduce
data similar to that in my pi_test.ax file. We then went on to
look at the error and compare it with the theoretical prediction.
Agreement was perfect. The script ~sg/src/run.csh makes 10000
trials for batch sizes from 10 to 5000. The error that comes
out of aver is multiplied by 100 because aver prints the
error in the mean which is proportional to
1/sqrt(number of measurements averaged).
So, we need to multiply by sqrt(10000) = 100 to get the standard
deviation of a single measurement. Using the fact that p = pi/4, we
found that the numerical result agrees perfectly with the prediction
based on the generating function F_N(p,q).
You can also write an awk script to print the absolute value of the
error for each trial. Here is the awk command:
{ err=3.1415926-$2; if(err<0) err= (-err); print $1,err}
Now you may apply this command to your file before sending it to
aver and then plot the error vs size. Use logarithmic scale for
the x and y axes. Can you verify that the error falls like
1/sqrt(size) ?