April 19, 2016 --- Class 27 --- Polyfit (repeated), Verification of Error Estimates
Activities:
Polyfit
-------
Polyfit is a routine that fits data with an nth-order polynomial.
You are welcome to use it. You will find the code and a makefile
in ~sg/src/misc. Before class, I made up a code to produce data
close to a quadratic function. It is called almostquad.c and is
set to produce a curve
2.0 + x + 0.1 x*x + noise
The noise is gaussian with a standard deviation sigma = 0.8.
You may run almostquad (it is in ~sg/bin) and put the output
in a file. You may then use polyfit to fit the data.
You can learn quite a bit by doing this once, but I wanted to explore
the statistics of fitting many data sets that differ only in the
random noise. We can ask questions such as:
1) what is the distribution of chi-squared produced by the fit?
2) what is the distribution of confidence level?
3) what is the distribution of each of the fitted parameters and
how does it compare with the error estimated by doing the fit?
Verification of Error Estimates
-------------------------------
~sg/src/misc/almostquad.csh is set up to do this experiment 10000 times.
Here is what the script looks like:
#!/bin/csh
# a script to run my almostquad code many times and use polyfit
unset noclobber
set run = 0
while ($run < 10000)
    almostquad > almostquad.dat
    polyfit < almostquad.dat
    @ run = $run + 1
end
We made histograms of the three parameters in the model
and used my variance program to find the standard deviation of each
parameter. These values agree very well with the errors on the
parameters reported by polyfit.
You might like to extend this by looking at the covariance of different
parameters and comparing that with the correlations reported by
polyfit.
We also considered what happens when we fit the data with either a
linear or cubic polynomial. With a linear polynomial the chi-squared
is huge and the confidence levels are minuscule. With a cubic fit,
the confidence levels are fine, but we find that the coefficient of
the cubic term is not significantly different from zero. We also found
that, because of the extra parameter in the model, the error estimates
of the other parameters increase.
In our next class, we will talk about models that are non-linear
in their parameters and begin a discussion of parallel computing.
The class voted to learn about GPU computing rather than MPI or OpenMP.