April 15, 2013 --- Class 27 --- Verification of Error Estimates, Non-linear
Data Fitting
Activities:
Verification of Error Estimates
-------------------------------
In our last class, we introduced polyfit for fitting data to
an nth order polynomial. With almostquad.c we can produce data
close to a quadratic function:
2.0 + x + 0.1 x*x + noise
The noise is gaussian with a standard deviation sigma = 0.8.
You can learn quite a bit by doing this once, but we want to explore
the statistics of fitting many data sets that only differ by the
random noise. We can ask questions such as:
1) what is the distribution of chi-squared produced by the fit?
2) what is the distribution of confidence level?
3) what is the distribution of each of the fitted parameters and
how does it compare with the error estimated by doing the fit?
~sg/src/misc/almostquad.csh is set up to do this experiment 10000 times
Here is what the script looks like:
#!/bin/csh
# a script to run my almostquad code many times and use polyfit
unset noclobber
set run = 0
while ($run <10000)
almostquad >! almostquad.dat
polyfit <$i.out
foreach? end
We made histograms of the three parameters in the model
and used my variance program to find the standard deviation for each
parameters. These values all agree very well with the error on the
parameters reported by polyfit.
You might like to extend this by looking at the covariance of different
parameters and comparing that with the correlations reported by
polyfit.
We also considered what happens when we fit the data with either a
linear or cubic polynomial. With a linear polynomial the chisquared
is huge and the confidence levels are miniscule. With a cubic fit,
the confidence levels are fine, but we find that the coefficient of
the cubic term is not significantly different from zero.
Non-linear model functions
--------------------------
When the model function depends non-linearly on the parameters, we have
to use an iterative method to minimize chi-squared. This problem is
discussed in Numerical Recipies. The present case differs from the
general minimization problem because we can calculate the first and
second derivatives of chi-squared because we know the form of the model
function. Numerical Recipies has the Levenberg-Marquardt method and
the user needs to supply the model function and its first derivatives
with respect to its parameters. I have my own fitting routines that
can be found in either ~sg/src/misc or ~sg/oldhome/sg/fitting .
The user needs to also supply second derivatives for my fitter.
Gnu Scientific Library also has data fitting routines that implement
Levenberg-Marquardt.
In our next class we will do some fitting with my code for non-linear
models.