November 29, 2007 --- Class 26 --- Feedback Shift Register Random Number
Generator, Fitting Data
Activities:
Comparing drand48 and fsr random number generators
Before class I prepared codes to test the drand48 random number
generator that comes with Unix. The codes can be found in my src/misc
directory. random_drand48.c puts out two numbers per line and
can be used for a scatter plot or to calculate autocorrelations.
box_drand48.c can be used to test triplets of random numbers as
we did before with box_rand.c for the fsr random number generator.
We compared the autocorrelation of the two random number generators
out to a lag of 20. Doing about 50 trials in each case, we found
that drand48 seemed to have a smaller dispersion than fsr, but that
it seemed to have more negative values of autocorrelation than
positive values. More work is required to see if this is just
a statistical fluctuation.
Understanding the Feedback Shift Register Random Number Generator
We examined how the feedback shift register (FSR) code makes a new
random 28 bit integer every time it is called. The reason this random
number generator has a period longer than 2^28 is because the period
is determined by the entire state of the feedback shift register
which has about 140 bits.
We also used wikipedia to find a description of IEEE floating point
format. 23 bits are used to store the fractional value of a number.
Feedback shift number random generators are based on bitwise exclusive
or operations. The recursion relation is
b_n = b_{n-p} ^ b_{n-q}
So the new bit depends on bits earlier in the sequence by p and q.
There are certain values of p and q that result in good randomness.
You can find good pairs of p and q in the book "Seminumerical
Algorithms" by Donald Knuth in Table 1. Our pair (p,q) = (97,127).
Within the code, 28 bits of each 32 bit word are used to store the
state of the FSR. rshift[1] to rshift[5] hold 140 bits of state.
The next 28 bits, numbered 141-168 are created in rshift[0].
Using b_n = b_{n-97} ^ b_{n-127}, we see that with n=141 we need
28 bit words starting with bit 44 (=141-97) and bit 14 (= 141-127).
Here are how the bits are stored in the rightmost 28 bits of rshift[i].
rshift[5] = 1-28
rshift[4] = 29-56
rshift[3] = 57-84
rshift[2] = 85-112
rshift[1] = 113-140
From the code, t2= (rshift[5]<<13))|(rshift[4]>>15)
contains the bits 14-41 that we need. In the C language x<> denotes a shift to the right.
We can get bits 44-56 from t1 = (rshift[4]<<15)|(rshift[3]>>13).
We then take the exclusive or of t1 and t2 and mask of the rightmost
28 bits. Finally, the shift register is moved to higher values,
dropping off the old value of rshift[5], and the new 28 bit integer
we created is divided by 2^28 to get a float between 0 and 1. All
the details can be found in ~sg/src/newfsr.c.
Other Sources of Random Numbers
We briefly looked at Gnu Scientific Library documentation for their
random number generators. We also used Google to find the web pages
for SPRNG a set of Scalable Parallel Pseudo Random Number Generators:
http://sprng.cs.fsu.edu/
These routines can be used in parallel programs that require a different
(independent) stream of random numbers on each node.
Fitting Data
We looked at what one should do to fit data to a model function.
If the data has a Gaussian distribution about its true value, then
we maximize the probability for the set of observations by
adjusting the parameters of the model to minimize \chi^2.
\chi^2= \Sum_{i=0}^N} (y_i- y_model(x_i)^2/ \sigma_i^2
The case of a linear model will be considered in great detail in the
next class. We will obtaine specific formulae for the slope
and intercept parameters of the fit. We will also find the error
on the slope and intercept, and the covariance of those quantities.
We also need consider whether or not our fit is a good one. We
minimized chi-squared as best we could, but how small should it be?