November 29, 2007 --- Class 26 --- Feedback Shift Register Random Number Generator, Fitting Data Activities: Comparing drand48 and fsr random number generators Before class I prepared codes to test the drand48 random number generator that comes with Unix. The codes can be found in my src/misc directory. random_drand48.c puts out two numbers per line and can be used for a scatter plot or to calculate autocorrelations. box_drand48.c can be used to test triplets of random numbers as we did before with box_rand.c for the fsr random number generator. We compared the autocorrelation of the two random number generators out to a lag of 20. Doing about 50 trials in each case, we found that drand48 seemed to have a smaller dispersion than fsr, but that it seemed to have more negative values of autocorrelation than positive values. More work is required to see if this is just a statistical fluctuation. Understanding the Feedback Shift Register Random Number Generator We examined how the feedback shift register (FSR) code makes a new random 28 bit integer every time it is called. The reason this random number generator has a period longer than 2^28 is because the period is determined by the entire state of the feedback shift register which has about 140 bits. We also used wikipedia to find a description of IEEE floating point format. 23 bits are used to store the fractional value of a number. Feedback shift number random generators are based on bitwise exclusive or operations. The recursion relation is b_n = b_{n-p} ^ b_{n-q} So the new bit depends on bits earlier in the sequence by p and q. There are certain values of p and q that result in good randomness. You can find good pairs of p and q in the book "Seminumerical Algorithms" by Donald Knuth in Table 1. Our pair (p,q) = (97,127). Within the code, 28 bits of each 32 bit word are used to store the state of the FSR. rshift[1] to rshift[5] hold 140 bits of state. The next 28 bits, numbered 141-168 are created in rshift[0]. Using b_n = b_{n-97} ^ b_{n-127}, we see that with n=141 we need 28 bit words starting with bit 44 (=141-97) and bit 14 (= 141-127). Here are how the bits are stored in the rightmost 28 bits of rshift[i]. rshift[5] = 1-28 rshift[4] = 29-56 rshift[3] = 57-84 rshift[2] = 85-112 rshift[1] = 113-140 From the code, t2= (rshift[5]<<13))|(rshift[4]>>15) contains the bits 14-41 that we need. In the C language x<> denotes a shift to the right. We can get bits 44-56 from t1 = (rshift[4]<<15)|(rshift[3]>>13). We then take the exclusive or of t1 and t2 and mask of the rightmost 28 bits. Finally, the shift register is moved to higher values, dropping off the old value of rshift[5], and the new 28 bit integer we created is divided by 2^28 to get a float between 0 and 1. All the details can be found in ~sg/src/newfsr.c. Other Sources of Random Numbers We briefly looked at Gnu Scientific Library documentation for their random number generators. We also used Google to find the web pages for SPRNG a set of Scalable Parallel Pseudo Random Number Generators: http://sprng.cs.fsu.edu/ These routines can be used in parallel programs that require a different (independent) stream of random numbers on each node. Fitting Data We looked at what one should do to fit data to a model function. If the data has a Gaussian distribution about its true value, then we maximize the probability for the set of observations by adjusting the parameters of the model to minimize \chi^2. \chi^2= \Sum_{i=0}^N} (y_i- y_model(x_i)^2/ \sigma_i^2 The case of a linear model will be considered in great detail in the next class. We will obtaine specific formulae for the slope and intercept parameters of the fit. We will also find the error on the slope and intercept, and the covariance of those quantities. We also need consider whether or not our fit is a good one. We minimized chi-squared as best we could, but how small should it be?