Thursday, March 7, 2013

C++ Coding - Random Numbers and Monte Carlo

Question Generate pseudo random numbers from the normal distribution.



In this example we use the standard in-built random number generator to generate normally distributed numbers. For a more up to date example using c++11 standards (available in gcc4.8/VS2010 and upwards) see click here. We shall use the Box-Muller method to transform numbers from a uniform distribution into numbers from the normal distribution. In order to get numbers from the uniform distribution, we will have to convert the set of random integers generated by the standard number generator into real numbers on the interval between 0 and 1. There are many better random number generators than the standard one not least the Mersenne Twister algorithm (download here), but for simplicity we stick with the standard one.

Start with your empty program with the appropriate libraries, in this case we also need cstdlib for random numbers, and later ctime to initialise the sequence. We also write 2 empty functions for the random number generators, which we will fill in as we go along.

#include <cstdlib>
#include <cmath>
#include <ctime>
#include <iostream>
using namespace std;
 // return a uniformly distributed random number
double uniformRandom()
{
  return 0.;
}
 // return a normally distributed random number
double normalRandom()
{
  return 0.; 
}

int main()
{
  return 0;
}
Now we can start with the uniformly distributed random number. The function rand() returns a random sequence of integer values between 0 and RAND_MAX a compiler specific predefined maximum number. To transform this set of integers into the numbers on the interval 0 to 1 is quite simple, simple cast them into doubles and divide by the maximum number. The following function generates numbers on the closed interval
double uniformRandom()
{
  return (double)(rand())/(double)(RAND_MAX);
}
This generator can cause problems if 0 or 1 result in invalid numbers in later use cases. For example, in the Box-Muller method we are required to calculate the logarithm of a number from the uniform distribution, and you can't take a log of 0. So we may wish to generate numbers on the open interval. By simply adding 1 or 2 to rand() or RAND_MAX we can shift onto the open/closed, the closed/open, or the open/open interval. We use the open/closed interval here and the code is as follows
#include <cstdlib>
#include <cmath>
#include <ctime>
#include <iostream>
using namespace std;
 // return a uniformly distributed random number
double uniformRandom()
{
  return ( (double)(rand()) + 1. )/( (double)(RAND_MAX) + 1. );
}
 // return a normally distributed random number
double normalRandom()
{
  return 0.; 
}

int main()
{
  for(int i=0;i<100;i++)
    cout << " u_i " << uniformRandom() << endl;
  return 0;
}
The for loop in the main will print out 100 numbers from the sequence. You should note that every time this code is run the numbers generated are not truly random and will be exactly the same every time. As a consequence if you want the program to run on a different set of numbers each time the program start you can use srand() to seed the generator. For instance we could seed to algorithm with the number 10 with the following
  srand(10);
  for(int i=0;i<100;i++)
    cout << " u_i " << uniformRandom() << endl;
  return 0;
Given this seed, a new set of numbers are generated. We can change that number to generate different sets, but the call to seed the algorithm should be done only once otherwise you will get the same numbers over and over. This doesn't help us get a random start though, but we can use the current system time to give us a sort of random start point. The call time(NULL) will return the current system time in seconds, and this can be used effectively as your random seed. The code
  srand(time(NULL));
  for(int i=0;i<100;i++)
    cout << " u_i " << uniformRandom() << endl;
  return 0;
will generate apparently random numbers each time it is run.

Before moving on we should really perform a test to see if the set of numbers we are generating really are from the uniform distribution. For a simple test we might look at the probability that a number u from the distribution is less than say 0.25. We know that this probability must be equal to 0.25, so we can use Monte-Carlo type integration to test this out. Note that the following is true

so we can calculate the probability by taking the average of the indicator function on u less than 0.25. The code for this might look something like
  // store the number of simulations
  int N=100;
  double prob=0.,x=0.25;
  for(int i=0;i<N;i++)
  {
    double u=uniformRandom();
    if(u<x)
      prob = prob + 1.;
  }
  // output the probability
  cout << " P(u<0.25) = " << prob/N << endl;  
  return 0;
Running this bit of code with larger and larger N should see the value get closer and closer to 0.25.

Once we are satisfied with the uniform random generator we can move onto the normal random generator. The Box-Muller method can be written as

where x1 and x2 will be independent normally distributed numbers. For a simplicity (although inefficient) we shall throw one of those numbers away and just use one of them. In code we write
#include <cstdlib>
#include <cmath>
#include <ctime>
#include <iostream>
using namespace std;
 // return a uniformly distributed random number
double uniformRandom()
{
  return ( (double)(rand()) + 1. )/( (double)(RAND_MAX) + 1. );
}
 // return a normally distributed random number
double normalRandom()
{
  double u1=uniformRandom();
  double u2=uniformRandom();
  return cos(8.*atan(1.)*u2)*sqrt(-2.*log(u1)); 
}

int main()
{
  for(int i=0;i<100;i++)
    cout << " x_i = " << normalRandom() << endl;  
  return 0;
}
Hopefully you should get what look like normally distributed numbers. Next, again we should check that the distribution of numbers we get has the correct mathematical properties. For this you could run a number of statistical tests. A simple check is just for the mean and variance which we can calculate as follows
 
  int N=1000;
  double mean=0.,variance=0.;
  for(int i=0;i<N;i++)
  {
    double z=normalRandom();
    mean = mean + z;
  }
  mean = mean/N;
  cout << " mean = " << mean << endl;
  
  for(int i=0;i<N;i++)
  {
    double z=normalRandom();
    variance = variance + (mean - z)*(mean - z);
  }
  variance = variance/N;
  cout << " variance = " << variance << endl;
Hopefully as you set the value of N larger and larger the mean and variance will tend to 0 and 1.

6 comments:

  1. This is really a very great and clear explanation of Monte carlo simulation. Thanks, I had been working for quite a few days but couldn't get through.

    ReplyDelete
  2. I have questions.

    1. I do not understand this part. For a simple test we might look at the probability that a number u from the distribution is less than say 0.25. We know that this probability must be equal to 0.25.

    How you come up with 0.25 probability?

    2. return cos(8.*atan(1.)*u2)*sqrt(-2.*log(u1));
    When formula say cos(2*pi*u2)..I am sorry but I do not understand (not from coding perspective) "8. * arctangent(1)".

    I am sorry I am bit weak in math but want to understand monte carlo so request you Sir to clarify on this.

    Thank you.

    Raj

    ReplyDelete
  3. 1. The uniform distribution has equal probability over the unit interval. So given that the interval [0,0.25] takes up 25% of the total, the probability of a number landing in the interval is 25% or 0.25.

    2. Given Pi is an irrational number by definition you can never write down the number to full accuracy. It is much better therefore to use a numerical formula to define the number which will adjust itself to the accuracy of the data storage (32bit float or 64bit double). Simple trigonometric identities give
    pi / 4 = atan(1.)
    so
    pi = 4 atan(1)
    and
    2 pi = 8 atan(1)
    Alternatively you can also use
    pi = acos(-1)
    I can't remember is there is any difference between the two!

    ReplyDelete
  4. I need help for an exercise :simulate a proces montecarlo in c++ when :n=100,N=20

    ReplyDelete
  5. Hi! I am a bit puzzled by the +1 or +2 to shift between intervals. So we have three intervals that we can switch between open/closed, closed/open and open/open. So if I want to choose open/open interval would the function be ((double)(rand()+1.0)/((double)(RAND_MAX)+2)? Is this correct? I do not fully understand how it works. Can you point me in the right direction.

    ReplyDelete
    Replies
    1. Ok imagine you have a random number generator that gives the first 10 integers. The function rand() will be drawn from the set
      { 0, 1, 2, ... , 9 }
      and RAND_MAX = 9. Then a function rand()/RAND_MAX gives
      {0 , 1/9 , 2/9 , ... , 1 }
      so is closed interval [0,1]. If you want open interval at the start, add one to rand(), but you also need to add 1 to RAND_MAX as well. To keep the interval open at the other end adding 2 will do the trick. In the example above this gives
      { 1/11 , 2/11 , 3/11 , ... , 10/11 }

      Delete