Sunday, November 30, 2008

Sieve of Eratosthenes

This is a famous algorithm, typically used for generating prime numbers from 1 to N.

This algorithm is attributed to Eratosthenes, a Greek Mathematician who lived around 200 B.C. - I have described the algorithm and how it works:

A non-zero, postive integer (natural number) is prime if it has no other factors other than 1 and itself. 1 is neither prime nor composite. And 2 is the only even-prime number. 

So, multiples of 2 cannot be prime (they have 2 as one of their factors) - Similarly, multiples of 3 also cannot be prime. And so on. So, the idea is multiples of a number cannot be prime. 

So, let's consider numbers 1 to 100. 

Create an array of 100 numbers and set them all to 1. 

Let's start from 2 and mark all the multiples of 2 till 100 as non-prime (a.k.a composite). To do this, let's set the value in the array as 0.  [The index should go till 100 / 2]

Then, let's start with 3, and again mark all the multiples of 3 till 100 as non-prime. 

Next you reach 4. But, this is 0 already. So, the multiples of 4 should already be set to 0. (Since all multiples of 4 are already multiples of 2) - So ignore 4. [And ignore multiples of those numbers which are already marked as 0]. Then start with 5.  

So, how far should you go? Some people choose to go till N / 2 (50) - But this is sub-optimal. It is sufficient if we check for all multiples till square root of N. This is because, any factor greater than SQRT(N)  should have a partner that is less than SQRT(N) and hence would have already been marked 0. [Please re-read this para-graph :-) if this sounds confusing]

Now iterate through the array and print all indices where the value is non-zero. Voila! You have generated prime-numbers in O(nloglogn). 

I wrote an implementation in Python. It was quite cool :o)

#!/usr/bin/env python
# eratosthenes.py
# Sriram V Iyer
#
# Implements the eratosthenes sieve program

# Initialize an array for 100(+1) 1s
# +1 is to accomodate the cases 2 * 50, ...
l = [ 1 for i in range( 100+1) ]

# make the multiples as 0
[ l.__setitem__(i*j, 0) for i in range(2,10) for j in range(2,(100/i)+1) if l[i] != 0 ]

# print indices of the list that are still 1
print [ i for i in range(2,100) if l[i] == 1 ]

I had a discussion with Tejaswi (my Beceem colleague) on an equivalent C code.

Btw, I used the C/C++ compiler feature that initializes unitialized elements of a a partially initialized array to 0. This way, I changed the polarity to 0 -> Prime and 1 -> Composite in the C code. [This way, I needn't write code to initialize the array to 0]

/*
    eratos.c
    
    Sieve of Etatosthenes in C/C++
*/
#include "stdio.h"
#include "math.h"

int main()
{
    int N = 100, i = 0, j = 0;
    int nArray[100+1] = { 0 };

    for( i = 2; i < (sqrt((double)N)); i++ ) {
        if(nArray[i] == 1) continue;
        
        for( j = 2; j <= (N/i); j += 1 ) 
            nArray[i * j] = 1;
    }
    
    for( i = 2; i <>
        if(!nArray[i]) printf( "%d ", i );
    }
}

There is one more possible optimization - If you can initialize nArray[4] = 1 in the code before the first loop, you can start the inner loop from 3 :-)

PS:- Looks like the "<<" in cout can confuse blogger. So, I rewrote the program without using cout, endl :-)

No comments: