开发者

Explaining this primes generator function, i cannot understand [python]

开发者 https://www.devze.com 2023-03-11 08:46 出处:网络
def primes(n): if n==2: return [2] elif n<2: return [] s=range(3,n+1,2) mroot = n ** 0.5 half=(n+1)/2-1
def primes(n): 
    if n==2: return [2]
    elif n<2: return []
    s=range(3,n+1,2)
    mroot = n ** 0.5
    half=(n+1)/2-1
    i=0
    m=3
    while m <= mroot:
        if s[i]:
            j=(m*m-3)/2
            s[j]=0
            while j<half:
                s[j]=0
                j+=m
        i=i+1
        m=2*i+3
    return [2]+[x for x in s if x]

Hello, I found this function here http://code.activestate.com/recipes/366178-a-fast-prime-number-list-generator/ and I'm stuck. I don't understand this. I think it's using some properties of prime numbers but all those single letter variables are so mysterious. Could you please shed some light?

This I understand: mroot is the limit of the numbers you want to check for primality and I know that the function changes the list s to 0's marking the multiples. I also understand the list comprehesion at the end, and I understand s.

But why half? What is j? What is m?

Could you please give some comment?开发者_开发知识库


A line-by-line breakdown:

def primes(n): 
    if n==2: return [2]

This function returns a list of primes <= n. So if n == 2, that list contains only 2. Easy enough.

    elif n<2: return []

Here again, there are no prime numbers below 2, so return an empty list.

    s=range(3,n+1,2)

So s is a list of odd numbers starting with 3 and going to n + 1. s stands for sieve -- this is a sieve algorithm, which means that composite (non-prime) numbers will be sifted out of the list. In effect, they will be "crossed out." See below for a detailed description of sieve algorithms.

    mroot = n ** 0.5

Since it's a sieve, we can stop the algorithm once we've hit the square root of n.

    half=(n+1)/2-1

This is an explicit formula for the length of s; it could be replaced with len(s), but that might take longer to calculate for large values of n. This will also be useful for terminating certain parts of the algorithm.

    i=0
    m=3

I is our index; i simply steps through the sieve, checking each value. If the value is 0, then the number has been "crossed off" because it isn't prime. m is simply the value of s[i] at any given moment; a later line keeps it updated.

    while m <= mroot:
        if s[i]:

Since s[i] evaluates to True, it hasn't been crossed off the list yet. That means it's prime! So now we have to figure out which numbers on the list are multiples of s[i] -- they are all non-primes, and should be crossed off the list.

            j=(m*m-3)/2
            s[j]=0

Now the fun starts. Because the sieve isn't a list of consecutive numbers, but a list of odd numbers, we have to figure out where the multiples of our prime live in s. In this case, our prime is 3, so we need to find the index of 9, 15, 21, 27... (we don't have to find 6, 12, 18... because they're even, and so not in the list). This particular technique for finding the indices is really clever, because the author has figured out that once all the multiples of a particular prime have been crossed out, they can be skipped. That means the first un-crossed-out multiple of our prime is actually the square of that prime. (So for example, if the prime were 7, 7 * 3 = 21 and 7 * 5 = 35 would already have been crossed out, so the first multiple of 7 that we have to deal with is 7 * 7.) Once that makes sense, it's pretty easy to see that the location of 9 in s is (9 - 3) // 2 (where // is floor division).

            while j<half:
                s[j]=0
                j+=m

Now it goes back to being easy. We've found 9; now we have to find 15 = 9 + 3 + 3. Since s contains only odd numbers, it's half as long as a list with every number; to skip ahead 6 then, we need only add 3 to j. We do this as long as j is less than half -- in other words, as long as j is less than the length of s.

        i=i+1
        m=2*i+3

Again, easy -- i is just the index of the list, while m is the value of the number that was originally there. (You can test it out to see why: [2 * i + 3 for i in range(10)].)

    return [2]+[x for x in s if x]

And voila -- filter the zeros out of the sieve, prepend [2] and you have a list of primes.

The most confusing thing about this algorithm has to do with the shortcuts the author has taken, which make this run faster, but fog up the fundamental concept. (In fact, there are even more shortcuts one could take, but that's another post.) Here's a much simpler sieve that shows the basic idea:

>>> numbers = range(40)
>>> numbers[1] = 0    # 1 isn't prime
>>> for i in numbers:
...     if i:
...         for j in range(i + i, len(numbers), i):
...             numbers[j] = 0
... 
>>> [n for n in numbers if n]
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37]

To spell it all out, first numbers looks like this:

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10...]

Then...

[0, 0, 2, 3, 4, 5, 6, 7, 8, 9, 10...]
[0, 0, 2, 3, 0, 5, 0, 7, 0, 9, 0...]
[0, 0, 2, 3, 0, 5, 0, 7, 0, 0, 0...]

And so on.


This is an implementation of the Sieve of Eratosthenes. It takes a few shortcuts - eg, instead of starting from '2' and crossing out all the even numbers, it uses the range to not even generate them (it starts from 3 and generates only every second number - so 3, 5, 7, etc, up to n+1.


Indeed it uses a few tricks.

  • half represents roughly half of n. The plus and minus one is just there to make sure that half is actually the largest whole number less than half of n.
  • m is always what s[i] was at the time the list was created, which also equals 2*i+3. (You can get that from the definition of s). m is always the current prime number whose multiples you're eliminating.
  • j is used to eliminate the multiples. It starts from the index of m squared (since that's the smallest number that will need to be crossed out ((m*m-3)/2 undoes the formula above for getting the value, in this case m*m, from the index)
  • Then the while loop crosses out every m-th list element starting from the first j. Notice that since the list only has odd numbers, s[j+m] == s[j]+2m, which is okay because we're skipping the even multiples that way.
  • Then it just iterates over i and updates m accordingly.
0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号