Friday, January 20, 2012

Pascal's Triangle Profiled

In my last entry I presented two versions of printing Pascal's Triangle.  I removed the print statements so that only the construction would be profiled.  I changed the number of rows from 300 to 10,000.

With Python 2.7:

prev = [1]
#print(prev)
for row in xrange(1, 10000):
    curr = [1]
    for index in xrange(1, row):
        curr.append(prev[index-1] + prev[index])
    curr.append(1)
    #print curr
    prev = curr


  49995002 function calls in 47.930 seconds

   ncalls  tottime  percall  cumtime  percall filename:lineno(func)

 49995000    2.988    0.000    2.988    0.000 {method 'append' of 'list' objs}


prev = [1]
#print(prev)
for row in xrange(1, 10000):
    curr = [1]
    if row > 5:
        mid = (row / 2) + 1
        if row % 2 != 0:
            right = mid + 1
        else:
            right = mid - 1

        for index in xrange(1, mid):
            curr.append(prev[index-1] + prev[index])
     
        r = curr[0:right]
        r.reverse()
        curr.extend(r)
    else:
        for index in xrange(1, row):
            curr.append(prev[index-1] + prev[index])
        curr.append(1)
    #print curr
    prev = curr


25014999 function calls in 22.137 seconds

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)

 24995009    1.552    0.000    1.552    0.000 {method 'append' of 'list' objects}
     9994    0.191    0.000    0.191    0.000 {method 'extend' of 'list' objects}
     9994    0.018    0.000    0.018    0.000 {method 'reverse' of 'list' objects}

By using reverse and extend to replace the appends, the second version is over twice as fast with 24,980,003 (50.03%) fewer function calls.

Since Pypy with its JIT has made so much progress lately and the benchmarks have been so impressive, I ran the second version with Pypy 1.7.

 25014999 function calls in 63.517 seconds

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)

 24995009    9.682    0.000    9.682    0.000 {method 'append' of 'list' objects}
     9994    0.579    0.000    0.579    0.000 {method 'extend' of 'list' objects}
     9994    0.034    0.000    0.034    0.000 {method 'reverse' of 'list' objects}

It was almost three times slower.  This is a question for the Pypy team.

So, the first version is what I would expect a good candidate to come up with in an interview.  Extra points if they can explain the optimization.

2 comments:

  1. Long objects are not amenable to jitting (they're tight C loops) and GCC is not very good with dealing with our generated C code, that would be why.

    ReplyDelete
  2. Let's represent triangle like:
    1
    1 1
    1 2 1
    1 3 3 1
    1 4 6 4 1

    we can see that row size grows by 1

    now let's analyze:
    1) size of elements in a row is the same as row number -> we can reuse for loop, which counts how many rows to show
    2) all elements are positive integers (maybe long integer)

    hence, we should use array, because we know the size of row and the type of all elements
    #-------
    prev = [1,1]

    for size_of_row in xrange(2, 10000):
    ____curr = [1] *( size_of_row +1)

    ____#curr[1:-1] = [ prev[index-1] + prev[index] for index in xrange(1, size_of_row) ]
    ____# For loop is a bit faster

    ____for index in xrange(1, size_of_row):
    ________curr[index] = prev[index-1] + prev[index]

    ____#print curr[:]
    ____prev = curr
    #-------

    Now lets switch to raw array from ctypes:
    #-------
    from ctypes import c_uint
    prev = [1,1]

    for size_of_row in xrange(2, 10000):
    ____curr = ( c_uint *( size_of_row +1) )(1) # <- 1 is an element on position 0
    ____curr[-1]= 1

    ____for index in xrange(1, size_of_row):
    ________curr[index] = prev[index-1] + prev[index]

    ____prev = curr
    #-------

    It is probably the maximum speed one can get from Python, .. but lets look at the triangle again
    ##############
    ______1
    _____1 1
    ____1 2 1
    ___1 3 3 1
    __1 4 6 4 1
    1 5 10 10 5 1
    ##############
    .. there are the same values on left and right sides, it's 'craving' for cutting off the calculations in half

    #-------
    from ctypes import c_uint

    prev = [1,1]

    for size_of_row in xrange(2, 10000):
    ____curr = ( c_uint *( size_of_row +1) )(1) # <- 1 is an element on position 0
    ____curr[-1]= 1

    ____for index in xrange(1, size_of_row//2 +1):
    ________curr[index] = curr[-index-1] = prev[index-1] + prev[index]

    ____prev = curr
    #-------

    Note: my PC is P4 3.0Ghz (10k from the post ~390s, my method ~31s)
    python -m cProfile .\profile_me_pascal_triangle.py
    ____137 function calls in 39.677 seconds

    http://pastebin.com/k1GAchT9

    ReplyDelete