With Python 2.7:

```python
prev = [1]
#print(prev)
for row in xrange(1, 10000):
    curr = [1]
    for index in xrange(1, row):
        curr.append(prev[index-1] + prev[index])
    curr.append(1)
    #print curr
    prev = curr
```

```
49995002 function calls in 47.930 seconds

  ncalls    tottime  percall  cumtime  percall  filename:lineno(function)
49995000    2.988    0.000    2.988    0.000  {method 'append' of 'list' objects}
```

```python
prev = [1]
#print(prev)
for row in xrange(1, 10000):
    curr = [1]
    if row > 5:
        mid = (row / 2) + 1
        if row % 2 != 0:
            right = mid + 1
        else:
            right = mid - 1
        for index in xrange(1, mid):
            curr.append(prev[index-1] + prev[index])
        r = curr[0:right]
        r.reverse()
        curr.extend(r)
    else:
        for index in xrange(1, row):
            curr.append(prev[index-1] + prev[index])
        curr.append(1)
    #print curr
    prev = curr
```

```
25014999 function calls in 22.137 seconds

  ncalls    tottime  percall  cumtime  percall  filename:lineno(function)
24995009    1.552    0.000    1.552    0.000  {method 'append' of 'list' objects}
    9994    0.191    0.000    0.191    0.000  {method 'extend' of 'list' objects}
    9994    0.018    0.000    0.018    0.000  {method 'reverse' of 'list' objects}
```

By using reverse and extend to replace most of the appends, the second version is more than twice as fast, with 24,980,003 (50.03%) fewer function calls.
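The trick can be illustrated with a small Python 3 sketch (the function names are mine, not from the post): compute only the left half of each new row, mirror it onto the right side, and check the result against the straightforward construction.

```python
def full_row(prev):
    # Straightforward construction: append every interior sum.
    curr = [1]
    for i in range(1, len(prev)):
        curr.append(prev[i - 1] + prev[i])
    curr.append(1)
    return curr

def mirrored_row(prev):
    # Compute only the left half, then mirror it onto the right side.
    n = len(prev)  # the new row has n + 1 elements
    half = [1]
    for i in range(1, n // 2 + 1):
        half.append(prev[i - 1] + prev[i])
    # When the new row has odd length, the middle element must not be doubled.
    tail = half[:-1] if (n + 1) % 2 else half
    return half + tail[::-1]

# Both constructions agree on the first 50 rows.
row = [1, 1]
for _ in range(50):
    assert mirrored_row(row) == full_row(row)
    row = full_row(row)
```

The mirrored version does roughly half the additions per row; the savings shown in the profile come from replacing those per-element appends with one slice-reverse-extend.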

Since PyPy with its JIT has made so much progress lately, and its benchmarks have been so impressive, I ran the second version with PyPy 1.7.

```
25014999 function calls in 63.517 seconds

  ncalls    tottime  percall  cumtime  percall  filename:lineno(function)
24995009    9.682    0.000    9.682    0.000  {method 'append' of 'list' objects}
    9994    0.579    0.000    0.579    0.000  {method 'extend' of 'list' objects}
    9994    0.034    0.000    0.034    0.000  {method 'reverse' of 'list' objects}
```

It was almost three times slower. This is a question for the PyPy team.

So, the first version is what I would expect a good candidate to come up with in an interview. Extra points if they can explain the optimization.

Long objects are not amenable to jitting (they're tight C loops), and GCC is not very good at dealing with our generated C code; that would be why.

Let's represent the triangle like this:

```
1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
```

We can see that each row grows by one element.

Now let's analyze:

1) The number of elements in a row equals the row number, so we can reuse the for loop that counts how many rows to show.

2) All elements are positive integers (possibly long integers).

Hence, we should use an array, because we know both the size of each row and the type of all its elements.

```python
prev = [1, 1]
for size_of_row in xrange(2, 10000):
    curr = [1] * (size_of_row + 1)
    #curr[1:-1] = [prev[index-1] + prev[index] for index in xrange(1, size_of_row)]
    # The for loop is a bit faster than the list comprehension above
    for index in xrange(1, size_of_row):
        curr[index] = prev[index-1] + prev[index]
    #print curr[:]
    prev = curr
```

Now let's switch to a raw array from ctypes:

```python
from ctypes import c_uint

prev = [1, 1]
for size_of_row in xrange(2, 10000):
    curr = (c_uint * (size_of_row + 1))(1)  # <- 1 is the element at position 0
    curr[-1] = 1
    # Caveat: c_uint is 32-bit and wraps silently, so values overflow
    # once the binomial coefficients exceed 2**32 (around row 35).
    for index in xrange(1, size_of_row):
        curr[index] = prev[index-1] + prev[index]
    prev = curr
```

This is probably the maximum speed one can get from Python... but let's look at the triangle again:

```
      1
     1 1
    1 2 1
   1 3 3 1
  1 4 6 4 1
1 5 10 10 5 1
```

The values on the left and right sides are the same, so the triangle is 'craving' to have its calculations cut in half:
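The symmetry being exploited here is the binomial identity C(n, k) = C(n, n-k): row n of the triangle holds C(n, 0) through C(n, n). A quick Python 3 check (using `math.comb`, available since Python 3.8):

```python
from math import comb

# Row n of Pascal's triangle is C(n, 0) .. C(n, n).
# The symmetry C(n, k) == C(n, n-k) means only the left half needs computing.
for n in range(20):
    for k in range(n + 1):
        assert comb(n, k) == comb(n, n - k)
```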

```python
from ctypes import c_uint

prev = [1, 1]
for size_of_row in xrange(2, 10000):
    curr = (c_uint * (size_of_row + 1))(1)  # <- 1 is the element at position 0
    curr[-1] = 1
    for index in xrange(1, size_of_row // 2 + 1):
        curr[index] = curr[-index-1] = prev[index-1] + prev[index]
    prev = curr
```

Note: my PC is a P4 3.0 GHz (the 10,000-row version from the post takes ~390 s on it; my method takes ~31 s).

`python -m cProfile .\profile_me_pascal_triangle.py`

```
137 function calls in 39.677 seconds
```

http://pastebin.com/k1GAchT9
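For a quick side-by-side timing without a full profiler run, here is a Python 3 sketch of the two approaches (the names `triangle_full` and `triangle_half` are mine, and the row count is reduced so it finishes quickly); it uses plain lists instead of ctypes so the large values stay exact.

```python
import timeit

def triangle_full(rows):
    # Straightforward version: compute every element of every row.
    prev = [1]
    for row in range(1, rows):
        curr = [1]
        for i in range(1, row):
            curr.append(prev[i - 1] + prev[i])
        curr.append(1)
        prev = curr
    return prev

def triangle_half(rows):
    # Symmetric version: compute the left half and mirror it in place.
    prev = [1, 1]
    for size in range(2, rows):
        curr = [1] * (size + 1)
        for i in range(1, size // 2 + 1):
            curr[i] = curr[-i - 1] = prev[i - 1] + prev[i]
        prev = curr
    return prev

# Both produce the same final row; the half version does ~half the additions.
assert triangle_full(500) == triangle_half(500)
print(timeit.timeit(lambda: triangle_full(500), number=10))
print(timeit.timeit(lambda: triangle_half(500), number=10))
```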