Python dynamic programming performance difference

Question

I'm studying dynamic programming by doing Leetcode problems, and I frequently face time limit exceeded errors even though I'm caching my results. Can anyone explain why my version is so much slower than the official version for this problem?

There are obviously differences in the code, e.g., I use a class function for recursion while the official answer does not. My recursive function returns numeric values, the official one does not, etc. None of these seem like meaningful differences though, but the performance difference is nonetheless dramatic.

My version. This takes 0.177669 seconds to run, and receives a time limit exceeded error.

import datetime as dt
from typing import List
from functools import lru_cache


class Solution:
    def canPartition(self, nums: List[int]) -> bool:
        self.nums = nums
        total = sum(self.nums)
        if total % 2 == 1:
            return False
        half_total = total // 2
        return self.traverse(half_total, 0) == 0

    @lru_cache(maxsize=None)
    def traverse(self, subset_sum, index):
        if subset_sum < 0:
            return float('inf')
        elif index == len(self.nums):
            return subset_sum
        else:
            include = self.traverse(subset_sum - self.nums[index], index + 1)
            exclude = self.traverse(subset_sum, index + 1)
            best = min(include, exclude)
            return best


test_case = [20,68,68,11,48,18,50,5,3,51,52,11,13,11,38,100,30,87,1,56,85,63,14,96,7,17,54,11,32,61,94,13,85,10,78,57,69,92,66,28,70,20,3,29,10,73,89,86,28,48,69,54,87,11,91,32,59,4,88,20,81,100,29,75,79,82,6,74,66,30,9,6,83,54,54,53,80,94,64,77,22,7,22,26,12,31,23,26,65,65,35,36,34,1,12,44,22,73,59,99]
solution = Solution()
start = dt.datetime.now()
print(solution.canPartition(test_case))
end = dt.datetime.now()
print((end-start).total_seconds())

This is the official answer. It takes only 0.000165 seconds!

import datetime as dt
from typing import List, Tuple
from functools import lru_cache


class Solution:
    def canPartition(self, nums: List[int]) -> bool:
        @lru_cache(maxsize=None)
        def dfs(nums: Tuple[int], n: int, subset_sum: int) -> bool:
            # Base cases
            if subset_sum == 0:
                return True
            if n == 0 or subset_sum < 0:
                return False
            result = (dfs(nums, n - 1, subset_sum - nums[n - 1])
                    or dfs(nums, n - 1, subset_sum))
            return result

        # find sum of array elements
        total_sum = sum(nums)

        # if total_sum is odd, it cannot be partitioned into equal sum subsets
        if total_sum % 2 != 0:
            return False

        subset_sum = total_sum // 2
        n = len(nums)
        return dfs(tuple(nums), n - 1, subset_sum)


test_case = [20,68,68,11,48,18,50,5,3,51,52,11,13,11,38,100,30,87,1,56,85,63,14,96,7,17,54,11,32,61,94,13,85,10,78,57,69,92,66,28,70,20,3,29,10,73,89,86,28,48,69,54,87,11,91,32,59,4,88,20,81,100,29,75,79,82,6,74,66,30,9,6,83,54,54,53,80,94,64,77,22,7,22,26,12,31,23,26,65,65,35,36,34,1,12,44,22,73,59,99]
solution = Solution()
start = dt.datetime.now()
print(solution.canPartition(test_case))
end = dt.datetime.now()
print((end-start).total_seconds())

Foolhardy Hardy · Accepted Answer · 2022-01-16 15:52:41Z

1

In the former version, all possible cases are searched. While in the latter, the algorithm stops when a feasible solution has been found.

In the first version:

include = self.traverse(subset_sum - self.nums[index], index + 1)
# Suppose {include} is zero, the answer is already obtained, 
# but the algorithm still try to compute {exclude}, which is not neccessary.
exclude = self.traverse(subset_sum, index + 1)

In the second version:

result = (dfs(nums, n - 1, subset_sum - nums[n - 1])
                    or dfs(nums, n - 1, subset_sum))
# Because of the short-circuit behavior of logical operator,
# if the first branch has already obtained the solution, 
# the second branch will not be executed.

Just adding a if-check will improve the performance:

include = self.traverse(subset_sum - self.nums[index], index + 1)
# Check whether we are already done:
if include == 0:
    return include
exclude = self.traverse(subset_sum, index + 1)

answered Jan 16, 2022 at 15:52

Foolhardy Hardy

1563 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

user554481 Over a year ago

Wow, that was it! Time went down to 7.8e-05. That beats the official solution! Very interesting. I like the explicit check you have at the end, but it's good to know about the short-circuit behavior too.

Roland Smith · Accepted Answer · 2022-01-16 15:48:42Z

1

It you want to know about performance, you need to profile your code. Profiling lets you see where your code spends its time.

CPython comes with built-in profiling module called cProfile. But you might want to look at e.g. line_profiler.

answered Jan 16, 2022 at 15:48

Roland Smith

43.8k3 gold badges69 silver badges98 bronze badges

Comments

Yılmaz Durmaz · Accepted Answer · 2022-01-16 16:33:28Z

1

your function returns a number that alters between a float and an int during the execution. python has to cope with it during the whole execution. whereas you need to return an answer of "yes or no" to the question "can it be partitioned?" and simple boolean Ture/False is enough and suggested.
for the same reason above, you are using a comparison function min on two recursively gained result where you have to run two recursion to their deepest levels. by using booleans, you can shortcut this process and this other program uses that shortcut.

answered Jan 16, 2022 at 16:33

Yılmaz Durmaz

3,17419 silver badges36 bronze badges

3 Comments

user554481 Over a year ago

I switched to booleans after reading your post and it didn't make a difference in time.

user554481 Over a year ago

Actually, I take that back, it did help a bit. Went from .177 to .146. Still a lot slower than the solution, but an improvement nonetheless. Upvoted.

Yılmaz Durmaz Over a year ago

@user554481, yep this float/int conversion has a small impact yet can be painful if dataset gets very large. the main impact is in the use of the recursion. we would not want to run it for all possibilites exists as it may take an infinite amount of time. the above answer you accepted, and the one in your question uses shortcut logic to make it fast: "return a result as soon as it fits your needs"

Collectives™ on Stack Overflow

Python dynamic programming performance difference

3 Answers 3

1 Comment

Comments

3 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

1 Comment

Comments

3 Comments

Your Answer

Sign up or log in

Post as a guest

Related