Don’t Do Anything Twice - Memoization

Assalamualaikum warahmatullah. Just recently I attended the webinar Dynamic Programming and Local Memoization in Competitive Programming with Misof held by Topcoder. Misof showed us how Memoization can extremely speed up our programs.

There’s no real secret to the design of efficient algorithms. There are only two things you have to make sure: Don’t do anything stupid, and Don’t do anything twice. The Memoization is all about the latter one.

Mathematicians often say that Computer Science is a branch of Mathematics. Well in some sense it is, but in another sense it isn’t. There are some differences between Mathematics and Computer Science. One of the differences is, in Mathematics if we have a Function (e.g Sine, Cosine, Tan). The output for the same input parameters is always the same. Like cos(90) is always 0, sin(90) is always 1. But in Computer Science/Programming, if we have a Function, it can have many different factors that can influence the output of the function. Like we can have date, or some random numbers in the function, that can change the output for the same input parameters over time.

Let’s go back a little bit, so what’s the Memoization anyway? In a simple term, Memoization prevent us to Do the same thing twice by storing the results and returning the results when the same inputs occur again, much like cache.

To implement the Memoization to the function, we need to make sure that our function (in programming sense) is basically the same thing as in the mathematics sense: It always generate the same output for the same input parameters, so that we can cache it and get it back later.

Let’s take fibonacci as the example. In Mathematics, Fibonacci numbers is sequence of number when each number is the sum of the two preceeding ones. In programming, we can do exactly the same as the mathematical definition.

def fibonacci(n):
    if n in [0, 1]:
        ans = n
    else:
        ans = fibonacci(n-1) + fibonacci(n-2)
    return ans

n = int(input())
print(fibonacci(n))

Let’s try to run above program with some n inputs.

$ time python3 fibo.py <<< 3
2

real    0m0.050s
user    0m0.043s
sys     0m0.007s

$ time python3 fibo.py <<< 15
610

real    0m0.050s
user    0m0.030s
sys     0m0.020s

$ time python3 fibo.py <<< 30
832040

real    0m0.284s
user    0m0.273s
sys     0m0.010s

$ time python3 fibo.py <<< 35
9227465

real    0m2.772s
user    0m2.761s
sys     0m0.007s

$ time python3 fibo.py <<< 40
102334155

real    0m29.701s
user    0m29.575s
sys     0m0.057s

You can see that the time grows exponentially. When you count fibonacci 30, it only took 0.2 seconds, but when you count fibonacci 35, it took 2.7 seconds, even worse when you count fibonacci 40, it took almost half a minute. Let’s add a print to see what happened.

def fibonacci(n):
    print(f'fibonacci({n})')
    if n in [0, 1]:
        ans = n
    else:
        ans = fibonacci(n-1) + fibonacci(n-2)
    return ans

n = int(input())
print(fibonacci(n))

$ python3 fibo.py <<< 3
fibonacci(3)
fibonacci(2)
fibonacci(1)
fibonacci(0)
fibonacci(1)
2

It appears that fibonacci(1) are processed twice, it violates the rules of effective algorithms. Let’s do one more check with different input.

$ python3 fibo.py <<< 30 | sort | uniq -c
832040
 514229 fibonacci(0)
 832040 fibonacci(1)
fibonacci(10)
fibonacci(11)
fibonacci(12)
fibonacci(13)
fibonacci(14)
fibonacci(15)
fibonacci(16)
fibonacci(17)
fibonacci(18)
fibonacci(19)
 514229 fibonacci(2)
fibonacci(20)
fibonacci(21)
...

It also appears that some input parameters are processed multiple times. For example, you can see that fibonacci(1) was processed 832040 times although they generate the same value, it’s extremely inefficient. This is where Memoization will come in.

Let’s modify the previous code a little bit.

memo = {}

def fibonacci(n):
    if n in memo:
        ans = memo[n]
    else:
        if n in [0, 1]:
            ans = n
        else:
            ans = fibonacci(n-1) + fibonacci(n-2)
        memo[n] = ans
    return ans

n = int(input())
print(fibonacci(n))

I added a dictionary called memo. Whenever n is already on the memo, we simply get the value from memo instead of re-counting the fibonacci value, and when n is not in Memo, we count the fibonacci number of n and store it on the memo. See how Memoization extremely improves the processing time:

$ time python3 fibo.py <<< 30
832040

real    0m0.032s
user    0m0.028s
sys     0m0.004s

$ time python3 fibo.py <<< 40
102334155

real    0m0.028s
user    0m0.024s
sys     0m0.003s

$ time python3 fibo.py <<< 100
354224848179261915075

real    0m0.031s
user    0m0.025s
sys     0m0.006s

$ time python3 fibo.py <<< 500
139423224561697880139724382870407283950070256587697307264108962948325571622863290691557658876222521294125

real    0m0.033s
user    0m0.029s
sys     0m0.003s

Previously without Memo, we need 29 seconds to generate fibonacci of 40, now with Memo, it only took 0.02 seconds! Even better, we can generate fibonacci of 500 with almost the same processing time. You can also see that when we do Memoization, all the input parameters only processed once now:

$ python3 fibo.py <<< 30 | sort | uniq -c
832040
fibonacci(0)
fibonacci(1)
fibonacci(10)
fibonacci(11)
fibonacci(12)
fibonacci(13)
fibonacci(14)
fibonacci(15)
fibonacci(16)
fibonacci(17)
fibonacci(18)
fibonacci(19)
fibonacci(2)
fibonacci(20)
fibonacci(21)
...

In the webinar, Misof also shows us a more advanced example in using Memoization, but I think the fibonacci example is enough to give us the main idea on how Memoization works. If you are interested, you can find the recorded webinar on topcoder, or youtube.

Wassalamualaikum warahmatullah

Don’t Do Anything Twice - Memoization

Categories

Tags