Monday, October 31, 2016

Dynamic Programming in 5 easy steps - Examples - Set partitioning

Please find the updated version of this post here: https://piotr.westfalewicz.com/blog/2016/10/dynamic-programming-in-5-easy-steps---examples---set-partitioning/


Let's continue Dynamic Programming series. Last time we have covered Text Justification problem, this time we will do something harder. The problem statement is as follows:
Given a set S of positive integers, divide it into two sets S1 and S2 such that the absolute difference between their sums is minimum.

If there is a set S with n elements, then if we assume S1 has m elements, S2 must have n-m elements and the value of abs(sum(S1) – sum(S2)) should be minimum.
The 5 easy steps are:
  1. define subproblems, count number of subproblems
  2. guess (part of solution), count number of choices
  3. relate subproblem solutions, compute time per subproblem
  4. recurse + memoize OR build DP table bottom-up check subproblems acyclic/topological order, time = time per subproblem · number of subproblems
  5. solve original problem: = a subproblem OR by combining subproblem solutions ⇒ extra time 
Taking an example, assume a set S={ 1, 6, 11, 5}.
Step 1. As previously, we want to divide the problem to subproblems.
How this can be done? Suffixes, prefixes or the mix of those two are usually the answer. It's natural to imagine that in the middle of processing the set looks at some point as follows .., 11, 5. At this very point we already assumed we are working on suffixes. 
In that situation we can either include the 11 in the set S1 or in the set S2. I'll say that we take or leave the number, because we have only two choices.
Related, very important aspect - we have to track only one sum! The other one is just totalSum-chosenSetSum.
Getting back to our situation, what is the "context" of the subproblem? Does it matter how we got to this point? Not at all, what matters is only the chosenSetSum. Obviously, we have to also keep track of our current item index. Here we have it, the subproblems.

Step 2. How our guess will look like? We can increase the sum and progress with the problem,
or we can just progress with the problem, assuming we leave the item and it will be in the second set.

Step 3. Now we have to glue two pieces togeter. Our DP function will have two arguments as noted above, chosenSetSum and index.
We want to optimize for the minimum difference between the sets and that's what will be returned by the DP function. All in all, the equation is as follows:
DP(chosenSetSum, i) = min from:
  • DP(chosenSetSum, i+1) //when skipping the i'th item
  • DP(chosenSetSum + inputSet[i], i+1)) //when taking the i'th item
The corner case is DP(X, inputSet.Length) = totalSum-X, because that's the end of partitioning.
The time complexity of this approach is:
time = time per subproblem · number of subproblems, so
time = O(1) * O(sum * n) = O(sum * n)
Yey, we have pseudo-polynomial time solution.

Step 4. Is it actually an acyclic graph of DP function invocations? My reasoning here is that we have two dimensions (chosenSetSum and item index) where both increase and there is an end when the second variable reaches input set cardinality, thus it's finite algorithm.

Step 5. We start with chosenSetSum=0 and at the first index, therefore our solution is at DP(0,0).

The Code

Having the 5 steps figured out, it's relatively easy to code the solution.

Pure DP algorithm coded without memorization

private class Partitioner
{
    private readonly int[] _inputSet;
    private readonly int _inputSetSum;

    public Partitioner(int[] inputSet)
    {
        _inputSet = inputSet;
        _inputSetSum = _inputSet.Sum();
    }

    public void Solve()
    {
        DP(0, 0);
    }

    private int DP(int chosenSetSum, int i)
    {
        if (i == _inputSet.Length)
            //Note: returning the difference between chosenSetSum and skippedSetSum, 
            //not the skippedSetSum (which would be _inputSetSum - chosenSetSum)
            return Math.Abs(_inputSetSum - 2 * chosenSetSum);

        var chooseCurrentNumberScore = DP(chosenSetSum + _inputSet[i], i + 1);
        var skipCurrentNumberScore = DP(chosenSetSum, i + 1);
        var minDiff = Math.Min(skipCurrentNumberScore, chooseCurrentNumberScore) ;

        return minDiff;
    }
}
How cool is that? 6 lines of code for almost working solution!

Memoization added

private class Partitioner
{
    private const int Marker = -1;
    private readonly int[] _inputSet;
    private readonly int _inputSetSum;
    private readonly int[][] _memo;

    public Partitioner(int[] inputSet)
    {
        _inputSet = inputSet;
        _inputSetSum = _inputSet.Sum();
        _memo = new int[_inputSetSum][];
        for (var i = 0; i < _inputSetSum; i++)
        {
            _memo[i] = new int[_inputSet.Length];
            for (var j = 0; j < _inputSet.Length; j++)
                _memo[i][j] = Marker;
        }
    }

    public void Solve()
    {
        DP(0, 0);
    }

    private int DP(int chosenSetSum, int i)
    {
        if (i == _inputSet.Length)
            //Note: returning the difference between chosenSetSum and skippedSetSum, 
            //not the skippedSetSum (which would be _inputSetSum - chosenSetSum)
            return Math.Abs(_inputSetSum - 2 * chosenSetSum);

        if (_memo[chosenSetSum][i] != Marker)
            return _memo[chosenSetSum][i];

        var chooseCurrentNumberScore = DP(chosenSetSum + _inputSet[i], i + 1);
        var skipCurrentNumberScore = DP(chosenSetSum, i + 1);
        var minDiff = Math.Min(skipCurrentNumberScore, chooseCurrentNumberScore);

        _memo[chosenSetSum][i] = minDiff;
        return minDiff;
    }
}

Full solution with parent pointers and examples

using System;
using System.Collections.Generic;
using System.Linq;

public class DP_SetPartition
{
    static void Main(String[] args)
    {
        PrintSolution(new int[0]);
        PrintSolution(new[] { 3 });
        PrintSolution(new[] { 3, 3 });
        PrintSolution(new[] { 3, 2 });
        PrintSolution(new[] { 1, 4, 2 });
        PrintSolution(new[] { 1, 2, 4 });
        PrintSolution(new[] { 1, 1, 5, 1, 1, 1 });
        PrintSolution(new[] { 1, 6, 11, 5 });
        PrintSolution(new[] { 1, 6, 11, 5 });
        PrintSolution(new[] { 25, 21, 20, 17, 8 });
        PrintSolution(new[] { 1, 5, 6, 10 });
        PrintSolution(new[] { 3, 1, 4, 2, 2, 1 });
        PrintSolution(new[] { 200, 200, 300, 300, 400, 400, 1000, 1000 });
        PrintSolution(new[] { 100, 100, 200, 200, 300, 300, 400, 400, 1000, 1000 });
        Console.ReadLine();
    }

    private static void PrintSolution(int[] set)
    {
        Console.WriteLine($"Set: {string.Join(", ", set)}");
        int[] set1;
        int[] set2;
        new Partitioner(set).Solve(out set1, out set2);
        Console.WriteLine($"A: {string.Join(", ", set1)} \t\t B: {string.Join(", ", set2)}, sumdiff: {Math.Abs(set1.Sum() - set2.Sum())}");
        Console.WriteLine();
    }

    private class Partitioner
    {
        private const int Marker = -1;
        private readonly int[] _inputSet;
        private readonly bool[][] _parentPointers;
        private readonly int _inputSetSum;
        private readonly int[][] _memo;

        public Partitioner(int[] inputSet)
        {
            _inputSet = inputSet;
            _inputSetSum = _inputSet.Sum();
            _memo = new int[_inputSetSum][];
            _parentPointers = new bool[_inputSetSum][];
            for (var i = 0; i < _inputSetSum; i++)
            {
                _memo[i] = new int[_inputSet.Length];
                _parentPointers[i] = new bool[_inputSet.Length];
                for (var j = 0; j < _inputSet.Length; j++)
                {
                    _memo[i][j] = Marker;
                }
            }
        }

        public void Solve(out int[] set1, out int[] set2)
        {
            DP(0, 0);
            var chosenSet = new List<int>();
            var skippedSet = new List<int>();
            var sum = 0;
            var i = 0;
            while (i != _inputSet.Length)
            {
                var choose = _parentPointers[sum][i];
                if (choose)
                {
                    chosenSet.Add(_inputSet[i]);
                    sum += _inputSet[i];
                }
                else
                {
                    skippedSet.Add(_inputSet[i]);
                }
                i++;
            }
            set1 = chosenSet.ToArray();
            set2 = skippedSet.ToArray();
        }

        private int DP(int chosenSetSum, int i)
        {
            if (i == _inputSet.Length)
                return Math.Abs(_inputSetSum - 2 * chosenSetSum);

            if (_memo[chosenSetSum][i] != Marker)
                return _memo[chosenSetSum][i];

            var chooseCurrentNumberScore = DP(chosenSetSum + _inputSet[i], i + 1);
            var skipCurrentNumberScore = DP(chosenSetSum, i + 1);
            var minDiff = skipCurrentNumberScore;
            if (chooseCurrentNumberScore < skipCurrentNumberScore)
            {
                minDiff = chooseCurrentNumberScore;
                _parentPointers[chosenSetSum][i] = true;
            }

            _memo[chosenSetSum][i] = minDiff;
            return minDiff;
        }
    }
}
It's working!

Tuesday, October 25, 2016

Dynamic Programming in 5 easy steps - Examples - Text Justification

Please find the updated version of this post here: https://piotr.westfalewicz.com/blog/2016/10/dynamic-programming-in-5-easy-steps---examples---text-justification/


Lately I spent a relatively great amount of time revising Algorithms and Data Structures. It's a broad subject, but for me personally I found the Dynamic Programming as the abandoned and unused method. It's also considered as one of the hardest methods to master, with few examples on the internet. Let's contribute a little with this post series. Today I will cover the first problem - text justification. The problem is from MIT lectures and I highly recommend well explained videos along with notes from course 6.006: Introduction to algorithms - fall 2011.

Text justification

Given a text, split it into multiple lines so it is “nicely” distributed between each line.
What does it mean? Take a look at the example:
We can clearly see that eager approach won't produce good results in all cases.
Mathematically, given a badness(i, j) function for line of words[i : j], we seek for such a splitting that the sum of all badness is minimum.
Let's assume the badness function is predefined as follows:
∞ if total length > page width, else (page width − total length)3
So in the example above, the bad splitting would have:
  • page width = 16
  • first line length = 4+4+4+2 = 16, badness of the first line = 0
  • second line length = 4, badness = (16-4)3=1728
  • third line length = 16, badness = 0
  • badness sum = 1728
The good splitting would have:
  • first line length = 4+4+1 = 9, badness of the first line = (16-9)3=343
  • second line length = 9, badness = (16-9)3=343
  • third line length = 16, badness = 8
  • badness sum = 694
The second splitting has smaller total badness sum, thus it's a better splitting.

Dynamic Programming in 5 easy steps

On the MIT course, there is presented an approach to DP in 5 easy steps. Those are:
  1. define subproblems, count number of subproblems
  2. guess (part of solution), count number of choices
  3. relate subproblem solutions, compute time per subproblem
  4. recurse + memoize OR build DP table bottom-up check subproblems acyclic/topological order, time = time per subproblem · number of subproblems
  5. solve original problem: = a subproblem OR by combining subproblem solutions ⇒ extra time 
Let's try to apply them on our problem.
Well, the steps don't have to be executed always in order. Let's try to figure out the second step. We know where our line begins, it's on ith position. Where is the end of the line? We don't know. So let's check all possibilities, that's from ith onward and take the smallest badness. So now we know what's our subproblems in first step are - suffixes of word[i:]. That's pretty much the algorithm. The third step will look like:
DP(i)=for j in i+1...n+1 take
min from (DP(j) + badness(i, j))
The great thing is about DP that DP(j) considered to be free (because we compute the subproblem once and memorize the answer). Therefore the total time needed by this algorithm is:
time = time per subproblem · number of subproblems, so
time = O(n) · O(n)
time = O(n^2)
The fourth point is about proving that this really works and the time needed is finite. We compute the answer from the end of the words, down to word number 0. The recursive DP function always accesses greater indexes, therefore time needed is finite. Lastly, we have to find our answer. Here, obviously it's DP[0]. Easy? Yup, let's code it!

The code

using System;
using System.Collections.Generic;
using System.Linq;

namespace Algos
{
    public class DP_TextJustification
    {
        static void Main(String[] args)
        {
            //please note: this is not a production-ready solution!
            PrintSolution("   ", 10);
            PrintSolution("oneword", 10);
            PrintSolution("one line", 10);
            PrintSolution("two lines", 6);
            PrintSolution("blah blah blah blah reallylongword", 14);
            PrintSolution("blah blah blah blah reallylongword 1", 14);
            var txt1 = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. " +
                       "Praesent varius urna eget lacinia pharetra. " +
                       "Vivamus lacinia in dui vitae sodales. " +
                       "Aliquam auctor justo nec condimentum pretium. " +
                       "Curabitur egestas tellus dolor, vel tempus lacus blandit vitae. " +
                       "Cras dapibus scelerisque ex nec gravida.";
            PrintSolution(txt1, 60);
            PrintSolution(txt1, 70);
            PrintSolution(txt1, 80);
            PrintSolution(txt1, 90);
            PrintSolution(txt1, 100);
            Console.ReadLine();
        }

        public static void PrintSolution(string txt, int width)
        {
            Console.WriteLine($"Test length: {txt.Length}, line width: {width}");
            foreach (var line in new Justifier(txt, width).Solve())
            {
                Console.WriteLine("{0,-" + width + "}|", line);
            }
            Console.WriteLine();
        }

        private class Justifier
        {
            private const int Marker = -1;
            private readonly int _width;
            private readonly string[] _words;
            private readonly int[] _parentPointers;
            private readonly int[] _memo;
             
            public Justifier(string text, int width)
            {
                _width = width;
                _words = text.Split(new[] {" "}, StringSplitOptions.RemoveEmptyEntries);
                _parentPointers = new int[_words.Length];
                _memo = new int[_words.Length];
                for (var i = 0; i < _memo.Length; i++)
                {
                    _memo[i] = Marker;
                }
            }

            public List<string> Solve()
            {
                DP(0);

                var result = new List<string>();
                var from = 0;
                while (from != _words.Length)
                {
                    var next = _parentPointers[from];
                    result.Add(string.Join(" ", _words.Skip(from).Take(next - from)));
                    from = next;
                }
                return result;
            }

            private int DP(int i)
            {
                if (i == _words.Length) //no words in line is the end of justification
                    return 0;

                if (_memo[i] != Marker) //if we have solution calculated, return it
                    return _memo[i];

                var minBadness = int.MaxValue;

                for (var j = i + 1; j <= _words.Length; j++)
                {
                    var badness = Badness(i, j) + DP(j);
                    if (minBadness > badness)
                    {
                        _parentPointers[i] = j; //remember best choice
                        minBadness = badness;
                    }
                }

                _memo[i] = minBadness; //remember solution
                return minBadness;
            }

            private int Badness(int i, int j)
            {
                var lengthOfWords = _words.Skip(i).Take(j - i).Sum(s => s.Length);
                var numberOfSpaces = j - i - 1;
                if (lengthOfWords + numberOfSpaces > _width)
                    return 10*1000*1000;
                return (int)Math.Pow(_width - lengthOfWords - numberOfSpaces, 3);
            }
        }
    }
}
To the solution in 5 easy steps I've added parent pointers - a simple way to remember the best solution from DP. The result is as follows:
If we don't care about justifying the last line, we can easily modify the badness function to return no cost for the last line:
private int Badness(int i, int j)
{
    var lengthOfWords = _words.Skip(i).Take(j - i).Sum(s => s.Length);
    var numberOfSpaces = j - i - 1;
    if (lengthOfWords + numberOfSpaces > _width)
        return 10*1000*1000;
    if (j == _words.Length) //don't care about last line
        return 0;
    return (int)Math.Pow(_width - lengthOfWords - numberOfSpaces, 3);
}
In this case, the result will be much different in some cases: