Suppose we have a sequence of letters ACCGGTC. Then a subsequence of this sequence would be like ACCG or ACTC or CCC. To get ACTC, we pick letters 1, 2, 6, and 7.
Formally, given a sequence X = x1, x2, . . . , xm, another sequence Z = z1, . . . , zk is a subsequence of X if there exists a strictly increasing sequence i1, i2, . . . , ik of indices of X such that for all j = 1, 2, . . . , k, we have
In the longest-common-subsequence problem, we are given two sequences X and Y , and want to find the longest possible sequence that is a subsequence of both X and Y. For example, if X = ABCBDAB and Y = BDCABA, BCBA and BDAB are longest common subsequences, since there are no common sequences of length 5 or greater.
Optimal substructure
The first step to solving this using dynamic programming is to say that we can create an optimal solution to this problem using optimal solutions to subproblems. The hardest part is to decide what the subproblems are. Here there are two possible cases:
- The last elements of X and Y are equal. Then they must both be part of the longest common subsequence. So we can chop both elements off the ends of the subsequence (adding them to a common subsequence) and find the longest common subsequence of the smaller sequences.
- The last elements are different. Then either the last element of X or the last element of Y cannot be part of the longest common subsequence. So now we can find the longest common subsequence of X and a smaller version of Y , or the longest common subsequence of Y and a smaller version of X.
Formally, let X = x1, . . . , xm and Y = y1, . . . , yn be sequences, and let Z = z1, . . . , zk be any longest common subsequence of X and Y . Let Xi refer to the first i elements of X, and Yi refer to the first i elements of Y , etc. Then
- If xm = yn, then zk = xm = yn and Zk−1 is a longest common subsequence of Xm−1
and Yn−1. - If xm ≠ yn, then zk ≠ xm implies that Z is a longest common subsequence of Xm−1
and Y . - If xm ≠ yn, then zk ≠ yn implies that Z is a longest common subsequence of X and Yn−1.
A recursive solution
To store the solutions to subproblems, we use a 2D matrix (c), compute all the values c[i, j], which are the lengths of the longest common subsequences between the first i elements of X and the first j elements of Y . At the end, the answer will be stored in c[m, n].
Dynamic programming algorithm
Using this recurrence, we can write the actual pseudocode. Observe that it is necessary to populate the table in a certain order, because some elements of the table depend on other elements of the table having already been computed.