Longest common subsequence (Dynamic Programming)

Suppose we have a sequence of letters ACCGGTC. Then a subsequence of this sequence would be like ACCG or ACTC or CCC. To get ACTC, we pick letters 1, 2, 6, and 7.

Formally, given a sequence X = x₁, x₂, . . . , x_m, another sequence Z = z₁, . . . , z_k is a subsequence of X if there exists a strictly increasing sequence i₁, i₂, . . . , i_k of indices of X such that for all j = 1, 2, . . . , k, we have

In the longest-common-subsequence problem, we are given two sequences X and Y , and want to find the longest possible sequence that is a subsequence of both X and Y. For example, if X = ABCBDAB and Y = BDCABA, BCBA and BDAB are longest common subsequences, since there are no common sequences of length 5 or greater.

Optimal substructure
The first step to solving this using dynamic programming is to say that we can create an optimal solution to this problem using optimal solutions to subproblems. The hardest part is to decide what the subproblems are. Here there are two possible cases:

The last elements of X and Y are equal. Then they must both be part of the longest common subsequence. So we can chop both elements off the ends of the subsequence (adding them to a common subsequence) and find the longest common subsequence of the smaller sequences.
The last elements are different. Then either the last element of X or the last element of Y cannot be part of the longest common subsequence. So now we can find the longest common subsequence of X and a smaller version of Y , or the longest common subsequence of Y and a smaller version of X.

Formally, let X = x₁, . . . , x_m and Y = y₁, . . . , y_n be sequences, and let Z = z₁, . . . , z_k be any longest common subsequence of X and Y . Let X_i refer to the first i elements of X, and Y_irefer to the first i elements of Y , etc. Then

If x_m = y_n, then z_k = x_m = y_n and Z_k−1 is a longest common subsequence of X_m−1
and Y_n−1.
If x_m ≠ y_n, then z_k ≠ x_m implies that Z is a longest common subsequence of X_m−1
and Y .
If x_m ≠ y_n, then z_k ≠ y_n implies that Z is a longest common subsequence of X and Y_n−1.

A recursive solution
To store the solutions to subproblems, we use a 2D matrix (c), compute all the values c[i, j], which are the lengths of the longest common subsequences between the first i elements of X and the first j elements of Y . At the end, the answer will be stored in c[m, n].

Dynamic programming algorithm
Using this recurrence, we can write the actual pseudocode. Observe that it is necessary to populate the table in a certain order, because some elements of the table depend on other elements of the table having already been computed.

Longest common subsequence (Dynamic Programming)

Related Posts

Sharing is Caring !