Pattern searching is an important problem. Worst case time complexity of the Naive algorithm is O(m(n-m+1)). The time complexity of KMP algorithm is O(n) in the worst case.

Every time the naive function fails (or succeeds), it starts matching over from the next character. This might not be necessary. We could use our knowledge of the point of last matching and the structure of the test string to know where the next matching should begin from. KMP algorithm preprocesses pattern and constructs an auxiliary array (longest proper prefix which is also suffix) which is used to skip characters while matching.

A proper prefix is prefix with whole string not allowed. For example, prefixes of “ABC” are “”, “A”, “AB” and “ABC”. Proper prefixes are “”, “A” and “AB”. Suffixes of the string are “”, “C”, “BC” and “ABC”.

Example:

Construct a auxiliary array of next[j]. For each j, figure out:
next[j] = length of longest prefix in “pattern[0] .. pattern[j]” that matches the suffix of “pattern[1] .. pattern[j]”

That is:

  • Prefix must include pattern[0]
  • Suffix must include pattern[j]
  • Prefix and suffix are different

Array for pattern “ABABAC”:

j012345
Substring 0 to jAABABAABABABABAABABAC
Longest prefix-suffix matchnonenoneAABABAnone
next[j]001230

 

Given j, let n = next[j] i.e. first n character of pattern matches with last n character i.e.
“pattern[0] .. pattern[n-1]” = “pattern[j-(n-1)] .. pattern[j]”

e.g. j = 4, n = 3,
“pattern[0] .. pattern[2]” = “pattern[2] .. pattern[4]”