Skip to main content

Knuth-Morris-Pratt (KMP) Algorithm (Geeks for Geeks)

1. What is the Knuth-Morris-Pratt (KMP) Algorithm?​

The Knuth-Morris-Pratt (KMP) algorithm is an efficient pattern matching algorithm that searches for occurrences of a wordword WW within a main textstringtext string TT by employing the observation that when a mismatch occurs, the word itself embodies sufficient information to determine where the next match could begin, thus bypassing re-examination of previously matched characters.

2. Algorithm for Knuth-Morris-Pratt (KMP)​

  1. Preprocess the pattern to create an array (LPS array) that stores the length of the longest prefix which is also a suffix.
  2. Use the LPSarrayLPS array to skip characters while matching.

3. How does the Knuth-Morris-Pratt (KMP) Algorithm work?​

  • The KMP algorithm preprocesses the pattern to determine the longest prefix which is also a suffix in the pattern itself.
  • This preprocessing is used to avoid unnecessary re-evaluation of the characters in the text, making the search process more efficient.

4. Problem Description​

Given a text string and a pattern string, implement the Knuthβˆ’Morrisβˆ’Pratt(KMP)Knuth-Morris-Pratt (KMP) algorithm to find all occurrences of the pattern in the text.

5. Examples​

Example 1:

Input: text = "ABABDABACDABABCABAB", pattern = "ABABCABAB"
Output: Pattern found at index 10

Example 2:

Input: text = "AABAACAADAABAABA", pattern = "AABA"
Output: Pattern found at index 0, Pattern found at index 9, Pattern found at index 12

Explanation of Example 1:

  • The pattern "ABABCABAB" is found in the text "ABABDABACDABABCABAB" starting from index 10.

Visual Example​

Visual Example of KMP

6. Constraints​

  • The text and pattern can contain any number of characters.
  • All characters are ASCIIASCII characters.

7. Implementation​

Written by @ngmuraqrdd
def KMPSearch(pat, txt):
M = len(pat)
N = len(txt)

lps = [0] * M
j = 0

computeLPSArray(pat, M, lps)

i = 0
while i < N:
if pat[j] == txt[i]:
i += 1
j += 1

if j == M:
print("Found pattern at index " + str(i - j))
j = lps[j - 1]
elif i < N and pat[j] != txt[i]:
if j != 0:
j = lps[j - 1]
else:
i += 1

def computeLPSArray(pat, M, lps):
length = 0
lps[0] = 0
i = 1

while i < M:
if pat[i] == pat[length]:
length += 1
lps[i] = length
i += 1
else:
if length != 0:
length = lps[length - 1]
else:
lps[i] = 0
i += 1

# Example usage:
text = "ABABDABACDABABCABAB"
pattern = "ABABCABAB"
KMPSearch(pattern, text)

8. Complexity Analysis​

  • Time Complexity:

    • Preprocessing the LPS array: O(M)O(M)
    • Searching the pattern in the text: O(N)O(N)
    • Overall: O(M+N)O(M + N)
  • Space Complexity: O(M)O(M) for the LPS array.

9. Advantages and Disadvantages​

Advantages:

  • More efficient than the naive pattern matching algorithm, especially for large texts.
  • Avoids redundant comparisons.

Disadvantages:

  • Slightly more complex to implement than naive algorithms.
  • Requires additional space for the LPS array.

10. References​