Skip to main content

Boyer-Moore Algorithm (Geeks for Geeks)

1. What is the Boyer-Moore Algorithm?​

The Boyer-Moore algorithm is an efficient string searching algorithm that skips sections of the text to find the pattern, making it faster than many other algorithms. It uses two main heuristics: the Bad Character Rule and the Good Suffix Rule, which allow it to skip over sections of the text that do not need to be checked.

2. Algorithm for Boyer-Moore​

  1. Preprocess the pattern to create two arrays (Bad Character and Good Suffix).
  2. Use these arrays to skip characters while matching the pattern with the text.

3. How does the Boyer-Moore Algorithm work?​

  • Bad Character Rule: When a mismatch occurs, the character in the text causing the mismatch is used to shift the pattern.
  • Good Suffix Rule: When a mismatch occurs, the pattern is shifted to align with the next possible match of the good suffix (the portion of the pattern that matched just before the mismatch).

4. Problem Description​

Given a text string and a pattern string, implement the Boyerβˆ’MooreBoyer-Moore algorithm to find all occurrences of the pattern in the text.

5. Examples​

Example 1:

Input: text = "ABABDABACDABABCABAB", pattern = "ABABCABAB"
Output: Pattern found at index 10

Example 2:

Input: text = "AABAACAADAABAABA", pattern = "AABA"
Output: Pattern found at index 0, Pattern found at index 9, Pattern found at index 12

Explanation of Example 1:

  • The pattern "ABABCABAB" is found in the text "ABABDABACDABABCABAB" starting from index 10.

Visual Example​

Visual Example of Boyer-Moore

6. Constraints​

  • The text and pattern can contain any number of characters.
  • All characters are ASCIIASCII characters.

7. Implementation​

Written by @Aarti_Rathi
 def bad_char_heuristic(pattern):
bad_char = [-1] * 256
for i in range(len(pattern)):
bad_char[ord(pattern[i])] = i
return bad_char

def boyer_moore_search(text, pattern):
m = len(pattern)
n = len(text)
bad_char = bad_char_heuristic(pattern)

s = 0
while s <= n - m:
j = m - 1
while j >= 0 and pattern[j] == text[s + j]:
j -= 1
if j < 0:
print("Pattern found at index " + str(s))
s += (m - bad_char[ord(text[s + m])] if s + m < n else 1)
else:
s += max(1, j - bad_char[ord(text[s + j])])


# Example usage
text = "ABABDABACDABABCABAB"
pattern = "ABABCABAB"
boyer_moore_search(text, pattern)

8. Complexity Analysis​

  • Time Complexity:

    • Preprocessing the Bad Character and Good Suffix tables: O(m+∣Σ∣)O(m + |Ξ£|), where mm is the length of the pattern and ∣Σ∣|Ξ£| is the size of the alphabet.
    • Searching the pattern in the text: O(n)O(n) in the best case and O(mn)O(mn) in the worst case, where nn is the length of the text.
    • Overall: Efficient in practice with average time complexity O(n/m)O(n/m).
  • Space Complexity: O(m+∣Σ∣)O(m + |Ξ£|) for the Bad Character and Good Suffix tables.

9. Advantages and Disadvantages​

Advantages:

  • Very efficient for large texts with small patterns.
  • Uses heuristics to skip sections of the text, making it faster than many other algorithms.

Disadvantages:

  • More complex to implement compared to simpler algorithms like the Naive Pattern Matching algorithm.
  • The worst-case time complexity can be higher for certain patterns and texts.

10. References​