Aho-Corasick Algorithm for Efficient String Matching
Problem Statementβ
Problem Descriptionβ
The Aho-Corasick Algorithm is designed for searching multiple patterns simultaneously within a given text. It constructs a finite state machine that resembles a digital tree with additional links between nodes, allowing efficient transitions between patterns.
Examplesβ
Example 1:
Input: 
Patterns: {"he", "she", "his", "hers"}
Text: "ahishers"
Output: 
Pattern found: he at index 1
Pattern found: his at index 1
Pattern found: she at index 3
Pattern found: hers at index 4
Explanation: All the patterns are efficiently found within the text.
Constraintsβ
- The input consists of multiple patterns and a single text.
- The algorithm should handle large patterns and text sizes efficiently.
Solution of Given Problemβ
Intuition and Approachβ
The Aho-Corasick Algorithm follows these steps:
- Build a Trie: Insert all patterns into a trie.
- Construct Failure Links: Create failure links to enable efficient transitions when a mismatch occurs.
- Search the Text: Use the trie and failure links to search the text for all patterns simultaneously.
Approachesβ
Codes in Different Languagesβ
- C++
 #include <bits/stdc++.h>
 using namespace std;
 struct TrieNode {
 unordered_map<char, TrieNode*> children;
 TrieNode* failure;
 vector<int> output;
 TrieNode() : failure(nullptr) {}
};
class AhoCorasick {
 TrieNode* root;
 vector<string> patterns;
public:
 AhoCorasick() {
     root = new TrieNode();
 }
 void addPattern(const string& pattern, int index) {
     TrieNode* node = root;
     for (char c : pattern) {
         if (node->children.find(c) == node->children.end()) {
             node->children[c] = new TrieNode();
         }
         node = node->children[c];
     }
     node->output.push_back(index);
 }
 void buildFailureLinks() {
     queue<TrieNode*> q;
     root->failure = root;
     for (auto& child : root->children) {
         child.second->failure = root;
         q.push(child.second);
     }
     while (!q.empty()) {
         TrieNode* current = q.front();
         q.pop();
         for (auto& child : current->children) {
             char c = child.first;
             TrieNode* fail = current->failure;
             while (fail != root && fail->children.find(c) == fail->children.end()) {
                 fail = fail->failure;
             }
             if (fail->children.find(c) != fail->children.end()) {
                 child.second->failure = fail->children[c];
             } else {
                 child.second->failure = root;
             }
             child.second->output.insert(child.second->output.end(),
                 child.second->failure->output.begin(), child.second->failure->output.end());
             q.push(child.second);
         }
     }
 }
 void search(const string& text) {
     TrieNode* node = root;
     for (int i = 0; i < text.size(); ++i) {
         char c = text[i];
         while (node != root && node->children.find(c) == node->children.end()) {
             node = node->failure;
         }
         if (node->children.find(c) != node->children.end()) {
             node = node->children[c];
         } else {
             node = root;
         }
         if (!node->output.empty()) {
             for (int index : node->output) {
                 cout << "Pattern found: " << patterns[index] << " at index " << i - patterns[index].size() + 1 << "\n";
             }
         }
     }
 }
 void initialize(const vector<string>& patterns) {
     this->patterns = patterns;
     for (int i = 0; i < patterns.size(); ++i) {
         addPattern(patterns[i], i);
     }
     buildFailureLinks();
 }
};
int main() {
 int n;
 cout << "Enter number of patterns: ";
 cin >> n;
 vector<string> patterns(n);
 cout << "Enter patterns:\n";
 for (int i = 0; i < n; ++i) {
     cin >> patterns[i];
 }
 string text;
 cout << "Enter text: ";
 cin >> text;
 AhoCorasick ac;
 ac.initialize(patterns);
 ac.search(text);
 return 0;
}
Complexity Analysisβ
- Time Complexity:  where Nis the length of the text,Mis the total length of all patterns, andZis the number of pattern occurrences.
- Space Complexity:
The time complexity accounts for building the trie, constructing failure links, and searching the text. The space complexity is linear with respect to the total length of the patterns.
Video Explanation of Given Problemβ
Authors:
Loading...