Most Common Word

Problem

Given a string paragraph and a string array of the banned words, return the most frequent word that is not banned. It is guaranteed there is at least one word that isn't banned, and the answer is unique.

Examples

Example 1:

Input: paragraph = "Bob hit a ball, the hit BALL flew far after it was hit.", banned = ["hit"] Output: "ball" Explanation: "hit" occurs 3 times, but it is a banned word. "ball" occurs twice (and no other word does), so it is the most frequent non-banned word in the paragraph.

Example 2:

Input: paragraph = "a.", banned = [] Output: "a"

Constraints

1 <= paragraph.length <= 1000
The paragraph consists of English letters, space, or the punctuation symbols: "!?',;.".
There are no two consecutive punctuation characters in paragraph.
1 <= banned.length <= 100
1 <= banned[i].length <= 10
The banned[i] consists of only lowercase English letters.

Approach

To solve this problem, we need to:

Clean and split the paragraph into words.
Store the banned words in a set for quick lookup.
Use a dictionary to count the occurrences of each non-banned word.
Identify the word with the highest count.

Detailed steps:

Convert all letters in the paragraph to lowercase.
Replace punctuation with spaces to facilitate splitting into words.
Split the cleaned paragraph into words.
Count the occurrences of each word that is not in the banned list.
Return the word with the highest frequency.

Solution

Code in Different Languages

C++ Solution

#include <iostream>
#include <sstream>
#include <unordered_set>
#include <unordered_map>
#include <vector>
#include <cctype>

using namespace std;

string mostCommonWord(string paragraph, vector<string>& banned) {
    unordered_set<string> banned_set(banned.begin(), banned.end());
    unordered_map<string, int> word_count;
    string word;
    int max_count = 0;
    string result;

    for (char& c : paragraph) {
        c = isalnum(c) ? tolower(c) : ' ';
    }

    istringstream iss(paragraph);
    while (iss >> word) {
        if (banned_set.find(word) == banned_set.end()) {
            word_count[word]++;
            if (word_count[word] > max_count) {
                max_count = word_count[word];
                result = word;
            }
        }
    }

    return result;
}

int main() {
    string paragraph = "Bob hit a ball, the hit BALL flew far after it was hit.";
    vector<string> banned = {"hit"};
    cout << mostCommonWord(paragraph, banned) << endl;  // Output: "ball"
}

Java Solution

import java.util.*;

public class MostCommonWord {
    public String mostCommonWord(String paragraph, String[] banned) {
        Set<String> bannedSet = new HashSet<>(Arrays.asList(banned));
        Map<String, Integer> wordCount = new HashMap<>();
        String[] words = paragraph.toLowerCase().split("\\W+");
        int maxCount = 0;
        String result = "";

        for (String word : words) {
            if (!bannedSet.contains(word)) {
                wordCount.put(word, wordCount.getOrDefault(word, 0) + 1);
                if (wordCount.get(word) > maxCount) {
                    maxCount = wordCount.get(word);
                    result = word;
                }
            }
        }

        return result;
    }

    public static void main(String[] args) {
        MostCommonWord solution = new MostCommonWord();
        String paragraph = "Bob hit a ball, the hit BALL flew far after it was hit.";
        String[] banned = {"hit"};
        System.out.println(solution.mostCommonWord(paragraph, banned));  // Output: "ball"
    }
}

Python Solution

import re
from collections import defaultdict

def mostCommonWord(paragraph, banned):
    banned_set = set(banned)
    word_count = defaultdict(int)

    # Normalize the paragraph
    normalized_str = re.sub(r'[^\w\s]', ' ', paragraph).lower()

    # Split the paragraph into words
    words = normalized_str.split()

    # Count the occurrences of each word
    max_count = 0
    result = ""
    for word in words:
        if word not in banned_set:
            word_count[word] += 1
            if word_count[word] > max_count:
                max_count = word_count[word]
                result = word

    return result

# Test
paragraph = "Bob hit a ball, the hit BALL flew far after it was hit."
banned = ["hit"]
print(mostCommonWord(paragraph, banned))  # Output: "ball"

Complexity Analysis

Time Complexity: O(n)

Reason: We traverse the paragraph and words array, each operation inside the loop takes constant time.

Space Complexity: O(n)

Reason: We use extra space for the set of banned words and the dictionary to count word occurrences.

This solution efficiently finds the most common word in a paragraph that is not banned by using a set for quick lookup and a dictionary for counting occurrences. The time complexity is linear, and the space complexity is linear as well, making it suitable for large input sizes.

References

LeetCode Problem: Most Common Word

Problem​

Examples​

Constraints​

Approach​

Solution​

Code in Different Languages​

C++ Solution​

Java Solution​

Python Solution​

Complexity Analysis​

References​