Word Frequency
Problem Description
Write a bash script to calculate the frequency of each word in a text file words.txt.
For simplicity's sake, you may assume:
- words.txt contains only lowercase characters and space ' ' characters.
- Each word consists of lowercase characters only.
- Words are separated by one or more whitespace characters.
Examples
Example 1:
Assume that words.txt has the following content:
the day is sunny the the
the sunny is is
Your script should output the following, sorted by descending frequency:
the 4
is 3
sunny 2
day 1
Constraints
- The input text file words.txt contains only lowercase characters and spaces.
- Words are separated by one or more whitespace characters.
Solution for Word Frequency Problem
Intuition And Approach
We can solve this problem with a pipeline of standard Unix tools, where each command handles one step of the processing. The approach includes:
- Replacing spaces with newlines to handle word separation.
- Sorting the words to prepare for counting duplicates.
- Using uniq to count the occurrences of each word.
- Sorting the counts in descending order.
- Formatting the output to display word frequency.
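The steps above can be traced one stage at a time. The following is a minimal sketch, assuming the input from Example 1; the temporary directory and the variable names (workdir, one_per_line, and so on) are illustrative and not part of the original solution.

```shell
#!/usr/bin/env bash
# Hypothetical sample input matching Example 1. A temp dir is used so that
# any real words.txt in the current directory is left untouched.
workdir=$(mktemp -d)
printf 'the day is sunny the the\nthe sunny is is\n' > "$workdir/words.txt"

# Stage 1: one word per line (-s squeezes runs of separators into one newline).
one_per_line=$(tr -s ' ' '\n' < "$workdir/words.txt")

# Stage 2: sort so identical words become adjacent, which uniq requires.
sorted_words=$(printf '%s\n' "$one_per_line" | sort)

# Stage 3: collapse adjacent duplicates, prefixing each word with its count.
counted=$(printf '%s\n' "$sorted_words" | uniq -c)

# Stage 4: numeric reverse sort puts the largest counts first.
ranked=$(printf '%s\n' "$counted" | sort -nr)

# Stage 5: swap the columns so each line reads "word count".
result=$(printf '%s\n' "$ranked" | awk '{print $2, $1}')

printf '%s\n' "$result"
rm -rf "$workdir"
```

Printing any intermediate variable (for example counted) shows what that stage contributes, which helps when a pipeline gives unexpected output.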
Code
- bash
tr -s ' ' '\n' < words.txt | sort | uniq -c | sort -nr | awk '{print $2, $1}'
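One edge case worth noting: if the file begins with a space, tr emits an empty first line, which the pipeline then counts as a word. The problem's constraints do not say whether that can happen, so the following is only a sketch of an alternative technique, not the author's solution. It counts words in an awk associative array, relying on awk's default field splitting to ignore leading and trailing blanks. The sample input with irregular spacing is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sample with irregular spacing: a leading space, a double
# space, and a trailing space.
workdir=$(mktemp -d)
printf ' the day  is sunny the the\nthe sunny is is \n' > "$workdir/words.txt"

# awk splits each line on runs of whitespace and skips leading/trailing
# blanks, so no empty "word" is ever counted. The array tallies each field.
# sort orders by count (field 2, numeric, descending), then by word to make
# the order of ties deterministic.
result=$(awk '{ for (i = 1; i <= NF; i++) count[$i]++ }
              END { for (w in count) print w, count[w] }' "$workdir/words.txt" |
         sort -k2,2nr -k1,1)

printf '%s\n' "$result"
rm -rf "$workdir"
```

Because awk already prints the word first, no final column swap is needed in this variant.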
References
- LeetCode Problem: Word Frequency Problem
- Solution Link: Word Frequency Solution on LeetCode
- Author's GeeksforGeeks Profile: Mahek Patel