
Word Frequency

Problem Description

Write a bash script to calculate the frequency of each word in a text file words.txt.

For simplicity's sake, you may assume:

words.txt contains only lowercase characters and space ' ' characters. Each word must consist of lowercase characters only. Words are separated by one or more whitespace characters.

Examples

Example 1:

Assume that words.txt has the following content:

the day is sunny the the
the sunny is is

Your script should output the following, sorted by descending frequency:

the 4
is 3
sunny 2
day 1

Constraints

  • The input text file words.txt contains only lowercase characters and spaces.
  • Words are separated by one or more whitespace characters.

Solution for Word Frequency Problem

Intuition and Approach

Standard Unix text tools can solve this problem as a single pipeline, with each command's output feeding the next. The approach consists of:

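  1. Replacing spaces with newlines using tr -s, which also squeezes runs of spaces, so that each word sits on its own line.
  2. Sorting the words so that identical words become adjacent, since uniq only collapses adjacent duplicates.
  3. Using uniq -c to prefix each distinct word with its number of occurrences.
  4. Sorting those lines numerically by count in descending order with sort -nr.
  5. Swapping the two columns with awk so that each line reads "word count".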
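The five steps can be traced one stage at a time. The sketch below writes the Example 1 input to a temporary file (an assumption made so that no real words.txt is overwritten) and prints the intermediate result after each stage. The exact padding of the counts printed by uniq -c differs between GNU and BSD implementations, which is why the final awk step re-prints the fields cleanly.

```shell
#!/usr/bin/env bash
# Sketch: show the output of each pipeline stage on the Example 1 input.
# Assumption: the input lives in a temporary file, not in words.txt.
set -eu

input=$(mktemp)
trap 'rm -f "$input"' EXIT
printf 'the day is sunny the the\nthe sunny is is\n' > "$input"

echo "== 1. one word per line =="
tr -s ' ' '\n' < "$input"

echo "== 2. sorted, so duplicates are adjacent =="
tr -s ' ' '\n' < "$input" | sort

echo "== 3. counted with uniq -c =="
tr -s ' ' '\n' < "$input" | sort | uniq -c

echo "== 4. ordered by descending count =="
tr -s ' ' '\n' < "$input" | sort | uniq -c | sort -nr

echo "== 5. columns swapped to \"word count\" =="
tr -s ' ' '\n' < "$input" | sort | uniq -c | sort -nr | awk '{print $2, $1}'
```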

Code

Written by @mahek0620
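# One word per line, group and count duplicates, sort by count (highest first), then print "word count".
tr -s ' ' '\n' < words.txt | sort | uniq -c | sort -nr | awk '{print $2, $1}'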
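For comparison, here is a single-pass alternative. It is swapped in by this edit and is not the author's solution: it counts words with an awk associative array and sorts only the distinct words afterwards. Because awk's default field splitting treats any run of spaces or tabs as one separator and ignores leading and trailing blanks, no empty "words" are produced. The final sort also breaks ties between equal counts alphabetically. The function name word_freq and its optional file argument (defaulting to words.txt) are hypothetical.

```shell
#!/usr/bin/env bash
# Alternative sketch (not the solution above): count in one awk pass, then sort.
# word_freq is a hypothetical helper; it reads the file named by its first
# argument, or words.txt if no argument is given.
word_freq() {
  awk '
    { for (i = 1; i <= NF; i++) count[$i]++ }   # tally every field on every line
    END { for (w in count) print w, count[w] }  # emit "word count" in arbitrary order
  ' "${1:-words.txt}" |
    sort -k2,2nr -k1,1                          # count descending, ties alphabetically
}
```

For example, running word_freq words.txt on the Example 1 input prints the same four lines as the one-liner.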

References