Skip to main content

Burrows-Wheeler Transform (Geeks for Geeks)

1. What is the Burrows-Wheeler Transform (BWT)?

The Burrows-Wheeler Transform (BWT) is an algorithm used to transform a string into a form that is more amenable to compression. The BWT reorganizes the input string into runs of similar characters, which can then be more effectively compressed by other algorithms. It is particularly useful in data compression techniques and bioinformatics for sequence analysis.

2. Algorithm for Burrows-Wheeler Transform

  1. Generate all rotations of the input string.
  2. Sort the rotations lexicographically.
  3. Take the last column of the sorted rotations to form the BWT output.

3. How does the Burrows-Wheeler Transform work?

  • Generate Rotations: Create all possible rotations of the input string.
  • Sort Rotations: Sort the generated rotations in lexicographical order.
  • Last Column: Extract the last column from the sorted rotations to form the transformed string.

4. Problem Description

Given a text string, implement the Burrows-Wheeler Transform (BWT) to obtain the transformed string.

5. Examples

Example 1:

Input: text = "BANANA"
Output: BWT = "ANNBAA"

Example 2:

Input: text = "GEEKSFORGEEKS"
Output: BWT = "SKREEEGKESOF"

Explanation of Example 1:

  • The rotations of "BANANA" are:
    BANANA
    ANANAB
    NANABA
    ANABAN
    NABANA
    ABANAN
  • Sorting the rotations lexicographically:
    ABANAN
    ANABAN
    ANANAB
    BANANA
    NABANA
    NANABA
  • The last column of the sorted rotations is "ANNBAA", which is the BWT of "BANANA".

Visual Example

Visual Example of Burrows-Wheeler Transform

6. Constraints

  • The text can contain any number of characters.
  • All characters are ASCIIASCII characters.

7. Implementation

Written by @GeeksforGeeks
def burrows_wheeler_transform(text):
n = len(text)
rotations = [text[i:] + text[:i] for i in range(n)]
rotations.sort()
bwt = ''.join(rotation[-1] for rotation in rotations)
return bwt

# Example usage:
text = "BANANA"
bwt = burrows_wheeler_transform(text)
print("BWT =", bwt)

8. Complexity Analysis

  • Time Complexity:

    • Generating rotations: O(n2)O(n^2) where nn is the length of the text.
    • Sorting rotations: O(nlogn)O(n \log n).
    • Overall: O(n2)O(n^2) due to the rotations generation step.
  • Space Complexity: O(n2)O(n^2) for storing all rotations.

9. Advantages and Disadvantages

Advantages:

  • Makes the input string more compressible.
  • Useful in various applications, including data compression and bioinformatics.

Disadvantages:

  • High space complexity due to storing all rotations.
  • Higher time complexity compared to some other string transformation algorithms.

10. References

  • GFG Problem: GFG Problem
  • Author's Geeks for Geeks Profile: GeeksforGeeks