Pipe-and-Filter Architecture and An Elegant Algorithm

The Problem: Given a dictionary of words in English, find all the anagrams in the dictionary. That is, find all the words that are permutations of each other. For example, pots, stop, and spot are anagrams of each other.

Bentley, J. Programming Pearls, Second Edition. (Boston, MA: Addison-Wesley, 2000.)

First of all, all the anagrams have the same letters and the same number of letters in each word. That gives us the clue to the method you’ll use to find the anagrams. If you sort each word by individual letters, you’ll end up with a string of characters that has all the letters in the word in alphabetical order. We call this creating a sign for the word. If you then sort the resulting list, all the anagrams will end up together in the sorted list because their sorted letters will be identical. If you then keep track of which words you sorted, you can then simplify the list and create a new list with, say, each set of anagrams on the same line of the output file. This is exactly how Bentley does it. But how does this relate to a pipe-and-filter architecture, you ask? Good question. Let’s break down the solution again.

  1. Create a sign for each word in the list by sorting the letters in each word; keep the
    sign and the word together.
  2. Sort the resulting list by the signs; all the anagrams should now be together.
  3. Squash the list by putting each set of anagrams on the same line, removing the
    signs as you do.

sign <dictionary.txt| sort | squash> anagrams.txt

sign is the filter we use to do step 1, with input file dictionary.txt. sign outputs a list of signs and their associated words, which is piped to the sort utility. Sort then sorts the list by the first field on each line, which happens to be the sign of each word. It then outputs the sorted list to the next pipe. Squash takes the sorted list from the incoming pipe and compresses it by putting all the words with the same sign on the same line, eliminating the signs as it does so. This final list is sent via one last pipe to the output file anagrams.txt.

About Aliyar Güneş

I’am Aliyar Güneş, a learner and software developer from Istanbul, Turkey. I write C# and Java.
This entry was posted in Algorithm Analysis, General Concepts, Links, Software Architecture and tagged , , , . Bookmark the permalink.

Leave a Reply