Assignment 11

Task

Write a program that prints all the valid anagrams of user-entered words.

Text: 7.1 - 7.2
Concepts: Maps, sets.

Requirements

Your program should take the name of a dictionary file as a command line argument. This way, the user can easily specify a different dictionary to use each time the program is run. The dictionary file should contain a list of words, one per line.

If no command line argument is given, it is up to you on what to do. You could either try to load a default dictionary file (which may not be present) and/or print an error/usage message. Your program should not crash, though.

Once the program starts, repeatedly ask the user to enter a word. Then print out all of the valid anagrams of that word in alphabetical order. You may print them one per line or in a comma-separated list. A "valid anagram" is a word the contains all of the letters in the user's word and that appears in the current dictionary. Note that the user's input does not itself need to be a valid word in the dictionary.

The program should end when the user enters a blank line.

Your program should be case-insensitive. To do this, I recommend that you simply lowercase all dictionary words and all user input as you read them in.

The program should operate efficiently. There should be no noticeable lag time when finding all of the legal anagrams for a given input, even for a long word or a large dictionary.

For testing purposes, I recommend you download the ENABLE dictionary. (This is basically the same dictionary that is used by the Words with Friends game.)

Sample Output

A11> java ZtomaszeA11
Anagram-Finder!
You did not enter a filename as a command line argument.
Example: java ZtomaszeA11 mydict.txt

A11> java ZtomaszeA11 ENABLE.txt
Anagram-Finder!
Enter a word (or nothing to quit): team
Anagrams of 'team':
[mate, meat, meta, tame, team]

Enter a word (or nothing to quit): mister
Anagrams of 'mister':
[merits, mister, miters, mitres, remits, smiter, timers]

Enter a word (or nothing to quit): Anderson
No anagrams found for 'anderson'.

Enter a word (or nothing to quit): Neo
Anagrams of 'neo':
[eon, one]

Enter a word (or nothing to quit): what
Anagrams of 'what':
[thaw, what]

Enter a word (or nothing to quit): MatriX
Anagrams of 'matrix':
[matrix]

Enter a word (or nothing to quit): multiplication
Anagrams of 'multiplication':
[multiplication]

Enter a word (or nothing to quit):
Goodbye!

What to Submit

Upload your UsernameA11.java file to Tamarin. (If you want to write any extra helper classes, you can remove the public from them and paste them at the end of your UsernameA11.java file.) Note that you are not submitting a dictionary file; Tamarin will provide its own.

Remember to follow the course coding standards on all assignments.

Grading [50 points]

5 - Compiles
10 - Command line argument
Dictionary file to use must be specified as a cmd line arg (5; required for more points). If no cmd line argument is given, program still does or prints something helpful and does not crash (5).
35 - Prints anagrams of user-entered words.
Valid anagrams must be found in the current dictionary (25). Should be printed in alphabetical order (5). Program continues to ask for new words until user enters a blank line (5).

Discussion and Hints

There are two problems to tackle here. Think about them on your own a bit first, but you can then highlight the hints below for more help.

The first problem is finding whether a particular word is in the dictionary. If there are n words in the dictionary, you should now know how to store and lookup a word much faster than O(n) time.

The other task is to find the valid anagrams. Producing all anagrams of a word of length k would mean generating k! possible permutations. This is an exponential growth rate, with over 6 billion permutations for a 13-letter word like embellishment (if treating the duplicate es and ls as unique letters). There are only about 40 million permutations of exponential, but that is still more than you might want to lookup. You could do much better.

As always, you must write your own code, but you are free to discuss your design with others. You do not need to design any new data structures or ADTs. Everything you need is readily available in the Collections framework in java.util.

Design Solution

I've seen some initial designs wandering off into very complicated and inefficient directions. Therefore, give the design some thought, but before you start writing code, you may want to reveal the recommended design below.

You should process the dictionary file only once at the start of the program, not every time the user enters a word. You should divide the dictionary up into sets of anagrams in O(n) time. You will then devise a unique key for each of those sets, allowing you to immediately retrieve all the anagrams for a given word in O(1) time.

Therefore, you should put the dictionary into a Map<String, Set<String>>. You don't care if those anagrams sets are in any particular order relative to each other, so you can use a HashMap here:

  Map<String, Set<String>> anagrams = new HashMap<String, Set<String>>();

Now, given any word, you need a way to translate that word into a key to find the correct anagram set. Consider the following words:

  snail
  nails
  slain

These words all have the same letters, but in different orders. We could produce the same key for all of them simply by putting the letters of each word into the same order. That is, turn the String into a char array, sort the array, and turn the char[] back into a String. (This takes 3 lines of code or less.) Our key would be ailns in this case.

Therefore, you can go through the dictionary once at program start up. Alphabetize each word to get the corresponding key. Add the word to the set of anagrams associated with that key.

Then, when the user gives you a word, do the same thing: alphabize the letters to get the key, and return the set of valid anagrams stored under that key. Note that there may not be a corresponding set of anagrams if the user's word is not an anagram of a word in the dictionary.

Finally, if you use a TreeSet to hold the words in each anagram set, they'll be stored in sorted order and you won't even need to sort the words before you print them.

FAQs

So this assignment is about Maps and Sets? You don't mention those in the requirements.
True, I only asked that you do this efficiently. If you use Maps and/or Sets, your solution will fast. Your code will be shorter and more elegant too. (My whole program is ~100 lines, including all documentation and breaking the code into 4 static methods.)
Is it okay if there is a pause for just a second or two when I start my program with a large dictionary, such as ENABLE?
Yes, that's fine. But you should be able to avoid pauses of several seconds, especially on each input by the user.
Do we need to implement hashCode or a hash function for this assignment?
No. You should use a HashMap, which internally uses a hash table. So it will be calling hashCode() on your keys... but if your keys are Strings, they already have a working hashCode() method. So all of that hash stuff is happening, but it's behind the scenes in the API implementation classes you'll be using.
You say in the design solution that you can get the set of anagrams for a given word in O(1) time. Is that really true?
[Warning: Design Solution spoiler]
That is a bit of a simplification. Generating the key from the word requires a sort, which is O(k lg k), where k is the length of the word. This is probably very small compared to the size (n) of most dictionaries, though, and probably in practice isn't much worse than the cost of then hashing the key and performing the map lookup. You would also have to pay this sorting cost for each word while building the dictionary map, so that process is really O(n * (k lg k)), where k is average length of a word, rather than just O(n).