ICS211: A11

Your program should take the name of a dictionary file as a command line argument. This way, the user can easily specify a different dictionary to use each time the program is run. The dictionary file should contain a list of words, one per line.

If no command line argument is given, it is up to you on what to do. You could either try to load a default dictionary file (which may not be present) and/or print an error/usage message. Your program should not crash, though.

Once the program starts, repeatedly ask the user to enter a word. Then print out all of the valid anagrams of that word in alphabetical order. You may print them one per line or in a comma-separated list. A "valid anagram" is a word the contains all of the letters in the user's word and that appears in the current dictionary. Note that the user's input does not itself need to be a valid word in the dictionary.

Your program should be case-insensitive. To do this, I recommend that you simply lowercase all dictionary words and all user input as you read them in.

The program should operate efficiently. There should be no noticeable lag time when finding all of the legal anagrams for a given input, even for a long word or a large dictionary.

For testing purposes, I recommend you download the ENABLE dictionary. (This is basically the same dictionary that is used by the Words with Friends game.)

Sample Output

What to Submit

Upload your UsernameA11.java file to Tamarin. (If you want to write any extra helper classes, you can remove the public from them and paste them at the end of your UsernameA11.java file.) Note that you are not submitting a dictionary file; Tamarin will provide its own.

Grading [50 points]

Discussion and Hints

There are two problems to tackle here. Think about them on your own a bit first, but you can then highlight the hints below for more help.

The first problem is finding whether a particular word is in the dictionary. If there are n words in the dictionary, you should now know how to store and lookup a word much faster than O(n) time.

The other task is to find the valid anagrams. Producing all anagrams of a word of length k would mean generating k! possible permutations. This is an exponential growth rate, with over 6 billion permutations for a 13-letter word like embellishment (if treating the duplicate es and ls as unique letters). There are only about 40 million permutations of exponential, but that is still more than you might want to lookup. You could do much better.

As always, you must write your own code, but you are free to discuss your design with others. You do not need to design any new data structures or ADTs. Everything you need is readily available in the Collections framework in java.util.

Design Solution

I've seen some initial designs wandering off into very complicated and inefficient directions. Therefore, give the design some thought, but before you start writing code, you may want to reveal the recommended design below.

You should process the dictionary file only once at the start of the program, not every time the user enters a word. You should divide the dictionary up into sets of anagrams in O(n) time. You will then devise a unique key for each of those sets, allowing you to immediately retrieve all the anagrams for a given word in O(1) time.

Therefore, you should put the dictionary into a Map<String, Set<String>>. You don't care if those anagrams sets are in any particular order relative to each other, so you can use a HashMap here:

  Map<String, Set<String>> anagrams = new HashMap<String, Set<String>>();

Now, given any word, you need a way to translate that word into a key to find the correct anagram set. Consider the following words:

  snail
  nails
  slain

These words all have the same letters, but in different orders. We could produce the same key for all of them simply by putting the letters of each word into the same order. That is, turn the String into a char array, sort the array, and turn the char[] back into a String. (This takes 3 lines of code or less.) Our key would be ailns in this case.

Therefore, you can go through the dictionary once at program start up. Alphabetize each word to get the corresponding key. Add the word to the set of anagrams associated with that key.

Then, when the user gives you a word, do the same thing: alphabize the letters to get the key, and return the set of valid anagrams stored under that key. Note that there may not be a corresponding set of anagrams if the user's word is not an anagram of a word in the dictionary.

Finally, if you use a TreeSet to hold the words in each anagram set, they'll be stored in sorted order and you won't even need to sort the words before you print them.

Assignment 11

Task

Requirements