Assignment 02

Task

Write a program that will count the letters in a given file.

Text: Java Tutorial -> Learning the Java Language -> Language Basics and/or Appendix A in the textbook; Command line arguments
Concepts: data types, conditionals, loops, arrays, command line arguments, file I/O, API documentation.

Requirements

Your program should take the name of the text file to read as a command line argument. If no command line arguments are given, print a message explaining how to use the program and then quit.
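
The check for a missing argument can be quite short. Here is a minimal sketch, assuming a hypothetical class named UsageDemo and a helper named usageFor that are not part of the assignment; your own class name and message wording will differ.

```java
public class UsageDemo {

  /** Returns a usage message, or null if a filename argument was given. */
  static String usageFor(String[] args) {
    if (args.length == 0) {
      return "Usage: java UsageDemo <filename>";
    }
    return null;
  }

  public static void main(String[] args) {
    String usage = usageFor(args);
    if (usage != null) {
      System.out.println(usage);
      return;  //quit without producing any other output
    }
    //args[0] is the first word typed after the class name
    System.out.println("Would read from: " + args[0]);
  }
}
```

Returning from main is one simple way to quit; nothing after the return statement runs.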

If the file given cannot be opened or read, print an error message and quit.
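
One possible way to detect an unreadable file, sketched here under the assumption that you open the file with a Scanner: the Scanner(File) constructor throws a FileNotFoundException that you can catch. The class OpenFileDemo and the method canOpen are hypothetical names used only for illustration.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class OpenFileDemo {

  /** Returns true if the named file could be opened for reading. */
  static boolean canOpen(String filename) {
    Scanner in;
    try {
      in = new Scanner(new File(filename));
    } catch (FileNotFoundException e) {
      //the exception's message names the file and gives a system-specific reason
      System.out.println("Could not open file: " + e.getMessage());
      return false;
    }
    in.close();
    return true;
  }

  public static void main(String[] args) {
    canOpen("nosuchfile.txt");
  }
}
```

The exact text of the reason depends on your operating system, so your message may not match the sample output word for word.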

If the file can be read, read in its contents and count how many times each English letter (a to z) occurs within the file. Ignore case by treating all letters as their lowercase equivalent. Use an array to store these counts. Print the count only for letters that occur at least once in the file, with one letter and its count per line. Also print the total number of letters in the file.

You should be able to write this program without using a 26-case switch or 26 if/else-ifs to compute the array index corresponding to a particular letter. (See FAQs for a hint on how to handle the indexing.)

Sample Output

The following is taken from the command line and shows me running the program 5 separate times. Each line that begins with A03> is my prompt followed by the command I typed; it is not part of the program output.

A03> java ZtomaszeA02
This program counts the letters in a given file.
You must specify the filename as a command line argument.

Example: java ZtomaszeA02 sample.txt

A03> java ZtomaszeA02 noletters.txt
Count of each letter found in noletters.txt:


Total letters found: 0

A03> java ZtomaszeA02 md5sum.txt
Count of each letter found in md5sum.txt:

b: 1
c: 5
d: 4
e: 2
f: 4

Total letters found: 16

A03> java ZtomaszeA02 salutations.txt
Count of each letter found in salutations.txt:

a: 15
b: 1
c: 8
d: 1
e: 22
f: 2
h: 7
i: 6
l: 8
m: 3
n: 5
o: 8
p: 3
r: 12
s: 12
t: 19
u: 5
w: 2
y: 3

Total letters found: 142

A03> java ZtomaszeA02 nosuchfile.txt
Could not open file: nosuchfile.txt (The system cannot find the file specified)

You can download the same files I used or create your own:

What to Submit

Upload your UsernameA02.java file to Tamarin.

Remember to follow the course coding standards on all assignments.

Grading [50 points]

5 - Compiles
10 - Command line args
Gets name of file-to-read from command line arguments (7). Usage message (and no other output) if no command line argument is given (3).
5 - Appropriate error message (and no other output) if given file cannot be found or read.
20 - Correct output
Prints correct counts for each letter (A to Z) found in file, one per line, treating uppercase and lowercase characters the same. (16). Does not print results for letters with 0 counts (-5 if so). Prints correct total number of letters, labelled as total, count, or sum (4).
10 - Required internals
Uses an int array of size 26 (5). Does not use 26 ifs, ||s, or switch cases (5).

FAQs

How do I convert each letter to an array index if I can't use a switch or if statement?
Find a one-line expression that will directly compute the index from the value of the letter.

Like all values in memory, character values are just sequences of bits. Any sequence of bits can be treated as a numerical value. In fact, each character is already represented internally in your program by an integer value. A character encoding specifies the particular mapping of integer values to characters. A Java char holds a UTF-16 code unit, and for the English letters and digits these values are the same as in ASCII and UTF-8.

You can see this at work by doing something like this:

  int letter = 'A';  //store and treat a char as if it were an int
  int digit = '3';
  System.out.println(letter);     // => 65
  System.out.println(digit + 2);  //51 + 2 => 53

You can leverage this trick to convert letters to array indices. Since you care only about the 26 lowercase letters, this pattern should help you: 'a' - 'a' == 0, 'b' - 'a' == 1, 'c' - 'a' == 2, ... Or, put another way:

  char letter = 'c';  //...or any other lowercase letter...
  int index = letter - 'a';  //sets index to 2 here; always between 0 and 25 for 'a' to 'z'

Subtracting 'a' from your letter has the same effect as subtracting the value for 'a' that is specified in the character encoding from the value of your letter. Using 'a' instead of that number in your code 1) makes your code more readable and 2) won't break if you run your program in a different character encoding context... provided all the lowercase letters are contiguous in the encoding. (While this is not always true, you can assume it will be for this assignment.)
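
To see the subtraction at work with an actual array, here is a small sketch. The class IndexDemo and the method tally are hypothetical names, and the sketch assumes its input already contains nothing but lowercase English letters.

```java
public class IndexDemo {

  /** Tallies a String that is assumed to contain only 'a' through 'z'. */
  static int[] tally(String lowercaseLetters) {
    int[] counts = new int[26];
    for (int i = 0; i < lowercaseLetters.length(); i++) {
      char letter = lowercaseLetters.charAt(i);
      counts[letter - 'a']++;  //'a' maps to slot 0, 'b' to slot 1, and so on
    }
    return counts;
  }

  public static void main(String[] args) {
    int[] counts = tally("abca");
    System.out.println(counts[0]);   //count for 'a' => 2
    System.out.println(counts[1]);   //count for 'b' => 1
    System.out.println(counts[25]);  //count for 'z' => 0
  }
}
```

Any character outside 'a' to 'z' would produce an index outside 0 to 25 here, so real input needs to be filtered first.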

If you need to convert from an index to the corresponding letter, you may need to cast:

  char letter = (char) ('a' + index);  //gives 'c' if index == 2

This program is hard! What do I do if I don't know how to write this?
Your code does not need to be long. Mine is about 50 lines, including all documentation and error-handling. The part that actually reads from the file, totals the letters found, and prints the results is about 25 lines.

The hard part is the design: breaking the problem down into smaller pieces. As we did in lecture 02A, take it one step at a time, breaking each step down into smaller and smaller pieces until you can turn each piece into code. If you can't translate a particular step into code, it usually means you need to try to break it into even smaller logical steps.

Also, depending on your 111 experience, you may not be very familiar with file I/O or command line arguments. If so, you may also have to do a bit of research to figure out what methods you'll need to use for some of the steps in your design. Some of the following FAQs should point you in the right direction with those.

How do I read from a file?

I find that using a Scanner is an easy way to do this, as shown on this slide.

There are other ways to read in a file though (such as on this slide). You can use whatever approach works for you.

Should I use Scanner's next() method to read the file one word at a time...

You can, but you'll still need to pick the letters out of each of the individual tokens you read in.

There is a way to change Scanner's default behavior. Each time you call next(), the Scanner skips over the next sequence of delimiter characters and then grabs the next token. By default, the delimiter is defined as any whitespace characters: space, tab, newline, etc. However, it is possible to change the delimiter to "" (the empty string). Then, each time you call next(), the Scanner won't skip anything, but just give you the next character as a String of length 1. Examine the API for a method that would let you change a Scanner object's delimiter to "".
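
As a rough illustration of the empty-delimiter idea, the sketch below uses Scanner's useDelimiter method on a Scanner that reads from a String rather than a file, so that it runs without any input file. DelimiterDemo and countTokens are hypothetical names.

```java
import java.util.Scanner;

public class DelimiterDemo {

  /** Returns how many single-character tokens the Scanner finds in text. */
  static int countTokens(String text) {
    Scanner in = new Scanner(text);
    in.useDelimiter("");  //empty delimiter: nothing is skipped between tokens
    int tokens = 0;
    while (in.hasNext()) {
      String token = in.next();  //a String of length 1
      System.out.println("[" + token + "]");
      tokens++;
    }
    in.close();
    return tokens;
  }

  public static void main(String[] args) {
    System.out.println(countTokens("a b"));  //the space counts too => 3
  }
}
```

Note that whitespace characters now come back as tokens as well, so they have to be ignored when counting letters.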

Alternatively, you could use the BufferedReader approach. A BufferedReader has a read() method that will give you the next single character. (See the API for more.) However, it returns this char as an int so you can see if it's -1, which means the end-of-file has been reached. It's still really the char's encoding value though. You can cast that value to char if you want to print it as a character.
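
A read() loop along these lines might look like the following sketch. It wraps a StringReader instead of a FileReader so that it needs no file on disk; that substitution, the class name ReadCharsDemo, and the method readAll are my assumptions for illustration only.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class ReadCharsDemo {

  /** Reads every remaining character from reader and returns them as a String. */
  static String readAll(BufferedReader reader) {
    StringBuilder text = new StringBuilder();
    try {
      int next = reader.read();
      while (next != -1) {         //-1 signals end-of-file
        text.append((char) next);  //cast the int back to a char
        next = reader.read();
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return text.toString();
  }

  public static void main(String[] args) throws IOException {
    BufferedReader reader = new BufferedReader(new StringReader("Hi!\n"));
    System.out.println(readAll(reader).length());  // => 4
    reader.close();
  }
}
```

To read a real file, you would construct the BufferedReader around a FileReader for your filename instead.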

How do I read in the file one character at a time?
See above FAQ.

Hmm, reading in single characters or numbers is rather complicated. I usually just read in whole lines from a file.
That's fine too! Each line is a long String then, so just find a way to loop through a String to examine each character. (Hint: The String.charAt method would help.)
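
Looping through a String with charAt could look something like this sketch. CharAtDemo and countChar are hypothetical names, and counting a single target character is just a stand-in for whatever you do with each character.

```java
public class CharAtDemo {

  /** Returns how many times target appears in line. */
  static int countChar(String line, char target) {
    int found = 0;
    for (int i = 0; i < line.length(); i++) {
      char current = line.charAt(i);  //the character at position i
      if (current == target) {
        found++;
      }
    }
    return found;
  }

  public static void main(String[] args) {
    System.out.println(countChar("hello world", 'o'));  // => 2
  }
}
```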

Okay, I've got a single char (either because I read it in that way or because I'm looping through a String). How do I see if it's a letter? And do I need to handle uppercase and lowercase separately?
Don't write separate tests for uppercase and lowercase. Just use the toLowerCase method in either the Character or String class to lowercase all of your characters. This is harmless if the character is not a letter or is already lowercase. See the API for more. (Uppercasing all characters is fine too, but all the other FAQs and examples on this page assume you lowercased them.)

Then, to check if the character is a letter, I would normally recommend the Character.isLetter method. This will accept letters in any language, not just English. However, your array is set up to contain only 26 possible values, so you actually want to restrict yourself to only English letters. You can do this manually by checking if your current (lowercased) character value is >= 'a' and is <= 'z'. If so, it's an English letter that you can safely count in your array.
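
Put together, the lowercase-then-range-check idea might be sketched as below. LetterCheckDemo and isEnglishLetter are hypothetical names, not something the assignment requires.

```java
public class LetterCheckDemo {

  /** Returns true if c is one of the 26 English letters, in either case. */
  static boolean isEnglishLetter(char c) {
    char lower = Character.toLowerCase(c);  //leaves digits and punctuation unchanged
    return lower >= 'a' && lower <= 'z';
  }

  public static void main(String[] args) {
    System.out.println(isEnglishLetter('Q'));  // => true
    System.out.println(isEnglishLetter('q'));  // => true
    System.out.println(isEnglishLetter('3'));  // => false
    System.out.println(isEnglishLetter(' '));  // => false
  }
}
```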