Write a program that will count the letters in a given file.
Text: Java Tutorial -> Learning the Java Language -> Language Basics and/or Appendix A in the textbook; Command line arguments
Concepts: data types, conditionals, loops, arrays, command line arguments, file I/O, API documentation.
Your program should take the name of the text file to read as a command line argument. If no command line arguments are given, print a message explaining how to use the program and then quit.
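A minimal sketch of the command-line check, assuming the program's only argument is the file name. The class name `LetterCountArgs` and the helper `usageMessage` are hypothetical. The message text copies the sample run later in this assignment; swap in your own class name for `ZtomaszeA02`.

```java
// Hypothetical class: shows only the "no arguments" check, not the counting.
class LetterCountArgs {

    // Builds the usage message shown when no file name is given.
    static String usageMessage() {
        return "This program counts the letters in a given file.\n"
             + "You must specify the filename as a command line argument.\n"
             + "Example: java ZtomaszeA02 sample.txt";
    }

    public static void main(String[] args) {
        if (args.length == 0) {
            // No arguments: explain how to run the program, then quit.
            System.out.println(usageMessage());
            return;
        }
        String filename = args[0];  // the first argument names the file to read
        System.out.println("Would read: " + filename);
    }
}
```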
If the file given cannot be opened or read, print an error message and quit.
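One possible way to handle a file that cannot be opened, sketched with `java.util.Scanner`: `new Scanner(File)` throws `FileNotFoundException`, which you can catch, report, and then quit. The class `OpenFileSketch` and method `openOrReport` are hypothetical names. This covers only the "cannot be opened" case, not read errors that happen later.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

// Hypothetical class: opens the named file or reports why it couldn't.
class OpenFileSketch {

    // Returns a Scanner for the file, or prints an error and returns null.
    static Scanner openOrReport(String filename) {
        try {
            return new Scanner(new File(filename));
        } catch (FileNotFoundException e) {
            // The exception message names the file plus an OS-specific reason.
            System.out.println("Could not open file: " + e.getMessage());
            return null;
        }
    }

    public static void main(String[] args) {
        if (args.length == 0) {
            System.out.println("Usage: java OpenFileSketch <filename>");
            return;
        }
        Scanner in = openOrReport(args[0]);
        if (in == null) {
            return;  // error already printed, so quit
        }
        System.out.println("Opened " + args[0]);
        in.close();
    }
}
```

The reason in parentheses comes from the operating system. On Windows it reads "(The system cannot find the file specified)", as in the sample run. On Linux or macOS it is typically "(No such file or directory)".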
If the file can be read, read in its contents and count how many times each English letter (a to z) occurs within the file. Ignore case by treating every letter as its lowercase equivalent. Use an array to store these counts. Print the count only for letters that occur at least once in the file, with one letter and its count per line. Also print the total number of letters in the file.
You should be able to write this program without using a 26-case switch or 26 if/else-ifs to translate the letters from characters to ints or to compute the array index corresponding to a particular letter. (See the FAQs for a hint on how to handle the indexing.)
The following transcript, taken from the command line, shows me running the program 5 separate times. The A03> text is my prompt and is not part of the program output.
A03> java ZtomaszeA02
This program counts the letters in a given file.
You must specify the filename as a command line argument.
Example: java ZtomaszeA02 sample.txt
A03> java ZtomaszeA02 noletters.txt
Count of each letter found in noletters.txt:
Total letters found: 0
A03> java ZtomaszeA02 md5sum.txt
Count of each letter found in md5sum.txt:
b: 1
c: 5
d: 4
e: 2
f: 4
Total letters found: 16
A03> java ZtomaszeA02 salutations.txt
Count of each letter found in salutations.txt:
a: 15
b: 1
c: 8
d: 1
e: 22
f: 2
h: 7
i: 6
l: 8
m: 3
n: 5
o: 8
p: 3
r: 12
s: 12
t: 19
u: 5
w: 2
y: 3
Total letters found: 142
A03> java ZtomaszeA02 nosuchfile.txt
Could not open file: nosuchfile.txt (The system cannot find the file specified)
You can download the same files I used or create your own:
Upload your UsernameA02.java file to Tamarin.
Remember to follow the course coding standards on all assignments.
Uses an int array of size 26 (5). Does not use 26 ifs, ||s, or switch cases (5).
Like all values in memory, character values are just sequences of bits, and any sequence of bits can be treated as a numerical value. In fact, each character is already represented internally in your program by an integer value. The character encoding used by your program (usually UTF-8 or ASCII) specifies the particular mapping of integer values to characters. In Java, a char stores the character's Unicode value, which matches ASCII for the English letters and digits.
You can see this at work by doing something like this:
int letter = 'A';               // store and treat a char as if it were an int
int digit = '3';
System.out.println(letter);     // => 65
System.out.println(digit + 2);  // 51 + 2 => 53
You can leverage this trick to convert letters to array indices. Since you care only about the 26 lowercase letters, this pattern should help you: 'a' - 'a' == 0, 'b' - 'a' == 1, 'c' - 'a' == 2, ... Or, put another way:
char letter = 'c';         // stands in for some lowercase letter
int index = letter - 'a';  // a value between 0 and 25 (here, 2)
Subtracting 'a' from your letter has the same effect as subtracting the value that the character encoding assigns to 'a' (97 in ASCII and Unicode) from the value of your letter. Using 'a' instead of that number in your code 1) makes your code more readable and 2) won't break if you run your program in a different character encoding context, provided all the lowercase letters are contiguous in the encoding. (While this is not always true, you can assume it will be for this assignment.)
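The subtraction trick can be checked in a few lines. This sketch assumes its input is already a lowercase English letter. The class `LetterIndexDemo` and method `indexOf` are hypothetical names chosen for illustration.

```java
// Hypothetical class: maps lowercase letters to array indices.
class LetterIndexDemo {

    // Maps 'a'..'z' to 0..25 by subtracting the code for 'a'.
    // Assumes the argument is already a lowercase English letter.
    static int indexOf(char lowercaseLetter) {
        return lowercaseLetter - 'a';
    }

    public static void main(String[] args) {
        System.out.println((int) 'a');     // 97, the code for 'a'
        System.out.println(indexOf('a'));  // 0
        System.out.println(indexOf('c'));  // 2
        System.out.println(indexOf('z'));  // 25
    }
}
```

Because the result always falls in 0 to 25, it can be used directly as an index into an int array of size 26.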
If you need to convert from an index to the corresponding letter, you may need to cast:
char letter = (char) ('a' + index); //gives 'c' if index == 2
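Going the other direction is what the printing step needs. This is a sketch of one way to report an int array of size 26 in the format from the sample runs. The helper name `report` and class `PrintCountsSketch` are hypothetical. The "Count of each letter found in ..." header line is omitted. The demo counts are the ones from the md5sum.txt run.

```java
// Hypothetical class: turns a 26-element count array into the report text.
class PrintCountsSketch {

    // One "letter: count" line per letter that occurs, then the total.
    static String report(int[] counts) {
        StringBuilder out = new StringBuilder();
        int total = 0;
        for (int index = 0; index < counts.length; index++) {
            if (counts[index] > 0) {
                char letter = (char) ('a' + index);  // index back to its letter
                out.append(letter).append(": ").append(counts[index]).append('\n');
                total += counts[index];
            }
        }
        out.append("Total letters found: ").append(total).append('\n');
        return out.toString();
    }

    public static void main(String[] args) {
        int[] counts = new int[26];
        // Counts from the md5sum.txt sample run.
        counts['b' - 'a'] = 1;
        counts['c' - 'a'] = 5;
        counts['d' - 'a'] = 4;
        counts['e' - 'a'] = 2;
        counts['f' - 'a'] = 4;
        System.out.print(report(counts));  // b: 1 ... Total letters found: 16
    }
}
```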
The hard part is the design: breaking the problem down into smaller pieces. As we did in lecture 02A, take it one step at a time, breaking each step down into smaller and smaller pieces until you can turn each piece into code. If you can't translate a particular step into code, it usually means you need to try to break it into even smaller logical steps.
Also, depending on your 111 experience, you may not be very familiar with file I/O or command line arguments. If so, you may also have to do a bit of research to figure out which methods you'll need for some of the steps in your design. Some of the following FAQs should point you in the right direction.
I find that using a Scanner is an easy way to do this, as shown on this slide.
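The slide itself isn't reproduced here. This is a sketch of one common Scanner pattern for reading a file line by line and is not necessarily the slide's code. `ScannerReadSketch` and `readAll` are hypothetical names. `readAll` takes a Scanner so that it works the same on a file or on a string.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

// Hypothetical class: reads a whole file through a Scanner, line by line.
class ScannerReadSketch {

    // Reads every remaining line and returns them joined with '\n'.
    static String readAll(Scanner in) {
        StringBuilder text = new StringBuilder();
        while (in.hasNextLine()) {
            text.append(in.nextLine()).append('\n');
        }
        return text.toString();
    }

    public static void main(String[] args) throws FileNotFoundException {
        if (args.length == 0) {
            System.out.println("Usage: java ScannerReadSketch <filename>");
            return;
        }
        // new Scanner(File) throws FileNotFoundException if the file can't be opened.
        try (Scanner in = new Scanner(new File(args[0]))) {
            System.out.print(readAll(in));
        }
    }
}
```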
There are other ways to read in a file though (such as on this slide). You can use whatever approach works for you.
next() method to read the file one word at a time...
You can, but you'll still need to pick the letters out of each of the individual tokens you read in.
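If you do read word by word, each token still has to be walked one character at a time. This sketch uses `next()` with the default whitespace delimiter plus `String.charAt`, and it counts only a to z after lowercasing. The class `TokenLettersSketch` and method `countLetters` are hypothetical, and the sketch returns a single total rather than per-letter counts.

```java
import java.util.Scanner;

// Hypothetical class: counts English letters inside whitespace-separated tokens.
class TokenLettersSketch {

    static int countLetters(Scanner in) {
        int total = 0;
        while (in.hasNext()) {
            String token = in.next();  // next word; whitespace is skipped
            for (int i = 0; i < token.length(); i++) {
                char ch = Character.toLowerCase(token.charAt(i));
                if (ch >= 'a' && ch <= 'z') {  // English letters only
                    total++;
                }
            }
        }
        return total;
    }

    public static void main(String[] args) {
        Scanner in = new Scanner("Hello, world! 42 times.");
        System.out.println(countLetters(in));  // 15
    }
}
```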
There is a way to change Scanner's default behavior. Each time you call next(), the Scanner skips over the next sequence of delimiter characters and then grabs the next token. By default, the delimiter is any whitespace: space, tab, newline, etc. However, you can change the delimiter to "" (the empty string). Then each time you call next(), the Scanner won't skip anything; it will just give you the next character as a String of length 1. Examine the API for a method that would let you change a Scanner object's delimiter to "".
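The Scanner method that changes the delimiter is `useDelimiter`. This sketch sets it to "" and collects the one-character tokens. It reads from a string for easy testing, and a Scanner built on a File behaves the same way. `CharTokensSketch` and `tokens` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical class: reads input one character at a time via an empty delimiter.
class CharTokensSketch {

    // With an empty delimiter, each next() returns a one-character String,
    // including spaces and newlines, because nothing counts as a delimiter.
    static List<String> tokens(Scanner in) {
        in.useDelimiter("");
        List<String> result = new ArrayList<>();
        while (in.hasNext()) {
            result.add(in.next());
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> chars = tokens(new Scanner("ab c"));
        System.out.println(chars.size());  // 4: "a", "b", " ", "c"
    }
}
```

Since whitespace and punctuation now come back as tokens too, you still need to check each one before counting it.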
Alternatively, you could use the BufferedReader approach. A BufferedReader has a read() method that will give you the next single character. (See the API for more.) However, it returns this char as an int so that it can signal the end of the file by returning -1. Any other value is still really the char's encoding value, which you can cast to char if you want to print it as a character.
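A sketch of the `read()` loop: keep reading until it returns -1, and cast each other value back to char. `ReadCharsSketch` and `countLetters` are hypothetical names. The method takes any `Reader`, so it works with a `FileReader` for a real file or a `StringReader` for a quick test. `FileReader` uses the platform's default character set, which is an assumption you may need to revisit for unusual files.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical class: counts English letters using BufferedReader.read().
class ReadCharsSketch {

    static int countLetters(Reader source) throws IOException {
        int total = 0;
        try (BufferedReader reader = new BufferedReader(source)) {
            int code = reader.read();  // next char's value as an int, or -1 at end
            while (code != -1) {
                char ch = Character.toLowerCase((char) code);  // cast back to char
                if (ch >= 'a' && ch <= 'z') {
                    total++;
                }
                code = reader.read();
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println(countLetters(new StringReader("Abc, 123!")));  // 3
            return;
        }
        System.out.println(countLetters(new FileReader(args[0])));
    }
}
```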
String.charAt method would help.)
char (either because I read it in that way or because I'm looping through a String). How do I see if it's a letter? And do I need to handle uppercase and lowercase separately?
Then, to check if the character is a letter, I would normally recommend the Character.isLetter method. However, that method accepts letters in any language, not just English, and your array is set up to hold only 26 counts, so you actually want to restrict yourself to English letters. You can do this manually by checking whether your current (lowercased) character value is >= 'a' and <= 'z'. If so, it's an English letter that you can safely count in your array.
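A sketch of the lowercase-then-range check, alongside `Character.isLetter` for contrast. The helper `isEnglishLetter` and class `LetterCheckSketch` are hypothetical names. The accented sample is written as the escape `'\u00e9'` (é) so the source file's encoding doesn't matter.

```java
// Hypothetical class: compares Character.isLetter with an a-to-z range check.
class LetterCheckSketch {

    // Lowercases the char, then accepts it only if it falls in 'a'..'z'.
    static boolean isEnglishLetter(char ch) {
        char lower = Character.toLowerCase(ch);
        return lower >= 'a' && lower <= 'z';
    }

    public static void main(String[] args) {
        char[] samples = {'Q', 'q', '\u00e9', '3', ' '};
        for (char ch : samples) {
            System.out.println(ch + " isLetter=" + Character.isLetter(ch)
                    + " isEnglishLetter=" + isEnglishLetter(ch));
        }
    }
}
```

For 'é', `Character.isLetter` returns true but the range check returns false, so it never produces an index outside 0 to 25.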