Assignment 12

Task

Write a program that prints a concise digest of a simple log file.

Text: 7.1 - 7.2
Concepts: Maps, sets.

Requirements

Your program should take the name of a log file to process as a command line argument. If no argument is given, you should print an error/usage message.
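A minimal sketch of the argument check, assuming a hypothetical class name UsageCheckSketch (your own class would be named UsernameA12) and a hypothetical helper filenameFrom:

```java
class UsageCheckSketch {

    /** Returns the first command line argument, or null if none was given. */
    static String filenameFrom(String[] args) {
        return (args.length > 0) ? args[0] : null;
    }

    public static void main(String[] args) {
        String filename = filenameFrom(args);
        if (filename == null) {
            // No argument: report the problem and show how to run the program.
            System.out.println("Error: No filename given.");
            System.out.println();
            System.out.println("Usage: java UsageCheckSketch filename");
            return;
        }
        // Placeholder for the real work of reading and summarizing the log.
        System.out.println("Would process: " + filename);
    }
}
```

The exact wording of the error/usage message is up to you.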

The log file should have one entry per line where each entry consists of two tokens separated by whitespace (such as tabs or spaces). These two tokens will be described here as the source and the destination. For example, in a simple web server log, the "source" might be the IP address of a client that requested a resource and the "destination" would be the local URL of the requested resource.

It is an error if a log file line contains only one token. If a line has more than 2 tokens, the first two tokens should be treated as source and destination and the rest of the line ignored.

Your program should first print a list of all unique sources, one per line. Each source should be followed by the total number of destinations related to that source, then the number of unique destinations, and finally the full list of all destinations for that source. (See sample output below.)

This sources section should be followed by another section, separated by a blank line, in which the source and destination roles are reversed. That is, in the second section, you should print all unique destinations, each followed by the total number of sources and the number of unique sources related to that destination, and finally the list of all sources related to that destination.

Sample Log Files

access_log.txt
A simple web server log.
Source: client IP address. Destination: requested resource.
Your program will let you see which resources each client requested, and then who requested each resource.
access_log_times.txt
As above, but with a unix timestamp for each access.
This file should produce the same results as access_log.txt, since anything after the first two tokens on a line is ignored.
kill-log.txt
Log of "kills" in a multiplayer first-person-shooter arena game.
Source: Victorious player. Destination: Killed player.
You'll be able to see how many kills each player got, and then who each player got killed by.
exposures.txt
A mapping of potential disease vectors.
Source: One of two infected carriers. Destination: The exposed recipient.
You should be able to see who was most frequently exposed and who was exposed to more than one different strain.
uneven.txt
A mapping of alignments and character classes.
This file has an uneven mix of tabs and spaces between tokens.
invalid.txt
Contains both blank lines and lines of only one token.

Sample Output

C:\Files\webwork\UH\teaching\ics211\grading\A12> java ZtomaszeA12
Error: No filename given.

Usage: java ZtomaszeA12 filename
Prints details of a log file in which each line contains two tokens:
a source and destination.

C:\Files\webwork\UH\teaching\ics211\grading\A12> java ZtomaszeA12 access.log
Could not read from log file: access.log (The system cannot find the file specif
ied)

C:\Files\webwork\UH\teaching\ics211\grading\A12> java ZtomaszeA12 access_log.txt
By SOURCE:
16.211.121.86     3 /  1        [H.html, H.html, H.html]
190.19.115.194    2 /  1        [D.html, D.html]
87.174.252.7      2 /  1        [D.html, D.html]
49.131.184.37     3 /  2        [D.html, D.html, H.html]
176.34.135.152    4 /  3        [C.html, F.html, C.html, G.html]
123.145.91.9      2 /  2        [H.html, A.html]
192.9.180.10      1 /  1        [G.html]
93.93.228.161     2 /  2        [D.html, A.html]
51.66.144.228     5 /  3        [C.html, H.html, B.html, B.html, B.html]
201.196.13.142    2 /  1        [F.html, F.html]
95.23.3.162       2 /  2        [G.html, C.html]

By DESTINATION:
D.html    7 /  4        [93.93.228.161, 49.131.184.37, 190.19.115.194, 87.174.25
2.7, 190.19.115.194, 87.174.252.7, 49.131.184.37]
B.html    3 /  1        [51.66.144.228, 51.66.144.228, 51.66.144.228]
G.html    3 /  3        [95.23.3.162, 192.9.180.10, 176.34.135.152]
F.html    3 /  2        [201.196.13.142, 176.34.135.152, 201.196.13.142]
A.html    2 /  2        [123.145.91.9, 93.93.228.161]
C.html    4 /  3        [176.34.135.152, 51.66.144.228, 95.23.3.162, 176.34.135.
152]
H.html    6 /  4        [123.145.91.9, 16.211.121.86, 51.66.144.228, 16.211.121.
86, 16.211.121.86, 49.131.184.37]

C:\Files\webwork\UH\teaching\ics211\grading\A12> java ZtomaszeA12 invalid.txt
Log file is malformed: A line exists with fewer than 2 tokens.

I wrapped the output to 80 chars, as it would appear in a standard command prompt or terminal window. If I resized the window or redirected the output to a file, the line given for D.html (for example) would be really long and not wrap.

In your output, the details of what you put between each "field" in a line are up to you. For example, I put a tab between the source and the first count, a '/' between the total and unique counts, and a tab between the last count and the list of destinations. I just used the default toString() behavior of the destination list. The section titles I used are also optional.

What to Submit

Upload your UsernameA12.java file to Tamarin. (If you want to write any extra helper classes, you can remove the public from them and paste them at the end of your UsernameA12.java file.)

Remember to follow the course coding standards on all assignments.

Grading [50 points]

5 - Compiles
10 - Command line argument
Filename taken from a cmd line arg (5). If no cmd line argument is given, prints an error/usage message (5).
35 - Prints results in the format described above

FAQs

So this assignment is about Maps and Sets? You don't mention those in the requirements.
If you use a Map or two, this assignment will be quite easy. Basically, you're mapping each source String to a List of destination Strings. And then, separately, you're doing the same thing in reverse. Once you have a List of Strings, it's easy to convert that to a Set to remove any duplicates.
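One possible way to sketch this idea, not necessarily the intended solution: the class name GroupingSketch, the helper add, and the hard-coded entries are all hypothetical, and LinkedHashMap is just my choice here to keep keys in first-seen order.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class GroupingSketch {

    /** Appends value to the list stored under key, creating the list on first use. */
    static void add(Map<String, List<String>> map, String key, String value) {
        map.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    /** Prints each key with its total count, unique count, and full list. */
    static void print(Map<String, List<String>> map) {
        for (Map.Entry<String, List<String>> e : map.entrySet()) {
            List<String> all = e.getValue();
            Set<String> unique = new HashSet<>(all);  // duplicates collapse in a Set
            System.out.println(e.getKey() + "\t" + all.size() + " / "
                    + unique.size() + "\t" + all);
        }
    }

    public static void main(String[] args) {
        // Hypothetical (source, destination) pairs standing in for parsed log lines.
        String[][] entries = {
            {"10.0.0.1", "A.html"},
            {"10.0.0.2", "B.html"},
            {"10.0.0.1", "A.html"},
            {"10.0.0.1", "C.html"},
        };

        Map<String, List<String>> bySource = new LinkedHashMap<>();
        Map<String, List<String>> byDestination = new LinkedHashMap<>();
        for (String[] entry : entries) {
            add(bySource, entry[0], entry[1]);       // source -> destinations
            add(byDestination, entry[1], entry[0]);  // destination -> sources
        }

        print(bySource);
        System.out.println();
        print(byDestination);
    }
}
```

With these entries, 10.0.0.1 maps to three destinations in total, of which two are unique.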
How do I read in the lines of the file?
By default, a Scanner will skip over any whitespace, including multiple tabs and spaces. This suggests that you could use next() to read in the tokens. While this would work fine for valid files, it will cause you problems when there are extra tokens on a line or if a line has only one token in it.

Instead, I recommend you read in one line at a time with nextLine() and then split the line into tokens. One way to do that is to use a second Scanner, since a Scanner can read from a String as well as from a file:

  String line = "one two  three   four";
  Scanner scan = new Scanner(line);
  String token1 = scan.next();  // "one"
  String token2 = scan.next();  // "two"
You may have to use hasNext() to detect if there really is another token left, though, or else be prepared to catch a NoSuchElementException.
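The hasNext() check could be sketched as follows. The class LineScannerSketch and the helper firstTwoTokens are hypothetical names, and returning null for a short line is only one option; what to do with blank or one-token lines is left to the caller.

```java
import java.util.Scanner;

class LineScannerSketch {

    /**
     * Returns the first two tokens of line as {source, destination},
     * or null if the line has fewer than two tokens.
     */
    static String[] firstTwoTokens(String line) {
        try (Scanner scan = new Scanner(line)) {
            if (!scan.hasNext()) {
                return null;  // no tokens at all (blank line)
            }
            String source = scan.next();
            if (!scan.hasNext()) {
                return null;  // only one token on the line
            }
            String destination = scan.next();
            // Any tokens after the second are simply never read.
            return new String[] {source, destination};
        }
    }

    public static void main(String[] args) {
        String[] pair = firstTwoTokens("one two  three   four");
        System.out.println(pair[0] + " -> " + pair[1]);
        System.out.println(firstTwoTokens("lonely") == null);
    }
}
```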

A simpler alternative is to use the split method in the String class. This method takes a regular expression, which you can read more about elsewhere. (See this page for Java-specific details, though it's not a very good introduction to regular expressions in general.) The one bit you need to know for this assignment is that, in Java, "\\s" will match any single whitespace character, and so "\\s+" will match one or more whitespace characters.
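Here is a small sketch of the split approach, with a hypothetical class SplitSketch and helper tokens. It trims the line first, because split produces an empty leading token when the line starts with whitespace, and it treats an all-whitespace line as having no tokens.

```java
class SplitSketch {

    /** Splits line on runs of whitespace; a blank line yields an empty array. */
    static String[] tokens(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return new String[0];  // "".split(...) would give one empty token
        }
        return trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        String[] parts = tokens("one two  three   four");
        System.out.println(parts.length);                 // number of tokens found
        System.out.println(parts[0] + ", " + parts[1]);   // first two tokens
    }
}
```

Checking the length of the returned array is then enough to detect a line with fewer than two tokens.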

When I submit to Tamarin, it doesn't pass the ABCDE.txt. Many, but not all, of the tokens are missing in the output.
ABCDE.txt contains a mix of tabs and spaces between the tokens, sometimes with more than one of each. (I added the uneven.txt sample file above to show an example of this.) If you're failing this test, you're probably using split to break up the tokens, but you're not matching more than one whitespace character in a row as the delimiter to split on. In particular, you should consider the "\\s+" regular expression discussed in the FAQ above.
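To see the difference, here is a short comparison on a made-up line (not taken from ABCDE.txt) with a space followed by a tab between two tokens. The class name DelimiterSketch is hypothetical.

```java
import java.util.Arrays;

class DelimiterSketch {

    /** Splits on each single whitespace character. */
    static String[] splitSingle(String line) {
        return line.split("\\s");
    }

    /** Splits on each run of one or more whitespace characters. */
    static String[] splitRun(String line) {
        return line.split("\\s+");
    }

    public static void main(String[] args) {
        String line = "a \tb";  // space, then tab, between the two tokens
        System.out.println(Arrays.toString(splitSingle(line)));  // [a, , b]
        System.out.println(Arrays.toString(splitRun(line)));     // [a, b]
    }
}
```

With "\\s", the empty string between the space and the tab becomes the second token, so the real destination is pushed to the third position and lost.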