LIS677: Usability Study

Abstract: A real-world pilot study of the use of alternate site maps in web site design. Testing a large, strategic miniature business web site, it is tentatively shown that alternate site maps improve the speed of user performance and increase both the perceived level of usability and user enjoyment. However, due to an uneven distribution between user groups, there is not enough data to draw strong conclusions. The benefits and problems with real-world testing are examined and advice is given for future studies.

Introduction
Methodology
Discussion
- Testing in the Real World
- Problems and Deformities
Data
Findings
Conclusions
Endnotes

Introduction

Classification schemes play a huge role in our very conception of reality. Whether the classification consists of an entire paradigm or something as simple as whether brown eggs are better than white eggs, such schemes strongly influence both what we consider to be important from the barrage of information we receive from the world and how we then act on that information. Research shows that these classifications are rarely clear cut, however. First, they have been shown to frequently orient around prototypes, instead of following the traditional model of clear-cut divisions between attributes that all members of a class equally share. Second, they vary from culture to culture, and even from person to person. They are thus not innate, objective constructs, and they are not separable from the context in which they are developed.[1]

Another important consideration is that the same entity can exist in multiple classification hierarchies at the same time. Which hierarchy is more useful depends on the situation. For example, phone books are both information resources and big, heavy books. My success in recognizing the potential of using a phone book to prop open a door depends on my use of the "big, heavy book" scheme. Yet this same scheme would not be helpful at all when I need to find a phone number for a Chinese take-out restaurant. For this, I must recognize that a phone book is a possible information resource I could use.

Despite this somewhat vague and shifting nature of classification hierarchies, we use them heavily as a way to organize knowledge. It seems that some knowledge structure, even one so dependent on context, is better than none. Hierarchies, such as program menus, can help reduce mental load. Rather than having to remember every possible command, a user has only to remember the top level, which better fits into the short-term mental capacity of 5 to 7 items. The seminal website use of hierarchy is Yahoo, which attempts to classify the entire World Wide Web (or at least those parts of it that meet certain acceptance requirements). Yahoo is also a good example of how hierarchies are neither universal nor clearly defined; many of its categories have cross links to other hierarchy branches. And even with this aid, many people still resort to searching to find the appropriate branch.

Classification is particularly important in web site usability when it comes to navigation. Navigational hierarchies provide a model of the site's structure. Navigation is harder when the links are not organized in any recognizable order, or in an order that is not helpful to the task at hand. A page that lists every other page in a web site, usually demonstrating the informational architecture of the complete site, is referred to as a site map.

What if users could pick their navigational hierarchy or customize it to fit their needs? It seems that this should be an aid, since hierarchies vary in usefulness depending on the user and the current task. It should improve performance, and it should also mean less mental strain for users, who would no longer have to cast all their searching in terms of a single hierarchy. Using a nearly completed web site of approximately 1400 strategic miniature items, this idea was put to the test. The goal was not to test which navigational hierarchy works the best, but rather whether having alternates is an aid or a hindrance.

It was hypothesized that, in real use, alternate site maps would be underutilized. People, especially when web surfing, frequently take only the first screen provided to them. They don't seek aid or help [2]; they just tend to plunge along with what is obviously available. (Alternate site maps would likely be used more than online help, however, since they are at least related to searching; they are not some separate task that must be completed before the user can continue.) Another possibility is that, when switching between site maps, users will take some time to reorient themselves, leading to longer task completion times. But once users are familiar with their options, alternate site maps should improve user performance, as well as their enjoyment of the site.

Methodology

The Site Maps

The site was created based on an existing mail order catalog. As such, the web pages followed this existing order. Items are grouped into five manufacturer and size categories: Battle Honors 15mm, Battle Honors 25mm, Old Glory 15mm, Rank and File 15mm, and Black Raven Foundry 15mm. The first four are historical miniatures; Black Raven Foundry figures are all fantasy. The figures of each manufacturer are then divided by war/time period. These wars are then divided by soldier nationality.

The actual site pages reflect this initial construction, since generally each Manufacturer>War branch of the hierarchy is on a single web page. Each web page is then broken into nationalities. There are links at the top and bottom of each page to these nationality subsections. The main page also has links directly to each page's nationality subsections. Any item can be reached from the main site map with a single click.

Two other site maps were then developed. One is by nation, in which each major nation is listed with links to the wars it was in, with the war links labeled by manufacturer. The other site map is by war, organized chronologically. Each war, again labeled by manufacturer to distinguish duplicate combinations, is broken down into nations.

And so the three site maps are divided as follows:

By Manufacturer (the original site map): Manufacturer > War/Time Period > Nation
By Nation: Nation > War/Time Period > Manufacturer
By Time Period: War/Time Period > Manufacturer > Nation

Of course, there are some oddities that don't fit into the hierarchy. For example, Fantasy is treated as a time period in the Time Period hierarchy, and as a nation in the Nation hierarchy. On all site maps, there is a Miscellaneous category for painting guides and flags.
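The relationship between the three site maps can be sketched in code: each map is the same set of items nested in a different key order. The following Python is only an illustration, not how the site was built. The field names, the `build_site_map` helper, and the three sample records are assumptions; the sample combinations themselves are taken from those mentioned in this paper.

```python
# Hypothetical item records; the field names are assumptions for illustration.
ITEMS = [
    {"manufacturer": "Battle Honors 15mm", "period": "Napoleonics", "nation": "France"},
    {"manufacturer": "Old Glory 15mm", "period": "Napoleonics", "nation": "France"},
    # Fantasy acts as both a time period and a nation, as described above.
    {"manufacturer": "Black Raven Foundry 15mm", "period": "Fantasy", "nation": "Fantasy"},
]

# Each site map is the same three attributes in a different nesting order.
SITE_MAPS = {
    "manufacturer": ("manufacturer", "period", "nation"),
    "nation": ("nation", "period", "manufacturer"),
    "period": ("period", "manufacturer", "nation"),
}


def build_site_map(items, order):
    """Nest the items three levels deep, following the given key order."""
    tree = {}
    for item in items:
        top, middle, leaf = (item[key] for key in order)
        tree.setdefault(top, {}).setdefault(middle, set()).add(leaf)
    return tree


by_nation = build_site_map(ITEMS, SITE_MAPS["nation"])
by_period = build_site_map(ITEMS, SITE_MAPS["period"])
```

Here `by_nation["France"]["Napoleonics"]` holds both manufacturers, which is why the war links on the nation map need manufacturer labels.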

The first part of the experiment was run with only the original site map. In the second part, the two alternate hierarchies were available through links at the top of each hierarchy page. These links were introduced with the phrase "Look at this site another way:" in the hope of preventing confusion among ordinary users over the exact meaning of "site map," "navigational hierarchy," or "link classification scheme."

The Test Mechanism

The test began with a single page with the initial two questions asking for the user's previous experience. Then a button link took them to the main page of the site, which was displayed in a large frame. In a small, lower frame, a question appeared instructing the user to find an item matching a particular description. (This lower frame is normally occupied by the site shopping cart, which displays the contents added to the cart by the user.) When the user clicked on an item in the main frame, the next question appeared. After four questions, the user was directed to an exit questionnaire. A server-side Perl script (a modification of the original shopping cart script) was used to both display the questions and record the user's answers.
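The question-advancing behavior described above can be sketched as a small state object. This is a Python illustration only; the actual mechanism was a server-side Perl script whose code is not reproduced here. The class name `TaskSession`, its methods, and the exit notice text are all assumptions.

```python
EXIT_NOTICE = "Please continue to the exit questionnaire."  # hypothetical wording


class TaskSession:
    """Tracks one participant's progress through the task questions."""

    def __init__(self, questions):
        self.questions = list(questions)
        self.answers = []  # (question, clicked item) pairs, in order

    def current_prompt(self):
        """Text for the lower frame: the next question, or the exit notice."""
        if len(self.answers) < len(self.questions):
            return self.questions[len(self.answers)]
        return EXIT_NOTICE

    def record_click(self, item_id):
        """Store the clicked item as the answer to the current question."""
        if len(self.answers) < len(self.questions):
            question = self.questions[len(self.answers)]
            self.answers.append((question, item_id))
        return self.current_prompt()


session = TaskSession(["Q1", "Q2", "Q3", "Q4"])
next_prompt = session.record_click("hypothetical-item-id")
```

Each click in the main frame both records an answer and advances the prompt, so a repeated click is stored as the answer to the following question.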

The Users

Users with little experience with strategic miniatures were casually recruited by the experimenter. To find experienced users, an invitational email was posted on the first day of the experiment to the newsgroups rec.games.miniatures.historical and rec.games.miniatures.misc. There was no monetary, culinary, or other material incentives promised for user participation.

The Questions

The questions were kept to a minimum in order not to fatigue or annoy the user.

Beginning Questions

I have much/some/no previous experience with strategic miniatures.
I have much/some/no previous experience using the Internet.

In each case, one of the three answer options was selected from a drop-down menu. In other studies, it has been found that users are generally pretty good judges of their own ability or experience level. Instead of using multiple questions in the hopes of determining some objective standard, these two questions were used as a rough guide to experience level.

Task Questions

These tasks formed the basis of the test and were used to judge the actual usability of the site. For each of the four task questions, a user randomly received one of three alternatives. This was done in the hope of avoiding a single exceptionally hard question. The alternate questions involved very similar search tasks.

Q1.

Find a Battle Honors 15mm figure from the Napoleonics.
Find an Old Glory 15mm figure from the Seven Years War.
Find a Battle Honors 25mm figure from World War II.

This question was to test how well the user could find figures of a particular Manufacturer>War. The question was particularly easy because each of these wars is found under only one Manufacturer.

Q2.

Find a fantasy figure.
Find a medieval figure.
Find an American Civil War figure.

This question involved looking at the War/Time period subheadings under the manufacturers on the main classification system.

Q3.

Find an Austrian infantry figure from the Franco-Prussian War.
Find a Bavarian infantry figure from the Napoleonics.
Find a Zulu figure from Colonials.

This question involved looking at nationality. On this main site map, these are in paragraph form under the war subheadings.

Q4.

Find flags for the Jacobite Rebellion.
Find a Vikings figure.
Find a figure of Napoleon. (Hint: he doesn't come alone)

This question is a medley of "tricky" questions. The flags are under Miscellaneous on all classifications. Vikings are found under Ancients, which contains a very long paragraph-like list of other ancient peoples. Finding Napoleon involved going to "Generals" under Old Glory 15mm > Napoleonics, or to "Staff Sets" under Battle Honors 15mm > Napoleonics. Though this question is made easier by finding Napoleon in one of two places, he is not found directly by going through the France nationality.

This was actually labeled as a Bonus Question and the user could skip it if desired. This was intended to allow fatigued users the chance to end the test early in the hopes of at least getting responses to the exit questions. In reality, no one chose to skip this question. This is very likely due to the technical fact that the link to skip the question does not appear on most screens without scrolling the smaller frame.
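The random assignment of one alternative per task question can be sketched as follows. This is a Python illustration, not the original Perl script; the `ALTERNATIVES` table and the `pick_questions` function are assumed names, and the question texts are those listed above.

```python
import random

# One entry per task question, each with its three alternatives.
ALTERNATIVES = {
    "Q1": [
        "Find a Battle Honors 15mm figure from the Napoleonics.",
        "Find an Old Glory 15mm figure from the Seven Years War.",
        "Find a Battle Honors 25mm figure from World War II.",
    ],
    "Q2": [
        "Find a fantasy figure.",
        "Find a medieval figure.",
        "Find an American Civil War figure.",
    ],
    "Q3": [
        "Find an Austrian infantry figure from the Franco-Prussian War.",
        "Find a Bavarian infantry figure from the Napoleonics.",
        "Find a Zulu figure from Colonials.",
    ],
    "Q4": [
        "Find flags for the Jacobite Rebellion.",
        "Find a Vikings figure.",
        "Find a figure of Napoleon. (Hint: he doesn't come alone)",
    ],
}


def pick_questions(rng=random):
    """Choose one alternative for each task question, independently at random."""
    return {label: rng.choice(options) for label, options in ALTERNATIVES.items()}


# A seeded generator makes one participant's draw reproducible for checking.
questions_for_user = pick_questions(random.Random(0))
```

Because each question is drawn independently, there are 3 x 3 x 3 x 3 = 81 possible question sets, so a single hard alternative affects only about a third of users.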

Exit Questions

Again, all questions were answered by a pull-down menu.

The main page of this site is organized primarily by manufacturer. There are links from the main page to two other site maps: one organized primarily by time period and one organized primarily by nation. Did you use any of the alternate site maps? Yes / No / I'm not sure

This question was to find out how many people actually used the alternate site maps. It was not displayed during the first half of the experiment when there were no alternate site maps available.

This test took me less than 3 minutes / between 3 to 7 minutes / 7 to 12 minutes / 12 to 15 minutes / more than 15 minutes to complete.

This was to give another objective indication of the site usability, where the longer the time spent, the lower the usability.

This was not, however, a completely objective determination of how long people spent on the test. First off, estimates may vary depending on the user's affective state, whether they were fully engaged in the task, etc. Secondly, they were not requested to keep track of the time at the beginning of the test, so this is necessarily only an estimate. Also, users surf at different speeds. Some may have been relaxing and exploring the site more; others may have been trying to finish the test as quickly as possible. Finally, technical problems with the site or users' connection speeds likely impacted the length of time taken, though this of course is a real-world problem that will always influence the site's use.

Was this site easy to use?
Compared to other web sites, I found this site harder / about average / easier to use.

This was to test how easy the user perceived the site to be. This may or may not have any correlation with their actual performance.

Did you enjoy using this site?
Compared to other web sites, I found this site more enjoyable / about average / less enjoyable to use.

This was to test user satisfaction. As shown by Spool [3], user satisfaction frequently does not correlate with user success when it comes to web site use. And generally user satisfaction plays a larger part than user success in determining whether a user will return to a site.

Please share any comments from your experiences or any suggestions for site improvement:
I would like to know about the results of this study: No / Yes
If yes, your email address:

These last two questions were to get some practical input on ways to improve the site, beyond just the navigation schemes. Also, it seemed polite to offer users the chance to learn what conclusions their time and efforts produced.

The Experiment Design

IVs:

IV1: Standard Site Map (control group)
IV2: Alternate Site Maps (experimental group)

DVs (measured for both groups):

DV1: User Performance (task questions, exit question 1)
DV2: Perceived Ease of Use (exit question 2)
DV3: User Enjoyment (exit question 3)

Discussion

There was much to be learned from this study, even though it was brief. Besides the actual results, the process shed much light on real-world usability testing by a practicing site designer.

Testing in the Real World

Real usability testing happens at the last minute.

With any large web site project, there are frequent delays or problems. By the time a designer is ready to open up the site for testing, there are often too many other "more important" things to be done, such as securing the shopping cart, getting images up, or submitting the URL to search engines. Since the purpose of the site is usually to get business, those things that have an immediate, obvious business effect take precedence. Once these short-term goals are done, the designer can turn to long-term issues of usability. However, by this time, the site is up and done and there is little chance to make integral changes to the structure.

There is usually a hesitation to take an unfinished site public.

A site designer often hesitates to encourage users to browse and comment on an unfinished site. As with any unfinished work, the author is rather sensitive to its known deficiencies. Also, if the site is for business, there is a pressure to always present a polished, professional interface to the user to make that important first impression. This is difficult to do while the site is still under construction.

Usability testing doesn't seem worth the expense compared to using existing research.

Usability tests are expensive to set up and run. They take time away from normal site production. When testing for a single feature, such as the site navigation, the results often don't seem worth the effort, especially when the results match existing usability heuristics based on previous research. There is some sense of, "Yeah, we already know that."

Usability testing and pre-release showcasing reveal technical bugs before the site officially opens.

Especially in this case, where the test mechanism was largely based on the shopping cart design, likely technical problems were discovered early on. These were largely tied to real-world Internet traffic; they did not appear when the site was tested on an in-house server. In other words, they would not otherwise have been discovered before the site was actually in use.

Testing can provide a little pre-release stir and a chance for public relations.

By contacting real users, testing gets the word out in the appropriate circles that a new site will be released soon. Praise about a site passed from human to human is worth 10 times as many search engine submittals.

Usability testing provides a chance to interact with the real users.

Even if feedback largely fits what was expected, interaction with the real future users of a site is valuable. It gives the designer confidence that he made the right decisions, and often there are comments and suggestions on things he hadn't considered. It provides an opportunity to know what the life of the site will be like.

Usability testing in the real-world can be rewarding.

When it's all done, it feels good for a site designer to know that her site has been put through the paces and performed well (assuming it does perform well). And there is the added assurance knowing the users are real and the setting is real. The results apply.

Problems and Deformities

The most noticeable "deformity" was the uneven distribution of users between the control and experimental groups. This is likely due to the response to the newsgroup post, which resulted in over 50 users in the first two days. This high traffic dwindled, however, and there were only about 15 users for the experimental part, which lasted about four days. Perhaps more users could have been convinced to return, but the experimenter was hesitant to press his case on the strategic miniature community after such a high, positive response. A better plan would have been to be ready for an onslaught and quickly enter the experimental phase after the first 30 users.

The second "deformity" is really only a lack -- a lack of novice users! This was not the expected distribution. Though this is actually a more representative sample of future site users, more novice users would have been nice for comparison purposes.

Not all started tests were finished; not all questions in finished tests were answered. This is in large part due to human factors (people get bored and stop halfway through), and it was expected.

The major problem during the experiment was the somewhat buggy behavior of the testing mechanism. This seemed to come from two sources. First, the users were tracked and recorded by their IP addresses. It seems that either some users' IP addresses changed during the course of the test, there was some unforeseen script behavior due to browser or server caches, or there were gremlins at work. The exact nature of these oddities is still being investigated.

Secondly, the frames caused problems. When a new question appeared and the user pressed the browser back button to return to the main page, the question reverted to the last answered one. This was foreseen. Alternative setups, such as a separate window for the task question, were considered, but in the end frames still seemed the best solution. A "refresh question" link was added. Based on both comments and recorded user actions, this seems to have been partly successful.

Finally, there were times when either the server ran slowly or Internet traffic was high. This was noted in a couple of user comments. At these times, when a user selected an item, the next question did not immediately appear. So, understandably, users often clicked a second or third time. This could be seen in the responses when the same or a very similar item that answered one question also appeared as the answer to one or more subsequent questions.

Between users not being properly recorded, the wrong question being displayed, and repeated answer submittals, not all properly completed responses could be used. Of the 49 completed user responses in the control portion of the experiment, 37 were usable. Of the 11 in the experimental portion, 10 were usable, although one of these lacks an experience rating.
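One way to screen for the repeated submissions described above is to flag any response in which the same item answers two consecutive questions. The Python below is a sketch of that idea only, not the procedure actually used to select the usable responses. The function names and item IDs are hypothetical, and it catches only identical items, not the "very similar" ones also mentioned above.

```python
def has_repeated_submission(answers):
    """True if any answer is identical to the answer given to the previous question."""
    return any(prev == cur for prev, cur in zip(answers, answers[1:]))


def usable_responses(responses, expected_answers=4):
    """Keep complete responses that show no sign of a repeated click."""
    return [
        answers
        for answers in responses
        if len(answers) == expected_answers and not has_repeated_submission(answers)
    ]


# Hypothetical item IDs, one list of answers per participant.
responses = [
    ["bh15-001", "og15-210", "og15-305", "misc-004"],  # clean
    ["bh15-001", "bh15-001", "og15-305", "misc-004"],  # double click on Q1
    ["bh15-001", "og15-210"],                          # abandoned halfway
]
kept = usable_responses(responses)
```

Only the first of the three sample responses would be kept under this rule.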

Data

TABULATED DATA, GIVEN AS PERCENTAGES.
	Original site map only (37 users)	Alternate site map available (10 users)	Used alternate site maps (3 users)
Strategic Miniature Experience:
Much	64.9	60?	66?
Some	29.7	20?	?
None	5.4	10?	?
Internet Experience:
Much	89.2	80?	66?
Some	10.8	10?	?
None	0	0?	?
Task Questions (Correct):
Q1	97	100	100
Q2	100	100	100
Q3	97	100	100
Q4	97	90	100
Time Taken:
< 3 minutes	48.6	60	100
3 to 7 minutes	43.2	40	0
7 to 12 minutes	5.4	0	0
12 to 15 minutes	2.7	0	0
> 15 minutes	0	0	0
Perceived Ease of Use:
Easier	32.4	30	33
Average	59.4	70	67
Harder	8.1	0	0
User Enjoyment:
More Enjoyable	18.9	20	33
Average	75.6	80	67
Less Enjoyable	5.4	0	0
	?: Missing one user's experience data.

You can also view the raw data.
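Because the table reports percentages for groups of very different sizes (37, 10, and 3 users), it can help to convert a row back into approximate head counts. The Python sketch below does this for the Time Taken rows of the control group; the helper name `approximate_counts` is an assumption, and the counts are recovered by rounding rather than taken from the raw data.

```python
def approximate_counts(percentages, group_size):
    """Convert tabulated percentages back to whole-user counts by rounding."""
    return [round(p * group_size / 100) for p in percentages]


# Time Taken percentages for the original-site-map group (37 users),
# in table order: < 3, 3 to 7, 7 to 12, 12 to 15, > 15 minutes.
time_taken_control = [48.6, 43.2, 5.4, 2.7, 0]
counts = approximate_counts(time_taken_control, 37)
print(counts, sum(counts))  # prints [18, 16, 2, 1, 0] 37
```

By the same arithmetic, each user in the three-person column accounts for about 33 percentage points, so a figure such as 33 there reflects a single person.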

Findings

On the real-world note, the most rewarding finding is how well users performed. The site works! Even though the site is still under construction (it has no graphics, for instance), the ratings for both ease of use and enjoyability were above average. That's a good feeling.

An interesting pattern not visible in the above data tabulations is that three of the four incorrect answers were from users with no previous strategic miniature experience. All four incorrect answers were from users with much Internet experience. This suggests that the errors stem from complete unfamiliarity with the content rather than from the medium.

No user answered more than one question incorrectly, and no question was answered incorrectly more than once. Considering that 47 users x 4 questions each = 188 questions answered, only four incorrect responses is very impressive performance.
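The arithmetic behind this claim can be written out as a short Python calculation. It uses only the figures given above (37 and 10 usable responses, four questions each, four incorrect answers); the variable names are illustrative.

```python
usable_users = 37 + 10        # control group + experimental group
questions_per_user = 4
incorrect_answers = 4

total_answers = usable_users * questions_per_user
error_rate = incorrect_answers / total_answers

print(f"{total_answers} answers, {error_rate:.1%} incorrect")
# prints: 188 answers, 2.1% incorrect
```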

Considering that only three users definitely used the alternate site maps, it is difficult to draw firm conclusions on this aspect of the experiment. It seems there is some increase in the speed with which these users completed the tasks. Also, the general distribution for ease and enjoyment is higher. But, again, there were only three users, so it is presumptuous to make strong claims about these tendencies.

Conclusions

As a real-world usability pilot study, this experiment was quite successful. It showed that testing users in their normal environment can be done. Also, it showed that usability testing can be done by site designers with good results, though it is time consuming.

Alternate site maps did seem to improve both the speed of performance and user satisfaction. The speed improvement is the most notable and implies that switching between maps does not in fact slow users down. This may be because, though users are still orienting themselves to a site map, this can be done faster when the map fits their current mental hierarchy.

For future work on the effect of alternate site maps, I would suggest a more controlled setting. Specifically, a controlled network situation would be nice. Also, there needs to be a more balanced distribution between the control and experimental parts of the experiment, and between user experience levels. Questions should be harder, such that it is at least possible to note the difference in performance between the one site map and alternate site map conditions. In this experiment, users performed so well in the initial condition, it is impossible to say that alternate site maps improved performance. It does seem that at least adding site maps does not decrease performance. Finally, a controlled setting would allow the experimenter to actually watch users, which may lead to further insights.

Endnotes

1 Lakoff, George. Women, Fire, and Dangerous Things: What Categories Reveal about the Mind. Chicago: University of Chicago Press, 19

2 Nahl, Diane. "Creating User-Centered Instructions for Novice End-Users." Reference Services Review. 27:3. p280-6.

3 Spool, Jared M. et al. Web Site Usability: A Designer's Guide. San Francisco: Morgan Kaufmann Publishers, 1999.

Acknowledgments
Thank you to OldGlory15s.com
for use of their site.

The Effect of Alternate Site Maps on User Task Success, Perceived Ease of Use, and User Satisfaction

A Web Site Usability Pilot Study by Zach Tomaszewski

for LIS 677, Spring 2001, taught by Dr. Diane Nahl