Argax Project

Node Status: COMPLETE

Results

Participation Rates

The following table shows participant response rates and retention through the stages of the study.

Stage | Total Participants | Group CS | Group SC | Mean Time to Complete (minutes)
Received Email | 250 to 500 (estimated) | | |
Number of clicks to the start of study | 89 | | |
Completed consent form | 43 | 19 | 24 |
Completed background survey | 41 | 19 | 22 | 4.6
Opened first game (Captain Fate) | 41 | 19 | 22 |
Entered one or more commands | 38 | 18 | 20 |
Completed first game | 31 | 15 | 16 | 9.1
Completed first response survey | 26 | 12 | 14 | 3.7
Opened second game (The Queen's Heart) | 26 | 12 | 14 |
Entered one or more commands | 23 | 12 | 11 |
Completed second game | 23 | 12 | 11 | 13.1
Completed second response survey | 23 | 12 | 11 | 3.4
Completed comparison survey | 23 | 12 | 11 | 1.7
Usable responses | 20 | 11 | 9 |
Table: Participant retention rates.

Retention rates between the two groups were generally comparable. Once participants started the study by completing the consent form, the greatest drop in participants occurred while playing through the first game, regardless of user interface. It is also interesting that all three of the participants who aborted the study upon seeing the second game were in Group SC; they were transitioning from Skald to the command-line interface (CLI).

Of the 23 complete study responses, three were from players who reported that they had already completed the study or played the games before. These "repeated play" responses were dropped, leaving 20 usable responses. 11 of these final responses were in Group CS, and 9 were in Group SC.
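The drop-off pattern described above can be checked directly from the stage totals. The following is a minimal sketch that uses only the Total Participants column of the retention table; the stage labels for the two "Entered one or more commands" rows are expanded here for clarity and are not the table's own wording.

```python
# Participant totals per stage, copied from the retention table
# (from the consent form onward).
stages = [
    ("Completed consent form", 43),
    ("Completed background survey", 41),
    ("Opened first game", 41),
    ("Entered one or more commands (first game)", 38),
    ("Completed first game", 31),
    ("Completed first response survey", 26),
    ("Opened second game", 26),
    ("Entered one or more commands (second game)", 23),
    ("Completed second game", 23),
    ("Completed second response survey", 23),
    ("Completed comparison survey", 23),
]


def stage_drops(stages):
    """Return (stage_name, participants_lost) for each stage transition."""
    return [
        (name, prev_count - count)
        for (_, prev_count), (name, count) in zip(stages, stages[1:])
    ]


drops = stage_drops(stages)
largest_stage, largest_loss = max(drops, key=lambda d: d[1])
print(largest_stage, largest_loss)
```

The largest single loss falls between entering a first command and completing the first game, which matches the observation above.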

Game Endings

As described previously, there are three possible endings to Captain Fate and two possible endings to The Queen's Heart. A few players also quit the game early, although they continued the study by completing the subsequent survey.

Game | CLI | Skald
Fate | 6 CHANGE, 1 SOUTH, 4 QUIT | 4 CHANGE, 2 ATTACK, 3 SOUTH
Queen | 1 KILL, 7 UP, 1 QUIT | 4 KILL, 6 UP, 1 QUIT
Table: Different game endings.

Over a third of those who played their first game using the CLI quit the game early, while all of those who started with the Skald interface completed the game. Also, for both games, use of the Skald UI led to a wider variety of possible endings.

Participant Backgrounds

As mentioned, participants were asked to provide certain background information, such as age, gender, education level, and previous computer game experience (Appendix C). Thanks to random assignment of the participants to two groups, the groups did not significantly differ on most of these background measures, with two noticeable exceptions.

First, on a 1 to 5 scale, Group SC rated themselves as more familiar with a command line interface (Group SC mean: 2.78, Group CS mean: 1.73, t(19) = 2.70, p = .11). Second, Group CS rated themselves as more familiar with the interactive fiction game genre (Group CS mean: 2.36, Group SC mean: 1.33, t(19) = 3.72, p = .06). These differences were on the border of statistical significance. However, given their high potential to affect a new IF experience, the effects of both command line experience and IF experience will be examined in more detail below.

Game Experiences

Both survey responses and recorded game metrics were used to measure the differences in game experience along the dimensions described below.

Original survey questions are given in the title for each table. All survey responses used the following scale: Strongly Agree (5), Agree (4), Neutral (3), Disagree (2), Strongly Disagree (1).

Object Affordances

Game | CLI | Skald | Game Mean
Fate | 2.55 | 3.56 | 3.00
Queen | 2.89 | 4.27 | 3.65
UI Mean | 2.7 | 3.95 |
Table: It was clear which objects I could interact with in the game world.

When recognizing interactive objects, the difference between the CLI and Skald was statistically significant, t(19) = 5.00, p < .0001.

When comparing the two groups, there was also a significant effect between the games, F(1, 36) = 5.06, p = .03, as well as the more significant interaction effect between the two interfaces, F(1, 36) = 17.02, p = .0002. This game effect suggests that game narration also plays a significant role in directing a user's attention towards interactive objects, regardless of the interface used.
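The Game Mean and UI Mean margins in the object affordance table appear to be participant-weighted averages of the four cell means. The sketch below reproduces them under that assumption, using the group sizes from the usable responses (11 in Group CS, who played Captain Fate on the CLI and The Queen's Heart on Skald, and 9 in Group SC, who did the reverse). The weighting scheme is an inference from the numbers, not something stated in the text.

```python
# Group sizes from the usable responses.
N_CS, N_SC = 11, 9


def weighted_mean(pairs):
    """Mean of cell means, weighted by the number of participants per cell."""
    total = sum(n for _, n in pairs)
    return sum(mean * n for mean, n in pairs) / total


# Cell means for "It was clear which objects I could interact with".
fate_cli, fate_skald = 2.55, 3.56    # Group CS, Group SC
queen_cli, queen_skald = 2.89, 4.27  # Group SC, Group CS

fate_mean = weighted_mean([(fate_cli, N_CS), (fate_skald, N_SC)])
queen_mean = weighted_mean([(queen_cli, N_SC), (queen_skald, N_CS)])
cli_mean = weighted_mean([(fate_cli, N_CS), (queen_cli, N_SC)])
skald_mean = weighted_mean([(fate_skald, N_SC), (queen_skald, N_CS)])

print(round(fate_mean, 2), round(queen_mean, 2))
print(round(cli_mean, 2), round(skald_mean, 2))
```

Under this assumption the computed margins agree with the reported values of 3.00, 3.65, 2.7, and 3.95 to within rounding.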

Metric: Number of Unique Objects Used

Captain Fate contains 22 interactive objects: Benny, Benny's Cafe, big menu picture, counter, cup of coffee, customers, food, light switch, menu, ordinary clothes, pedestrians, phone booth, restroom, restroom door, restroom key, scribbled note, sidewalk, silver coin, toilet, your costume, your ordinary clothes, and yourself.

The Queen's Heart contains 27 interactive objects: bier, blanket, depression, earthworms, flagstones, heartbeat, human child, iron cable, knot in your stomach, large rock, mandala, (dead) mole, path, plank, poles, Queen, ragged hole, roots, shaft, small hole, sorrow, stone arches, (bronze/tarnished) sword, sunlight, support beams, water, and yourself.

Most participants only interacted with a subset of these per game.

Game | CLI | Skald | Game Mean
Fate | 7.7 | 12.9 | 10.1
Queen | 8.7 | 17.9 | 15.7
UI Mean | 8.2 | 15.7 |
Table: Mean number of unique direct objects interacted with by each player.

For both games, use of the Skald UI significantly increased the number of different objects that players interacted with, t(19) > 3.9, p < .001. The mean number of unique objects used nearly doubled when using Skald rather than the CLI. On average, this meant users engaged with over half of the available objects in both games, up from about one third of the available objects when using the CLI.
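The "over half" and "about one third" figures follow from dividing the mean number of unique objects used by the number of interactive objects in each game. A minimal sketch of that arithmetic, using only values given above:

```python
# Interactive object counts per game and mean unique objects used,
# both taken from the text and table above.
TOTAL_OBJECTS = {"Fate": 22, "Queen": 27}
MEAN_UNIQUE_OBJECTS = {
    "Fate": {"CLI": 7.7, "Skald": 12.9},
    "Queen": {"CLI": 8.7, "Skald": 17.9},
}


def coverage(game, ui):
    """Fraction of a game's interactive objects used by the average player."""
    return MEAN_UNIQUE_OBJECTS[game][ui] / TOTAL_OBJECTS[game]


for game in TOTAL_OBJECTS:
    for ui in ("CLI", "Skald"):
        print(f"{game} {ui}: {coverage(game, ui):.0%}")
```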

Metric: Degree of Object Use

Not only were more unique objects used with the Skald interface, but they were used to a greater degree. For Captain Fate, the following objects were used significantly (p < 0.05) more frequently as direct objects in commands under the Skald interface than when using the command line interface: Benny's Cafe, restroom, restroom door, scribbled note, sidewalk, silver coin, toilet, and yourself. No objects were used significantly more with the CLI.

For The Queen's Heart, the following were used significantly (p < 0.05) more often as direct objects using the Skald interface: blanket, depression, heartbeat, knot in stomach, mandala, ragged hole, roots, small hole, stone arches, support beams, water, and yourself. With the CLI, Queen was used significantly more often. (The most common text command related to the Queen is listen to queen, which the game actually maps to "listen to heartbeat". Under Skald, more people simply used the heartbeat object directly.)

In general, the objects used significantly more often with Skald tend to be background objects or objects that are mentioned only subtly or in passing in the narration. None of them (except for Captain Fate's silver coin) needed to be interacted with in order to finish the game. In other words, Skald seems to encourage more exploration and experimentation with a wider range of objects.

Action Affordances

Game | CLI | Skald | Game Mean
Fate | 2.55 | 3.89 | 3.15
Queen | 3.22 | 4.36 | 3.85
UI Mean | 2.85 | 4.15 |
Table: I knew which actions were possible to perform in the game.

When determining possible actions, the difference between the CLI and Skald interfaces was statistically significant, t(19) = 4.1, p < .001.

Once again, when comparing the two groups, there was a significant effect between the games, F(1, 36) = 4.44, p = .04, as well as the more significant interaction effect between the two interfaces, F(1, 36) = 13.85, p < .001. It seems that the nature of the game itself affects the range of possible actions that suggest themselves.

Metric: Number of Unique Verbs Used

Captain Fate contains 17 author-supported verbs: Ask_for, Attack, Change, Close, Drink, Drop, Examine, Flip, Get, Give_to, Inventory, Lock_with, Look, Open, Pay, Travel, and Unlock_with.

The Queen's Heart contains 14 author-supported verbs: Attack, Climb, Drop, Eat, Examine, Get, Inventory, Kill_with, Listen to, Look, Push, Trace, Travel, and Wait.

The command line UI also supports the Help verb, and some other library-provided verbs not relevant to completing the game. Use of these extra verbs was ignored here. The CLI also supports different synonyms that map to the same verbs listed above. For example, Take maps to Get. The following counts are based on the back-end verb used, irrespective of any different synonyms used to invoke it.
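The counting rule described above (synonyms collapse to one back-end verb, and library-provided extras are ignored) can be sketched as follows. Only the Take to Get mapping and the exclusion of Help come from the text; the other synonym entries and the function name are illustrative assumptions, not the games' actual vocabulary or the study's analysis code.

```python
# Hypothetical synonym table. Only "take" -> "Get" is taken from the text;
# the remaining entries are assumptions added for illustration.
SYNONYMS = {
    "take": "Get",
    "get": "Get",
    "x": "Examine",
    "examine": "Examine",
    "look": "Look",
}

# Library-provided verbs excluded from the counts (Help is named in the text).
IGNORED = {"help"}


def unique_backend_verbs(typed_verbs):
    """Map each typed verb to its back-end verb and return the distinct set.

    Words that are ignored or not in the synonym table are skipped.
    """
    verbs = set()
    for word in typed_verbs:
        key = word.lower()
        if key in IGNORED:
            continue
        if key in SYNONYMS:
            verbs.add(SYNONYMS[key])
    return verbs


session = ["take", "get", "x", "examine", "help", "look"]
print(sorted(unique_backend_verbs(session)))
```

In this example six typed words reduce to three unique author-supported verbs.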

Game | CLI | Skald
Fate | 10.1 | 13.3
Queen | 9.4 | 9.6
Table: Mean number of unique author-supported verbs used per user.

The effect of UI was not significant for The Queen's Heart. However, the Skald UI did lead to a significantly broader use of unique verbs per user for Captain Fate, t(9) = 3.4, p < 0.01.

Metric: Degree of Verb Use

The two interfaces also prompted significant differences (p < .05) in the frequency with which the different verbs were used.

The most striking and significant (p < .001) effect is that the Examine verb was used over 3 times as often with the Skald interface for both games.

For Captain Fate, Examine, Lock_with, and Open were used significantly more often under Skald. These verbs are related to one of the important puzzles in this game--getting into the restroom. Forgetting to lock the restroom door before changing your costume leads to one of the less satisfying endings of the game. Affording this action more clearly seems to have reminded users to lock the door.

On the other hand, Travel was used more often with the CLI. Unnecessary travel between locations is often an indication that the player is at a loss for what to do next and is searching for clues.

For The Queen's Heart, Examine and Kill_with were used significantly more frequently with the Skald interface. Attack was used significantly more frequently with the CLI. This is an interesting shift in use between the interfaces: Attack and Kill_with both accomplish similar goals, though the Kill_with verb requires a second object to use as a weapon--such as the goblin protagonist's bronze sword. So it seems that Skald may encourage users to try more complicated verbs and command structures.

Command Construction

Game | CLI | Skald | Game Mean
Fate | 3.45 | 4.22 | 3.8
Queen | 3.67 | 4.64 | 4.2
UI Mean | 3.55 | 4.45 |
Table: I was able to construct commands that the game understood.

When it comes to the reported ease of constructing valid commands, the difference between the CLI and Skald interfaces was statistically significant, t(19) = 3.9, p < .001. However, the difference between the games was not statistically significant.

It is interesting to note that responses to this question are an average of 0.5 points higher than those to the questions regarding object and action affordances. This suggests that command construction is generally deemed easier than determining the objects and actions that can be used to construct those commands.

Metric: Total Inputs Entered

Counting total inputs involves some subtlety. An input corresponds to a successful command only with the Skald interface. For the traditional command line interface, the number of entered lines may be greater than the number of successful commands.

For example, some entered lines are syntactically invalid commands that result in an error message:

>door
The story doesn't understand that command.

Other inputs may contain misspellings. These also result in an error message, but the player can correct the command with a second line of input:

>examine soorow
The word "soorow" is not necessary in this story.

(If this was an accidental misspelling, you can correct it by typing OOPS
followed by the corrected word now.  Any time the story points out an unknown
word, you can correct a misspelling using OOPS as your next command.)

>oops sorrow
Sometimes your sorrow is a dark void in your chest, sometimes it is a gray
weight between your shoulders, and sometimes it is a languid emptiness
everywhere.  At the moment, it is very heavy.

Similarly, if a command is syntactically incomplete, it may take two lines of input to complete the command:

>examine
What do you want to examine?

>benny
A deceptively fat man of uncanny agility, Benny entertains his customers
crushing coconuts against his forehead when the mood strikes him.

The following table shows the mean number of inputs entered per game session for each experimental group.

Game | CLI | Skald | Game Mean
Fate | 50.4 | 41.2 | 46.3
Queen | 54.7 | 48.4 | 51.2
UI Mean | 52.3 | 45.2 |
Table: Mean number of total game inputs entered per game session.

While, on average, players entered slightly more inputs when using the command line interface or when playing The Queen's Heart, neither the difference between the UIs nor the difference between the games was statistically significant.
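For the CLI, the distinction between entered lines and completed commands can be illustrated with a small transcript counter. This is a sketch that assumes session logs look like the excerpts above, with each player input on a line beginning with the ">" prompt; the actual log format and analysis code used in the study are not described here, and the game responses below are abridged.

```python
# Abridged from the transcript excerpts above: a misspelling corrected
# with OOPS, then an incomplete command completed on a second line.
transcript = """\
>examine soorow
The word "soorow" is not necessary in this story.

>oops sorrow
Sometimes your sorrow is a dark void in your chest.

>examine
What do you want to examine?

>benny
A deceptively fat man of uncanny agility.
"""


def count_inputs(text, prompt=">"):
    """Count entered lines: every line that begins with the prompt character."""
    return sum(1 for line in text.splitlines() if line.startswith(prompt))


print(count_inputs(transcript))
```

Here four entered lines produce only two completed commands (examine sorrow and examine benny), which is why input counts on the CLI can exceed the number of successful commands.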

Metric: Command Error Rate

For this metric, any command that does not affect the game state in an author-supported way is counted as an error.

As discussed above, an input may be syntactically invalid:

>door
The story doesn't understand that command.

An input may be a syntactically valid command, but not refer to any present objects:

>give mole to queen
You see no queen here.
>buy coffee
You have no money.

A command may be syntactically valid and refer to existing objects, yet still be refused. This refusal may be a generic refusal as provided by the TADS library:

>attack customer
You cannot attack those.

Or it may be a refusal specifically written by the game author:

>talk to customer
As John Covarth, you attract less interest than Benny's food.

>ask benny for sandwich
Food will take too much time, and you must change now!

Because of the default behaviors provided by the TADS library, some commands may succeed and affect the state of the game world unless specifically prevented by the game author. For example:

>sit
(on the floor)
Okay, you're now sitting on the floor.

>stand
Okay, you're now standing.

However, since sitting or standing does not help with any puzzle or in any other way advance either of the games' stories, these commands still achieve nothing for these particular games. Therefore, as verbs not supported by the game author, they are counted as errors here.

Game | CLI | Skald | Game Mean
Fate | 28.9% | 0.3% | 17.4%
Queen | 20.8% | 0% | 10.0%
UI Mean | 25.1% | 0.1% |
Table: Percent of total inputs that did not produce an author-supported command.

Due to a program bug, a single input entered using Skald failed to produce a valid command.

For the CLI, 25.1% of lines entered could be considered errors. If we disregard the commands that were simply not author-supported (but were still successfully processed by the default library) and look at only inputs that produced some sort of error message, this is still around 20.8% of inputs.

These error rate measures are aggregated across all users. However, users who generate high input error rates tend to play shorter games and enter fewer inputs overall. If we instead compute an error rate for each individual user and then average those rates, the mean user error rate over both games when using the CLI is 36.4%!
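The gap between the aggregate error rate and the mean per-user error rate arises because short, error-heavy sessions carry little weight in the aggregate but full weight in the per-user mean. The sketch below shows the effect with made-up numbers; the (errors, inputs) pairs are hypothetical and are not the study's data.

```python
# Hypothetical (errors, total_inputs) per user. NOT the study's data.
users = [
    (12, 20),   # short session, mostly errors
    (10, 40),
    (10, 100),  # long session, few errors
]


def aggregate_error_rate(users):
    """Total errors divided by total inputs, pooled across all users."""
    total_errors = sum(errors for errors, _ in users)
    total_inputs = sum(inputs for _, inputs in users)
    return total_errors / total_inputs


def mean_user_error_rate(users):
    """Average of each user's own error rate, weighting users equally."""
    return sum(errors / inputs for errors, inputs in users) / len(users)


print(f"aggregate: {aggregate_error_rate(users):.1%}")
print(f"per-user mean: {mean_user_error_rate(users):.1%}")
```

With these illustrative values the pooled rate is 20% while the per-user mean is higher, the same direction of difference as the 25.1% and 36.4% figures reported above.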

Every user of the command line UI produced at least one error or ineffective command.

Metric: Time Spent

Game | CLI | Skald | Game Mean
Fate | 8.9 | 9.4 | 9.1
Queen | 15.2 | 11.2 | 13.0
UI Mean | 11.7 | 10.4 |
Table: Mean of time spent playing each game, in minutes.

Across both groups, the maximum time spent on a game was 23.5 minutes and the minimum was 4.5 minutes. The difference between the two games was statistically significant, t(19) = 3.20, p = .004, but the difference between the two interfaces was not.

Metric: Command Input Speed

Game | CLI | Skald | Game Mean
Fate | 5.85 | 4.85 | 5.4
Queen | 4.23 | 4.53 | 4.4
UI Mean | 5.1 | 4.6 |
Table: Mean speed of input entry, in inputs per minute.

Given the significant difference in time spent playing the two different games, the difference in game speeds was also significant, t(19) = 2.47, p = .02. However, the difference between the two interfaces was not significant.

Within both Group CS and Group SC, the fastest player using the CLI was also the fastest player when using Skald.

World-Level Agency

Game | CLI | Skald | Game Mean
Fate | 3.45 | 3.78 | 3.6
Queen | 3.67 | 4.09 | 3.9
UI Mean | 3.55 | 3.95 |
Table: I was sufficiently able to direct my character's actions in the game world--such as moving from place to place, manipulating objects, talking to other characters, etc.

While Skald was rated slightly higher regarding users' sense of agency within the game world, the difference was not statistically significant.

Story-Level Objectives

Game | CLI | Skald | Game Mean
Fate | 3.73 | 3.56 | 3.65
Queen | 2.67 | 2.55 | 2.6
UI Mean | 3.25 | 3 |
Table: I usually knew what I was expected to do in the game (even if sometimes I had to figure out exactly how to accomplish it).

Regarding story-level direction, there was a significant difference between the two game means, t(19) = 3.9, p < .001, but not between the user interfaces.

As proposed by the poetics for interactive narrative discussed previously (Tomaszewski & Binsted 2006, 2004), a story's content should provide narrative constraints or similar implicit suggestion regarding the appropriate user action. These results were a nice confirmation that the story does indeed make a difference to this experience.

Captain Fate is a classic superhero scenario. The first paragraph explicitly lays out the player's objective: find a place to change into his costume. It was fairly obvious what the player was supposed to accomplish; the challenge lay in figuring out how to achieve it.

On the other hand, The Queen's Heart opens on an unconventional scene of a lonely goblin living in a dark tunnel. No end-game objective is given, only a suggestion of what the first action might be regarding a nearby mole. Interestingly, as shown above, the object and action affordances were still clearer in The Queen's Heart. So players could more easily find things to do in this game, but they didn't know what they were supposed to do.

Interactive Story Experience

Game | CLI | Skald | Game Mean
Fate | 4.00 | 4.00 | 4.0
Queen | 4.00 | 4.18 | 4.1
UI Mean | 4.0 | 4.1 |
Table: The game session had a story-like structure.

Game | CLI | Skald | Game Mean
Fate | 3.73 | 3.44 | 3.6
Queen | 3.78 | 3.73 | 3.75
UI Mean | 3.75 | 3.6 |
Table: The game contained puzzles or situations that required some thought to overcome.

Game | CLI | Skald | Game Mean
Fate | 4.09 | 3.56 | 3.85
Queen | 3.78 | 4.18 | 4.0
UI Mean | 3.95 | 3.9 |
Table: I believe that the game may have had a different outcome had I performed different actions.

These three questions explore the essential features of a classic interactive fiction experience: that it has at least some narrative structure, that it includes challenges and puzzles for the player to work through, and that there is some sense of significant choice on the part of the player.

While each of these dimensions received a fairly favorable rating, there was no statistically significant difference based on the user interface used, game played, or the participants' groups.

Summary Experience

Game | CLI | Skald | Game Mean
Fate | 3.00 | 3.67 | 3.3
Queen | 3.44 | 3.73 | 3.6
UI Mean | 3.2 | 3.7 |
Table: I enjoyed playing this game.

Despite the differences in the games and the user interfaces, neither had a significant effect on the overall enjoyment of the games.

Open-Ended Responses

Below are the answers given in response to three open-ended questions. These answers have been tagged with interesting themes discovered across multiple responses. This tagging is fairly subjective. It was performed by only a single tagger, and so it lacks any inter-rater reliability. Also, it was sometimes difficult to determine the full meaning of a short comment. When more than one interpretation was possible, tags were given followed by a question mark. Such "questionable tags" were only counted as a half-point in the summary totals.

However, despite these limitations, these tags still serve to highlight issues related to some of the aspects explored above as well as to raise some other interesting issues.

Least Enjoyable Aspect

The following are answers to the question, What was the LEAST enjoyable aspect of this game?

Group CS - Captain Fate, CLI

Group SC - Captain Fate, Skald

Group CS - The Queen's Heart, Skald

Group SC - The Queen's Heart, CLI

The following is a summary of the tag counts for each experimental condition.

Group CS: Fate, CLI Group SC: Fate, Skald Group CS: Queen, Skald Group SC: Queen, CLI
unclear affordances 5 3 1 5.5
actions too low-level 4 1
puzzles too challenging 1.5 2 0.5 1
unclear objective 1.5 1 3 4
story event 3
limited or unclear story paths 2 1 1
UI behavior 1.5 1 2.5
disliked click input 2.5
Table: Tag summary for least enjoyable aspect of game

We can see here some of the same trends noted above with the quantitative measures. For example, more people complained about unclear affordances when using the CLI. There were more complaints about actions being too low-level in Captain Fate, while The Queen's Heart lacked a clear story objective.

It is interesting to note how many complaints there were about the click-based input from Group CS when playing The Queen's Heart. This is the group that gave the highest rating to object affordances, action affordances, command construction, world-level agency, and enjoyment. So while they found the UI easier to use and seemed more immersed in the story, they still missed the ability to type commands.

Most Enjoyable Aspect

The following are answers to the question, What was the MOST enjoyable aspect of this game?

Group CS - Captain Fate, CLI

Group SC - Captain Fate, Skald

Group CS - The Queen's Heart, Skald

Group SC - The Queen's Heart, CLI

The following is a summary of the tag counts for each experimental condition.

Group CS: Fate, CLI Group SC: Fate, Skald Group CS: Queen, Skald Group SC: Queen, CLI
UI 1 1 1
game medium/genre 1 0.5
narration 1 1 4 2.5
world simulation/verisimilitude 0.5 0.5 0.5
clear affordances 2 3
puzzle challenges 3.5 0.5 3
interaction 2 2 1 1.5
progress/advancing the story 0.5 0.5 1 0.5
story 1 2.5 2 2.5
fun/humor 2
Table: Tag summary for most enjoyable aspect of game

As the name interactive fiction suggests, this table highlights that the most enjoyable aspects include the text narration, the unfolding story, and the ability to interact in that world. It is interesting to note that, while the Skald interface provided clear affordances, it was the CLI that made the puzzle challenges enjoyable. Also, the UI comments were fairly split--some liked the keyboard better and some liked the point-and-click better.

Comments

See Appendix F for participants' responses to the question, Any other comment?

User Interface Comparison

In the last step of the study, participants explicitly compared the two user interfaces on a number of different measures.

Question | Group | CLI definitely (-2) | CLI slightly (-1) | No Preference (0) | Skald slightly (+1) | Skald definitely (+2) | Mean
Which interface did you find easier to use? | CS | 2 | 0 | 0 | 2 | 7 | 1.09
| SC | 2 | 1 | 1 | 1 | 4 | 0.44
| Total | 4 | 1 | 1 | 3 | 11 | 0.80
Which interface did you find faster to use? | CS | 1 | 1 | 0 | 2 | 7 | 1.18
| SC | 4 | 0 | 1 | 2 | 2 | -0.22
| Total | 5 | 1 | 1 | 4 | 9 | 0.55
Which interface was more enjoyable to use? | CS | 3 | 0 | 1 | 1 | 6 | 0.64
| SC | 3 | 1 | 1 | 2 | 2 | -0.11
| Total | 6 | 1 | 2 | 3 | 8 | 0.30
Which interface made the game more interesting or engaging? | CS | 2 | 0 | 1 | 2 | 6 | 0.91
| SC | 3 | 3 | 1 | 0 | 2 | -0.56
| Total | 5 | 3 | 2 | 2 | 8 | 0.25
If you were to play a third game, which interface would you prefer to use? | CS | 2 | 0 | 1 | 1 | 7 | 1.00
| SC | 4 | 0 | 2 | 1 | 2 | -0.33
| Total | 6 | 0 | 3 | 2 | 9 | 0.40
Table: Responses to questions comparing Skald and the command line interface.

Based on the mean of all responses for each question (the Total rows), participants slightly favored Skald over the CLI for all categories. The means range from 0.25 to 0.80 on a -2 (CLI) to 2 (Skald) scale.
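Each mean in the comparison table is the count-weighted average of the scale values from -2 (definitely CLI) to +2 (definitely Skald). The following minimal sketch recomputes the means for the ease-of-use question from its response counts; the dictionary keys are descriptive labels chosen here for the 11-participant group, the 9-participant group, and the total.

```python
# Scale values, ordered as in the table: CLI definitely (-2) through
# Skald definitely (+2).
SCALE = (-2, -1, 0, 1, 2)


def mean_preference(counts):
    """Count-weighted mean of the -2..+2 preference scale."""
    n = sum(counts)
    return sum(score * c for score, c in zip(SCALE, counts)) / n


# Response counts for "Which interface did you find easier to use?"
ease_of_use = {
    "group of 11": (2, 0, 0, 2, 7),
    "group of 9": (2, 1, 1, 1, 4),
    "total": (4, 1, 1, 3, 11),
}

for label, counts in ease_of_use.items():
    print(label, round(mean_preference(counts), 2))
```

The same function applied to the other four questions gives the remaining means in the table.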

However, there is much more going on here. First of all, the responses for every question are clustered at the two extremes of the scale, with few participants falling in the middle. So most people strongly favored one interface or the other, with slightly more favoring Skald over the traditional command-based UI.

Second, there was a noticeable difference between the two group means. Group SC favored the CLI side of the scale for all questions except ease of use (where they still had a lower score than Group CS). On the other hand, Group CS favored Skald over the CLI-based UI. This may just be an anomaly of group formation, or it may be that people favored the interface they played second.

Comments

Participants' additional comments can be found in Appendix G. These comments tended to underscore the results explored above.

Effect of Participant Background

As mentioned previously, there was an uneven distribution of prior experience between the randomly-assigned groups regarding using a command line and playing interactive fiction. Specifically, Group SC rated themselves as more familiar with a command line interface and Group CS rated themselves more familiar with the IF genre. It is therefore worth taking a closer look at the correlation between individuals' prior experience and their reported game experience.

A Pearson correlation with |r| >= 0.44 is considered statistically significant, equivalent to p <= .05.
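The 0.44 threshold is consistent with the standard conversion between a t statistic and Pearson's r, r = t / sqrt(t^2 + df). The sketch below assumes the correlations were computed over the n = 20 usable responses (so df = 18) with a two-tailed test at the .05 level; neither assumption is stated explicitly in the text. The critical t value of 2.101 for df = 18 is taken from a standard t table.

```python
import math


def critical_r(t_crit, n):
    """Smallest |r| that reaches significance, given the critical t for n - 2 df."""
    df = n - 2
    return t_crit / math.sqrt(t_crit ** 2 + df)


# Two-tailed critical t at alpha = .05 for df = 18 (standard table value).
T_CRIT_DF18 = 2.101

# Assumption: n = 20 usable responses.
r_crit = critical_r(T_CRIT_DF18, 20)
print(round(r_crit, 3))
```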

Command line Experience

Across all participants, there was a significant correlation between reported command line experience and the participant's reported ability to construct commands (r = 0.64), agency within the game world (r = 0.52), and their enjoyment of the game overall (r = 0.63) while using the CLI. When using the Skald UI, there was no correlation with previous command line experience.

Group SC's higher degree of command line experience may have inflated their ratings on these three dimensions when using the CLI, but this does not endanger any of the conclusions drawn above. In fact, a slightly lower rating from Group SC on their enjoyment of the CLI might have led to a statistically significant result regarding the effect of Skald on enjoyment.

There was a significant correlation between participants' familiarity with the command line and their preferences for the traditional IF command interface as measured by the final comparison survey. These correlated measures included rating the CLI as faster to use (r = 0.53), more enjoyable to use (r = 0.45), creating a more interesting or engaging game (r = 0.63), and preferring it as the UI in future games (r = 0.52). However, there was not a significant correlation with finding the CLI easier to use (r = 0.26).

Interactive Fiction Experience

Across all participants, those with prior interactive fiction experience reported finding it easier to recognize interactive objects (r = 0.71), determine possible actions supported by those objects (r = 0.69), construct commands (r = 0.55), and generally exhibit agency in the game world (r = 0.46), but only when using the Skald interface. Surprisingly, this prior experience showed no correlations at all with response ratings when using a traditional command-based IF user interface.

If we assume that Group CS may thus have rated Skald higher on these four measures due to their higher levels of prior IF experience than Group SC, it would call into question some of the conclusions above regarding Skald's ease of use for all users. However, those conclusions were all very significant (p < .001) and so would probably not be unduly influenced by the slightly uneven distribution of IF experience between the two groups.

There was no correlation between previous IF experience and any preference for one UI over the other, as reported on any rating in the final comparison survey.

Works Cited