The following table shows participant response rates and retention through the stages of the study.
Stage | Total Participants | Group CS | Group SC | Mean Time to Complete (minutes) |
---|---|---|---|---|
Received Email | 250 to 500 (estimated) | | | |
Number of clicks to the start of study | 89 | | | |
Completed consent form | 43 | 19 | 24 | |
Completed background survey | 41 | 19 | 22 | 4.6 |
Opened first game (Captain Fate) | 41 | 19 | 22 | |
Entered one or more commands | 38 | 18 | 20 | |
Completed first game | 31 | 15 | 16 | 9.1 |
Completed first response survey | 26 | 12 | 14 | 3.7 |
Opened second game (The Queen's Heart) | 26 | 12 | 14 | |
Entered one or more commands | 23 | 12 | 11 | |
Completed second game | 23 | 12 | 11 | 13.1 |
Completed second response survey | 23 | 12 | 11 | 3.4 |
Completed comparison survey | 23 | 12 | 11 | 1.7 |
Usable responses | 20 | 11 | 9 | |
Retention rates between the two groups were generally comparable. Once participants started the study by completing the consent form, the greatest drop in participation came while playing through the first game, regardless of user interface. Notably, all three participants who abandoned the study upon seeing the second game were in Group SC; they were transitioning from Skald to the command-line interface (CLI).
Of the 23 complete study responses, three were from players who reported that they had already completed the study or played the games before. These "repeated play" responses were dropped, leaving 20 usable responses: 11 in Group CS and 9 in Group SC.
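The stage-to-stage losses discussed above can be checked directly against the retention table. The following Python sketch is only an illustration: the counts are copied from the table, while the stage labels with "(game 1)" and "(game 2)" suffixes and the helper name `stage_drops` are introduced here for clarity.

```python
# Stage counts copied from the retention table, from the consent form onward.
# Columns: stage label, total participants, Group CS, Group SC.
STAGES = [
    ("Completed consent form", 43, 19, 24),
    ("Completed background survey", 41, 19, 22),
    ("Opened first game", 41, 19, 22),
    ("Entered one or more commands (game 1)", 38, 18, 20),
    ("Completed first game", 31, 15, 16),
    ("Completed first response survey", 26, 12, 14),
    ("Opened second game", 26, 12, 14),
    ("Entered one or more commands (game 2)", 23, 12, 11),
    ("Completed second game", 23, 12, 11),
    ("Completed second response survey", 23, 12, 11),
    ("Completed comparison survey", 23, 12, 11),
]

TOTAL, GROUP_CS, GROUP_SC = 1, 2, 3  # column indexes into each STAGES row


def stage_drops(stages, column):
    """Return (stage label, participants lost since the previous stage)."""
    return [
        (curr[0], prev[column] - curr[column])
        for prev, curr in zip(stages, stages[1:])
    ]


# The single largest loss across all participants.
largest_drop = max(stage_drops(STAGES, TOTAL), key=lambda item: item[1])

if __name__ == "__main__":
    print("Largest drop:", largest_drop)
    for label, lost in stage_drops(STAGES, GROUP_SC):
        print(f"Group SC lost {lost} at: {label}")
```

Run as a script, this reports the largest loss at the "Completed first game" stage and shows the three Group SC participants lost between opening the second game and entering a command.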
As described previously, there are three possible endings to Captain Fate and two possible endings to The Queen's Heart. A few players also quit the game early, although they continued the study by completing the subsequent survey.
Game | CLI | Skald |
---|---|---|
Fate | 6 CHANGE, 1 SOUTH, 4 QUIT | 4 CHANGE, 2 ATTACK, 3 SOUTH |
Queen | 1 KILL, 7 UP, 1 QUIT | 4 KILL, 6 UP, 1 QUIT |
Over a third of those who played their first game using the CLI quit the game early, while all of those who started with the Skald interface completed the game. Also, for both games, players using the Skald UI reached a wider variety of the possible endings.
As mentioned, participants were asked to provide certain background information, such as age, gender, education level, and previous computer game experience (Appendix C). Thanks to random assignment of the participants to two groups, the groups did not significantly differ on most of these background measures, with two noticeable exceptions.
First, on a 1 to 5 scale, Group SC rated themselves as more familiar with a command line interface (Group SC mean: 2.78, Group CS mean: 1.73, t(19) = 2.70, p = .11). Second, Group CS rated themselves as more familiar with the interactive fiction (IF) game genre (Group CS mean: 2.36, Group SC mean: 1.33, t(19) = 3.72, p = .06). These differences were on the border of statistical significance. However, given their high potential to affect a new IF experience, the effects of both command line experience and IF experience will be examined in more detail below.
Both survey responses and recorded game metrics were used to measure the differences in game experience along the dimensions described below.
Original survey questions are given in the title for each table. All survey responses used the following scale: Strongly Agree (5), Agree (4), Neutral (3), Disagree (2), Strongly Disagree (1).
Game | CLI | Skald | Game Mean |
---|---|---|---|
Fate | 2.55 | 3.56 | 3.00 |
Queen | 2.89 | 4.27 | 3.65 |
UI Mean | 2.7 | 3.95 |
When recognizing interactive objects, the difference between the CLI and Skald was statistically significant, t(19) = 5.00, p < .0001.
When comparing the two groups, there was also a significant effect between the games, F(1, 36) = 5.06, p = .03, as well as the more significant interaction effect between the two interfaces, F(1, 36) = 17.02, p = .0002. This game effect suggests that game narration also plays a significant role in directing a user's attention towards interactive objects, regardless of the interface used.
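The game and UI means in the table above are consistent with weighting each cell mean by the number of usable responses behind it (11 in Group CS, 9 in Group SC). The following Python sketch reproduces them under that assumption. The names `cells`, `weighted_mean`, and `marginal` are illustrative and are not taken from the study's own analysis.

```python
# Cell means from the object-recognition table, paired with the number of
# usable responses in each cell. Group CS (n = 11) played Captain Fate with
# the CLI and The Queen's Heart with Skald; Group SC (n = 9) did the reverse.
N_CS, N_SC = 11, 9

cells = {
    ("Fate", "CLI"): (2.55, N_CS),
    ("Fate", "Skald"): (3.56, N_SC),
    ("Queen", "CLI"): (2.89, N_SC),
    ("Queen", "Skald"): (4.27, N_CS),
}


def weighted_mean(pairs):
    """Mean of several (cell mean, cell size) pairs, weighted by cell size."""
    total_n = sum(n for _, n in pairs)
    return sum(mean * n for mean, n in pairs) / total_n


def marginal(table, axis, label):
    """Marginal mean for one game (axis 0) or one interface (axis 1)."""
    return weighted_mean([v for key, v in table.items() if key[axis] == label])


if __name__ == "__main__":
    for game in ("Fate", "Queen"):
        print(f"{game} game mean: {marginal(cells, 0, game):.2f}")
    for ui in ("CLI", "Skald"):
        print(f"{ui} UI mean: {marginal(cells, 1, ui):.2f}")
```

Because the two groups differ in size, these weighted means are not the same as the simple average of the two cell means in a row or column.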
Captain Fate contains 22 interactive objects: Benny, Benny's Cafe, big menu picture, counter, cup of coffee, customers, food, light switch, menu, ordinary clothes, pedestrians, phone booth, restroom, restroom door, restroom key, scribbled note, sidewalk, silver coin, toilet, your costume, your ordinary clothes, and yourself.
The Queen's Heart contains 27 interactive objects: bier, blanket, depression, earthworms, flagstones, heartbeat, human child, iron cable, knot in your stomach, large rock, mandala, (dead) mole, path, plank, poles, Queen, ragged hole, roots, shaft, small hole, sorrow, stone arches, (bronze/tarnished) sword, sunlight, support beams, water, and yourself.
Most participants only interacted with a subset of these per game.
Game | CLI | Skald | Game Mean |
---|---|---|---|
Fate | 7.7 | 12.9 | 10.1 |
Queen | 8.7 | 17.9 | 15.7 |
UI Mean | 8.2 | 15.7 |
For both games, the use of Skald UI significantly increased the number of different objects that players interacted with, t(19) > 3.9, p < .001. The mean number of unique objects used nearly doubled when using Skald over the CLI. On average, this meant users engaged with over half of the available objects in both games, up from about one third of the available objects when using the CLI.
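The "over half" and "about one third" figures follow from dividing the mean number of unique objects used by the number of interactive objects each game contains. A minimal Python sketch of that arithmetic, using only the values reported above (the names `AVAILABLE`, `MEAN_UNIQUE`, and `share_used` are illustrative):

```python
# Number of interactive objects in each game, as listed above.
AVAILABLE = {"Fate": 22, "Queen": 27}

# Mean number of unique objects used per player, from the table above.
MEAN_UNIQUE = {
    ("Fate", "CLI"): 7.7,
    ("Fate", "Skald"): 12.9,
    ("Queen", "CLI"): 8.7,
    ("Queen", "Skald"): 17.9,
}


def share_used(game, ui):
    """Fraction of a game's interactive objects the average player used."""
    return MEAN_UNIQUE[(game, ui)] / AVAILABLE[game]


if __name__ == "__main__":
    for (game, ui) in MEAN_UNIQUE:
        print(f"{game} with {ui}: {share_used(game, ui):.0%} of objects used")
```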
Not only were more unique objects used with the Skald interface, but they were used to a greater degree. For Captain Fate, the following objects were used significantly (p < 0.05) more frequently as direct objects in commands under the Skald interface than when using the command line interface: Benny's Cafe, restroom, restroom door, scribbled note, sidewalk, silver coin, toilet, and yourself. No objects were used significantly more with the CLI.
For The Queen's Heart, the following were used significantly (p < 0.05) more often as direct objects using the Skald interface: blanket, depression, heartbeat, knot in stomach, mandala, ragged hole, roots, small hole, stone arches, support beams, water, and yourself. With the CLI, Queen was used significantly more often. (The most common text command related to the Queen is "listen to queen", which the game actually maps to "listen to heartbeat". Under Skald, more people simply used the heartbeat object directly.)
In general, the objects used significantly more often with Skald tend to be background objects or objects that are mentioned only subtly or in passing in the narration. None of them (except for Captain Fate's silver coin) needed to be interacted with in order to finish the game. In other words, Skald seems to encourage more exploration and experimentation with a wider range of objects.
Game | CLI | Skald | Game Mean |
---|---|---|---|
Fate | 2.55 | 3.89 | 3.15 |
Queen | 3.22 | 4.36 | 3.85 |
UI Mean | 2.85 | 4.15 |
When determining possible actions, the difference between the CLI and Skald interfaces was statistically significant, t(19) = 4.1, p < .001.
Once again, when comparing the two groups, there was a significant effect between the games F(1, 36) = 4.44, p = .04, as well as the more significant interaction effect between the two interfaces, F(1, 36) = 13.85, p < .001. It seems that the nature of the game itself affects the range of possible actions that suggest themselves.
Captain Fate contains 17 author-supported verbs: Ask_for, Attack, Change, Close, Drink, Drop, Examine, Flip, Get, Give_to, Inventory, Lock_with, Look, Open, Pay, Travel, and Unlock_with.
The Queen's Heart contains 14 author-supported verbs: Attack, Climb, Drop, Eat, Examine, Get, Inventory, Kill_with, Listen to, Look, Push, Trace, Travel, and Wait.
The command line UI also supports the Help verb and some other library-provided verbs not relevant to completing the game. Use of these extra verbs was ignored here. The CLI also supports different synonyms that map to the same verbs listed above. For example, Take maps to Get. The following counts are based on the back-end verb used, irrespective of any different synonyms used to invoke it.
Game | CLI | Skald |
---|---|---|
Fate | 10.1 | 13.3 |
Queen | 9.4 | 9.6 |
The effect of UI on the number of unique verbs used was not significant for The Queen's Heart. However, the Skald UI did lead to a significantly broader use of unique verbs per user for Captain Fate, t(9) = 3.4, p < 0.01.
The two interfaces also prompted significant differences (p < .05) in the frequency with which the different verbs were used.
The most striking and significant (p < .001) effect is that the Examine verb was used over 3 times as often with the Skald interface for both games.
For Captain Fate, Examine, Lock_with, and Open were used significantly more often under Skald. These verbs are related to one of the important puzzles in this game: getting into the restroom. Forgetting to lock the restroom door before changing your costume leads to one of the less satisfying endings of the game. Affording this action more clearly seems to have reminded users to lock the door.
On the other hand, Travel was used more often with the CLI. Unnecessary travel between locations is often an indication that the player is at a loss for what to do next and is searching for clues.
For The Queen's Heart, Examine and Kill_with were used significantly more frequently with the Skald interface. Attack was used significantly more frequently with the CLI.
This is an interesting shift in use between the interfaces: Attack and Kill_with both accomplish similar goals, though the Kill_with verb requires a second object to use as a weapon, such as the goblin protagonist's bronze sword. So it seems that Skald may encourage users to try more complicated verbs and command structures.
Game | CLI | Skald | Game Mean |
---|---|---|---|
Fate | 3.45 | 4.22 | 3.8 |
Queen | 3.67 | 4.64 | 4.2 |
UI Mean | 3.55 | 4.45 |
When it comes to the reported ease of constructing valid commands, the difference between the CLI and Skald interfaces was statistically significant, t(19) = 3.9, p < .001. However, the difference between the games was not statistically significant.
It is interesting to note that responses to this question are an average of 0.5 points higher than those to the questions regarding object and action affordances. This suggests that command construction is generally deemed easier than determining the objects and actions that can be used to construct those commands.
Counting total inputs involves some subtlety. An input corresponds to a successful command only with the Skald interface. For the traditional command line interface, the number of entered lines may be greater than the number of successful commands.
For example, some entered lines are syntactically invalid commands that result in an error message:
    >door
    The story doesn't understand that command.
Other inputs may contain misspellings. These also result in an error message, but the player can correct the command with a second line of input:
    >examine soorow
    The word "soorow" is not necessary in this story.

    (If this was an accidental misspelling, you can correct it by typing OOPS followed by the corrected word now. Any time the story points out an unknown word, you can correct a misspelling using OOPS as your next command.)

    >oops sorrow
    Sometimes your sorrow is a dark void in your chest, sometimes it is a gray weight between your shoulders, and sometimes it is a languid emptiness everywhere. At the moment, it is very heavy.
Similarly, if a command is syntactically incomplete, it may take two lines of input to complete the command:
    >examine
    What do you want to examine?

    >benny
    A deceptively fat man of uncanny agility, Benny entertains his customers crushing coconuts against his forehead when the mood strikes him.
The following table shows the mean number of inputs entered per game session for each experimental group.
Game | CLI | Skald | Game Mean |
---|---|---|---|
Fate | 50.4 | 41.2 | 46.3 |
Queen | 54.7 | 48.4 | 51.2 |
UI Mean | 52.3 | 45.2 |
While, on average, players entered slightly more inputs when using the command line interface or when playing The Queen's Heart, neither the difference between the UIs nor the difference between the games was statistically significant.
For this metric, any command that does not affect the game state in an author-supported way is counted as an error.
As discussed above, an input may be syntactically invalid:
    >door
    The story doesn't understand that command.
An input may be a syntactically valid command but refer to objects that are not present or available:
    >give mole to queen
    You see no queen here.

    >buy coffee
    You have no money.
A command may be syntactically valid and refer to existing objects, yet still be refused. This refusal may be a generic refusal as provided by the TADS library:
    >attack customer
    You cannot attack those.
Or it may be a refusal specifically written by the game author:
    >talk to customer
    As John Covarth, you attract less interest than Benny's food.

    >ask benny for sandwich
    Food will take too much time, and you must change now!
Because of the default behaviors provided by the TADS library, some commands may succeed and affect the state of the game world unless specifically prevented by the game author. For example:
    >sit
    (on the floor)
    Okay, you're now sitting on the floor.

    >stand
    Okay, you're now standing.
However, since sitting or standing does not help with any puzzle or in any other way advance either of the games' stories, these commands still achieve nothing for these particular games. Therefore, as verbs not supported by the game author, they are counted as errors here.
Game | CLI | Skald | Game Mean |
---|---|---|---|
Fate | 28.9% | 0.3% | 17.4% |
Queen | 20.8% | 0% | 10.0% |
UI Mean | 25.1% | 0.1% |
Due to a program bug, a single input entered using Skald failed to produce a valid command.
For the CLI, 25.1% of lines entered could be considered errors. If we disregard the commands that were simply not author-supported (but were still successfully processed by the default library) and look at only inputs that produced some sort of error message, this is still around 20.8% of inputs.
These error rates are aggregated across all users. However, users who generate high input error rates tend to play shorter games and enter fewer inputs overall. If we instead compute the error rate for each individual user, the mean user error rate over both games when using the CLI is 36.4%!
Every user of the command line UI produced at least one error or ineffective command.
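The gap between the aggregate error rate and the mean per-user error rate arises because the aggregate weights every input equally, while the per-user mean weights every player equally. The following Python sketch illustrates the effect with made-up numbers. The three sessions are hypothetical and are not data from the study.

```python
# Hypothetical sessions as (error inputs, total inputs) per player.
# These numbers are invented to illustrate the effect; they are not study data.
sessions = [
    (5, 60),   # long session, few errors
    (10, 50),  # medium session
    (12, 20),  # short session, many errors
]


def aggregate_error_rate(data):
    """Errors divided by inputs, pooled across all players."""
    total_errors = sum(errors for errors, _ in data)
    total_inputs = sum(inputs for _, inputs in data)
    return total_errors / total_inputs


def mean_user_error_rate(data):
    """Average of each player's own error rate."""
    return sum(errors / inputs for errors, inputs in data) / len(data)


if __name__ == "__main__":
    print(f"Aggregate error rate: {aggregate_error_rate(sessions):.1%}")
    print(f"Mean user error rate: {mean_user_error_rate(sessions):.1%}")
```

When error-prone players also enter fewer inputs, as in the third hypothetical session, the per-user mean comes out higher than the pooled rate.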
Game | CLI | Skald | Game Mean |
---|---|---|---|
Fate | 8.9 | 9.4 | 9.1 |
Queen | 15.2 | 11.2 | 13.0 |
UI Mean | 11.7 | 10.4 |
Across both groups, the maximum time spent on a game was 23.5 minutes and the minimum was 4.5 minutes. The difference between the two games was statistically significant, t(19) = 3.20, p = .004, but the difference between the two interfaces was not.
Game | CLI | Skald | Game Mean |
---|---|---|---|
Fate | 5.85 | 4.85 | 5.4 |
Queen | 4.23 | 4.53 | 4.4 |
UI Mean | 5.1 | 4.6 |
Given the significant difference in time spent playing the two different games, the difference in game speeds was also significant, t(19) = 2.47, p = .02. However, the difference between the two interfaces was not significant.
Within both Group CS and Group SC, the fastest player using the CLI was also the fastest player when using Skald.
Game | CLI | Skald | Game Mean |
---|---|---|---|
Fate | 3.45 | 3.78 | 3.6 |
Queen | 3.67 | 4.09 | 3.9 |
UI Mean | 3.55 | 3.95 |
While Skald was rated slightly higher regarding users' sense of agency within the game world, the difference was not statistically significant.
Game | CLI | Skald | Game Mean |
---|---|---|---|
Fate | 3.73 | 3.56 | 3.65 |
Queen | 2.67 | 2.55 | 2.6 |
UI Mean | 3.25 | 3 |
Regarding story-level direction, there was a significant difference between the two game means, t(19) = 3.9, p < .001, but not between the user interfaces.
As proposed by the poetics for interactive narrative discussed previously (Tomaszewski & Binsted 2006, 2004), a story's content should provide narrative constraints or similar implicit suggestion regarding the appropriate user action. These results were a nice confirmation that the story does indeed make a difference to this experience.
Captain Fate is a classic superhero scenario. The first paragraph explicitly lays out the player's objective: find a place to change into his costume. It was fairly obvious what the player was supposed to accomplish; the challenge lay in figuring out how to achieve it.
On the other hand, The Queen's Heart opens on an unconventional scene of a lonely goblin living in a dark tunnel. No end-game objective is given, only a suggestion of what the first action might be regarding a nearby mole. Interestingly, as shown above, the object and action affordances were still clearer in The Queen's Heart. So players could more easily find things to do in this game, but they didn't know what they were supposed to do.
Game | CLI | Skald | Game Mean |
---|---|---|---|
Fate | 4.00 | 4.00 | 4.0 |
Queen | 4.00 | 4.18 | 4.1 |
UI Mean | 4.0 | 4.1 |
Game | CLI | Skald | Game Mean |
---|---|---|---|
Fate | 3.73 | 3.44 | 3.6 |
Queen | 3.78 | 3.73 | 3.75 |
UI Mean | 3.75 | 3.6 |
Game | CLI | Skald | Game Mean |
---|---|---|---|
Fate | 4.09 | 3.56 | 3.85 |
Queen | 3.78 | 4.18 | 4.0 |
UI Mean | 3.95 | 3.9 |
These three questions explore the essential features of a classic interactive fiction experience: that it has at least some narrative structure, that it includes challenges and puzzles for the player to work through, and that there is some sense of significant choice on the part of the player.
While each of these dimensions received a fairly favorable rating, there was no statistically significant difference based on the user interface used, game played, or the participants' groups.
Game | CLI | Skald | Game Mean |
---|---|---|---|
Fate | 3.00 | 3.67 | 3.3 |
Queen | 3.44 | 3.73 | 3.6 |
UI Mean | 3.2 | 3.7 |
Despite the differences in the games and the user interfaces, neither had a significant effect on the overall enjoyment of the games.
Below are the answers given in response to three open-ended questions. These answers have been tagged with interesting themes discovered across multiple responses. This tagging is fairly subjective. It was performed by only a single tagger, and so it lacks any inter-rater reliability. Also, it was sometimes difficult to determine the full meaning of a short comment. When more than one interpretation was possible, tags were given followed by a question mark. Such "questionable tags" were only counted as a half-point in the summary totals.
However, despite these limitations, these tags still serve to highlight issues related to some of the aspects explored above as well as to raise some other interesting issues.
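The half-point rule for questionable tags can be made concrete with a small Python sketch. The tag names below come from the summary tables in this section, but the tagged responses themselves are hypothetical, and the function name `tally_tags` is introduced here only for illustration.

```python
from collections import defaultdict


def tally_tags(tagged_responses):
    """Sum tag counts, scoring a tag that ends in '?' as half a point."""
    totals = defaultdict(float)
    for tags in tagged_responses:
        for tag in tags:
            if tag.endswith("?"):
                # Questionable tag: strip the marker and count 0.5.
                totals[tag[:-1].strip()] += 0.5
            else:
                totals[tag] += 1.0
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical tagging of three responses; not actual study data.
    responses = [
        ["unclear affordances"],
        ["unclear affordances", "UI behavior?"],
        ["UI behavior"],
    ]
    print(tally_tags(responses))
```

This scoring is what produces fractional totals such as 1.5 or 5.5 in the summary tables.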
The following are answers to the question, What was the LEAST enjoyable aspect of this game?
Group CS - Captain Fate, CLI
Group SC - Captain Fate, Skald
Group CS - The Queen's Heart, Skald
Group SC - The Queen's Heart, CLI
The following is a summary of the tag counts for each experimental condition.
Tag | Group CS: Fate, CLI | Group SC: Fate, Skald | Group CS: Queen, Skald | Group SC: Queen, CLI |
---|---|---|---|---|
unclear affordances | 5 | 3 | 1 | 5.5 |
actions too low-level | 4 | 1 | ||
puzzles too challenging | 1.5 | 2 | 0.5 | 1 |
unclear objective | 1.5 | 1 | 3 | 4 |
story event | 3 | |||
limited or unclear story paths | 2 | 1 | 1 | |
UI behavior | 1.5 | 1 | 2.5 | |
disliked click input | | | 2.5 | |
We can see here some of the same trends noted above with the quantitative measures. For example, more people complained about unclear affordances when using the CLI. There were more complaints about actions being too low-level in Captain Fate, while The Queen's Heart lacked a clear story objective.
It is interesting to note how many complaints there were about the click-based input from Group CS when playing The Queen's Heart. This is the group that gave the highest rating to object affordances, action affordances, command construction, world-level agency, and enjoyment. So while they found the UI easier to use and seemed more immersed in the story, they still missed the ability to type commands.
The following are answers to the question, What was the MOST enjoyable aspect of this game?
Group CS - Captain Fate, CLI
Group SC - Captain Fate, Skald
Group CS - The Queen's Heart, Skald
Group SC - The Queen's Heart, CLI
The following is a summary of the tag counts for each experimental condition.
Tag | Group CS: Fate, CLI | Group SC: Fate, Skald | Group CS: Queen, Skald | Group SC: Queen, CLI |
---|---|---|---|---|
UI | 1 | 1 | 1 | |
game medium/genre | 1 | 0.5 | ||
narration | 1 | 1 | 4 | 2.5 |
world simulation/verisimilitude | 0.5 | 0.5 | 0.5 | |
clear affordances | 2 | 3 | ||
puzzle challenges | 3.5 | 0.5 | 3 | |
interaction | 2 | 2 | 1 | 1.5 |
progress/advancing the story | 0.5 | 0.5 | 1 | 0.5 |
story | 1 | 2.5 | 2 | 2.5 |
fun/humor | 2 |
As the name interactive fiction suggests, this table highlights that the most enjoyable aspects include the text narration, the unfolding story, and the ability to interact in that world. It is interesting to note that, while the Skald interface provided clear affordances, it was the CLI that made the puzzle challenges enjoyable. Also, the UI comments were fairly split: some liked the keyboard better and some liked the point-and-click better.
See Appendix F for participants' responses to the question, Any other comment?
In the last step of the study, participants explicitly compared the two user interfaces on a number of different measures.
| Question | Group | CLI (definitely/-2) | CLI (slightly/-1) | No Preference | Skald (slightly/+1) | Skald (definitely/+2) | Mean |
|---|---|---|---|---|---|---|---|
| Which interface did you find easier to use? | CS | 2 | 0 | 0 | 2 | 7 | 1.09 |
| | SC | 2 | 1 | 1 | 1 | 4 | 0.44 |
| | Total | 4 | 1 | 1 | 3 | 11 | 0.80 |
| Which interface did you find faster to use? | CS | 1 | 1 | 0 | 2 | 7 | 1.18 |
| | SC | 4 | 0 | 1 | 2 | 2 | -0.22 |
| | Total | 5 | 1 | 1 | 4 | 9 | 0.55 |
| Which interface was more enjoyable to use? | CS | 3 | 0 | 1 | 1 | 6 | 0.64 |
| | SC | 3 | 1 | 1 | 2 | 2 | -0.11 |
| | Total | 6 | 1 | 2 | 3 | 8 | 0.30 |
| Which interface made the game more interesting or engaging? | CS | 2 | 0 | 1 | 2 | 6 | 0.91 |
| | SC | 3 | 3 | 1 | 0 | 2 | -0.56 |
| | Total | 5 | 3 | 2 | 2 | 8 | 0.25 |
| If you were to play a third game, which interface would you prefer to use? | CS | 2 | 0 | 1 | 1 | 7 | 1.00 |
| | SC | 4 | 0 | 2 | 1 | 2 | -0.33 |
| | Total | 6 | 0 | 3 | 2 | 9 | 0.40 |
Based on the mean of all responses for each question (the Total rows), participants slightly favored Skald over the CLI for all categories. The means range from 0.25 to 0.80 on a -2 (CLI) to 2 (Skald) scale.
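Each mean in the comparison table is the average of the scale values weighted by how many participants chose each response. The following Python sketch reproduces the ease-of-use means from the response counts. The counts are taken from the table (the 11-participant row is Group CS and the 9-participant row is Group SC), while the function and variable names are illustrative.

```python
# Scale values for the five response options, from "CLI (definitely)" at -2
# to "Skald (definitely)" at +2.
SCALE = (-2, -1, 0, 1, 2)


def mean_preference(counts):
    """Mean scale value given the number of responses for each option."""
    total = sum(counts)
    return sum(value * count for value, count in zip(SCALE, counts)) / total


# Response counts for "Which interface did you find easier to use?"
ease_cs = (2, 0, 0, 2, 7)  # Group CS, 11 participants
ease_sc = (2, 1, 1, 1, 4)  # Group SC, 9 participants
ease_total = tuple(a + b for a, b in zip(ease_cs, ease_sc))

if __name__ == "__main__":
    print(f"Group CS: {mean_preference(ease_cs):.2f}")
    print(f"Group SC: {mean_preference(ease_sc):.2f}")
    print(f"Total:    {mean_preference(ease_total):.2f}")
```

A positive mean indicates a lean toward Skald and a negative mean a lean toward the CLI.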
However, there is much more going on here. First of all, the responses to every question are clustered at the two extremes of the scale, with few participants falling in the middle. So most people strongly favored one interface or the other, with slightly more favoring Skald over the traditional command-based UI.
Secondly, there was a noticeable difference between the two group means. Group SC favored the CLI side of the scale for all questions except ease of use (where they still had a lower score than Group CS). On the other hand, Group CS favored Skald over the CLI-based UI. This may just be an anomaly of group formation, or it may be that people favored the interface they played second.
Participants' additional comments can be found in Appendix G. These comments tended to underscore the qualitative results explored above.
As mentioned previously, there was an uneven distribution of prior experience between the randomly-assigned groups regarding using a command line and playing interactive fiction. Specifically, Group SC rated themselves as more familiar with a command line interface and Group CS rated themselves more familiar with the IF genre. It is therefore worth taking a closer look at the correlation between individuals' prior experience and their reported game experience.
A Pearson correlation score of |r| >= 0.44 is considered statistically significant, equivalent to p <= .05.
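The 0.44 threshold can be recovered from the standard relationship between a Pearson correlation and the t distribution, t = r * sqrt(n - 2) / sqrt(1 - r^2). The Python sketch below assumes that the correlations were computed over the n = 20 usable responses (18 degrees of freedom) and uses the two-tailed critical t value for alpha = .05 from a standard t table. Both are assumptions of this illustration and are not stated in the text.

```python
import math

# Assumption: correlations are computed over the 20 usable responses.
N = 20
DF = N - 2

# Two-tailed critical t for alpha = .05 with 18 degrees of freedom,
# taken from a standard t table.
T_CRIT_DF18 = 2.101


def critical_r(t_crit, df):
    """Smallest |r| that reaches significance for the given critical t."""
    return t_crit / math.sqrt(t_crit ** 2 + df)


def t_from_r(r, n):
    """t statistic corresponding to a Pearson correlation r over n pairs."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


if __name__ == "__main__":
    print(f"Critical |r| for n = {N}: {critical_r(T_CRIT_DF18, DF):.2f}")
    for r in (0.64, 0.26):
        verdict = "significant" if t_from_r(r, N) >= T_CRIT_DF18 else "not significant"
        print(f"r = {r}: t = {t_from_r(r, N):.2f} ({verdict})")
```

Under these assumptions, a correlation such as r = 0.64 clears the threshold while r = 0.26 does not.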
Across all participants, there was a significant correlation between reported command line experience and the participant's reported ability to construct commands (r = 0.64), agency within the game world (r = 0.52), and their enjoyment of the game overall (r = 0.63) while using the CLI. When using the Skald UI, there was no correlation with previous command line experience.
Assuming that Group SC's higher degree of command line experience may have inflated their ratings on these three dimensions when using the CLI does not endanger any of the conclusions drawn above. In fact, a slightly lower rating from Group SC on their enjoyment of the CLI may have led to a statistically significant result regarding the effect of Skald on enjoyment.
There was a significant correlation between participants' familiarity with the command line and their preferences for the traditional IF command interface as measured by the final comparison survey. These correlated measures included rating the CLI as faster to use (r = 0.53), more enjoyable to use (r = 0.45), creating a more interesting or engaging game (r = 0.63), and preferring it as the UI in future games (r = 0.52). However, there was not a significant correlation with finding the CLI easier to use (r = 0.26).
Across all participants, those with prior interactive fiction experience reported finding it easier to recognize interactive objects (r = 0.71), determine possible actions supported by those objects (r = 0.69), construct commands (r = 0.55), and generally exhibit agency in the game world (r = 0.46), but only when using the Skald interface. Surprisingly, this prior experience showed no correlations at all with response ratings when using a traditional command-based IF user interface.
If we assume that Group CS may thus have rated Skald higher on these four measures due to their higher levels of prior IF experience than Group SC, it would bring into question some of the conclusions above regarding Skald's ease of use for all users. However, those conclusions were all very significant (p < .001) and so would probably not be unduly influenced by the slightly uneven distribution of IF experience between the two groups.
There was no correlation between previous IF experience and any preference for one UI over the other, as reported on any rating in the final comparison survey.
A Rough Draft Node http://www2.hawaii.edu/~ztomasze/argax
Last Edited: 31 Jan 2015 ©2013 by Z. Tomaszewski.