I replayed the 48 game transcripts of the 24 complete participants and gathered various measures of Marlinspike's story threads and event history structure at the end of each game. (Due to a programming bug, two of these game sessions could be replayed, but their story data could not be gathered. Therefore, the only data available for those two games is the time spent and the number of commands entered.) This game data summarizes the nature of the games played and the system's performance on its own measures of story structure: completeness and unity.
The mean time spent on a game session was 21.5 minutes (SD = 16.3 minutes), with a max of 94.1 minutes and a min of 5.5 minutes. (These measures were calculated after dropping an extreme outlier: a game session that lasted over 5 hours. This participant likely took a long break during the game.)
Participants spent significantly more time on the first game (M = 28.1 minutes) than on the second game (M = 15.2 minutes), but this is to be expected, since the first game also included an introductory tutorial session while the second game did not. Although the time spent on the tutorial could not be measured separately from the time spent on the game that followed it, it was possible to determine which commands were typed during the tutorial and which were typed during the game session proper.
The mean number of commands entered during the tutorial was 38.4 lines (SD = 18.6 lines), with a max of 87 lines and a min of 20 lines.
Disregarding the tutorial, the mean number of commands for the two game sessions was 75.2 lines (SD = 37.3 lines), with a max of 175 lines and a min of 27 lines. There was no significant difference between the mean number of commands entered for the first game (79.8 lines) and the second game (70.63 lines), t(21) = 1.26, p = .22. The difference between sessions played with reincorporation on (65.79 lines) and those played with reincorporation off (84.67 lines) was more noticeable, although also not significant, t(21) = 1.69, p = .11.
Dividing the total number of commands entered during a game session by the time spent on that session provides a measure of the player's speed of play in commands per minute (cpm). The mean speed of participants was 5.23 cpm (SD = 1.93 cpm), with a max of 9.85 cpm and a min of 1.86 cpm. There was a significant difference between mean speed during the first game (4.48 cpm) and the second game (5.94 cpm), t(23) = 1.00, p = .003. All of these speed measures include the tutorial as part of the first game session.
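The speed measure is a simple ratio. The sketch below shows how it might be computed per session when, as here, the tutorial's commands must be folded into the first session because tutorial time could not be separated from game time. The function names and the session values are hypothetical, not study data.

```python
def speed_cpm(commands: int, minutes: float) -> float:
    """Speed of play for one game session, in commands per minute."""
    if minutes <= 0:
        raise ValueError("session length must be positive")
    return commands / minutes


def session_speed(tutorial_cmds: int, game_cmds: int, minutes: float) -> float:
    """Tutorial commands are counted as part of the session total, because
    tutorial time could not be separated from the game that followed it."""
    return speed_cpm(tutorial_cmds + game_cmds, minutes)


# Hypothetical participant: the first game includes the tutorial, the second does not.
first = session_speed(tutorial_cmds=40, game_cmds=80, minutes=30.0)
second = session_speed(tutorial_cmds=0, game_cmds=72, minutes=12.0)
print(first, second)  # 4.0 6.0
```

Keeping the tutorial inside the first session mirrors the text's note that all speed measures include the tutorial as part of the first game.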
Thus, on average, participants spent about 30 minutes playing the tutorial and first game, followed by 15 minutes playing the second game. Although their second game was not significantly shorter in terms of commands entered, they played significantly faster. This greater speed during the second game makes sense since players were now familiar with the world and could skim many of the text descriptions of rooms, objects, and even some of the more common events.
It is interesting to see how the potential range of Demeter verbs, actions, and scenes was used in practice. The following tables show, for the 46 games for which data was available, the number of games in which each verb, action, and scene was used, the maximum number of times each was used within a single game, and the mean number of uses in those games in which it occurred at least once.
Verbs | Games Used | Max Count In One Game | Mean Count in Games Used |
---|---|---|---|
Ask | 0 (0%) | 0 | 0 |
Attack | 16 (35%) | 4 | 1.63 |
Close | 13 (28%) | 4 | 1.62 |
Drop | 4 (9%) | 4 | 2 |
Eat | 3 (7%) | 1 | 1 |
Enter | 44 (96%) | 29 | 10.52 |
Examine | 35 (76%) | 33 | 8.11 |
Give | 2 (4%) | 2 | 1.5 |
Go | 45 (98%) | 32 | 10.71 |
Insert | 0 (0%) | 0 | 0 |
Kill | 4 (9%) | 1 | 1 |
Kiss | 2 (4%) | 1 | 1 |
Knock | 3 (7%) | 1 | 1 |
Lock | 1 (2%) | 2 | 2 |
Look | 46 (100%) | 33 | 7.46 |
Open | 43 (93%) | 10 | 5.14 |
Push | 1 (2%) | 1 | 1 |
PushDir | 2 (4%) | 4 | 3.5 |
PutOn | 0 (0%) | 0 | 0 |
Rape | 0 (0%) | 0 | 0 |
Take | 33 (72%) | 7 | 2.36 |
Talk | 36 (78%) | 25 | 9.72 |
Tell | 0 (0%) | 0 | 0 |
Touch | 0 (0%) | 0 | 0 |
Unlock | 27 (59%) | 5 | 1.59 |
Use | 0 (0%) | 0 | 0 |
Wait | 43 (93%) | 41 | 11.21 |
These measures reflect only those verbs used directly by the player. For example, players were instructed in the tutorial to use the Talk verb to converse with other characters. This provided a menu of options, which then produced the appropriate Ask and Tell deeds behind the scenes. So Ask and Tell were still used internally by most games, but they were not used directly by any of the players.
The mean number of unique verbs used during a single game was 8.74 verbs (SD = 2.37 verbs), with a max of 14 unique verbs and a min of 2 verbs. The mean counts above show that the most common player deeds involved waiting or moving between rooms, followed by talking to other characters and examining the world.
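The columns of the verb table, which the action and scene tables share, follow directly from per-game usage counts. A minimal sketch, assuming each game is summarized as a `Counter` mapping item names to use counts; the function names and the three sample games are hypothetical. Note that the mean is taken only over games in which the item occurred, so unused games do not pull it toward zero.

```python
from collections import Counter


def usage_summary(games: list[Counter], vocabulary: list[str]) -> dict:
    """Per item: (games used, % of games, max count in one game,
    mean count over only those games in which the item occurred)."""
    summary = {}
    for item in vocabulary:
        # Counter returns 0 for missing keys, so games without the item are skipped.
        counts = [game[item] for game in games if game[item] > 0]
        used = len(counts)
        if counts:
            pct = round(100 * used / len(games))
            summary[item] = (used, pct, max(counts), sum(counts) / used)
        else:
            summary[item] = (0, 0, 0, 0.0)
    return summary


def unique_items(game: Counter) -> int:
    """Number of distinct items used at least once in a single game."""
    return sum(1 for count in game.values() if count > 0)


# Hypothetical per-game verb counts for three games.
games = [
    Counter({"Look": 3, "Go": 5}),
    Counter({"Look": 1, "Talk": 2}),
    Counter({"Look": 2, "Go": 1}),
]
summary = usage_summary(games, ["Ask", "Go", "Look", "Talk"])
print(summary["Go"])   # (2, 67, 5, 3.0)
print(summary["Ask"])  # (0, 0, 0, 0.0)
print([unique_items(g) for g in games])  # [2, 2, 2]
```

The same functions apply unchanged to per-game action or scene counts.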
Actions | Games Used | Max Count In One Game | Mean Count in Games Used |
---|---|---|---|
ACQUIRE | 33 (72%) | 10 | 3.39 |
AFFRONT | 44 (96%) | 29 | 10.68 |
ASSAULT | 8 (17%) | 4 | 1.75 |
ASSIST | 3 (7%) | 5 | 3 |
ATTEMPT | 32 (70%) | 10 | 3.59 |
BATTERY | 11 (24%) | 2 | 1.36 |
CHANGE_STATE | 7 (15%) | 2 | 1.14 |
CONVERSE | 45 (98%) | 53 | 15.87 |
DAMAGE | 2 (4%) | 1 | 1 |
DEFY | 0 (0%) | 0 | 0 |
DELAY | 45 (98%) | 41 | 12.71 |
END_STATE | 37 (80%) | 7 | 2.89 |
ENDEAR | 46 (100%) | 37 | 12.41 |
EXAMINE | 46 (100%) | 49 | 16.43 |
HARASS | 2 (4%) | 1 | 1 |
HINDER | 0 (0%) | 0 | 0 |
INTERACT | 0 (0%) | 0 | 0 |
LOSE | 6 (13%) | 5 | 2.83 |
MANIPULATE | 43 (93%) | 20 | 7.93 |
OFFEND | 31 (67%) | 24 | 11.13 |
OPPOSE | 41 (89%) | 37 | 12.12 |
ROMANCE | 1 (2%) | 1 | 1 |
START_STATE | 28 (61%) | 8 | 2.57 |
SUPPORT | 45 (98%) | 67 | 21.13 |
TRAVEL | 46 (100%) | 100 | 40.37 |
USE | 0 (0%) | 0 | 0 |
The mean number of unique actions resulting from PC deeds during a single game was 13.1 actions (SD = 2.25 actions), with a max of 17 unique actions and a min of 8 actions. All games involved some influencing of NPC opinions. For example, all games had at least one ENDEAR player action, which would increase the affinity of the recipient NPC for the PC.
Scenes | Games Used | Max Count In One Game | Mean Count in Games Used |
---|---|---|---|
A_Long_Night | 29 (63%) | 1 | 1 |
Awaiting_GoParty | 1 (2%) | 1 | 1 |
Captains_Message | 46 (100%) | 1 | 1 |
Discussion_Concludes | 24 (52%) | 2 | 1.17 |
Discussion_Continues | 39 (85%) | 13 | 3.95 |
Discussion_Curtailed | 9 (20%) | 3 | 1.44 |
Discussion_Interrupted | 31 (67%) | 4 | 1.94 |
Discussion_Offscreen | 17 (37%) | 2 | 1.29 |
Discussion_Query | 13 (28%) | 2 | 1.23 |
Discussion_Starts | 43 (93%) | 8 | 2.65 |
Evidence_Revealed | 46 (100%) | 8 | 3.67 |
GoParty_Departs | 30 (65%) | 2 | 1.07 |
GoParty_Eviction | 0 (0%) | 0 | 0 |
GoParty_Moves_Along | 46 (100%) | 54 | 18 |
GoParty_Offscreen | 18 (39%) | 6 | 3 |
GoParty_Reentry | 15 (33%) | 1 | 1 |
GoParty_Reports | 24 (52%) | 3 | 1.25 |
GoParty_Requests_Follow | 23 (50%) | 10 | 2.17 |
GoParty_Returns | 29 (63%) | 3 | 1.24 |
Impromptu_GoParty | 23 (50%) | 3 | 1.57 |
Landfall | 37 (80%) | 1 | 1 |
NPC_Annoyed | 3 (7%) | 1 | 1 |
NPC_Defends | 3 (7%) | 4 | 2.33 |
NPC_Defends_Other | 3 (7%) | 4 | 2.33 |
NPC_Defied | 0 (0%) | 0 | 0 |
NPC_Interdicts | 3 (7%) | 2 | 1.67 |
NPC_Observes_Destruction | 2 (4%) | 1 | 1 |
NPC_Offered_Item_By | 2 (4%) | 2 | 1.5 |
NPC_Outraged | 3 (7%) | 5 | 2.67 |
NPC_Proposes_Plan | 43 (93%) | 20 | 6.23 |
NPC_Rebuffs | 1 (2%) | 1 | 1 |
NPC_Replies | 17 (37%) | 3 | 1.41 |
NPC_Replies_To_Opinion | 45 (98%) | 63 | 18.8 |
NPC_Replies_To_Plan_Action | 17 (37%) | 4 | 2 |
NPC_Wooed | 0 (0%) | 0 | 0 |
NPCs_React | 45 (98%) | 32 | 10.76 |
PC_Dies | 9 (20%) | 1 | 1 |
PC_Unconscious | 1 (2%) | 1 | 1 |
Pursues_Plan | 46 (100%) | 12 | 5.33 |
Revenant_Acts | 46 (100%) | 16 | 5.04 |
Revenant_Attacks | 27 (59%) | 5 | 1.93 |
Waiting_through_the_Day | 30 (65%) | 1 | 1 |
This list of scenes also includes all reactions and components. These results hint at some of the under-utilized story paths. For example, most of the NPC_* scenes are reactions, and these were played in very few games. Most of the exceptions, such as NPC_Proposes_Plan and NPC_Replies_To_Opinion, stem from discussions in which NPCs respond to each other. In general, participants did not interact with NPCs non-verbally.
Participants were also very active: only one game contained Awaiting_GoParty, the scene in which the PC does not form, join, or follow an exploration GoParty into the Zeppelin. Of the two possible ending scenes, the "successful" ending, Landfall (37 games), occurred about four times as often as PC_Dies (9 games).
The mean number of unique scenes (including reactions and components) played during a single game was 19.3 scenes (SD = 3.5 scenes), with a max of 26 unique scenes and a min of 9 scenes.
No significant differences in the number of unique verbs, actions, or scenes used per game were found between the two experimental groups, between first and second games, or when reincorporation was on or off. Thus, the range of content used within games did not vary systematically across the different conditions.
All 46 game sessions were complete stories. That is, each contained a single "main" story thread of events that connected the beginning scene to an ending scene, with one or more middle scenes between them. (This is not always the case with Demeter. For example, two of the games played by the incomplete participants contained starting and ending events that were not connected by a single thread.)
The following table shows various measures of story length.
Measure | Mean | SD | Max | Min | Reinc ON | Reinc OFF | Reinc Significance |
---|---|---|---|---|---|---|---|
Root Events | 84.7 | 34.21 | 173 | 40 | 72.55 | 95.83 | t(21) = 2.39, p = 0.03 |
Root Scenes | 35.54 | 14.39 | 71 | 17 | 30.41 | 40.25 | t(21) = 2.41, p = 0.03 |
Player Deeds | 61.43 | 28.89 | 149 | 23 | 52.36 | 69.75 | t(21) = 2.06, p = 0.05 |
Total Events | 349.54 | 162.78 | 820 | 131 | 298.18 | 396.63 | t(21) = 2.10, p = 0.05 |
As described previously, events are represented in Marlinspike by a tree-like structure of recasts and sub-events. These sub-events represent components of a scene or other interpretations of an action. The top-most event in this tree structure is the root event that represents either the initial player deed before recasts or the parent scene that includes the various components.
Therefore, the number of Root Events indicates how many whole events occurred during the story. Total Events also includes a count of all the recasts and other sub-events that were components of those root events. Root Scenes is a count of those root events that were scenes. Player Deeds is a count of those player commands that resulted in a deed reported to the drama manager and thus represented as an action at the story level.[1]
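The relationship between Root Events, Root Scenes, and Total Events can be sketched as counts over a forest of event trees. This is an illustrative model, not Marlinspike's actual data structure: the `Event` class and the helper functions are hypothetical, and although the node names echo the tables above, the nesting in the sample story is invented. Player Deeds is not computed here, since the text defines it from the command log rather than from the tree.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """Hypothetical event node. A root is either a player deed (before
    recasts) or a parent scene; children are recasts or component sub-events."""
    name: str
    is_scene: bool = False
    children: list["Event"] = field(default_factory=list)


def subtree_size(event: Event) -> int:
    """This event plus all of its recasts and sub-events, recursively."""
    return 1 + sum(subtree_size(child) for child in event.children)


def story_counts(roots: list[Event]) -> dict[str, int]:
    return {
        "root_events": len(roots),
        "root_scenes": sum(1 for root in roots if root.is_scene),
        "total_events": sum(subtree_size(root) for root in roots),
    }


# Invented three-event story; names echo the tables, nesting is made up.
story = [
    Event("Examine", children=[Event("EXAMINE")]),
    Event("Evidence_Revealed", is_scene=True, children=[Event("NPCs_React")]),
    Event("Talk", children=[Event("CONVERSE", children=[Event("ENDEAR")])]),
]
print(story_counts(story))
# {'root_events': 3, 'root_scenes': 1, 'total_events': 7}
```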
These four measures are closely related. Since a story is formed from player actions and the scenes that respond to them, both Player Deeds and Root Scenes will correlate with the number of Root Events, as will Total Events.
Games were significantly shorter on all four measures when reincorporation was on than when it was off. This was unexpected, but it makes sense in retrospect. In certain situations, such as during a discussion, irrelevant scenes such as Pursues_Plan are randomly selected when reincorporation is off. Thus, when reincorporation is off, a discussion may be punctuated by occasional moaning from an NPC who is pursuing the plan of inaction. In contrast, when reincorporation is on, only discussion scenes tend to play during a discussion sequence. This is one example of how certain parts of the story can be extended by less relevant scenes when reincorporation is off. Since the player is prompted to act after each scene, these extra scenes also increase the number of player deeds required to complete the story.
The next table shows a number of related measures of the stories' unity.
Measure | Mean | SD | Max | Min | Reinc ON | Reinc OFF | Reinc Significance |
---|---|---|---|---|---|---|---|
Main Thread Size | 20.93 | 12.96 | 67 | 6 | 28.64 | 13.88 | t(21) = 4.83, p < 0.001 |
Main Thread Weight | 13.74 | 3.3 | 25 | 9 | 15.59 | 12.04 | t(21) = 4.32, p < 0.001 |
Threads Spliced | 2.09 | 1.95 | 8 | 0 | 3.23 | 1.04 | t(21) = 3.85, p < 0.001 |
Extra Threads | 9.04 | 5.59 | 25 | 2 | 5.45 | 12.33 | t(21) = 5.14, p < 0.001 |
Unthreaded Unique Weight | 41.15 | 27.49 | 124 | 9 | 22.82 | 57.96 | t(21) = 5.37, p < 0.001 |
Main Thread Size is the number of root events in the thread that contains the ending scene of the story. As mentioned above, all these main threads also contained the beginning scene. The Main Thread Weight is equal to the import of the event with the highest import in the main thread, plus 1 for every four events in the main thread. Thus, Main Thread Weight is closely related to the length of the main thread.
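The Main Thread Weight definition can be written as a one-line formula. A minimal sketch, assuming a thread is given as a list of its events' integer imports, and assuming that "plus 1 for every four events" means integer division, so a leftover group of fewer than four events adds nothing (the text does not say how partial groups are handled). The function name and the sample threads are hypothetical.

```python
def main_thread_weight(imports: list[int]) -> int:
    """Import of the highest-import event in the thread, plus 1 for every
    four events. ASSUMPTION: integer division, so a partial group of
    fewer than four events adds nothing."""
    if not imports:
        raise ValueError("a thread must contain at least one event")
    return max(imports) + len(imports) // 4


# Hypothetical threads, given as lists of event imports.
short_thread = [8, 2, 3, 1, 2, 4]  # 6 events, highest import 8
long_thread = [8] + [2] * 20       # 21 events, highest import 8
print(main_thread_weight(short_thread))  # 9  (8 + 6 // 4)
print(main_thread_weight(long_thread))   # 13 (8 + 21 // 4)
```

With the peak import held fixed, the weight grows only through the length term, which is why the text describes this measure as closely related to main thread length.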
Threads Spliced is the number of times during the story that two threads were successfully combined to form a single thread. Extra Threads is the number of threads other than the main thread that existed at the end of the story. Ideally, this value would be 0. However, as described in Chapter V, for reasons stemming from how scene preconditions are authored, many short threads can be left unreincorporated even with a fairly passive or cooperative player. Unthreaded Unique Weight is the sum of the weights of the unique material in each extra thread. Thus, this measure is closely correlated with the number of extra threads, but it also reflects the import of the events in that unthreaded material.
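The text does not spell out what counts as a thread's "unique material" or how its weight is computed, so the sketch below rests on two loudly labeled assumptions: an event is unique to an extra thread if it appears in no other thread (main or extra), and the unique events are weighted with the same highest-import-plus-one-per-four-events rule used for Main Thread Weight. Threads are modeled as hypothetical dictionaries from event ids to imports.

```python
def thread_weight(imports: list[int]) -> int:
    """Highest import plus 1 per four events (integer division assumed)."""
    return (max(imports) + len(imports) // 4) if imports else 0


def unthreaded_unique_weight(main: dict[str, int],
                             extras: list[dict[str, int]]) -> int:
    """Sum, over the extra threads, of the weight of each one's unique material.
    ASSUMPTION: an event is 'unique' to an extra thread if it appears in no
    other thread, main or extra."""
    total = 0
    for i, thread in enumerate(extras):
        shared = set(main)
        for j, other in enumerate(extras):
            if j != i:
                shared.update(other)  # adds the other thread's event ids
        unique = [imp for event, imp in thread.items() if event not in shared]
        total += thread_weight(unique)
    return total


# Hypothetical end-of-story threads: event id -> import.
main = {"e1": 8, "e2": 2, "e3": 3}
extras = [
    {"e2": 2, "e4": 5},                                 # e2 is shared with main
    {"e5": 4, "e6": 1, "e7": 1, "e8": 2, "e9": 1},      # all five events unique
]
print(len(extras), unthreaded_unique_weight(main, extras))  # 2 10
```

Under these assumptions, an extra thread whose events all appear elsewhere contributes nothing, while many small unique threads each contribute their peak import.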
As the Reinc ON and Reinc OFF means above show, turning reincorporation on made a highly significant difference (p < 0.001 on every measure) in the internal structure of the finished story. On average, twice as many root events were tied into the main thread, and three times as many threads were spliced together. There were also fewer than half as many extra threads at the end of the story when reincorporation was used. Thus, reincorporation successfully produced a much more unified story in terms of the system's own measures.
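The "twice", "three times", and "fewer than half" figures are ratios of the Reinc ON mean to the Reinc OFF mean in the unity table. The short check below recomputes them from the published means only; no per-game data is involved, and the names are hypothetical.

```python
# Reinc ON and Reinc OFF means, copied from the story unity table.
UNITY_MEANS = {
    "Main Thread Size": (28.64, 13.88),
    "Threads Spliced": (3.23, 1.04),
    "Extra Threads": (5.45, 12.33),
}


def on_off_ratios(means: dict[str, tuple[float, float]]) -> dict[str, float]:
    """Ratio of the reincorporation-on mean to the reincorporation-off mean."""
    return {name: on / off for name, (on, off) in means.items()}


for name, ratio in on_off_ratios(UNITY_MEANS).items():
    print(f"{name}: {ratio:.2f}")
# Main Thread Size: 2.06
# Threads Spliced: 3.11
# Extra Threads: 0.44
```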
There was no significant difference in either story length or story unity resulting from play order.
The first measure of player agency is world-level agency: the percentage of input attempts that successfully produced deeds at the story level.
Measure | Mean | SD | Max | Min | Play 1 Mean | Play 2 Mean | Play Significance |
---|---|---|---|---|---|---|---|
Percent of inputs that produced a deed | 80.5% | 10.7pp | 100% | 50% | 76.6% | 84.0% | t(21) = 3.00, p = 0.01 |
Although world agency varied widely between participants (from 50% to 100%), the mean indicates that about 1 of every 5 commands attempted by players resulted in an error or otherwise failed to affect the world or story. However, the mean level of world agency increased significantly between the two game sessions.
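World-level agency is the share of input attempts that produced a story-level deed. A minimal sketch of the per-session computation, plus the arithmetic behind "1 of every 5"; the function name and the 75-input session are hypothetical, and only the 80.5% mean comes from the table.

```python
def world_agency(inputs: int, deeds: int) -> float:
    """Percentage of input attempts that produced a story-level deed."""
    if inputs <= 0:
        raise ValueError("need at least one input")
    if not 0 <= deeds <= inputs:
        raise ValueError("deeds must be between 0 and the number of inputs")
    return 100.0 * deeds / inputs


# Hypothetical session: 75 inputs, 60 of which produced deeds.
print(world_agency(75, 60))  # 80.0

# From the reported mean of 80.5%, failed inputs are 19.5% of all inputs,
# i.e. roughly one failure in every 100 / 19.5 attempts.
failure_pct = 100.0 - 80.5
print(round(100.0 / failure_pct, 1))  # 5.1
```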
The first requirement for story-level user agency is that the user perform actions of significant import.
Measure | Mean | SD | Max | Min |
---|---|---|---|---|
Mean Action Import | 2.46 | 0.16 | 2.89 | 2.12 |
Significant Action Count | 3.19 | 3.15 | 12 | 0 |
Mean Action Import is the mean import of all player actions. Thus, it indicates the average import of the story-level effects of the player's deeds. Significant Action Count is simply the number of player actions with an import of 4 or higher. These are actions that would be significant enough to start their own threads if not already relevant to an existing thread.
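Both measures can be computed from the list of imports of a game's player actions. A minimal sketch using the threshold stated above (import of 4 or higher); the function name and the ten sample imports are hypothetical.

```python
SIGNIFICANT_IMPORT = 4  # threshold from the text: import of 4 or higher


def import_measures(action_imports: list[int]) -> tuple[float, int]:
    """Return (Mean Action Import, Significant Action Count) for one game."""
    if not action_imports:
        raise ValueError("game has no player actions")
    mean_import = sum(action_imports) / len(action_imports)
    significant = sum(1 for imp in action_imports if imp >= SIGNIFICANT_IMPORT)
    return mean_import, significant


# Hypothetical imports of ten player actions in one game.
print(import_measures([2, 2, 3, 1, 2, 4, 2, 5, 3, 2]))  # (2.6, 2)
```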
These values show that the bulk of the player's actions were of fairly low import, along the lines of traveling, exploring, manipulating objects, and interacting with NPCs in a mild manner. This is not surprising, since these are exactly the deeds players performed most often, as shown above. Moreover, on average, players produced only about 3 actions of significant import per story. So the demand placed on Marlinspike to reincorporate all significant player events is not particularly high in an average Demeter game.
Neither of these two measures of player action import differed significantly between the first and second games, or between games played with and without reincorporation. This suggests that players did not significantly vary the kinds of actions they performed between different game sessions or in response to reincorporation.
Marlinspike's task is then to reincorporate user actions, particularly those of high import, into the finished story structure.
Measure | Mean | SD | Max | Min | Reinc ON | Reinc OFF | Reinc Significance |
---|---|---|---|---|---|---|---|
% of Actions in Threads | 55.0% | 11.8pp | 76.1% | 30.8% | 59.2% | 51.2% | t(21) = 2.05, p = 0.053 |
% of Actions in Main Thread | 24.5% | 16.2pp | 63.1% | 4.4% | 36.8% | 13.2% | t(21) = 7.02, p < 0.001 |
Count of Significant Actions in Main Thread | 0.76 | 1.45 | 7 | 0 | 1.41 | 0.17 | t(21) = 2.92, p = 0.008 |
% of Significant Actions in Main Thread | 24.3% | 35.8pp | 100% | 0% | 43.6% | 3.9% | t(21) = 3.97, p < 0.001 |
These results show that Marlinspike was significantly more successful in this task when its reincorporation feature was used. First of all, a slightly higher percentage of all the player's actions were reincorporated into threads when reincorporation was used. This difference is on the border of statistical significance.
However, the real goal for Marlinspike is to reincorporate user actions into the main story thread that connects the beginning and end of the story. When the reincorporation feature was used, the percentage of user actions reincorporated into the main thread nearly tripled (36.8% versus 13.2%). For significant actions, the increase was elevenfold (43.6% versus 3.9%).
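The four reincorporation measures can be sketched as set operations over one finished story. This assumes that an action counts as "in a thread" when its event id is a member of some end-of-story thread, and that threads are given as sets of event ids with the main thread included in the list. How games with no significant actions were scored is not stated, so the sketch returns `None` for that percentage rather than guessing. All names and sample ids are hypothetical.

```python
def _pct(part: set, whole: set):
    """Percentage of `whole` covered by `part`; None when `whole` is empty."""
    return 100.0 * len(part) / len(whole) if whole else None


def reincorporation_measures(actions: set, threads: list[set],
                             main_thread: set, significant: set) -> dict:
    """actions: ids of the player's actions; threads: every thread at the end
    of the story (main thread included), each a set of event ids;
    significant: the action ids with import >= 4."""
    in_threads = {a for a in actions if any(a in t for t in threads)}
    in_main = actions & main_thread
    sig_in_main = significant & main_thread
    return {
        "pct_in_threads": _pct(in_threads, actions),
        "pct_in_main": _pct(in_main, actions),
        "sig_in_main": len(sig_in_main),
        "pct_sig_in_main": _pct(sig_in_main, significant),
    }


# Hypothetical game: eight player actions, two of them significant.
actions = {f"a{i}" for i in range(1, 9)}
main_thread = {"a1", "a2", "s1"}  # "s1" is a scene, not a player action
threads = [main_thread, {"a3", "a4", "s2"}]
print(reincorporation_measures(actions, threads, main_thread, {"a2", "a5"}))
# {'pct_in_threads': 50.0, 'pct_in_main': 25.0, 'sig_in_main': 1, 'pct_sig_in_main': 50.0}
```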
While this performance improvement was both practically and statistically significant, many player actions still did not affect the main story thread. On average, even when reincorporation was used, only about one third of all player actions and fewer than half of all significant player actions were made necessary to the main story thread in Demeter. So there is still some room for improvement here.
There was no significant difference between any of these means due to either play order or experimental group.
[1] Look and Examine--do not always warrant a drama manager response. So it is possible to have two Examines followed by a Talk combined as a single action event. For this reason, Root Scenes + Player Deeds does not always equal the number of Root Events.
Argax Project : Dissertation :
A Rough Draft Node http://www2.hawaii.edu/~ztomasze/argax |
Last Edited: 28 Apr 2011 ©2011 by Z. Tomaszewski. |