Argax Project

Node Status: COMPLETE

Game Data

I replayed the 48 game transcripts of the 24 complete participants and gathered various measures of Marlinspike's story threads and event history structure at the end of each game. (Due to a programming bug, two of these game sessions could be replayed, but their story data could not be gathered. The only data available for those two games are therefore the time spent and the number of commands entered.) These game data summarize the nature of the games played and the system's performance on its own measures: story structure completeness and unity.

Time Spent and Commands Entered

The mean time spent on a game session was 21.5 minutes (SD = 16.3 minutes), with a max of 94.1 minutes and a min of 5.5 minutes. (These measures were calculated after dropping an extreme outlier: a game session that lasted over 5 hours. This participant likely took a long break during the game.)

Participants spent significantly more time on the first game (M = 28.1 minutes) than on the second game (M = 15.2 minutes), but this is to be expected, since the first game also included an introductory tutorial session while the second game did not. Although it was not possible to measure the time spent on the tutorial separately from the time spent on the game that followed it, it was possible to determine which commands were typed during the tutorial and which were typed during the game session proper.

The mean number of commands entered during the tutorial was 38.4 lines (SD = 18.6 lines), with a max of 87 lines and a min of 20 lines.

Disregarding the tutorial, the mean number of commands for the two game sessions was 75.2 lines (SD = 37.3 lines), with a max of 175 lines and a min of 27 lines. There was no significant difference between the mean number of commands entered for the first game (79.8 lines) and the second game (70.63 lines), t(21) = 1.26, p = .22. The difference between sessions played with reincorporation on (65.79 lines) and those played with reincorporation off (84.67 lines) was more noticeable, although also not significant, t(21) = 1.69, p = .11.

Dividing the total number of commands entered during a game session by the time spent on that session provides a measure of the player's speed of play in commands per minute (cpm). The mean speed of participants was 5.23 cpm (SD = 1.93 cpm), with a max of 9.85 cpm and a min of 1.86 cpm. There was a significant difference between mean speed during the first game (4.48 cpm) and the second game (5.94 cpm), t(23) = 1.00, p = .003. All of these speed measures include the tutorial as part of the first game session.
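The speed measure is a per-session ratio. The following is a minimal Python sketch, assuming each session is recorded as a hypothetical (commands, minutes) pair and that the reported mean is an average of the per-session ratios rather than total commands over total time; the function names and sample numbers are illustrative only.

```python
from statistics import mean


def commands_per_minute(commands: int, minutes: float) -> float:
    """Speed of play for one session: commands entered divided by minutes spent."""
    if minutes <= 0:
        raise ValueError("session length must be positive")
    return commands / minutes


def mean_speed(sessions: list[tuple[int, float]]) -> float:
    """Average of the per-session speeds (not total commands over total time)."""
    return mean(commands_per_minute(c, m) for c, m in sessions)


# Hypothetical sessions as (commands entered, minutes spent); the first
# session's command count includes its tutorial commands, as in the text.
sessions = [(120, 30.0), (90, 15.0)]
print(mean_speed(sessions))  # 5.0
```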

Thus, on average, participants spent about 30 minutes playing the tutorial and first game, followed by 15 minutes playing the second game. Although their second game was not significantly shorter in terms of commands entered, they played significantly faster. This greater speed during the second game makes sense since players were now familiar with the world and could skim many of the text descriptions of rooms, objects, and even some of the more common events.

Verbs, Actions, and Scenes Used

It is interesting to see how the potential range of Demeter verbs, actions, and scenes was used in practice. The following tables show, for each verb, action, and scene, the number of the 46 games (those for which data were available) in which it was used, the maximum number of times it was used within a single game, and its mean number of uses across the games in which it occurred at least once.

Verb       Games Used   Max Count in One Game   Mean Count in Games Used
Ask        0 (0%)       0                       0
Attack     16 (35%)     4                       1.63
Close      13 (28%)     4                       1.62
Drop       4 (9%)       4                       2
Eat        3 (7%)       1                       1
Enter      44 (96%)     29                      10.52
Examine    35 (76%)     33                      8.11
Give       2 (4%)       2                       1.5
Go         45 (98%)     32                      10.71
Insert     0 (0%)       0                       0
Kill       4 (9%)       1                       1
Kiss       2 (4%)       1                       1
Knock      3 (7%)       1                       1
Lock       1 (2%)       2                       2
Look       46 (100%)    33                      7.46
Open       43 (93%)     10                      5.14
Push       1 (2%)       1                       1
PushDir    2 (4%)       4                       3.5
PutOn      0 (0%)       0                       0
Rape       0 (0%)       0                       0
Take       33 (72%)     7                       2.36
Talk       36 (78%)     25                      9.72
Tell       0 (0%)       0                       0
Touch      0 (0%)       0                       0
Unlock     27 (59%)     5                       1.59
Use        0 (0%)       0                       0
Wait       43 (93%)     41                      11.21

These measures reflect only those verbs used directly by the player. For example, players were instructed in the tutorial to use the Talk verb to converse with other characters. This provided a menu of options which then produced appropriate Ask and Tell deeds behind the scenes. So Ask and Tell were still used internally by most games, but they were not used directly by any of the players.

The mean number of unique verbs used during a single game was 8.74 verbs (SD = 2.37 verbs), with a max of 14 unique verbs and a min of 2 verbs. From the mean uses above, we can see that the most common player deeds involved waiting or moving between rooms, followed by talking to other characters and examining the world.
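The three columns of the verb table above (reused by the action and scene tables below) and the per-game count of unique verbs all follow from per-game tallies. A minimal Python sketch, assuming each game's verb uses are available as a `collections.Counter` keyed by verb name; the function names and sample tallies are hypothetical. The percentage in the Games Used column is the first returned value divided by the number of games (46 here).

```python
from collections import Counter


def usage_summary(games: list[Counter], item: str) -> tuple[int, int, float]:
    """Return (games used, max count in one game, mean count in games used)."""
    # Counter returns 0 for missing keys, so unused items are simply filtered out.
    counts = [game[item] for game in games if game[item] > 0]
    if not counts:
        return 0, 0, 0.0
    return len(counts), max(counts), sum(counts) / len(counts)


def unique_items(game: Counter) -> int:
    """Number of distinct verbs (or actions, or scenes) used at least once in one game."""
    return sum(1 for n in game.values() if n > 0)


# Hypothetical per-game verb tallies for three games.
games = [
    Counter({"Go": 12, "Look": 5, "Talk": 3}),
    Counter({"Go": 8, "Wait": 20}),
    Counter({"Look": 2, "Talk": 9, "Attack": 1}),
]
print(usage_summary(games, "Talk"))      # (2, 9, 6.0)
print(usage_summary(games, "Ask"))       # (0, 0, 0.0)
print([unique_items(g) for g in games])  # [3, 2, 3]
```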


Action        Games Used   Max Count in One Game   Mean Count in Games Used
ACQUIRE       33 (72%)     10                      3.39
AFFRONT       44 (96%)     29                      10.68
ASSAULT       8 (17%)      4                       1.75
ASSIST        3 (7%)       5                       3
ATTEMPT       32 (70%)     10                      3.59
BATTERY       11 (24%)     2                       1.36
CHANGE_STATE  7 (15%)      2                       1.14
CONVERSE      45 (98%)     53                      15.87
DAMAGE        2 (4%)       1                       1
DEFY          0 (0%)       0                       0
DELAY         45 (98%)     41                      12.71
END_STATE     37 (80%)     7                       2.89
ENDEAR        46 (100%)    37                      12.41
EXAMINE       46 (100%)    49                      16.43
HARASS        2 (4%)       1                       1
HINDER        0 (0%)       0                       0
INTERACT      0 (0%)       0                       0
LOSE          6 (13%)      5                       2.83
MANIPULATE    43 (93%)     20                      7.93
OFFEND        31 (67%)     24                      11.13
OPPOSE        41 (89%)     37                      12.12
ROMANCE       1 (2%)       1                       1
START_STATE   28 (61%)     8                       2.57
SUPPORT       45 (98%)     67                      21.13
TRAVEL        46 (100%)    100                     40.37
USE           0 (0%)       0                       0

The mean number of unique actions resulting from PC deeds during a single game was 13.1 actions (SD = 2.25 actions), with a max of 17 unique actions and a min of 8 actions. All games involved some influencing of NPC opinions. For example, all games had at least one ENDEAR player action, which would increase the affinity of the recipient NPC for the PC.


Scene                       Games Used   Max Count in One Game   Mean Count in Games Used
A_Long_Night                29 (63%)     1                       1
Awaiting_GoParty            1 (2%)       1                       1
Captains_Message            46 (100%)    1                       1
Discussion_Concludes        24 (52%)     2                       1.17
Discussion_Continues        39 (85%)     13                      3.95
Discussion_Curtailed        9 (20%)      3                       1.44
Discussion_Interrupted      31 (67%)     4                       1.94
Discussion_Offscreen        17 (37%)     2                       1.29
Discussion_Query            13 (28%)     2                       1.23
Discussion_Starts           43 (93%)     8                       2.65
Evidence_Revealed           46 (100%)    8                       3.67
GoParty_Departs             30 (65%)     2                       1.07
GoParty_Eviction            0 (0%)       0                       0
GoParty_Moves_Along         46 (100%)    54                      18
GoParty_Offscreen           18 (39%)     6                       3
GoParty_Reentry             15 (33%)     1                       1
GoParty_Reports             24 (52%)     3                       1.25
GoParty_Requests_Follow     23 (50%)     10                      2.17
GoParty_Returns             29 (63%)     3                       1.24
Impromptu_GoParty           23 (50%)     3                       1.57
Landfall                    37 (80%)     1                       1
NPC_Annoyed                 3 (7%)       1                       1
NPC_Defends                 3 (7%)       4                       2.33
NPC_Defends_Other           3 (7%)       4                       2.33
NPC_Defied                  0 (0%)       0                       0
NPC_Interdicts              3 (7%)       2                       1.67
NPC_Observes_Destruction    2 (4%)       1                       1
NPC_Offered_Item_By         2 (4%)       2                       1.5
NPC_Outraged                3 (7%)       5                       2.67
NPC_Proposes_Plan           43 (93%)     20                      6.23
NPC_Rebuffs                 1 (2%)       1                       1
NPC_Replies                 17 (37%)     3                       1.41
NPC_Replies_To_Opinion      45 (98%)     63                      18.8
NPC_Replies_To_Plan_Action  17 (37%)     4                       2
NPC_Wooed                   0 (0%)       0                       0
NPCs_React                  45 (98%)     32                      10.76
PC_Dies                     9 (20%)      1                       1
PC_Unconscious              1 (2%)       1                       1
Pursues_Plan                46 (100%)    12                      5.33
Revenant_Acts               46 (100%)    16                      5.04
Revenant_Attacks            27 (59%)     5                       1.93
Waiting_through_the_Day     30 (65%)     1                       1

This list of scenes also includes all reactions and components. These scene results hint at some of the under-utilized story paths. For example, most of the NPC_* scenes are reactions, and these were played in very few games. Most of the exceptions (such as NPC_Proposes_Plan and NPC_Replies_To_Opinion) stem from discussions in which NPCs respond to each other. In general, participants did not interact with NPCs non-verbally.

Participants were also very active: only one game contained Awaiting_GoParty, which indicates that the PC did not form, join, or follow an exploration GoParty into the Zeppelin. Of the two possible ending scenes, the "successful" ending, Landfall, occurred about four times as often as PC_Dies (37 games versus 9).

The mean number of unique scenes (including reactions and components) played during a single game was 19.3 scenes (SD = 3.5 scenes), with a max of 26 unique scenes and a min of 9 scenes.

No significant differences in the number of unique verbs, actions, or scenes used per game were found between the two experimental groups, between first and second games, or when reincorporation was on or off. Thus, the range of content used within games did not vary systematically across the different conditions.

Internal Story Structure

All 46 game sessions were complete stories. That is, each contained a single "main" story thread of events that connected the beginning scene to an ending scene, with one or more middle scenes between them. (This is not always the case with Demeter. For example, two of the games played by the incomplete participants contained starting and ending events that were not connected by a single thread.)

The following table shows various measures of story length.

Measure       Mean     SD       Max   Min   Reinc ON  Reinc OFF  Reinc Significance
Root Events   84.7     34.21    173   40    72.55     95.83      t(21) = 2.39, p = 0.03
Root Scenes   35.54    14.39    71    17    30.41     40.25      t(21) = 2.41, p = 0.03
Player Deeds  61.43    28.89    149   23    52.36     69.75      t(21) = 2.06, p = 0.05
Total Events  349.54   162.78   820   131   298.18    396.63     t(21) = 2.10, p = 0.05

As described previously, events are represented in Marlinspike by a tree-like structure of recasts and sub-events. These sub-events represent components of a scene or other interpretations of an action. The top-most event in this tree structure is the root event that represents either the initial player deed before recasts or the parent scene that includes the various components.

Therefore, the number of Root Events indicates how many whole events occurred during the story. Total Events also includes a count of all the recasts and other sub-events that were components of those root events. Root Scenes is a count of those root events that were scenes. Player Deeds is a count of those player commands that resulted in a deed reported to the drama manager and thus represented as an action at the story level.[1]
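The relationship between Root Events, Root Scenes, and Total Events can be illustrated with a small tree. This Python sketch is hypothetical: the `Event` class and its fields are not Marlinspike's actual representation, only a stand-in with one node per deed, recast, scene, or component, and the component names are invented. Player Deeds is left out because it counts commands rather than tree nodes (see Note 1).

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """Hypothetical event-tree node: a deed, recast, scene, or scene component."""
    name: str
    is_scene: bool = False
    children: list["Event"] = field(default_factory=list)

    def size(self) -> int:
        """This event plus all of its recasts and sub-events, recursively."""
        return 1 + sum(child.size() for child in self.children)


def story_length(roots: list[Event]) -> dict[str, int]:
    """Count root events, root scenes, and total events for one story."""
    return {
        "root_events": len(roots),
        "root_scenes": sum(1 for r in roots if r.is_scene),
        "total_events": sum(r.size() for r in roots),
    }


# A player deed recast as an AFFRONT action, then a scene with two components.
roots = [
    Event("open door", children=[Event("AFFRONT")]),
    Event("Discussion_Starts", is_scene=True,
          children=[Event("component 1"), Event("component 2")]),
]
print(story_length(roots))  # {'root_events': 2, 'root_scenes': 1, 'total_events': 5}
```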

These four measures are very closely related. Since a story is formed by player actions and the scenes that respond to them, the counts of root scenes and player deeds will both correlate with the number of root events, as will the number of total events.

Games were significantly shorter on all four measures when reincorporation was on than when it was off. This was unexpected, but it makes sense in retrospect. In certain situations, such as during a discussion, irrelevant scenes such as Pursues_Plan are randomly selected when reincorporation is off. Thus, when reincorporation is off, a discussion may be punctuated by the occasional moaning of an NPC who is pursuing the plan of inaction. In contrast, when reincorporation is on, only discussion scenes tend to play during a discussion sequence. This is one example of how certain parts of the story can be extended by less relevant scenes when reincorporation is off. Since the player is prompted to act after each scene, these extra scenes also increase the number of player deeds required to complete the story.

The next table shows a number of related measures of the stories' unity.

Measure                   Mean     SD       Max   Min   Reinc ON  Reinc OFF  Reinc Significance
Main Thread Size          20.93    12.96    67    6     28.64     13.88      t(21) = 4.83, p < 0.001
Main Thread Weight        13.74    3.3      25    9     15.59     12.04      t(21) = 4.32, p < 0.001
Threads Spliced           2.09     1.95     8     0     3.23      1.04       t(21) = 3.85, p < 0.001
Extra Threads             9.04     5.59     25    2     5.45      12.33      t(21) = 5.14, p < 0.001
Unthreaded Unique Weight  41.15    27.49    124   9     22.82     57.96      t(21) = 5.37, p < 0.001

Main Thread Size is the number of root events in the thread that contains the ending scene of the story. As mentioned above, all these main threads also contained the beginning scene. The Main Thread Weight is equal to the import of the event with the highest import in the main thread, plus 1 for every four events in the main thread. Thus, Main Thread Weight is closely related to the length of the main thread.
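The Main Thread Weight definition reduces to a one-line formula. A minimal Python sketch, assuming that "plus 1 for every four events" means integer division of the main thread's root-event count by four (so a partial group of fewer than four events adds nothing); the function name and sample imports are hypothetical.

```python
def thread_weight(imports: list[int]) -> int:
    """Highest import in the thread, plus 1 for every four events it contains.

    Assumes `imports` lists one value per root event in the thread and that a
    partial group of fewer than four events adds nothing (integer division).
    """
    if not imports:
        raise ValueError("a thread must contain at least one event")
    return max(imports) + len(imports) // 4


# Hypothetical 10-event main thread whose most important event has import 5.
print(thread_weight([1, 2, 5, 2, 1, 3, 2, 2, 1, 1]))  # 5 + 10 // 4 = 7
```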

Threads Spliced is the number of times during the story that two threads were successfully combined to form a single thread. Extra Threads is the number of threads besides the main thread that existed at the end of the story. Ideally, this value would be 0. However, as described in Chapter V, the way scene preconditions are authored can leave many short threads unreincorporated, even with a fairly passive or cooperative player. Unthreaded Unique Weight is the sum of the weights of the unique material in each Extra Thread. Thus, this measure is closely correlated with the number of extra threads, but it also reflects the import of the events in that unthreaded material.

As the reincorporation means above show, turning reincorporation on made a highly significant difference in the internal structure of the finished story. On average, twice as many root events were tied into the main thread, and three times as many threads were spliced together. There were also fewer than half as many extra threads at the end of the story when reincorporation was used. Thus, reincorporation successfully produced a much more unified story in terms of the system's own measures.
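The ratios behind "twice," "three times," and "fewer than half" follow from the ON and OFF means in the unity table above; a quick Python check:

```python
# Reincorporation ON and OFF means, copied from the unity table above.
means = {
    "Main Thread Size": (28.64, 13.88),
    "Threads Spliced": (3.23, 1.04),
    "Extra Threads": (5.45, 12.33),
}
# Ratio of the ON mean to the OFF mean for each measure.
ratios = {measure: on / off for measure, (on, off) in means.items()}
for measure, ratio in ratios.items():
    print(f"{measure}: ON/OFF = {ratio:.2f}")
# Main Thread Size: ON/OFF = 2.06
# Threads Spliced: ON/OFF = 3.11
# Extra Threads: ON/OFF = 0.44
```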

There was no significant difference in either story length or story unity resulting from play order.

Player Agency

The first measure of player agency is world-level agency: the percentage of input attempts that successfully produced deeds at the story level.

Measure                                  Mean    SD       Max    Min   Play 1 Mean  Play 2 Mean  Play Significance
Percent of inputs that produced a deed   80.5%   10.7 pp  100%   50%   76.6%        84.0%        t(21) = 3.00, p = 0.01

While there was wide variation between participants, the mean percentage indicates that about 1 of every 5 commands attempted by players resulted in an error or otherwise failed to affect the world or story. The mean level of world agency did, however, increase significantly between the two game sessions.
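World-level agency is a simple proportion per session. A minimal Python sketch with a hypothetical function and hypothetical session counts; the last line shows how the reported mean of 80.5% translates into "about 1 of every 5" failed inputs.

```python
def world_agency(inputs: int, deeds: int) -> float:
    """Percentage of input attempts that produced a story-level deed."""
    if inputs <= 0:
        raise ValueError("a session needs at least one input")
    return 100.0 * deeds / inputs


# Hypothetical session: 90 commands typed, 72 of which produced deeds.
pct = world_agency(90, 72)
print(f"{pct:.1f}% produced deeds")  # 80.0% produced deeds

# With the reported mean of 80.5%, failures are 19.5% of inputs:
print(round(100 / (100 - 80.5), 1))  # 5.1, i.e. about 1 failure in every 5 inputs
```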


The first requirement for story-level user agency is that the user perform actions of significant import.

Measure                    Mean   SD     Max   Min
Mean Action Import         2.46   0.16   2.89  2.12
Significant Action Count   3.19   3.15   12    0

Mean Action Import is the mean import of all player actions. Thus, it indicates the average import of the story-level effects of the player's deeds. Significant Action Count is simply the number of player actions with an import of 4 or higher. These are actions that would be significant enough to start their own threads if not already relevant to an existing thread.
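Both measures can be computed from the list of import values for a game's player actions. A minimal Python sketch; the threshold of 4 comes from the text, while the function name and sample imports are hypothetical.

```python
from statistics import mean

SIGNIFICANT_IMPORT = 4  # actions with import >= 4 count as significant (from the text)


def import_measures(action_imports: list[int]) -> tuple[float, int]:
    """Return (Mean Action Import, Significant Action Count) for one game."""
    significant = sum(1 for imp in action_imports if imp >= SIGNIFICANT_IMPORT)
    return mean(action_imports), significant


# Hypothetical imports for the player actions in a short game.
print(import_measures([2, 2, 3, 1, 4, 2, 5, 2]))  # (2.625, 2)
```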

These values show that the bulk of the player's actions were of fairly low import, along the lines of traveling, exploring, manipulating objects, and interacting with NPCs in a mild manner. This is not surprising, since these are exactly the most common deeds performed by players, as shown above. Moreover, on average, players produced only about 3 actions of significant import per story. So the demand placed on Marlinspike to reincorporate all significant player events is not particularly high in an average Demeter game.

Neither of these two measures of player action import differed significantly between the first game and the second game, or between games played with and without reincorporation. This suggests that players did not significantly vary the kinds of actions they performed between game sessions or in response to reincorporation.


Marlinspike's task is then to reincorporate user actions, particularly those of high import, into the finished story structure.

Measure                                       Mean    SD       Max    Min    Reinc ON  Reinc OFF  Reinc Significance
% of Actions in Threads                       55.0%   11.8 pp  76.1%  30.8%  59.2%     51.2%      t(21) = 2.05, p = 0.053
% of Actions in Main Thread                   24.5%   16.2 pp  63.1%  4.4%   36.8%     13.2%      t(21) = 7.02, p < 0.001
Count of Significant Actions in Main Thread   0.76    1.45     7      0      1.41      0.17       t(21) = 2.92, p = 0.008
% of Significant Actions in Main Thread       24.3%   35.8 pp  100%   0%     43.6%     3.9%       t(21) = 3.97, p < 0.001

These results show that Marlinspike was significantly more successful at this task when its reincorporation feature was used. First, a slightly higher percentage of all the player's actions were reincorporated into threads when reincorporation was used, although this difference is only on the border of statistical significance (p = 0.053).

However, the real goal for Marlinspike is to reincorporate user actions into the main story thread that connects the beginning and end of the story. When the reincorporation feature was used, the percentage of user actions reincorporated into the main thread increased roughly threefold. For significant actions, the increase was elevenfold.
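The main-thread percentages in the table above can be computed per game from each action's import and whether it ended up in the main thread. A minimal Python sketch with hypothetical action ids and a hypothetical function name; the text does not state how games with no significant actions were handled when averaging, so this sketch simply returns None for them. The last line checks the fold increases against the ON/OFF means.

```python
from typing import Optional

SIGNIFICANT_IMPORT = 4  # import threshold for a "significant" action, from the text


def main_thread_shares(
    actions: dict[str, int], main_thread: set[str]
) -> tuple[float, Optional[float]]:
    """Return (% of actions in main thread, % of significant actions in main thread).

    `actions` maps a hypothetical action id to its import; `main_thread` holds
    the ids of the actions that ended up in the main thread.
    """
    if not actions:
        raise ValueError("a game needs at least one player action")
    in_main = [a for a in actions if a in main_thread]
    significant = [a for a, imp in actions.items() if imp >= SIGNIFICANT_IMPORT]
    sig_in_main = [a for a in significant if a in main_thread]
    pct_all = 100.0 * len(in_main) / len(actions)
    # Assumption: a game with no significant actions has no defined percentage.
    pct_sig = 100.0 * len(sig_in_main) / len(significant) if significant else None
    return pct_all, pct_sig


actions = {"a1": 2, "a2": 5, "a3": 1, "a4": 4, "a5": 2}
print(main_thread_shares(actions, {"a1", "a2"}))  # (40.0, 50.0)

# Fold increases implied by the ON/OFF means in the table above.
print(round(36.8 / 13.2, 1), round(43.6 / 3.9, 1))  # 2.8 11.2
```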

While this performance improvement was both practically and statistically significant, it should be noted that many player actions were still not affecting the main story thread. On average, only about one third of all player actions and fewer than half of all significant player actions were made necessary to the main story thread in Demeter, even when reincorporation was used. So there is still some room for improvement here.

There was no significant difference between any of these means due to either play order or experimental group.

Notes

  1. Occasionally, in Demeter, more than one deed will be combined into a single action root event. This is because certain verbs (such as Look and Examine) do not always warrant a drama manager response, so two Examines followed by a Talk may be combined into a single action event. For this reason, Root Scenes + Player Deeds does not always equal the number of Root Events.
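The folding described in this note can be sketched as follows. This is a simplification, not Demeter's actual rule: it assumes Look and Examine never warrant a response on their own, whereas the note says they do not always do so, and the verb set and function name are hypothetical. In the example, five deeds yield three action root events.

```python
# Hypothetical simplification: these verbs never trigger a drama-manager response.
PASSIVE_VERBS = {"Look", "Examine"}


def group_deeds(verbs: list[str]) -> list[list[str]]:
    """Fold runs of passive deeds into the next deed that warrants a response."""
    events: list[list[str]] = []
    pending: list[str] = []
    for verb in verbs:
        pending.append(verb)
        if verb not in PASSIVE_VERBS:
            # A responsive deed closes the current action root event.
            events.append(pending)
            pending = []
    if pending:  # trailing passive deeds still form one event
        events.append(pending)
    return events


print(group_deeds(["Examine", "Examine", "Talk", "Go", "Look"]))
# [['Examine', 'Examine', 'Talk'], ['Go'], ['Look']]
```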
