VRML Browser

An HCI design project by Zach Tomaszewski

for ICS 667, Fall 2002, taught by Dr. Dan Suthers

Email Correspondence
- Initial Proposal (Assignment 5)
- Preparation (Assignment 6)
- Analysis (Assignment 7)
- Design (Assignment 8)
The Plan
- Root Concept
- Plan of Attack
Gathering Data
Synthesis and Analysis
Design
Prototyping and Testing
Conclusions

Email Correspondence

Initial Proposal (Assignment 5)

From: Zach Tomaszewski <zach@areteproductions.com>
To: "Zeus" <zeus@areteproductions.com>
Sent: Thu, 17 Oct 2002 13:55:53 -1000 (HST)
Subject: Project idea: VRML browser

Zeus--
I've got a new idea for you. The other day I was exploring a few VRML (Virtual Reality Modelling Language) pages on the Web. I downloaded a couple different browser plug-ins. They're all pretty hard to use. I think we should look into developing either a full VRML browser or at least a plug-in.
The popular conception of the future of computing is of a 3D environment, and developers continue to push into this area. But usually you need special hardware--a 3D mouse, VR goggles, power gloves, etc. Yet I don't think people buy this hardware because the payoff is too small--there just isn't enough 3D software out there to make it worth the expense. And yet how can developers really explore innovative 3D solutions if no one is outfitted to use them?
I think we should position ourselves to be part of the bridge. There is some existing 3D stuff out there, such as the VRML sites on the Web. If we can produce a usable browser for these that uses only current technology, we could be part of any 3D revolution, should it come. (Even if it doesn't sweep the world in general by storm, the small group of VRML users still needs a decent browser!)
Obviously, if we're going to produce a more usable product than what already exists, we'll have to do a lot of careful HCI design and usability testing along the way. I'm thinking of following Rosson and Carroll's scenario-based design, with a slight lean towards essential use cases rather than just scenarios.
I believe Sally's finishing up the Reynold's project next week. If so, even though Arete Productions is still a small company, I think we'll have enough man-hours to undertake this in a relatively timely manner.
I hope you're interested. Later this week, I'll send you a more formal development plan.
Zach.

Preparation (Assignment 6)

From: Zach Tomaszewski <zach@areteproductions.com>
To: "Zeus" <zeus@areteproductions.com>
Sent: Thu, 31 Oct 2002 16:18:36 -1000 (HST)
Subject: VRML browser--getting started

Zeus--
I have the root concept and design plan up. (I'm straying further away from SBD toward usage-centered design.) I've reviewed a number of browsers and am starting to select a couple to run artifact analysis/usability tests with. Also, I've sketched out an interview guide.
For next time, I still need to check out a couple browsers in more detail, especially Cosmo and WorldProbe. And I need to find some subjects for the browser run and interview. I've already started on some thinking about the tasks needed for a 3D browser.
Zach.

Analysis (Assignment 7)

From: Zach Tomaszewski <zach@areteproductions.com>
To: "Zeus" <zeus@areteproductions.com>
Date: Sun, 5 Jan 2003 21:30:42 +0400
Subject: VRML browser--stewing on it

Zeus--
It's been a while! Sorry about that. Hope you had a good holiday, and all the best in the New Year!
I've moved on to the analysis and synthesis stage of things. Instead of doing a browser run now, I'm pushing it back to a comparative usability test of our new browser versus one of the popular big boys. Instead of looking in more detail at the couple of 3D browsers, I looked a little wider and reviewed a few immersive gaming environments. A review of Tomb Raider made the browser list; their control changes based on context and their user-defined controls are interesting prospects.
The problem themes are still a little sparse, but the usability test might give us a bit more there. (This whole process theoretically allows such iteration, though I have my doubts about that.) I've made a bit of progress on the technical details of the prototype implementation, and it looks like that will be a "go", though it'll be a few more days before we know just what to try implementing.
The main product of this session was a load of activity claims, which are straying into the area of interaction claims. I've found the brainstorming quite helpful. Social contexts and user roles are more prevalent than I first imagined, but still probably won't be a big part in the first version of the browser. I threw in an activity scenario too, in case we need to share the sort of issues we're considering with the uninitiated.
The next step is working through those tradeoffs and picking an implementation--the tasks we want, the controls that support them, and in which contexts--that sounds like it will work. I'll probably double check things by doing an abstract context or two for the essential tasks.
More soon,
Zach.

Design (Assignment 8)

From: Zach Tomaszewski <zach@areteproductions.com>
To: "Zeus" <zeus@areteproductions.com>
Date: Sun, 10 Jan 2003 3:21:47 +0400
Subject: VRML browser--consolidating

Zeus--
Is there a word for that action of scooping all the playing cards together after a game and jostling them all together into a single neat pile before you can start shuffling?
Basically, this session I consolidated things down to what needs to get put into the design/prototype. There was some reiteration as I discovered a few features I'd overlooked; they've been added to Activity Claims. (Though the further along in the project I get, the less I want to iterate.)
The most interesting realization is that I can't remember any VRML browser I've seen ever having clickable navigation buttons that control motion or orientation. All allow keyboard control, most allow control by dragging the mouse across the screen, and most have buttons toggling the navigation mode (which affects both mouse drags and the arrow keys). But I can't recall any having screen buttons that actually control motion. It's a "deficiency" I'm keen on exploiting. The downside is that I'm not sure if the Cortona XML skin plan for the prototype will still hold up, since mostly the skin maps buttons to modes. We'll see.
At this point, I am realizing that I don't know much about VRML itself. It's hard to design a usable browser if its controls are of a different paradigm than the underlying protocol for motion. This personal "limitation" was recognized at the outset in the Root Concept, so I knew it was coming. The flipside has been designing with a "beginner's mind." But as we move into Design and Prototyping, it's making me a little nervous.
I did check out what the VRML97 specs have to say about Navigation modes. It mentions 5:

Any (allows user to specify dynamically)
Walk (collision detection, terrain following, and gravity)
Fly (collision detection; terrain-following and gravity may be ignored)
Examine (viewing an object, with optional rotation around the object, etc)
None (removes all navigation and forces the user to use navigation--such as links--provided by the scene)

What's interesting is how many browsers have adopted these modes in their interfaces. Actually, it seems these only deal with how the browser handles collision, terrain, and gravity. So they can largely be dealt with behind the scenes (though we may need to grey-out certain options, such as "up" when gravity should not be overcome). It seems the control scheme(s) I've designed remain unaffected.
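A minimal Python sketch of how the modes could be handled behind the scenes. The mode names come from the list above; the `ModeRules` fields, the `MODES` table, and the `enabled_controls` helper are all hypothetical, and the settings for Examine in particular are a guess rather than anything the spec states. "Any" is left out because it defers the choice to the user.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModeRules:
    collision: bool   # stop the viewer at obstacles
    gravity: bool     # pull the viewer down onto the terrain
    navigable: bool   # browser-supplied navigation is allowed at all


# Assumed reading of the modes listed above (illustrative, not normative).
MODES = {
    "WALK": ModeRules(collision=True, gravity=True, navigable=True),
    "FLY": ModeRules(collision=True, gravity=False, navigable=True),
    "EXAMINE": ModeRules(collision=False, gravity=False, navigable=True),
    "NONE": ModeRules(collision=False, gravity=False, navigable=False),
}

ALL_CONTROLS = ("forward", "back", "turn-left", "turn-right", "up", "down")


def enabled_controls(mode: str) -> list[str]:
    """Return the controls that should NOT be greyed out in this mode."""
    rules = MODES[mode]
    if not rules.navigable:
        return []
    if rules.gravity:
        # Gravity should not be overcome, so vertical motion is greyed out.
        return [c for c in ALL_CONTROLS if c not in ("up", "down")]
    return list(ALL_CONTROLS)


for mode in MODES:
    print(mode, enabled_controls(mode))
```

The point of the table is that the user-facing control set stays the same in every mode; only the enabled subset changes.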
Anyway, at this point I now have two prototypes. I want to run some basic usability tests on them and clarify them in the next (and final) stage.

The Plan

Root Concept

Mission: To create a novice user interface to a 3D browser using commonly available input devices such as a keyboard and a mouse.
Rationale: The online 3D world could use a stepping stone for novice users. These users are very unlikely to have a 3D mouse or stereoscopic goggles. Also, they may not be used to the controls used by experts. A simple, versatile browser would be of great help to these users.
Stakeholders:
- Customers/end users (in various permutations of experience, preference, and use)
- Developer [me!]
- Management [instructor and classmates]
- Project financiers (advertisers? stock-holders? grant? paying customers?) [Probably none in the short term]
- VRML site developers
Limitations:
- Developer skill -- programming quality, interface/graphics attractiveness
- Time (about one month, part-time)
- Funding (none)
- Coming up with something new over existing 3D browsers
- Current, normal technology: 2D computer screen, keyboard, and a mouse
- Developer's 3D/VRML knowledge. (Not knowing anything about VRML and 3D means I have "beginner's mind" and a fresh outlook. But it also means I don't have any experience of the different common problems or situations present in 3D.)
Assumptions:
- That 3D worlds are novel, exciting, or useful enough to warrant attention.
- That it is possible to build an acceptable 2D interface to a 3D world.
- That it is possible to build a better (either more efficient, easier to use, or more fun) browser interface. (If not, we should merely be supporting an existing browser rather than mindlessly swamping the market with our own.)

Plan of Attack

This is the current design plan. It is subject to change. The table of contents will probably be the best indicator of the final path taken.

The Plan: This includes the Root Concept and other meta-considerations that will guide the design process.
Gathering Data: First off, there are a number of existing 3D browsers. These should be examined for good and bad design ideas. Since I am a 3D browser novice myself, I should have a good sense of what works for the layman. However, to ensure this, some sort of short interview and comparison of existing designs by a handful of novice users would give some usability info before I get started. Findings from this stage should be consolidated into problem "themes."
Synthesis and Analysis: In the spirit of activity design, an analysis of the basic tasks and actions needed to navigate a digital 3D space should be considered, outside of any existing interface. This will provide us with some essential use cases. Also, I'll start considering tradeoffs of problem themes with some activity claims. Though I'm not planning on including chat and other social features, there are still certain contexts/modes and user roles that will influence the tasks. This step may also include an activity scenario or two.
Design: Armed with a firm sense of the tasks that need to be supported and some ideas of the tradeoffs involved for each, I'll work on how the system will support them. This will involve both the interaction/controls and information/display aspects. I will probably sum this up in an abstract prototype, though some scenarios might be helpful too.
Prototype and Test: Now to instantiate the design. I'd also like to briefly test it to see if people prefer the design over existing browsers.

Gathering Data

Existing Browser Review (Artifact Analysis)

Most of the browsers below are from Web3D Consortium's list of browsers.

Installed Browsers

These browsers I actually installed and used to explore or manipulate a 3D world.

Actify. Mostly deals with CAD. Handles some VRML. Free plug-in. Only handles object manipulation, not movement of the view through a world. In one mode, the user can grab and move the object around in place with the mouse. In pan mode, the user can drag the object up, down, left, or right on the screen. Also has zoom controls.

blaxxun Contact. These people have been severely influenced by Neal Stephenson's virtual cyberpunk world, Metaverse, from Snow Crash. They even have the Black Sun club in their online world Cybertown. And so their browser supports avatars, gestures, chat, etc.

Can move with the mouse or keyboard. I find the keyboard easier (due to excessive old-school computer gaming?). With the mouse, click and drag in a direction to move/turn that way. Either way, there are four cardinal 2D directions to choose from.

There is a right-click menu to select mode (also key combinations):

Walk. Move forward, back, turn left, turn right
Slide. Move up, down, slide left, slide right
Fly. Like walk, but gravity does not pull you down.
Examine. Turn up/tilt back, turn down/tilt forward, turn left, turn right. (Some strange correcting behavior: the view straightens up again. Ah, it's probably examining the center of the world, which is why the behavior is strange. Learned this after reading about Cortona.)
Pan. Like examine, but works as expected. Limited to normal head motion--tilt stops at not quite straight up and not quite straight down. Turning to the side is unlimited.
Jump. Click on a space/object and avatar will smoothly slide there.

Mouse turns to a hand and can click on objects. Possible to include the avatar in view (3rd person vs. 1st person). Supports viewpoints--designer-selected locations of note in a world.

Holding down Control allows for panning up and down and sliding left and right. (In Walk mode?) Shift increases speed. Has an interesting feature that movement is slow for first second, then faster after that, which allows for fine/delicate movement when needed.
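The slow-then-fast behavior is easy to sketch in Python. Only the one-second threshold and the fact that Shift increases speed come from my notes above; the function name, the speed values, and the doubling under Shift are made up for illustration.

```python
def movement_speed(held_seconds: float, shift: bool = False,
                   slow: float = 0.5, fast: float = 2.0,
                   ramp_after: float = 1.0) -> float:
    """Return a speed in (hypothetical) world units per second.

    Movement stays slow while the key has been held for less than
    `ramp_after` seconds, which allows fine positioning, then jumps to
    the faster rate.
    """
    speed = slow if held_seconds < ramp_after else fast
    # Shift increases speed; doubling is an arbitrary illustrative choice.
    return speed * 2 if shift else speed


print(movement_speed(0.2))              # short tap: slow
print(movement_speed(1.5))              # held past one second: fast
print(movement_speed(1.5, shift=True))  # held with Shift: faster still
```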

Version 5.1 doesn't support VRML1.0. Need 4.4 for that.

Parallel Graphics' Cortona. Of these browsers, this is the most frequently downloaded from CNet's Downloads.com. Keyboard or mouse (again, only 4 directions/buttons at a time). 3 modes: walk, fly, study. 4 options for each mode: plan, pan, turn, roll. (I had to learn this from the manual--all 7 buttons are along the left side; though spaced and labelled, that doesn't help much in determining how they work combined.) (Interesting that these are so similar to blaxxun's Contact. Do they know something I don't?) This means there are 12 possible combinations. I'd like to sound knowledgeable here and tell you how it all logically works, but I didn't spend the time to figure it out. I just skimmed the page. Besides these 12 "states", pressing space, alt, control, or shift during these changes speed and possibly functionality.

Study is used to let you move the world/object, rather than yourself. This can be confusing because you can roll around the center of some object, which is often the center of the world. When the world is moving, it's not clear that you're not moving yourself, except for certain slight weirdnesses.

Other buttons: restore (to original entry state), align (get back to vertical after you've messed yourself up), goto (like blaxxun's jump), and cycling for viewpoints.

Can create XML skins that change not only the look, but also the functionality of controls! I think I can actually put new/more/different control buttons on the screen. Look further into this for prototyping (as an alternative to Java3D). Also a number of extensions available.

Tomb Raider. Not a browser, but a computer game. Much could be learned from first-person games, since the basic motion dynamics are the same. However, games are getting increasingly complex to control (as can be seen by the evolution of game system controllers). So though they may be a source of innovative solutions, they may not be the most usable designs for novice users.

Tomb Raider follows the standard forward/back, turn-left/turn-right arrow controls. Vertical motion is achieved through climbing or jumping (which is not always possible in VRML worlds). There is also a step-left and step-right (which is equivalent to slide-left and slide-right). Holding down the look key means the view can move while the character (Lara Croft) does not. There is also a walk key to slow motion, and a roll key to quickly change direction. (Other specific keys allow for drawing a weapon, firing, accessing inventory, jumping, using/grasping, and lighting a flare.)

There are some interesting context changes in the game, where the same keys produce different motions. When climbing a vertical surface, the left and right arrows now slide the character left and right; up and down now move up and down, rather than forward and back. Yet the change is hardly noticeable because a vertical rock face only affords certain motions.

When swimming underwater, up and down now rotate up and down (to swim deeper or to swim up to the surface). Left and right still turn-left and turn-right. Forward motion is now controlled by the jump key; this makes some sense because Lara Croft has a pretty killer frog-kick/breaststroke. There is no backwards motion underwater.

Another interesting aspect is that, with so many keys, it is possible to define your own key bindings. The game has two sets of key bindings: Default and User. Both come preset. Default is not editable; it uses the arrow keys, Alt, Ctrl, Shift, Enter, etc. The User keys are set to use primarily the number keypad, though they can be changed to use any key. If the same action is bound to a different key in both the Default and User setting, User settings take precedence.
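The precedence rule can be sketched in a few lines of Python with a layered lookup. The specific action names and key assignments below are hypothetical (I didn't record the game's actual presets); only the "User shadows Default" behavior comes from the description above.

```python
from collections import ChainMap

# Hypothetical presets: action -> key. Default is treated as read-only.
DEFAULT_KEYS = {"forward": "Up", "back": "Down", "jump": "Alt", "walk": "Shift"}
USER_KEYS = {"forward": "Numpad8", "back": "Numpad2"}

# ChainMap searches its maps left to right, so a User entry shadows the
# Default entry for the same action, and Default fills any gaps.
effective_keys = ChainMap(USER_KEYS, DEFAULT_KEYS)


def key_for(action: str) -> str:
    """Return the key currently bound to an action."""
    return effective_keys[action]


print(key_for("forward"))  # bound in both sets, so the User key wins
print(key_for("jump"))     # bound only in Default
```

Because `ChainMap` holds references to the underlying dictionaries, editing `USER_KEYS` later changes the effective bindings without touching the defaults.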

Reviewed Browsers

These browsers I didn't actually use, but examined their manuals or other documentation to learn how they deal with navigation.

Cosmo Player. Their caveat: Some VRML worlds are more movement oriented, others force object manipulation. Some have a default but allow the other form of motion.

2 major modes with three minor modes for each.

Movement:
- Go: move in any direction (vague description; I assume forward, back, turn left, turn right)
- Slide: up, down, left, right
- Tilt: back-and-forth or side-to-side
Examine:
- Rotate the object
- Pan the object left, right, up, down
- Zoom in or out

Control or Alt switches modes within either Movement or Examine. Other controls: undo/redo last move, activate objects, (controlling movement can be tricky in a busy, activating world; use seek to go to an object without activating it), viewpoints (next, previous, list).

Adobe's Atmosphere. This is not actually a VRML browser, but one for Adobe's proprietary 3D format (typical). They have set up a test world based on the movie Dark City. The following is according to the PDF user manual.

With mouse or keyboard, default movement is: up, down, turn left, turn right.
Holding shift allows you to move: up or down, slide left and slide right.
(Moving up with gravity on means you fall back down when you stop moving.)
Holding control means you pan: turn up, down, left, or right.
(Note: this is the same mode setup as Cosmo. It sounds like a good one.)

Other buttons include: restart (back to your "home" world), back (return to the starting point of the current world), make gate (basically bookmark current position), and screen shot. The browser also supports chat and avatars.

Settable options: gravity (on/off), collision (on/off), acceleration, and max velocity. Acceleration only works with arrow keys.

UpperCut's WorldProbe. Not much information here. Their navigation controls screen shot implies 4 modes--walking, flying, viewing, and object handling, as well as a zoom and jump. No FAQs yet. This might be one to download and investigate further.

OpenWorld. An X3D browser. Available for integration into other applications. There does exist a stand-alone OpenWorld Horizons browser. Need to agree to a long license and sign up to download, so I didn't.

Eyematic's Shout3D. Not a very informative web page. System requirements: 200MB disk space, 512MB memory, 700MHz processor. More than I can deliver!

Interview and Browser Trials

Do you have any experience with online virtual reality or 3D worlds?
What sort of 1st person computer gaming experience do you have? (Wolfenstein 3D, Doom, Quake, Tomb Raider, Halflife, etc)
Are you interested in 3D worlds?
What controls do you think you would want in order to move through a 3D world (forward, back, turning left and right, etc.)?

Browser trial tasks/format still under construction. Ideas so far:
--2 browsers with different controls, same world
--possibly a very brief verbal description of each browser's controls
--2 to 3 tasks to complete (in each browser)
--get users to describe how they are trying to complete the task
--a post-task discussion of what they liked and didn't like about the controls
--discuss alternative setups, modeless controls, etc.

Problem Themes

These are intended only to be a consolidation of important trends or points of the current state of affairs as discovered during data gathering. Their tradeoffs (if applicable) will be analyzed later.

Sometimes, as with small objects, you want to move the object. In immersive worlds, you usually want to move yourself.
Moving a scene, when you think you're moving yourself, is subtly confusing.
In every browser viewed so far, there are modes. The same controls are used, but produce different results.
Arrow keys allow for only 4 inputs. Strangely (?), the mouse does too. Though more may be possible, the four directions of a 2-dimensional plane are what's expected. The mouse does allow for speed control, however, depending on the distance the cursor is dragged.
As Stephenson pointed out, it's important to support realism. Probably why there is a difference between flying and walking modes.
In most browsers, you can switch off gravity and collisions. (Usually you can still fly up with gravity on. What's the difference there? I think it's that you fall back down when you stop moving up.) (Fly might also have to do with collisions.)
Most browsers include a jump. And a reset.
Worlds often include designer-designated viewpoints. In some worlds, movement is not permitted--only panning or viewing from these points.
In most browsers, moving is like walking with a stiff neck brace. Unlike the real world, you cannot turn your head and view something to the side as you continue forward.
...

Synthesis and Analysis

Essential Task Analysis

I'm not sure whether this technically classifies as Essential Use Cases or Hierarchical Task Analysis, but I know what I want to do: look at the basic elements of motion in 3D space.

[Figure: 3D coordinate axes]

As one can see, in three dimensions, there are 6 possible directions of motion along the 3 axes:

Along the x-axis: left or right
Along the y-axis: up or down
Along the z-axis: forward or backwards

Any point in the coordinate volume can be reached with a combination of these motions.

But in the real world, we also like to look around as we travel. Though we may be able to reach any point with the above 6 directional motions, we might not be able to see much when we get there--such as the backside of a large object. Thus, we also need rotations. If we rotate around each of the 3 axes, we come up with 6 possible rotations:

Around the y-axis: turning left or right (yaw)
Around the x-axis: leaning back or forwards (pitch)
Around the z-axis: leaning right or left (roll)

With these 12 movements and rotations, we can reach and view any point in a 3D space. They are essential. (Other functions, like realigning, jumping, resetting, etc., would be nice, but are shortcuts on these.)
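The count of 12 falls straight out of 2 kinds of action x 3 axes x 2 directions, which this small Python sketch enumerates. The action names follow the two lists above; which direction counts as "positive" on each axis is an arbitrary choice made here, and the `essential_actions` helper is hypothetical.

```python
from itertools import product

# (kind, axis) -> (name for + direction, name for - direction).
# Names come from the lists above; the sign convention is an assumption.
ACTION_NAMES = {
    ("move", "x"): ("right", "left"),
    ("move", "y"): ("up", "down"),
    ("move", "z"): ("forward", "backward"),
    ("rotate", "x"): ("lean back", "lean forward"),  # pitch
    ("rotate", "y"): ("turn left", "turn right"),    # yaw
    ("rotate", "z"): ("lean right", "lean left"),    # roll
}


def essential_actions() -> list[tuple[str, str, int, str]]:
    """List every (kind, axis, sign, name) combination."""
    actions = []
    for kind, axis in product(("move", "rotate"), ("x", "y", "z")):
        positive, negative = ACTION_NAMES[(kind, axis)]
        actions.append((kind, axis, +1, positive))
        actions.append((kind, axis, -1, negative))
    return actions


for kind, axis, sign, name in essential_actions():
    print(f"{kind:6} {axis} {sign:+d}  {name}")
print(len(essential_actions()), "essential actions")
```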

[Figures: directions of possible motion; possible rotations]

It should be noted, however, that when you add rotations, the 6 directions of motion are no longer all essential. The fewest, barest-bones controls that would get you anywhere (and see anything from there) are only: forward, turn up (or down), turn right (or left). Anything beyond these just makes things easier.
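A quick worked check of the "barest controls" claim, as a Python sketch. It assumes the axes described above (x to the right, y up, z forward), a viewer starting at the origin facing along +z, and a turn-up that pitches relative to the viewer's current heading. The function names are hypothetical. Because only turn-right and turn-up exist, a leftward or downward turn is expressed as going the long way around.

```python
import math


def minimal_route(tx: float, ty: float, tz: float) -> tuple[float, float, float]:
    """Return (turn_right, turn_up, forward) amounts that take the viewer
    from the origin to the point (tx, ty, tz). Angles are in radians."""
    ground = math.hypot(tx, tz)
    # The modulo turns a "negative" turn into the equivalent long-way turn.
    turn_right = math.atan2(tx, tz) % (2 * math.pi)
    turn_up = math.atan2(ty, ground) % (2 * math.pi)
    forward = math.sqrt(tx * tx + ty * ty + tz * tz)
    return turn_right, turn_up, forward


def follow(turn_right: float, turn_up: float, forward: float) -> tuple[float, float, float]:
    """Apply the three controls in order and return the final position."""
    x = forward * math.cos(turn_up) * math.sin(turn_right)
    y = forward * math.sin(turn_up)
    z = forward * math.cos(turn_up) * math.cos(turn_right)
    return x, y, z


# A target behind-left-below nothing in particular: left, down, and ahead.
route = minimal_route(-3.0, -2.0, 5.0)
print(route)
print(follow(*route))
```

The long-way-around turns in `minimal_route` are exactly why such a scheme would likely be frustrating in practice: overshooting by a little costs nearly a full rotation.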

Activity Claims

We know, from the Root Concept, that we will be using keyboard and/or mouse to control motion. Much of this discussion is too concrete to rightly deserve the title Activity Claims. Yet there is a smooth slide between Activity and Interaction Claims, so we'll keep things together here, though most specific key binding discussions are indented a bit.

Object motion (in addition to or instead of viewpoint motion)

+Sometimes, if the object is small, interaction makes more sense this way.
+Allows richer interaction, manipulating the environment directly
-Possible to move around an object and achieve the same views.
-Confusing if you are expecting viewpoint motion; object motion can move the whole world, with subtle and frustrating differences over viewpoint
-Currently in VRML (as far as I know) a user cannot move distinct objects within a world (the world is the object).

Other controls (in any format): jump, realign, cycle viewpoints, object mode, restore.

+Added ease and functionality
-Added screen/control clutter. (All of these are possible through basic controls, but not always easy.)

Simultaneously control view separate from motion

+More realistic, immersive motion
+Allows something as real-life simple as looking right and left while walking into a room
-Motions themselves might be tricky to implement
-Requires separate user controls (with sufficient feedback that users know they're simply changing the view, and not their avatar's orientation).
--User must be coordinated enough to work both controls--view and motion--simultaneously. (If they can't be used simultaneously, then changing view by changing orientation is good enough.)

Using keyboard keys as controls

+Keys are, for many, easier or faster to use
+Allows for more simultaneous actions. (You can only drag a mouse in one direction at a time; you can press a number of different keys at the same time.)
-Not always clear which keys match with which actions
-Mouse can control speed by drag speed; different mouse buttons can also be used to control context.
-Mouse is probably the first method users will try to interact with the system.

Arrow keys

+First keys most people will try when attempting to invoke movement
-Arrow keys alone can only control 2 dimensions at a time, as illustrated by these common game combos:

can be forward/back (left/right along the screen), up/down for controlling a simple character in a cross-section, 2d world
can be forward/back, turn left/turn right for an "over the shoulder" or "1st person" view of a walking character
can be stick forward/back (which corresponds to nose down/up), bank left/right for a flight simulator where a single key can be bound to nearly any motion, given the right context.

-3d world has need for 3 pairs/axes of motions, rather than 2 pairs/axes

Switching arrow key contexts with Alt, Ctrl, and Shift keys

+Most existing browsers support it, implying it's a "natural" choice (at least among the software designer populations).
+Alt, Ctrl, and Shift normally change the behavior of other keys; probably the best choice for controlling context changes for a set of keys
+First buttons, after arrow keys, experienced gamers would try
+Having to hold down an extra key (as compared to a press-release/toggle action) impresses awareness that another mode is being used
-Not very often used by novice computer users, who may be hesitant to experiment with the strange, power-user buttons.
-There may be Windows Menu buttons, function buttons, etc in the same region of the keyboard that get in the way of clean use. (Personal note: it took me a long time to retrain my gaming fingers after Windows 95 and that stupid new menu button came out!)

The following are some of the more promising possible key combos. Normal means arrow keys without additional context; Alt, Ctrl, and Shift are the actions of arrow keys when those keys are also pressed.

Minimalist.
Normal: forward/rotate back (look up), turn-left/turn-right
+A variation on the "barest controls" necessary, as discovered in ETA; no other keys or contexts would be needed.
+Does support the most commonly used keys: forward, turn-left, turn-right.
-Theoretically interesting, but probably frustrating in practice. Over-rotating would require a 360 degree turn to get back again. Turning a 2d view of a 3d world is already rather disorienting. In short, novel and possible, but slow and irritating (like so many miniaturization efforts).

3d Browser Default.
Normal: forward/back, turn-left/turn-right
Ctrl: up/down, slide left/slide right
Alt: rotate forward (look down)/rotate back (look up), roll left/roll right
Shift: speed increase

+Most commonly used by reviewed browsers (with some small variation, such as whether roll-left/roll-right is instead turn-left/turn-right, or which state is invoked with Ctrl or Shift). Even if this is not the wisest choice for all users, it would meet expectations of 3d browser users.
+Normal keys correspond to the keys most frequently used in a 1st person perspective
+Ctrl invokes two (rather unnatural) forms of motion dealing with motion in the xy plane
+The Alt key, as an alternative to motion, deals more with changing orientation or view.
+Shift is usually "bigger" or more, and so seems to fit well with speed increases
-Roll is probably not used that often, so a user trying to "look" around would need to alternate between Normal and Alt key sets. (Roll could be dropped for a repeat of turn-left/turn-right).
-Though there might be slight mnemonics to the key sets, they still need to be learned.
-Alone, like all key bindings, they lack much affordance or learning aids.
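For concreteness, the "3d Browser Default" scheme can be written as a lookup table. This is only a Python sketch: the key names and the `resolve` helper are hypothetical, and I've assumed each pair is listed in up-arrow/down-arrow and left-arrow/right-arrow order. Ctrl and Alt held together is treated as unbound, since the scheme as listed doesn't define it.

```python
from typing import Optional

# (context key, arrow key) -> action, following the scheme listed above.
BROWSER_DEFAULT = {
    (None, "Up"): "forward", (None, "Down"): "back",
    (None, "Left"): "turn-left", (None, "Right"): "turn-right",
    ("Ctrl", "Up"): "up", ("Ctrl", "Down"): "down",
    ("Ctrl", "Left"): "slide-left", ("Ctrl", "Right"): "slide-right",
    ("Alt", "Up"): "rotate-forward", ("Alt", "Down"): "rotate-back",
    ("Alt", "Left"): "roll-left", ("Alt", "Right"): "roll-right",
}


def resolve(arrow: str, ctrl: bool = False, alt: bool = False,
            shift: bool = False) -> Optional[tuple[str, bool]]:
    """Return (action, fast) for a key press, or None if nothing is bound.

    Shift never changes the action; it only sets the `fast` flag.
    """
    if ctrl and alt:
        return None  # not defined in this scheme
    context = "Ctrl" if ctrl else "Alt" if alt else None
    action = BROWSER_DEFAULT.get((context, arrow))
    return None if action is None else (action, shift)


print(resolve("Up"))
print(resolve("Left", ctrl=True, shift=True))
print(resolve("Down", alt=True))
```

Three contexts of four arrows each give exactly 12 entries, one per essential action.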

Variations.

Minimalist: have a different key, such as spacebar, move forward; arrow keys all look.
Browser Default: basically any combination of the 12 basic actions discovered in ETA, broken into 3 sets of 4.
Browser Default: add more contexts. Ctrl+Alt would make a good flight sim mode: automatically move forward, with pitch up/down and roll left/right. (As far as I know, no browser has implemented a flight mode such as this; "fly" usually just means gravity can be overcome and the avatar can be moved vertically.)

Number pad

+A good, close-second alternative to straight arrow keys
+A number of other possible key bindings in close proximity (1,3,7,9,0, +, -, *, ., etc).
+12 essential actions can be encoded without use of context keys
+Can use Page Up or Page Down for motion up or down
-Not all computers (e.g., laptops) have separate number pads
-Number pad configurations are not universal.
-While Page Up and Page Down do have a sort of mnemonic to them, they are not necessarily clear at first
-Keypads require a certain amount of coordination not inherent to most users.
-Keypads are not usually fully explored when trying out new software (At least, I don't usually, unless prompted in that direction.)

The following is one promising, example number pad key binding set.

(One line per keypad row, top to bottom.)

- = Roll back/Look up
7 = Slide-left, 8 = Forward, 9 = Slide-right, + = Look down/Roll forward
4 = Turn-left, 5 = Up, 6 = Turn-right
1 = Roll-left, 2 = Back, 3 = Roll-right
0 = Down
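The number pad binding set above can also be written out as a table and checked for coverage. In this Python sketch the action identifiers and the `actions_for` helper are hypothetical; the key assignments are the ones from the example set. It also shows the advantage noted earlier for keys in general: several can be held at once.

```python
# Key -> action, from the example number pad set above.
NUMPAD = {
    "-": "look-up", "+": "look-down",
    "7": "slide-left", "8": "forward", "9": "slide-right",
    "4": "turn-left", "5": "up", "6": "turn-right",
    "1": "roll-left", "2": "back", "3": "roll-right",
    "0": "down",
}

# The 12 essential actions from the Essential Task Analysis.
ESSENTIAL = {
    "forward", "back", "slide-left", "slide-right", "up", "down",
    "turn-left", "turn-right", "look-up", "look-down",
    "roll-left", "roll-right",
}


def actions_for(pressed: set[str]) -> set[str]:
    """Return every action triggered by the keys currently held down."""
    return {NUMPAD[key] for key in pressed if key in NUMPAD}


print(sorted(ESSENTIAL - set(NUMPAD.values())))  # actions with no key
print(sorted(actions_for({"8", "6"})))           # move and turn together
```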

Single key-to-action mappings

+(Once learned) allows for a one-to-one control-to-action correspondence, which can be helpful to learn a system
+The action corresponding to that key is always available because that key is always available (unlike some of the very context-specific controls of the reviewed browsers)
-Not "intuitive": people are used to moving the view with arrow keys, not with letters or numbers.
-Certain actions may not be available in certain contexts; can't "grey-out" a keyboard key
-No aids to learn the key combos.

The following is one example of this sort of binding. (The keys are grouped more by use than by similar function; navigation is primarily through the right hand.)

W = look up/rotate back
S = look down/rotate forward
A = slide left
D = slide right
Q = roll left
E = roll right
I = forward
K = back
J = turn left
L = turn right
U = up
M = down

As shown by this real-world example, this can be clumsy and unintuitive. It would be workable really only if the user picked the keys herself.

Redefinable keys

+Customizable per user to fit their personal preferences
-Does not negate the need for a good default setting (If we as designers can't come up with a decent set of key bindings, how can a novice user?)
-Users have to understand the system and the actions the keys correspond to before they can reassign them.

Screen controls

+Usually the first method to attempt controlling an application (if the controls are clearly visible and appear to be controls)
+Allows more feedback (greying-out, depressed "button" graphics, etc.)
-Keyboard keys are frequently a faster form of control
-The presence of screen controls does not normally encourage users to also experiment with the keyboard
-Takes up screen real estate, reducing the possible size of the view of the 3d world
-Requires mouse dexterity, especially if controls are small

One idea for screen controls:

2x3.
12 buttons arranged into 2 3D axes--basically as shown above in ETA--where one set of axes controls the 6 possible rotations while the other controls the 6 possible motion directions.
+Implied by and clearly supports Essential Tasks discovered above
+A simple toggle switch between motion and object-manipulation modes would be possible, since the controls would apply equally well to both modes.
-Can we graphically depict 3d controls clearly enough to be recognized and understood?
-Correspondence to keyboard controls would be difficult (though not impossible, if we go with a one-to-one key-to-screen-control binding)
-Normal navigation must switch between the two: forward and turn-left/turn-right are on two different sets of axes.
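The 2x3 layout can be written out as data to check that it really covers the 12 basic actions. This is only a sketch: the axis letters `x`, `y`, `z` and the names `CONTROL_SETS`, `all_controls`, and `sets_needed` are assumptions made for illustration.

```python
# Two sets of three axes, two directions per axis: 2 x 3 x 2 = 12 buttons.
# Axis letters are illustrative labels, not taken from any browser.
CONTROL_SETS = {
    "motion": {
        "x": ("slide left", "slide right"),
        "y": ("down", "up"),
        "z": ("back", "forward"),
    },
    "rotation": {
        "x": ("look down", "look up"),
        "y": ("turn left", "turn right"),
        "z": ("roll left", "roll right"),
    },
}


def all_controls(control_sets=CONTROL_SETS):
    """Flatten the layout into (set name, axis, action) tuples."""
    return [
        (set_name, axis, action)
        for set_name, axes in control_sets.items()
        for axis, pair in axes.items()
        for action in pair
    ]


def sets_needed(actions, control_sets=CONTROL_SETS):
    """Return which of the two axis sets a group of actions touches."""
    return {s for s, _axis, a in all_controls(control_sets) if a in actions}


# Ordinary walking uses forward plus turning, so it needs both sets.
walking = sets_needed({"forward", "turn left", "turn right"})
```

The `walking` result makes the last drawback concrete: the most common navigation actions are split across the two sets of axes.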

One-to-one key-to-screen-control mappings: +Labeling a screen control with the corresponding keyboard key control would make it clear that another method for the same action is possible
+Feedback from key presses would be possible (highlighting the corresponding screen control on a keypress, for instance)
-Labelling with key names clutters the screen controls
-Screen controls such as scroll bars, radio buttons, etc do not always map cleanly to keyboard controls
-If screen controls are supposed to serve as a guide to the keys, then the controls should be organized similarly to the key bindings.
Listing key bindings in pop-up, mouse-over tool tip: +Saves screen space
+Reduces information overload and a cluttered appearance
-Requires user action to discover the tip. (Tool-tip = half-way between a full screen control and being documented in the Help section.)
List non-navigation items--help, exit, options, settings, etc.--in a menu: +Saves screen space, since most are only used occasionally
+If in right-click menu, accessible in fullscreen too
-Requires hunting through a menu hierarchy to find
-If in a right-click menu, it might not be found at all
Putting fullscreen in a menu rather than on screen controls: +Once you've gone to fullscreen, there's no control to get back unless you've remembered the key combo. It is dangerous to offer users such one-way trips where the return trip is through a different route.
+The menu button (or right-click menu) is likely to be the only button/feature available in Fullscreen mode. The toggle to turn it off should be there too.
+saves screen space for more frequently used controls
+Harder to find as a feature (for the unwary).
-Harder to find as a feature (for those who want it)
-Many other mode buttons are on the screen; why break the continuity?

Social Contexts and User Roles

In general, browsing a 3D world is not a social activity. Navigation is not usually collaborative, though your friend might be watching over your shoulder telling you where to go.

Within some worlds, there are social aspects. Avatars can interact with each other through gestures or even engage in some form of combat. Users may need to change their avatars for different worlds. Sometimes this is simply for technical reasons, such as their avatar being too big to fit through a door. But other times, they may simply want to be represented differently depending on who they are interacting with. (As Neal Stephenson points out in Snow Crash, a 6-foot, walking, talking penis is not the avatar of choice in all circles.)

Many worlds now support chat as well, which is certainly a social activity. Like websites, some worlds are primarily places for interaction with other users, while other worlds are simply meant to be explored.

There doesn't seem to be a big difference in user roles either. Of course, there will be novice or casual users who need a working, easy browser quickly for occasional use. Expert or power users may desire more customization, such as updating key bindings. World authors may also want additional features, such as being able to walk through walls which normally cannot be walked through.

Other user groups may be minors whose parents would like to restrict access to adult-oriented worlds. Different users speak different languages, which may also be a concern. Besides language differences, keyboards are often different around the world, especially with respect to letters available and their location on the keyboard. This could be important for some default key-bindings.

Activity Scenario

This is an activity scenario intended to convey what this sort of software should allow, independent of actual controls.

Bob has just entered a fantasy world. In the distance through the palm trees he sees a strange shape. It looks like a statue of a rabbit. He moves forward in that direction, skirting around a palm tree on the way. When he gets to the foot of the statue, he can no longer see the top of the statue in his field of vision. He looks up. Hmm, maybe it's a different animal, like a kangaroo. Looks like he's standing on a broken clock or a jukebox. But what's that he's holding? A ball? Bob wonders if he can zoom in his vision, like some sort of bionic man, but he doesn't know how. Instead, he decides he'll fly up closer to the "ball" and see what it is. He does so, and discovers that the kangaroo is actually holding two.. acorns? No, maybe they are alarm clock bells. "Curiouser and curiouser," says Bob.

An illustrative screen shot of the Cortona browser at http://www.auzgnosis.com/vrml/anzac/kangab_6.wrl

Design

Abstract Prototype

Shown here are the two interaction contexts (content models). Navigation between the two contexts is achieved through the Motion/Object toggle control.

The actions listed here are those that need to be supported through some means. For a discussion of how and why, see the next section.

Weighing the Tradeoffs

Our primary concern, from the beginning, has been determining the nature of the controls. The view of the world already provides much of the feedback about the current state of the application. (If there are navigation modes or other control states, they will require additional feedback.) Now that we've delimited the main possibilities in Activity Claims, we can compare many of the tradeoffs.

I think we need screen controls. Some people are primarily mouse-oriented. Also, screen controls allow for control feedback not possible through the keyboard, and they convey the types of motion or rotation possible. The disadvantages can be outweighed by also including keyboard controls and by a fullscreen mode in which the screen controls are removed (for those who greatly prefer screen real estate over visible controls).

[Actually, now that I think about it, I don't think any of the reviewed browsers had screen controls for motion! Navigation was possible with the mouse only by dragging it across the world view. Dragging in each of the four cardinal directions corresponded to motions invoked by each of the four arrow keys. But there were no buttons for motion, only for mode selection! I'm surprised this has escaped my attention this long. Certainly, screen controls would make our browser innovative compared to the existing browsers.]

There are two major possibilities for the correspondence of keyboard controls to screen controls. Either there can be a mapping between them or the two can be independent of each other.

If the two control sets are mapped, the screen controls should be grouped so as to demonstrate how they map to the keyboard. For example, if the 3d Default Browser keys are being used, the screen controls could consist of three sets of 4 arrows, each labelled with the corresponding context button. The 12 arrows would be clickable.

Admittedly, this idea does not work as well for other key bindings. I would not recommend placing screen controls in the shape of a number keypad or in the letter groups demonstrated under Single key-to-action Mappings. Also, these controls do not map cleanly to Object manipulation.
Independent controls means there is no correspondence between the screen and the keyboard. This would allow for the 2x3 screen control design that maps directly and simply to the 12 basic tasks. This seems the clearest screen control set, but not the easiest to use, especially in the long run, since normal walking motion requires switching between the two sets of axes. The keyboard controls could be any of the explored possibilities.
A compromise: implement the 2x3 design, but then label each of the 12 controls with the key that handles that control: "Ctrl + ->", "E", or "9". Ideally, these labels would change to reflect any user updates. This would allow the clarity of the 2x3 controls, but still allow for a (sort of) mapping to easier-to-use arrow controls.

The drawback here is that the controls appear quite cluttered. This could be minimized somewhat by a better graphic artist. (I could use some help with the axes too!) Or things could be simplified further by putting the key bindings into tool-tips that pop up when the mouse hovers over the buttons.
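The labelling compromise could be sketched as a small function that derives each control's label (or tool-tip text) from whatever key bindings are currently in force, so the labels follow any user updates. The function name `control_labels`, the bracketed label format, and the sample bindings are all assumptions for illustration.

```python
def control_labels(actions, bindings):
    """Build label text for each screen control from the current bindings.

    `bindings` maps a key name to an action. A control whose action has no
    key assigned is labelled with the bare action name.
    """
    key_for = {action: key for key, action in bindings.items()}
    return {
        action: f"{action} [{key_for[action]}]" if action in key_for else action
        for action in actions
    }


# Labels under a letter binding, then after the user switches to the number pad.
before = control_labels(["forward", "turn left", "up"], {"I": "forward", "J": "turn left"})
after = control_labels(["forward", "turn left", "up"], {"8": "forward", "4": "turn left"})
```

The same text could feed either a visible label or a hover tool-tip; only the place it is displayed changes, not how it is derived.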

An object mode exists in practically every reviewed browser, which implies it is more vital than I suspect. (I think some VRML worlds can specify that only Object controls be used.) The major disadvantage to including it is that it is confusing to new users. Changing the screen control labels, or the color, or some other rather noticeable change between modes would eliminate this. It should be implemented.

On the same lines, the "extra" controls of realign, jump, cycle viewpoints, etc should be included for their convenience.

On "view independent of motion or orientation", since these would necessarily use controls separate from those that control orientation and motion, they could be added later. At this point, they add complication to the basic task of moving in a world. Considering our current target group of novice or casual users, we should shelve this option for now.

Social contexts and user roles were more prevalent than first suspected. Chat would be particularly helpful. However, this too could be added later with little change. An always-on-top chat screen would be best kept separate from the screen control section, so it could be used in fullscreen mode and moved or resized as the user prefers. This could be opened by a button or a key combo or by clicking on another avatar. But for now, this should not be a supported feature.

Other unconsidered features continue to arise. Specifically:

Settings, such as for VRML's on/off navigation toggles for gravity, collisions, and terrain-following, or for lighting effects, such as headlight, or for avatar aspects.
Wizard/screen to redefine the keyboard key binding.
Help documentation
Fullscreen toggle
Other, yet-unrealized, mostly-necessary but infrequently-used odds and ends.

These should be in a menu, since they are not used often enough to deserve their own controls.

Two Possible Designs

So, from the Abstract Prototype and Weighing the Tradeoffs, we've learned we need the following controls in both contexts:

Realign
Goto
Cycle Viewpoints (left)
Cycle Viewpoints (right)
Object/Motion toggle

Since then, I've realized we should also have:

Reset
Viewpoint List
Menu (button and right-click on world view)

Additionally, we should support mouse drags. By that, I mean the user can click the center of the world view and drag the mouse towards the edge. The distance from center determines the speed. The edge determines the directions: dragging towards the top of the screen is like pressing the "up" arrow, dragging to the right side is like pressing "right", etc.
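One way the drag behaviour just described might work is sketched below. Several details are assumptions rather than decisions: the nearest edge is chosen by the dominant normalized axis, speed grows linearly up to 1.0 at the edge, and offsets use screen coordinates (positive `dy` points down). The function name `drag_to_motion` is hypothetical.

```python
def drag_to_motion(dx, dy, half_width, half_height):
    """Turn a drag offset from the view center into (arrow, speed).

    dx, dy: pointer offset from the center in pixels (positive dy is down).
    half_width, half_height: distance from the center to each edge.
    Returns (direction, speed), where direction is "up", "down", "left",
    "right", or None for no drag, and speed ranges from 0.0 to 1.0.
    """
    if half_width <= 0 or half_height <= 0:
        raise ValueError("view dimensions must be positive")
    # Normalize so that every edge sits at distance 1.0 from the center.
    nx = dx / half_width
    ny = dy / half_height
    if nx == 0 and ny == 0:
        return None, 0.0
    # The edge being dragged towards is the one on the dominant axis.
    if abs(nx) >= abs(ny):
        direction = "right" if nx > 0 else "left"
        speed = abs(nx)
    else:
        direction = "down" if ny > 0 else "up"
        speed = abs(ny)
    # Clamp in case the pointer leaves the view during the drag.
    return direction, min(speed, 1.0)


# Dragging halfway to the right edge acts like holding "right" at half speed.
direction, speed = drag_to_motion(100, 10, 200, 100)
```

Returning an arrow name rather than a motion keeps the drag consistent with the keyboard: whatever the arrow keys do in the current mode, the drag does too.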

On the actual controls, the two most promising are the 2x3 screen controls and the 3d Browser Default (which is never actually seen in a 3d browser). Both could use pop-up tool-tips to further clarify which keyboard controls map to which screen controls. Both could have their keyboard controls overridden by user-defined keys. (The browser could even come with some preset "user" assignments, such as the number pad and single key-to-action letter mapping explored above.)

Here are mockup prototypes of the two proposed designs.

Conclusions

Though I started out favoring a scenario-based approach, I seem to have switched to a very structured, traditional paradigm. The structure and logical progression just appeal to me, even though I know design is supposed to be more iterative than deductive. Here are a few thoughts on the techniques used in this project.

I thought the browser review was a great tool. Though it isn't really mentioned in any of the literature, reviewing other similar solutions allows a designer to note design pitfalls, to try different prototypes before even starting to design, to find out what already works and how well, and what is lacking everywhere else.
Problem claims are a nice way to sum up problems I would like to deal with later, requirements for the future system, etc. These were not developed very extensively here. I don't think iteration happens much in actual structural design. In that case, I should have done interviewing early on, as originally planned, in order to get more data at this stage of the design.
Essential Task Analysis--an odd hybrid of HTA and EUC--worked very well. It was probably the single most useful tool in this project, though the write up is short. It gave the essential tasks that need to be implemented. This wouldn't be as helpful for all projects, of course.
I love activity claims. These are a great brainstorming technique. I can work on just recording ideas, with tradeoffs for each one. Then later I can switch modes to figure out how to best combine them in a design. I find the more brainstorming, the better. With a lot of options laid out, activity claims become almost a rational, deductive method of fitting the pieces together.
The contexts and user roles were important to think about, though I neglected to implement anything from them in this project.
I think scenarios are good for quickly summing up a bit of a design for a layperson, but for actual design purposes I prefer charts and lists of claims to clearly delineate all the factors involved.
I like abstract prototypes because they produce an interface skeleton directly from the essential tasks. There's a certain automatic, logical progression to it.
During the rest of the Design section, I don't think I followed any real method, but this was still a very important section. With all the data and tradeoffs, it was important to pull things together and actually make some documented choices between options.
Towards the end, lots of little "forgotten" features started appearing--where to put the help button, how to organize the menu, how to change Options, etc. It was very hard to go back and iterate over the whole design to include these. The temptation was just to stick them in wherever at the stage they appear (usually prototype or implementation!).
I'm glad I considered the future needs--chat, fullscreen mode, independent view control, etc.--and how they could be added to the current system. Yet I still shelved these features in favor of producing a lean, simple product that focuses on the main uses of a 3d browser.

Overall, I think my biggest fault as a designer is my general aversion to iterating over a design. A close second is an inability to stay on a schedule.
