From: Zach Tomaszewski <zach@areteproductions.com>
To: "Zeus" <zeus@areteproductions.com>
Sent: Thu, 17 Oct 2002 13:55:53 -1000 (HST)
Subject: Project idea: VRML browser
Zeus--
I've got a new idea for you. The other day I was exploring a few VRML (Virtual Reality Modelling Language) pages on the Web. I downloaded a couple different browser plug-ins. They're all pretty hard to use. I think we should look into developing either a full VRML browser or at least a plug-in.
The popular conception of the future of computing is of a 3D environment, and developers continue to push into this area. But usually you need special hardware--a 3D mouse, VR goggles, power gloves, etc. Yet I don't think people buy this hardware because the payoff is too small--there just isn't enough 3D software out there to make it worth the expense. And yet how can developers really explore innovative 3D solutions if no one is outfitted to use them?
I think we should position ourselves to be part of the bridge. There is some existing 3D stuff out there, such as the VRML sites on the Web. If we can produce a usable browser for these that uses only current technology, we could be part of any 3D revolution, should it come. (Even if it doesn't take the world in general by storm, the small group of VRML users still needs a decent browser!)
Obviously if we're going to produce a more usable product than what already exists, we'll have to do a lot of careful HCI design and usability testing along the way. I'm thinking of following Rosson and Carroll's scenario-based design, with a slight lean towards essential use cases rather than just scenarios.
I believe Sally's finishing up the Reynold's project next week. If so, even though Arete Productions is still a small company, I think we'll have enough man-hours to undertake this in a relatively timely manner.
I hope you're interested. Later this week, I'll send you a more formal development plan.
Zach.
From: Zach Tomaszewski <zach@areteproductions.com>
To: "Zeus" <zeus@areteproductions.com>
Sent: Thu, 31 Oct 2002 16:18:36 -1000 (HST)
Subject: VRML browser--getting started
Zeus--
I have the root concept and design plan up. (I'm straying further away from SBD toward usage-centered design.) I've reviewed a number of browsers and am starting to select a couple to run artifact analysis/usability tests with. Also, I've sketched out an interview guide.
For next time, I still need to check out a couple browsers in more detail, especially Cosmo and WorldProbe. And I need to find some subjects for the browser run and interview. I've already started on some thinking about the tasks needed for a 3D browser.
Zach.
From: Zach Tomaszewski <zach@areteproductions.com>
To: "Zeus" <zeus@areteproductions.com>
Date: Sun, 5 Jan 2003 21:30:42 +0400
Subject: VRML browser--stewing on it
Zeus--
It's been a while! Sorry about that. Hope you had a good holiday, and all the best in the New Year!
I've moved on to the analysis and synthesis stage of things. Instead of doing a browser run now, I'm pushing it back to a comparative usability test of our new browser versus one of the popular big boys. Instead of looking in more detail at a couple of 3D browsers, I looked a little wider and reviewed a few immersive gaming environments. A review of Tomb Raider made the browser list; its context-dependent control changes and its user-defined controls are interesting prospects.
The problem themes are still a little sparse, but the usability test might give us a bit more there. (This whole process theoretically allows such iteration, though I have my doubts about that.) I've made a bit of progress on the technical details of the prototype implementation, and it looks like that will be a "go", though it'll be a few more days before we know just what to try implementing.
The main product of this session was a load of activity claims, which are straying into the area of interaction claims. I've found the brainstorming quite helpful. Social contexts and user roles are more prevalent than I first imagined, but still probably won't be a big part in the first version of the browser. I threw in an activity scenario too, in case we need to share the sort of issues we're considering with the uninitiated.
The next step is working through those tradeoffs and picking an implementation--the tasks we want, the controls that support them, and in which contexts--that sounds like it will work. I'll probably double check things by doing an abstract context or two for the essential tasks.
More soon,
Zach.
From: Zach Tomaszewski <zach@areteproductions.com>
To: "Zeus" <zeus@areteproductions.com>
Date: Sun, 10 Jan 2003 3:21:47 +0400
Subject: VRML browser--consolidating
Zeus--
Is there a word for that action of scooping all the playing cards together after a game and jostling them all together into a single neat pile before you can start shuffling?
Basically, this session I consolidated things down to what needs to get put into the design/prototype. There was some reiteration as I discovered a few features I'd overlooked; they've been added to Activity Claims. (Though the further along in the project I get, the less I want to iterate.)
The most interesting realization is that I can't remember any VRML browser I've seen ever having clickable navigation buttons that control motion or orientation. All allow keyboard control, most allow control by dragging the mouse across the screen, and most have buttons toggling the navigation mode (which affects both mouse drags and the arrow keys). But I can't recall any having screen buttons that actually control motion. It's a "deficiency" I'm keen on exploiting. The downside is that I'm not sure if the Cortona XML skin plan for the prototype will still hold up, since mostly the skin maps buttons to modes. We'll see.
At this point, I am realizing that I don't know much about VRML itself. It's hard to design a usable browser if its controls are of a different paradigm than the underlying protocol for motion. This personal "limitation" was recognized at the outset in the Root Concept, so I knew it was coming. The flipside has been designing with a "beginner's mind." But as we move into Design and Prototyping, it's making me a little nervous.
I did check out what the VRML97 specs have to say about Navigation modes. It mentions 5:
- Any (allows user to specify dynamically)
- Walk (collision detection, terrain following, and gravity)
- Fly (collision detection; terrain-following and gravity may be ignored)
- Examine (viewing an object, with optional rotation around the object, etc)
- None (removes all navigation and forces the user to use navigation--such as links--provided by the scene)
What's interesting is how many browsers have adopted these modes in their interfaces. Actually, it seems these only deal with how the browser handles collision, terrain, and gravity. So they can largely be dealt with behind the scenes (though we may need to grey-out certain options, such as "up" when gravity should not be overcome). It seems the control schemes I've designed remain unaffected.
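To make the "behind the scenes" idea concrete, here is a rough sketch of how the modes might reduce to a few flags. The flag values are only my reading of the list above (the Examine values in particular are a guess), "Any" is left out since it just lets the user pick one of the others, and every name here (NavMode, MODES, enabled_controls) is hypothetical rather than taken from the VRML97 spec or any browser.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NavMode:
    """What the browser enforces in a given navigation mode (assumed flags)."""
    collision: bool
    terrain_following: bool
    gravity: bool
    user_navigation: bool = True


# Flag values follow the list above; EXAMINE's values are an assumption.
MODES = {
    "WALK": NavMode(collision=True, terrain_following=True, gravity=True),
    "FLY": NavMode(collision=True, terrain_following=False, gravity=False),
    "EXAMINE": NavMode(collision=False, terrain_following=False, gravity=False),
    "NONE": NavMode(False, False, False, user_navigation=False),
}

ALL_CONTROLS = ("forward", "back", "turn-left", "turn-right", "up", "down")


def enabled_controls(mode_name):
    """Return the controls that should NOT be greyed out in this mode."""
    mode = MODES[mode_name]
    if not mode.user_navigation:
        return ()  # the scene provides its own navigation (links, etc.)
    if mode.gravity:
        # Gravity should not be overcome, so "up" is greyed out.
        return tuple(c for c in ALL_CONTROLS if c != "up")
    return ALL_CONTROLS
```

The point is only that the control scheme itself stays the same across modes; the mode just switches individual controls on or off.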
Anyway, at this point I now have two prototypes. I want to run some basic usability tests on them and clarify them in the next (and final) stage.
This is the current design plan. It is subject to change. The table of contents will probably be the best indicator of the final path taken.
Most of the browsers below are from Web3D Consortium's list of browsers.
These browsers I actually installed and used to explore or manipulate a 3D world
Actify. Mostly deals with CAD. Handles some VRML. Free plug-in. Only handles object manipulation, not movement of the view through a world. In one mode, the user can grab and move the object around in place with the mouse. In pan mode, the user can drag the object up, down, left, or right on the screen. Also has zoom controls.
blaxxun Contact. These people have been severely influenced by Neal Stephenson's virtual cyberpunk world, Metaverse, from Snow Crash. They even have the Black Sun club in their online world Cybertown. And so their browser supports avatars, gestures, chat, etc.
Can move with the mouse or keyboard. I find keyboard easier (due to excessive old-school computer gaming?). With mouse, click and drag in a direction to move/turn that way. Either way, there are four cardinal 2D directions to choose from.
There is a right-click menu to select mode (also key combinations):
Mouse turns to hand and can click on objects. Possible to include avatar in view (3rd person vs. 1st person). Supports viewpoints--designer-selected locations of note in a world.
Holding down Control allows for panning up and down and sliding left and right. (In Walk mode?) Shift increases speed. Has an interesting feature that movement is slow for first second, then faster after that, which allows for fine/delicate movement when needed.
Version 5.1 doesn't support VRML1.0. Need 4.4 for that.
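The slow-then-fast movement noted above is simple enough to sketch. This is only my guess at the behavior, not blaxxun's actual implementation; the function name, the speed values, and the Shift multiplier are all made up for illustration.

```python
def movement_speed(held_seconds, shift=False,
                   slow=0.5, fast=2.0, threshold=1.0, shift_factor=2.0):
    """Speed while a movement key is held down.

    Slow for the first `threshold` seconds (fine/delicate movement),
    faster after that. Holding Shift multiplies the result.
    All numeric defaults are hypothetical.
    """
    speed = slow if held_seconds < threshold else fast
    return speed * shift_factor if shift else speed
```

A quick tap gives a small nudge, while holding the key past the first second covers ground quickly.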
Parallel Graphics' Cortona. Of these browsers, this is the most frequently downloaded from CNet's Download.com. Keyboard or mouse (again, only 4 directions/buttons at a time).
3 Modes: walk, fly, study. 4 options for each mode: plan, pan, turn, roll. (I had to learn this from the manual--all 7 buttons are along left side; though spaced and labelled, that doesn't help much in determining how they work combined.)
(Interesting that these are so similar to blaxxun's Contact. Do they know something I don't?) With 3 modes and 4 options each, there are 12 possible combinations. I'd like to sound knowledgeable here and tell you how it all logically works, but I didn't spend the time to figure it out. I just skimmed the page. Besides these 12 "states", pressing space, alt, control, or shift during these changes speed and possibly functionality.
Study is used to let you move the world/object, rather than yourself. This can be confusing because you can roll around the center of some object, which is often the center of the world. When the world is moving, it's not clear that you're not moving yourself, except for certain slight weirdnesses.
Other buttons: restore (to original entry state), align (get back to vertical after you've messed yourself up), goto (like blaxxun's jump), and cycling for viewpoints.
Can create XML skins that not only change the look, but also the functionality of controls! I think I can actually put new/more/different control buttons on the screen. Look further into this for prototyping (as an alternative to Java3D). Also a number of extensions available.
Tomb Raider. Not a browser, but a computer game. Much could be learned from first-person games, since the basic motion dynamics are the same. However, games are getting increasingly complex to control (as can be seen by the evolution of game system controllers). So though they may be a source of innovative solutions, they may not be the most usable designs for novice users.
Tomb Raider follows the standard forward/back, turn-left/turn-right arrow controls. Vertical motion is achieved through climbing or jumping (which is not always possible in VRML worlds). There is also a step-left and step-right (which is equivalent to slide-left and slide-right). Holding down the look key means the view can move while the character (Lara Croft) does not. There is also a walk key to slow motion, and a roll key to quickly change direction. (Other specific keys allow for drawing a weapon, firing, accessing inventory, jumping, using/grasping, and lighting a flare.)
There are some interesting context changes in the game, where the same keys produce different motions. When climbing a vertical surface, the left and right arrows now slide the character left and right; up and down now move up and down, rather than forward and back. Yet the change is hardly noticeable because a vertical rock face only affords certain motions.
When swimming underwater, up and down now rotate up and down (to swim deeper or to swim up to the surface). Left and right still turn-left and turn-right. Forward motion is now controlled by the jump key; this makes some sense because Lara Croft has a pretty killer frog-kick/breaststroke. There is no backwards motion underwater.
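These context changes amount to swapping the key-to-motion table whenever the character's situation changes. A minimal sketch of that idea, using only the three contexts described above; the table layout and all names (CONTEXT_BINDINGS, action_for) are mine, not anything from the game, and which arrow rotates which way underwater is a guess.

```python
# Same physical keys, different motions depending on context.
CONTEXT_BINDINGS = {
    "ground": {
        "up": "forward", "down": "back",
        "left": "turn-left", "right": "turn-right",
        "jump": "jump",
    },
    "climbing": {
        "up": "move-up", "down": "move-down",
        "left": "slide-left", "right": "slide-right",
    },
    "underwater": {
        "up": "rotate-up", "down": "rotate-down",
        "left": "turn-left", "right": "turn-right",
        "jump": "forward",  # there is no backwards motion underwater
    },
}


def action_for(context, key):
    """Look up what a key does in the current context (None if nothing)."""
    return CONTEXT_BINDINGS[context].get(key)
```

For a browser, the contexts might be motion versus object manipulation rather than ground, wall, and water, but the lookup would work the same way.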
Another interesting aspect is that, with so many keys, it is possible to define your own key bindings. The game has two sets of key bindings: Default and User. Both come preset. Default is not editable; it uses the arrow keys, Alt, Ctrl, Shift, Enter, etc. The User keys are set to use primarily the number keypad, though they can be changed to use any key. If the same action is bound to a different key in both the Default and User setting, User settings take precedence.
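One way to read that precedence rule is as a simple override: start from the Default set and let any User entry replace it. The sketch below assumes that reading; the key names and the two example sets are illustrative only, not the game's real bindings.

```python
# Illustrative bindings only (action -> key); not the game's actual sets.
DEFAULT_KEYS = {
    "forward": "Up", "back": "Down",
    "turn-left": "Left", "turn-right": "Right",
}
USER_KEYS = {"forward": "Num8", "back": "Num2"}


def effective_keys(default, user):
    """Merge the two sets; a User binding wins over the Default one."""
    merged = dict(default)  # Default itself stays uneditable
    merged.update(user)
    return merged


def action_for_key(key, default, user):
    """Find which action a key press triggers under the merged bindings."""
    for action, bound_key in effective_keys(default, user).items():
        if bound_key == key:
            return action
    return None
```

Under this reading, once "forward" is rebound to the keypad, the Up arrow no longer moves forward. A browser could just as well keep both keys live and only use precedence to settle conflicts.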
These browsers I didn't actually use, but examined their manuals or other documentation to learn how they deal with navigation.
Cosmo Player. Their caveat: Some VRML worlds are more movement oriented, others force object manipulation. Some have a default but allow the other form of motion.
2 major modes with three minor modes for each.
Control or Alt switches modes within either Movement or Examine. Other controls: undo/redo last move, activate objects, (controlling movement can be tricky in a busy, activating world; use seek to go to an object without activating it), viewpoints (next, previous, list).
Adobe's Atmosphere. This is not actually a VRML browser, but one for Adobe's proprietary 3D format (typical). They have set up a test world based on the movie Dark City. The following is according to the PDF user manual.
Other buttons include: restart (back to your "home" world), back (return to the starting point of the current world), make gate (basically bookmark current position), and screen shot. The browser also supports chat and avatars.
Settable options: gravity (on/off), collision (on/off), acceleration, and max velocity. Acceleration only works with arrow keys.
UpperCut's WorldProbe. Not much information here. Their navigation controls screen shot implies 4 modes--walking, flying, viewing, and object handling, as well as a zoom and jump. No FAQs yet. This might be one to download and investigate further.
OpenWorld. An X3D browser. Available for integration into other applications. There does exist a stand-alone OpenWorld Horizons browser. Need to agree to a long license and sign up to download, so I didn't.
Eyematic's Shout3D. Not very informative web page. System requirements: 200MB disk space, 512MB memory, 700MHz processor. More than I can deliver!
Browser trial tasks/format still under construction. Ideas so far:
--2 browsers with different controls, same world
--possibly a very brief verbal description of each browser's controls
--2 to 3 tasks to complete (in each browser)
--get users to describe how they are trying to complete the task
--a post-task discussion of what they liked and didn't like about the controls
--discuss alternative setups, modeless controls, etc.
These are intended only to be a consolidation of important trends or points of the current state of affairs as discovered during data gathering. Their tradeoffs (if applicable) will be analyzed later.
I'm not sure whether this technically classifies as Essential Use Cases or Hierarchical Task Analysis, but I know what I want to do: look at the basic elements of motion in 3D space.
As one can see, in three dimensions, there are 6 possible directions of motion along the 3 axes:
Any point in the coordinate volume can be reached with a combination of these motions.
But in the real world, we also like to look around as we travel. Though we may be able to reach any point with the above 6 directional motions, we might not be able to see much when we get there--such as the backside of a large object. Thus, we also need rotations. If we rotate around each of the 3 axes, we come up with 6 possible rotations:
With these 12 movements and rotations, we can reach and view any point in a 3D space. They are essential. (Other functions, like realigning, jumping, resetting, etc., would be nice, but are shortcuts on these.)
It should be noted, however, that when you add rotations, the 3 directions of motion are no longer essential. The fewest, barest-bones controls that would get you anywhere (and see anything from there) are only: forward, turn up (or down), turn right (or left). Anything beyond these just makes things easier.
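A small sketch can back up the barest-bones claim. It tracks a position plus forward and up directions, offers only forward, turn-right, and turn-up, and then builds "back" and "up" out of those three. To keep the arithmetic exact it assumes quarter turns and unit steps, which is a simplification of mine; the class and function names are hypothetical. The starting orientation (looking down -Z with +Y up) matches the VRML default view.

```python
def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


class Pose:
    """Viewer position and orientation, with only the three bare controls."""

    def __init__(self):
        self.pos = (0, 0, 0)
        self.fwd = (0, 0, -1)  # looking down -Z
        self.up = (0, 1, 0)    # +Y is up

    def forward(self):
        self.pos = tuple(p + f for p, f in zip(self.pos, self.fwd))

    def turn_right(self):
        # Quarter turn about the up axis: the old "right" becomes forward.
        self.fwd = cross(self.fwd, self.up)

    def turn_up(self):
        # Quarter turn about the right axis: look where "up" used to be.
        self.fwd, self.up = self.up, tuple(-f for f in self.fwd)


def back(pose):
    """Step backwards using only the bare controls."""
    pose.turn_right(); pose.turn_right()   # face the other way
    pose.forward()
    pose.turn_right(); pose.turn_right()   # restore original heading


def move_up(pose):
    """Rise one step using only the bare controls."""
    pose.turn_up()
    pose.forward()
    for _ in range(3):                     # three more quarter turns = original view
        pose.turn_up()
```

Even this toy version shows the cost: stepping back takes five operations instead of one, which is why anything beyond the bare three "just makes things easier."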
We know, from the Root Concept, that we will be using keyboard and/or mouse to control motion. Much of this discussion is too concrete to rightly deserve the title Activity Claims. Yet there is a smooth slide between Activity and Interaction Claims, so we'll keep things together here, though most specific key binding discussions are indented a bit.
The following are some of the more promising possible key combos. Normal means arrow keys without additional context; Alt, Ctrl, and Shift are the actions of arrow keys when those keys are also pressed.
- Minimalist.
Normal: forward/rotate back (look up), turn-left/turn-right
+A variation on the "barest controls" necessary, as discovered in ETA; no other keys or contexts would be needed.
+Does support the most commonly used keys: forward, turn-left, turn-right.
-Theoretically interesting, but probably frustrating in practice. Over-rotating would require a 360 degree turn to get back again. Turning a 2d view of a 3d world is already rather disorienting. In short, novel and possible, but slow and irritating (like so many miniaturization efforts).
- 3d Browser Default.
Normal: forward/back, turn-left/turn-right
Ctrl: up/down, slide left/slide right
Alt: rotate forward (look down)/rotate back (look up), roll left/roll right
Shift: speed increase
+Most commonly used by reviewed browsers (with some small variation, such as whether roll-left/roll-right is instead turn-left/turn-right, or which state is invoked with Ctrl or Shift). Even if this is not the wisest choice for all users, it would meet expectations of 3d browser users.
+Normal keys correspond to the keys most frequently used in a 1st person perspective
+Ctrl invokes two (rather unnatural) forms of motion dealing with motion in the xy plane
+The Alt key, as an alternative to motion, deals more with changing orientation or view.
+Shift is usually "bigger" or more, and so seems to fit well with speed increases
-Roll is probably not used that often, so a user trying to "look" around would need to alternate between Normal and Alt key sets. (Roll could be dropped for a repeat of turn-left/turn-right).
-Though there might be slight mnemonics to the key sets, they still need to be learned.
-Alone, like all key bindings, they lack much affordance or learning aids.
- Variations.
- Minimalist: have a different key, such as spacebar, move forward; arrow keys all look.
- Browser Default: basically any combination of the 12 basic actions discovered in ETA, broken into 3 sets of 4.
- Browser Default: add more contexts. Ctrl+Alt would make a good flight sim mode: automatically move forward, with yaw up/down and roll left/right. (As far as I know, no browser has implemented a flight mode such as this; "fly" usually just means gravity can be overcome and the avatar can be moved vertically.)
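The 3d Browser Default set above can be written down as a lookup table: three key sets of four arrow actions each, with Shift as a speed modifier. The sketch below is just that table plus a resolver. I've assumed the first action in each pair goes with the up (or left) arrow, the 2.0 speed factor is arbitrary, and Ctrl+Alt is left unassigned here.

```python
# Modifier -> arrow key -> action, following the 3d Browser Default list.
BROWSER_DEFAULT = {
    "Normal": {"Up": "forward", "Down": "back",
               "Left": "turn-left", "Right": "turn-right"},
    "Ctrl": {"Up": "up", "Down": "down",
             "Left": "slide-left", "Right": "slide-right"},
    "Alt": {"Up": "rotate-forward", "Down": "rotate-back",
            "Left": "roll-left", "Right": "roll-right"},
}

SHIFT_SPEED_FACTOR = 2.0  # hypothetical value


def resolve(arrow, ctrl=False, alt=False, shift=False):
    """Return (action, speed multiplier) for an arrow key press.

    Returns None for Ctrl+Alt, which this key set leaves unassigned.
    """
    if ctrl and alt:
        return None
    key_set = "Ctrl" if ctrl else "Alt" if alt else "Normal"
    speed = SHIFT_SPEED_FACTOR if shift else 1.0
    return BROWSER_DEFAULT[key_set][arrow], speed
```

Any of the Variations is then just a different table (or an extra "Ctrl+Alt" entry for the flight sim idea), which also makes user-defined overrides straightforward.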
The following is one promising example of a number pad key binding set.
- = Roll back/look up
7 = Slide-left
8 = Forward
9 = Slide-right
+ = Look down/roll forward
4 = Turn-left
5 = Up
6 = Turn-right
1 = Roll-left
2 = Back
3 = Roll-right
0 = Down
The following is one example of this sort of binding. (The keys are grouped more by use than by similar function; navigation is primarily through the right hand.)
W = look up/rotate back
S = look down/rotate forward
A = slide left
D = slide right
Q = roll left
E = roll right
I = forward
K = back
J = turn left
L = turn right
U = up
M = down
As shown by this real-world example, this can be clumsy and unintuitive. It would be workable really only if the user picked the keys herself.
One idea for screen controls:
- 2x3.
12 buttons arranged into 2 3D axes--basically as shown above in ETA--where one set of axes controls the 6 possible rotations while the other controls the 6 possible motion directions.
+Implied by and clearly supports Essential Tasks discovered above
+A simple toggle switch between motion and object-manipulation modes would be possible, since the controls would apply equally well to both modes.
-Can we graphically depict 3d controls clearly enough to be recognized and understood?
-Correspondence to keyboard controls would be difficult (though not impossible, if we go with a one-to-one key-to-screen-control binding)
-Normal navigation must switch between the two: forward and turn-left/turn-right are on two different axes.
In general, browsing a 3D world is not a social activity. Navigation is not usually collaborative, though your friend might be watching over your shoulder telling you where to go.
Within some worlds, there are social aspects. Avatars can interact with each other through gestures or even engage in some form of combat. Users may need to change their avatars for different worlds. Sometimes this is simply for technical reasons, such as their avatar is too big to fit through a door. But other times, they may simply want to be represented differently depending on who they are interacting with. (As Neal Stephenson points out in Snow Crash, a 6-foot, walking, talking penis is not the avatar of choice in all circles.)
Many worlds now support chat as well, which is certainly a social activity. Like websites, some worlds are primarily places for interaction with other users, while other worlds are simply meant to be explored.
There doesn't seem to be a big difference in user roles either. Of course, there will be novice or casual users who need a working, easy browser quickly for occasional use. Expert or power users may desire more customization, such as updating key bindings. World authors may also want additional features, such as being able to walk through walls which normally cannot be walked through.
Other user groups may be minors whose parents would like to restrict access to adult-oriented worlds. Different users speak different languages, which may also be a concern. Besides language differences, keyboards are often different around the world, especially with respect to letters available and their location on the keyboard. This could be important for some default key-bindings.
This is an activity scenario intended to convey what this sort of software should allow, independent of actual controls.
Bob has just entered a fantasy world. In the distance through the palm trees he sees a strange shape. It looks like a statue of a rabbit. He moves forward in that direction, skirting around a palm tree on the way. When he gets to the foot of the statue, he can no longer see the top of the statue in his field of vision. He looks up. Hmm, maybe it's a different animal, like a kangaroo. Looks like he's standing on a broken clock or a jukebox. But what's that he's holding? A ball? Bob wonders if he can zoom in his vision, like some sort of bionic man, but he doesn't know how. Instead, he decides he'll fly up closer to the "ball" and see what it is. He does so, and discovers that the kangaroo is actually holding two.. acorns? No, maybe they are alarm clock bells. "Curiouser and curiouser," says Bob.
An illustrative screen shot of the Cortona browser at http://www.auzgnosis.com/vrml/anzac/kangab_6.wrl
Shown here are the two interaction contexts (content models). Navigation between the two contexts is achieved through Motion/Object toggle control.
The actions listed here are those that need to be supported through some means. For a discussion of how and why, see the next section.
Our primary concern, from the beginning, has been determining the nature of the controls. The view of the world already provides much of the feedback about the current state of the application. (If there are navigation modes or other control states, they will require additional feedback.) Now that we've delimited the main possibilities in Activity Claims, we can compare many of the tradeoffs.
I think we need screen controls. Some people are primarily mouse-oriented. Also, it allows for control feedback not possible through the keyboard. It conveys the types of motion or rotation possible. The disadvantages can be outweighed by also including keyboard controls and by a fullscreen mode in which the screen controls are removed (for those who greatly prefer screen real estate over visible controls).
[Actually, now that I think about it, I don't think any of the reviewed browsers had screen controls for motion! Navigation was possible with the mouse only by dragging it across the world view. Dragging in each of the four cardinal directions corresponded to motions invoked by each of the four arrow keys. But there were no buttons for motion, only for mode selection! I'm surprised this has escaped my attention this long. Certainly, screen controls would make our browser innovative compared to the existing browsers.]
There are two major possibilities for the correspondence of keyboard controls to screen controls. Either there can be mapping or the two can be independent of each other.
An object mode exists in practically every reviewed browser, which implies it is more vital than I suspect. (I think some VRML worlds can specify that only Object controls be used.) The major disadvantage to including it is that it is confusing to new users. Changing the screen control labels, or the color, or some other rather noticeable change between modes should reduce this. It should be implemented.
On the same lines, the "extra" controls of realign, jump, cycle viewpoints, etc should be included for their convenience.
On "view independent of motion or orientation", since these would necessarily use separate controls from those that control orientation and motion, they could be added later. At this point, they add complication to the basic task of moving in a world. Considering our current target group of novice or casual users, we should shelve this option for now.
Social contexts and user roles were more prevalent than first suspected. Chat would be particularly helpful. However, this too could be added later with little change. An always-on-top chat screen would be best kept separate from the screen control section, so it could be used in fullscreen mode and moved or resized as the user prefers. This could be opened by a button or a key combo or by clicking on another avatar. But for now, this should not be a supported feature.
Other unconsidered features continue to arise. Specifically:
These should be in a menu, since they are not used often enough to deserve their own controls.
So, from the Abstract Prototype and Weighing the Tradeoffs, we've learned we need the following controls in both contexts:
Since then, I've realized we should also have:
Additionally, we should support mouse drags. By that, I mean the user can click the center of the world view and drag the mouse towards the edge. The distance from center determines the speed. The edge determines the directions: dragging towards the top of the screen is like pressing the "up" arrow, dragging to the right side is like pressing "right", etc.
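The mouse drag rule can be sketched as a single function: take the pointer's offset from the center of the world view, pick the edge it is heading toward, and scale speed by how far it has gone. Picking the edge by whichever axis is proportionally larger is my assumption, as are the function name and the 0-to-1 speed scale.

```python
def drag_to_arrow(dx, dy, half_width, half_height):
    """Map a mouse drag to an arrow-key equivalent and a speed.

    dx, dy: pointer offset from the view center in pixels
            (screen y grows downward, so negative dy is toward the top).
    half_width, half_height: distance from the center to each edge.
    Returns (arrow, speed) with speed between 0.0 and 1.0 (1.0 = at the edge).
    """
    if dx == 0 and dy == 0:
        return None, 0.0
    # Normalize so that 1.0 means the pointer has reached the edge.
    nx = dx / half_width
    ny = dy / half_height
    if abs(nx) >= abs(ny):
        arrow = "right" if nx > 0 else "left"
        speed = min(abs(nx), 1.0)
    else:
        arrow = "up" if ny < 0 else "down"
        speed = min(abs(ny), 1.0)
    return arrow, speed
```

The returned arrow can then be fed through whatever key set is active, so drags automatically follow the same contexts as the keyboard.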
On the actual controls, the two most promising are the 2x3 screen controls and the 3d Browser Default (which is never actually seen in a 3d browser). Both could use pop-up tool-tips to further clarify which keyboard controls map to which screen controls. Both could have their keyboard controls overridden by user-defined keys. (The browser could even come with some preset "user" assignments, such as the number pad and single key-to-action letter mapping explored above.)
Here are mockup prototypes of the two proposed designs.
Though I started out favoring a scenario-based approach, I seem to have switched to a very structured, traditional paradigm. The structure and logical progression just appeal to me, even though I know design is supposed to be more iterative than deductive. Here are a few thoughts on the techniques used in this project.
Overall, I think my biggest fault as a designer is my general aversion to iterating over a design. A close second is an inability to stay on a schedule.
CIS: Project -- VRML Browser http://www2.hawaii.edu/~ztomasze
Last Edited: 01 Apr 2003 ©2002 by Z. Tomaszewski.