Exploring a New Paradigm and Tools for Wizard-of-Oz Experience Design

INTRODUCTION

As researchers, designers, and other creative professionals
look to move beyond desktop interaction, there are few
tools and proven methodologies to support human-centered
design. One tactical approach with a track record for
enabling the design of complex computing applications is
the Wizard of Oz (WOz) method, the practice of using
hidden operators to temporarily emulate unfinished parts of
a computing system during development [4].

However, I believe there are issues with how designers and
researchers approach the WOz method. The WOz method
tends to be viewed as a tool to be used once or twice to test
incomplete systems. Due to their temporary nature, wizard
interfaces are often designed as ad-hoc solutions, leading to
overly challenging wizard tasks and inconsistent system
performance during evaluative simulations. There are no
guidelines to help designers realize the potential, and avoid
the pitfalls, of this technique. More seriously, because the
development tools do not generally include explicit support
for WOz, it can be extremely difficult to tack WOz
interfaces onto a system, further hampering effective use of
the method. By embracing and encouraging the WOz
methodology early in the design process, and as the system
evolves through constant and rapid user testing, I see the
potential for dramatically changing the way interactive
computing systems are designed and evaluated.

The overall goals of my work are to first (i) identify
prospects and limitations of the WOz method by
conducting empirical studies, extensively reviewing the
related literature, interviewing experts at conducting WOz
studies, and gaining theoretical insight from sources such
as distributed cognition [7, 9]. My primary contribution
will be to (ii) articulate a conceptual framework for
WOz to help reveal possibilities and design guidelines for
the method, as well as to (iii) develop and evaluate practical
tools for rapid generation of WOz interfaces. I am
particularly interested in (iv) exploring the use of multiple
wizard operators within a single simulation to understand
issues of collective cognition, scale, and modularization,
and (v) reflecting on the iterative development process of
a complex interactive system in order to contribute to
theoretical knowledge on deliberate distribution and
redistribution of cognition among human operators and
software during design and development.

PROBLEMS WITH CURRENT USE OF “WIZARD OF OZ”

At a high level, this thesis addresses the difficulty of
designing and building complex interactive software
systems, a significant problem occupying the attention of
researchers in many disciplines including HCI, Design, and
Software Engineering. Speaking from the HCI perspective,
the best practice is often to obtain user feedback early in
the design process and to continue to get feedback
throughout the development process as the system evolves.
Evaluating systems before they are finished has vexed
researchers and developers for many years, resulting in new
techniques, such as the use of paper or video prototypes
and the Wizard of Oz (WOz) method. However, a survey of
prior use of the WOz method reveals that it has been
underutilized in at least two ways. I believe there is:

1) A narrow view of the role for the WOz method. Most
projects use a single human operator to simulate one part of
a computer system, such as natural language dialogue [2].
Positioning the wizard as a “sensor” is limiting, considering
the breadth of technologies pieced together in more
advanced computing applications and the capacity available
to human operators. Taking a broader view, one could
define the WOz method as “having one or more human
operators simulate, moderate or supervise one or more
modules of a system.” Wizard interfaces are rarely used to
simulate multiple, distinct components of one application,
and wizards are not used for more understated roles, such
as mediating between users and partially implemented
components or supervising a complete system [4].

2) Limited use of WOz methods throughout a system’s
development. When the broader set of possible wizard roles
and activities mentioned above are considered, the
opportunities for leveraging the WOz method throughout
the development cycle are increased. The role of the wizard
is likely to change as the system evolves, from full
simulation early on, to supervision later in the development
cycle.

The underutilization of WOz can be attributed to some
extent to the difficulty in creating WOz interfaces, and in
particular in creating WOz interfaces that can be
maintained and kept working as the application evolves. A
more extensive treatment of the WOz method presents a
serious software engineering problem because current
development tools are not designed with WOz interfaces in
mind. Developers must consciously structure applications
for WOz interfaces and then deal with throwing away the
code after testing. Because of their complexity, these WOz
interfaces are not conceived as being part of the main
application, nor designed to integrate cleanly with other
application modules as they evolve through a design cycle.
Wizard interfaces are typically viewed as temporary code,
and are thus designed quickly to satisfy the minimum
requirements of the application test being performed.

Designers and researchers often spend little effort
designing wizard interfaces, leading to cognitively and
perceptually overloaded wizard operators. There are
numerous examples in the HCI literature and in our own
work with the WOz method where poor wizard interface
design has led to sub-par wizard task performance (e.g.,
[2, 5]). Inconsistent, mistimed, or inaccurate wizard
performance can have a negative impact on the WOz
evaluation and the actual application being created.
Similarly, researchers and designers rarely (if ever) look
behind the curtain to analyze wizard activities and to
make appropriate adjustments to the wizard interface. The
“curtain” here refers to the real or imagined barrier
separating the users from the wizards; “front of the curtain”
evaluation is what most people think of when doing WOz
evaluation. Due to time constraints and the designer’s
attention on the user experience, the wizards themselves are
not the focus of scrutiny, but their experience should be
evaluated if the outcomes of the tests are to be trusted. For
example, if the wizard cannot perform their tasks reliably
and repeatably given the same user actions, conclusions
drawn across multiple users may be questionable.

Conversely, if a designer expends too much effort creating
the wizard interface, either to create an effective interface
or to integrate the wizard interface so it can evolve as the
application evolves, the value of rapid prototyping is
compromised. Not only can this slow down the design
process, but there is also a danger of developers becoming
“attached” to their code, leading to resistance (explicit or
implicit) to making significant changes to it.

The root problem and the primary need for more effective
application of the WOz method is that wizard interfaces,
and the internal application interfaces connecting the
wizard interfaces to the rest of the application, need to be
properly designed so that wizards can perform successfully
and so the WOz method can be applied repeatedly
throughout the design cycle. At the same time, such
interfaces must be possible with relatively minimal effort or
the WOz method will not be used, especially since the
effort to create WOz code is typically seen as an additional
cost above and beyond the development of the application
itself. The challenge is to provide support for widespread
and effective use of WOz during system development.

RESEARCH APPROACH

To address these problems, I am proposing to articulate a
conceptual framework for interactive system development
based on more pervasive use of the WOz method. To test
and refine my ideas, I will build a collection of WOz tools
on top of an existing development environment, and work
with application designers to attempt the expanded WOz
approach. Both the conceptual framework and the toolset
will be aimed at supporting effective use of the WOz method
for rapid design, iteration and evaluation of evolving
system prototypes. As I have discussed above, the lack of
appropriate tools has largely prevented the WOz method
from being used to its potential. But without the
appropriate conceptual framework, designers will neither
be aware of the potential WOz offers, nor understand how
to approach application development to facilitate effective
use of the WOz method.

A number of researchers have reflected on their experience
with WOz, pointing out important considerations [2, 8].
Likewise, researchers in the HCI community have begun to
add WOz support within custom prototyping tools for
specific design contexts [11, 12], but have relegated the
wizard to a “sensor simulator” (a speech-recognizer or a
location sensor, in these examples), requiring that
application logic be created before an experience can be
tested. I propose a more general definition of the WOz
method, along with a conceptual framework and far-
reaching tools to demonstrate the power of using an
expanded WOz approach. The following sections describe
my research approach for gathering data, articulating a
conceptual framework, building WOz tool support, and
evaluating the work.

Preliminary Data Gathering

My initial goal is to assimilate knowledge about how WOz
has been accomplished by experts in design and research.
The information I collect will provide a strong foundation
for the technical directions I take with specific tools. I will
also seek to identify useful methods for evaluating the
performance of wizard activities. By explicitly looking
“behind the curtain”, I seek to understand the relationship
between wizard task performance, wizard cognitive load,
system implementation, and user experience.

Expert Interviews

Through my experience with the WOz method and an
extensive review of research cases in the community, I
perceive a vast range of approaches and issues with the
methodology. To supplement my current understanding and
to uncover WOz practices “in the wild”, I plan a series of
open-ended interviews with “expert” wizard designers and
practitioners. I will interview fifteen or more designers who
have used the WOz method to help facilitate a design or
research project. An ideal participant will be able to speak
to their experience as a designer (of both the user and
wizard interfaces) as well as a wizard (serving as the actual
hidden operator). Conducting the interviews near the WOz
setup will also provide an opportunity to probe the exact
technical setup and to ask for a demonstration. The goal of
these interviews will be to understand why, when, and how
designers choose to use WOz methods, above and beyond
the details reported in their research papers.

Empirically Analyzing Wizard Activities

To gain experience in analyzing activities behind the
curtain, I will utilize an existing application, the AR Façade
interactive drama [5], and create several variations of the
existing wizard task. Currently, in order to facilitate the AR
Façade experience, a wizard operator uses a remote
interface to complete two tasks: 1) listen and convert
players’ speech utterances into text and 2) watch a live
video feed of the player and press buttons when certain
physical gestures are used.

The wizard’s actions fill in for both speech and gesture
recognition software. The AR Façade system takes the text
and user gestures and translates them into “meaning”
within the natural language understanding (NLU) software
module. The NLU accomplishes a challenging task,
filtering a wide range of speech and gesture acts to a subset
of possible player intentions (such as “player agrees”) [14].
While one wizard can manage these tasks, it is not trivial.
Anecdotally, we noticed many errors by the wizard during
a previous experiment, both falsely identifying user
interaction and missing user gestures and speech. Since the
focus of our previous study was to understand the user
experience, we did not record anything happening behind
the curtain at the wizard station. As a follow-up study,
changing my focus to behind the curtain, I will study the
interaction of wizards as participants. The study will seek
to understand the impact of two factors on wizard
performance: using multiple wizards and changing the
nature of the wizard task from cognitive to perceptual.

To investigate these factors, I will develop several versions
of the wizard station. First, the tasks will be split across two
remote machines to allow two wizards to collaborate, each
performing one of the tasks (speech to text conversion and
gesture input). Second, I will modify the AR Façade code
base to change the nature of the wizard task. I will
essentially bypass the NLU software module and allow the
wizard (or multiple wizards) to directly select the discourse
acts in AR Façade, a distinctly more cognitive task than
speech to text conversion. This creates four separate wizard
task combinations: two different emulation tasks
(speech/gesture input vs. discourse selection), each
performed either solo or collaboratively.
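
The four combinations described above can be enumerated by crossing the
two factors. The following is a minimal Python sketch for planning
trials; the labels and the `Condition` type are hypothetical names
chosen for illustration and are not part of the AR Façade code base.

```python
from dataclasses import dataclass
from itertools import product
from typing import List

# Hypothetical labels for the two study factors (assumed names).
TASKS = ("speech_gesture_input", "discourse_selection")
STAFFING = ("solo", "collaborative")


@dataclass(frozen=True)
class Condition:
    task: str
    staffing: str

    @property
    def wizards_needed(self) -> int:
        # Collaborative trials split the work across two wizard stations.
        return 2 if self.staffing == "collaborative" else 1


def all_conditions() -> List[Condition]:
    """Cross the two factors to obtain the four wizard task combinations."""
    return [Condition(task, staffing) for task, staffing in product(TASKS, STAFFING)]
```

Calling `all_conditions()` yields one entry per cell of the two-by-two
design, which could then be used to schedule the trials each wizard
participant completes.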

Each wizard participant will emulate the system with all
four types of interaction for real player participants so that
they can reflect on their experiences. Wizard activities will
be recorded during each trial (through video and interface
interaction logs). I will interview each wizard participant
before and after her experience as the wizard. The first
interview will establish each wizard participant’s pre-
understanding of her task and the technologies involved.
The second interview will reveal adaptation strategies used
during both the solo and collaborative versions of the
wizard task.

I will also talk to players to learn if they perceived a wizard
operator or noticed any differences in system performance.
This study will provide a novel dataset for understanding
the effect of cognitive load and collaboration on a wizard’s
ability to emulate a system. Analyzing the qualitative and
quantitative data from “behind the curtain” will provide
unique insights for a WOz conceptual framework.

Conceptual Framework

One outcome of this research will be to describe my notion
for expanded WOz prototyping and development. To some
extent these ideas originate from distributed cognition
(DCog), a branch of cognitive science conceived by
Hutchins, Hollan, and their collaborators after extensive
observation of human activity in the wild. Their claim is
that “cognitive processes may be distributed across the
members of a social group, …distributed in the sense that
the operation of the cognitive system involves coordination
between internal and external (material or environmental)
structure, and …distributed through time in such a way that
the products of earlier events can transform the nature of
later events” [9]. DCog and related theories may prove
helpful for analyzing how a system operates when
responsibility is shared by computing artifacts and one or
more human wizards. The DCog framework suggests a
more expanded use of WOz, allowing HCI researchers and
designers to take advantage of WOz methods throughout an
evolving design process, between multiple wizards, and
across multiple decomposed system modules or artifacts.

My thesis will attempt to articulate an abstract conceptual
framework so that designers and developers may
incorporate this approach. More practically, I will look at
compiling preliminary WOz findings into a useful index,
such as the design patterns framework [15], structured
around WOz scenarios and canonical examples. The goal
will be to create a practical online resource for anyone
preparing to use the Wizard-of-Oz method, while also
providing potential users with documentation and tutorials
for the tools I distribute. For example, there might be a
WOz design pattern for speech recognition and within that
category, variants for open-ended speech emulation vs.
predetermined utterance selection.

Development of WOz Tool Support

Currently, I envision WOz tools as two complementary
components, a stand-alone WOz Generator application that
runs on a remote machine waiting for requests to generate a
specific wizard interface, and “hooks” that designers add to
their prototype to support this wizard interface generation.
Longer term, I could look to support a range of prototyping
environments, all using the same low-level protocol for
communicating with the WOz Generator and each other, but
the initial version will concentrate on one environment,
such as Macromedia Director (likely as an extension to
DART [13]). The key for hooking into the prototyping
environment is to require minimal effort with no significant
changes to a designer’s normal prototyping process.

One approach, which we currently use in DART, is to
dynamically monitor the prototype at run time, and create
WOz support based on the current configuration of the
system. In DART, we monitor the event-passing
subsystem, scanning for all subscribed event names, and
generate a dynamically changing button interface in the
remote wizard interface [4]. This approach has the
advantage of allowing the wizard to trigger any action the
designer has defined, regardless of whether the system is
currently capable of generating the corresponding event
itself. In practice, this means developers can add
functionality aimed at wizard control very quickly, and
gradually transition it to system control by adding
subsystems to generate the appropriate events.
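
To make the run-time monitoring idea concrete, the sketch below models
it in Python. It is an illustration under stated assumptions, not
DART's implementation: `EventBus` and `WizardPanel` are hypothetical
names, and the "buttons" are simply the list of event names that
currently have subscribers.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional

Handler = Callable[[dict], None]


class EventBus:
    """Toy publish/subscribe bus standing in for a prototype's event system."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, name: str, handler: Handler) -> None:
        self._handlers[name].append(handler)

    def publish(self, name: str, payload: Optional[dict] = None) -> None:
        # Deliver the event to every handler subscribed under this name.
        for handler in list(self._handlers.get(name, [])):
            handler(payload or {})

    def subscribed_names(self) -> List[str]:
        return sorted(n for n, handlers in self._handlers.items() if handlers)


class WizardPanel:
    """Offers one 'button' per subscribed event name (hypothetical wizard UI)."""

    def __init__(self, bus: EventBus) -> None:
        self.bus = bus

    def buttons(self) -> List[str]:
        # Recomputed on every call, so new subscriptions appear automatically.
        return self.bus.subscribed_names()

    def press(self, name: str, payload: Optional[dict] = None) -> None:
        if name not in self.buttons():
            raise KeyError(f"no subscriber for event {name!r}")
        # Tag the event so logs can tell wizard input from system input.
        self.bus.publish(name, {**(payload or {}), "source": "wizard"})
```

Because the panel only inspects subscriptions, a handler can be driven
by the wizard before any recognizer exists, and a real subsystem can
later publish the same event name without changing the handler.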

Another approach would be to have the designer explicitly
indicate places to get WOz support within their code,
allowing her to specify parameters for the WOz Generator
or construct her own wizard interfaces. The annotation
could range from instantiating specific objects or calling
WOz functions, to code annotation in conjunction with a
language pre-processor. This approach would provide
support for customization and it would eventually work for
a wider variety of programming models. In any case, WOz
hooks for the programming environment should handle
low-level networking and broadcasting within a discovery
service, so that the requests for WOz support can be picked
up by any nearby WOz generator.
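
One way the explicit approach might look is sketched below in Python,
using a decorator as the annotation. Everything here is hypothetical:
`woz_hook`, `interface_request`, and `dispatch` are illustrative names,
the widget and parameter vocabulary is assumed, and the networking and
discovery layer is omitted entirely.

```python
import json
from typing import Any, Callable, Dict

# Registry of declared hooks, keyed by hook name (illustrative only).
_REGISTRY: Dict[str, Dict[str, Any]] = {}


def woz_hook(name: str, widget: str = "button", **params: Any) -> Callable:
    """Mark a function as something a wizard may trigger in place of the system."""

    def decorate(func: Callable) -> Callable:
        _REGISTRY[name] = {"widget": widget, "params": params, "callback": func}
        return func

    return decorate


def interface_request() -> str:
    """Describe the declared hooks as JSON, as a generator might receive them."""
    spec = [
        {"name": name, "widget": hook["widget"], "params": hook["params"]}
        for name, hook in sorted(_REGISTRY.items())
    ]
    return json.dumps(spec)


def dispatch(name: str, value: Any = None) -> Any:
    """Route a wizard's input back to the annotated function."""
    return _REGISTRY[name]["callback"](value)


# Example annotation: the designer asks for a text field for wizard-typed speech.
@woz_hook("recognized_speech", widget="text_field", max_length=200)
def on_speech(text: str) -> str:
    return f"speech:{text}"
```

The design choice illustrated is that the designer writes the handler
once and only declares how a wizard should reach it; the same handler
could later be called by a real recognizer.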

Templates will be available for common data structures or
frequent wizard situations, but the architecture will be
flexible enough to allow the wizard/designer to customize
the interface. In DART, for example, we provide support
for manually generating interfaces and injecting the same
kinds of events that our automatic interfaces generate [3].
Finally, the WOz generator will support data capture and
playback for evaluating all interactions between users,
wizards, and the system. Analysis tools will help
evaluate the design of the user interface and,
speculatively, suggest directions for each particular
software module being emulated.
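
Data capture and playback could be as simple as a timestamped event log
that is later replayed with its original pacing. The Python sketch
below assumes a line-per-event JSON format and the names `LoggedEvent`,
`SessionLog`, and `replay`, none of which come from an existing tool.

```python
import json
import time
from dataclasses import asdict, dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class LoggedEvent:
    t: float       # seconds since the session started
    actor: str     # assumed roles: "user", "wizard", or "system"
    name: str
    payload: dict


class SessionLog:
    """Records interactions with timestamps relative to session start."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._start = clock()
        self.events: List[LoggedEvent] = []

    def record(self, actor: str, name: str, payload: Optional[dict] = None) -> LoggedEvent:
        event = LoggedEvent(self._clock() - self._start, actor, name, payload or {})
        self.events.append(event)
        return event

    def dumps(self) -> str:
        # One JSON object per line keeps the log easy to append to and inspect.
        return "\n".join(json.dumps(asdict(e)) for e in self.events)

    @staticmethod
    def loads(text: str) -> List[LoggedEvent]:
        return [LoggedEvent(**json.loads(line)) for line in text.splitlines() if line.strip()]


def replay(events: Iterable[LoggedEvent], handler: Callable[[LoggedEvent], None],
           sleep: Callable[[float], None] = time.sleep, speed: float = 1.0) -> None:
    """Feed events to a handler in time order, waiting out the recorded gaps."""
    previous = 0.0
    for event in sorted(events, key=lambda e: e.t):
        sleep(max(0.0, (event.t - previous) / speed))
        handler(event)
        previous = event.t
```

Recording the actor alongside each event is what would let an analysis
separate wizard actions from user and system actions when looking
behind the curtain.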

Evaluation of Conceptual Framework and Tools

After creating and ironing out initial issues with the
conceptual framework and prototyping tools for WOz, I
will provide a significant public release of the tools. The
online tutorials for both the tools and my philosophy of
WOz-centric iterative design will give potential users
scaffolding for understanding what the tools can do. At this
point, I will begin working closely with one or more groups
of designers on real-world scenarios to understand if my
WOz approach and tools can be useful throughout a design
process. Based on recent history, my advisor and I will
likely team up with designers from the neighboring Digital
Media program at Georgia Tech to create a novel adaptive
media experience with embodied forms of interaction.

Throughout the iterative process I will gather data about
how the design method and software tools are used. I can
record click navigation through the online documentation
as well as track changes to the system code base. I am
hesitant to state an explicit hypothesis about how the
method and tools will be appropriated, but I expect that
more extensive understanding and adoption of the WOz
framework and tools will lead to more frequent evaluation
and testing at multiple levels of formality, and ideally, to a
better end result.

I will adapt a design research method popular in
education technology [1], where we introduce our tools,
train the users in their use, and help them as needed. At
multiple stages of the process, I will interview the
experienced designers who can reflect on the impact of the
WOz method and tool support. In addition to contributing
tools to support more expansive and valuable use of the
WOz methodology, our observations of the design process
and WOz activities “behind the curtain” may reveal
insights for design theory, in particular for my preliminary
thoughts on deliberate redistribution of cognition.

EXPECTED BENEFITS OF PARTICIPATION

The graduate symposium at Creativity and Cognition will
be timely and valuable for my thesis work. Currently, I am
writing my proposal for thesis research and I plan to
present my research plan to my committee this coming
spring. I have a two-year timeline for my research, so
gathering feedback from colleagues would be
fruitful at this time. The interdisciplinary nature of this
conference provides an ideal setting for sharing my ideas.
During my dissertation I intend to collaborate with and
support both designers and developers, so I hope to
communicate with folks doing both creative and technical
work. My thesis also draws heavily from theories in
cognitive science as I look to understand the deliberate
distribution and redistribution of cognition during iterative
design of complex systems. Dr. Hollan and Dr. Czerwinski
will undoubtedly have useful feedback on my topic area.

REFERENCES

1. Barab, S. & Squire K. (2004) Design-Based Research: Putting
a Stake in the Ground. In Journal of the Learning Sciences, 13(1).
2. Dahlback, N., Jonsson, A., & Ahrenberg, L. (1993) Wizard of
Oz Studies - Why and How. In Knowledge Based Systems,
6(4): 258 - 266.
3. Dow, S., Lee, J., Oezbek, C., MacIntyre, B., Bolter, J.D., &
Gandy, M. (2005) Exploring Spatial Narratives and Mixed
Reality Experiences in Oakland Cemetery. In ACM SIGCHI
Conf. on Advances in Computer Entertainment (ACE’05).
4. Dow, S., MacIntyre, B., Lee, J., Oezbek, C., Bolter, J.D., &
Gandy, M. (2005b) Wizard of Oz Support throughout an
Iterative Design Process. In IEEE Pervasive Computing
(Special Issue on Rapid Prototyping), November, 2005.
5. Dow, S., Mehta, M., Lausier, A., MacIntyre, B., & Mateas, M.
(2006) Initial Lessons from ARFaçade, An Interactive
Augmented Reality Drama. In ACM SIGCHI Conference on
Advances in Computer Entertainment (ACE’06).
6. Dow, S., Saponas, T. S., Li, Y., & Landay, J. A. (2006b)
External Representations in Ubiquitous Computing Design
and the Implications for Authoring Tools. In Conference on
Designing Interactive Systems (DIS’06).
7. Hollan, J. D., Hutchins, E. & Kirsh, D. (2000) Distributed
Cognition: a new foundation for human-computer interaction
research. ACM Transactions on Computer-Human Interaction,
7(2): 174-196.
8. Höysniemi, J. & Read, J.C. (2005) Wizard of Oz Studies with
Children. In Proceedings of Interact 2005 Workshop on Child
Computer Interaction: Methodological Research.
9. Hutchins, E. (1995) Cognition in the Wild. Cambridge, Mass:
MIT Press.
10. Kelley, J. F. (1984) An iterative design methodology for user
friendly natural language office information applications. In
ACM Trans. on Office Information Systems, 2(1): 26 - 41.
11. Klemmer, S. R., Sinha, A. K., Chen, J., Landay, J.,
Aboobaker, N., & Wang, A. (2000) SUEDE: A Wizard of Oz
Prototyping Tool for Speech User Interfaces. In Proceedings
of UIST: ACM Symposium on User Interface Software and
Technology, pp 1-10.
12. Li, Y., Hong, J.I., & Landay J.A. (2004) Topiary: A Tool for
Prototyping Location-Enhanced Applications. ACM Symp. on
User Interface Software and Technology (UIST’04), 217-226.
13. MacIntyre, B., Gandy, M., Dow, S., & Bolter, J. (2004)
DART: A Toolkit for Rapid Design Exploration of
Augmented Reality Experiences. In Proceedings of User
Interface Software and Technology (UIST'04).
14. Mateas, M. and Stern, A. (2003) Façade: An Experiment in
Building a Fully-Realized Interactive Drama. In Game
Developer's Conference: Game Design Track.
15. Van Duyne, D., Landay, J.A., & Hong, J.I. (2006) The Design
of Sites: Patterns for Creating Winning Web Sites, 2nd edn.
Prentice Hall PTR.

Last modified 18 February 2008 at 9:48 pm by haleden