
Capturing and Presenting Relevant Information in the EDC

Final Report


Purpose

In Envisionment and Discovery Collaboratory (EDC) problem-solving situations, construction activities and information relevant to the current problem are tightly coupled. However, the current implementation of the EDC does not interact well with the outside world: there is no direct or automatic connection between external information resources and the EDC's information space. People who find interesting articles must add them to the EDC by hand.

In this project we explored an interface through which the EDC could provide participants with useful access to such external information sources.

Rationale

The situations for which the EDC is most useful, in particular community-oriented design problems, rarely exist in an informational vacuum. Expert opinions, previous EDC sessions, news articles, and other sources might provide valuable insight into the problems EDC participants face. Therefore, we set out to design and implement a system that would provide users of the EDC with useful information about the problems they sought to address.

Implementation

Originally, our plan was to develop a system that would mine a single source, such as the Boulder Daily Camera web site, for articles relating to transportation in Boulder and bring them back to the EDC environment.

However, as we began to discuss approaches to this problem, we realized that existing tools would make it easy to harvest articles that appeared to relate to transportation, and that we could do more. Instead of merely retrieving the articles, we could also design the parts of the system that would organize the collected information and present only articles likely to be useful.

We therefore began developing a thorough workflow with the intention of implementing a working prototype of the information retrieval system. While we had hoped to develop a full system that could be integrated into the EDC, we ended with something less: a model for how a system could bring documents into the EDC and display relevant information to the user. In addition to this high-level model, we specified an API for interaction between the model's components and produced reference implementations for several of them.

Our hope now is that we will be able to plant the seed for a system which would address these problems by integrating a knowledge management system with the EDC's reflection space.

Approach


The model we developed is fairly simple. Using a web-based interface, the user instructs agents to go out and retrieve information based on specified criteria (such as the topics for an upcoming EDC session). What the agents bring back is stored in a database along with any keywords associated with it when found. Later, a user may review the retrieved information and associate further keywords and comments with it. When the EDC is in use, the organizer module can be asked to display any chunks of information it has on specific keywords. This means that relevance is determined in two ways: first contextually, then by human review.

To actually build this model, we decided to create a web application that uses Java classes to interact with a MySQL database. We chose this approach over several other viable options largely because of the tools it made available to us. A Java toolkit for testing web pages, httpUnit, enabled more rapid development of agents. A Java framework [onebook.sourceforge.net] for storing information about documents in a MySQL database was also freely available. Below is an overview of the components of the model, and a discussion of how they behave and interact.

Agents: Agents are required to go out to a specific source and collect documents from it. Each agent is custom-tailored to a particular source or technique for getting information. Beyond this intent, there are no firm restrictions on agents; the only real requirement is that they implement the Agent Java interface we have specified.
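The report does not reproduce the specified Agent interface, so the following is a hypothetical sketch of what such a contract might look like. All names here (Agent, FoundDocument, collect, StaticListAgent) are illustrative assumptions, not the project's actual API.

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the agent contract described above.
interface Agent {
    // Visit the agent's assigned source and return documents
    // that appear to match any of the given keywords.
    List<FoundDocument> collect(Set<String> keywords);
}

// Minimal document record: a URL plus the keywords it matched on.
class FoundDocument {
    final String url;
    final Set<String> matchedKeywords;
    FoundDocument(String url, Set<String> matchedKeywords) {
        this.url = url;
        this.matchedKeywords = matchedKeywords;
    }
}

// Trivial example agent: "retrieves" from a fixed in-memory source
// instead of a live site, to illustrate the shape of an implementation.
class StaticListAgent implements Agent {
    private final java.util.Map<String, String> source = java.util.Map.of(
        "http://example.org/bus-210", "210",
        "http://example.org/bus-227", "227");

    public List<FoundDocument> collect(Set<String> keywords) {
        List<FoundDocument> results = new java.util.ArrayList<>();
        source.forEach((url, keyword) -> {
            if (keywords.contains(keyword)) {
                results.add(new FoundDocument(url, Set.of(keyword)));
            }
        });
        return results;
    }
}
```

A real agent, such as the httpUnit-based RTD agent described later, would replace the in-memory map with live retrieval, but would present the same interface to the rest of the system.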

Database: To track resources and remember which resources are associated with which keywords, we implemented a simple MySQL database.

Keywords: Keywords are simple terms and phrases that can be collected into groups. For example, a group named "Bus Routes" might contain terms such as "227", "210", "A", and "The Ride" (names of RTD bus routes).
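The report later notes that keywords and keyword groups were implemented under the names Tag and TagDictionary, but gives no details, so this is a minimal sketch under the assumption that a dictionary simply maps group names to sets of terms; the method names are invented for illustration.

```java
import java.util.*;

// Hypothetical sketch of keyword groups. The project's actual
// Tag/TagDictionary classes may differ; this only illustrates
// the grouping behavior described in the text.
class TagDictionary {
    private final Map<String, Set<String>> groups = new HashMap<>();

    // Add a term to a named group, creating the group if needed.
    void add(String group, String term) {
        groups.computeIfAbsent(group, g -> new HashSet<>()).add(term);
    }

    // All terms belonging to a group, e.g. for handing to agents.
    Set<String> termsFor(String group) {
        return groups.getOrDefault(group, Set.of());
    }
}
```

Using the example from the text, one would add "227", "210", "A", and "The Ride" to a "Bus Routes" group and pass the resulting term set to the agents.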

Documents: The information we gather is represented in this model as documents. In the current system, all documents are stored as URLs in the database. Additional information, such as a description of the document and its name, could also be stored.

The Organizer: The organizer is responsible for getting the agents to interact well with the back end. It also provides the web interface with an API for retrieving documents that meet certain criteria.
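The organizer's two roles described above — cataloging what agents bring back and answering keyword queries — can be sketched as follows. This is an assumption-laden illustration: an in-memory map stands in for the MySQL back end, and the method names (catalog, documentsFor) are invented, not the project's actual API.

```java
import java.util.*;

// Hypothetical sketch of the organizer's cataloging/query role.
// A map from keyword to document URLs stands in for the MySQL database.
class Organizer {
    private final Map<String, Set<String>> urlsByKeyword = new HashMap<>();

    // Called when an agent hands back a document tagged with keywords.
    void catalog(String url, Set<String> keywords) {
        for (String k : keywords) {
            urlsByKeyword.computeIfAbsent(k, x -> new HashSet<>()).add(url);
        }
    }

    // Called by the web interface or an EDC session to fetch the
    // documents associated with a keyword.
    Set<String> documentsFor(String keyword) {
        return urlsByKeyword.getOrDefault(keyword, Set.of());
    }
}
```

In the real system these lookups would be SQL queries against the document/keyword tables, but the shape of the API would be the same.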

User Interface: The user interface presents the users with web pages that allow for manipulation of the other components.

System Behavior


For our implementation, we decided to focus on retrieving information from online news feeds. We decided that an autonomous agent would have a fair chance of detecting useful information, but that the final decision should be made by a human. Our system supports agents that suggest certain resources as valuable and people who decide whether each resource should be kept or thrown out. The problem of teams passing information to other teams attempting a similar problem was left mostly unaddressed, though the underlying framework would support several approaches to it.



Use of the information retrieval system proceeds as follows:

  • An EDC session is scheduled and a list of keywords is developed and passed to the system
  • The keywords are given to the agents, which access their assigned resources and look for relevant documents
  • These documents are tagged with associated keywords and passed to the organizer, which catalogs them
  • At this point (or any other) a user can query the system for newly found documents. These can be accepted into the system or rejected as irrelevant. Keywords and descriptions could be changed at this point as well.
  • In session, the EDC provides a list of resources, along with any notes it has, to participants
    • Participants may annotate the resources to further refine their relevance to the task

Progress

We have developed reference implementations for some of these components but not others. Below is a list of the components and a discussion of each reference implementation, if any:

Agents: We have implemented two reference agents. One uses httpUnit [httpunit.org] to retrieve bus schedule information from the RTD. The other provides a very basic interface for adding URLs of documents from the command line.

Database: We have put together a simple database schema that supports the functionality created thus far.

Keywords: Keywords and keyword groups have been implemented in Java under the names Tag and TagDictionary.



Documents: A reference implementation of Document is provided. A handful of improvements would be useful, though. For instance, small changes would be needed to have a web page automatically saved locally and accessed from there.

The Organizer: The current reference implementation is half complete.

User Interface: While we have experimented some with JSP to ensure the other components are functional in a web environment, there is currently very little in the way of an implemented user interface.

Future Work

We were unable to carry this project as far as we desired during the time we had this semester. As we stated earlier, we had hoped to develop a fully functioning prototype, but we were only able to develop a proof of concept around our ideas. However, we feel that the workflow we have developed will be of great use to anyone interested in pursuing this project further than we were able to.

We see several avenues that could be explored fruitfully using our system as a point of departure. As with many of the projects undertaken this semester, success relies in large part on information being provided to the user in a timely and informative manner, and this project lays a foundation for future work in that area. Our current implementation attempts to strike a balance between push and pull, as we feel that neither can be effective by itself. We provide two methods of developing the store of information offered to EDC participants. First, the system provides resources to the user by matching information about the session against its database of articles. Second, the system allows the user to annotate these links to refine those associations. To make this even more useful, the recommendations of the system could be influenced by the actions of the EDC participants. In this manner, we feel that the issues explored by Dipti and Payal could be of great use in further developing this system.

Furthermore, work needs to be done to improve the intelligence of the agents that collect articles for the EDC. The current set of agents is particularly naive about the content of a given web resource; research on search engines and document analysis would greatly improve their intelligence. In addition, an alternative architecture could be explored: rather than having autonomous agents search for resources, syndicated feeds could be provided to the EDC, which could monitor those feeds to gather articles in a more passive manner. This could be accomplished using a syndication technology such as RDF.

