Oct. 4, 2007: Notes on Ben Shneiderman's presentation, "The Thrill of Discovery: Information Visualization for High-Dimensional Spaces" (http://www.cs.colorado.edu/events/colloquia/2007-2008/shneiderman.html).

Demonstrations of various information visualization programs were given across multiple problem domains. Whereas traditional scientific visualization uses linear, map, and world (1-, 2-, 3-D) data types, these newer programs employ (static) multi-variable, temporal, tree, and network-based data types to impressive effect. The manipulation of real-time streaming data was not demonstrated. The goal is to discover features in these multi-variable data that can then be ranked; the investigator must know what he is looking for, in terms of metrics, before the extracted features can be ranked. This is a powerful concept that should bear a lot of research fruit. However, as he pointed out, there are always large correlations among the data features, whether described linearly or non-linearly, and all the more so in higher dimensions.
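As a minimal sketch of this rank-by-feature idea (my own illustration in Python, not Shneiderman's implementation, with invented variable names and toy data): score every pair of variables by a chosen metric, here absolute Pearson correlation, and list the pairs in ranked order so the strongest relationships surface first.

  import numpy as np

  def rank_feature_pairs(data, names):
      # Rank all variable pairs by absolute Pearson correlation.
      # data: (n_samples, n_vars) array; names: n_vars column labels.
      corr = np.corrcoef(data, rowvar=False)
      scored = [(abs(corr[i, j]), names[i], names[j])
                for i in range(len(names)) for j in range(i + 1, len(names))]
      return sorted(scored, reverse=True)  # strongest relationships first

  # Toy data: y depends on x, z is independent noise.
  rng = np.random.default_rng(0)
  x = rng.normal(size=200)
  data = np.column_stack([x, 2 * x + rng.normal(size=200), rng.normal(size=200)])
  for score, a, b in rank_feature_pairs(data, ["x", "y", "z"]):
      print(f"{a}-{b}: |r| = {score:.2f}")

In the actual tools the ranking drives coordinated views, so a high-ranked pair can be opened as a scatterplot; this sketch captures only the scoring-and-sorting step.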

Dr. Shneiderman offered some memorable phrases:
1. "Overview, zoom and filter, details on-demand."
2. "Rapid, incremental, reversible operations."
3. "Visualization discovery provides answers to questions you didn't think you had."
4. "Using vision to think."
5. "As you increase the number of dimensions, the number of outliers increases until every datum is an outlier."

Items 1 and 2 are program design principles that have proven their effectiveness. I'm sure that a lot of time is spent preparing, normalizing, and organizing the data into an efficient set of data structures in order for these principles to be realized! Item 3 puts flesh on the generation of scientific hypotheses: at the very least, recognizing an unusual trend or set of outliers forces a hypothesis to be more robust. Item 4 is a powerful statement if it operates on processes and data that are normally invisible to vision. The last item is worrisome to me because I work with proteomic data that involves thousands of experimental variables (i.e., dimensions). Standard statistical analysis of such high-dimensional, low-sample-size data is suspect, and inference is nearly impossible. But I have downloaded the software to see if I can gain some insight into the data that I have.
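Item 5 is easy to make concrete with a quick simulation (a sketch under my own assumptions, not something shown in the talk). Flag a sample as an outlier if any single coordinate lies more than three standard deviations out; for independent Gaussian coordinates each dimension does this with probability about 0.0027, so over d dimensions the chance of being flagged at least once is 1 - (1 - 0.0027)^d, which climbs toward 1 as d grows.

  import numpy as np

  rng = np.random.default_rng(1)
  n = 2000  # samples

  # Fraction of points flagged as an outlier (|z| > 3 in at least
  # one coordinate) as the number of dimensions d grows.
  for d in [1, 10, 100, 1000, 2000]:
      z = rng.standard_normal((n, d))
      observed = np.any(np.abs(z) > 3, axis=1).mean()
      expected = 1 - (1 - 0.0027) ** d
      print(f"d = {d:4d}: observed {observed:.3f}, expected {expected:.3f}")

By a few thousand dimensions essentially every sample is an outlier under this criterion, which is exactly the situation with thousands of proteomic variables and very few samples.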
