Monday, April 02, 2007

Visualizing hierarchical data

Today much data is gathered from the web. Data from websites often tends to be hierarchical in nature. For example, on Amazon we have categories (music, books, etc.), then within a category there are sub-categories (e.g., within Books: Business & Technology, Children's Books, etc.), and sometimes there are even additional layers. Other examples are eBay, Epinions, and almost any e-tailer. Even travel sites usually include some level of hierarchy.

The standard plots and graphs, such as bar charts, histograms, and boxplots, might be useful for visualizing a particular level of the hierarchy, but not the "big picture". Trellising, where a particular graph is "broken down" by one or more variables, helps. However, you still do not directly see the hierarchy.

An ingenious method for visualizing hierarchical data is the Treemap, designed by Professor Ben Shneiderman from the Human-Computer Interaction Lab (HCIL) at the University of Maryland. The treemap is basically a rectangular region broken down into sub-rectangles (and then possibly into further sub-sub-rectangles), where each of the smallest rectangles represents the unit of interest. Then color and/or size can be used to describe measures of interest.
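To make the idea of rectangles nested within rectangles concrete, here is a minimal sketch of one simple layout scheme, usually called "slice-and-dice": each level of the hierarchy splits its parent's rectangle in proportion to size, alternating between vertical and horizontal cuts. The `Node` class, the `slice_and_dice` function, and the sample categories and sizes are all made up for illustration; real treemap software uses more refined layouts and adds color and interactivity.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Rect = Tuple[str, float, float, float, float]  # (name, x, y, width, height)


@dataclass
class Node:
    """Hypothetical tree node: leaves carry a size, parents carry children."""
    name: str
    size: float = 0.0
    children: List["Node"] = field(default_factory=list)

    def total(self) -> float:
        # A leaf's own size, or the summed size of everything beneath it.
        if not self.children:
            return self.size
        return sum(child.total() for child in self.children)


def slice_and_dice(node: Node, x: float, y: float, w: float, h: float,
                   depth: int = 0, out: Optional[List[Rect]] = None) -> List[Rect]:
    """Return one rectangle per leaf, with area proportional to the leaf's size."""
    if out is None:
        out = []
    if not node.children:
        out.append((node.name, x, y, w, h))
        return out
    total = node.total()
    offset = 0.0  # fraction of the parent rectangle already used
    for child in node.children:
        share = child.total() / total if total else 0.0
        if depth % 2 == 0:
            # Even depth: place children side by side along the x axis.
            slice_and_dice(child, x + offset * w, y, share * w, h, depth + 1, out)
        else:
            # Odd depth: stack children along the y axis.
            slice_and_dice(child, x, y + offset * h, w, share * h, depth + 1, out)
        offset += share
    return out


# Invented two-level hierarchy (sizes are arbitrary, not real data).
root = Node("Store", children=[
    Node("Books", children=[Node("Business", 30), Node("Children", 10)]),
    Node("Music", 60),
])

for rect in slice_and_dice(root, 0, 0, 100, 100):
    print(rect)
```

Each leaf ends up with an area proportional to its size, and leaves that share a parent always sit inside the parent's rectangle, which is what lets the eye read the hierarchy directly.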

Treemap's original goal was to visualize one's hard drive (with all its directories and sub-directories) for detecting phenomena such as duplications. There, a file was a single entity, and its size, for instance, could be represented by the rectangle's size. Since its development in the 1990s it has spread widely across almost every possible discipline. Probably the most popular application is SmartMoney's Map of the Market, where you can visualize the current state of the entire stock market. The strength of the treemap lies both in its ability to include multiple levels of hierarchy (you can drill in and out to different levels) and in its interactive nature, where users can choose to manipulate color, size, and order to represent measures of interest.

Microsoft Research posts a free Excel add-on called Treemapper, but after trying it out I think it is too limited: it allows only one level of hierarchy and does not have any interactivity (it also accepts only numerical information).

Last month the business section of the New York Times featured an article on DaimlerChrysler, "This time, no roadside assistance", which included a neat Treemap. Since it is no longer available online (NYT does not include graphics in its archives...) here it is -- courtesy of Amanda Cox from the NYT, known as their "statistics wiz".


You can find many more neat examples of using Treemap on the HCIL website.

1 comment:

Unknown said...

Treemapper allows multiple levels of data: simply embed the hierarchy in the names of your fields.

That is, make your spreadsheet look like
1 1 TOP
1 2 TOP L1
1 3 TOP L2
1 4 TOP L2 DEEPLY
2 1 OTHER

...etc...
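
As a rough illustration of how a hierarchy can be recovered from names like these, the sketch below assumes that each name is a path whose levels are separated by whitespace (so "TOP L2 DEEPLY" sits under "TOP L2", which sits under "TOP"). That reading is an assumption based on the rows above, not Treemapper's documented format, and `build_tree` is a hypothetical helper.

```python
def build_tree(names):
    """Turn path-like names into a nested dict, one dict level per hierarchy level."""
    tree = {}
    for name in names:
        level = tree
        # Assumption: whitespace separates the levels embedded in a name.
        for part in name.split():
            level = level.setdefault(part, {})
    return tree


# Name column taken from the rows above.
names = ["TOP", "TOP L1", "TOP L2", "TOP L2 DEEPLY", "OTHER"]
tree = build_tree(names)
print(tree)
```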