Optimizing Hierarchical Visualizations with the Minimum Description Length Principle

Rafael Veras

Christopher Collins

Directory Mozilla (DMOZ)
Dataset

2,083,282 web pages

498,487 categories

More than 2.5M nodes

Overview

5 Top Levels

4 Top Levels

“Optimal” level of abstraction varies with display size and dataset.

How to automatically select a good level of abstraction?

Clutter- and Information-aware Responsive Views


Scope

Weighted hierarchies with recursive scores.

Space-filling representations (e.g., treemap, sunburst).
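
One reading of "recursive scores" is that an internal node's score is the sum of its children's scores. The sketch below works under that assumption; the nested-dict layout, the `score` helper, and the example tree are hypothetical and are not taken from treecut.js.

```python
def score(node):
    """Return the recursive score of a node.

    Assumption: a leaf carries its own weight, and an internal node's
    score is the sum of its children's scores.
    """
    children = node.get("children", [])
    if not children:
        return node["weight"]
    return sum(score(child) for child in children)


# Hypothetical hierarchy, for illustration only.
tree = {
    "name": "root",
    "children": [
        {"name": "a", "weight": 3},
        {"name": "b", "children": [
            {"name": "b1", "weight": 1},
            {"name": "b2", "weight": 4},
        ]},
    ],
}

print(score(tree))  # 8
```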

Goals

Strike a balance between clutter and information.

Data centric: work without user input.

No mysterious parameters.


Minimum Description Length Principle (MDL)


The best model is the one that best balances conciseness against fit to the data.

MDL evaluates models in two parts:

  • cost of encoding the model
  • cost of encoding model residuals (error)

MDL selects the model that minimizes the sum of these costs.

Data

rabbit 5
cat 10
tiger 10
leopard 10
fox 2
wolf 8
fly 5

Cost (Message Length)

parameters  error  total
7           0      7
5           0      5
2           6      8
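
The selection rule can be checked directly against the three rows of the cost table above. This is a minimal sketch: the tuple layout and the `total_cost` helper are illustrative names, and the numbers are copied from the table.

```python
# (parameters, error) costs from the table, one tuple per candidate model.
candidates = [(7, 0), (5, 0), (2, 6)]


def total_cost(candidate):
    """Two-part cost: cost of the model plus cost of its residual error."""
    parameters, error = candidate
    return parameters + error


# MDL selects the candidate with the smallest total message length.
best = min(candidates, key=total_cost)
print(best, total_cost(best))  # (5, 0) 5
```

The second model wins: it is shorter than the first and, unlike the third, it introduces no error.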

Display as a channel

Partial

some source symbols cannot be encoded (when mapped to subpixel areas)

Lossy

a pair of symbols can share the same representation
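
The "partial" case can be illustrated with a rough sketch. It assumes that each node's area is proportional to its weight, as in a treemap, and that a node needs at least one pixel to be encoded. The `subpixel_nodes` helper and the display sizes are hypothetical; the weights reuse the values from the Data slide.

```python
def subpixel_nodes(weights, width_px, height_px):
    """Return the names of nodes whose area would fall below one pixel.

    Assumption: area = weight / total_weight * display_area. The test is
    written with integers to avoid floating-point rounding.
    """
    total = sum(weights.values())
    pixels = width_px * height_px
    return [name for name, w in weights.items() if w * pixels < total]


weights = {"rabbit": 5, "cat": 10, "tiger": 10, "leopard": 10,
           "fox": 2, "wolf": 8, "fly": 5}

print(subpixel_nodes(weights, 4, 4))    # ['fox']
print(subpixel_nodes(weights, 10, 10))  # []
```

On the hypothetical 4 x 4 display, fox would occupy 0.64 of a pixel and so cannot be shown; on the 10 x 10 display every node fits.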

Full overview

Top Levels

Level-by-level

MDL

DL = Cost(Visualization) + Weight * Cost(Error)
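
A small sketch of how the weighted sum behaves. How Weight is derived is not stated on this slide, so the weight values below are placeholders, and the candidate costs simply reuse the numbers from the earlier cost table for illustration. The function names are hypothetical.

```python
def description_length(cost_visualization, cost_error, weight):
    """DL = Cost(Visualization) + Weight * Cost(Error)."""
    return cost_visualization + weight * cost_error


def select(candidates, weight):
    """Pick the (cost_visualization, cost_error) pair with the smallest DL."""
    return min(candidates,
               key=lambda c: description_length(c[0], c[1], weight))


# Placeholder candidates: (cost of visualization, cost of error).
candidates = [(7, 0), (5, 0), (2, 6)]

print(select(candidates, 1.0))   # (5, 0)
print(select(candidates, 0.25))  # (2, 6)
```

Lowering the weight makes error cheaper, so a coarser candidate with a smaller visualization cost can win.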

Evaluation

User Study

Clutter Model

Task

Find a target on the treemap (e.g., Top/Arts/Music/Target)

Measures

Time; number of drill-downs

Results

Abstracted views: large improvements over non-abstracted views.

Tree size reduction meant worse times for all abstracted views.
Smallest negative impact observed on MDL views.

Results

MDL views: largest positive effect on outlier target.

MDL views: fewer drill-downs than the other abstraction approaches.

Results

MDL had better times than the unabstracted views, and similar times to T3 and T4.

Feature Congestion Model
(Rosenholtz et al., 2007)

Measure of clutter based on the statistical saliency model.

Summary

MDL Views

Adapt abstraction as a function of available display size.

MDL Drill Down

Asymmetrical, information-aware drill-down.

Demos

http://vialab.ca/mesh/

http://vialab.ca/dmoz/

Source code

http://github.com/rafaveguim/treecut.js

On Twitter: @rafaveguim

Sponsors
