Optimizing Hierarchical Visualizations with the
Minimum Description Length Principle
Directory Mozilla (DMOZ)
2,083,282 web pages
The story of this paper begins with us exploring the Directory Mozilla Dataset (DMOZ).
DMOZ is a collection of websites organized into categories.
It has about half a million categories.
We wanted to explore the relative size of the categories, so we plotted the data as a treemap...
Not surprisingly, it is hard to read: lots of clutter and barely any space for labels.
5 Top Levels
Then we set a threshold on the number of hierarchy levels. This treemap shows only the first 5 levels, out of 13.
The left side looks fairly good, in terms of density. But the right side is still very cluttered.
4 Top Levels
Raising the threshold to 4 improves the clutter on the right side a bit, but now space is wasted on the left side.
On a mobile display, the same 4 level treemap looks even harder to read, because the small rectangles are even tinier.
Perhaps a better choice would be showing only the first two levels.
“Optimal” level of abstraction varies with display size and dataset.
So the problem is that the best level of abstraction varies as a function of the display size and the dataset.
Adjusting the level of abstraction is a common problem faced by vis designers.
How to automatically select a good level of abstraction?
So the question we’re trying to answer with this paper is “How to automatically select a good level of abstraction”?
Clutter- and Information-aware Responsive Views
W, A, S, D to resize treemap.
We present an algorithm for creating responsive views of hierarchical datasets that are clutter and information aware.
---Plays with chart---
But what does information aware mean?
It means that, in the process of collapsing nodes, the algorithm tries to minimize the loss that comes from hiding important nodes. In other words, it tries to keep outliers exposed.
Before we talk about how we accomplished this, let’s define the scope and goals in more detail.
Weighted hierarchies with recursive scores.
Space-filling representations (e.g., treemap, sunburst).
We limit the scope to weighted hierarchies with recursive scores. Those are usually hierarchies where the leaves hold observable counts and the internal nodes are abstract classes.
And we focus on space filling-visual representations, such as treemap and sunburst diagrams, where scores are mapped to display area.
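As a minimal sketch of this scope (the structure and counts below are illustrative, not DMOZ data), a weighted hierarchy with recursive scores has leaves holding observed counts and internal nodes that sum their subtrees:

```python
# Illustrative sketch: leaves hold observed counts; internal node
# scores are computed recursively as the sum of their subtree.

def score(node):
    """node is (name, count) for a leaf, or (name, [children]) internally."""
    name, payload = node
    if isinstance(payload, list):
        return sum(score(child) for child in payload)
    return payload

dmoz_like = ("Top", [
    ("Arts", [("Music", 4), ("Movies", 2)]),
    ("Science", [("Math", 3)]),
])
print(score(dmoz_like))  # 9: the root score is the sum of all leaf counts
```

In a space-filling view, these scores are then mapped to display area, so every internal node's rectangle or arc covers exactly the area of its descendants.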
Strike a balance between clutter and information.
Data centric: work without user input.
No mysterious parameters.
Our primary goal is to produce views that balance clutter and information without user input.
So we are looking for overviews that are more useful by serving as better entry points for exploration.
Often, techniques for adjusting the quality of visualizations require the user or the designer to tune a free parameter. Instead, we wanted an algorithm that works well without such parameters.
Move cut up with W and down with S.
Click on nodes for uneven cut.
The problem of optimizing the abstraction level reduces to the problem of finding a suitable tree cut.
There’s an inverse relation between the conciseness and the fitness of a cut as we can see in this example.
At the lowest level, the cut has the maximum length, and the best fitness, as it describes the data to full extent.
If we raise the abstraction level, the cut is more concise, but in order to get the values of the nodes underneath, we need to make inferences based on our expectation of the distribution of such values.
An even cut, however, is limiting. In order to have better control over the information density across the display, we can consider uneven tree cuts, such as those that I’m showing here.
With uneven cuts, we can go deep where there is more important information and remain abstract where there isn’t much.
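A tree cut can be sketched as a set of nodes that every root-to-leaf path crosses exactly once. The `Node` class and the small example tree below are illustrative, not the paper's data structures:

```python
# Illustrative sketch of even and uneven tree cuts.

class Node:
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

def is_valid_cut(root, cut):
    """Every root-to-leaf path must cross the cut exactly once."""
    def walk(node, crossings):
        crossings += 1 if node in cut else 0
        if not node.children:
            return crossings == 1
        return all(walk(child, crossings) for child in node.children)
    return walk(root, 0)

a1, a2, b1 = Node("a1"), Node("a2"), Node("b1")
a, b = Node("A", [a1, a2]), Node("B", [b1])
root = Node("root", [a, b])

print(is_valid_cut(root, {a, b}))       # even cut: True
print(is_valid_cut(root, {a1, a2, b}))  # uneven cut, deep under A only: True
print(is_valid_cut(root, {a}))          # leaves b1 uncovered: False
```

The uneven cut `{a1, a2, b}` is exactly the kind of frontier that goes deep under one branch while staying abstract elsewhere.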
Minimum Description Length Principle (MDL)
The best model is that which provides the best balance between
conciseness and fitness to data.
To select tree cuts, we use a model selection approach with roots in information theory called the minimum description length principle.
The principle states that, given a family of models of a dataset, the best model should provide the best balance between conciseness and fitness to data.
MDL evaluates models in two parts:
Cost of encoding the model.
Cost of encoding model residuals (error).
MDL selects the model that minimizes the sum of these costs.
MDL evaluates models in two parts:
The cost of encoding the model
The cost of encoding the error of the model
And MDL selects the model that minimizes the sum of these costs.
We can illustrate how MDL works with a simple example:
Cost (Message Length)
Suppose person A has a dataset and wants to transfer it to B over an expensive channel, where the price per bit is high.
So it’s in the best interest of A to send the shortest possible message.
Under the naive scheme, A sends all 7 data points to B.
But if A and B share a multilevel structure for this data, then they can do much better.
The purple tree cut corresponds to the naive scheme I mentioned, it doesn’t benefit from the hierarchy.
Note how all felidae share the same count. A can take advantage of this regularity to compress this group.
In this case, represented by the blue cut, B receives the count for felidae and divides the count by three to get the counts of leopard, tiger and cat. This message costs only five data points and does not introduce any error.
However, if A gets greedy and tries to abstract even more, the message gets longer. The cut has only two data points, but because the group of mammals is not uniform, when B reconstructs the scores, they are off. So A needs to send a second message correcting the error.
Note that the other cuts had zero error.
In the end, the blue model is selected by MDL for having the shortest message. It is, in fact, the one with the best balance of fitness and conciseness.
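The message-length arithmetic above can be sketched as follows. The leaf counts and the greedy two-node cut are illustrative, and the cost model (one unit per transmitted or corrected value) is a simplification of the actual bit-level formulas:

```python
# Toy version of the A-to-B transmission example. Cost counts transmitted
# values; B reconstructs each abstracted group by uniform sharing, and any
# leaf that reconstructs incorrectly costs one extra correction value.

def message_length(groups, counts):
    """groups: one list of leaf indices per node in the cut."""
    cost = len(groups)                                       # values for the cut itself
    for group in groups:
        guess = sum(counts[i] for i in group) / len(group)   # uniform sharing
        cost += sum(1 for i in group if counts[i] != guess)  # corrections
    return cost

counts = [3, 3, 3, 1, 4, 2, 5]  # leopard, tiger, cat, plus four other leaves
purple = [[i] for i in range(7)]                 # naive: send every leaf
blue = [[0, 1, 2]] + [[i] for i in range(3, 7)]  # collapse the uniform felidae group
greedy = [[0, 1, 2, 3, 4], [5, 6]]               # over-abstract: non-uniform groups

print(message_length(purple, counts))  # 7, no error
print(message_length(blue, counts))    # 5, still no error
print(message_length(greedy, counts))  # 9: 2 values plus 7 corrections
```

The blue cut wins because the felidae group is uniform, so abstracting it saves values without introducing residuals; the greedy cut groups non-uniform leaves and pays for every correction.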
But if we run the MDL formulas on, say, DMOZ, we will get the same tree cut regardless of the display. So our key contribution is adapting MDL to the visualization case.
This pipeline shows our concept. The visualization is an encoding of a tree cut model which, in turn, is a model of the data. The display is the channel carrying the message to the user.
Display as a channel
some source symbols cannot be encoded (when mapped to subpixel areas)
a pair of symbols can share the same representation
The MDL calculations assume that the channel doesn’t impose any limits on the accuracy of messages.
Our key contribution comes from realizing that this is not true for the display.
In practice, the display only allows partial and lossy encoding. You can see that in the snapshot on the slide.
These dense blobs that form when there are too many data points make decoding impossible.
In other words, you cannot trust the display to carry all messages without a loss.
With that in mind, we adapted the MDL formulation.
For the model cost, we wrote a function that relates to the level of clutter in the visualization.
And for the error cost, we modelled the information loss not only due to abstracted nodes, but also, limitations of the display.
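One way to picture the lossy display channel is to check which nodes of a treemap would map to sub-pixel areas. The weights, display sizes, and the one-pixel threshold below are illustrative:

```python
# Sketch of the "display as a lossy channel" idea: a node whose weight
# maps to less than ~1 square pixel cannot be decoded by the viewer,
# so its value is lost no matter what the tree cut transmits.

def undecodable(weights, display_w, display_h, min_area_px=1.0):
    total = sum(weights)
    area = display_w * display_h
    return [w for w in weights if w / total * area < min_area_px]

weights = [500, 300, 150, 30, 12, 5, 2, 1]
# On a large display every node gets drawable area...
print(len(undecodable(weights, 1920, 1080)))  # 0
# ...on a tiny one, the smallest nodes fall below a pixel.
print(len(undecodable(weights, 20, 20)))      # 2
```

This is why the adapted error cost has to account for display limitations, not just for abstracted nodes.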
Now I will show you a new example that motivates a different application of MDL.
Docuburst is a visualization tool published in 2009 by my co-author Christopher Collins.
In Docuburst, you enter a document and it colors a hierarchy of concepts according to the frequency of those concepts in the document.
The problem we’re trying to solve here is finding a useful entry point for exploration.
As you can see in the slide, the full overview is a bit overwhelming (too many nodes)
And the abstract views that cut the hierarchy at a certain level are too abstract. Note how the nodes don’t tell much about the document.
Usually, the most important nodes are buried deep into the hierarchy, but the upper levels are still interesting as they provide context.
Here we see snapshots for the book Gamer Theory. Relevant keywords here are topology, game, universe, storyline, boredom, play and time.
So the question is: how can we support quick access to these nodes without too much clutter? We thought we could use MDL for this.
Here we see the user expanding the tree level by level, as it is standard on Docuburst.
After many interactions, the nodes I mentioned are not yet revealed.
With the MDL tree cuts, each drill down goes a lot deeper in an asymmetrical way, revealing the outliers early.
We call this technique MDL drill downs.
DL = Cost(Visualization) +
Weight * Cost(Error)
We did this by introducing a weight parameter in the Description Length calculation.
Every time the user drills down we increase the importance of fitness to data, which makes MDL select deeper cuts.
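The effect of the weight can be sketched with toy numbers; the candidate cuts and their costs below are illustrative, not values from the paper:

```python
# Sketch of MDL drill-downs: DL = Cost(Visualization) + Weight * Cost(Error).
# Raising the weight on the error term makes MDL prefer deeper,
# better-fitting cuts.

def weighted_dl(vis_cost, error_cost, weight):
    return vis_cost + weight * error_cost

# (visualization cost, error cost) per candidate cut, shallow to deep:
cuts = [(2, 10), (5, 4), (9, 1)]

def best_cut(cuts, weight):
    return min(range(len(cuts)), key=lambda i: weighted_dl(*cuts[i], weight))

print(best_cut(cuts, 0.1))  # 0: the shallow cut wins at low weight
print(best_cut(cuts, 1))    # 1: the balanced cut
print(best_cut(cuts, 5))    # 2: higher weight selects the deepest cut
```

Each drill-down interaction corresponds to increasing the weight, so the selected cut descends asymmetrically toward the branches with the most residual error.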
We evaluated our responsive views in two steps:
A crowdsourced user study,
and using an analytical clutter model.
Find target on treemap (e.g., Top/Arts/Music/Target)
Measures: time, number of drill downs
For the user study, we asked participants to find a randomly placed target on a treemap of the DMOZ data.
They were given the ancestor path to the target, and the treemap was interactive, so participants could either pick the target directly or drill down.
We measured time and the number of drill-down actions, and treated them as indirect measures of clutter.
We tested four abstraction approaches:
MDL views, views with no abstraction,
and two threshold values for depth, 3 and 4. These are views that show only 4 and 5 levels at a time.
So we are comparing MDL with a high-density approach and two low-density approaches.
We varied both display and tree size, as we wanted to know how consistent these approaches are in different environments.
We also varied the depth of the target and the target value. Here we wanted to know if MDL was any good at making deep outliers easier to spot.
Abstracted views: large improvements over non-abstracted views.
Tree size reduction meant worse times for all abstracted views.
Smallest negative impact observed on MDL views.
We found that, for large trees and displays, abstract views are indeed better than unfiltered views. Not a surprise here.
And we also found that the benefit of abstraction is much lower when the tree is smaller.
MDL views: largest positive effect on outlier target.
MDL views: had fewer drill downs than the other abstraction approaches.
In addition, we found that MDL views had the largest impact on times for outlier targets, which confirms the behaviour of the technique.
Also, participants who interacted with MDL required fewer drill downs to find the targets.
MDL had better times than the unabstracted views,
and similar times to T3 and T4.
In general, MDL had better times than the unabstracted views, and similar times to the low density views.
Feature Congestion Model (Rosenholtz et al., 2007)
A measure of clutter based on a statistical saliency model.
In addition to the user study, we measured the clutter levels of MDL using the feature congestion measure of clutter, which is a popular measure in the field.
The figure on the left shows that the level of clutter of MDL, in purple, remains
nearly constant as the size of the display increases.
In comparison, the unfiltered views, in red, have a higher level of clutter, and the fixed threshold views, in blue and green, become very sparse as the display gets larger.
On the right side, we see the number of nodes for each approach. We can see that MDL achieves the stable level of clutter by adding more nodes to fill the new empty space.
Adapt abstraction as function of available display size.
MDL Drill Down
Asymmetrical, information-aware drill down.
In summary, I presented two applications of MDL to visualization problems.
MDL Views adjust the level of abstraction as a function of available display size.
MDL Drill Down is an asymmetrical, information-aware approach to exploring hierarchies.