Parallel Tag Clouds to Explore Faceted Text Corpora

Contributors

Christopher Collins, Fernanda B. Viégas, Martin Wattenberg

Abstract

Do court cases differ from place to place? What kind of picture do we get by looking at a country’s collection of law cases? We introduce Parallel Tag Clouds: a new way to visualize differences amongst facets of very large metadata-rich text corpora. We have pointed Parallel Tag Clouds at a collection of over 600,000 US Circuit Court decisions spanning a period of 50 years and have discovered regional as well as linguistic differences between courts. The visualization technique combines graphical elements from parallel coordinates and traditional tag clouds to provide rich overviews of a document collection while acting as an entry point for exploration of individual texts. We augment basic parallel tag clouds with a details-in-context display and an option to visualize changes over a second facet of the data, such as time. We also address text mining challenges such as selecting the best words to visualize, and how to do so in reasonable time periods to maintain interactivity.

Publications

  • C. Collins, F. B. Viégas, and M. Wattenberg, “Parallel Tag Clouds to Explore and Analyze Faceted Text Corpora,” in Proc. of the IEEE Symp. on Visual Analytics Science and Technology (VAST), 2009.
    @InProceedings{COL2009b,
      key =     {COL2009b},
      author =   {Christopher Collins and Fernanda B. Vi\'egas and Martin Wattenberg},
      title =   {Parallel Tag Clouds to Explore and Analyze Faceted Text Corpora},
      booktitle =   {Proc. of the IEEE Symp. on Visual Analytics Science and Technology (VAST)},
      year =   2009,
      pages = {91--98},
      doi = {10.1109/VAST.2009.5333443}
    }

Media

Slides from the presentation at IEEE VAST 2009.

Video [Download high-resolution MP4]

 


Acknowledgements

This research is the result of an internship project at the Visual Communications Lab at IBM Research (TJ Watson).

| © Copyright vialab | Dr. Christopher Collins, Canada Research Chair in Linguistic Information Visualization |