
What is DocuBurst?

DocuBurst is an online document visualization tool, and can be used for:


Getting Started

1 Upload a document

Click the "Upload" button on the home page, then paste the document's text directly into the text box or upload it as a text file. You can also browse documents uploaded by other users by clicking the "Search" button on the home page.

2 Wait for the processing to complete...

Processing a document involves dividing it into pieces (roughly, paragraphs) and extracting relevant keywords. Longer documents take longer to process.
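As a rough illustration of this step, the sketch below splits text into pieces at blank lines and counts candidate keywords in each piece. It is not DocuBurst's actual pipeline. The blank-line splitting rule, the tiny stop-word list and the names `split_into_pieces` and `keyword_counts` are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real system would use a much larger one.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "was"}


def split_into_pieces(text: str) -> list[str]:
    """Split text into pieces at blank lines (roughly, paragraphs)."""
    pieces = re.split(r"\n\s*\n", text)
    return [p.strip() for p in pieces if p.strip()]


def keyword_counts(piece: str) -> Counter:
    """Count lower-cased words in a piece, skipping stop words."""
    words = re.findall(r"[a-z]+", piece.lower())
    return Counter(w for w in words if w not in STOP_WORDS)


if __name__ == "__main__":
    doc = "The cat sat.\n\nThe dog chased the cat.\n"
    for i, piece in enumerate(split_into_pieces(doc)):
        print(i, dict(keyword_counts(piece)))
```

Per-piece counts like these are what later features can build on, such as deciding which words occur "near" each other.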

3 Select a root

DocuBurst visualizes nouns arranged in a hierarchy. A root is the starting point, or centre word, of the DocuBurst. If you are unsure which root to start with, try clicking one of the suggested roots. These have been selected because they produce DocuBursts with a good amount of detail.

If you decide to enter a root word, you may notice the option to select a sense. Senses are different meanings of a word. For example, "The fisherman caught the sea bass" and "The boy plays bass guitar" are two different senses of "bass". The senses are ordered by popularity: more common senses are listed first.


Reading DocuBurst

DocuBurst

DocuBurst is a hierarchically structured visualization of nouns. It begins with a general root word at the centre and extends outward to its children, which are more specific words.

The darker colours in DocuBurst represent words that occur more often in the document; lighter colours represent words that occur less often.

Each slice is sized according to its number of children, that is, the number of slices placed directly beyond it in the next ring out.
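One way to read this sizing rule: siblings share their parent's angle in proportion to how many children each of them has. The sketch below assumes a nested-dict tree holding the root's children, gives childless words a weight of 1 so they still get a visible slice, and uses the hypothetical names `weight` and `slice_angles`. DocuBurst's own layout code may differ.

```python
def weight(children: dict) -> int:
    """Weight of a node: its number of children, or 1 if it has none (assumed)."""
    return max(1, len(children))


def slice_angles(tree: dict, start: float = 0.0, span: float = 360.0,
                 depth: int = 1, out=None) -> list:
    """Return (word, ring, start_angle, end_angle) for every node in the tree.

    Siblings split their parent's span in proportion to weight().
    """
    if out is None:
        out = []
    total = sum(weight(children) for children in tree.values())
    angle = start
    for word, children in tree.items():
        width = span * weight(children) / total
        out.append((word, depth, angle, angle + width))
        # Children go in the next ring out, inside this slice's span.
        slice_angles(children, angle, width, depth + 1, out)
        angle += width
    return out


if __name__ == "__main__":
    # Children of a hypothetical root word "animal".
    tree = {"bird": {"robin": {}, "owl": {}, "crow": {}}, "fish": {"bass": {}}}
    for row in slice_angles(tree):
        print(row)
```

Here "bird" has three children and "fish" has one, so "bird" receives three quarters of the circle.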

Word Cloud

The word cloud is an unstructured visualization of proper nouns (e.g. names of people, places and companies). Words are sized by how many times they occur in the document.
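A common way to size words by frequency is a linear map from occurrence count to font size. The sketch below assumes a 12 to 48 point range and uses the hypothetical function name `font_sizes`; the word cloud's actual scaling is not described on this page.

```python
from collections import Counter


def font_sizes(counts: Counter, min_pt: float = 12.0, max_pt: float = 48.0) -> dict:
    """Map each word's occurrence count linearly onto [min_pt, max_pt]."""
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    if lo == hi:
        # Every word occurs equally often, so all get the middle size.
        return {w: (min_pt + max_pt) / 2 for w in counts}
    return {
        w: min_pt + (c - lo) * (max_pt - min_pt) / (hi - lo)
        for w, c in counts.items()
    }


if __name__ == "__main__":
    print(font_sizes(Counter({"Alice": 10, "Dinah": 4, "Wonderland": 1})))
```

The most frequent name gets the largest size and the least frequent gets the smallest, with everything else in between.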

Colour-by & Depth

"Single node" colours each word directly by its score, that is, the number of times it occurs in the document
"Cumulative" colours each word by the accumulated scores of all the words beneath it in the tree
The depth of DocuBurst can be increased or decreased, which adds or removes a layer of children
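To make the difference between the two colouring modes concrete, the sketch below computes both scores on a small tree. It assumes that a cumulative score is a word's own count plus the counts of every word beneath it, and it uses the hypothetical names `Node`, `single_node_score`, `cumulative_score` and `darkness`.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A word in the noun hierarchy (hypothetical structure)."""
    word: str
    count: int  # times the word occurs in the document
    children: list["Node"] = field(default_factory=list)


def single_node_score(node: Node) -> int:
    """Score used by "Single node" mode: the word's own count."""
    return node.count


def cumulative_score(node: Node) -> int:
    """Score used by "Cumulative" mode: own count plus all descendants' counts (assumed)."""
    return node.count + sum(cumulative_score(child) for child in node.children)


def darkness(score: int, max_score: int) -> float:
    """Shade from 0.0 (lightest) to 1.0 (darkest); higher scores are darker."""
    return score / max_score if max_score else 0.0


if __name__ == "__main__":
    fish = Node("fish", 2, [Node("bass", 3), Node("trout", 0)])
    print(single_node_score(fish))  # 2
    print(cumulative_score(fish))   # 5
```

A general word that rarely appears itself can still look dark in "Cumulative" mode if many of the words beneath it occur often.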

Score Bar

When a word is selected in the DocuBurst, the score bar displays that word's score in the document. The score reflects how often the word occurs in the document.

Interacting With DocuBurst

Explore the DocuBurst

Hover over any word to highlight other words found near it in the document
Left click a word to select it
Right click a word to select its sub-tree
Double click a word to create a new DocuBurst rooted at the selected word
Click and drag the background to move the DocuBurst and pan the screen
Use the mouse wheel to zoom

Explore the Word Cloud

Hover over any word to highlight other words found near it in the document
Left click a word to select it and filter the word cloud to contain only words occurring near the selected word in the document

Explore Both

Left-click one word in the word cloud and one word in the DocuBurst. The paragraphs containing both words are coloured orange (left-hand side).
Similarly, you can right-click a word in the DocuBurst instead, which selects its sub-tree.
Hovering over any word re-colours the view. For example, when you hover over "animal", the coloured words in the word cloud are those that occur near "animal" somewhere in the document, and the coloured paragraphs (left-hand side) show where "animal" occurs.
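The highlighting described above depends on knowing which pieces of the document each word appears in. The sketch below assumes that "near" means "in the same piece (paragraph)", which is a simplification, and it uses the hypothetical names `tokens`, `pieces_containing` and `words_near`. Words are lower-cased for matching.

```python
import re


def tokens(piece: str) -> set[str]:
    """Lower-cased words in one piece of the document."""
    return set(re.findall(r"[a-z]+", piece.lower()))


def pieces_containing(pieces: list[str], *words: str) -> list[int]:
    """Indices of pieces containing every given word (e.g. the orange paragraphs)."""
    wanted = {w.lower() for w in words}
    return [i for i, p in enumerate(pieces) if wanted <= tokens(p)]


def words_near(pieces: list[str], word: str) -> set[str]:
    """Words sharing at least one piece with `word`, excluding the word itself."""
    near: set[str] = set()
    for i in pieces_containing(pieces, word):
        near |= tokens(pieces[i])
    near.discard(word.lower())
    return near


if __name__ == "__main__":
    doc = ["The animal ran to London.", "A fish swam.", "Every animal needs water and a fish."]
    print(pieces_containing(doc, "animal", "fish"))
    print(sorted(words_near(doc, "animal")))
```

Selecting one word from each view corresponds to passing both words to `pieces_containing`; hovering corresponds to `words_near`.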

Comparing Documents



Selecting two documents on the home page creates a comparative DocuBurst, in which both the DocuBurst and the word cloud contain words from both documents.

In this image, Alice in Wonderland and The Little Mermaid are compared using a DocuBurst rooted at the word "fish". Blue and green distinguish the two documents, and words shown in red occur in both.
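The colour rule for a comparative DocuBurst can be expressed as a small function. Which document is drawn in blue and which in green is an assumption here (the first document in blue), as is leaving words found in neither document uncoloured; the name `comparison_colour` is hypothetical.

```python
from typing import Optional


def comparison_colour(word: str, words_a: set, words_b: set) -> Optional[str]:
    """Colour for a word in a comparative DocuBurst.

    Assumes the first document is blue and the second green; words found
    in both are red, and words in neither are left uncoloured (None).
    """
    in_a, in_b = word in words_a, word in words_b
    if in_a and in_b:
        return "red"
    if in_a:
        return "blue"
    if in_b:
        return "green"
    return None


if __name__ == "__main__":
    alice = {"fish", "cat", "queen"}
    mermaid = {"fish", "mermaid"}
    for w in ["fish", "cat", "mermaid", "whale"]:
        print(w, comparison_colour(w, alice, mermaid))
```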

Deleting Documents

You can remove any document you have uploaded from the Delete page. If you did not provide your e-mail address, you may need to enter the 4-digit admin code you were given when you uploaded the document.


Performance

We are still working on improving the performance of DocuBurst. Very general root words, such as "entity", produce large DocuBurst trees, and with these some performance-related issues may occur:

We appreciate your patience with these issues and encourage you to report any problems by e-mail to the researcher (Christopher.Collins@uoit.ca). Your reports help us keep improving DocuBurst.


Additional Resources

Christopher Collins. Interactive Visualizations of Natural Language. University of Toronto. 2010. [PDF]

Sheelagh Carpendale, Christopher Collins and Gerald Penn. DocuBurst: Visualizing Document Content Using Language Structure. EuroVis '09. 2009. [PDF]

Brittany Kondo. Incorporating Proper Nouns into a Web-Based Document Visualization. University of Ontario Institute of Technology. 2012. [PDF]

Couldn't find what you were looking for?

Please don't hesitate to contact Christopher Collins by e-mail (Christopher.Collins@uoit.ca) with any inquiries.

