What is DocuBurst?
DocuBurst is an online document visualization tool that can be used for:
- Uploading your own text documents
- Creating interactive visual summaries of documents
- Exploring keywords to uncover document themes or topics
- Investigating intra-document word patterns, such as character relationships
- Comparing documents
- Commenting, annotating and sharing visualizations with others
1 Upload a document
Click the "Upload" button on the home page. Paste the document's text directly into the textbox or upload it as a text file. You can also explore documents uploaded by other users by clicking the "Search" button on the home page.
2 Wait for the processing to complete...
Processing a document involves dividing it into pieces (roughly, paragraphs) and extracting relevant keywords. Longer documents take longer to process.
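As a rough illustration of this kind of processing, the sketch below splits text into paragraph-like pieces on blank lines and counts the most frequent non-trivial words in each piece. It is not DocuBurst's actual pipeline, which is not described on this page. The function names `split_into_pieces` and `top_keywords`, the tiny stop-word list, and the blank-line rule are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for this example only.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "was"}


def split_into_pieces(text):
    """Split text into paragraph-like pieces separated by blank lines."""
    pieces = re.split(r"\n\s*\n", text)
    return [p.strip() for p in pieces if p.strip()]


def top_keywords(piece, limit=3):
    """Return the most frequent words in a piece, ignoring stop words."""
    words = re.findall(r"[a-z']+", piece.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return counts.most_common(limit)


sample = "The fish swam. The fish ate.\n\nAlice saw a rabbit."
pieces = split_into_pieces(sample)
for number, piece in enumerate(pieces, start=1):
    print(number, top_keywords(piece))
```

The work grows with the amount of text, which fits the note above that longer documents take longer to process.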
3 Select a root
A DocuBurst visualizes nouns arranged in a hierarchy. The root is the starting point, or centre word, of the DocuBurst.
If you are unsure which root to start with, try clicking on one of the suggested roots. These roots have been selected to produce DocuBursts with a good amount of detail.
If you decide to enter a root word, you may notice the option to select a sense. Senses are different meanings of a word. For example, "The fisherman caught the sea bass" and "The boy plays bass guitar" are two different senses of "bass". The senses are ordered by popularity: more common senses are listed first.
Darker colours in the DocuBurst represent words that occur more often in the document; lighter colours represent words that occur less often.
Each DocuBurst slice is sized based on its number of children, that is, the number of slices placed directly above it.
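To make the colour and size rules concrete, here is a small sketch on a made-up hierarchy. Each slice's size is taken from its number of direct children, and its shade is darker the more often the word occurs. The tree, the counts, and the linear 0-to-1 darkness scale are assumptions for illustration, not DocuBurst's actual data or rendering code.

```python
# Hypothetical hierarchy: each word maps to its direct children.
children = {
    "animal": ["fish", "bird"],
    "fish": ["bass", "trout", "salmon"],
    "bird": [],
    "bass": [],
    "trout": [],
    "salmon": [],
}

# Hypothetical occurrence counts for one document.
occurrences = {"animal": 2, "fish": 10, "bird": 1, "bass": 5, "trout": 0, "salmon": 3}


def slice_size(word):
    """Size a slice by its number of direct children."""
    return len(children[word])


def darkness(word):
    """Map a count to 0.0 (lightest) through 1.0 (darkest), relative to the maximum."""
    highest = max(occurrences.values())
    return occurrences[word] / highest if highest else 0.0


for word in children:
    print(f"{word}: size={slice_size(word)}, darkness={darkness(word):.2f}")
```

In this sketch a leaf word has size 0. A real layout would presumably give leaves some minimum width so that they remain visible, but that detail is not covered here.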
The word cloud is an unstructured visualization of proper nouns (e.g. names of people, places and companies). Words are sized based on how many times they occur in the document.
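One simple way to size words by their counts is a linear scale between a smallest and a largest font size, sketched below. The function name `font_sizes`, the 12 to 48 point range, and the linear scaling are assumptions for this example; the page does not say how DocuBurst maps counts to sizes.

```python
from collections import Counter


def font_sizes(names, smallest=12, largest=48):
    """Scale each name's font size linearly between smallest and largest by count."""
    counts = Counter(names)
    if not counts:
        return {}
    low, high = min(counts.values()), max(counts.values())
    span = high - low
    sizes = {}
    for name, count in counts.items():
        # When every name occurs equally often, all names get the smallest size.
        fraction = (count - low) / span if span else 0.0
        sizes[name] = round(smallest + fraction * (largest - smallest))
    return sizes


# Made-up list of proper-noun mentions.
mentions = ["Alice", "Alice", "Alice", "Rabbit", "Queen", "Queen"]
print(font_sizes(mentions))
```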
Colour-by & Depth
When a word is selected on the DocuBurst, the score bar will display the word's score in the document. This score represents how strongly the word occurs in the document.
Selecting two documents from the home page creates a comparative DocuBurst.
The DocuBurst and word cloud contain words from both documents.
In this image, the books Alice in Wonderland and The Little Mermaid are being compared using a DocuBurst rooted at the word "fish". Blue and green distinguish the two documents, and red marks words found in both.
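The colouring rule for a comparison can be sketched with two sets of words. The word lists below are made up and are not the actual contents of either book. Which document is shown in blue and which in green is also an assumption, as is the function name `comparison_colours`.

```python
def comparison_colours(words_a, words_b):
    """Label each word by where it occurs: one document or both."""
    set_a, set_b = set(words_a), set(words_b)
    colours = {}
    for word in set_a | set_b:
        if word in set_a and word in set_b:
            colours[word] = "red"    # found in both documents
        elif word in set_a:
            colours[word] = "blue"   # assumption: first document is blue
        else:
            colours[word] = "green"  # assumption: second document is green
    return colours


# Made-up word lists for two documents.
first = ["fish", "whiting", "lobster"]
second = ["fish", "shark"]
for word, colour in sorted(comparison_colours(first, second).items()):
    print(word, colour)
```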
Any document that you have uploaded can be removed by visiting the Delete page. If you did not provide your e-mail, then you may be required to enter the 4-digit admin code given when the document was uploaded.
We are still working on improving the performance of DocuBurst. More general root words, such as "entity", produce a large DocuBurst tree and may cause some performance-related issues:
- Slow load time of the DocuBurst
- Slow interactivity
- Missing pink highlights on the DocuBurst. In this case, the data is still being retrieved in the background and may take a few minutes to appear. Interacting with the DocuBurst in the meantime will not affect this process.
We appreciate your patience with these issues and encourage you to e-mail the researcher (Christopher.Collins@uoit.ca) to report any problems. Your reports help us keep fixing them.
Christopher Collins. Interactive Visualizations of Natural Language. University of Toronto. 2010.
Sheelagh Carpendale, Christopher Collins and Gerald Penn. DocuBurst: Visualizing Document Content Using Language Structure. Eurovis '09. 2009.
Brittany Kondo. Incorporating Proper Nouns into a Web-Based Document Visualization. University of Ontario Institute of Technology. 2012.