Investigating the Semantic Patterns of Passwords


Rafael Veras Guimaraes, Julie Thorpe, Christopher Collins


What meaning lies within a password, and how does that meaning relate to security risks? In our research into the ‘secret language of passwords,’ we have investigated numerical and textual patterns from a semantic (meaning) point of view. Where prior research investigated letter and number sequences to expose vulnerable passwords such as “password123,” our research has delved into the composition of seemingly complex passwords such as “ilovedan1201” or “may101982” and revealed common patterns. These two passwords follow the patterns <I><love><male-name><number> and <month in letters><day in numbers><year after 1980 in numbers>, which, once learned, can be used to generate password guesses such as “IloveMike203” and “July022001”.

Using linguistic analysis and interactive visualization techniques, we have investigated the patterns of ‘date-like’ numbers in passwords, and the meanings of and relationships between the types of words in passwords. The resulting analysis guided our creation of a password guessing system (not available to the public!) which, on several measures, performs better than any previously published result. The exposed vulnerabilities are motivating our ongoing work on new ways to help people create semantically secure passwords. This research contributed to a major story in the New York Times Magazine on the Secret Life of Passwords.

Our research started with the many large password leaks that were made publicly available on the Internet. In particular, we used the 32 million passwords from the RockYou website, exposed in 2009.

Our published research was conducted in two phases:

Dates and Numbers

We started by exploring date patterns, as 24% of the RockYou passwords contain a numeric sequence of at least 4 digits. We wondered whether these sequences are dates and, if so, whether they show any temporal patterns. Our analyses found that 6% of the RockYou passwords (almost 2 million accounts!) contain numbers that match a date. To facilitate exploration of the patterns in the choice of dates, we created an interface that shows how frequently each day, month, year or decade (back to the year 1900) is referred to, along with the corresponding passwords. We did not count passwords with numbers that are more likely to be keyboard patterns than dates, such as “111111”. Exploring the data through this interface, we confirmed some predictable patterns, such as the preference for dates that have repeated days and months (e.g., 08/08/1989), but also uncovered hidden ones, such as a consistent preference for the first two days of months, holidays, and a few notorious dates (e.g., the Titanic accident). For a detailed report on this work, please read our paper or try our exploratory interface.
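The date matching described above can be illustrated with a minimal Python sketch. It is not the method from our paper: the function names, the restriction to 8-digit sequences, the three date layouts tried, and the 2009 upper bound on years are all assumptions made for illustration. Only the 1900 lower bound and the exclusion of keyboard-like runs such as “111111” come from the text.

```python
import re
from datetime import date

# Runs of at least 4 digits, as in the 24% statistic quoted in the text.
DIGIT_RUN = re.compile(r"\d{4,}")


def is_keyboard_like(digits):
    """Assumed heuristic: one repeated digit (e.g. "111111") is not a date."""
    return len(set(digits)) == 1


def _is_valid(year, month, day):
    """True if (year, month, day) is a real calendar date."""
    try:
        date(year, month, day)
        return True
    except ValueError:
        return False


def candidate_dates(digits, min_year=1900, max_year=2009):
    """Return every (year, month, day) reading of an 8-digit string.

    Only three layouts are tried (an assumption): MMDDYYYY, DDMMYYYY, YYYYMMDD.
    """
    if len(digits) != 8 or is_keyboard_like(digits):
        return set()
    a, b, c, d = digits[:2], digits[2:4], digits[4:6], digits[6:]
    readings = [
        (int(c + d), int(a), int(b)),  # MMDDYYYY
        (int(c + d), int(b), int(a)),  # DDMMYYYY
        (int(a + b), int(c), int(d)),  # YYYYMMDD
    ]
    return {r for r in readings
            if min_year <= r[0] <= max_year and _is_valid(*r)}


def dates_in_password(password):
    """Collect date readings from every 8-digit window in each digit run."""
    found = set()
    for run in DIGIT_RUN.findall(password):
        for i in range(len(run) - 7):
            found |= candidate_dates(run[i:i + 8])
    return found
```

For example, `dates_in_password("anna08081989")` yields the single reading `(1989, 8, 8)`, while `"111111"` yields nothing. Shorter layouts such as the six-digit sequence in “may101982” are deliberately left out of this sketch.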

Words and Building a Password Grammar

In the second part of this research, we turned our attention to semantic patterns in the choice of words. Employing natural language processing techniques, we broke each password into words and classified the words according to their syntactic (grammar) function and semantic (meaning) content. The result is a rich model representing the syntactic and semantic patterns of a collection of passwords. With this model, we can rank the semantic categories to find, for example, that “love” is the most prevalent verb in passwords, “honey” is the most used food-related word, and “monkey” is the most popular animal. Contrary to reported psychology research, we found that many categories related to sexuality and profanity are among the top 100. Our work also gave insight into the relations between concepts; for example, our model shows that a male name is four times more likely to follow the string “ilove” than a female name. Our paper, published at the NDSS Symposium 2014, discusses the security implications of our work. In summary, we show that the security provided by passwords is overestimated by methods that do not account for semantic patterns.
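As a rough illustration of the idea of breaking passwords into words and generalizing them into patterns, here is a small Python sketch. It is a toy under stated assumptions, not our system: the hand-written `LEXICON`, its category labels, and the greedy longest-match segmentation are all hypothetical stand-ins for the natural language processing techniques described above.

```python
from collections import Counter

# Toy lexicon mapping words to categories (hypothetical, for illustration only).
LEXICON = {
    "i": "pronoun",
    "love": "verb.love",
    "dan": "male-name",
    "mike": "male-name",
    "anna": "female-name",
    "monkey": "animal",
    "honey": "food",
}


def segment(password):
    """Split a password into (token, category) pairs.

    Digit runs become "number"; letters are matched greedily against the
    lexicon (longest match first); anything else is tagged "other".
    """
    text = password.lower()
    tokens, i = [], 0
    while i < len(text):
        if text[i].isdigit():
            j = i
            while j < len(text) and text[j].isdigit():
                j += 1
            tokens.append((text[i:j], "number"))
            i = j
            continue
        for j in range(len(text), i, -1):
            if text[i:j] in LEXICON:
                tokens.append((text[i:j], LEXICON[text[i:j]]))
                i = j
                break
        else:
            # No lexicon word starts here: keep the single character.
            tokens.append((text[i], "other"))
            i += 1
    return tokens


def pattern(password):
    """Generalize a password into its sequence of categories."""
    return "".join(f"<{tag}>" for _, tag in segment(password))


def pattern_counts(passwords):
    """Count how often each generalized pattern occurs in a collection."""
    return Counter(pattern(p) for p in passwords)
```

With this toy lexicon, `pattern("ilovedan1201")` and `pattern("IloveMike203")` both give `<pronoun><verb.love><male-name><number>`, so counting patterns over a collection shows which structures are most common. A realistic model would need a full dictionary, part-of-speech tagging, and a principled way to resolve ambiguous segmentations.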


  • R. Veras, J. Thorpe, and C. Collins, “Visualizing Semantics in Passwords: The Role of Dates,” in Proc. of the IEEE Symposium on Visualization for Cyber Security (VizSec), 2012, pp. 88-95. doi: 10.1145/2379690.2379702
  • R. Veras, “An Investigation of Semantic Patterns in Passwords,” Master’s thesis, University of Ontario Institute of Technology, 2013.
  • R. Veras, C. Collins, and J. Thorpe, “On Semantic Patterns of Passwords and their Security Impact,” in Proceedings of the Network and Distributed System Security Symposium (NDSS’14), 2014.

Online Demos

Try the dates visualization yourself!

Try the words visualization yourself! 



Media Coverage

Our research has also been featured in additional media, including:

We have also been featured on the UOIT homepage, including an article entitled “Heartbleed update: UOIT researchers analyze why consumers use weak passwords.”


Thanks to undergraduate alumni Jeffrey Hickson and Swapan Lobana, who worked as research assistants on this project, and to the funding agencies who supported this work.


| © Copyright vialab | Dr. Christopher Collins, Canada Research Chair in Linguistic Information Visualization |