
Investigating the Semantic Patterns of Passwords

Summary

What is the meaning within a password? And how does the meaning in your password relate to security risks? In our research into the ‘secret language of passwords,’ we have investigated numerical and textual patterns from a semantic (meaning) point of view. Whereas prior research examined letter and number sequences to expose vulnerable passwords such as “password123,” our research delves into the composition of seemingly complex passwords such as “ilovedan1201” or “may101982” and reveals common patterns. In these examples, the patterns are <I><love><male-name><number> and <month in letters><day in numbers><year after 1980 in numbers>; once learned, such patterns can be used to generate password guesses such as “IloveMike203” and “July022001”.
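To illustrate how a learned pattern can be turned into guesses, the sketch below fills each slot of a template such as <I><love><male-name><number> with words from a category list. The category names and word lists are hypothetical placeholders, and this is not the authors' guessing system (which is not public); a realistic guesser would also rank guesses by probability rather than enumerating them in a fixed order.

```python
from itertools import product

# Hypothetical, tiny category lists for illustration only; a real model would
# learn the categories and their frequencies from leaked password data.
CATEGORIES = {
    "I": ["i", "I"],
    "love": ["love"],
    "male-name": ["dan", "mike"],
    "number": ["1", "123"],
}

def expand(template):
    """Yield every guess produced by filling each template slot from its category."""
    slots = [CATEGORIES[name] for name in template]
    for parts in product(*slots):
        yield "".join(parts)

guesses = list(expand(["I", "love", "male-name", "number"]))
print(len(guesses), guesses[:3])
```

Because `itertools.product` varies the last slot fastest, the first guesses are "ilovedan1", "ilovedan123", and "ilovemike1"; the number of guesses is the product of the category sizes (here 2 × 1 × 2 × 2 = 8).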

Using linguistic analysis and interactive visualization techniques, we have investigated the patterns of date-like numbers in passwords, as well as the meaning of, and relationships between, the types of words in passwords. The resulting analysis guided our creation of a password guessing system (not available to the public!) that, on several measures, outperforms any previously published result. The exposed vulnerabilities are motivating our ongoing work on new ways to help people create semantically secure passwords. This research contributed to a major story in the New York Times Magazine on the Secret Life of Passwords.

Our research started with the many large password leaks that have been made publicly available on the Internet, in particular the 32 million passwords exposed from the RockYou website in 2009.

Our published research was conducted in two phases:

Date and Numbers

We started by exploring date patterns, since 24% of the RockYou passwords contain a numeric sequence of at least four digits. We wondered whether these sequences are dates and, if so, whether they show any temporal patterns. Our analyses found that 6% of all RockYou passwords (almost 2 million accounts!) contain numbers that match a date. To facilitate exploration of the patterns in the choice of dates, we created an interface that shows how often each day, month, year, or decade (back to the year 1900) is referred to, along with the corresponding passwords. We did not count passwords whose numbers are more likely to be keyboard patterns than dates, such as “111111”. Exploring the data through this interface, we confirmed some predictable patterns, such as a preference for dates with repeated days and months (e.g., 08/08/1989), but also uncovered hidden ones, such as a consistent preference for the first two days of the month, for holidays, and for a few notorious dates (e.g., the sinking of the Titanic). For a detailed report on this work, please read our paper or try our exploratory interface.
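A minimal sketch of date matching follows, assuming that only 6- or 8-digit strings are considered, that the three field orders month-day-year, day-month-year, and year-month-day are tried, that two-digit years pivot at `max_year`, and that a single repeated digit marks a keyboard pattern. These rules are illustrative assumptions; the published study's exact matching rules may differ.

```python
import re
from datetime import date

# Assumed field orders for purely numeric strings (illustration only).
LAYOUTS = [("month", "day", "year"), ("day", "month", "year"), ("year", "month", "day")]

def candidate_dates(digits, min_year=1900, max_year=2009):
    """Return the sorted valid dates that a 6- or 8-digit string could encode."""
    if not re.fullmatch(r"[0-9]{6}(?:[0-9]{2})?", digits):
        return []
    if len(set(digits)) == 1:              # e.g. "111111": treat as a keyboard pattern
        return []
    year_width = len(digits) - 4           # 4-digit year in 8 chars, 2-digit year in 6
    found = set()
    for layout in LAYOUTS:
        fields, pos = {}, 0
        for name in layout:
            width = year_width if name == "year" else 2
            fields[name] = int(digits[pos:pos + width])
            pos += width
        year = fields["year"]
        if year_width == 2:                # assumed pivot: 00..(max_year % 100) -> 2000s
            year += 2000 if year <= max_year % 100 else 1900
        if not min_year <= year <= max_year:
            continue
        try:
            found.add(date(year, fields["month"], fields["day"]))
        except ValueError:                 # e.g. month 13 or 31 February
            pass
    return sorted(found)

print(candidate_dates("04151912"))        # the sinking of the Titanic
print(candidate_dates("010203"))          # ambiguous: several readings survive
```

With these rules, "04151912" decodes only to April 15, 1912, while "010203" yields three possible dates, so any frequency count has to decide how to attribute ambiguous strings.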

Words and Building a Password Grammar

In the second part of this research, we turned our attention to semantic patterns in the choice of words. Employing natural language processing techniques, we broke each password into words and classified the words according to their syntactic (grammar) function and semantic (meaning) content. The result is a rich model representing the syntactic and semantic patterns of a collection of passwords. With this model, we can rank the semantic categories; for example, we find that “love” is the most prevalent verb in passwords, “honey” the most used food-related word, and “monkey” the most popular animal. Contrary to findings reported in psychology research, we found that many categories related to sexuality and profanity are among the top 100. Our work also brought insight into the relations between concepts; for example, our model shows that a male name is four times more likely than a female name to follow the string “ilove”. Our paper, published at the NDSS Symposium 2014, discusses the security implications of our work. In summary, we show that methods that do not account for semantic patterns overestimate the security provided by passwords.
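The segmentation step can be sketched as a dynamic program that splits a password into dictionary words and digit runs, preferring the split with the fewest pieces, and then maps each piece to a category. The mini-lexicon and its category labels are hypothetical stand-ins; the published pipeline uses much larger resources and real part-of-speech and semantic classification.

```python
import re
from functools import lru_cache

# Hypothetical mini-lexicon (word -> category). A real system would rely on
# large dictionaries, part-of-speech tagging, and semantic classification.
LEXICON = {
    "i": "pronoun",
    "you": "pronoun",
    "love": "verb:love",
    "dan": "male-name",
    "mike": "male-name",
    "monkey": "noun:animal",
    "honey": "noun:food",
}

def segment(password):
    """Split a password into lexicon words and digit runs, preferring the
    segmentation with the fewest pieces; return None if none covers it."""
    text = password.lower()

    @lru_cache(maxsize=None)
    def best(start):
        if start == len(text):
            return ()
        options = []
        run = re.match(r"[0-9]+", text[start:])
        if run:                                  # take the whole digit run
            options.append((run.group(), start + run.end()))
        for end in range(start + 1, len(text) + 1):
            if text[start:end] in LEXICON:
                options.append((text[start:end], end))
        results = []
        for token, nxt in options:
            rest = best(nxt)
            if rest is not None:
                results.append((token,) + rest)
        return min(results, key=len) if results else None

    result = best(0)
    return list(result) if result is not None else None

def tag(tokens):
    """Map tokens to categories, treating digit runs as 'number'."""
    return ["number" if t.isdigit() else LEXICON[t] for t in tokens]

print(tag(segment("ilovedan1201")))
```

Lowercasing discards capitalization, which a fuller model would record separately. Counting the resulting tag sequences over a large leak is one way to estimate how often each pattern occurs.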

Online Demos

Try the dates visualization yourself!

Try the words visualization yourself! 

Software

Semantic-Guesser

Media Coverage

Our research has also been featured in additional media, including:

We have also been featured on the UOIT homepage, including an article entitled “Heartbleed update: UOIT researchers analyze why consumers use weak passwords”.

Publications

    [pods name="publication" id="4365" template="Publication Template (list item)" shortcodes=1] [pods name="publication" id="4398" template="Publication Template (list item)" shortcodes=1] [pods name="publication" id="4347" template="Publication Template (list item)" shortcodes=1]

Acknowledgements

Thanks to undergraduate alumni Jeffrey Hickson and Swapan Lobana who worked as research assistants on this project, and to the funding agencies who supported this work.

Card-IT Language Learning

What is Card-it?

Card-it is a web application for learning Italian verb morphology, in other words, Italian verb conjugations. Unlike other flashcard applications (e.g., Anki), Card-it offers (1) semi-automatic creation of cards using a finite-state morphological (FSM) analyzer, which reduces repetitive labour and human error when entering morphological data, and (2) classroom integration with student analytics, supporting students, teachers, and autonomous learners of Italian as a second language.

How was Card-it created?

Card-it was born from a collaboration between two Ph.D. students: Mariana Shimabukuro (Ontario Tech University) and Jessica Zipf (University of Konstanz). As a computational linguist, Jessica focuses on rule-based morphological tools to support second language acquisition and computer-assisted language learning (CALL). As a computer scientist, Mariana is trained in human-computer interaction (HCI); her work focuses on data-driven and learner-centred design for language learning applications that empower second language learners towards autonomy. Combining their interests and expertise, Card-it features a learner-centred design that provides an NLP-based approach to creating study content, along with flexibility for curating and studying flashcards with informative learner feedback. Shawn Yama is also a valuable member of this project; as a research assistant during his undergraduate studies, he was responsible for most of Card-it’s implementation.

Is Card-it available in other languages?

Unfortunately, Card-it only supports learners of Italian at the moment. However, Card-it has a modular architecture that makes it easy to extend to other languages, as long as we can provide it with an FSM or an extensive schema of verb forms in the new target language. Other modules in Card-it, such as its user interface, classroom, and interaction features, apply to any language.

If you have the resources for, or interest in, adapting Card-it to a different language, please contact us, and we will be happy to work with you to make it happen.

Video Presentation from EUROCALL 2021

Other language learning projects:

See more about Card-it in its demo and related publications:

Try Card-it Yourself: DEMO

Although Card-IT is still in the late stages of development, you can try out the demo at cardit.vialab.ca by logging in as demo@email.com with the password livecardit.

Card-it at the 18th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2023)

M. Shimabukuro, J. Zipf, S. Yama, and C. Collins. 2023. Evaluating Classroom Potential for Card-it: Digital Flashcards for Studying and Learning Italian Morphology. In Proceedings of the 18th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2023), pages 130–136, Toronto, Canada. Association for Computational Linguistics.

Card-it Versus: Bachelor’s Thesis 2022

S. Yama, “Card-IT Versus: A Competitive Multiplayer Game for Testing Italian Verb Morphology,” Bachelor’s Thesis, 2022.

Card-it Versus Screenshot

Card-it Versus is a gamified multiplayer version of Card-it. Using Card-it as the underlying system, Shawn added a module, “Card-it Versus”, in which multiple players compete for points while synchronously answering flashcard quizzes on Italian conjugations. As they race to finish their quizzes first, players are rewarded with items designed to boost their own performance or to sabotage their opponents. For example, an item can erase or scramble an opponent’s letters, which may cost them accuracy and points! This extension is not available in the live version of Card-it, but it shows how Card-it can be expanded into adjacent projects.

Online presented talk at EuroCALL 2021

J. Zipf, M. Shimabukuro, and C. Collins, Card-IT: a Dynamic FSM-based Flashcard Generator for Learning Italian Verb Morphology, Abstract presented at EuroCALL (online), 2021.

Extended Abstract

We report on a novel approach to training and testing Italian verb morphology by developing a flashcard application. Instead of relying on manually curated content, this application integrates a large-scale finite-state morphological (FSM) analyzer that both analyzes a user’s input and dynamically generates specific verb forms (flashcards). FSMs are widely used in natural language processing as part of a system’s text preprocessing pipeline. Our main contribution is to leverage the FSM as the core component of a dynamic verb generator that produces forms from specified morphological features or returns a given form’s morphological analysis. To this end, we developed Card-IT, a web-based application powered by the FSM that uses flashcards to let learners use the analyzer in a user-friendly manner. The two sides of each card represent the two functions of the FSM: analysis and generation.
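The two directions of the analyzer can be illustrated with a small table-driven stand-in. This is not a finite-state transducer and not Card-IT's analyzer: it is a hypothetical sketch that covers only the present and imperfect indicative of regular -are, -ere, and -ire verbs, generates forms from (lemma, tense, person), and implements analysis as a reverse lookup over every generated form.

```python
# Hypothetical, table-driven stand-in for an FSM: regular verbs only, present
# and imperfect indicative. Irregular verbs, -isc- verbs such as finire, and
# spelling changes such as cercare -> cerchi are deliberately out of scope.
PERSONS = ["1sg", "2sg", "3sg", "1pl", "2pl", "3pl"]
PRESENT = {
    "are": ["o", "i", "a", "iamo", "ate", "ano"],
    "ere": ["o", "i", "e", "iamo", "ete", "ono"],
    "ire": ["o", "i", "e", "iamo", "ite", "ono"],
}
IMPERFECT = ["vo", "vi", "va", "vamo", "vate", "vano"]  # added after the thematic vowel
TENSES = ["present", "imperfect"]

def generate(lemma, tense, person):
    """Generation: (lemma, tense, person) -> inflected form."""
    stem, cls = lemma[:-3], lemma[-3:]
    i = PERSONS.index(person)
    if tense == "present":
        return stem + PRESENT[cls][i]
    if tense == "imperfect":
        return stem + cls[0] + IMPERFECT[i]
    raise ValueError(f"unsupported tense: {tense}")

def build_index(lemmas):
    """Precompute every form so that analysis becomes a reverse lookup."""
    index = {}
    for lemma in lemmas:
        for tense in TENSES:
            for person in PERSONS:
                form = generate(lemma, tense, person)
                index.setdefault(form, []).append((lemma, tense, person))
    return index

INDEX = build_index(["parlare", "credere", "dormire"])

def analyze(form):
    """Analysis: inflected form -> all matching (lemma, tense, person) triples."""
    return INDEX.get(form.lower(), [])

print(generate("credere", "imperfect", "3pl"), analyze("dormite"))
```

Each (lemma, tense, person) triple and its form could serve as the two sides of a card; a real FSM handles irregular morphology and ambiguity far more completely than this lookup table.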

Card-IT can be used to quickly analyze a form or to look up entire verb paradigms, where users (teachers or learners) can freely define morphological features such as tense and mood, or leave any feature unspecified. Depending on the user’s selection, the application returns the corresponding flashcards, which can be saved and organized into a new or existing deck for testing and training. Organizing and sorting decks and cards lets learners study verbs according to their individual interests and needs; e.g., one might choose to focus only on subjunctive forms or on the past tense. Additionally, teachers can create decks to provide their students with specific learning content and exercises.
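Selecting cards with some features specified and others left open can be sketched as a filter in which `None` means "any value". The `Card` fields and the hand-written cards below are hypothetical stand-ins for FSM output, not Card-IT's data model.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Card:
    lemma: str
    mood: str
    tense: str
    person: str
    form: str

# Hand-written example cards (hypothetical data standing in for FSM output).
CARDS = [
    Card("parlare", "indicative", "present", "1sg", "parlo"),
    Card("parlare", "subjunctive", "present", "1sg", "parli"),
    Card("parlare", "indicative", "imperfect", "1sg", "parlavo"),
    Card("credere", "subjunctive", "present", "3sg", "creda"),
]

def select(cards, lemma=None, mood=None, tense=None, person=None):
    """Return cards matching every specified feature; None means 'any value'."""
    wanted = {"lemma": lemma, "mood": mood, "tense": tense, "person": person}
    return [c for c in cards
            if all(v is None or getattr(c, k) == v for k, v in wanted.items())]

subjunctive_deck = select(CARDS, mood="subjunctive")
print([c.form for c in subjunctive_deck])
```

Leaving every feature unspecified returns the whole collection, while fixing only the mood yields a subjunctive-only deck of the kind described above.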

As studies have shown, knowledge of the underlying linguistic concepts benefits the acquisition of a new language (e.g., Heift, 2004). Therefore, Card-IT embeds explanations of linguistic terms (e.g., mood, conditional) using visual components, allowing learners to identify linguistic patterns and raise their metalinguistic awareness over time. Moreover, all linguistic terms in Card-IT are provided in the target language.

We plan to evaluate Card-IT with experts (Italian teachers) and to implement their feedback before evaluating it with students. In its current version, Card-IT offers three functions: (1) form analysis and look-up, as described above; (2) training; and (3) testing. In training mode, learners use self-curated or teacher-curated decks, generated with the help of the FSM, to study verbs along with their inflectional forms. The testing mode consists of two exercises: a conjugation quiz that prompts the user to type a form matching a given linguistic specification, and a tense quiz that presents a form and asks the user to pick the corresponding tense out of three options. Optionally, the learner may also select a mixed mode that combines both exercises.
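The tense quiz can be sketched as a three-option multiple-choice question: the correct tense plus two distractors drawn at random from a tense inventory. The inventory (named in Italian, in line with target-language terminology) and the seeded random generator are illustrative assumptions, not Card-IT's implementation.

```python
import random

# Hypothetical tense inventory, labelled in the target language.
TENSES = ["presente", "imperfetto", "passato remoto", "futuro semplice"]

def make_tense_question(form, correct_tense, rng):
    """Build a question offering the correct tense plus two random distractors."""
    distractors = rng.sample([t for t in TENSES if t != correct_tense], 2)
    options = distractors + [correct_tense]
    rng.shuffle(options)
    return {"form": form, "options": options, "answer": correct_tense}

def is_correct(question, choice):
    """Check a learner's choice against the stored answer."""
    return choice == question["answer"]

rng = random.Random(7)            # seeded so the example is reproducible
question = make_tense_question("parlavo", "imperfetto", rng)
print(question["form"], question["options"])
```

Passing the random generator in explicitly keeps quizzes reproducible for testing while still allowing an unseeded generator in practice.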

Feedback plays a crucial role in learning: it must be informative and motivating, yet not discouraging (Livingstone, 2012). Whenever the learner enters an incorrect verb form, the system uses the FSM to check whether the form corresponds to another tense/mood configuration. If so, the system reports this to the user, providing targeted feedback on the error with indications of how to improve rather than a bare correct/incorrect message.
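The feedback check can be sketched as follows: look up the learner's answer in an analysis table and, if it is a valid form of the requested verb with different features, say which features it actually has. The `ANALYSES` table is a hypothetical stand-in for FSM output, and the message wording is illustrative.

```python
# Hypothetical analysis table for a few forms of "parlare", standing in for
# FSM output: form -> list of (lemma, mood, tense, person).
ANALYSES = {
    "parlo": [("parlare", "indicativo", "presente", "1sg")],
    "parlavo": [("parlare", "indicativo", "imperfetto", "1sg")],
    "parlai": [("parlare", "indicativo", "passato remoto", "1sg")],
    "parlerò": [("parlare", "indicativo", "futuro semplice", "1sg")],
}

def feedback(expected, answer):
    """Return a message comparing the learner's answer with the expected analysis."""
    analyses = ANALYSES.get(answer.strip().lower(), [])
    if expected in analyses:
        return "Correct!"
    same_verb = [a for a in analyses if a[0] == expected[0]]
    if same_verb:
        _, mood, tense, person = same_verb[0]
        return (f"'{answer}' is the {mood} {tense} ({person}); "
                f"the question asks for the {expected[1]} {expected[2]} ({expected[3]}).")
    return "Incorrect: this form was not recognized for the requested verb."

target = ("parlare", "indicativo", "imperfetto", "1sg")
print(feedback(target, "parlo"))
```

A form that is valid but belongs to another tense or mood gets an explanatory message, while an unrecognized string falls back to a plain "incorrect" message.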

Publications

    [pods name="publication" id="9343" template="Publication Template (list item)" shortcodes=1] [pods name="publication" id="9157" template="Publication Template (list item)" shortcodes=1]

SentimentState: Exploring Sentiment Analysis on Twitter

Twitter feeds are a potential source of useful information about the state of mind of persons who are the subject of legal or medical assessment. These may include persons suspected of committing crimes or patients who arrive at a hospital with a mental health emergency, for example, attempted suicide. Messages called “tweets” can expose the state of mind of a Twitter user. Analysts face the challenge of reporting on users’ online presence quickly and efficiently. We present a web-based visualization tool called SentimentState that performs sentiment analysis on tweets from a user’s Twitter account.

SentimentState analyses tweets across ten categories, the two sentiments positive and negative plus eight emotions (anger, anticipation, disgust, fear, joy, sadness, surprise, and trust), and creates an interactive timeline graph of the user’s emotional state. It uses a collection of 24,200 word-sense pairs annotated for emotion, obtained from the National Research Council of Canada (NRC). We anticipate that this interactive visualization can have applications throughout, and even beyond, legal and medical assessments, and will provide analysts with timely and relevant information regarding the mood state of clients, patients, and other persons under assessment.
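Lexicon-based scoring of this kind can be sketched by counting, for each tweet, the categories attached to its words and summing the counts per day to obtain timeline data points. The lexicon entries below are hypothetical (the real NRC Emotion Lexicon must be obtained from its authors), and the sketch works at the word level rather than with word-sense pairs; it is not SentimentState's code.

```python
import re
from collections import Counter, defaultdict
from datetime import date

# Hypothetical entries in the style of a word-emotion lexicon (word -> categories).
LEXICON = {
    "happy": {"joy", "positive", "trust"},
    "afraid": {"fear", "negative"},
    "hate": {"anger", "disgust", "negative"},
    "surprise": {"surprise"},
}

def score(text):
    """Count the lexicon categories triggered by the words of one tweet."""
    counts = Counter()
    for word in re.findall(r"[a-z']+", text.lower()):
        counts.update(LEXICON.get(word, ()))
    return counts

def daily_timeline(tweets):
    """Sum category counts per day, giving one data point per date."""
    timeline = defaultdict(Counter)
    for day, text in tweets:
        timeline[day] += score(text)
    return dict(sorted(timeline.items()))

tweets = [
    (date(2014, 5, 1), "So happy today"),
    (date(2014, 5, 1), "I hate waiting"),
    (date(2014, 5, 2), "afraid and afraid"),
]
timeline = daily_timeline(tweets)
print(timeline[date(2014, 5, 2)])
```

Each day's `Counter` holds one value per category, which maps directly onto one point per emotion line in a timeline chart.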

Check out our Online Demo and our GitHub Repository for source code related to this project.

Publications

    [pods name="publication" id="4353" template="Publication Template (list item)" shortcodes=1]

Acknowledgements

Thanks to Saif Mohammed for providing the NRC Emotion Lexicon for this project.

DocuBurst Website Now Live!

Try DocuBurst, an online document visualization tool for: uploading your own text documents; generating interactive visual summaries; exploring keywords to uncover document themes or topics…

vialab contributions at IEEE VIS 2016

This year at the IEEE VIS Conference in Baltimore, members of the lab will present papers, posters, and workshop contributions! These contributions also represent collaborations with the University…

PivotSlice

Many datasets, such as scientific literature collections, contain multiple heterogeneous facets from which implicit relations can be derived, as well as explicit relational references between data items. Exploring this data is challenging not only because of its large scale but also because of the complexity of its resource structures and semantics. In this paper, we present PivotSlice, an interactive visualization technique that provides efficient faceted browsing as well as flexible capabilities to discover data relationships. With the metaphor of direct manipulation, PivotSlice allows the user to visually and logically construct a series of dynamic queries over the data, based on a multi-focus and multi-scale tabular view that subdivides the entire dataset into several meaningful parts with customized semantics. PivotSlice further facilitates the visual exploration and sensemaking process through features including live search and integration of online data, graphical interaction histories, and smoothly animated visual state transitions. We evaluated PivotSlice through a qualitative lab study with university researchers and report the findings from our observations and interviews. We also demonstrate the effectiveness of PivotSlice using a scenario of exploring a repository of information visualization literature.
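One way to picture a tabular view that subdivides a dataset is a grid whose rows and columns are queries, with each cell holding the items that satisfy both. The sketch below assumes that interpretation and uses made-up paper records; it is a rough illustration of faceted subdivision, not the PivotSlice implementation.

```python
from typing import Callable, Dict, List

Item = dict
Query = Callable[[Item], bool]

def slice_grid(items: List[Item], rows: Dict[str, Query], cols: Dict[str, Query]):
    """Assign items to every (row, column) cell whose two queries both match."""
    grid = {(r, c): [] for r in rows for c in cols}
    for item in items:
        for r, row_q in rows.items():
            if not row_q(item):
                continue
            for c, col_q in cols.items():
                if col_q(item):
                    grid[(r, c)].append(item)
    return grid

# Made-up records standing in for a literature collection.
papers = [
    {"title": "paper-1", "year": 2010, "venue": "InfoVis"},
    {"title": "paper-2", "year": 2012, "venue": "VAST"},
    {"title": "paper-3", "year": 2012, "venue": "InfoVis"},
]
rows = {"InfoVis": lambda p: p["venue"] == "InfoVis", "VAST": lambda p: p["venue"] == "VAST"}
cols = {"before 2011": lambda p: p["year"] < 2011, "2011+": lambda p: p["year"] >= 2011}
grid = slice_grid(papers, rows, cols)
print({cell: [p["title"] for p in hits] for cell, hits in grid.items()})
```

Refining the view then amounts to adding, removing, or editing a row or column query and recomputing the grid.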

Check out our GitHub Repository for source code related to this project.

Media

Presentation Slides

Publications

    [pods name="publication" id="4380" template="Publication Template (list item)" shortcodes=1]

Acknowledgements