Known Weak Passwords

There are already plenty of articles on how to choose a good password. In this article, we will look at the other end of the question: How to identify the most terrible passwords.

The requirements and my problems with common implementations

In our article on password rules we already mentioned NIST Special Publication 800-63B. The section I want to write about here is the following:

When processing requests to establish and change memorized secrets, verifiers SHALL compare the prospective secrets against a list that contains values known to be commonly-used, expected, or compromised. For example, the list MAY include, but is not limited to:
Passwords obtained from previous breach corpuses.
Dictionary words.
Repetitive or sequential characters (e.g. ‘aaaaaa’, ‘1234abcd’).
Context-specific words, such as the name of the service, the username, and derivatives thereof.

This recommendation was published in 2017 and marked a sharp reversal in how applications should handle user-provided secrets. Although not yet universally adopted, an increasing number of vendors and applications are abandoning the old-style, arbitrary complexity rules and implementing the NIST recommendations.

As the recommendation is rather vague on implementation details (note that the bullet points are just one possible approach), there are various interpretations. Examples include zxcvbn, which is used by WordPress and provides a password strength oracle, and Microsoft's own global banned password list for Azure AD, which they do not publish.

However, most implementations probably follow more or less the suggestion given by NIST itself. Therefore, let’s look at the first two suggestions:

Passwords obtained from previous breach corpuses.
Dictionary words.

Passwords from breach corpuses (meaning passwords that have been leaked in previous attacks) are compromised and dictionary words are expected. It seems obvious that both would be bad choices for passwords (and they are!). But why exactly?

Because they are used by attackers in their attacks. It’s as simple as that. Attackers can either try every possible combination of letters, numbers and special characters (known as a brute force attack), or they can compile a list of likely passwords from strings that are known to be used as passwords: breach corpuses and dictionary words.

The two suggestions are therefore attempts to approximate, from what we know about attackers' methods, the passwords that attackers commonly use.

This leads to the main question of this article:

Can we do it better?

The first step is to understand what passwords attackers are actually using. At Lutra Security we do this by running honeypots and monitoring the passwords of login attempts. After a while this leads to a huge list of passwords that are definitely commonly-used¹.
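As a rough illustration of how such a ranking can be produced, here is a minimal sketch. It assumes the honeypot hands us login attempts as (username, password) pairs; that format, the function name top_passwords and the sample data are made up for this example and do not describe our actual tooling.

```python
from collections import Counter


def top_passwords(attempts, n=10):
    """Return the n most frequently attempted passwords with their counts."""
    counts = Counter(password for _username, password in attempts)
    return counts.most_common(n)


# Made-up sample data, not real honeypot output.
sample_attempts = [
    ("root", "root"),
    ("admin", "1234"),
    ("root", "1234"),
    ("pi", "raspberry"),
    ("root", "root"),
    ("admin", "root"),
]

print(top_passwords(sample_attempts, n=2))  # [('root', 3), ('1234', 2)]
```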

Is this the list we want? Let’s take a look at last year’s top 10:

root
1234
123456
password
admin
toor
12345
123
qwerty
1

These passwords are undoubtedly terrible, but if we require a minimum length of 8 characters (which we should definitely do to protect against a simple brute-force attack), only password would be allowed anyway. Dropping everything shorter than 8 characters significantly reduces the size of the list, and the top 10 starts to look like this:

password
12345678
123456789
admin123
6uPF5Cofvyjcew9
1q2w3e4r
changeme
raspberry
abcd1234
q1w2e3r4
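The length filter described above can be sketched in a few lines. The function name filter_by_min_length is hypothetical, and the input is simply the top 10 shown earlier, ordered by frequency.

```python
MIN_LENGTH = 8  # minimum password length assumed throughout this article


def filter_by_min_length(ranked_passwords, min_length=MIN_LENGTH):
    """Keep only passwords of at least min_length characters, preserving rank order."""
    return [p for p in ranked_passwords if len(p) >= min_length]


top10 = ["root", "1234", "123456", "password", "admin",
         "toor", "12345", "123", "qwerty", "1"]

print(filter_by_min_length(top10))  # ['password']
```

Run over the full ranking instead of just the top 10, the same filter yields a list like the one above.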

If we look a little further down, we start seeing a lot of passwords like this:

P@ssw0rd
p@ssw0rd

Both passwords are variations of password, derived by replacing letters with special characters or numbers and altering case. These are common methods used by attackers to generate their password lists. By reversing these manipulations (and reapplying them before checking a password against the list), we could further reduce the size of the list. However, this would require us to make assumptions about which manipulations are common and add complexity to the check, which I would like to avoid.
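To make the trade-off concrete, here is a sketch of what reversing such manipulations might look like. The substitution table LEET_MAP is purely an assumption chosen for illustration; it is not derived from our data and our list does not use it.

```python
# Illustrative substitutions only; which ones attackers really use is an assumption.
LEET_MAP = str.maketrans({"@": "a", "0": "o", "3": "e", "$": "s", "1": "l"})


def normalize(password):
    """Lowercase the password and undo the substitutions listed in LEET_MAP."""
    return password.lower().translate(LEET_MAP)


for variant in ["P@ssw0rd", "p@ssw0rd", "Passw0rd"]:
    print(variant, "->", normalize(variant))  # each one becomes "password"
```

The same table also turns the keyboard walk 1q2w3e4r into lq2wee4r, and it has to guess whether 1 stands for l or i. That is the kind of assumption and added complexity mentioned above.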

Nevertheless, I decided to apply one particular transformation: ignoring case. Converting all passwords to lowercase reduces the size of the list by 10–20% and is easy to implement during the check (just convert the password to lowercase before comparing it to the list).
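A minimal sketch of that check, assuming the list fits in memory as a set. The names build_banned_set and is_known_weak are hypothetical, and the sample entries only stand in for the real list.

```python
def build_banned_set(passwords):
    """Lowercase every entry; variants that differ only in case collapse into one."""
    return {p.lower() for p in passwords}


def is_known_weak(candidate, banned):
    """Compare the lowercased candidate against the lowercased list."""
    return candidate.lower() in banned


raw = ["P@ssw0rd", "p@ssw0rd", "Password", "password", "changeme"]
banned = build_banned_set(raw)

print(len(raw), "->", len(banned))        # 5 -> 3
print(is_known_weak("ChangeMe", banned))  # True
print(is_known_weak("Tr0ub4dor&3", banned))  # False
```

A set keeps the lookup cheap, which matters little at 20 000 entries but keeps the check simple.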

We now have a list of the most terrible choices for passwords, which is exactly what we need in order to implement NIST’s password recommendations.

Where to get a good bad passwords list?

At Lutra Security we run various honeypots that give us insight into passwords used by attackers. From this we have compiled a list of 20 000 commonly-used passwords that are at least 8 characters long.

We provide this list free of charge (but we do ask for your email – we need to do some marketing too 😉).

The confirmation email you receive after subscribing will include a link to the newsletter dashboard where you’ll find the download link.

Is this all I should do?

No. As discussed above, this list only works in combination with the requirement for passwords to be at least 8 characters long. And checking against breached passwords is still a good idea, as these passwords are out there and an attacker can use them. This is especially true if the attacker realizes that their favorite list of passwords does not work on your site anymore.

If you check against a static list of 20 000 passwords, I am convinced that our list is better than any list derived from breach corpuses. However, Have I Been Pwned (HIBP) contains hundreds of millions of passwords. Take the best of both worlds and check against both if possible.
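Checking against both could look roughly like the following sketch. It combines the minimum length, the lowercased static list and a lookup against the HIBP Pwned Passwords range endpoint, which takes the first five hex characters of the SHA-1 hash and returns matching hash suffixes with counts. The function names, the User-Agent string and the timeout are assumptions for this example, and error handling is left out.

```python
import hashlib
import urllib.request

HIBP_RANGE_URL = "https://api.pwnedpasswords.com/range/"


def sha1_prefix_suffix(password):
    """Split the uppercase SHA-1 hex digest into a 5-character prefix and the rest."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def count_in_range_response(suffix, body):
    """Return the breach count for suffix in a 'SUFFIX:COUNT' response body, or 0."""
    for line in body.splitlines():
        candidate, _, count = line.partition(":")
        if candidate.strip().upper() == suffix:
            return int(count)
    return 0


def fetch_range(prefix):
    """Fetch all hash suffixes for a prefix; only the prefix leaves this machine."""
    request = urllib.request.Request(
        HIBP_RANGE_URL + prefix,
        headers={"User-Agent": "weak-password-check-sketch"},  # hypothetical name
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8")


def is_acceptable(password, banned, fetch=fetch_range, min_length=8):
    """Apply the length rule, the static list and the breach lookup, in that order."""
    if len(password) < min_length:
        return False
    if password.lower() in banned:
        return False
    prefix, suffix = sha1_prefix_suffix(password)
    return count_in_range_response(suffix, fetch(prefix)) == 0
```

The cheap local checks run first, so the network request is only made for passwords that pass them. Passing a different fetch function makes it possible to test the logic offline or to use a local copy of the breach data.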

¹ We regularly publish the 1000 most commonly seen passwords.

Konstantin Weddige