
On Passwords and Passphrases - Complexity, Length, Crackability, and Memorability - and Data Breaches

There is a lot of conflicting advice about passwords and passphrases out there: length (minimum & maximum), complexity, whether or not to use 2FA/MFA, and more.

The classic advice ("long", with at least 3 of 4 character classes: upper case, lower case, numerals, and special characters) is finally under a long-overdue review.

Several prominent sources ranging from Bill Burr to the XKCD web comic to Bruce Schneier to SANS to NIST all now note how those guidelines and rules have led to a far worse situation than ever intended. Is this because long, complex passwords actually are "less secure" than shorter/simpler ones? Or is it because people are bad at doing/detecting randomness? Yes. Both.

1GoodPassword!

For years, a prominent pharmaceutical company used "1GoodPassword!" for all of their new Windows Server builds' admin password. It met minimum length (14+ characters) requirements, and had all 4 possible components of the "4-of-4" rule. But it is a bad password: it's easy for humans to remember, which can be useful in initial server configurations, but it is also exceptionally simple to guess/crack.

Password strength is linked most closely to length and the amount of entropy encapsulated in the password. Wikipedia has an extended segment on entropy on their Password Length entry - there are ~6 bits of entropy per character in the list of upper and lower case letters plus numbers (62 possible choices per position). Every additional bit of entropy doubles the maximum time to brute-force crack a password. A 7-character password that truly utilizes random values from a-z, A-Z, and 0-9 has a maximum total entropy of about 42 bits - in other words, it would take a maximum of 2^42 (~10^13) sequential guesses to find the password. On average, however, you only need to check half the possible search space, so a password with 42 bits of entropy would really only take about 10^12 (2^41) guesses to find. And that is with an ideal password.
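To make the arithmetic concrete, here is a minimal Python sketch of the calculation for the 7-character example. The function name `max_entropy_bits` is made up for illustration, and the result is only an upper bound that assumes every character is chosen uniformly at random.

```python
import math

def max_entropy_bits(length: int, alphabet_size: int) -> float:
    """Upper bound on entropy, in bits, for a truly random password."""
    return length * math.log2(alphabet_size)

# 7 characters drawn from a-z, A-Z, and 0-9 (62 choices per position)
bits = max_entropy_bits(7, 62)
worst_case_guesses = 2 ** bits            # the whole search space
average_guesses = worst_case_guesses / 2  # on average, half of it

print(f"{bits:.1f} bits of entropy")
print(f"worst case: {worst_case_guesses:.2e} guesses")
print(f"average:    {average_guesses:.2e} guesses")
```

The exact figure comes out slightly under 42 bits because each character contributes a little less than 6 bits.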

But as already mentioned, people are bad at coming up with truly random values. Every language has certain predilections for some letters to be used very heavily, and others not so much - reflected in games such as Scrabble or letter-substitution puzzles in magazines. And most people like to pick things they think are "clever" or "smart", when they turn out not to be at all. Human-generated passwords have a "real-world" entropy well under half the theoretical maximum. In one study of over 3 million 8-character passwords, "e" was dramatically over-represented statistically - appearing in over half of the passwords, while "f" only showed up in about 250,000 entries (a truly random distribution should have had each letter appear about 900,000 times).

Password crackers exploit these linguistic features to shrink their search from every possible password down to the probable ones. In another analysis of a 460,000-entry, English and English-like word list used for generating passphrases, "e" showed up 475,000 times (more than once per word, on average), while "q" only appeared 8100 times.
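A letter-frequency count like the ones described above is easy to reproduce. The sketch below is only an illustration: `letter_counts` is a made-up helper name, and the five sample words stand in for a real word list or password dump.

```python
from collections import Counter

def letter_counts(words):
    """Count how often each letter appears across a list of words."""
    counts = Counter()
    for word in words:
        counts.update(ch for ch in word.lower() if ch.isalpha())
    return counts

# Tiny stand-in for a real word list (an assumption for this example)
sample_words = ["entropy", "password", "secret", "letter", "queen"]
counts = letter_counts(sample_words)

# Print letters from most to least common
for letter, count in counts.most_common():
    print(letter, count)
```

Even in a sample this small, "e" comes out well ahead of "q", which is the kind of skew a cracker can use to order guesses.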

NIST estimates "real-world" entropy of human-generated passwords thus:

  • 4 bits - first character
  • 2 bits - each of next 7 characters
  • 1.5 bits - each of characters 9-20
  • 1 bit - each character over 20

So, a 14-character human-generated password (like "1GoodPassword!") only has about 27 bits of entropy - versus a theoretical maximum of ~87 bits (with each of 77 characters possible in each position, as tools like https://password.ga/ offer) or ~83 bits when drawing randomly from just a-z, A-Z, & 0-9. Practically, this means it would take less than 2 days, at only 1000 guesses per second, to stumble on a "long", 14-character password created by a person, even though it meets the 4-of-4 requirements.
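The 27-bit figure and the "less than 2 days" estimate can be checked with a short Python sketch. It assumes 1.5 bits for each of characters 9 through 20, which is the value that reproduces the 27-bit total quoted above; the function name `nist_estimated_bits` is made up for this example.

```python
def nist_estimated_bits(length: int) -> float:
    """Rough 'real-world' entropy of a human-chosen password, per the list above."""
    bits = 0.0
    for position in range(1, length + 1):
        if position == 1:
            bits += 4      # first character
        elif position <= 8:
            bits += 2      # characters 2-8
        elif position <= 20:
            bits += 1.5    # characters 9-20 (assumed value, see lead-in)
        else:
            bits += 1      # every character past 20
    return bits

bits = nist_estimated_bits(14)        # 4 + 7*2 + 6*1.5 = 27.0
guesses = 2 ** bits                   # worst-case number of guesses
days = guesses / 1000 / 86_400        # at 1000 guesses per second

print(f"{bits} bits, about {days:.2f} days to exhaust the search space")
```

At 1000 guesses per second the whole 27-bit space is exhausted in roughly a day and a half, and an attacker would on average need only half of that.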

Passphrases

Passphrases ideally utilize random words in an order that is memorable to the creator. Infamous quotations could be used, and/or the phrase could incorporate unusual idiosyncrasies that help the user remember it ("the great pyramids at giza", for example, would be less secure than "khufu khafre menkaure sphinx" - because the former is a known place name, while the latter, still a reference to the same thing, is a list of the names of the builders and another structure). A phrase like "hilltopping acranial grimmer", generated at random from a long word list, is better because there should be no inherent bias on the part of the user as to the contents of the passphrase.
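Picking words at random from a long list can be sketched in a few lines of Python using the standard library's `secrets` module. The function names and the short `demo_words` list are assumptions for this example; a real generator would load a word list with many thousands of entries.

```python
import math
import secrets

def random_passphrase(word_list, word_count=3):
    """Join words picked uniformly at random with a cryptographically secure RNG."""
    return " ".join(secrets.choice(word_list) for _ in range(word_count))

def passphrase_bits(list_size: int, word_count: int) -> float:
    """Upper bound on entropy when every word is an independent uniform pick."""
    return word_count * math.log2(list_size)

# Placeholder list for illustration only; far too short for real use
demo_words = ["hilltopping", "acranial", "grimmer", "khufu", "sphinx", "giza"]

print(random_passphrase(demo_words))
print(f"{passphrase_bits(460_000, 3):.1f} bits for 3 words from a 460,000-word list")
```

Under those assumptions, three words from a 460,000-entry list carry more entropy than the 7-character random password discussed earlier, while being easier to type and remember.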

Regardless of whether passwords or passphrases are used, a second (or third) layer of security can be added with Multifactor Authentication, often shortened to MFA.

MFA is most commonly seen as two-factor authentication (2FA) accomplished with either an authenticator app like Google Authenticator on a user's smartphone, or a token generator like a SecurID fob. The layer added with a 2FA system is that you have something you know (the password/passphrase) and something you have (the code generator - which is also time-based), so *merely* cracking a password won't be as useful to a potential attacker, since they do not have the time-sensitive code.
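Authenticator apps such as Google Authenticator typically implement the time-based one-time password (TOTP) scheme from RFC 6238. The Python sketch below shows how such a time-sensitive code is derived from a shared secret and the current time. It is an illustration rather than any vendor's implementation, hardware tokens may use different algorithms, and the secret shown is the RFC's published test value, not something to use in practice.

```python
import hashlib
import hmac
import struct
import time

def totp(secret: bytes, at_time=None, step=30, digits=6) -> str:
    """Time-based one-time password (RFC 6238) using HMAC-SHA1."""
    if at_time is None:
        at_time = time.time()
    counter = int(at_time) // step                  # number of 30-second windows elapsed
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F                      # dynamic truncation (RFC 4226)
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

# Test secret published in RFC 6238; never use it for a real account
rfc_test_secret = b"12345678901234567890"
print(totp(rfc_test_secret))                        # changes every 30 seconds
```

Because the code depends on both the secret and the clock, an attacker who has only cracked the password still cannot produce a valid code for the current window.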

Additional levels of authentication can be added with biometrics (fingerprints, retinal scans, etc), smart cards, and other techniques.

What is the current best solution to the password/passphrase problem?   

  • Use randomly-generated passwords/passphrases (there are many generators available from built-in functionality in browsers like Safari to dedicated apps to web services)
  • Secure those generated passwords in a password database with a strong passphrase
  • Don't reuse them (or chunks of them) across sites and services; if you must reuse one, do so only where MFA is also enabled
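For the first recommendation, a random password generator can be as small as the Python sketch below. It assumes the 62-character alphabet (a-z, A-Z, 0-9) used in the entropy examples; the name `random_password` and the default length of 14 are choices made for this illustration.

```python
import secrets
import string

# a-z, A-Z, and 0-9: the 62-character alphabet from the entropy discussion
ALPHABET = string.ascii_letters + string.digits

def random_password(length: int = 14) -> str:
    """Pick every character independently with a cryptographically secure RNG."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

print(random_password())
```

The `secrets` module is used instead of `random` because it draws from the operating system's secure random source, which is what a password generator needs.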

Have I been part of a data breach?

As IRS Commissioner John Koskinen said, your identity has probably already been stolen. Your data has probably already been breached somewhere (probably in several places) - Yahoo! had every account (3 billion of them) breached a few years ago, Equifax lost data on at least 145 million Americans (and many foreign nationals), Anthem was breached to the tune of ~80 million people, the US Office of Personnel Management was breached with over 20 million people's records exposed, and on and on.

If you want to see a few lists of common/probable passwords, there are many openly available online (like this one) - in short, you are not as clever as you think you are when coming up with passwords.