hn-classics/_stories/2008/5876887.md

---
created_at: '2013-06-13T22:07:54.000Z'
title: Dont Hash Secrets (2008)
url: http://benlog.com/articles/2008/06/19/dont-hash-secrets/
author: pizza
points: 75
story_text: ''
comment_text: 
num_comments: 34
story_id: 
story_title: 
story_url: 
parent_id: 
created_at_i: 1371161274
_tags:
- story
- author_pizza
- story_5876887
objectID: '5876887'
year: 2008

---
Building secure systems is difficult. It would be nice if we had a bunch
of well-designed crypto building blocks that we could assemble in all
sorts of ways and be certain that they would, no matter what, yield a
secure system overall. There are, in fact, folks working on such things
at a theoretical level \[[Universal
Composability](http://en.wikipedia.org/wiki/Universal_composability)\].

But even if you had these building blocks, you would still have to use
them in their intended way. A component can only be secure under certain
well-defined circumstances, not for any use that happens to look
similar.

One area of secure protocol development that seems to consistently yield
poor design choices is the use of hash functions. What I’m going to say
is not 100% correct, but it is on the conservative side of correct, so
if you follow the rule, you (probably) can’t go wrong. You might be
considered overly paranoid, but as they say, just because you’re
paranoid doesn’t mean they’re not after you.

So here it is: Don’t hash secrets. Never. No, sorry, I know you think
your case is special but it’s not. No. Stop it. Just don’t do it. You’re
making the cryptographers cry.

What the heck am I talking about, you say? I’ll explain. But before we
get lost in the details, just remember. Don’t hash secrets. Ever.
Kapish?

#### What exactly do you mean by “Hash”?

A hash function takes any document, big or small, and creates a short
fingerprint. That gigabyte movie of your newborn baby? Hash it with
SHA1, and you’ve got yourself a 160 bit (~30 alphanumeric characters)
fingerprint. Now, hold on, you say, 30 characters? You’ve hashed my baby
to pieces and all that’s left is a measly 30 characters? No, no, don’t
worry, your baby is still a unique snowflake. You can’t take those 30
characters and, from them, recover your gigabyte video. This is not
uber-data-compression.

But it’s going to be darn hard for you to find any other document, big
or small, that hashes to the same 30 characters. In fact, it’s so hard,
even the most powerful computer in the world dedicated to this one task
for hundreds of years won’t succeed at finding that doppelganger
document. You’ve got lots of computers you say? You’re Google and you
have hundreds of thousands of computers? Yeah, well…. tough. You still
won’t succeed.

In fact, you can try something that should be easier: rather than find
another document that hashes specifically to those 30 characters that
represent your baby, you can go looking for any two documents that
happen to hash to the same thing (collide). And you won’t find any such
pair. Promised. We call that “collision resistance”. That thing about
how you can’t find another document that hashes to the same value as
your baby video? We call that “second pre-image resistance.”

Oh, and I forgot to mention that this magical function, SHA1, is public.
Anyone can see the code. There are no secrets. Even if you see the code,
you can’t find a collision. No, really, I’m not screwing with you.

#### I want to hash everything\!

That’s usually the reaction after discovering the amazing power of hash
functions. There are all sorts of nails just waiting for this magical
hammer, so let’s start hashing everything in sight. De-duplicating large
documents? Hash and compare\! Passwords in a database? Hash and store\!
Anonymizing names in a database? Hash and pseudonymize\!

After all, the magical power of a hash function is that you can’t “go
back,” right? Given a hash, it’s impossible to get that pre-image, so
hash away, my magical crypto friends\!

#### Wrong.

Yeah, so it’s not quite that magical.

Let’s say I give you a SHA1 hash value
`29b0d7c86b88565b78efffeea634cee81a209c92`. From that hash alone, you
can’t tell what I hashed. But if I tell you that I hashed a password,
then all you need to do is try a bunch of common passwords and see which
one matches. In this case, I hashed “love”, one of the most common
passwords there is.

Now you start to see how this “you can’t go back” reasoning fails: if
you know the domain of possible pre-images, and that domain is not too
large, then you can just try them all and see which one matches. That’s
a big strike against the “hash everything” approach.

#### Sprinkle in some Salt

It gets more interesting with the complete password use-case. Many web
developers already know that they shouldn’t store user passwords in the
clear in the database, just in case a break-in reveals every user’s
password. So, instead of storing passwords in the clear, let’s store a
SHA1 hash of the password, against which a candidate password can be
easily checked: hash it and compare.

Now the web developers who have been around the block a few times know
that, if you just apply SHA1 blindly, you’ve got the “small domain”
problem I just mentioned. An attacker can build up a huge dictionary of
hashed passwords just once, and, when he breaks into your web site,
check the hashes against this pre-built dictionary.

To prevent these “dictionary attacks”, we add salt to the hashing
process, so that each user’s password is hashed differently, and generic
attacks don’t work: you have to rebuild the dictionary for each user you
choose to attack. Sprinkling in salt is easy: just concatenate the
password with a random
    string:

    SHA1("TheAnswerIs42" || "love") = ce75a1c90ed564a231de85d93520f1e47726df64

Then, when a user types in a password, e.g. “lvoe” (a typo), the system
checks:

    SHA1("TheAnswerIs42" || "lvoe") = f832b210d62251c19a374a175bff760935c540d4
                                   != ce75a1c90ed564a231de85d93520f1e47726df64

and sure enough, that doesn’t match, so the password is rejected.

Of course, the system has to keep the salt “TheAnswerIs42” around to
check the password, otherwise, it can’t re-perform the hash. So, if an
attacker breaks in, he’ll find the salts, of course. This means that
salting won’t protect a user with a weak password. But it will provide
better protection for users with reasonable passwords, since, even with
the salt in hand, the attacker will have to re-compute the dictionary
for each salt, and thus each user.

So the moral of the story is that hashing the secret password directly
is a bad idea.

And this is typically where most developers stand. They understand that
hashing is good, they vaguely understand the notion of salting, and they
figure that salt+hash is all they need to know. Except it’s not.

#### When hashing is really a signature

One interesting application of hash functions that has spread like
wildfire in the last few years is in the realm of cheap signatures.
Consider an application, **SuperAnnoyingPoke** that wants to send an
authenticated message to **MyFace**. It could apply a full digital
signature, using say RSA, so that MyFace can be sure that the message
really came from SuperAnnoyingPoke. But that actually takes milliseconds
on an average computer, and milliseconds are a lot. Plus, there’s all
sorts of weird padding issues and size limitations that might require
hybrid encryption, so it’s messy.

But hey, let’s take out our trusty cryptographic Swiss Army Knife, the
hash function\! Let’s salt+hash\! We’ll just make sure that
SuperAnnoyingPoke and MyFace share a secret string that’s a good 20
characters long or so, and when SuperAnnoyingPoke wants to send a
message to MyFace, it will also send along a “Message Authentication
Code” (MAC) that is computed as:

    MAC = SHA1(secret_string || message)

MyFace can easily look at the message that is sent, recompute the MAC
given the secret string it shares with SuperAnnoyingPoke, and compare it
to the MAC sent along with the message. Heck, you can even put a
timestamp in there to make sure the message can’t be re-played by an
attacker at a later date. After all, since the hash function makes it
hard to “go back” when you’re using a salt (the secret string), this
should be a secure and cheap way to sign messages\! Super\!

Except this is where things really fall apart.

The security property we want here is that, if the attacker sees a
message and its corresponding MAC, then it should not be able to figure
out the MAC for a different message. That’s the whole point of a
signature. And, unfortunately, there’s a property of SHA1 and lots of
other hash functions like it that make it a fast hash function, but a
terrible way to compute a MAC.

Here’s the deal: if I tell you that `SHA1(foo)` is X, then it turns out,
in a lot of cases, to be quite easy for you to determine what `SHA1(foo
|| bar)` is. You don’t need to know what `foo` is. It’s just that,
because SHA1 is iterative and works block by block, if you know the hash
of `foo`, then you can extend the computation to determine the hash of
`foo || bar`.

Oh crap.

That means that if you know `SHA1(secret || message)`, then you can
compute `SHA1(secret || message || ANYTHING)`, which is a valid
signature for `message || ANYTHING`. So to break this system, you just
need to see one signature from SuperAnnoyingPoke, and then you can
impersonate SuperAnnoyingPoke for lots of other messages.

Why? How??? But…. I thought hash functions didn’t let me “go back\!”
Well, note how I didn’t say the attacker would recover the secret. It’s
just that, given one hash, they can compute others for related
pre-images. That’s why you have to be careful about using hash functions
when you’re hashing secrets. Another strike against using hash functions
willy-nilly.

(If you’re keeping up, your next suggestion is “well, put the secret
AFTER the message, not before”. And yeah, that’s a reasonable
suggestion, but it points out how you’re now assuming some extra
properties of the SHA1 hash function in your design, and that’s bad.
What if you upgrade to a different hash function in 5 years, do you then
have to update your protocol to match? The point is that you shouldn’t
be using a hash function for this, that’s not its purpose\!)

#### HMAC

What you should be using is HMAC: Hash-function Message Authentication
Code. You don’t need to know exactly how it works, just like you don’t
need to know exactly how SHA1 works. You just need to know that HMAC is
specifically built for message authentication codes and the use case of
SuperAnnoyingPoke/MyFace. Under the hood, what’s approximately going on
is two hashes, one after the other, with the secret combined after the
first hash… but don’t worry about it\! That’s the whole point\! HMAC is
built for this feature.

HMAC has two inputs and one output: in go a message, and a secret, and
out comes a message authentication code (i.e. a signature). The security
of HMAC is such that, you can see as many messages and corresponding
signatures as your heart desires, and you still won’t be able to
determine the signature on a message you haven’t seen yet. That’s the
security property you’re looking for. And HMAC is built on top of a hash
function, so more specifically you should be using HMAC-SHA1.

So, again, don’t hash secrets. HMAC them.

#### In Conclusion

There’s plenty more to read if you’re interested in this topic, but
chances are, you’re not and you just want a recommendation. “Don’t Hash
Secrets” is not always entirely necessary. In the password example, you
can hash a password as long as you salt it correctly. But do you really
want to have to worry about that? I don’t. In fact, I use HMAC for my
password databases, too. It’s overkill, but it lets me use a standard
library that likely makes me safer in the long run.

So the next time you’re using a hash function on anything, ask yourself:
is any of the stuff I’m hashing supposed to stay secret? If so, don’t
hash. Instead, use HMAC.