hn-classics/_stories/2008/5876887.md

258 lines
12 KiB
Markdown
Raw Permalink Normal View History

---
created_at: '2013-06-13T22:07:54.000Z'
title: Dont Hash Secrets (2008)
url: http://benlog.com/articles/2008/06/19/dont-hash-secrets/
author: pizza
points: 75
story_text: ''
comment_text:
num_comments: 34
story_id:
story_title:
story_url:
parent_id:
created_at_i: 1371161274
_tags:
- story
- author_pizza
- story_5876887
objectID: '5876887'
2018-06-08 12:05:27 +00:00
year: 2008
---
2018-03-03 09:35:28 +00:00
Building secure systems is difficult. It would be nice if we had a bunch
of well-designed crypto building blocks that we could assemble in all
sorts of ways and be certain that they would, no matter what, yield a
secure system overall. There are, in fact, folks working on such things
at a theoretical level \[[Universal
Composability](http://en.wikipedia.org/wiki/Universal_composability)\].
2018-02-23 18:19:40 +00:00
2018-03-03 09:35:28 +00:00
But even if you had these building blocks, you would still have to use
them in their intended way. A component can only be secure under certain
well-defined circumstances, not for any use that happens to look
similar.
2018-02-23 18:19:40 +00:00
2018-03-03 09:35:28 +00:00
One area of secure protocol development that seems to consistently yield
poor design choices is the use of hash functions. What Im going to say
is not 100% correct, but it is on the conservative side of correct, so
if you follow the rule, you (probably) cant go wrong. You might be
considered overly paranoid, but as they say, just because youre
paranoid doesnt mean theyre not after you.
2018-02-23 18:19:40 +00:00
2018-03-03 09:35:28 +00:00
So here it is: Dont hash secrets. Never. No, sorry, I know you think
your case is special but its not. No. Stop it. Just dont do it. Youre
making the cryptographers cry.
What the heck am I talking about, you say? Ill explain. But before we
get lost in the details, just remember. Dont hash secrets. Ever.
Kapish?
#### What exactly do you mean by “Hash”?
A hash function takes any document, big or small, and creates a short
fingerprint. That gigabyte movie of your newborn baby? Hash it with
SHA1, and youve got yourself a 160 bit (~30 alphanumeric characters)
fingerprint. Now, hold on, you say, 30 characters? Youve hashed my baby
to pieces and all thats left is a measly 30 characters? No, no, dont
worry, your baby is still a unique snowflake. You cant take those 30
characters and, from them, recover your gigabyte video. This is not
uber-data-compression.
But its going to be darn hard for you to find any other document, big
or small, that hashes to the same 30 characters. In fact, its so hard,
even the most powerful computer in the world dedicated to this one task
for hundreds of years wont succeed at finding that doppelganger
document. Youve got lots of computers you say? Youre Google and you
have hundreds of thousands of computers? Yeah, well…. tough. You still
wont succeed.
In fact, you can try something that should be easier: rather than find
another document that hashes specifically to those 30 characters that
represent your baby, you can go looking for any two documents that
happen to hash to the same thing (collide). And you wont find any such
pair. Promised. We call that “collision resistance”. That thing about
how you cant find another document that hashes to the same value as
your baby video? We call that “second pre-image resistance.”
Oh, and I forgot to mention that this magical function, SHA1, is public.
Anyone can see the code. There are no secrets. Even if you see the code,
you cant find a collision. No, really, Im not screwing with you.
#### I want to hash everything\!
Thats usually the reaction after discovering the amazing power of hash
functions. There are all sorts of nails just waiting for this magical
hammer, so lets start hashing everything in sight. De-duplicating large
documents? Hash and compare\! Passwords in a database? Hash and store\!
Anonymizing names in a database? Hash and pseudonymize\!
After all, the magical power of a hash function is that you cant “go
back,” right? Given a hash, its impossible to get that pre-image, so
hash away, my magical crypto friends\!
#### Wrong.
Yeah, so its not quite that magical.
Lets say I give you a SHA1 hash value
`29b0d7c86b88565b78efffeea634cee81a209c92`. From that hash alone, you
cant tell what I hashed. But if I tell you that I hashed a password,
then all you need to do is try a bunch of common passwords and see which
one matches. In this case, I hashed “love”, one of the most common
passwords there is.
Now you start to see how this “you cant go back” reasoning fails: if
you know the domain of possible pre-images, and that domain is not too
large, then you can just try them all and see which one matches. Thats
a big strike against the “hash everything” approach.
#### Sprinkle in some Salt
It gets more interesting with the complete password use-case. Many web
developers already know that they shouldnt store user passwords in the
clear in the database, just in case a break-in reveals every users
password. So, instead of storing passwords in the clear, lets store a
SHA1 hash of the password, against which a candidate password can be
easily checked: hash it and compare.
Now the web developers who have been around the block a few times know
that, if you just apply SHA1 blindly, youve got the “small domain”
problem I just mentioned. An attacker can build up a huge dictionary of
hashed passwords just once, and, when he breaks into your web site,
check the hashes against this pre-built dictionary.
To prevent these “dictionary attacks”, we add salt to the hashing
process, so that each users password is hashed differently, and generic
attacks dont work: you have to rebuild the dictionary for each user you
choose to attack. Sprinkling in salt is easy: just concatenate the
password with a random
string:
SHA1("TheAnswerIs42" || "love") = ce75a1c90ed564a231de85d93520f1e47726df64
Then, when a user types in a password, e.g. “lvoe” (a typo), the system
checks:
SHA1("TheAnswerIs42" || "lvoe") = f832b210d62251c19a374a175bff760935c540d4
!= ce75a1c90ed564a231de85d93520f1e47726df64
and sure enough, that doesnt match, so the password is rejected.
Of course, the system has to keep the salt “TheAnswerIs42” around to
check the password, otherwise, it cant re-perform the hash. So, if an
attacker breaks in, hell find the salts, of course. This means that
salting wont protect a user with a weak password. But it will provide
better protection for users with reasonable passwords, since, even with
the salt in hand, the attacker will have to re-compute the dictionary
for each salt, and thus each user.
So the moral of the story is that hashing the secret password directly
is a bad idea.
And this is typically where most developers stand. They understand that
hashing is good, they vaguely understand the notion of salting, and they
figure that salt+hash is all they need to know. Except its not.
#### When hashing is really a signature
One interesting application of hash functions that has spread like
wildfire in the last few years is in the realm of cheap signatures.
Consider an application, **SuperAnnoyingPoke** that wants to send an
authenticated message to **MyFace**. It could apply a full digital
signature, using say RSA, so that MyFace can be sure that the message
really came from SuperAnnoyingPoke. But that actually takes milliseconds
on an average computer, and milliseconds are a lot. Plus, theres all
sorts of weird padding issues and size limitations that might require
hybrid encryption, so its messy.
But hey, lets take out our trusty cryptographic Swiss Army Knife, the
hash function\! Lets salt+hash\! Well just make sure that
SuperAnnoyingPoke and MyFace share a secret string thats a good 20
characters long or so, and when SuperAnnoyingPoke wants to send a
message to MyFace, it will also send along a “Message Authentication
Code” (MAC) that is computed as:
MAC = SHA1(secret_string || message)
MyFace can easily look at the message that is sent, recompute the MAC
given the secret string it shares with SuperAnnoyingPoke, and compare it
to the MAC sent along with the message. Heck, you can even put a
timestamp in there to make sure the message cant be re-played by an
attacker at a later date. After all, since the hash function makes it
hard to “go back” when youre using a salt (the secret string), this
should be a secure and cheap way to sign messages\! Super\!
Except this is where things really fall apart.
The security property we want here is that, if the attacker sees a
message and its corresponding MAC, then it should not be able to figure
out the MAC for a different message. Thats the whole point of a
signature. And, unfortunately, theres a property of SHA1 and lots of
other hash functions like it that make it a fast hash function, but a
terrible way to compute a MAC.
Heres the deal: if I tell you that `SHA1(foo)` is X, then it turns out,
in a lot of cases, to be quite easy for you to determine what `SHA1(foo
|| bar)` is. You dont need to know what `foo` is. Its just that,
because SHA1 is iterative and works block by block, if you know the hash
of `foo`, then you can extend the computation to determine the hash of
`foo || bar`.
Oh crap.
That means that if you know `SHA1(secret || message)`, then you can
compute `SHA1(secret || message || ANYTHING)`, which is a valid
signature for `message || ANYTHING`. So to break this system, you just
need to see one signature from SuperAnnoyingPoke, and then you can
impersonate SuperAnnoyingPoke for lots of other messages.
Why? How??? But…. I thought hash functions didnt let me “go back\!”
Well, note how I didnt say the attacker would recover the secret. Its
just that, given one hash, they can compute others for related
pre-images. Thats why you have to be careful about using hash functions
when youre hashing secrets. Another strike against using hash functions
willy-nilly.
(If youre keeping up, your next suggestion is “well, put the secret
AFTER the message, not before”. And yeah, thats a reasonable
suggestion, but it points out how youre now assuming some extra
properties of the SHA1 hash function in your design, and thats bad.
What if you upgrade to a different hash function in 5 years, do you then
have to update your protocol to match? The point is that you shouldnt
be using a hash function for this, thats not its purpose\!)
#### HMAC
What you should be using is HMAC: Hash-function Message Authentication
Code. You dont need to know exactly how it works, just like you dont
need to know exactly how SHA1 works. You just need to know that HMAC is
specifically built for message authentication codes and the use case of
SuperAnnoyingPoke/MyFace. Under the hood, whats approximately going on
is two hashes, one after the other, with the secret combined after the
first hash… but dont worry about it\! Thats the whole point\! HMAC is
built for this feature.
HMAC has two inputs and one output: in go a message, and a secret, and
out comes a message authentication code (i.e. a signature). The security
of HMAC is such that, you can see as many messages and corresponding
signatures as your heart desires, and you still wont be able to
determine the signature on a message you havent seen yet. Thats the
security property youre looking for. And HMAC is built on top of a hash
function, so more specifically you should be using HMAC-SHA1.
So, again, dont hash secrets. HMAC them.
#### In Conclusion
Theres plenty more to read if youre interested in this topic, but
chances are, youre not and you just want a recommendation. “Dont Hash
Secrets” is not always entirely necessary. In the password example, you
can hash a password as long as you salt it correctly. But do you really
want to have to worry about that? I dont. In fact, I use HMAC for my
password databases, too. Its overkill, but it lets me use a standard
library that likely makes me safer in the long run.
So the next time youre using a hash function on anything, ask yourself:
is any of the stuff Im hashing supposed to stay secret? If so, dont
hash. Instead, use HMAC.