2018-02-23 18:58:03 +00:00
|
|
|
|
---
|
|
|
|
|
created_at: '2013-06-13T22:07:54.000Z'
|
|
|
|
|
title: Dont Hash Secrets (2008)
|
|
|
|
|
url: http://benlog.com/articles/2008/06/19/dont-hash-secrets/
|
|
|
|
|
author: pizza
|
|
|
|
|
points: 75
|
|
|
|
|
story_text: ''
|
|
|
|
|
comment_text:
|
|
|
|
|
num_comments: 34
|
|
|
|
|
story_id:
|
|
|
|
|
story_title:
|
|
|
|
|
story_url:
|
|
|
|
|
parent_id:
|
|
|
|
|
created_at_i: 1371161274
|
|
|
|
|
_tags:
|
|
|
|
|
- story
|
|
|
|
|
- author_pizza
|
|
|
|
|
- story_5876887
|
|
|
|
|
objectID: '5876887'
|
2018-06-08 12:05:27 +00:00
|
|
|
|
year: 2008
|
2018-02-23 18:58:03 +00:00
|
|
|
|
|
|
|
|
|
---
|
2018-03-03 09:35:28 +00:00
|
|
|
|
Building secure systems is difficult. It would be nice if we had a bunch
|
|
|
|
|
of well-designed crypto building blocks that we could assemble in all
|
|
|
|
|
sorts of ways and be certain that they would, no matter what, yield a
|
|
|
|
|
secure system overall. There are, in fact, folks working on such things
|
|
|
|
|
at a theoretical level \[[Universal
|
|
|
|
|
Composability](http://en.wikipedia.org/wiki/Universal_composability)\].
|
2018-02-23 18:19:40 +00:00
|
|
|
|
|
2018-03-03 09:35:28 +00:00
|
|
|
|
But even if you had these building blocks, you would still have to use
|
|
|
|
|
them in their intended way. A component can only be secure under certain
|
|
|
|
|
well-defined circumstances, not for any use that happens to look
|
|
|
|
|
similar.
|
2018-02-23 18:19:40 +00:00
|
|
|
|
|
2018-03-03 09:35:28 +00:00
|
|
|
|
One area of secure protocol development that seems to consistently yield
|
|
|
|
|
poor design choices is the use of hash functions. What I’m going to say
|
|
|
|
|
is not 100% correct, but it is on the conservative side of correct, so
|
|
|
|
|
if you follow the rule, you (probably) can’t go wrong. You might be
|
|
|
|
|
considered overly paranoid, but as they say, just because you’re
|
|
|
|
|
paranoid doesn’t mean they’re not after you.
|
2018-02-23 18:19:40 +00:00
|
|
|
|
|
2018-03-03 09:35:28 +00:00
|
|
|
|
So here it is: Don’t hash secrets. Never. No, sorry, I know you think
|
|
|
|
|
your case is special but it’s not. No. Stop it. Just don’t do it. You’re
|
|
|
|
|
making the cryptographers cry.
|
|
|
|
|
|
|
|
|
|
What the heck am I talking about, you say? I’ll explain. But before we
|
|
|
|
|
get lost in the details, just remember. Don’t hash secrets. Ever.
|
|
|
|
|
Kapish?
|
|
|
|
|
|
|
|
|
|
#### What exactly do you mean by “Hash”?
|
|
|
|
|
|
|
|
|
|
A hash function takes any document, big or small, and creates a short
|
|
|
|
|
fingerprint. That gigabyte movie of your newborn baby? Hash it with
|
|
|
|
|
SHA1, and you’ve got yourself a 160 bit (~30 alphanumeric characters)
|
|
|
|
|
fingerprint. Now, hold on, you say, 30 characters? You’ve hashed my baby
|
|
|
|
|
to pieces and all that’s left is a measly 30 characters? No, no, don’t
|
|
|
|
|
worry, your baby is still a unique snowflake. You can’t take those 30
|
|
|
|
|
characters and, from them, recover your gigabyte video. This is not
|
|
|
|
|
uber-data-compression.
|
|
|
|
|
|
|
|
|
|
But it’s going to be darn hard for you to find any other document, big
|
|
|
|
|
or small, that hashes to the same 30 characters. In fact, it’s so hard,
|
|
|
|
|
even the most powerful computer in the world dedicated to this one task
|
|
|
|
|
for hundreds of years won’t succeed at finding that doppelganger
|
|
|
|
|
document. You’ve got lots of computers you say? You’re Google and you
|
|
|
|
|
have hundreds of thousands of computers? Yeah, well…. tough. You still
|
|
|
|
|
won’t succeed.
|
|
|
|
|
|
|
|
|
|
In fact, you can try something that should be easier: rather than find
|
|
|
|
|
another document that hashes specifically to those 30 characters that
|
|
|
|
|
represent your baby, you can go looking for any two documents that
|
|
|
|
|
happen to hash to the same thing (collide). And you won’t find any such
|
|
|
|
|
pair. Promised. We call that “collision resistance”. That thing about
|
|
|
|
|
how you can’t find another document that hashes to the same value as
|
|
|
|
|
your baby video? We call that “second pre-image resistance.”
|
|
|
|
|
|
|
|
|
|
Oh, and I forgot to mention that this magical function, SHA1, is public.
|
|
|
|
|
Anyone can see the code. There are no secrets. Even if you see the code,
|
|
|
|
|
you can’t find a collision. No, really, I’m not screwing with you.
|
|
|
|
|
|
|
|
|
|
#### I want to hash everything\!
|
|
|
|
|
|
|
|
|
|
That’s usually the reaction after discovering the amazing power of hash
|
|
|
|
|
functions. There are all sorts of nails just waiting for this magical
|
|
|
|
|
hammer, so let’s start hashing everything in sight. De-duplicating large
|
|
|
|
|
documents? Hash and compare\! Passwords in a database? Hash and store\!
|
|
|
|
|
Anonymizing names in a database? Hash and pseudonymize\!
|
|
|
|
|
|
|
|
|
|
After all, the magical power of a hash function is that you can’t “go
|
|
|
|
|
back,” right? Given a hash, it’s impossible to get that pre-image, so
|
|
|
|
|
hash away, my magical crypto friends\!
|
|
|
|
|
|
|
|
|
|
#### Wrong.
|
|
|
|
|
|
|
|
|
|
Yeah, so it’s not quite that magical.
|
|
|
|
|
|
|
|
|
|
Let’s say I give you a SHA1 hash value
|
|
|
|
|
`29b0d7c86b88565b78efffeea634cee81a209c92`. From that hash alone, you
|
|
|
|
|
can’t tell what I hashed. But if I tell you that I hashed a password,
|
|
|
|
|
then all you need to do is try a bunch of common passwords and see which
|
|
|
|
|
one matches. In this case, I hashed “love”, one of the most common
|
|
|
|
|
passwords there is.
|
|
|
|
|
|
|
|
|
|
Now you start to see how this “you can’t go back” reasoning fails: if
|
|
|
|
|
you know the domain of possible pre-images, and that domain is not too
|
|
|
|
|
large, then you can just try them all and see which one matches. That’s
|
|
|
|
|
a big strike against the “hash everything” approach.
|
|
|
|
|
|
|
|
|
|
#### Sprinkle in some Salt
|
|
|
|
|
|
|
|
|
|
It gets more interesting with the complete password use-case. Many web
|
|
|
|
|
developers already know that they shouldn’t store user passwords in the
|
|
|
|
|
clear in the database, just in case a break-in reveals every user’s
|
|
|
|
|
password. So, instead of storing passwords in the clear, let’s store a
|
|
|
|
|
SHA1 hash of the password, against which a candidate password can be
|
|
|
|
|
easily checked: hash it and compare.
|
|
|
|
|
|
|
|
|
|
Now the web developers who have been around the block a few times know
|
|
|
|
|
that, if you just apply SHA1 blindly, you’ve got the “small domain”
|
|
|
|
|
problem I just mentioned. An attacker can build up a huge dictionary of
|
|
|
|
|
hashed passwords just once, and, when he breaks into your web site,
|
|
|
|
|
check the hashes against this pre-built dictionary.
|
|
|
|
|
|
|
|
|
|
To prevent these “dictionary attacks”, we add salt to the hashing
|
|
|
|
|
process, so that each user’s password is hashed differently, and generic
|
|
|
|
|
attacks don’t work: you have to rebuild the dictionary for each user you
|
|
|
|
|
choose to attack. Sprinkling in salt is easy: just concatenate the
|
|
|
|
|
password with a random
|
|
|
|
|
string:
|
|
|
|
|
|
|
|
|
|
SHA1("TheAnswerIs42" || "love") = ce75a1c90ed564a231de85d93520f1e47726df64
|
|
|
|
|
|
|
|
|
|
Then, when a user types in a password, e.g. “lvoe” (a typo), the system
|
|
|
|
|
checks:
|
|
|
|
|
|
|
|
|
|
SHA1("TheAnswerIs42" || "lvoe") = f832b210d62251c19a374a175bff760935c540d4
|
|
|
|
|
!= ce75a1c90ed564a231de85d93520f1e47726df64
|
|
|
|
|
|
|
|
|
|
and sure enough, that doesn’t match, so the password is rejected.
|
|
|
|
|
|
|
|
|
|
Of course, the system has to keep the salt “TheAnswerIs42” around to
|
|
|
|
|
check the password, otherwise, it can’t re-perform the hash. So, if an
|
|
|
|
|
attacker breaks in, he’ll find the salts, of course. This means that
|
|
|
|
|
salting won’t protect a user with a weak password. But it will provide
|
|
|
|
|
better protection for users with reasonable passwords, since, even with
|
|
|
|
|
the salt in hand, the attacker will have to re-compute the dictionary
|
|
|
|
|
for each salt, and thus each user.
|
|
|
|
|
|
|
|
|
|
So the moral of the story is that hashing the secret password directly
|
|
|
|
|
is a bad idea.
|
|
|
|
|
|
|
|
|
|
And this is typically where most developers stand. They understand that
|
|
|
|
|
hashing is good, they vaguely understand the notion of salting, and they
|
|
|
|
|
figure that salt+hash is all they need to know. Except it’s not.
|
|
|
|
|
|
|
|
|
|
#### When hashing is really a signature
|
|
|
|
|
|
|
|
|
|
One interesting application of hash functions that has spread like
|
|
|
|
|
wildfire in the last few years is in the realm of cheap signatures.
|
|
|
|
|
Consider an application, **SuperAnnoyingPoke** that wants to send an
|
|
|
|
|
authenticated message to **MyFace**. It could apply a full digital
|
|
|
|
|
signature, using say RSA, so that MyFace can be sure that the message
|
|
|
|
|
really came from SuperAnnoyingPoke. But that actually takes milliseconds
|
|
|
|
|
on an average computer, and milliseconds are a lot. Plus, there’s all
|
|
|
|
|
sorts of weird padding issues and size limitations that might require
|
|
|
|
|
hybrid encryption, so it’s messy.
|
|
|
|
|
|
|
|
|
|
But hey, let’s take out our trusty cryptographic Swiss Army Knife, the
|
|
|
|
|
hash function\! Let’s salt+hash\! We’ll just make sure that
|
|
|
|
|
SuperAnnoyingPoke and MyFace share a secret string that’s a good 20
|
|
|
|
|
characters long or so, and when SuperAnnoyingPoke wants to send a
|
|
|
|
|
message to MyFace, it will also send along a “Message Authentication
|
|
|
|
|
Code” (MAC) that is computed as:
|
|
|
|
|
|
|
|
|
|
MAC = SHA1(secret_string || message)
|
|
|
|
|
|
|
|
|
|
MyFace can easily look at the message that is sent, recompute the MAC
|
|
|
|
|
given the secret string it shares with SuperAnnoyingPoke, and compare it
|
|
|
|
|
to the MAC sent along with the message. Heck, you can even put a
|
|
|
|
|
timestamp in there to make sure the message can’t be re-played by an
|
|
|
|
|
attacker at a later date. After all, since the hash function makes it
|
|
|
|
|
hard to “go back” when you’re using a salt (the secret string), this
|
|
|
|
|
should be a secure and cheap way to sign messages\! Super\!
|
|
|
|
|
|
|
|
|
|
Except this is where things really fall apart.
|
|
|
|
|
|
|
|
|
|
The security property we want here is that, if the attacker sees a
|
|
|
|
|
message and its corresponding MAC, then it should not be able to figure
|
|
|
|
|
out the MAC for a different message. That’s the whole point of a
|
|
|
|
|
signature. And, unfortunately, there’s a property of SHA1 and lots of
|
|
|
|
|
other hash functions like it that make it a fast hash function, but a
|
|
|
|
|
terrible way to compute a MAC.
|
|
|
|
|
|
|
|
|
|
Here’s the deal: if I tell you that `SHA1(foo)` is X, then it turns out,
|
|
|
|
|
in a lot of cases, to be quite easy for you to determine what `SHA1(foo
|
|
|
|
|
|| bar)` is. You don’t need to know what `foo` is. It’s just that,
|
|
|
|
|
because SHA1 is iterative and works block by block, if you know the hash
|
|
|
|
|
of `foo`, then you can extend the computation to determine the hash of
|
|
|
|
|
`foo || bar`.
|
|
|
|
|
|
|
|
|
|
Oh crap.
|
|
|
|
|
|
|
|
|
|
That means that if you know `SHA1(secret || message)`, then you can
|
|
|
|
|
compute `SHA1(secret || message || ANYTHING)`, which is a valid
|
|
|
|
|
signature for `message || ANYTHING`. So to break this system, you just
|
|
|
|
|
need to see one signature from SuperAnnoyingPoke, and then you can
|
|
|
|
|
impersonate SuperAnnoyingPoke for lots of other messages.
|
|
|
|
|
|
|
|
|
|
Why? How??? But…. I thought hash functions didn’t let me “go back\!”
|
|
|
|
|
Well, note how I didn’t say the attacker would recover the secret. It’s
|
|
|
|
|
just that, given one hash, they can compute others for related
|
|
|
|
|
pre-images. That’s why you have to be careful about using hash functions
|
|
|
|
|
when you’re hashing secrets. Another strike against using hash functions
|
|
|
|
|
willy-nilly.
|
|
|
|
|
|
|
|
|
|
(If you’re keeping up, your next suggestion is “well, put the secret
|
|
|
|
|
AFTER the message, not before”. And yeah, that’s a reasonable
|
|
|
|
|
suggestion, but it points out how you’re now assuming some extra
|
|
|
|
|
properties of the SHA1 hash function in your design, and that’s bad.
|
|
|
|
|
What if you upgrade to a different hash function in 5 years, do you then
|
|
|
|
|
have to update your protocol to match? The point is that you shouldn’t
|
|
|
|
|
be using a hash function for this, that’s not its purpose\!)
|
|
|
|
|
|
|
|
|
|
#### HMAC
|
|
|
|
|
|
|
|
|
|
What you should be using is HMAC: Hash-function Message Authentication
|
|
|
|
|
Code. You don’t need to know exactly how it works, just like you don’t
|
|
|
|
|
need to know exactly how SHA1 works. You just need to know that HMAC is
|
|
|
|
|
specifically built for message authentication codes and the use case of
|
|
|
|
|
SuperAnnoyingPoke/MyFace. Under the hood, what’s approximately going on
|
|
|
|
|
is two hashes, one after the other, with the secret combined after the
|
|
|
|
|
first hash… but don’t worry about it\! That’s the whole point\! HMAC is
|
|
|
|
|
built for this feature.
|
|
|
|
|
|
|
|
|
|
HMAC has two inputs and one output: in go a message, and a secret, and
|
|
|
|
|
out comes a message authentication code (i.e. a signature). The security
|
|
|
|
|
of HMAC is such that, you can see as many messages and corresponding
|
|
|
|
|
signatures as your heart desires, and you still won’t be able to
|
|
|
|
|
determine the signature on a message you haven’t seen yet. That’s the
|
|
|
|
|
security property you’re looking for. And HMAC is built on top of a hash
|
|
|
|
|
function, so more specifically you should be using HMAC-SHA1.
|
|
|
|
|
|
|
|
|
|
So, again, don’t hash secrets. HMAC them.
|
|
|
|
|
|
|
|
|
|
#### In Conclusion
|
|
|
|
|
|
|
|
|
|
There’s plenty more to read if you’re interested in this topic, but
|
|
|
|
|
chances are, you’re not and you just want a recommendation. “Don’t Hash
|
|
|
|
|
Secrets” is not always entirely necessary. In the password example, you
|
|
|
|
|
can hash a password as long as you salt it correctly. But do you really
|
|
|
|
|
want to have to worry about that? I don’t. In fact, I use HMAC for my
|
|
|
|
|
password databases, too. It’s overkill, but it lets me use a standard
|
|
|
|
|
library that likely makes me safer in the long run.
|
|
|
|
|
|
|
|
|
|
So the next time you’re using a hash function on anything, ask yourself:
|
|
|
|
|
is any of the stuff I’m hashing supposed to stay secret? If so, don’t
|
|
|
|
|
hash. Instead, use HMAC.
|