hn-classics/_stories/2008/5876887.md

12 KiB
Raw Permalink Blame History

created_at title url author points story_text comment_text num_comments story_id story_title story_url parent_id created_at_i _tags objectID year
2013-06-13T22:07:54.000Z Dont Hash Secrets (2008) http://benlog.com/articles/2008/06/19/dont-hash-secrets/ pizza 75 34 1371161274
story
author_pizza
story_5876887
5876887 2008

Building secure systems is difficult. It would be nice if we had a bunch of well-designed crypto building blocks that we could assemble in all sorts of ways and be certain that they would, no matter what, yield a secure system overall. There are, in fact, folks working on such things at a theoretical level [Universal Composability].

But even if you had these building blocks, you would still have to use them in their intended way. A component can only be secure under certain well-defined circumstances, not for any use that happens to look similar.

One area of secure protocol development that seems to consistently yield poor design choices is the use of hash functions. What Im going to say is not 100% correct, but it is on the conservative side of correct, so if you follow the rule, you (probably) cant go wrong. You might be considered overly paranoid, but as they say, just because youre paranoid doesnt mean theyre not after you.

So here it is: Dont hash secrets. Never. No, sorry, I know you think your case is special but its not. No. Stop it. Just dont do it. Youre making the cryptographers cry.

What the heck am I talking about, you say? Ill explain. But before we get lost in the details, just remember. Dont hash secrets. Ever. Kapish?

What exactly do you mean by “Hash”?

A hash function takes any document, big or small, and creates a short fingerprint. That gigabyte movie of your newborn baby? Hash it with SHA1, and youve got yourself a 160 bit (~30 alphanumeric characters) fingerprint. Now, hold on, you say, 30 characters? Youve hashed my baby to pieces and all thats left is a measly 30 characters? No, no, dont worry, your baby is still a unique snowflake. You cant take those 30 characters and, from them, recover your gigabyte video. This is not uber-data-compression.

But its going to be darn hard for you to find any other document, big or small, that hashes to the same 30 characters. In fact, its so hard, even the most powerful computer in the world dedicated to this one task for hundreds of years wont succeed at finding that doppelganger document. Youve got lots of computers you say? Youre Google and you have hundreds of thousands of computers? Yeah, well…. tough. You still wont succeed.

In fact, you can try something that should be easier: rather than find another document that hashes specifically to those 30 characters that represent your baby, you can go looking for any two documents that happen to hash to the same thing (collide). And you wont find any such pair. Promised. We call that “collision resistance”. That thing about how you cant find another document that hashes to the same value as your baby video? We call that “second pre-image resistance.”

Oh, and I forgot to mention that this magical function, SHA1, is public. Anyone can see the code. There are no secrets. Even if you see the code, you cant find a collision. No, really, Im not screwing with you.

I want to hash everything!

Thats usually the reaction after discovering the amazing power of hash functions. There are all sorts of nails just waiting for this magical hammer, so lets start hashing everything in sight. De-duplicating large documents? Hash and compare! Passwords in a database? Hash and store! Anonymizing names in a database? Hash and pseudonymize!

After all, the magical power of a hash function is that you cant “go back,” right? Given a hash, its impossible to get that pre-image, so hash away, my magical crypto friends!

Wrong.

Yeah, so its not quite that magical.

Lets say I give you a SHA1 hash value 29b0d7c86b88565b78efffeea634cee81a209c92. From that hash alone, you cant tell what I hashed. But if I tell you that I hashed a password, then all you need to do is try a bunch of common passwords and see which one matches. In this case, I hashed “love”, one of the most common passwords there is.

Now you start to see how this “you cant go back” reasoning fails: if you know the domain of possible pre-images, and that domain is not too large, then you can just try them all and see which one matches. Thats a big strike against the “hash everything” approach.

Sprinkle in some Salt

It gets more interesting with the complete password use-case. Many web developers already know that they shouldnt store user passwords in the clear in the database, just in case a break-in reveals every users password. So, instead of storing passwords in the clear, lets store a SHA1 hash of the password, against which a candidate password can be easily checked: hash it and compare.

Now the web developers who have been around the block a few times know that, if you just apply SHA1 blindly, youve got the “small domain” problem I just mentioned. An attacker can build up a huge dictionary of hashed passwords just once, and, when he breaks into your web site, check the hashes against this pre-built dictionary.

To prevent these “dictionary attacks”, we add salt to the hashing process, so that each users password is hashed differently, and generic attacks dont work: you have to rebuild the dictionary for each user you choose to attack. Sprinkling in salt is easy: just concatenate the password with a random string:

SHA1("TheAnswerIs42" || "love") = ce75a1c90ed564a231de85d93520f1e47726df64

Then, when a user types in a password, e.g. “lvoe” (a typo), the system checks:

SHA1("TheAnswerIs42" || "lvoe") = f832b210d62251c19a374a175bff760935c540d4
                               != ce75a1c90ed564a231de85d93520f1e47726df64

and sure enough, that doesnt match, so the password is rejected.

Of course, the system has to keep the salt “TheAnswerIs42” around to check the password, otherwise, it cant re-perform the hash. So, if an attacker breaks in, hell find the salts, of course. This means that salting wont protect a user with a weak password. But it will provide better protection for users with reasonable passwords, since, even with the salt in hand, the attacker will have to re-compute the dictionary for each salt, and thus each user.

So the moral of the story is that hashing the secret password directly is a bad idea.

And this is typically where most developers stand. They understand that hashing is good, they vaguely understand the notion of salting, and they figure that salt+hash is all they need to know. Except its not.

When hashing is really a signature

One interesting application of hash functions that has spread like wildfire in the last few years is in the realm of cheap signatures. Consider an application, SuperAnnoyingPoke that wants to send an authenticated message to MyFace. It could apply a full digital signature, using say RSA, so that MyFace can be sure that the message really came from SuperAnnoyingPoke. But that actually takes milliseconds on an average computer, and milliseconds are a lot. Plus, theres all sorts of weird padding issues and size limitations that might require hybrid encryption, so its messy.

But hey, lets take out our trusty cryptographic Swiss Army Knife, the hash function! Lets salt+hash! Well just make sure that SuperAnnoyingPoke and MyFace share a secret string thats a good 20 characters long or so, and when SuperAnnoyingPoke wants to send a message to MyFace, it will also send along a “Message Authentication Code” (MAC) that is computed as:

MAC = SHA1(secret_string || message)

MyFace can easily look at the message that is sent, recompute the MAC given the secret string it shares with SuperAnnoyingPoke, and compare it to the MAC sent along with the message. Heck, you can even put a timestamp in there to make sure the message cant be re-played by an attacker at a later date. After all, since the hash function makes it hard to “go back” when youre using a salt (the secret string), this should be a secure and cheap way to sign messages! Super!

Except this is where things really fall apart.

The security property we want here is that, if the attacker sees a message and its corresponding MAC, then it should not be able to figure out the MAC for a different message. Thats the whole point of a signature. And, unfortunately, theres a property of SHA1 and lots of other hash functions like it that make it a fast hash function, but a terrible way to compute a MAC.

Heres the deal: if I tell you that SHA1(foo) is X, then it turns out, in a lot of cases, to be quite easy for you to determine what SHA1(foo || bar) is. You dont need to know what foo is. Its just that, because SHA1 is iterative and works block by block, if you know the hash of foo, then you can extend the computation to determine the hash of foo || bar.

Oh crap.

That means that if you know SHA1(secret || message), then you can compute SHA1(secret || message || ANYTHING), which is a valid signature for message || ANYTHING. So to break this system, you just need to see one signature from SuperAnnoyingPoke, and then you can impersonate SuperAnnoyingPoke for lots of other messages.

Why? How??? But…. I thought hash functions didnt let me “go back!” Well, note how I didnt say the attacker would recover the secret. Its just that, given one hash, they can compute others for related pre-images. Thats why you have to be careful about using hash functions when youre hashing secrets. Another strike against using hash functions willy-nilly.

(If youre keeping up, your next suggestion is “well, put the secret AFTER the message, not before”. And yeah, thats a reasonable suggestion, but it points out how youre now assuming some extra properties of the SHA1 hash function in your design, and thats bad. What if you upgrade to a different hash function in 5 years, do you then have to update your protocol to match? The point is that you shouldnt be using a hash function for this, thats not its purpose!)

HMAC

What you should be using is HMAC: Hash-function Message Authentication Code. You dont need to know exactly how it works, just like you dont need to know exactly how SHA1 works. You just need to know that HMAC is specifically built for message authentication codes and the use case of SuperAnnoyingPoke/MyFace. Under the hood, whats approximately going on is two hashes, one after the other, with the secret combined after the first hash… but dont worry about it! Thats the whole point! HMAC is built for this feature.

HMAC has two inputs and one output: in go a message, and a secret, and out comes a message authentication code (i.e. a signature). The security of HMAC is such that, you can see as many messages and corresponding signatures as your heart desires, and you still wont be able to determine the signature on a message you havent seen yet. Thats the security property youre looking for. And HMAC is built on top of a hash function, so more specifically you should be using HMAC-SHA1.

So, again, dont hash secrets. HMAC them.

In Conclusion

Theres plenty more to read if youre interested in this topic, but chances are, youre not and you just want a recommendation. “Dont Hash Secrets” is not always entirely necessary. In the password example, you can hash a password as long as you salt it correctly. But do you really want to have to worry about that? I dont. In fact, I use HMAC for my password databases, too. Its overkill, but it lets me use a standard library that likely makes me safer in the long run.

So the next time youre using a hash function on anything, ask yourself: is any of the stuff Im hashing supposed to stay secret? If so, dont hash. Instead, use HMAC.