Making a secure data hash

string sha1 ( string source [, bool raw_output])

SHA stands for the "Secure Hash Algorithm", and it is a way of converting a string of any size into a 160-bit value, normally written as a 40-character hexadecimal number, that can be used for verification. If you do not know what hashes are, they are like unidirectional (one-way) encryption designed to check the accuracy of input. By unidirectional I mean that you cannot run $hash = sha1($somestring), then somehow decrypt $hash to get $somestring - it is just not possible, because a hash does not contain its original text. What, then, are hashes good for?

Well, imagine you have users enter a password. How do you check the password is correct?

if ($password == "Frosties") {
    // ........
}

While that solution certainly works, it means that whoever reads your source code gets your password. Similarly, if you store all your users' passwords in your database and someone cracks it, you are going to look pretty dumb. If you hash the passwords of people in your database, or in your files, then malicious users will not be able to read the original passwords back out of it.

The downside of that is that authorised users will not be able to get at the passwords either - whether or not that is a good thing varies from case to case, but usually having hashed passwords is worthwhile, and people who forget their password must simply reset it to a new password as opposed to retrieving it.

Hashing is also commonly used to check whether files have downloaded properly - if your hash is equal to the correct hash value, then you have downloaded the file without problem.
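As a rough sketch of that download check, PHP's sha1_file() function hashes the contents of a file directly. The temporary file, its contents and the $expected value below are stand-ins I have made up for the illustration; in practice the expected hash would come from whoever published the file.

```php
<?php
// Stand-in for a downloaded file: write some known bytes to a temporary path.
$contents = "pretend this is a downloaded file\n";
$path = tempnam(sys_get_temp_dir(), "sha");
file_put_contents($path, $contents);

// Stand-in for the hash the publisher would list next to the download.
$expected = sha1($contents);

// sha1_file() reads the file and returns its hash as 40 hexadecimal characters.
$actual = sha1_file($path);
$downloadOk = ($actual === $expected);

print $downloadOk ? "File is intact\n" : "File is corrupt\n";

// Clean up the temporary file.
unlink($path);
```

If even one byte of the file had changed in transit, $actual would no longer match $expected.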

The process of data hashing involves taking a value and converting it into a semi-meaningless string of letters and numbers of a fixed length. There is no way - no way whatsoever - to "decrypt" a hash to obtain the original value. The only general way to crack a hash is to try possible inputs one by one until one of them produces the same hash, which, given that the input for the hash can be as long as you want, can take an awfully long time.
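To make that concrete, here is a minimal sketch of what "trying inputs" looks like. The find_input_for_hash() function and the list of guesses are hypothetical names invented for this example, not part of PHP.

```php
<?php
// Hypothetical helper: returns the first candidate whose hash matches, or null.
function find_input_for_hash($targetHash, array $candidates)
{
    foreach ($candidates as $candidate) {
        // The hash cannot be reversed, so each guess is hashed and compared.
        if (sha1($candidate) === $targetHash) {
            return $candidate;
        }
    }
    return null;
}

$target = sha1("Frosties");
$guesses = array("Cornflakes", "Weetabix", "Frosties", "Shreddies");

$found = find_input_for_hash($target, $guesses);
print ($found === null ? "No match" : "Matched: $found") . "\n";
```

With four guesses this finishes instantly; the point is that the work grows with the number of inputs that have to be tried, so long and unpredictable input is what makes the search impractical.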

Consider this script:

print sha1("hello") . "\n";
print sha1("Hello") . "\n";
print sha1("hello") . "\n";
print sha1("This is a very, very, very, very, very, very, very long test");

Here is the output I get:


There are three key things to notice there: firstly, all the output is exactly 40 characters in length, and always will be. Secondly, the difference between the hash of "hello" and the hash of "Hello" is gigantic despite the only difference being the case of one letter. Finally, notice that there is no way to distinguish between long strings and short strings - the hash does not contain the original input, so even a string of millions of characters is reduced to just 40 characters.
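Those three points can also be checked in code rather than by eye. This is only a sketch; the variable names and the long test string built with str_repeat() are my own choices for the illustration.

```php
<?php
$short = sha1("hello");
$capitalised = sha1("Hello");
// A much longer input: 600,000 characters of "very, " followed by "long test".
$long = sha1(str_repeat("very, ", 100000) . "long test");

// Point one and point three: the length is 40 regardless of the input size.
print strlen($short) . "\n";
print strlen($long) . "\n";

// Point two: count how many of the 40 positions differ between the two hashes.
$differing = 0;
for ($i = 0; $i < 40; $i++) {
    if ($short[$i] !== $capitalised[$i]) {
        $differing++;
    }
}
print "Positions that differ: $differing\n";
```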

If you have stored your users' passwords hashed in your database, then you need to hash the password they provide before you compare it against the value in your database. One thing that is key to remember is that sha1() will always give the same output for a given input.
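A minimal sketch of that comparison might look like the following. The password_matches() function and the $storedHash variable are hypothetical; in a real application the stored hash would be read from your database rather than calculated on the spot.

```php
<?php
// Hypothetical stored value: normally this would be read from the database.
$storedHash = sha1("Frosties");

// Hypothetical helper: hash the submitted password, then compare hash with hash.
function password_matches($submitted, $storedHash)
{
    // === compares the two strings exactly, with no loose type conversion.
    return sha1($submitted) === $storedHash;
}

var_dump(password_matches("Frosties", $storedHash)); // bool(true)
var_dump(password_matches("frosties", $storedHash)); // bool(false)
```

Because sha1() always gives the same output for the same input, the correct password always produces the stored hash, while even a change of case produces a different one.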

Author's Note: If you set the optional second parameter to true, the SHA1 hash is returned in raw binary format and will have a length of 20 bytes.
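As a quick illustration of the two formats, this sketch hashes the same string both ways and uses bin2hex() to show that the raw form is the same hash in a more compact encoding. The variable names are my own.

```php
<?php
$hex = sha1("hello");        // default: 40 hexadecimal characters
$raw = sha1("hello", true);  // raw_output set to true: binary string

print strlen($hex) . "\n"; // 40
print strlen($raw) . "\n"; // 20

// Converting the raw bytes to hexadecimal gives back the default output.
var_dump(bin2hex($raw) === $hex); // bool(true)
```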

