AKA Best practice for storing your user’s data.
Just last month Quora (a question and answer style site) suffered a huge data breach losing the personal data of over 100 million users (that’s slightly less than the population of Egypt).
So how do you prevent your site from being a target?
In the military world, there is a concept called Defence in Depth, the idea is to make it as hard as possible for your opponent, by slowing them down, forcing them to fight battle after battle for every inch of ground. This concept has been co-opted by the computer security world.
The rest of the article will assume you’ve been hacked, or rather you will be hacked.
If you look through Wikipedia’s list of security hacks, you’ll notice that originally (70’s, 80’s and 90’s) these hacks where, for the most part, pranks, mostly aimed at either getting relatively minor services (telephone calls) for free, while the later hacks have been aimed at earning serious money for the criminals involved.
So you may feel that since you don’t run a banking or crypto currency site, your don’t need to worry.
Sadly most people at one time or another have used the same (or almost the same) password on different sites, so while someone stealing all the user names and passwords to your “I love kittens” forum may allow the evil dog lover to post photos of dogs and embarrassed your members if they have used the same password email address and password on “high-end-fashion.com” the evil dog lover could login their and (if high end fashion stores your users payment details) order designer dresses for themselves.
If you where a money grabbing ner-well-a-do, rather than the fine upstanding person you are, you may be tempted to think “Pah , what do I care, I won’t lose anything” however you’d be wrong. If just one of the people whose data was stolen from you was a citizen of the European Union, their data comes under the jurisdiction of the GDPR and you legally *must* protect it.
One of the most basic ways of protecting a users password is not to store it at all, but store a cryptographic “hash” of the password.
Say for example you’re user choose the password Pa$$w0rd rather then store that you could store 02726d40f378e716981c4321d60ba3a325ed6a4c which is the hashed version of that password.
As you can see it looks nothing like the password, whats more theirs no way to get from the result of the hash to the password, it’s gone, additionally even a small change in the password give a totally different hash result
Pa%$w0rd, a change of one bit gives you the result a3e35fb1bc27126e65b396456a048c99bea9a5fb and Pa$%w0rd, the same characters as the last one but in a slightly different order, gives you 590f36c43ede760092da844e34e4895c71c5f9f9
However applying the same hash function to the same input word will give the same output hash, so every-time you hash Pa$$w0rd you will get 02726d40f378e716981c4321d60ba3a325ed6a4c. This means that rather than checking if a password exists and user name combination exist within the database you can just check if the hash exists.
As I alluded to before their are a number of different hash functions, some stronger then others. In general you should go with the strongest you can get. With the latest version of PHP this is likely SHA3-512. if your having to work with older versions of PHP it may be SHA2.
Within PHP this can be implemented like this
A full description of the PHP function hash can be found in the online manual.
As noted above every-time you run a word through a hash function you get the same answer. So what if someone where to run every common password through a hash function, note the results, then if you see that value again you know what the password was originally.
Well people do, do this, and the results are called rainbow tables. and you can even do them on line. If you put the three hash we created earlier into crack station you will notice that it only comes up with one success.
This is because it takes a *lot* of space, time and therefore money to generate rainbow tables, and therefore they only focus on the most common billion or so passwords (yep, billion). So how do you defeat rainbow tables? make the password not in throes billion words.
If when you generate the hash for the password you add a random word, called a salt, to the end you create a password that (likely) won’t appear in a rainbow table. and as long as you also store that salt in the database along with the hash you’ll still be able to append it to the password and check the hash.
If you wanted to improve the security (slightly) more you could use two salts, one fixed salt, stored in a config file and one generated for each account and stored within the database. Within PHP this may look a little like this.
The benefit of adding a fixed salt may not the immediately obvious. From a hackers point of view, they may see a database table with a password that’s clearly a hash and a salt and, if your passwords are customers are worth the extra effort, try common passwords + the hash against the hash you have stored. Including a fixed hash that the hacker will only know about if they have access to you’re source code / config files (and if they do you have serious problems) allows you to make that extra effort by the hacker pointless.
As I said in my last paragraph
if they have access to you’re source code / config files
When most people think of a hacked site they assume the hacker has total access to everything. However generally the hacker is “just” after the database. Most attacks of this nature use what is called SQL injection to send rouge commands to the database.
The best way to deal with SQL injection is to use SQL Parameters, in PHP that is done as follows.
This is best used with with stored procedures
Even if you’ve salted and hashed all the user passwords and put all your queries in parameter’s their are still things that can be done to protect you users data, it can be encrypted while stored in the database.
By using AES_ENCRYPT when inserting and updating data (and AES_DECRYPT when Selecting) with a known passphrase stored within a config file if a hacker does gain access to your database all they will get is encrypted data.