Implementing rate limiting with PHP

This article should take around 2 minutes 57 seconds to read.

If you’ve ever implemented a public-facing API, you will know how important rate limiting is. If you don’t limit requests, someone, somewhere will abuse your APIs by sending millions of requests a second, either because they want to pull your data (just ask FFS), because they want to take your service down, or just because they messed up the coding and put the request in an infinite loop (… no… I’ve never done that… honest… Sorry @Jack). Whatever the reason, such a huge volume of requests will, like the “Tragedy of the commons”, cause at least a reduced service for other users and increased costs for you and, if left unchecked, eventually cause your service to start falling over.

The answer to this is, of course, to limit the number of requests a user can make. There are two main approaches to this: limit by IP address or limit by account. For the purposes of this post it doesn’t matter too much which you choose; there are good arguments for either (or both) schemes, and the one you decide to implement will depend on your use case, threat model and willingness to manage accounts.

Regardless of which approach you choose, the logic you need to follow is the same. For the purposes of the examples below I’ll assume we’re limiting by key and allowing no more than 100 requests every 10 minutes.

  1. When a request is made, and before you perform whatever function the API performs, log the key/IP address in a table. The details you store here should be just the key/IP address, the endpoint being accessed and a DateTime stamp of the transaction.
  2. Perform an SQL query something like SELECT count(*) FROM table WHERE key=@key AND timestamp > DATE_ADD(CURRENT_TIMESTAMP(), INTERVAL -10 MINUTE);
  3. This will return the total number of requests in the last 10 minutes. If that’s less than 101 (101 because we’ve already logged the current request) then we can allow the call; otherwise we return a failure message.
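
The three steps above can be sketched in PHP roughly as follows. This is a minimal sketch, not a definitive implementation: the function name `isRequestAllowed`, the table `api_requests` and its columns are all assumptions made for illustration, and timestamps are stored as Unix integers so the cutoff can be calculated in PHP rather than with a database-specific date function.

```php
<?php
// Hypothetical helper: logs the request, then checks the key against its limit.
// Table and column names (api_requests, api_key, endpoint, requested_at) are illustrative.
function isRequestAllowed(
    PDO $db,
    string $apiKey,
    string $endpoint,
    int $limit = 100,
    int $windowSeconds = 600,
    ?int $now = null
): bool {
    $now = $now ?? time();

    // Step 1: log the request before doing any real work.
    $insert = $db->prepare(
        'INSERT INTO api_requests (api_key, endpoint, requested_at)
         VALUES (:key, :endpoint, :ts)'
    );
    $insert->execute([':key' => $apiKey, ':endpoint' => $endpoint, ':ts' => $now]);

    // Step 2: count this key's requests that fall inside the window.
    $count = $db->prepare(
        'SELECT COUNT(*) FROM api_requests
         WHERE api_key = :key AND requested_at > :cutoff'
    );
    $count->execute([':key' => $apiKey, ':cutoff' => $now - $windowSeconds]);

    // Step 3: the current request is already logged, so "count <= limit" means allowed.
    return (int) $count->fetchColumn() <= $limit;
}
```

The optional `$now` parameter is only there so the window can be exercised without waiting ten minutes; in normal use you would leave it out and return something like an HTTP 429 response when the function returns false.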

The problem with this is that the table will grow huge in a very short amount of time, so we should add a scheduled event to reduce its size every hour or so (of course, if you want to know exactly who has accessed what and when, you could use this table as a way to log that information).
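
One way to do that clean-up is a small function run from a scheduled job. The sketch below assumes a hypothetical `api_requests` table with an integer `requested_at` column holding Unix timestamps; the function name is made up for illustration, and how you schedule it (cron, a database event, or something else) is up to you.

```php
<?php
// Hypothetical clean-up helper: deletes log rows that are older than the
// rate-limit window and can therefore no longer affect any count.
function pruneOldRequests(PDO $db, int $windowSeconds = 600, ?int $now = null): int
{
    $now = $now ?? time();

    $delete = $db->prepare('DELETE FROM api_requests WHERE requested_at <= :cutoff');
    $delete->execute([':cutoff' => $now - $windowSeconds]);

    // Number of rows removed, which is handy for logging.
    return $delete->rowCount();
}
```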

A second problem arises if we want to allow different limits for different users. For example, you may want to allow Facebook to make slightly more calls to your APIs than you would allow this blog to make. This can easily be implemented with a step 0.

  0. Validate the user’s key and retrieve their settings.

This allows you to revoke keys, set limits at the key level and do fancy things like return different data for different users (who pay different amounts) based on the program they have signed up to.
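
As a rough sketch of that step 0, assuming a hypothetical `api_keys` table with `api_key`, `request_limit`, `plan` and `revoked` columns (none of these names come from a real schema):

```php
<?php
// Hypothetical step 0: look the key up and return its settings,
// or null when the key is unknown or has been revoked.
function loadKeySettings(PDO $db, string $apiKey): ?array
{
    $stmt = $db->prepare(
        'SELECT request_limit, plan FROM api_keys
         WHERE api_key = :key AND revoked = 0'
    );
    $stmt->execute([':key' => $apiKey]);
    $row = $stmt->fetch(PDO::FETCH_ASSOC);

    if ($row === false) {
        return null; // reject the request before it is logged or counted
    }

    return [
        'request_limit' => (int) $row['request_limit'],
        'plan'          => (string) $row['plan'],
    ];
}
```

The `request_limit` value returned here would then replace the fixed limit of 100 when counting requests for that key.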


Securing data at rest and the database

This article should take around 9 minutes 12 seconds to read.

AKA Best practice for storing your users’ data.

Just last month Quora (a question and answer style site) suffered a huge data breach, losing the personal data of over 100 million users (that’s slightly less than the population of Egypt).

So how do you prevent your site from being a target?

In the military world there is a concept called Defence in Depth. The idea is to make things as hard as possible for your opponent by slowing them down and forcing them to fight battle after battle for every inch of ground. This concept has been co-opted by the computer security world.

The rest of the article will assume you’ve been hacked, or rather that you will be hacked.
If you look through Wikipedia’s list of security hacks, you’ll notice that originally (in the 70s, 80s and 90s) these hacks were, for the most part, pranks, mostly aimed at getting relatively minor services (telephone calls) for free, while the later hacks have been aimed at earning serious money for the criminals involved.

So you may feel that since you don’t run a banking or cryptocurrency site, you don’t need to worry.

Sadly, most people at one time or another have used the same (or almost the same) password on different sites. So while someone stealing all the user names and passwords from your “I love kittens” forum may only allow the evil dog lover to post photos of dogs and embarrass your members, if those members have used the same email address and password on “high-end-fashion.com”, the evil dog lover could log in there and (if high-end-fashion stores your users’ payment details) order designer dresses for themselves.

If you were a money-grabbing ne’er-do-well, rather than the fine upstanding person you are, you may be tempted to think “Pah, what do I care, I won’t lose anything”. However, you’d be wrong. If just one of the people whose data was stolen from you was a citizen of the European Union, their data comes under the jurisdiction of the GDPR and you legally *must* protect it.

Hash passwords

One of the most basic ways of protecting a user’s password is not to store it at all, but to store a cryptographic “hash” of the password instead.
Say, for example, your user chose the password Pa$$w0rd. Rather than store that, you could store 02726d40f378e716981c4321d60ba3a325ed6a4c, which is the hashed version of that password.

As you can see, it looks nothing like the password. What’s more, there’s no way to get from the result of the hash back to the password; it’s gone. Additionally, even a small change in the password gives a totally different hash result.

Pa%$w0rd, a change of one bit, gives you the result a3e35fb1bc27126e65b396456a048c99bea9a5fb, and Pa$%w0rd, the same characters as the last one but in a slightly different order, gives you 590f36c43ede760092da844e34e4895c71c5f9f9.

However, applying the same hash function to the same input will always give the same output hash, so every time you hash Pa$$w0rd you will get 02726d40f378e716981c4321d60ba3a325ed6a4c. This means that rather than checking whether a user name and password combination exists within the database, you can just check whether the user name and hash combination exists.

As I alluded to before, there are a number of different hash functions, some stronger than others. In general you should go with the strongest you can get. With the latest version of PHP this is likely SHA3-512; if you’re having to work with older versions of PHP it may be SHA2.

Within PHP this can be implemented like this
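
A minimal sketch of that hashing step, assuming PHP 7.1 or later (where `sha3-512` is available to `hash()`). The wrapper name `hashPassword` is made up for illustration. PHP also ships a dedicated `password_hash()` function aimed specifically at password storage, which is worth considering as an alternative to a general-purpose hash.

```php
<?php
// Hypothetical wrapper around PHP's built-in hash() function.
function hashPassword(string $password): string
{
    // Returns 128 lowercase hex characters for SHA3-512.
    return hash('sha3-512', $password);
}

// Single quotes matter here: inside double quotes PHP would try to
// interpolate "$w0rd" as a variable.
$stored = hashPassword('Pa$$w0rd');

// The same input always gives the same hash, so a login attempt can be
// checked by hashing it again; hash_equals() compares in constant time.
$loginOk = hash_equals($stored, hashPassword('Pa$$w0rd'));
```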

A full description of the PHP function hash can be found in the online manual.

Salted hashes

As noted above, every time you run a word through a hash function you get the same answer. So what if someone were to run every common password through a hash function and note the results? Then, if they see one of those values again, they know what the password was originally.

Well, people do do this, and the results are called rainbow tables; you can even look hashes up online. If you put the three hashes we created earlier into CrackStation, you will notice that it only comes up with one success.

A screen shot of the results from crackstation

This is because it takes a *lot* of space, time and therefore money to generate rainbow tables, so they only focus on the most common billion or so passwords (yep, billion). So how do you defeat rainbow tables? Make sure the password isn’t one of those billion words.

If, when you generate the hash for the password, you add a random word, called a salt, to the end, you create a password that (likely) won’t appear in a rainbow table. As long as you also store that salt in the database along with the hash, you’ll still be able to append it to the password and check the hash.

If you wanted to improve the security (slightly) more you could use two salts: one fixed salt stored in a config file, and one generated for each account and stored within the database. Within PHP this may look a little like this.
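
A minimal sketch of the two-salt approach, under a few assumptions: the constant `FIXED_SALT` stands in for a value loaded from your config file, the function names are invented for illustration, and `random_bytes()` (PHP 7.0+) is used to generate the per-account salt.

```php
<?php
// Hypothetical fixed salt; in practice this would be loaded from a config
// file kept outside the database.
const FIXED_SALT = 'replace-with-a-long-random-value';

// Creates the values to store for a new account: a random per-account salt
// and the hash of password + account salt + fixed salt.
function createPasswordRecord(string $password): array
{
    $salt = bin2hex(random_bytes(16)); // 32 hex characters

    return [
        'salt' => $salt,
        'hash' => hash('sha3-512', $password . $salt . FIXED_SALT),
    ];
}

// Re-creates the hash from the submitted password and the stored salt,
// then compares it with the stored hash in constant time.
function verifyPassword(string $password, string $salt, string $storedHash): bool
{
    $candidate = hash('sha3-512', $password . $salt . FIXED_SALT);

    return hash_equals($storedHash, $candidate);
}
```

Only the `salt` and `hash` values go into the database; the fixed salt never does.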

The benefit of adding a fixed salt may not be immediately obvious. From a hacker’s point of view, they may see a database table with one column that’s clearly a password hash and another that’s clearly a salt and, if your customers’ passwords are worth the extra effort, try common passwords + the salt against the hash you have stored. Including a fixed salt that the hacker will only know about if they have access to your source code / config files (and if they do, you have serious problems) makes that extra effort by the hacker pointless.

Stored procedures

As I said in my last paragraph:

“if they have access to your source code / config files”

When most people think of a hacked site, they assume the hacker has total access to everything. However, generally the hacker is “just” after the database. Most attacks of this nature use what is called SQL injection to send rogue commands to the database.

The best way to deal with SQL injection is to use SQL parameters. In PHP that is done as follows.
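
A minimal sketch using PDO prepared statements, which is one common way to pass parameters from PHP. The `users` table, its columns and the function name are assumptions made for illustration.

```php
<?php
// Hypothetical lookup: the email address is sent to the database as a bound
// parameter, so it is never spliced into the SQL text itself.
function findUserByEmail(PDO $db, string $email): ?array
{
    $stmt = $db->prepare('SELECT id, email FROM users WHERE email = :email');
    $stmt->execute([':email' => $email]);
    $row = $stmt->fetch(PDO::FETCH_ASSOC);

    return $row === false ? null : $row;
}
```

With this in place, an input such as `' OR '1'='1` is treated as an (unlikely) email address to search for rather than as part of the query.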

This is best used with stored procedures.

Encrypted columns

Even if you’ve salted and hashed all the user passwords and put all your queries in parameters, there are still things that can be done to protect your users’ data: it can be encrypted while stored in the database.

By using AES_ENCRYPT when inserting and updating data (and AES_DECRYPT when selecting), with a known passphrase stored within a config file, you ensure that if a hacker does gain access to your database, all they will get is encrypted data.

Can we use PHP for machine learning?

This article should take around 2 minutes 46 seconds to read.
In response to a question asked on a Facebook group…

The simple answer is yes, you can. But before I explain how, I need to give a quick explanation of what “Machine learning” or “Artificial Intelligence” is.

At its heart “Machine learning” isn’t magic and it isn’t a black art; it is a set of algorithms which use mathematical functions to look for patterns in data.

Imagine you have two groups each containing a number of items.

{1,2,1,8,2,1,1,2,1}

{8,9,10,8,9,9,10,1}

and you have a single number you know belongs in one of the two groups, but you’re not sure which one

2

Which group do you think the number belongs in? The first set contains mostly lower numbers and the second set mostly numbers above seven. So it’s likely that the item belongs in set one.

Automate this and you have machine learning.
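
As a rough illustration of what “automating this” could mean, the sketch below assigns a number to whichever group has the closest average. This nearest-mean rule is just one simple approach chosen for the example, and the function name is made up.

```php
<?php
// Hypothetical nearest-mean classifier: returns the label of the group
// whose average value is closest to the number being classified.
function classifyByNearestMean(float $value, array $groups): string
{
    $bestLabel = '';
    $bestDistance = INF;

    foreach ($groups as $label => $items) {
        $mean = array_sum($items) / count($items);
        $distance = abs($value - $mean);

        if ($distance < $bestDistance) {
            $bestDistance = $distance;
            $bestLabel = (string) $label;
        }
    }

    return $bestLabel;
}

$groups = [
    'first'  => [1, 2, 1, 8, 2, 1, 1, 2, 1],
    'second' => [8, 9, 10, 8, 9, 9, 10, 1],
];

echo classifyByNearestMean(2, $groups); // prints "first"
```

The first group averages roughly 2.1 and the second exactly 8, so 2 lands in the first group, matching the intuition above.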

Machine learning, therefore, can be implemented in almost any language; you just need to implement the relevant algorithms.

Sadly, for all but trivial cases, such algorithms are hard to implement.

Thankfully Microsoft, Google, and Amazon all have APIs that can be used to implement Artificial Intelligence within your own applications.

Since all the APIs are web-based they can be accessed by any application which can make web-based calls, and so PHP can be used with them.

Microsoft has five main types of cognitive services.

  • Vision – Image-processing algorithms to smartly identify, caption and moderate your pictures.
  • Knowledge – Map complex information and data in order to solve tasks such as intelligent recommendations and semantic search.
  • Language – Allow your apps to process natural language with pre-built scripts, evaluate sentiment and learn how to recognize what users want.
  • Speech – Convert spoken audio into text, use voice for verification, or add speaker recognition to your app.
  • Search – Add Bing Search APIs to your apps and harness the ability to comb billions of web pages, images, videos, and news with a single API call.

Together these allow you to implement a wide range of functionality. Best of all, if you don’t already have an Azure account you can sign up for one for free and get more than enough free credits to build something amazing.
