Injection is the underlying issue for a large number of vulnerabilities, such as SQL injection, OS command injection, and XML injection. Together, injections account for a huge percentage of vulnerabilities found in real-world applications and APIs.
How injections happen
In a single sentence, injection happens when an application cannot properly distinguish between untrusted user data and code.
Untrusted user data can be HTTP request parameters, HTTP headers, and cookies. It can also come from databases or stored files that can be modified by the user. If the application does not properly process the untrusted user data before inserting it into a command or query, the program’s interpreter will mistake the user input for part of the command or query. In this case, attackers can send data to an application in a way that changes the meaning of its commands.
In a SQL injection attack, for example, the attacker injects data to manipulate SQL commands. And in a command injection attack, the attacker injects data that manipulates the logic of OS commands on the hosting server. Any program that combines user data with programming commands or code is potentially vulnerable.
Injection vulnerabilities can affect API systems as well because an API is just another way untrusted user input can enter an application. Let’s take a look at how injection vulnerabilities appear in an API.
Example #1: Retrieving blog posts
Let’s say that an API allows its users to retrieve blog posts by sending a GET request like this one:
GET /api/v1.1/posts?id=12358
This request will cause the API to return post 12358. The server will retrieve the corresponding blog post from the database with a SQL query, where post_id refers to the id passed in by the user via the URL:
SELECT * FROM posts WHERE post_id = 12358
Now, what if the user requests this from the API endpoint instead?
GET /api/v1.1/posts?id=12358; DROP TABLE users
The SQL server would interpret the portion of the id after the semicolon as a separate SQL command. So the SQL engine will first execute this command to retrieve the blog post as usual:
SELECT * FROM posts WHERE post_id = 12358;
Then, it will execute this command to delete the users table, causing the application to lose the data stored in that table:
DROP TABLE users
This is called a SQL injection attack, and it can happen whenever user input is passed into SQL queries in an unsafe way. Note that user input in an API doesn’t just travel via URL parameters. It can also reach the application via POST requests, URL path parameters, and so on, so it’s important to secure those places too.
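To make the attack above concrete, here is a minimal sketch in Python using an in-memory SQLite database. The build_query helper and the table layout are assumptions made for illustration, not the API’s actual code. Python’s sqlite3 execute() refuses strings containing more than one statement, so the sketch uses executescript() to stand in for a database driver that does allow stacked statements.

```python
import sqlite3


def build_query(post_id: str) -> str:
    # UNSAFE (hypothetical helper): the raw request parameter is
    # concatenated straight into the SQL text.
    return "SELECT * FROM posts WHERE post_id = " + post_id


def table_names(conn: sqlite3.Connection) -> set[str]:
    # List the tables that currently exist in the database.
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    return {name for (name,) in rows}


conn = sqlite3.connect(":memory:")
conn.executescript(
    "CREATE TABLE posts (post_id INTEGER, body TEXT);"
    "CREATE TABLE users (name TEXT);"
)

malicious_id = "12358; DROP TABLE users"
query = build_query(malicious_id)
print(query)  # the string now holds two statements separated by a semicolon

# executescript() runs every statement in the string, including the
# attacker's DROP TABLE.
conn.executescript(query)
print(table_names(conn))  # the users table is no longer listed
```

The database never sees a difference between the part of the query the developer wrote and the part the attacker supplied, which is exactly the confusion between data and code described earlier.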
Example #2: Reading system files
Let’s say the site allows users to read the files that they’ve uploaded via an API endpoint:
GET /api/v1.1/files?id=1123581321
To serve this request, the server retrieves the user’s files by running a system command that includes the id value from the URL.
In this case, a user could inject new commands into the system command by appending them after a semicolon:
GET /api/v1.1/files?id=1123581321; rm -rf /var/www/html/users
This request will force the server to remove the folder located at /var/www/html/users, which is where the application stores user information:
rm -rf /var/www/html/users
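The same mechanism can be reproduced safely. The sketch below assumes a POSIX shell and swaps the real file-reading command for a harmless echo, so run_for_file and the command it builds are hypothetical stand-ins rather than the site’s actual code.

```python
import subprocess


def run_for_file(file_id: str) -> str:
    # UNSAFE (hypothetical helper): the id is pasted into a string that
    # the shell will parse. "echo reading file" is a harmless stand-in
    # for whatever command the server really runs.
    command = "echo reading file " + file_id
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    return result.stdout


# A normal id produces a single line of output.
print(run_for_file("1123581321"))

# Everything after the semicolon runs as a second command.
print(run_for_file("1123581321; echo INJECTED"))
```

Because the shell treats the semicolon as a command separator, the second call prints an extra line produced by the injected command. An attacker would put something far more destructive in its place.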
Preventing injection vulnerabilities in APIs
These are simplified examples of injection vulnerabilities. In practice, it’s important to remember that injection vulnerabilities are not always this obvious. Manipulation can happen any time the injected data is processed or used. Even if the malicious user data is not used by the application right away, it can eventually travel somewhere in the program where it can do something bad, such as a dangerous function or an unprotected query. That is where the data causes damage to the application, its data, or its users.
You can see why injection is so difficult to prevent. Untrusted data can attack any application component that it touches downstream. Meanwhile, for every piece of untrusted data it receives, the application needs to detect and neutralize attacks targeting every part of the application. An application might conclude that a piece of data is safe because it does not contain any special characters used for triggering XSS, when the attacker actually intends to trigger a SQL injection instead. It’s not always straightforward to determine what data is safe and what data is not, because safe and unsafe data look very different in different parts of the application.
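As a small illustration of that last point, the sketch below uses a hypothetical check, looks_safe_for_html, that only knows about characters that matter in HTML. The character set is an assumption chosen for the example and is not a complete XSS filter.

```python
# Characters this hypothetical check treats as dangerous in an HTML context.
HTML_SPECIAL = set("<>&\"'")


def looks_safe_for_html(value: str) -> bool:
    # Returns True when the value contains none of the HTML-special characters.
    return not any(ch in HTML_SPECIAL for ch in value)


print(looks_safe_for_html("<script>alert(1)</script>"))  # False: caught
print(looks_safe_for_html("12358; DROP TABLE users"))    # True: waved through
```

The SQL payload from the first example passes the HTML check untouched, because nothing in it is special to an HTML parser. Whether data is safe depends on the component that eventually interprets it.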
So how do you protect against these threats? The first thing you can do is to validate the untrusted data. This means that you either implement a blocklist that rejects any input containing dangerous characters that might affect application components, or you implement an allowlist that only accepts input strings made up of known good characters. For example, let’s say that you are implementing sign-up functionality. Since you know that the data is going to be inserted into a SQL query, you reject any username that contains characters that are special in SQL, like the single quote. Or, you can implement a rule that only allows alphanumeric characters.
But blocklists are hard to get right because you don’t always know which characters are going to be significant to application components down the line. Missing just one special character can allow attackers to bypass the protection.
Allowlists, on the other hand, may be too restrictive, because sometimes you need to accept special characters like single quotes in user input fields. For example, if a user named Conan O’Brien is signing up, he should be allowed to use a single quote in his name.
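Both validation styles, and both of their weaknesses, can be sketched in a few lines of Python. The function names and the specific character sets below are assumptions picked for illustration. A real blocklist or allowlist would depend on where the data ends up.

```python
import re

# Hypothetical blocklist: a few characters that are special in SQL.
SQL_BLOCKLIST = set("'\";")

# Hypothetical allowlist: one or more alphanumeric ASCII characters.
ALLOWED_USERNAME = re.compile(r"[A-Za-z0-9]+")


def passes_blocklist(username: str) -> bool:
    # Reject the input if it contains any blocklisted character.
    return not any(ch in SQL_BLOCKLIST for ch in username)


def passes_allowlist(username: str) -> bool:
    # Accept the input only if every character is alphanumeric.
    return ALLOWED_USERNAME.fullmatch(username) is not None


print(passes_blocklist("x'; DROP TABLE users"))  # False: quote and semicolon caught
print(passes_blocklist("admin--"))               # True: "--" starts a SQL comment but was not listed
print(passes_allowlist("conan42"))               # True
print(passes_allowlist("Conan O'Brien"))         # False: a legitimate name is rejected
```

The second call shows how a blocklist fails open when a significant character is missing from it, and the last call shows how an allowlist can turn away a real user.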
Another possible defense against injection is parameterization. Parameterization refers to compiling the code part of a command before any user-supplied parameters are inserted.
This means that instead of concatenating user input into program commands and sending it to the server to be compiled, you define all the logic first, compile it, then insert user input into the command right before execution. After the user input is inserted into the final command, the command will not be parsed and compiled again. And anything that was not in the original statement will be treated as string data, and not executable code. So the program logic part of your command will remain intact.
This allows the database to distinguish between the code part and the data part of the command, regardless of what the user input looks like. This method is very effective in preventing some injection vulnerabilities, but it cannot be used in every context, because placeholders generally stand in only for values and not for identifiers such as table or column names.
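Here is a minimal sketch of a parameterized version of the blog post lookup, again using Python’s sqlite3 module and an in-memory database. The get_post helper and the table layout are assumptions for illustration. The ? placeholder is sqlite3’s own syntax, and other drivers use different placeholder styles.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    "CREATE TABLE posts (post_id INTEGER, body TEXT);"
    "CREATE TABLE users (name TEXT);"
    "INSERT INTO posts VALUES (12358, 'Hello');"
)


def get_post(conn: sqlite3.Connection, post_id: str) -> list:
    # The SQL text is fixed. The user's value is bound to the "?"
    # placeholder separately and is never parsed as SQL.
    cursor = conn.execute(
        "SELECT post_id, body FROM posts WHERE post_id = ?", (post_id,)
    )
    return cursor.fetchall()


def users_table_exists(conn: sqlite3.Connection) -> bool:
    row = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' AND name = 'users'"
    ).fetchone()
    return row is not None


print(get_post(conn, "12358"))                    # the matching post
print(get_post(conn, "12358; DROP TABLE users"))  # no rows: the whole string is compared as a value
print(users_table_exists(conn))                   # the users table is untouched
```

The malicious input is simply a value that matches no post. It never gets a chance to become a second statement.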
And finally, you can escape special characters instead. Escaping means that you encode special characters in user input so that they are treated as data and not as special characters. By using special markers and syntax to mark special characters in user input, escaping lets the interpreter know that the data is not intended to be executed.
But this method comes with its problems as well. For one, you must use the exact encoding syntax for every downstream parser or risk the encoded values being misinterpreted by a parser. You might also forget to escape some characters, which attackers can use to neutralize your encoding attempts. So a key to preventing injection vulnerabilities is to understand how parsers of different languages work, and which parsers run first in your processes.
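To see why the escaping syntax has to match the downstream parser, the sketch below escapes one input in two ways using Python’s standard library: html.escape for an HTML context and shlex.quote for a POSIX shell context. The input value is made up for the example.

```python
import html
import shlex

# A made-up input that contains characters special to both HTML and the shell.
value = "O'Brien; rm -rf /var/www/html/users <b>"

# Escaped for an HTML parser: special characters become character references.
for_html = html.escape(value)
print(for_html)

# Escaped for a POSIX shell: the whole value becomes a single quoted word.
for_shell = shlex.quote(value)
print(for_shell)

# The shell now sees one argument instead of two commands.
print(shlex.split(for_shell) == [value])
```

The two results look nothing alike, and neither one protects the other context. HTML escaping does nothing to stop the shell from splitting on the semicolon, and shell quoting leaves the angle brackets intact for an HTML parser.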