You’ve got millions of open-source software components to choose from… and so do cybercriminals

Sponsored In November 2020, the JavaScript registry npm flashed a security advisory that a library called twilio-npm harboured malicious code which could backdoor any machine it was downloaded to. Perhaps the most troubling aspect of this tale is that this was the seventh such malicious package found on npm within a month, a stark illustration of the effort that cybercriminals are making to insert themselves into the open source software supply chain.

Between February 2015 and June 2019, 216 such Next Generation Software Supply Chain Attacks were recorded, according to Sonatype’s State of the Software Supply Chain Report, 2020. From July 2019 to May 2020, the number shot up to 929. Attacks jumped 430 per cent between 2019 and 2020.

Some of the attackers’ techniques remain surprisingly low-tech. The most common approach, according to the report, is typosquatting, as in the 2019 attack on JavaScript utility library Lodash, which simply relied on developers accidentally typing Lodahs. The misspelt name pulled in a malicious package containing malware designed to exfiltrate cryptocurrency wallets.
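To make the typosquatting idea concrete, here is a minimal sketch of how a requested package name could be compared against a list of well-known names before install. It is an illustration only, not something the report or npm describes: the `POPULAR` list, the function names and the distance threshold of one are all assumptions.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: insertions, deletions,
    substitutions and swaps of two adjacent characters each cost 1."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Adjacent transposition, e.g. "hs" typed instead of "sh".
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]


# Hypothetical list of names the team actually intends to use.
POPULAR = {"lodash", "express", "react"}


def possible_typosquat(name, popular=POPULAR, max_distance=1):
    """Return the well-known name that `name` nearly matches, or None."""
    name = name.lower()
    if name in popular:
        return None  # exact match: this is the real package name
    for known in sorted(popular):
        if osa_distance(name, known) <= max_distance:
            return known
    return None
```

With these assumptions, `possible_typosquat("lodahs")` returns `"lodash"`, while `possible_typosquat("lodash")` returns `None`. A real check would need a much larger reference list and would still miss lookalike names that are more than one edit away.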

More pernicious attacks involve stealing a project maintainer’s credentials, contributing malicious versions of a project to a repo, or submitting pull requests that contain malicious code. In the Octopus Scanner attack in May 2020, malware hijacked the Apache NetBeans IDE to propagate itself, or as GitHub explained it, “the malware would proceed to backdoor NetBeans project builds”.

Software supply chain attacks are not new, of course. The most notorious breach in recent years – the Struts attack which poleaxed credit giant Equifax, amongst others, in 2017 – fell squarely into the software supply chain attack category. What made the vulnerability so fatal, in Equifax’s case, was the company’s failure to update the open source software library in a timely fashion.

For many, the benefits of building applications from OSS components clearly outweigh the risks. However, the Equifax incident vividly illustrates how developers’ increasing use of open source software can introduce security vulnerabilities and the need for rapid response. Just 10 per cent of the average enterprise application is built in-house these days, says Derek Weeks, vice president and DevOps advocate at Sonatype, and this essentially amounts to the glue holding an array of OSS components together.

According to Weeks, anywhere from 10 per cent to 40 per cent of the open source software components that developers download have known vulnerabilities. This creates a massive potential attack surface for adversaries – all they need to do is identify the particular components their targets are running.

It is therefore concerning that only 17 per cent of organisations were aware of new vulnerabilities in open source components within a day of disclosure, reveals a survey of 679 developers conducted for Sonatype’s 2020 State of the Software Supply Chain Report. Some 35 per cent took between a day and a week, while the rest (48 per cent) remained in ignorance for at least a week.

Slow remediation compounds the problem. The longer the mean time to remediate such issues, the more time attackers have to exploit newly disclosed vulnerabilities in unpatched systems in such “legacy” software supply chain attacks.

Next-generation software supply chain attacks

But cybercriminals are remorselessly forward-looking. And as they look for ways to be even more efficient, it’s inevitable they would make the leap from thinking “what flawed components can I exploit” to “what if I could create a flawed component that everyone would download?” As Weeks puts it, attackers – whether individuals, syndicates, or nation states – are realizing, “I can create my own zero days.”

It’s clear that cybercriminals have read and fully digested the DevOps and open source playbooks to conduct these next-generation software supply chain attacks aimed at open source software project code.

Many of the factors that make open source software components the default for enterprise software developers are also factors that bad actors can leverage. Big open source projects sometimes rely on contributions from hundreds, potentially thousands, of volunteers. And open source projects themselves have numerous dependencies, all of which can have vulnerabilities. Running through all of this is the shared trust model that is necessary to bind together global communities working towards a common goal.

It is easy to see how a bad actor can blend into the crowd, quietly secreting malicious code within the thousands of contributions making up a popular package or component. The multiplier effect of OSS downloads within popular projects means they can potentially make a far bigger return on their investment than through traditional attack methods. The more downloads a project gets following the insertion of malicious code, the more attack surface available for immediate exploit.

After all, according to Sonatype’s 2020 State of the Software Supply Chain Report projections, developers requested over 1.5 trillion open source software components and containers in 2020.

Just looking at npm packages, a 2019 study by Darmstadt University researchers found a typical package depended on an average of 79 third-party packages from 39 different maintainers. Furthermore, the researchers found these complex dependency chains meant 391 highly influential contributors affected more than 10,000 components.

If you were a bad actor, looking to pass yourself off as someone else, these would be your prime targets. Were you to gain access to the 20 most popular maintainer accounts, you could deploy malicious code hitting over half the npm ecosystem, the Darmstadt researchers concluded. Likewise, the “package reach” of the top five packages was between 134,774 and 166,086 other packages.
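The notion of “package reach” can be illustrated with a small graph walk. The sketch below is not the Darmstadt researchers’ code: it assumes a made-up dependency map (package name to its direct dependencies) and counts every package that depends on a target, directly or through a chain of other packages.

```python
from collections import defaultdict, deque


def package_reach(dependencies, target):
    """Return the set of packages that depend on `target`,
    either directly or transitively."""
    # Invert the map: for each package, who depends on it directly?
    dependents = defaultdict(set)
    for package, deps in dependencies.items():
        for dep in deps:
            dependents[dep].add(package)

    # Breadth-first walk up the chain of dependents.
    reached = set()
    queue = deque([target])
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, ()):
            if parent != target and parent not in reached:
                reached.add(parent)
                queue.append(parent)
    return reached


# Hypothetical toy ecosystem, for illustration only.
TOY_ECOSYSTEM = {
    "app-a": ["web-framework"],
    "app-b": ["logger"],
    "web-framework": ["logger", "tiny-util"],
    "logger": ["tiny-util"],
    "tiny-util": [],
}
```

In this toy ecosystem, `package_reach(TOY_ECOSYSTEM, "tiny-util")` returns all four other packages, even though only two of them list `tiny-util` directly. That is the multiplier an attacker is after when compromising a small, widely used package or its maintainer’s account.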

Meanwhile, the Linux Foundation’s Core Infrastructure Initiative found that seven of the top 10 most used software packages were hosted under individual developer accounts. Seizing control of such accounts to distribute attacks could mean chaos downstream in software supply chains.

As we’ve seen, many organisations struggle even to deal with the problems Equifax illustrated almost half a decade ago. How are they expected to protect themselves when every OSS package could potentially contain malicious code?

Keeping tabs

For enterprises, the first step in securely using open source is simply to be clear about which open source packages their developers are using, says Weeks.

“If a vulnerability is announced today, your first question as a development team or a company is ‘did I ever download that package and use it?’ If the answer is yes, your next question is ‘Where is it?’ If you don’t keep an inventory of what you’re using, the resulting action looks more like a months-long scavenger hunt than a precision, rapid response.”
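The two questions Weeks poses map naturally onto a lookup against a component inventory. The following is a minimal sketch under stated assumptions: the `Record` shape, the application names, the component name `example-web-lib` and the version numbers are all hypothetical, and a real inventory would normally be generated by build tooling rather than typed by hand.

```python
from typing import NamedTuple


class Record(NamedTuple):
    application: str  # where the component is deployed
    component: str    # open source package name
    version: str      # exact version in use


# Hypothetical inventory, for illustration only.
INVENTORY = [
    Record("billing-api", "example-web-lib", "1.0.0"),
    Record("billing-api", "example-json-lib", "4.2.0"),
    Record("customer-portal", "example-web-lib", "1.0.1"),
    Record("reporting-job", "example-web-lib", "1.1.0"),
]


def where_used(inventory, component, affected_versions):
    """Answer 'did we use it?' and 'where is it?' in one pass.

    Returns a dict mapping each affected application to the
    vulnerable versions of `component` it contains.
    """
    hits = {}
    for record in inventory:
        if record.component == component and record.version in affected_versions:
            hits.setdefault(record.application, []).append(record.version)
    return hits
```

Given an advisory covering versions 1.0.0 and 1.0.1 of the hypothetical library, `where_used(INVENTORY, "example-web-lib", {"1.0.0", "1.0.1"})` returns `{"billing-api": ["1.0.0"], "customer-portal": ["1.0.1"]}`. An empty result means the answer to the first question is no.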

The second part, he says, is ensuring the organization has a system to evaluate whether what you’re using is good or bad. Crucially, this has to be done where it matters, and that is not within the security or governance team. Rather it has to find its way into the tools and environments being used by developers. “The only people in your organisation that are downloading open source code, are developers,” says Weeks.

So, it’s developers who need to be encouraged and empowered to ask the question: “Are the components I’m using good or bad?” But where do they find the answers?

Advanced Development Pack

As Weeks explains, developers have historically shared information about what packages to use by word of mouth. But this is hardly a robust way of comparing the quality and security implications of different packages – especially when any given organization downloads over 300,000 open source components annually.

This is the problem that Sonatype targets with its Advanced Development Pack, which leverages artificial intelligence and machine learning to uncover warning signs about open source projects.

“What we really set out to do a couple of years ago was to research who are the best quality suppliers,” Weeks says. “How many downloads has a project had? How many developers does the project have? How often are those projects releasing new versions, updating their dependencies to the latest versions, or remediating known vulnerabilities?”

Sonatype has evaluated over 20 different attributes of project behaviors over five years, to highlight the differences between the highest and lowest performing OSS projects. This provides the raw material for empirically rating component suppliers, beyond just word of mouth and who gives away the best swag at conferences.

The next step is using that raw info to take a more intelligence-driven approach to identifying behavior that could suggest bad actors are actively attempting to introduce vulnerabilities into open source projects. For example, the sudden addition of developers to a group and a dramatic uptick in release frequency might trigger further investigation.
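As a rough illustration of the kind of signal described here, the sketch below flags a project when both its latest monthly release count and its latest count of new contributors sit far above the project’s own history. This is not Sonatype’s model, which the article does not detail. The simple z-score rule, the threshold of three standard deviations and the function names are assumptions chosen to keep the example short.

```python
from statistics import mean, pstdev


def is_spike(history, latest, threshold=3.0):
    """True if `latest` is more than `threshold` standard deviations
    above the mean of `history` (a list of earlier monthly values)."""
    baseline = mean(history)
    spread = pstdev(history)
    if spread == 0:
        # Perfectly flat history: any increase counts as unusual.
        return latest > baseline
    return (latest - baseline) / spread > threshold


def needs_review(releases_per_month, new_contributors_per_month):
    """Flag a project when BOTH signals spike in the most recent month.

    Each argument is a list of monthly counts, oldest first, with at
    least two entries; the last entry is the month being assessed.
    """
    *release_history, latest_releases = releases_per_month
    *contributor_history, latest_contributors = new_contributors_per_month
    return (
        is_spike(release_history, latest_releases)
        and is_spike(contributor_history, latest_contributors)
    )
```

For example, `needs_review([1, 2, 1, 2, 1, 9], [0, 1, 0, 1, 0, 6])` returns `True`, while a project that carries on at its usual pace, such as `needs_review([1, 2, 1, 2, 1, 2], [0, 1, 0, 1, 0, 1])`, returns `False`. Requiring both signals keeps the rule conservative. A flag is only a prompt for a human to investigate, not a verdict.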

“We’ve identified what is a traditional pattern across tens of thousands of projects that we’re looking at, and what is the anomalous behavior,” explains Weeks. “And we’ve trained our artificial intelligence and machine learning infrastructure involved in Sonatype’s Nexus Intelligence that fuels the Advanced Development Pack.”

Any anomaly noticed will provide the cue for Sonatype’s researchers to do more in-depth research on a given project or component. At the same time, via the Advanced Development Pack, it will flag anomalous behavior direct to developers, enabling them to make more informed decisions on whether to integrate a given component.

“In the Java space alone, there’s six million open source component releases out there,” Weeks notes. “You can’t assign manual review work to the software development or cybersecurity community and say, ‘can you keep your eyes on all six million of those things?’ You have to train machines to be able to look for and interpret that anomalous behavior that then tips you off to potential adversary behavior.”

Sponsored by Sonatype