Aaron Bray, Co-Founder and CEO of Phylum
A few weeks ago, PyPI announced that it had temporarily disabled the ability for users to sign up and upload new packages due to "the volume of malicious users and malicious projects being created on the index in the past week." Although PyPI played down the significance of the move, it made headlines because it came at a time when attackers are evolving their techniques in the open-source ecosystem in an effort to poison the software supply chain and compromise developer environments. And it is working.
Prior to the shutdown, there was a persistent attack that would overwhelm any package manager trying to do the right thing. A bad actor on GitHub laced his repositories with malware written in Python and hosted on PyPI. Minutes after the malware was taken down from PyPI, it respawned on PyPI under a slightly different name. It is a vicious cycle, made easy for attackers as they embrace automation to outpace even the best efforts of human moderators.
The situation highlights a few key challenges that businesses are just now coming to terms with:
- It underscores the risks organizations face from their blind trust in free and open-source software published by strangers on the internet. This is certainly not meant to disparage package registries, which have created tremendous value for all of their downstream users and beneficiaries, but the episode highlights their susceptibility to being co-opted as an attack vector.
- It shows just how prevalent bad actors in the open-source ecosystem have become. When my team first started monitoring the open-source ecosystem about a year and a half ago, we would frequently see spikes of malicious packages numbering in the hundreds per month. Now, in just the first quarter of 2023, we saw nearly 900,000 packages that were objectively bad, either overtly malicious or spam.
- The tools and processes most companies have in place are not equipped to defend against the tactics attackers are deploying in the open-source ecosystem. Most initiatives focus on scanning inventory, on complying with regulatory or industry initiatives such as creating and managing Software Bills of Materials (SBOMs) or adopting the Supply-chain Levels for Software Artifacts (SLSA) framework, or on attestation to ensure assets aren't tampered with during the development process. This leaves a blind spot around the inputs, efforts, and individuals involved in creating their software components, and around whether those components will behave as expected.
This new reality has companies asking themselves: How can we trust the open-source packages we rely on to build our applications? And how can we secure our software supply chain without slowing the pace of innovation we are accustomed to?
Fundamentally, from a business risk perspective, organizations have backed themselves into a corner. The use of open-source components has skyrocketed in recent years. Industry studies put the average project's composition at somewhere between 70% and 90% open source, with a scant 10% to 30% proprietary code, across a broad spectrum of verticals. To make matters worse, threat actors are more active than ever in these ecosystems, and software supply chain attacks continue to become both more prevalent and more targeted. Far from being "black swan events," these problems have escalated to the point where developers now have difficulty determining whether a package they pick is legitimate or malicious, and security teams are entirely in the dark about what is even being installed during the software development process.
Open-source utilization and reliance are only set to increase over time. In fact, even the CIO of the Department of Defense (DoD), which operates under extremely stringent guidelines, has mandated more reliance on open-source software. With this in mind, it is important to remember just how difficult monitoring is for the service providers in this equation. The governance and curation of packages in PyPI, the major center of gravity for the Python ecosystem, is almost entirely managed by a handful of volunteers. Most package registries are similarly understaffed, especially given the sheer volume of package publications they must handle: we see an average of 50,000 packages published every day across the ecosystems we support.
As attackers continue to target these ecosystems, and as new artificial intelligence and automation tools emerge, how can we expect package registries to shoulder this burden alone?
At the end of the day, organizations need to bear more responsibility for protecting their developers and the applications that are at the core of their livelihoods. It’s time for businesses to start questioning everything they thought they knew about securing code and protecting their software supply chains. The next time a package registry shuts down, it could be for good. What then?
Author
Aaron Bray, Co-Founder and CEO of Phylum
Aaron has 14 years of experience working in software engineering and information security. Aaron’s past research has focused on program synthesis, malware diversity, software anomaly detection, and the application of natural language processing techniques to binary analysis.