Combatting phantom secrets: have you heard of historical secrets scanning?

By Noa Shilo, Senior Director of Product, Aqua Security

Most people are familiar with the concept of Schrödinger’s Cat: a thought experiment in which a hypothetical cat is sealed in a box with a radioactive substance and a device that releases a poison if the substance decays. The experiment is designed to illustrate a quantum paradox in which the cat may be considered both alive and dead simultaneously, because its fate is linked to a random event that may (or may not) occur.

What does this have to do with modern software development? Well, it mirrors an increasingly critical risk associated with secrets embedded in code. These phantom secrets have the potential to cause major cybersecurity issues, yet a worrying number of developers aren’t aware of their existence. Many simply assume they’re long deleted, but until they examine the depths of commit history, they can’t be certain.

What are phantom secrets?

During development or testing, developers often embed secrets (such as credentials, API tokens, and passkeys) directly into their code, mainly for convenience. These secrets must be removed before the code is pushed to production. To do this, developers typically rely on scanning tools, which flag them for removal when the time comes.

However, while many scanners can detect the presence of secrets and accidental exposures, there’s a hidden threat that’s overlooked by a worrying number of these tools – even after secrets are removed, they can still be retrieved from the commit history.

This issue stems from a basic design flaw in Git-based infrastructure, and since this architecture underpins most Source Code Management (SCM) systems — including GitHub, GitLab, and Bitbucket — it impacts nearly all popular DevOps platforms. In fact, recent research by Aqua Nautilus found a vast number of secrets belonging to Fortune 500 companies on GitHub alone.

The implications are extremely concerning. Not only can attackers exploit these exposed secrets to move laterally within an organisation’s environment, escalate privileges, and gain access to sensitive data, but most scanning tools currently can’t detect this threat at all.

Why do secrets scanning tools miss secrets? 

Most of the time, when developers run secrets scanning on their SCM, they use the git clone command, either directly or behind the scenes within the scanning tool.

Due to edge cases or design choices in Git and SCM platforms, a git clone can leave some commits unreachable and therefore unscanned. Any secrets contained in those commits won’t be discovered.
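One way to picture the gap is as a reachability problem. The sketch below is a simplified model, not a description of any scanner's internals: the commit graph and ref names are hypothetical, and it assumes a regular clone only follows branch and tag refs while a mirror clone follows every ref the server advertises (such as pull-request refs).

```python
from collections import deque

# Hypothetical commit graph: commit id -> list of parent ids.
PARENTS = {
    "c1": [],
    "c2": ["c1"],
    "c3": ["c2"],  # tip of the main branch
    "p1": ["c2"],  # commit that only a pull-request ref points at
}

# Hypothetical refs, as a server might advertise them.
REFS = {
    "refs/heads/main": "c3",
    "refs/pull/1/head": "p1",
}


def reachable(parents, tips):
    """Return every commit reachable from the tips by following parents."""
    seen = set()
    queue = deque(tips)
    while queue:
        commit = queue.popleft()
        if commit in seen:
            continue
        seen.add(commit)
        queue.extend(parents[commit])
    return seen


def commits_seen(parents, refs, prefixes=None):
    """Commits visible when only refs with the given prefixes are followed."""
    tips = [
        commit
        for name, commit in refs.items()
        if prefixes is None or name.startswith(prefixes)
    ]
    return reachable(parents, tips)


# Assumption: a regular clone follows branches and tags only.
regular = commits_seen(PARENTS, REFS, ("refs/heads/", "refs/tags/"))
# Assumption: a mirror clone follows every advertised ref.
mirror = commits_seen(PARENTS, REFS)
unscanned = mirror - regular
print(sorted(unscanned))
```

In this toy graph, a scanner working from the regular clone never sees commit p1, so a secret introduced there would go unreported.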

GitHub is a popular platform with plenty of public repositories. Hence, it is often targeted by attackers who launch massive secrets harvesting campaigns. However, the problem certainly isn’t limited to GitHub alone.

Interestingly, GitHub’s documentation states unequivocally that sensitive data can be exposed in several scenarios, but it doesn’t explain how or why this exposure happens, or how users can find the exposed data.

To demonstrate the risk, Aqua Nautilus recently conducted a detailed analysis of how many hidden secrets exist. The analysis involved scanning the top 100 organisations on GitHub, ranked by the number of stars, which together have 52,268 different repositories. First, the repositories were scanned with Gitleaks using git clone, then they were scanned again using git clone --mirror. The number of unique secrets, meaning those that only exist in the mirrored version of the repository, was then counted. The analysis found that if users only scan for secrets using a regular git clone, they will miss around 17.78% of the potential secrets in their repositories, which is a startling number.
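The comparison behind a figure like this can be sketched as a set difference. The article does not give the exact formula Aqua Nautilus used, so the function below is an assumption: it treats the missed share as the findings seen only in the mirrored scan, divided by all findings. The fingerprint strings are made up for illustration.

```python
def missed_share(regular_findings, mirror_findings):
    """Fraction of all findings that appear only in the mirrored scan."""
    regular = set(regular_findings)
    total = regular | set(mirror_findings)
    if not total:
        return 0.0
    only_in_mirror = total - regular
    return len(only_in_mirror) / len(total)


# Hypothetical finding fingerprints from the two scans of one repository.
regular_scan = {"fp-01", "fp-02", "fp-03", "fp-04"}
mirror_scan = regular_scan | {"fp-05"}

share = missed_share(regular_scan, mirror_scan)
print(f"{share:.2%}")  # prints 20.00%
```

Deduplicating findings by a stable fingerprint matters here, because the same secret often shows up in many commits and would otherwise inflate both counts.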

Eliminate oversights with historical secret scanning

Fortunately, there is now a way to eliminate the oversights inherent in many scanning tools: historical secrets scanning. This new technology, which is available in leading secrets scanning solutions like Aqua Trivy, is designed to identify and address secrets that, though deleted from code, remain accessible in the commit history.

Historical secret scanning works by thoroughly scanning and analysing commit history to uncover hidden or deleted secrets that traditional scanners miss, enabling teams to eliminate these risks once and for all. The key benefits of this approach include a complete view of all secrets without blind spots, enhanced detection that far surpasses conventional scanners alone, a reduced attack surface through the elimination of phantom secrets, and much stronger overall code security.
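As a rough illustration of the idea (not how Aqua Trivy or any other product is implemented), the sketch below walks the patch history of a repository and reports secrets on lines that any commit added, even if a later commit removed them. It assumes a single illustrative rule, the shape of an AWS access key ID, and the sample log is invented.

```python
import re
import subprocess

# Assumption: one illustrative rule (the shape of an AWS access key ID).
SECRET_PATTERN = re.compile(r"\bAKIA[0-9A-Z]{16}\b")


def read_history(repo_path):
    """Return the patch history of every ref in a local repository."""
    result = subprocess.run(
        ["git", "-C", repo_path, "log", "--all", "-p", "--no-color"],
        capture_output=True, text=True, errors="replace", check=True,
    )
    return result.stdout


def scan_history(log_text):
    """Return (commit, secret) pairs for secrets on lines a commit added."""
    findings = []
    commit = None
    for line in log_text.splitlines():
        if line.startswith("commit "):
            commit = line.split()[1]
        elif line.startswith("+") and not line.startswith("+++"):
            for match in SECRET_PATTERN.finditer(line):
                findings.append((commit, match.group(0)))
    return findings


# Invented history: the key is added in one commit and removed in the next.
SAMPLE_LOG = """\
commit 2222222
Author: Dev <dev@example.com>

    Remove hard-coded key

diff --git a/app.py b/app.py
--- a/app.py
+++ b/app.py
@@ -1 +1 @@
-KEY = "AKIAIOSFODNN7EXAMPLE"
+KEY = os.environ["KEY"]
commit 1111111
Author: Dev <dev@example.com>

    Add client

diff --git a/app.py b/app.py
--- /dev/null
+++ b/app.py
@@ -0,0 +1 @@
+KEY = "AKIAIOSFODNN7EXAMPLE"
"""

findings = scan_history(SAMPLE_LOG)
print(findings)
```

On a real repository, scan_history(read_history(path)) would cover every ref present locally, so pairing it with a mirrored clone widens what the scan can see. A finding in history means the secret should be treated as exposed and rotated, whether or not it still appears in the current code.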

It’s critical that developers realise credentials, API tokens, and passkeys embedded in code can remain exposed for many years, even after they think they’ve been deleted. Releasing software with these secrets embedded in it poses a significant security risk. Fortunately, adoption of historical secret scanning is a great way to gain complete oversight of all secrets without blind spots, including those buried deep within the commit history. This oversight gives developers and organisations the ability to properly mitigate these risks, helping to reduce their exposure to cyberattacks and significantly bolster their security posture in the process.
