University researchers have published details of a new attack method they call “Trojan Source” that injects vulnerabilities into the source code of a software project in a way that human reviewers cannot detect.
Trojan Source relies on a simple trick that does not require modifying the compiler to create vulnerable binaries.
The method works with some of the most widely used programming languages today, and adversaries could use it for supply chain attacks.
Abuse of text encoding standards
Researchers at the University of Cambridge, UK, have disclosed and demonstrated a class of “Trojan Source” attacks that could compromise proprietary software and supply chains.
“The trick is to use Unicode control characters to rearrange the tokens in the source code at the encoding level,” says Nicholas Boucher, one of the researchers who discovered Trojan Source.
“We have discovered ways to manipulate the encoding of source code files so that human viewers and compilers see different logic. One particularly pernicious method is using Unicode directionality override characters to display the code as an anagram of its true logic,” said Ross Anderson, the other researcher behind the Trojan Source attack method.
By using control characters embedded in comments and strings, a malicious actor can rearrange the source code to change its logic to create an exploitable vulnerability.
Bidirectional and homoglyph attacks
Researchers have shown that one way to do this is to use Unicode controls for bidirectional text (for example, LRI, the left-to-right isolate, and RLI, the right-to-left isolate) to dictate the direction in which the content is displayed. This method is now tracked as CVE-2021-42574.
The LRI and RLI bidirectional (Bidi) controls are invisible characters, and they are not the only ones. When an attacker injects these controls, a compiler can end up compiling code that is completely different from what a human sees.
In the researchers' example, which uses RLI/LRI controls inside a string, the second line is compiled even though the human eye reads it as a comment that the compiler would ignore.
By injecting Unicode Bidi overrides into comments and strings, an adversary could “produce syntactically valid source code in most modern languages for which the order in which the characters are displayed presents logic that deviates from the real logic.”
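Because these controls are invisible in most editors, one practical response is to search for them mechanically. The following is a minimal sketch, not the researchers' tooling: the function name `find_bidi_controls` is hypothetical, and the set covers the nine explicit directional formatting characters defined by the Unicode Bidirectional Algorithm.

```python
import unicodedata

# Explicit directional formatting characters: embeddings, overrides,
# isolates, and their terminators.
BIDI_CONTROLS = {
    "\u202a",  # LRE, left-to-right embedding
    "\u202b",  # RLE, right-to-left embedding
    "\u202c",  # PDF, pop directional formatting
    "\u202d",  # LRO, left-to-right override
    "\u202e",  # RLO, right-to-left override
    "\u2066",  # LRI, left-to-right isolate
    "\u2067",  # RLI, right-to-left isolate
    "\u2068",  # FSI, first strong isolate
    "\u2069",  # PDI, pop directional isolate
}


def find_bidi_controls(text):
    """Return (line, column, code point, name) for each Bidi control in text.

    Lines and columns are 1-based. Hypothetical helper for illustration.
    """
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in BIDI_CONTROLS:
                code_point = f"U+{ord(ch):04X}"
                hits.append((line_no, col, code_point, unicodedata.name(ch)))
    return hits


# The sample is written with escapes so this file itself stays free of
# invisible characters.
sample = 'ok = True\nname = "a\u2067b\u2069"  # note\n'
hits = find_bidi_controls(sample)
for hit in hits:
    print(hit)
```

A check like this could run in a pre-commit hook or code review bot; whether to block or merely warn depends on whether the project legitimately contains right-to-left text.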
Another way is a homoglyph attack (CVE-2021-42694), where two different characters have a similar visual representation, such as the number “zero” and the letter “O”, or the lowercase “L” and the uppercase “i”.
In a homoglyph Trojan Source attack, the human eye sees two identical functions, while the compiler distinguishes between the Latin “H” and the Cyrillic “Н” and treats the code as defining two different functions, so the result will not be the same.
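To make the Latin/Cyrillic confusion concrete, here is a rough sketch that flags identifiers mixing letters from more than one script. It rests on a loud assumption: the first word of a character's Unicode name (“LATIN”, “CYRILLIC”, and so on) is used as a stand-in for its script, which is only an approximation of real script data. The names `scripts_in` and `is_mixed_script` are hypothetical.

```python
import unicodedata


def scripts_in(identifier):
    """Approximate the set of scripts used by the letters of an identifier.

    Assumption: the first word of the Unicode character name stands in
    for the script. This is a heuristic, not authoritative script data.
    """
    scripts = set()
    for ch in identifier:
        if ch.isalpha():
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def is_mixed_script(identifier):
    """Flag identifiers whose letters come from more than one script."""
    return len(scripts_in(identifier)) > 1


latin = "sayHello"
# U+041D is CYRILLIC CAPITAL LETTER EN, which renders much like Latin "H".
spoofed = "say\u041dello"

print(latin == spoofed)          # the two names are different strings
print(scripts_in(spoofed))       # both LATIN and CYRILLIC appear
print(is_mixed_script(spoofed))
```

A mixed-script flag is only a hint: an identifier written entirely in one non-Latin script can still collide visually with a Latin one, so this sketch would miss whole-script spoofing.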
In a paper detailing the new Trojan Source attack method, the researchers point out that bidirectional (Bidi) override characters persist through copy and paste on most browsers, editors and operating systems.
Researchers tested the Trojan Source attack against several code editors and web repositories commonly used in programming and found that their method worked on many of them.
Following the principle of using Bidi overrides to create code that is still valid when reordered, researchers have found at least three techniques for exploiting source code:
- Early returns – hide a real ‘return’ statement in a comment so that it can cause a function to return sooner than it looks
- Commenting-out – trick human review by placing important code, such as a conditional, in a comment so that it is not taken into account by the compiler or interpreter
- Stretched strings – reorder the text so that code which appears to sit outside a string literal is actually part of it
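All three techniques depend on Bidi controls placed inside comments or string literals, so a reviewer may want to know which tokens carry them rather than just which lines. The sketch below uses Python's standard `tokenize` module on Python source as an illustration; the function name `bidi_in_comments_and_strings` and the sample code are hypothetical, and other languages would need their own lexer.

```python
import io
import tokenize

# The nine explicit directional formatting characters, written as escapes.
BIDI_CONTROLS = set("\u202a\u202b\u202c\u202d\u202e\u2066\u2067\u2068\u2069")


def bidi_in_comments_and_strings(source):
    """Return (line, token kind, count) for each comment or string token
    in Python source that contains Bidi control characters."""
    findings = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in (tokenize.COMMENT, tokenize.STRING):
            count = sum(ch in BIDI_CONTROLS for ch in tok.string)
            if count:
                kind = tokenize.tok_name[tok.type]
                findings.append((tok.start[0], kind, count))
    return findings


# Hypothetical sample: to the tokenizer, everything after "#" on line 2
# is a comment, whatever order a Bidi-aware editor chooses to display it in.
sample = (
    "def is_admin(user):\n"
    "    # \u202e\u2066 return True \u2069\u2066\n"
    "    return user == 'admin'\n"
    "label = 'a\u202eb'\n"
)

findings = bidi_in_comments_and_strings(sample)
for finding in findings:
    print(finding)
```

Reporting the token kind shows what the interpreter actually treats as inert text, which is the view that a human reviewer is being denied.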
One way to defend against Trojan Source is to reject the use of control characters for text directionality in language specifications and in compilers that implement languages.
“In most cases, this simple solution may be sufficient. If an application wants to print text that requires Bidi overrides, developers can generate these characters using escape sequences rather than embedding potentially dangerous characters in the source code.”
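As an illustration of the escape-sequence approach, the sketch below builds a string containing a right-to-left isolate without placing any invisible character in the source file itself. The helper name `isolate_rtl` and the sample text are assumptions made for this example.

```python
# Bidi controls are spelled as escapes, so the source file stays plain ASCII
# and a scanner or reviewer sees exactly what the compiler sees.
RLI = "\u2067"  # RIGHT-TO-LEFT ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE


def isolate_rtl(text):
    """Wrap text in a right-to-left isolate for Bidi-aware renderers."""
    return f"{RLI}{text}{PDI}"


# The Hebrew word "shalom", also written with escapes.
message = "user: " + isolate_rtl("\u05e9\u05dc\u05d5\u05dd")

# Re-escaping the runtime string gives an ASCII-only form that is safe to log.
escaped = message.encode("unicode_escape").decode("ascii")
print(escaped)
```

The control characters still exist in the running program's output, where they are needed; they are simply no longer hidden in the text that reviewers read.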
On July 25, the researchers informed several maintainers of products affected by the Trojan Source attack method and set a 99-day embargoed disclosure period.
The CERT Coordination Center also received a vulnerability report and assisted with coordinated disclosure by providing a shared communication platform for vendors implementing defenses.
As a result of their report, the researchers received an average of $2,246 in bug bounties from five of the recipients, although 11 of them had a bug bounty program.
At present, several compilers are unable to stop the Trojan Source attack method, although nearly two dozen software vendors are aware of the threat.
Since many maintainers have yet to implement a fix, the two researchers recommend that governments and businesses identify their vendors and push them to adopt the necessary defenses.
“The fact that the Trojan Source vulnerability affects nearly all computer languages makes it a rare opportunity for a system-wide and ecologically valid comparison of responses between platforms and between vendors.”
The researchers added that three companies that maintain code repositories are currently deploying defenses against Trojan Source.
In a repository on GitHub, they provide proof of concept (PoC) scripts that demonstrate how threatening the Trojan Source attack can be.