Parser vulnerable to Trojan Source attack #11392

Open
@straight-shoota

Description

A recently released paper titled Trojan Source: Invisible Vulnerabilities demonstrates an attack against source code. It uses Unicode bi-directional overrides to disguise the meaning of code to a human reader. This can lead to seemingly harmless code introducing malicious behaviour.

Crystal demonstration

The following code demonstrates a stretched-string attack in Crystal:

access_level = "user"
if access_level != "user‮ ⁦# Check if admin⁩ ⁦"
  puts "You are an admin!"
end

https://carc.in/#/r/c6ka

The following code demonstrates a commenting-out attack in Crystal:

access_level = "user"
if access_level != "none‮⁦" # Check if admin⁩⁦" && access_level != "user
  puts "You are an admin!"
end

https://carc.in/#/r/c6kh

Both look mostly unsuspicious. You wouldn't expect either to print anything. But both programs actually print You are an admin! despite access_level = "user".

The second lines of each program's source code contain a number of Unicode control characters for bi-directional overrides. This is what the parser reads:

# stretched-string attack
if access_level != "user\u202E \u2066# Check if admin\u2069 \u2066"
# commenting-out attack
if access_level != "none\u202E\u2066"# Check if admin\u2069\u2066" && access_level != "user
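To reproduce this kind of view for an arbitrary line, the hidden characters can be replaced by visible escapes. The following is a minimal Python sketch, not part of the Crystal toolchain; the name `reveal_bidi` and the exact character set (the nine formatting characters discussed in the Trojan Source paper) are assumptions made for illustration.

```python
# Bidirectional formatting characters (assumed set, taken from the
# Trojan Source paper): LRE, RLE, PDF, LRO, RLO, LRI, RLI, FSI, PDI.
BIDI_CONTROLS = {
    "\u202A", "\u202B", "\u202C", "\u202D", "\u202E",
    "\u2066", "\u2067", "\u2068", "\u2069",
}


def reveal_bidi(text: str) -> str:
    """Return text with every bidi control replaced by a visible \\uXXXX escape."""
    return "".join(
        f"\\u{ord(ch):04X}" if ch in BIDI_CONTROLS else ch
        for ch in text
    )


# The second line of the stretched-string example, written with escapes
# so that this snippet itself contains no raw control characters.
line = 'if access_level != "user\u202E \u2066# Check if admin\u2069 \u2066"'
print(reveal_bidi(line))
```

Running this on the stretched-string line prints the same escaped form as shown above, which makes the extent of the string literal obvious.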

The only indicator that something might be off is the syntax highlighting, which should be pretty resistant to being fooled.
GitHub has already introduced a feature that shows a warning when bi-directional overrides are detected in a file: https://github.blog/changelog/2021-10-31-warning-about-bidirectional-unicode-text/

Mitigation

This vulnerability can easily be defended against by disallowing bi-directional control characters in source code.
In many locations, such control characters are already a syntax error. But they are currently valid in comments and string literals, which are the typical attack locations in most languages.

However, Crystal's parser currently even accepts Unicode control characters in identifiers, including bi-directional override characters. Restricting the allowed character set in general is a separate problem, tracked in #11216.

I propose to change the language specification and lexer rules such that valid Crystal source code must not contain any bi-directional control characters, regardless of location.
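Until the lexer enforces such a rule, the same check can be approximated outside the compiler, for example in a CI step. Below is a rough Python sketch of a location-independent scan; `find_bidi_controls`, `check_source` and `BidiControlError` are hypothetical names, and the list of rejected code points is an assumption based on the Trojan Source paper rather than on any Crystal specification.

```python
# Assumed set of bidirectional formatting characters and their Unicode names.
BIDI_CONTROL_NAMES = {
    "\u202A": "LEFT-TO-RIGHT EMBEDDING",
    "\u202B": "RIGHT-TO-LEFT EMBEDDING",
    "\u202C": "POP DIRECTIONAL FORMATTING",
    "\u202D": "LEFT-TO-RIGHT OVERRIDE",
    "\u202E": "RIGHT-TO-LEFT OVERRIDE",
    "\u2066": "LEFT-TO-RIGHT ISOLATE",
    "\u2067": "RIGHT-TO-LEFT ISOLATE",
    "\u2068": "FIRST STRONG ISOLATE",
    "\u2069": "POP DIRECTIONAL ISOLATE",
}


class BidiControlError(ValueError):
    """Raised when source text contains a bidirectional control character."""


def find_bidi_controls(source: str) -> list[tuple[int, int, str]]:
    """Return (line, column, name) for every bidi control, both 1-based."""
    hits = []
    for line_no, line in enumerate(source.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in BIDI_CONTROL_NAMES:
                hits.append((line_no, col, BIDI_CONTROL_NAMES[ch]))
    return hits


def check_source(source: str) -> None:
    """Reject the source if any bidi control appears, regardless of location."""
    hits = find_bidi_controls(source)
    if hits:
        line_no, col, name = hits[0]
        raise BidiControlError(
            f"{line_no}:{col}: bidirectional control character "
            f"{name} is not allowed in source code"
        )


# The stretched-string example from above, written with escapes.
source = (
    'access_level = "user"\n'
    'if access_level != "user\u202E \u2066# Check if admin\u2069 \u2066"\n'
    '  puts "You are an admin!"\n'
    "end\n"
)
for hit in find_bidi_controls(source):
    print(hit)
```

The scan deliberately ignores lexical context: it does not need to know whether a character sits in a comment, a string literal or an identifier, which is what makes the blanket rule cheap to implement.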

A more fine-grained approach would be possible as well, but it should be unnecessary, considering there are few to no legitimate use cases for bidirectional control characters in Crystal source code (apart from the specific exceptions mentioned in the following section).

Workarounds

Bi-directional override characters are legitimate contents for string literals. Instead of encoding them directly in the source code, the proper workaround is to write them as escape sequences (for example \u202E).

Bi-directional overrides can technically be legitimate in comments if you want to mix languages with different directions in the comment text. That does not seem like a very important use case, though.

Still, as a further enhancement, bi-directional overrides could potentially be allowed in comments and possibly other locations such as string literals as long as they are fully enclosed inside the comment or literal.
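What "fully enclosed" could mean in practice is sketched below in Python: every embedding, override or isolate opened inside the comment or literal must be closed by its matching pop character before the comment or literal ends. The function name `bidi_balanced` is hypothetical, and the strict-nesting rule is an assumption that is simpler and stricter than the Unicode Bidirectional Algorithm, in which a pop isolate also terminates embeddings opened inside the isolate.

```python
OPEN_EMBEDDING = {"\u202A", "\u202B", "\u202D", "\u202E"}  # LRE, RLE, LRO, RLO
OPEN_ISOLATE = {"\u2066", "\u2067", "\u2068"}              # LRI, RLI, FSI
POP_EMBEDDING = "\u202C"                                   # PDF
POP_ISOLATE = "\u2069"                                     # PDI


def bidi_balanced(text: str) -> bool:
    """Return True if every bidi opener in text is closed, with strict nesting.

    text is assumed to be the content of a single comment or string literal.
    """
    expected = []  # stack of the pop characters we still expect to see
    for ch in text:
        if ch in OPEN_EMBEDDING:
            expected.append(POP_EMBEDDING)
        elif ch in OPEN_ISOLATE:
            expected.append(POP_ISOLATE)
        elif ch in (POP_EMBEDDING, POP_ISOLATE):
            # A pop with nothing open, or of the wrong kind, is unbalanced.
            if not expected or expected.pop() != ch:
                return False
    return not expected


# Content of the string literal from the stretched-string example:
# the override and the final isolate are never closed.
print(bidi_balanced("user\u202E \u2066# Check if admin\u2069 \u2066"))

# A right-to-left isolate that is opened and closed inside the text.
print(bidi_balanced("see \u2067\u05E9\u05DC\u05D5\u05DD\u2069 here"))
```

Under this rule the string literal from the stretched-string example is rejected, while a comment that opens and closes an isolate around a right-to-left phrase is accepted.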

The general vulnerability is tracked as CVE-2021-42574.
