Skip to content

Commit e1b966e

Browse files
committed
Java: Add query to detect special characters in string literals
1 parent 12cda86 commit e1b966e

File tree

7 files changed

+355
-0
lines changed

7 files changed

+355
-0
lines changed

java/ql/integration-tests/java/query-suite/java-code-quality-extended.qls.expected

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -77,6 +77,7 @@ ql/java/ql/src/Violations of Best Practice/Naming Conventions/ConfusingMethodNam
7777
ql/java/ql/src/Violations of Best Practice/Naming Conventions/ConfusingOverloading.ql
7878
ql/java/ql/src/Violations of Best Practice/Naming Conventions/LocalShadowsFieldConfusing.ql
7979
ql/java/ql/src/Violations of Best Practice/Naming Conventions/SameNameAsSuper.ql
80+
ql/java/ql/src/Violations of Best Practice/SpecialCharactersInLiterals/NonExplicitControlAndWhitespaceCharsInLiterals.ql
8081
ql/java/ql/src/Violations of Best Practice/Undesirable Calls/CallsToStringToString.ql
8182
ql/java/ql/src/Violations of Best Practice/Undesirable Calls/DefaultToString.ql
8283
ql/java/ql/src/Violations of Best Practice/Undesirable Calls/DoNotCallFinalize.ql
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,111 @@
1+
## Overview
2+
3+
This query detects non-explicit control and whitespace characters in java literals.
4+
Such characters are often introduced accidentally and can be invisible or hard to recognize, leading to bugs when the actual contents of the string contain control characters.
5+
6+
## Recommendation
7+
8+
To avoid issues, use the encoded versions of control characters (e.g., ASCII `\n`, `\t`, or Unicode `U+000D`, `U+0009`).
9+
This makes the literals (e.g., string literals) more readable, and also helps to make the surrounding code less error-prone and more maintainable.
10+
11+
## Example
12+
13+
The following examples illustrate `NON_COMPLIANT` and `COMPLIANT` code:
14+
15+
`NON_COMPLIANT`:
16+
17+
```java
18+
char tabulationChar = ' '; // NON_COMPLIANT
19+
String tabulationCharInsideString = "A B"; // NON_COMPLIANT
20+
String fooZeroWidthSpacebar = "foo​bar"; // NON_COMPLIANT
21+
```
22+
23+
`COMPLIANT`:
24+
25+
```java
26+
char escapedTabulationChar = '\t';
27+
String escapedTabulationCharInsideString = "A\tB"; // COMPLIANT
28+
String fooUnicodeSpacebar = "foo\u0020bar"; // COMPLIANT
29+
String foo2Spacebar = "foo bar"; // COMPLIANT
30+
String foo3Spacebar = "foo bar"; // COMPLIANT
31+
```
32+
33+
## Implementation Notes
34+
35+
This query detects java literals that contain reserved control characters and/or non-printable whitespace characters, such as:
36+
37+
- Decimal and hexidecimal representations of ASCII control characters (code points 0-8, 11, 14-31, and 127).
38+
- Invisible characters (e.g., zero-width space, zero-width joiner).
39+
- Unicode C0 control codes, plus the delete character (U+007F), such as:
40+
41+
| Escaped Unicode | ASCII Decimal | Description |
42+
| --------------- | ------------- | ------------------------- |
43+
| `\u0000` | 0 | null character |
44+
| `\u0001` | 1 | start of heading |
45+
| `\u0002` | 2 | start of text |
46+
| `\u0003` | 3 | end of text |
47+
| `\u0004` | 4 | end of transmission |
48+
| `\u0005` | 5 | enquiry |
49+
| `\u0006` | 6 | acknowledge |
50+
| `\u0007` | 7 | bell |
51+
| `\u0008` | 8 | backspace |
52+
| `\u000B` | 11 | vertical tab |
53+
| `\u000E` | 14 | shift out |
54+
| `\u000F` | 15 | shift in |
55+
| `\u0010` | 16 | data link escape |
56+
| `\u0011` | 17 | device control 1 |
57+
| `\u0012` | 18 | device control 2 |
58+
| `\u0013` | 19 | device control 3 |
59+
| `\u0014` | 20 | device control 4 |
60+
| `\u0015` | 21 | negative acknowledge |
61+
| `\u0016` | 22 | synchronous idle |
62+
| `\u0017` | 23 | end of transmission block |
63+
| `\u0018` | 24 | cancel |
64+
| `\u0019` | 25 | end of medium |
65+
| `\u001A` | 26 | substitute |
66+
| `\u001B` | 27 | escape |
67+
| `\u001C` | 28 | file separator |
68+
| `\u001D` | 29 | group separator |
69+
| `\u001E` | 30 | record separator |
70+
| `\u001F` | 31 | unit separator |
71+
| `\u007F` | 127 | delete |
72+
73+
- Zero-width Unicode characters (e.g., zero-width space, zero-width joiner), such as:
74+
75+
| Escaped Unicode | Description |
76+
| --------------- | ------------------------- |
77+
| `\u200B` | zero-width space |
78+
| `\u200C` | zero-width non-joiner |
79+
| `\u200D` | zero-width joiner |
80+
| `\u2028` | line separator |
81+
| `\u2029` | paragraph separator |
82+
| `\u2060` | word joiner |
83+
| `\uFEFF` | zero-width no-break space |
84+
85+
The following list outlines the _**explicit exclusions from query scope**_:
86+
87+
- any number of simple space characters (`U+0020`, ASCII 32).
88+
- an escape character sequence (e.g., `\t`), or the Unicode equivalent (e.g., `\u0009`), for printable whitespace characters:
89+
90+
| Character Sequence | Escaped Unicode | ASCII Decimal | Description |
91+
| ------------------ | --------------- | ------------- | --------------- |
92+
| `\t` | \u0009 | 9 | horizontal tab |
93+
| `\n` | \u000A | 10 | line feed |
94+
| `\f` | \u000C | 12 | form feed |
95+
| `\r` | \u000D | 13 | carriage return |
96+
| | \u0020 | 32 | space |
97+
98+
- character literals (i.e. single quotes) containing control characters.
99+
- literals defined within "likely" test methods, such as:
100+
- JUnit test methods
101+
- methods annotated with `@Test`
102+
- methods of a class annotated with `@Test`
103+
- methods with names containing "test"
104+
105+
## References
106+
107+
- [Unicode Control Characters](https://www.unicode.org/charts/PDF/U0000.pdf).
108+
- [Unicode C0 control codes](https://en.wikipedia.org/wiki/C0_and_C1_control_codes).
109+
- [Unicode characters with property "WSpace=yes" or "White_Space=yes"](https://en.wikipedia.org/wiki/Unicode_character_property#Whitespace).
110+
- [Java String Literals](https://docs.oracle.com/javase/tutorial/java/data/characters.html).
111+
- [Java Class Charset](https://docs.oracle.com/javase/8/docs/api///?java/nio/charset/Charset.html).
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,50 @@
1+
/**
2+
* @id java/non-explicit-control-and-whitespace-chars-in-literals
3+
* @name Non-explicit control and whitespace characters
4+
* @description Non-explicit control and whitespace characters in literals make code more difficult
5+
* to read and may lead to incorrect program behavior.
6+
* @kind problem
7+
* @precision medium
8+
* @problem.severity warning
9+
* @tags quality
10+
* correctness
11+
* maintainability
12+
* readability
13+
*/
14+
15+
import java
16+
17+
/**
18+
* A `Literal` that has a Unicode control character within its
19+
* literal value (as returned by `getLiteral()` member predicate).
20+
*/
21+
class ReservedUnicodeInLiteral extends Literal {
22+
private int indexStart;
23+
24+
ReservedUnicodeInLiteral() {
25+
not this instanceof CharacterLiteral and
26+
exists(int codePoint |
27+
this.getLiteral().codePointAt(indexStart) = codePoint and
28+
(
29+
// Unicode C0 control characters
30+
codePoint < 32 and not codePoint in [9, 10, 12, 13]
31+
or
32+
codePoint = 127 // delete control character
33+
or
34+
codePoint = 8203 // zero-width space
35+
)
36+
)
37+
}
38+
39+
/** Gets the starting index of the Unicode control sequence. */
40+
int getIndexStart() { result = indexStart }
41+
}
42+
43+
from ReservedUnicodeInLiteral literal, int charIndex, int codePoint
44+
where
45+
literal.getIndexStart() = charIndex and
46+
literal.getLiteral().codePointAt(charIndex) = codePoint and
47+
not literal.getEnclosingCallable() instanceof LikelyTestMethod
48+
select literal,
49+
"Literal value contains control or non-printable whitespace character(s) starting with Unicode code point "
50+
+ codePoint + " at index " + charIndex + "."
Lines changed: 184 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,184 @@
1+
import java.util.List;
2+
import java.util.ArrayList;
3+
4+
public class CharTest {
5+
6+
public static void main(String[] args) {
7+
CharTest charTest = new CharTest();
8+
NonCompliantStringLiterals nonCompliant = charTest.new NonCompliantStringLiterals();
9+
CompliantStringLiterals compliant = charTest.new CompliantStringLiterals();
10+
CompliantCharLiterals compliantChar = charTest.new CompliantCharLiterals();
11+
12+
List<String> nonCompliantStrings = nonCompliant.getNonCompliantStrings();
13+
List<String> compliantStrings = compliant.getCompliantStrings();
14+
List<Character> compliantChars = compliantChar.getCompliantChars();
15+
16+
System.out.println("");
17+
System.out.println("Non-compliant strings:");
18+
for (String s : nonCompliantStrings) {
19+
System.out.println(s);
20+
System.out.println("");
21+
}
22+
23+
System.out.println("");
24+
System.out.println("Compliant strings:");
25+
for (String s : compliantStrings) {
26+
System.out.println(s);
27+
System.out.println("");
28+
}
29+
System.out.println("");
30+
31+
System.out.println("");
32+
System.out.println("Compliant character literals:");
33+
System.out.println("");
34+
for (Character c : compliantChars) {
35+
System.out.println("\\u" + String.format("%04X", (int) c));
36+
}
37+
System.out.println("");
38+
}
39+
40+
class CompliantCharLiterals {
41+
private List<Character> compliantChars;
42+
43+
public CompliantCharLiterals() {
44+
compliantChars = new ArrayList<>();
45+
compliantChars.add('A'); // COMPLIANT
46+
compliantChars.add('a'); // COMPLIANT
47+
compliantChars.add('\b'); // COMPLIANT
48+
compliantChars.add('\t'); // COMPLIANT
49+
compliantChars.add('\n'); // COMPLIANT
50+
compliantChars.add('\f'); // COMPLIANT
51+
compliantChars.add('\r'); // COMPLIANT
52+
compliantChars.add('\u0000'); // COMPLIANT
53+
compliantChars.add('\u0007'); // COMPLIANT
54+
compliantChars.add('\u001B'); // COMPLIANT
55+
compliantChars.add(' '); // COMPLIANT
56+
compliantChars.add('\u0020'); // COMPLIANT
57+
compliantChars.add('\u200B'); // COMPLIANT
58+
compliantChars.add('\u200C'); // COMPLIANT
59+
compliantChars.add('\u200D'); // COMPLIANT
60+
compliantChars.add('\u2028'); // COMPLIANT
61+
compliantChars.add('\u2029'); // COMPLIANT
62+
compliantChars.add('\u2060'); // COMPLIANT
63+
compliantChars.add('\uFEFF'); // COMPLIANT
64+
}
65+
66+
public List<Character> getCompliantChars() {
67+
return compliantChars;
68+
}
69+
}
70+
71+
class CompliantStringLiterals {
72+
private List<String> compliantStrings;
73+
74+
public CompliantStringLiterals() {
75+
compliantStrings = new ArrayList<>();
76+
compliantStrings.add(""); // COMPLIANT
77+
compliantStrings.add("X__Y"); // COMPLIANT
78+
compliantStrings.add("X_ _Y"); // COMPLIANT
79+
compliantStrings.add("X_\u0020_Y"); // COMPLIANT
80+
compliantStrings.add("X_ _Y"); // COMPLIANT
81+
compliantStrings.add("X_\u0020\u0020_Y"); // COMPLIANT
82+
compliantStrings.add("X_ _Y"); // COMPLIANT
83+
compliantStrings.add("X_ _Y"); // COMPLIANT
84+
compliantStrings.add("X_ _Y"); // COMPLIANT
85+
compliantStrings.add("X_ _Y"); // COMPLIANT
86+
compliantStrings.add("X_\b_Y"); // COMPLIANT
87+
compliantStrings.add("X_\u0000_Y"); // COMPLIANT
88+
compliantStrings.add("X_\u0001_Y"); // COMPLIANT
89+
compliantStrings.add("X_\u0002_Y"); // COMPLIANT
90+
compliantStrings.add("X_\u0003_Y"); // COMPLIANT
91+
compliantStrings.add("X_\u0004_Y"); // COMPLIANT
92+
compliantStrings.add("X_\u0005_Y"); // COMPLIANT
93+
compliantStrings.add("X_\u0006_Y"); // COMPLIANT
94+
compliantStrings.add("X_\u0007_Y"); // COMPLIANT
95+
compliantStrings.add("X_\u0008_Y"); // COMPLIANT
96+
compliantStrings.add("X_\u0009_Y"); // COMPLIANT
97+
compliantStrings.add("X_\u0010_Y"); // COMPLIANT
98+
compliantStrings.add("X_\u0011_Y"); // COMPLIANT
99+
compliantStrings.add("X_\u0012_Y"); // COMPLIANT
100+
compliantStrings.add("X_\u0013_Y"); // COMPLIANT
101+
compliantStrings.add("X_\u0014_Y"); // COMPLIANT
102+
compliantStrings.add("X_\u0015_Y"); // COMPLIANT
103+
compliantStrings.add("X_\u0016_Y"); // COMPLIANT
104+
compliantStrings.add("X_\u0017_Y"); // COMPLIANT
105+
compliantStrings.add("X_\u0018_Y"); // COMPLIANT
106+
compliantStrings.add("X_\u0019_Y"); // COMPLIANT
107+
compliantStrings.add("X_\u001A_Y"); // COMPLIANT
108+
compliantStrings.add("X_\u001B_Y"); // COMPLIANT
109+
compliantStrings.add("X_\u001C_Y"); // COMPLIANT
110+
compliantStrings.add("X_\u001D_Y"); // COMPLIANT
111+
compliantStrings.add("X_\u001E_Y"); // COMPLIANT
112+
compliantStrings.add("X_\u001F_Y"); // COMPLIANT
113+
compliantStrings.add("X_\u007F_Y"); // COMPLIANT
114+
compliantStrings.add("X_\u200B_Y"); // COMPLIANT
115+
compliantStrings.add("X_\u200C_Y"); // COMPLIANT
116+
compliantStrings.add("X_\u200D_Y"); // COMPLIANT
117+
compliantStrings.add("X_\u2028_Y"); // COMPLIANT
118+
compliantStrings.add("X_\u2029_Y"); // COMPLIANT
119+
compliantStrings.add("X_\u2060_Y"); // COMPLIANT
120+
compliantStrings.add("X_\uFEFF_Y"); // COMPLIANT
121+
compliantStrings.add("X_\uFEFF_Y_\u0020_Z"); // COMPLIANT
122+
compliantStrings.add("X_\uFEFF_Y_\uFEFF_Z"); // COMPLIANT
123+
compliantStrings.add("X_\u0020_Y_\uFEFF_Z"); // COMPLIANT
124+
compliantStrings.add("X_\t_Y"); // COMPLIANT
125+
compliantStrings.add("X_\t\t_Y"); // COMPLIANT
126+
compliantStrings.add("X_\\b_Y"); // COMPLIANT
127+
compliantStrings.add("X_\f_Y"); // COMPLIANT
128+
compliantStrings.add("X_\\f_Y"); // COMPLIANT
129+
compliantStrings.add("X_\n_Y"); // COMPLIANT
130+
compliantStrings.add("X_\n\t_Y"); // COMPLIANT
131+
compliantStrings.add("X_\\n_Y"); // COMPLIANT
132+
compliantStrings.add("X_\r_Y"); // COMPLIANT
133+
compliantStrings.add("X_\\r_Y"); // COMPLIANT
134+
compliantStrings.add("X_\t_Y"); // COMPLIANT
135+
compliantStrings.add("X_\\t_Y"); // COMPLIANT
136+
compliantStrings.add("X_\\u0000_Y"); // COMPLIANT
137+
compliantStrings.add("X_\\u0007_Y"); // COMPLIANT
138+
compliantStrings.add("X_\\u001B_Y"); // COMPLIANT
139+
compliantStrings.add("X_\\u200B_Y"); // COMPLIANT
140+
compliantStrings.add("X_\\u200C_Y"); // COMPLIANT
141+
compliantStrings.add("X_\\u200D_Y"); // COMPLIANT
142+
compliantStrings.add("X_\\u2028_Y"); // COMPLIANT
143+
compliantStrings.add("X_\\u2029_Y"); // COMPLIANT
144+
compliantStrings.add("X_\\u2060_Y"); // COMPLIANT
145+
compliantStrings.add("X_\\uFEFF_Y"); // COMPLIANT
146+
compliantStrings.add("lorem ipsum dolor "+"sit amet"); // COMPLIANT
147+
compliantStrings.add("lorem ipsum dolor " + "sit amet"); // COMPLIANT
148+
compliantStrings.add("lorem ipsum dolor sit amet, consectetur adipiscing elit, " + // COMPLIANT
149+
"sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.");
150+
compliantStrings.add("lorem ipsum dolor sit amet, consectetur adipiscing elit, " + // COMPLIANT
151+
"sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad "
152+
+ "minim veniam, quis nostrud exercitation ullamco "+"laboris nisi ut aliquip ex " +
153+
"ea commodo consequat.");
154+
compliantStrings.add("""
155+
lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
156+
Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
157+
"""); // COMPLIANT
158+
}
159+
160+
public List<String> getCompliantStrings() {
161+
return compliantStrings;
162+
}
163+
}
164+
165+
class NonCompliantStringLiterals {
166+
private List<String> nonCompliantStrings;
167+
168+
public NonCompliantStringLiterals() {
169+
nonCompliantStrings = new ArrayList<>();
170+
nonCompliantStrings.add("X_​_Y"); // NON_COMPLIANT
171+
nonCompliantStrings.add("X_​_Y_​_Z"); // NON_COMPLIANT
172+
nonCompliantStrings.add("lorem​ipsum dolor sit amet,​consectetur adipiscing elit, " + // NON_COMPLIANT
173+
"sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.");
174+
nonCompliantStrings.add("""
175+
lorem​ipsum dolor sit amet,​consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
176+
Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
177+
"""); // NON_COMPLIANT
178+
}
179+
180+
public List<String> getNonCompliantStrings() {
181+
return nonCompliantStrings;
182+
}
183+
}
184+
}
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,7 @@
1+
| CharTest.java:170:37:170:43 | "X_\u200b_Y" | Literal value contains control or non-printable whitespace character(s) starting with Unicode code point 8203 at index 3. |
2+
| CharTest.java:171:37:171:47 | "X_\u200b_Y_\u200b_Z" | Literal value contains control or non-printable whitespace character(s) starting with Unicode code point 8203 at index 3. |
3+
| CharTest.java:171:37:171:47 | "X_\u200b_Y_\u200b_Z" | Literal value contains control or non-printable whitespace character(s) starting with Unicode code point 8203 at index 7. |
4+
| CharTest.java:172:37:173:80 | "lorem\u200bipsum dolor sit amet,\u200bconsectetur adipiscing elit, " + // NON_COMPLIANT\n "sed do eiusmod tempor incididunt ut labore et dolore magna aliqua." | Literal value contains control or non-printable whitespace character(s) starting with Unicode code point 8203 at index 6. |
5+
| CharTest.java:172:37:173:80 | "lorem\u200bipsum dolor sit amet,\u200bconsectetur adipiscing elit, " + // NON_COMPLIANT\n "sed do eiusmod tempor incididunt ut labore et dolore magna aliqua." | Literal value contains control or non-printable whitespace character(s) starting with Unicode code point 8203 at index 28. |
6+
| CharTest.java:174:37:177:15 | """\n lorem\u200bipsum dolor sit amet,\u200bconsectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.\n Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.\n """ | Literal value contains control or non-printable whitespace character(s) starting with Unicode code point 8203 at index 25. |
7+
| CharTest.java:174:37:177:15 | """\n lorem\u200bipsum dolor sit amet,\u200bconsectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.\n Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.\n """ | Literal value contains control or non-printable whitespace character(s) starting with Unicode code point 8203 at index 47. |
Original file line numberDiff line numberDiff line change
@@ -0,0 +1 @@
1+
Violations of Best Practice/SpecialCharactersInLiterals/NonExplicitControlAndWhitespaceCharsInLiterals.ql
Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1 @@
1+
semmle-extractor-options: --javac-args -source 15 -target 15

0 commit comments

Comments
 (0)