-
Notifications
You must be signed in to change notification settings - Fork 79
Fix NumberFormatException in NPM Package Json Parse #1435
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
@@ -96,12 +96,20 @@ private String extractLowestVersion(String value) { | |||
// Remove npm version selection characters that the KB won't match on | |||
.map(part -> part.replaceAll("[>=<~^]", "")) | |||
// Filter out parts that don't match the version pattern | |||
.filter(part -> part.matches("\\d+\\.\\d+\\.\\d+|\\d+\\.\\d+|\\d+")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly trying to understand what is going on before reviewing the changes. My understanding was that if ^17.0.0-1551262265873 came into this lambda that the replaceAll call would remove the ^ and then 17.0.0-1551262265873 would fail the part.matches filter because of the - and never get to the min call.
I must be missing something because I believe multiple devs have confirmed this passes the filter and blows up in the semVerComparator check but I guess I'm wondering a) how is it getting by the filter and b) can we perhaps fix the filter's regex instead of imposing the version character limit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the thoughtful query, David. Here’s what’s actually happening:
-
Input:
^17.0.0-1551262265873
-
The string is first split using
value.split("\\s+|\\|\\||-")
, so it's broken into:"^17.0.0"
"1551262265873"
-
Then
.replaceAll("[>=<~^]", "")
is applied to each part:"^17.0.0"
becomes"17.0.0"
"1551262265873"
stays the same
-
Both parts are passed to the version filter. The earlier regex (
\\d+\\.\\d+\\.\\d+|\\d+\\.\\d+|\\d+
) accepts"1551262265873"
because it matches\\d+
. -
That long numeric value is passed to
SemVerComparator
, which attempts to parse it usingInteger.parseInt
— resulting in aNumberFormatException
.
To avoid that, I added the isProbableVersion()
helper to guard against purely numeric values that are too long (likely timestamps or hashes rather than real versions).
Lines 108 to 114 in 9b03c9d
private boolean isProbableVersion(String part) { | |
// If purely numeric and very long, it's likely a timestamp/hash | |
if (part.matches("\\d{6,}")) return false; | |
// Accept X, X.Y, X.Y.Z format | |
return part.matches("\\d+(\\.\\d+){0,2}"); | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good discussion.
I wonder if we should try to differentiate between situations when -
is used as a range operator vs when it's part of the version. It seems (but worth double-checking) like if it's a range operator then there must be spaces around it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the very helpful explanation, I had ignored the split call.
I looked at the semver rules and a version must be a non-negative integer. That's not super helpful as an integer is kind of defined by the system and language.
I wonder if instead of limiting to 5 or 6 characters we could try to assign the variable part to an integer and then catch the NumberFormatException and return false if things don't go well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's an interesting idea. I'll try this out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good discussion.
I wonder if we should try to differentiate between situations when
-
is used as a range operator vs when it's part of the version. It seems (but worth double-checking) like if it's a range operator then there must be spaces around it.
The current implementation doesn't consider hyphen (-
) as range operator. Rather it just splits the input version based on space or hyphen. And, if the parts are numeric, then it accepts the minimum value among all, using semantic version comparator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved but left another comment upon second thought.
assertEquals("1.2.0", extractor.extractLowestVersion("~1.2.x")); | ||
assertEquals("1.2.3", extractor.extractLowestVersion("1.2.3-beta.2")); | ||
assertEquals("5.0.0", extractor.extractLowestVersion( "5.0.0-alpha+build.1")); | ||
assertEquals("3.0.0", extractor.extractLowestVersion("3.0.0-abc123")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A test like assertEquals("3.0.0", extractor.extractLowestVersion("3.0.0-1"));
will fail here because we're treating the -
as an operator instead of as part of the version.
This is why I was suggesting treating -
as an operator only when it's surrounded by spaces.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated the extractLowestVersion
method to treat range separators (" - "
) and pre-release hyphens ("-"
) differently by using this split pattern:
String[] parts = value.split("\\s+|\\|\\||\\s-\\s");
This ensures that we only split on version ranges like "1.0.0 - 2.0.0"
(as used in npm), space, and logical ORs (||
), but preserve valid pre-release versions like "3.0.0-1"
.
Additionally, the line:
.map(part -> part.replaceAll("[-+].*", ""))
is used to strip off pre-release and build metadata.
So for the test:
assertEquals("3.0.0", extractor.extractLowestVersion("3.0.0-1"));
this still passes because:
- The input is a single version (not a range), so it’s not split.
- The pre-release
-1
is stripped, resulting in3.0.0
.
References:
Few additional tests were added to validate the changes.
… into dev/zahidblackduck/IDETECT-4721
String[] parts = value.split("\\s+|\\|\\||-"); | ||
String lowestVersion = Arrays.stream(parts) | ||
// Split the value into parts by spaces, "||", or " - ". | ||
String[] parts = value.split("\\s+|\\|\\||\\s-\\s"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the \\s-\\s
part never actually gets matched now because \\s+
will get matched first.
This however works out in the end because on line 100 standalone -
characters get transformed to empty string and at line 102 the empty string gets filtered out.
JIRA Ticket
IDETECT-4721
Description
Issue
The NPM Package Json Parse failed with a
NumberFormatException
when parsing version strings like^17.0.0-1551262265873
. The numeric suffix was mistakenly treated as a valid version.Our version parsing logic splits the input string using spaces, ||, or -. This mistakenly accepted long numeric segments (e.g. 1551262265873), which are likely timestamps or hashes, as valid versions due to the regex match. And thus causing a
NumberFormatException
in theSemVerComparator#compare(..)
method.Fix
Added a helper method
isProbableVersion
to validate version tokens:X
,X.Y
,X.Y.Z
.