Fix NumberFormatException in NPM Package Json Parse #1435

zahidblackduck · 2025-05-22T06:07:22Z

JIRA Ticket

IDETECT-4721

Description

Issue

Parsing an NPM package.json failed with a NumberFormatException for version strings such as ^17.0.0-1551262265873. The numeric suffix after the hyphen was mistakenly treated as a valid version.

Our version parsing logic splits the input string on spaces, ||, or -. Because the filter regex also matches a bare run of digits, long numeric segments (e.g. 1551262265873), which are likely timestamps or hashes, were accepted as valid versions. This caused a NumberFormatException in the SemVerComparator#compare(..) method.

Fix

Added a helper method isProbableVersion to validate version tokens:

Accepts standard semver formats: X, X.Y, X.Y.Z.
Rejects purely numeric strings longer than 5 digits based on the intuition that valid version components rarely exceed 5 digits, whereas timestamps or hashes typically do.

dterrybd · 2025-05-22T15:09:22Z

...a/com/blackduck/integration/detectable/detectables/npm/packagejson/PackageJsonExtractor.java

@@ -96,12 +96,20 @@ private String extractLowestVersion(String value) {
            // Remove npm version selection characters that the KB won't match on
            .map(part -> part.replaceAll("[>=<~^]", ""))
            // Filter out parts that don't match the version pattern
-            .filter(part -> part.matches("\\d+\\.\\d+\\.\\d+|\\d+\\.\\d+|\\d+"))


Mostly trying to understand what is going on before reviewing the changes. My understanding was that if ^17.0.0-1551262265873 came into this lambda that the replaceAll call would remove the ^ and then 17.0.0-1551262265873 would fail the part.matches filter because of the - and never get to the min call.

I must be missing something because I believe multiple devs have confirmed this passes the filter and blows up in the semVerComparator check but I guess I'm wondering a) how is it getting by the filter and b) can we perhaps fix the filter's regex instead of imposing the version character limit?

Thanks for the thoughtful query, David. Here’s what’s actually happening:

Input: ^17.0.0-1551262265873

The string is first split using value.split("\\s+|\\|\\||-"), so it's broken into:

"^17.0.0"

"1551262265873"

Then .replaceAll("[>=<~^]", "") is applied to each part:

"^17.0.0" becomes "17.0.0"

"1551262265873" stays the same

Both parts are passed to the version filter. The earlier regex (\\d+\\.\\d+\\.\\d+|\\d+\\.\\d+|\\d+) accepts "1551262265873" because it matches \\d+.

That long numeric value is passed to SemVerComparator, which attempts to parse it using Integer.parseInt. Since 1551262265873 exceeds Integer.MAX_VALUE (2147483647), the call throws a NumberFormatException.
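The steps above can be reproduced in isolation. This is a minimal sketch, not the actual PackageJsonExtractor code: the class and method names (SplitWalkthrough, survivingTokens, failsIntParse) are hypothetical, and only the split pattern, the replaceAll call, and the filter regex are taken from this thread.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-alone class; it only mirrors the pipeline steps quoted in this thread.
class SplitWalkthrough {

    // Returns the tokens that survive the original split, selector cleanup, and filter.
    static List<String> survivingTokens(String value) {
        return Arrays.stream(value.split("\\s+|\\|\\||-"))
            // Remove npm version selection characters
            .map(part -> part.replaceAll("[>=<~^]", ""))
            // Original filter: a bare run of digits also matches via the last alternative
            .filter(part -> part.matches("\\d+\\.\\d+\\.\\d+|\\d+\\.\\d+|\\d+"))
            .collect(Collectors.toList());
    }

    // Returns true when Integer.parseInt rejects the token.
    static boolean failsIntParse(String token) {
        try {
            Integer.parseInt(token);
            return false;
        } catch (NumberFormatException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> tokens = survivingTokens("^17.0.0-1551262265873");
        System.out.println(tokens);                       // both tokens pass the filter
        System.out.println(failsIntParse(tokens.get(1))); // the long token cannot be parsed as an int
    }
}
```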

To avoid that, I added the isProbableVersion() helper to guard against purely numeric values that are too long (likely timestamps or hashes rather than real versions).

detect/detectable/src/main/java/com/blackduck/integration/detectable/detectables/npm/packagejson/PackageJsonExtractor.java

Lines 108 to 114 in 9b03c9d

    private boolean isProbableVersion(String part) {
        // If purely numeric and very long, it's likely a timestamp/hash
        if (part.matches("\\d{6,}")) return false;

        // Accept X, X.Y, X.Y.Z format
        return part.matches("\\d+(\\.\\d+){0,2}");
    }

Good discussion.

I wonder if we should try to differentiate between situations when - is used as a range operator vs when it's part of the version. It seems (but worth double-checking) like if it's a range operator then there must be spaces around it.

Thanks for the very helpful explanation, I had ignored the split call.

I looked at the semver rules, and each version component must be a non-negative integer. That's not super helpful, as the range of an integer is defined by the system and language.

I wonder if instead of limiting to 5 or 6 characters we could try to assign the variable part to an integer and then catch the NumberFormatException and return false if things don't go well?
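One way this suggestion could look is sketched below. It is an illustration only, not the change that was eventually committed: the class and method names (ParseGuard, isParsableVersion) are hypothetical. It keeps the X, X.Y, X.Y.Z shape check from isProbableVersion and replaces the digit-count limit with an attempt to parse each component.

```java
// Hypothetical helper illustrating the "try to parse, catch the exception" idea.
class ParseGuard {

    // Accepts X, X.Y, or X.Y.Z only when every component fits in an int.
    static boolean isParsableVersion(String part) {
        // Shape check first, so non-numeric tokens are rejected without parsing
        if (!part.matches("\\d+(\\.\\d+){0,2}")) {
            return false;
        }
        try {
            for (String component : part.split("\\.")) {
                Integer.parseInt(component);
            }
            return true;
        } catch (NumberFormatException e) {
            // A component such as 1551262265873 is larger than Integer.MAX_VALUE
            return false;
        }
    }
}
```

Unlike the length-based check, this accepts a six-digit component such as 123456, because it still fits in an int, while rejecting anything that would make Integer.parseInt throw.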

That's an interesting idea. I'll try this out.

Good discussion.

I wonder if we should try to differentiate between situations when - is used as a range operator vs when it's part of the version. It seems (but worth double-checking) like if it's a range operator then there must be spaces around it.

The current implementation doesn't treat the hyphen (-) as a range operator. It simply splits the input on spaces, ||, or hyphens and, if the parts are numeric, picks the minimum among them using the semantic version comparator.

andrian-sevastyanov

Approved but left another comment upon second thought.

andrian-sevastyanov · 2025-05-28T14:33:48Z

...ckduck/integration/detectable/detectables/npm/packagejson/unit/PackageJsonExtractorTest.java

+        assertEquals("1.2.0", extractor.extractLowestVersion("~1.2.x"));
+        assertEquals("1.2.3", extractor.extractLowestVersion("1.2.3-beta.2"));
+        assertEquals("5.0.0", extractor.extractLowestVersion( "5.0.0-alpha+build.1"));
+        assertEquals("3.0.0", extractor.extractLowestVersion("3.0.0-abc123"));


A test like assertEquals("3.0.0", extractor.extractLowestVersion("3.0.0-1")); will fail here because we're treating the - as an operator instead of as part of the version.
This is why I was suggesting treating - as an operator only when it's surrounded by spaces.

I've updated the extractLowestVersion method to treat range separators (" - ") and pre-release hyphens ("-") differently by using this split pattern:

String[] parts = value.split("\\s+|\\|\\||\\s-\\s");

This ensures that we only split on version ranges like "1.0.0 - 2.0.0" (as used in npm), space, and logical ORs (||), but preserve valid pre-release versions like "3.0.0-1".

Additionally, the line:

.map(part -> part.replaceAll("[-+].*", ""))

is used to strip off pre-release and build metadata.

So for the test:

assertEquals("3.0.0", extractor.extractLowestVersion("3.0.0-1"));

this still passes because:

The input is a single version (not a range), so it’s not split.

The pre-release -1 is stripped, resulting in 3.0.0.
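The behaviour described above can be sketched end to end as follows. This is a simplified illustration under stated assumptions, not the real extractLowestVersion: the class name, the Optional return type, the order of the two map steps, and the BY_COMPONENTS comparator (a stand-in for the project's semantic version comparator, using BigInteger so the sketch itself cannot overflow) are all assumptions. It also does not handle wildcard forms such as ~1.2.x.

```java
import java.math.BigInteger;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Optional;

// Hypothetical sketch of the updated pipeline; only the regexes come from this thread.
class LowestVersionSketch {

    // Stand-in comparator (assumption): compares dot-separated components numerically,
    // treating missing components as 0.
    static final Comparator<String> BY_COMPONENTS = (a, b) -> {
        String[] x = a.split("\\.");
        String[] y = b.split("\\.");
        for (int i = 0; i < Math.max(x.length, y.length); i++) {
            BigInteger xi = i < x.length ? new BigInteger(x[i]) : BigInteger.ZERO;
            BigInteger yi = i < y.length ? new BigInteger(y[i]) : BigInteger.ZERO;
            int cmp = xi.compareTo(yi);
            if (cmp != 0) {
                return cmp;
            }
        }
        return 0;
    };

    static Optional<String> lowestVersion(String value) {
        return Arrays.stream(value.split("\\s+|\\|\\||\\s-\\s"))
            // Remove npm version selection characters
            .map(part -> part.replaceAll("[>=<~^]", ""))
            // Strip pre-release and build metadata
            .map(part -> part.replaceAll("[-+].*", ""))
            // Keep only X, X.Y, X.Y.Z tokens; empty strings are dropped here
            .filter(part -> part.matches("\\d+(\\.\\d+){0,2}"))
            .min(BY_COMPONENTS);
    }
}
```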

References:

SemVer spec - Item 9

npm semver range syntax

A few additional tests were added to validate the changes.


andrian-sevastyanov · 2025-06-03T14:35:43Z

...a/com/blackduck/integration/detectable/detectables/npm/packagejson/PackageJsonExtractor.java

-        String[] parts = value.split("\\s+|\\|\\||-");
-        String lowestVersion = Arrays.stream(parts)
+        // Split the value into parts by spaces, "||", or " - ".
+        String[] parts = value.split("\\s+|\\|\\||\\s-\\s");


I believe the \\s-\\s part never actually gets matched now because \\s+ will get matched first.
This however works out in the end because on line 100 standalone - characters get transformed to empty string and at line 102 the empty string gets filtered out.
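The alternation behaviour described here can be checked with a small stand-alone sketch. The class and method names are hypothetical. The first method uses the split pattern from the diff; the second shows, purely as an illustration and not as something proposed in this PR, what happens when the " - " alternative is listed first.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical demo of regex alternation order in String.split.
class AlternationOrderDemo {

    // Pattern from the diff: \s+ is tried first, so it consumes the space before "-".
    static List<String> splitAsInDiff(String value) {
        return Arrays.asList(value.split("\\s+|\\|\\||\\s-\\s"));
    }

    // Illustrative variant: with \s-\s listed first, " - " is matched as one separator.
    static List<String> splitRangeFirst(String value) {
        return Arrays.asList(value.split("\\s-\\s|\\s+|\\|\\|"));
    }

    // The pre-release/build stripping step turns a standalone "-" into an empty string.
    static String stripPreRelease(String part) {
        return part.replaceAll("[-+].*", "");
    }

    public static void main(String[] args) {
        System.out.println(splitAsInDiff("1.0.0 - 2.0.0"));   // "-" survives as its own token
        System.out.println(splitRangeFirst("1.0.0 - 2.0.0")); // only the two versions remain
        System.out.println(stripPreRelease("-").isEmpty());
    }
}
```

Either way the final result is the same for a range like "1.0.0 - 2.0.0", because the leftover "-" token is emptied and then filtered out.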

zahidblackduck added 2 commits May 20, 2025 15:28

extract version regex update in package json extractor

e981617

unit test add for extract lowest version method

9b03c9d

zahidblackduck changed the title ~~Fix NumberFormatException in NPM Package Json Parse for Long Numeric Pre-release or Timestamp Tags~~ Fix NumberFormatException in NPM Package Json Parse May 22, 2025

zahidblackduck requested review from dterrybd, andrian-sevastyanov, devmehtabd and shantyk May 22, 2025 06:12

zahidblackduck self-assigned this May 22, 2025

dterrybd reviewed May 22, 2025

View reviewed changes

andrian-sevastyanov approved these changes May 26, 2025

View reviewed changes

andrian-sevastyanov requested changes May 26, 2025

View reviewed changes

semantic version comparator update to handle number format exception

698830b

zahidblackduck requested review from dterrybd and andrian-sevastyanov May 28, 2025 13:27

andrian-sevastyanov reviewed May 28, 2025

View reviewed changes

andrian-sevastyanov approved these changes May 28, 2025

View reviewed changes

zahidblackduck added 3 commits June 3, 2025 17:48

Merge remote-tracking branch 'origin/master' into dev/zahidblackduck/…

d8c4ebf

…IDETECT-4721

Merge remote-tracking branch 'origin/dev/zahidblackduck/IDETECT-4721'…

86fdbe7

… into dev/zahidblackduck/IDETECT-4721

range operator respected while parsing version from package.json

14ee913

zahidblackduck requested a review from andrian-sevastyanov June 3, 2025 13:33

andrian-sevastyanov approved these changes Jun 3, 2025

View reviewed changes

andrian-sevastyanov reviewed Jun 3, 2025

View reviewed changes

dterrybd approved these changes Jun 3, 2025

View reviewed changes

zahidblackduck commented May 22, 2025 (edited)

dterrybd May 22, 2025

zahidblackduck May 26, 2025

andrian-sevastyanov May 26, 2025

dterrybd May 27, 2025 (edited)

zahidblackduck May 28, 2025

zahidblackduck May 28, 2025

andrian-sevastyanov left a comment

andrian-sevastyanov May 28, 2025

zahidblackduck Jun 3, 2025 (edited)

andrian-sevastyanov Jun 3, 2025
