"No match found" when SHA-256 hash is lowercase and starts with 8+ numbers #165
Description
When a lowercase SHA-256 hash that starts with 9 decimal digits is copied to the clipboard, OpenHashTab fails to detect the algorithm and reports "No match found".
Steps to reproduce
- Configure OpenHashTab with the settings shown at the bottom of this post.
- Download Logseq-win-x64-0.9.5.exe (direct link) and SHA256SUMS.txt (direct link) from the Logseq 0.9.5 tag page.
- Open SHA256SUMS.txt and copy the SHA-256 checksum of Logseq-win-x64-0.9.5.exe to the clipboard.
- Open the Explorer properties for Logseq-win-x64-0.9.5.exe and go to the Hashes tab.
- Observe that there is "No match found" for the SHA-256 hash in the clipboard. In addition, CRC-32 and XXH-32 (8-character long algorithms) are unexpectedly enabled.
- Close the properties dialog.
- Paste the lowercase SHA-256 checksum into Notepad++, highlight the hash, and press [CTRL]+[SHIFT]+U to uppercase the checksum. Copy the uppercase checksum into the clipboard.
- Open the Explorer properties again for Logseq-win-x64-0.9.5.exe and go to the Hashes tab.
- Observe that the uppercase checksum is recognized and valid.
Things I tested
I attempted to isolate the issue in these ways:
- I toggled all four combinations of the "Display hashes in uppercase" and "Export hashes in uppercase" settings and tested against lowercase versions of the checksum. In all cases, the lowercase hash resulted in "No match found".
- I tested other arbitrary files on my computer to reproduce the issue, but found that OpenHashTab correctly matched other files on my computer against their own lowercase SHA-256 checksums.
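A rough estimate of why arbitrary files might not reproduce the issue. It assumes that the hex characters of a SHA-256 hash are uniformly distributed and that the trigger is a hash whose first 8 characters are all decimal digits, as the title suggests. Both points are my assumptions and are not confirmed from the code:

$$
P(\text{first 8 hex characters are all in } 0\text{-}9) = \left(\frac{10}{16}\right)^{8} \approx 0.023
$$

That is roughly 1 file in 43, so testing a handful of other files would most likely not hit the problem.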
Suspected source of bug
I noticed that the first 9 characters of the checksum are digits, and that the short 8-character hash algorithms (CRC32 and XXH32) kept getting enabled for this specific checksum. This made me think that the source of the bug might be the algorithm detection, which led me to the following location:
OpenHashTab/OpenHashTab/utl.cpp
Lines 36 to 43 in 0263c62
I think that the regular expression used here may be matching the lowercase hash through its uppercase variant, so that it extracts only the first 8 characters instead of the full hash.
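To illustrate the suspicion, here is a minimal sketch of how this kind of failure can happen. It is not the code from utl.cpp: the pattern, the function names (`extract_uppercase_first`, `extract_any_case`, `to_upper`) and the sample hash are all made up for the example. It assumes a pattern that tries an uppercase-only alternative before a lowercase-only one.

```cpp
#include <algorithm>
#include <cctype>
#include <regex>
#include <string>

// Made-up 64-character lowercase hash that starts with 9 decimal digits.
// This is NOT the real Logseq checksum.
const std::string kSampleHash =
    "123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef0";

// Hypothetical pattern (an assumption, not copied from OpenHashTab):
// the uppercase-only alternative comes first. In ECMAScript mode the
// alternatives are tried in order, so a run of 8 or more leading digits
// satisfies the first alternative before the lowercase one is considered.
std::string extract_uppercase_first(const std::string& text) {
    static const std::regex pattern("[0-9A-F]{8,}|[0-9a-f]{8,}");
    std::smatch m;
    return std::regex_search(text, m, pattern) ? m.str() : std::string();
}

// One possible alternative: a single character class that accepts both cases.
std::string extract_any_case(const std::string& text) {
    static const std::regex pattern("[0-9A-Fa-f]{8,}");
    std::smatch m;
    return std::regex_search(text, m, pattern) ? m.str() : std::string();
}

// Helper that uppercases an ASCII string, like the Notepad++ workaround.
std::string to_upper(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        return static_cast<char>(std::toupper(c));
    });
    return s;
}
```

With this hypothetical pattern, I would expect `extract_uppercase_first(kSampleHash)` to return only the leading digit run rather than all 64 characters, while `extract_uppercase_first(to_upper(kSampleHash))` and `extract_any_case(kSampleHash)` return the full hash. That would be consistent with the uppercase checksum working and the lowercase one being mistaken for a short hash, but whether the real code behaves this way needs to be checked against utl.cpp.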