Skip to content

T-SQL's CHECKSUM() algorithm reverse engineered and implemented in JavaScript

Notifications You must be signed in to change notification settings

neilodonuts/tsql-checksum-javascript

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

12 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

tsql-checksum-javascript

T-SQL's CHECKSUM() algorithm reverse engineered and implemented in JavaScript

What is the algorithm? It's a simple circular (rotate) shift left by a nibble (4 bits) and a XOR of each character translated into a "code". The code is not ASCII but it is related because lowercase letters bear the same codes as their uppercase counterparts and digits have codes just before "A". To see the codes and the algorithm itself, take a peek at tsql-checksum.js!

Limitations / Disclaimer

My work has focused only on ASCII strings using the (VARCHAR) data type and on SQL Server instances with the default collation setting: SQL_Latin1_General_CP1_CI_AS. I have no doubt the same algorithm is used but with different binary input and XOR codes for other collation and data types.

Background

I got started on this quest after posting what I thought was an innocuous question on StackOverflow:

How to Implement T-SQL CHECKSUM() in JavaScript for BigQuery?

A semi-helpful comment appeared soon after with a link that led me down a rabbit-hole and not too much after a downvote and a comment telling me quite frankly NOT to use this dated and terrible hash function. I then spent the remaining work day attempting to figure out the shoddy algorithm which none-the-less is a practically immortal part of Microsoft SQL Server. Knowing precisely how it works, I reasoned, would be helpful to anyone whether they're maintaining code that uses it or simply needing convincing why not to use it.

Near the end of the day I found that the SQL code posted for the algorithm (from 2006 on sqlteam.com) is actually for BINARY_CHECKSUM() rather than CHECKSUM(). Doh! This repo will not be for that function for which code has already been created that seems to work in my brief tests of it.

Thus I created this project to take over where those left off and to solve the mystery that no one was bothered by. I knew it wouldn't be too complicated when I noticed very specific patterns between one and two character ASCII combinations:

Figure 1-1

The data/pairs.csv contains the CHECKSUM() output for all ASCII pairs but be careful with those code points above 128; that stuff is currently untested in my implementation.

SQL Server in VirtualBox

I'm running SQL Server (Express) in VirtualBox on a Windows 10 VM. This is a free thing provided by Microsoft but it's not without hazards. I routinely mess up the configuration so that I am unable to connect to it. So here's my reference checklist:

  1. NAT port forwarding for 1433 on VirtualBox: use 127.0.0.1 on host and blank on guest so that port is only open to the host itself.
  2. TCP/IP protocol enabled in SQL Server Configuration Manager.
  3. Windows Firewall rule to allow SQL Server (or turned off) on both public and private networks,.
  4. SQL Server Browser service is enabled and automatically started.

I set this up myself and later found a nice tutorial on StackOverflow.

Free Windows VMs provided by Microsoft:

Links

About

T-SQL's CHECKSUM() algorithm reverse engineered and implemented in JavaScript

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published