Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change Sha1HashKey to CityHash128Key for generating PreparedStatement handle and metadata cache keys #717

Merged
merged 12 commits into from
Jul 9, 2018
64 changes: 28 additions & 36 deletions src/main/java/com/microsoft/sqlserver/jdbc/SQLServerConnection.java
Original file line number Diff line number Diff line change
Expand Up @@ -53,8 +53,6 @@
import javax.sql.XAConnection;

import org.ietf.jgss.GSSCredential;
import org.ietf.jgss.GSSException;

import mssql.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap;
import mssql.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap.Builder;
import mssql.googlecode.concurrentlinkedhashmap.EvictionListener;
Expand Down Expand Up @@ -119,37 +117,31 @@ public class SQLServerConnection implements ISQLServerConnection {

private String originalHostNameInCertificate = null;

static class Sha1HashKey {
private byte[] bytes;
static class CityHash128Key {
private long[] segments;
private int hashCode;

Sha1HashKey(String sql,
CityHash128Key(String sql,
String parametersDefinition) {
this(String.format("%s%s", sql, parametersDefinition));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

String.format() is a terrible way to concatenate two Strings. If I wrote this in my original commit, I must have been high. Just change to this(sql + parametersDefinition).

Copy link

@RDearnaley RDearnaley Jun 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having taken another look, yes, this showed up in my YourKit runs -- in fact I saw it roughly as much as the character encoding issue we've been discussing above.

}

Sha1HashKey(String s) {
bytes = getSha1Digest().digest(s.getBytes());
CityHash128Key(String s) {
segments = CityHash.cityHash128(s.getBytes(), 0, s.length());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Performance of this can be improved, especially in the case of large SQL strings. s.getBytes() ends up calling StringCoding.encode(...), which converts the internal String chars to bytes using the platform default encoding.

I recommend using s.getBytes(int srcBegin, int srcEnd, byte dst[], int dstBegin), which, while deprecated (but will never be removed), has substantially higher performance.

CityHash128Key(String s) {
   byte[] bytes = byte[s.length()];
   s.getBytes(0, s.length(), bytes, 0);
   segments = CityHash.cityHash128(bytes, 0, s.length());
}

Copy link

@RDearnaley RDearnaley Jun 14, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s.getBytes(int srcBegin, int srcEnd, byte dst[], int dstBegin) truncates the upper 8 bits of Java's internal Unicode-16 representation. Since this is SQL before inserting parameter values, it should be mostly 7-bit ASCII -- the only exception I can think of would be database names/table names/column names, at least some of which can be full Unicode. For European or other alphabetic languages, their Unicode-16 spaces tend to be 256 characters wide and the odds of this truncation causing a spurious collision seem very small -- but I'd be less confident if the language in question was, say, Chinese, that you couldn't ever have two SQLs that differed only by one ideogram in a table/column name, and the two ideograms in question unfortunately happened to have the same lower 8 bits. It's not very likely, but it's significantly less astronomically unlikely than a 128-bit hash collision.

Copy link

@RDearnaley RDearnaley Jun 14, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So yes, we need to avoid character reencoding, but I think we also need to avoid throwing away hash entropy while we do so. I was wondering about something like getBytes(Charset.UTF_16BE) -- if it matches the internal representation used by String, will it skip reencoding? So maybe something like:

   byte[] bytes = s.getBytes(Charset.UTF_16BE);  // avoid character reencoding
   segments = CityHash.cityHash128(bytes, 0, bytes.length);  // bytes.length = 2*s.length() + 2, due to Byte Order Mark?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RDearnaley Looking at the code of StringCoding.encode() and the various encoder subclasses, I don't find any optimization for UTF_16BE. 😞

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RDearnaley It appears only option would be a custom Charset and CharsetEncoder. It is possible, as there is a CharsetProvider SPI available. The "raw encoder" would simply "encode" chars as pairs of bytes (upper 8-bits, lower 8-bits). The cost is a 2x sized byte array due to the naive approach, but possibly worth it as the byte array is quickly garbage collected, or could potentially be cached as a WeakReference in a ThreadLocal inside of the CityHash128Key class.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How strange -- yes, taking a look in the Java code for them, it appears that none of UTF_16BE, UTF_16 etc make use of the obvious optimization that one of them has to be a no-op. I'm wondering if someone might have already created a raw double byte encoding to speed this up -- it seems a fairly obvious performance optimization for cases like this where you care about the entropy content rather than the specific values. If not, it might make a good addition to the Java version of CityHash

Copy link

@RDearnaley RDearnaley Jun 14, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is another possible approach whose speed would be worth testing:

private static byte[] toBytes( final String str )
{
    final int len = str.length();
    final char[] chars = str.getChars();
    final byte[] buff = new byte[2 * len];
    for (int i = 0, j = 0; i < len; i++) {
        buff[j++] = (byte) chars[i];  // Or str.charAt(i) and skipping str.getChars() might be faster?
        buff[j++] = (byte) (chars[i] >>> 8);  // Or >> might be faster?
    }
    return buff;
}

or just inline this since we only use it once. Old-school, I know.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course it would work, but getChars() is going to create an additional copy, incurring a CPU hit as well as increasing GC pressure. Regarding inlining, probably unnecessary as the JIT will inline it when it gets hot.

I think a custom encoder would offer better performance, as the internal char array is passed without copy.

Copy link

@RDearnaley RDearnaley Jun 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would the str.charAt(i) approach also be making a copy, or would that just go on the stack/in a register?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

str.charAt(i) is just a simple array access, with the value passed on the stack. If the compiler inlines the call, it could be a viable option. You'd have to find it in the output of -XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation -XX:+PrintCompilation.

}

public boolean equals(Object obj) {
if (!(obj instanceof Sha1HashKey))
if (!(obj instanceof CityHash128Key))
return false;

return java.util.Arrays.equals(bytes, ((Sha1HashKey)obj).bytes);
return java.util.Arrays.equals(segments, ((CityHash128Key) obj).segments);
}

public int hashCode() {
return java.util.Arrays.hashCode(bytes);
}

private java.security.MessageDigest getSha1Digest() {
try {
return java.security.MessageDigest.getInstance("SHA-1");
}
catch (final java.security.NoSuchAlgorithmException e) {
// This is not theoretically possible, but we're forced to catch it anyway
throw new RuntimeException(e);
if (0 == hashCode) {
hashCode = java.util.Arrays.hashCode(segments);
}
return hashCode;
}
}

Expand All @@ -162,9 +154,9 @@ class PreparedStatementHandle {
private boolean isDirectSql;
private volatile boolean evictedFromCache;
private volatile boolean explicitlyDiscarded;
private Sha1HashKey key;
private CityHash128Key key;

PreparedStatementHandle(Sha1HashKey key, int handle, boolean isDirectSql, boolean isEvictedFromCache) {
PreparedStatementHandle(CityHash128Key key, int handle, boolean isDirectSql, boolean isEvictedFromCache) {
this.key = key;
this.handle = handle;
this.isDirectSql = isDirectSql;
Expand Down Expand Up @@ -200,7 +192,7 @@ int getHandle() {
}

/** Get the cache key. */
Sha1HashKey getKey() {
CityHash128Key getKey() {
return key;
}

Expand Down Expand Up @@ -248,21 +240,21 @@ void removeReference() {
static final private int PARSED_SQL_CACHE_SIZE = 100;

/** Cache of parsed SQL meta data */
static private ConcurrentLinkedHashMap<Sha1HashKey, ParsedSQLCacheItem> parsedSQLCache;
static private ConcurrentLinkedHashMap<CityHash128Key, ParsedSQLCacheItem> parsedSQLCache;

static {
parsedSQLCache = new Builder<Sha1HashKey, ParsedSQLCacheItem>()
parsedSQLCache = new Builder<CityHash128Key, ParsedSQLCacheItem>()
.maximumWeightedCapacity(PARSED_SQL_CACHE_SIZE)
.build();
}

/** Get prepared statement cache entry if exists, if not parse and create a new one */
static ParsedSQLCacheItem getCachedParsedSQL(Sha1HashKey key) {
static ParsedSQLCacheItem getCachedParsedSQL(CityHash128Key key) {
return parsedSQLCache.get(key);
}

/** Parse and create a information about parsed SQL text */
static ParsedSQLCacheItem parseAndCacheSQL(Sha1HashKey key, String sql) throws SQLServerException {
static ParsedSQLCacheItem parseAndCacheSQL(CityHash128Key key, String sql) throws SQLServerException {
JDBCSyntaxTranslator translator = new JDBCSyntaxTranslator();

String parsedSql = translator.translate(sql);
Expand All @@ -282,9 +274,9 @@ static ParsedSQLCacheItem parseAndCacheSQL(Sha1HashKey key, String sql) throws
private int statementPoolingCacheSize = DEFAULT_STATEMENT_POOLING_CACHE_SIZE;

/** Cache of prepared statement handles */
private ConcurrentLinkedHashMap<Sha1HashKey, PreparedStatementHandle> preparedStatementHandleCache;
private ConcurrentLinkedHashMap<CityHash128Key, PreparedStatementHandle> preparedStatementHandleCache;
/** Cache of prepared statement parameter metadata */
private ConcurrentLinkedHashMap<Sha1HashKey, SQLServerParameterMetaData> parameterMetadataCache;
private ConcurrentLinkedHashMap<CityHash128Key, SQLServerParameterMetaData> parameterMetadataCache;
/**
* Checks whether statement pooling is enabled or disabled. The default is set to true;
*/
Expand Down Expand Up @@ -5808,39 +5800,39 @@ public void setStatementPoolingCacheSize(int value) {
* @param value
*/
private void prepareCache() {
preparedStatementHandleCache = new Builder<Sha1HashKey, PreparedStatementHandle>().maximumWeightedCapacity(getStatementPoolingCacheSize())
preparedStatementHandleCache = new Builder<CityHash128Key, PreparedStatementHandle>().maximumWeightedCapacity(getStatementPoolingCacheSize())
.listener(new PreparedStatementCacheEvictionListener()).build();

parameterMetadataCache = new Builder<Sha1HashKey, SQLServerParameterMetaData>().maximumWeightedCapacity(getStatementPoolingCacheSize())
parameterMetadataCache = new Builder<CityHash128Key, SQLServerParameterMetaData>().maximumWeightedCapacity(getStatementPoolingCacheSize())
.build();
}

/** Get a parameter metadata cache entry if statement pooling is enabled */
final SQLServerParameterMetaData getCachedParameterMetadata(Sha1HashKey key) {
final SQLServerParameterMetaData getCachedParameterMetadata(CityHash128Key key) {
if(!isStatementPoolingEnabled())
return null;

return parameterMetadataCache.get(key);
}

/** Register a parameter metadata cache entry if statement pooling is enabled */
final void registerCachedParameterMetadata(Sha1HashKey key, SQLServerParameterMetaData pmd) {
final void registerCachedParameterMetadata(CityHash128Key key, SQLServerParameterMetaData pmd) {
if(!isStatementPoolingEnabled() || null == pmd)
return;

parameterMetadataCache.put(key, pmd);
}

/** Get or create prepared statement handle cache entry if statement pooling is enabled */
final PreparedStatementHandle getCachedPreparedStatementHandle(Sha1HashKey key) {
final PreparedStatementHandle getCachedPreparedStatementHandle(CityHash128Key key) {
if(!isStatementPoolingEnabled())
return null;

return preparedStatementHandleCache.get(key);
}

/** Get or create prepared statement handle cache entry if statement pooling is enabled */
final PreparedStatementHandle registerCachedPreparedStatementHandle(Sha1HashKey key, int handle, boolean isDirectSql) {
final PreparedStatementHandle registerCachedPreparedStatementHandle(CityHash128Key key, int handle, boolean isDirectSql) {
if(!isStatementPoolingEnabled() || null == key)
return null;

Expand All @@ -5866,8 +5858,8 @@ final void evictCachedPreparedStatementHandle(PreparedStatementHandle handle) {
}

// Handle closing handles when removed from cache.
final class PreparedStatementCacheEvictionListener implements EvictionListener<Sha1HashKey, PreparedStatementHandle> {
public void onEviction(Sha1HashKey key, PreparedStatementHandle handle) {
final class PreparedStatementCacheEvictionListener implements EvictionListener<CityHash128Key, PreparedStatementHandle> {
public void onEviction(CityHash128Key key, PreparedStatementHandle handle) {
if(null != handle) {
handle.setIsEvictedFromCache(true); // Mark as evicted from cache.

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -33,7 +33,7 @@
import java.util.logging.Level;

import com.microsoft.sqlserver.jdbc.SQLServerConnection.PreparedStatementHandle;
import com.microsoft.sqlserver.jdbc.SQLServerConnection.Sha1HashKey;
import com.microsoft.sqlserver.jdbc.SQLServerConnection.CityHash128Key;

/**
* SQLServerPreparedStatement provides JDBC prepared statement functionality. SQLServerPreparedStatement provides methods for the user to supply
Expand Down Expand Up @@ -74,7 +74,7 @@ public class SQLServerPreparedStatement extends SQLServerStatement implements IS
private PreparedStatementHandle cachedPreparedStatementHandle;

/** Hash of user supplied SQL statement used for various cache lookups */
private Sha1HashKey sqlTextCacheKey;
private CityHash128Key sqlTextCacheKey;

/**
* Array with parameter names generated in buildParamTypeDefinitions For mapping encryption information to parameters, as the second result set
Expand Down Expand Up @@ -187,7 +187,7 @@ String getClassNameInternal() {
stmtPoolable = true;

// Create a cache key for this statement.
sqlTextCacheKey = new Sha1HashKey(sql);
sqlTextCacheKey = new CityHash128Key(sql);

// Parse or fetch SQL metadata from cache.
ParsedSQLCacheItem parsedSQL = getCachedParsedSQL(sqlTextCacheKey);
Expand Down Expand Up @@ -617,7 +617,7 @@ boolean onRetValue(TDSReader tdsReader) throws SQLServerException {
// Cache the reference to the newly created handle, NOT for cursorable handles.
if (null == cachedPreparedStatementHandle && !isCursorable(executeMethod)) {
cachedPreparedStatementHandle = connection.registerCachedPreparedStatementHandle(
new Sha1HashKey(preparedSQL, preparedTypeDefinitions), prepStmtHandle, executedSqlDirectly);
new CityHash128Key(preparedSQL, preparedTypeDefinitions), prepStmtHandle, executedSqlDirectly);
}

param.skipValue(tdsReader, true);
Expand Down Expand Up @@ -962,7 +962,7 @@ private boolean reuseCachedHandle(boolean hasNewTypeDefinitions,

// Check for new cache reference.
if (null == cachedPreparedStatementHandle) {
PreparedStatementHandle cachedHandle = connection.getCachedPreparedStatementHandle(new Sha1HashKey(preparedSQL, preparedTypeDefinitions));
PreparedStatementHandle cachedHandle = connection.getCachedPreparedStatementHandle(new CityHash128Key(preparedSQL, preparedTypeDefinitions));
// If handle was found then re-use, only if AE is not on and is not a batch query with new type definitions (We shouldn't reuse handle
// if it is batch query and has new type definition, or if it is on, make sure encryptionMetadataIsRetrieved is retrieved.
if (null != cachedHandle) {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -28,7 +28,7 @@
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import com.microsoft.sqlserver.jdbc.SQLServerConnection.Sha1HashKey;
import com.microsoft.sqlserver.jdbc.SQLServerConnection.CityHash128Key;

/**
* SQLServerStatment provides the basic implementation of JDBC statement functionality. It also provides a number of base class implementation methods
Expand Down Expand Up @@ -786,7 +786,7 @@ final void processResponse(TDSReader tdsReader) throws SQLServerException {
private String ensureSQLSyntax(String sql) throws SQLServerException {
if (sql.indexOf(LEFT_CURLY_BRACKET) >= 0) {

Sha1HashKey cacheKey = new Sha1HashKey(sql);
CityHash128Key cacheKey = new CityHash128Key(sql);

// Check for cached SQL metadata.
ParsedSQLCacheItem cacheItem = getCachedParsedSQL(cacheKey);
Expand Down
Loading