Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Gson synchronized map replaced with concurrent hash map #806

Merged
merged 2 commits into from
Mar 5, 2016
Merged

Gson synchronized map replaced with concurrent hash map #806

merged 2 commits into from
Mar 5, 2016

Conversation

qwwdfsad
Copy link
Contributor

@qwwdfsad qwwdfsad commented Mar 4, 2016

Internally Gson class uses Collections#synchronizedMap as a thread-safe map for typeTokenCache.
In a highly-contended environment I'm experiencing unpredictable latency spikes,
while the performance of Gson#toJson degrades significantly even for simple objects.
It's typical for an application thread to hang on typeTokenCache#get average 3-15 ms in such situations.
See this benchmark results to observe this problem (results for patch included).

For a more or less generic serialization code it's impossible to use TypeAdapter directly. One of the solutions might be to use thread-local Gson instances, but from programmer's perspective it makes code more obfuscated, requires an additional explanation on why this was done and floods the heap with duplicate objects. Gson is declared as thread-safe, so it seems reasonable to expect fine-grained concurrency level internally. Moreover, after initial application warmup typeTokenCache becomes a mostly read-only structure (in fact, read-only), so allowing non-blocking reads looks like a perfect fit.

Provided patch replaces synchronized map with j.u.c.ConcurrentHashMap. The main difference between synchronized map and CHM is null keys prohibition, so implicit code-path for null is added.
As you can see in benchmark results, even despite of additional code path for null keys, overall throughput increases even in single-threaded case due to absence of explicit synchronization (though JVM optimizes uncontended locks, some overhead is still present), so there is no risks or performance degradation associated with this patch.

@googlebot
Copy link

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed, please reply here (e.g. I signed it!) and we'll verify. Thanks.


  • If you've already signed a CLA, it's possible we don't have your GitHub username or you're using a different email address. Check your existing CLA data and verify that your email is set on your git commits.
  • If you signed the CLA as a corporation, please let us know the company's name.

@qwwdfsad
Copy link
Contributor Author

qwwdfsad commented Mar 4, 2016

I signed it!

@googlebot
Copy link

CLAs look good, thanks!

@swankjesse
Copy link
Collaborator

What happens if you extract a TypeAdapter and keep a reference to it? There’s not reason to go through that cache when you can just keep the TypeAdapter you need in a field.

@qwwdfsad
Copy link
Contributor Author

qwwdfsad commented Mar 5, 2016

As I said before, I can't: I'm serializing complex objects which contain maps/collections of other objects, and for every such object custom type adapter is registered (~50 adapters in total), so I need this cache. Otherwise I should write whole Gson class myself to use this adapters directly. There’s not reason to keep obvious performance bottleneck in Gson either :)

@swankjesse
Copy link
Collaborator

Its a mistake to call into the Gson instance multiple times if you can avoid it. What's your top-level object? You want to lookup the adapter for that exactly once, which will recursively fetch all of its dependent adapters.

In other words, rather than removing contention while reading from the map I think your best fix is to not read from the map at all.

@qwwdfsad
Copy link
Contributor Author

qwwdfsad commented Mar 5, 2016

I'm trying to keep Gson usage as simple as I can, so I call Gson#fromJson(object, clazz), which should be very common and obvious use case of Gson. I don't want adapters and other json-related classes leak to my code (at least I want to limit such places). What you are offering is nothing but hack solving problem on wrong level which limits gson usage and forces me to write more duplicate code (which is already written inside of Gson) for reader creation, lenitent check, exception handling and whatever-else-happens-in-fromJson-method (in fact when I got multiple top-level objects in different places I will end up with my own wrapper with gson and concurrent map of adapters inside) and write additional explanation of why I've done this instead of plain gson#fromJson call. Suggestion to rewrite part of gson implementation is not the way such problems should be solved. And my solution just solves the problem in the right place without any performance degradation at all (even for single-threaded case) or incompatible API changes, so everyone will benefit: performance of overall gson usage increases, old code isn't broken, problem solved.

In other words, rather than rewriting gson and propagating it's model classes to my code I think best fix is to remove synchronizedMap usage and replace it with something modern with the same contract.

@inder123
Copy link
Collaborator

inder123 commented Mar 5, 2016

I agree, if synchronized map is not performing well, we should look at alternatives.
My question is what do we lose? Does the performance degrade in normal operations?

@inder123
Copy link
Collaborator

inder123 commented Mar 5, 2016

@swankjesse What is the advantage of using synchronizedMap over ConcurrentMap?
I think we should accept this change.

@swankjesse
Copy link
Collaborator

As with many optimizations, this one is about tradeoffs. Here we have two options:

  • The synchronized map has a small memory footprint. It’s best for mobile.
  • The ConcurrentHashMap has low contention under heavy load. This is best on servers.

And there’s actually a third option that makes everyone happy. Avoid using the cache altogether. Lookup a type adapter instance from the Gson instance once, and avoid all contention. This keeps the memory low for mobile & makes servers even faster than the ConcurrentHashMap.

I think TypeAdapter’s toJson() and fromJson() methods have better behavior than similar methods on Gson; these don’t do weird things for backwards compatibility like force leniency or hide exceptions.

@inder123
Copy link
Collaborator

inder123 commented Mar 5, 2016

@swankjesse Gson.toJson()/fromJson() work really well for most use-cases. Getting an adapter and using it is cumbersome, and sometimes doesn't produce correct results (I have had issues in the past where it will serialize nulls incorrectly, or not pretty print). We should not impose our programming style on others.

How real is the memory overhead? Gson creates a lot of maps internally (and anonymous inner classes with specific type adapters for various java.* types), this specific ConcurrentMap can't be all that big. From what I read, ConcurrentHashMaps perform really well:
http://crunchify.com/hashmap-vs-concurrenthashmap-vs-synchronizedmap-how-a-hashmap-can-be-synchronized-in-java/
http://stackoverflow.com/questions/510632/whats-the-difference-between-concurrenthashmap-and-collections-synchronizedmap

I think we should accept this PR.

@swankjesse
Copy link
Collaborator

Works for me.

@inder123 inder123 self-assigned this Mar 5, 2016
@inder123
Copy link
Collaborator

inder123 commented Mar 5, 2016

Thanks @qwwdfsad for the pull request and @swankjesse for accommodating it.
👍

inder123 added a commit that referenced this pull request Mar 5, 2016
Gson synchronized map replaced with concurrent hash map
@inder123 inder123 merged commit a02f575 into google:master Mar 5, 2016
@inder123
Copy link
Collaborator

inder123 commented Mar 5, 2016

@qwwdfsad Also love the detailed write up PR. I am assuming you ran all the tests and they passed.
Ideally, in the PR you should cut-n-paste command-line print to show that. Something like:

Results :

Tests run: 986, Failures: 0, Errors: 0, Skipped: 0

[INFO] 
[INFO] --- maven-jar-plugin:2.4:jar (default-jar) @ gson ---
[INFO] Adding existing MANIFEST to archive. Found under: /Users/inder/tmp/gson/gson/target/classes/META-INF/MANIFEST.MF
[INFO] Building jar: /Users/inder/tmp/gson/gson/target/gson-2.6.3-SNAPSHOT.jar
[INFO] 
[INFO] --- maven-install-plugin:2.4:install (default-install) @ gson ---
[INFO] Installing /Users/inder/tmp/gson/gson/target/gson-2.6.3-SNAPSHOT.jar to /Users/inder/.m2/repository/com/google/code/gson/gson/2.6.3-SNAPSHOT/gson-2.6.3-SNAPSHOT.jar
[INFO] Installing /Users/inder/tmp/gson/gson/pom.xml to /Users/inder/.m2/repository/com/google/code/gson/gson/2.6.3-SNAPSHOT/gson-2.6.3-SNAPSHOT.pom
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO] 
[INFO] Gson Parent ........................................ SUCCESS [  1.671 s]
[INFO] Gson ............................................... SUCCESS [  5.048 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS

@JakeWharton
Copy link
Contributor

Travis CI runs all the tests. You'll see a big red X if they fail.

On Sat, Mar 5, 2016, 4:57 PM inder123 notifications@github.com wrote:

@qwwdfsad https://github.com/qwwdfsad Also love the detailed write up
PR. I am assuming you ran all the tests and they passed.
Ideally, in the PR you should cut-n-paste command-line print to show that.
Something like:

Results :

Tests run: 986, Failures: 0, Errors: 0, Skipped: 0

[INFO]
[INFO] --- maven-jar-plugin:2.4:jar (default-jar) @ gson ---
[INFO] Adding existing MANIFEST to archive. Found under: /Users/inder/tmp/gson/gson/target/classes/META-INF/MANIFEST.MF
[INFO] Building jar: /Users/inder/tmp/gson/gson/target/gson-2.6.3-SNAPSHOT.jar
[INFO]
[INFO] --- maven-install-plugin:2.4:install (default-install) @ gson ---
[INFO] Installing /Users/inder/tmp/gson/gson/target/gson-2.6.3-SNAPSHOT.jar to /Users/inder/.m2/repository/com/google/code/gson/gson/2.6.3-SNAPSHOT/gson-2.6.3-SNAPSHOT.jar
[INFO] Installing /Users/inder/tmp/gson/gson/pom.xml to /Users/inder/.m2/repository/com/google/code/gson/gson/2.6.3-SNAPSHOT/gson-2.6.3-SNAPSHOT.pom
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Gson Parent ........................................ SUCCESS [ 1.671 s]
[INFO] Gson ............................................... SUCCESS [ 5.048 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS


Reply to this email directly or view it on GitHub
#806 (comment).

@inder123
Copy link
Collaborator

inder123 commented Mar 5, 2016

@JakeWharton Good point, so yes, not needed.

@qwwdfsad
Copy link
Contributor Author

qwwdfsad commented Mar 6, 2016

Thank you!
Any chances there will be 2.6.3 soon?

sebasjm pushed a commit to sebasjm/gson that referenced this pull request Mar 11, 2018
Gson synchronized map replaced with concurrent hash map
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants