Add JSON.lower for a better serialization API #151
Conversation
Yes please; this looks great.

We still need a way to deprecate these overloads. Unfortunately,

#152 is caused by trying to reimplement vector serialization for arrays and not catching all the edge cases (such as the zero-column case). I think it's a good idea to lower multi-dimensional arrays to vectors, so that these issues don't occur again in the future. However, that might come at some performance cost; I'm not sure if that's acceptable.
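The lowering suggested here can be sketched with a small recursive helper (hypothetical name and slicing order; not JSON.jl's actual implementation):

```julia
# Sketch (hypothetical helper): lower a multi-dimensional array to nested
# vectors by slicing along the last dimension, so the generic vector
# serializer handles every case, including zero-column arrays.
function lower_array(a::AbstractArray)
    ndims(a) <= 1 && return collect(a)
    # Each slice along the last dimension becomes one element of the
    # resulting vector; recurse until we reach plain vectors.
    [lower_array(view(a, ntuple(_ -> :, ndims(a) - 1)..., i))
     for i in 1:size(a, ndims(a))]
end
```

With this shape, a `3×0` matrix simply lowers to an empty vector instead of hitting a special case in the serializer.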
This is no longer WIP. To make review easier, I've separated the commits into logical units:

I've verified locally that
Current coverage is 97.05% (diff: 85.29%)

@@            master    #151   diff @@
==========================================
  Files            3       3
  Lines          335     339     +4
  Methods          0       0
  Messages         0       0
  Branches         0       0
==========================================
+ Hits           326     329     +3
- Misses           9      10     +1
  Partials         0       0
I'd love for coverage to not drop, but I don't know how to cover deprecated codepaths without invoking deprecation warnings.
@malmaud, any thoughts on this PR? If you like the general direction, I'm happy to do what I can to help out with testing etc. I'm just tight on time, so I don't want to test something with no chance of being merged 😅 This seems like a fairly big change to a core package that may be incorporated into Base at some point.
@essenciary, any thoughts?
        obj[name] = getfield(a, name)
    end
    obj
end
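For context, the fallback under discussion amounts to something like the following sketch (the name `lower_fields` is illustrative; the real method is internal to JSON.jl):

```julia
# Illustrative sketch of the generic fallback being discussed: turn an
# arbitrary composite value into a Dict of its fields. Any struct with
# no more specific method would funnel through something like this.
function lower_fields(a)
    obj = Dict{Symbol, Any}()
    for name in fieldnames(typeof(a))
        obj[name] = getfield(a, name)
    end
    obj
end
```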
This definition seems quite broad as a fallback. For example, if I define a new bitstype, it will fall back here, which is probably not that useful. Maybe this can be restricted to `DataType`s? (Not sure...)
For what it's worth, this is the same fallback as before. I'll look into restricting it.
I hadn't realized it was the same as before. FWIW, I doubt custom bitstypes are that common, so it might not matter in a practical sense.
One potential downside of

In contrast,

Compared to most languages I've worked with, the choice to work directly with

@TotalVerb, you've recently sped up the code here in JSON.jl significantly. Have you done any benchmarking here to verify that the introduction of intermediate strings isn't causing much slowdown? Obviously, this would mostly happen for large types which need to be lowered (e.g.,

Anyway, consider recoding using an
@kmsquire You are very right about the possible performance impact. However,
if isdefined(Base, :Dates)
    function _print(io::IO, state::State, s::Base.Dates.TimeType)
        _print(io, state, string(s))
    end
end

function _print(io::IO, state::State, s::Symbol)
    _print(io, state, string(s))
end
I do have some ideas for how to improve the performance of JSON serialization using IO arguments; however, I don't think that should stop us from deprecating `JSON._print` ASAP. (Note that overloading this is the cause of the only non-test failure that arose from JSON v0.5.2.)
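Under the `lower` API this PR introduces, those overloads would presumably shrink to value-returning methods; a sketch, assuming the same `Base.Dates` check as the 0.5-era code above:

```julia
# Sketch of lower-based equivalents of the _print overloads above:
# instead of writing to the IO stream, each method returns a value that
# the serializer already knows how to write.
lower(s::Symbol) = string(s)

if isdefined(Base, :Dates)
    lower(s::Base.Dates.TimeType) = string(s)
end
```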
Actually, I'm not fully confident myself. Benchmarks would certainly be useful to either confirm that there is no impact, or bring attention to a real performance problem. I'll prepare a few.
Agree that overloading
It makes me nervous when you say things like that. As I said, I suspect there are a lot of private repos dependent on JSON (I have lots), so we should really be thoughtful of performance impacts (if any).
@kmsquire Unfortunately that doesn't solve the following:
I do have a long-term solution in mind that will improve on the current state of affairs. I need to work out the kinks, but expect a PR soon-ish. Note that this will not replace

@EricForgy Don't worry. I am creating benchmarks and will make sure that performance does not degrade. Furthermore, I understand that this repository is depended on by a lot of private packages. But private packages overriding and depending on an internal implementation detail is dangerous. Overall, this change is certainly a good thing for private packages.
@TotalVerb, thanks, that makes sense. If there aren't any comments soon, I would think it fine to merge this. EDIT: To clarify, I've looked through the changes, and LGTM.
I will finish the benchmarks tonight and report results. If there are any regressions, I suspect they're not difficult to fix. If/when there are no regressions, then I think it's good to merge this.
I created a benchmark suite with 19 tests for both compact printing and pretty printing (PR incoming). Here are the results:

julia> showall(judge(minimum(results["print"]), minimum(last["print"]); time_tolerance=0.05))
19-element BenchmarkTools.BenchmarkGroup:
tags: ["serialize"]
"ascii" => BenchmarkTools.TrialJudgement:
time: +0.00% => invariant (5.00% tolerance)
memory: +0.00% => invariant (1.00% tolerance)
"flat-homogenous-array-16" => BenchmarkTools.TrialJudgement:
time: +2.09% => invariant (5.00% tolerance)
memory: +0.00% => invariant (1.00% tolerance)
"nested-array-16^2" => BenchmarkTools.TrialJudgement:
time: -4.29% => invariant (5.00% tolerance)
memory: +0.00% => invariant (1.00% tolerance)
"unicode" => BenchmarkTools.TrialJudgement:
time: -2.73% => invariant (5.00% tolerance)
memory: +0.00% => invariant (1.00% tolerance)
"null" => BenchmarkTools.TrialJudgement:
time: -8.49% => improvement (5.00% tolerance)
memory: +0.00% => invariant (1.00% tolerance)
"unicode-1024" => BenchmarkTools.TrialJudgement:
time: -3.42% => invariant (5.00% tolerance)
memory: +0.00% => invariant (1.00% tolerance)
"flat-dict-128" => BenchmarkTools.TrialJudgement:
time: -8.25% => improvement (5.00% tolerance)
memory: +0.00% => invariant (1.00% tolerance)
"matrix-16" => BenchmarkTools.TrialJudgement:
time: -3.47% => invariant (5.00% tolerance)
memory: +0.00% => invariant (1.00% tolerance)
"custom-tree-8" => BenchmarkTools.TrialJudgement:
time: +198.28% => regression (5.00% tolerance)
memory: +751.81% => regression (1.00% tolerance)
"small-dict" => BenchmarkTools.TrialJudgement:
time: +25.97% => regression (5.00% tolerance)
memory: +0.00% => invariant (1.00% tolerance)
"heterogenous-array" => BenchmarkTools.TrialJudgement:
time: -9.40% => improvement (5.00% tolerance)
memory: +0.00% => invariant (1.00% tolerance)
"float" => BenchmarkTools.TrialJudgement:
time: -5.99% => improvement (5.00% tolerance)
memory: +0.00% => invariant (1.00% tolerance)
"bool" => BenchmarkTools.TrialJudgement:
time: -1.66% => invariant (5.00% tolerance)
memory: +0.00% => invariant (1.00% tolerance)
"flat-homogenous-array-1024" => BenchmarkTools.TrialJudgement:
time: -6.17% => improvement (5.00% tolerance)
memory: +0.00% => invariant (1.00% tolerance)
"integer" => BenchmarkTools.TrialJudgement:
time: +0.54% => invariant (5.00% tolerance)
memory: +0.00% => invariant (1.00% tolerance)
"date" => BenchmarkTools.TrialJudgement:
time: +59.04% => regression (5.00% tolerance)
memory: +1.37% => regression (1.00% tolerance)
"custom-list-128" => BenchmarkTools.TrialJudgement:
time: +167.12% => regression (5.00% tolerance)
memory: +469.86% => regression (1.00% tolerance)
"ascii-1024" => BenchmarkTools.TrialJudgement:
time: +1.04% => invariant (5.00% tolerance)
memory: +0.00% => invariant (1.00% tolerance)
"nested-array-16^3" => BenchmarkTools.TrialJudgement:
time: +0.87% => invariant (5.00% tolerance)
memory: +0.00% => invariant (1.00% tolerance)
julia> showall(judge(minimum(results["pretty-print"]), minimum(last["pretty-print"]); time_tolerance=0.05))
19-element BenchmarkTools.BenchmarkGroup:
tags: ["serialize"]
"ascii" => BenchmarkTools.TrialJudgement:
time: -3.26% => invariant (5.00% tolerance)
memory: +0.00% => invariant (1.00% tolerance)
"flat-homogenous-array-16" => BenchmarkTools.TrialJudgement:
time: +1.46% => invariant (5.00% tolerance)
memory: +0.00% => invariant (1.00% tolerance)
"nested-array-16^2" => BenchmarkTools.TrialJudgement:
time: -2.62% => invariant (5.00% tolerance)
memory: +0.00% => invariant (1.00% tolerance)
"unicode" => BenchmarkTools.TrialJudgement:
time: -4.19% => invariant (5.00% tolerance)
memory: +0.00% => invariant (1.00% tolerance)
"null" => BenchmarkTools.TrialJudgement:
time: -3.55% => invariant (5.00% tolerance)
memory: +0.00% => invariant (1.00% tolerance)
"unicode-1024" => BenchmarkTools.TrialJudgement:
time: +4.10% => invariant (5.00% tolerance)
memory: +0.00% => invariant (1.00% tolerance)
"flat-dict-128" => BenchmarkTools.TrialJudgement:
time: -3.74% => invariant (5.00% tolerance)
memory: +0.00% => invariant (1.00% tolerance)
"matrix-16" => BenchmarkTools.TrialJudgement:
time: +0.20% => invariant (5.00% tolerance)
memory: +0.00% => invariant (1.00% tolerance)
"custom-tree-8" => BenchmarkTools.TrialJudgement:
time: +93.14% => regression (5.00% tolerance)
memory: +332.62% => regression (1.00% tolerance)
"small-dict" => BenchmarkTools.TrialJudgement:
time: +18.99% => regression (5.00% tolerance)
memory: +0.00% => invariant (1.00% tolerance)
"heterogenous-array" => BenchmarkTools.TrialJudgement:
time: +0.18% => invariant (5.00% tolerance)
memory: +0.00% => invariant (1.00% tolerance)
"float" => BenchmarkTools.TrialJudgement:
time: +1.30% => invariant (5.00% tolerance)
memory: +0.00% => invariant (1.00% tolerance)
"bool" => BenchmarkTools.TrialJudgement:
time: -14.86% => improvement (5.00% tolerance)
memory: +0.00% => invariant (1.00% tolerance)
"flat-homogenous-array-1024" => BenchmarkTools.TrialJudgement:
time: -2.45% => invariant (5.00% tolerance)
memory: +0.00% => invariant (1.00% tolerance)
"integer" => BenchmarkTools.TrialJudgement:
time: +0.43% => invariant (5.00% tolerance)
memory: +0.00% => invariant (1.00% tolerance)
"date" => BenchmarkTools.TrialJudgement:
time: +28.43% => regression (5.00% tolerance)
memory: +1.37% => regression (1.00% tolerance)
"custom-list-128" => BenchmarkTools.TrialJudgement:
time: +39.32% => regression (5.00% tolerance)
memory: +142.96% => regression (1.00% tolerance)
"ascii-1024" => BenchmarkTools.TrialJudgement:
time: -2.59% => invariant (5.00% tolerance)
memory: +0.00% => invariant (1.00% tolerance)
"nested-array-16^3" => BenchmarkTools.TrialJudgement:
time: -3.33% => invariant (5.00% tolerance)
memory: +0.00% => invariant (1.00% tolerance)

Most of the differences are just noise, but there are definitely real regressions. I don't think they will be too hard to fix, however. Going to add some patches.
👍
I eliminated all the time regressions (memory regressions are a little harder, but those should mostly go away after the next PR I have planned). However, that required removing the deprecated code path, which alone explains most of the time regressions found above. (Turns out I also slightly improved the string serialization, so some timings on tests with long strings are better than before.)

Here are the timings with the deprecation warning excised; I removed ones that are invariant to save space, and I consider the remaining ones (except for the memory regressions) either noise or very small changes. Because of the level of noise, I set the time threshold to 10% for all tests. Overall, things look pretty good.
function _writejson(io::IO, state::State, a)
    # FIXME: This fallback is harming performance substantially.
    # Remove this fallback when _print is removed.
    if applicable(_print, io, state, a)
Since this call is costly, how about moving it to a global constant?
You mean to cache the results of the call? I'm a little worried about cache invalidation, but I suppose it's no worse than JuliaLang/julia#265 is.
I wasn't able to get this to work. Caching the results in a `Dict` only saves a tiny portion of the runtime cost, while using more memory, adding compile-time overhead, and also making the entire thing not thread safe. I think we might have to live with the `applicable` check for now.
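For reference, `applicable` consults the method table at runtime without calling the function, which is what the fallback check relies on:

```julia
# applicable(f, args...) asks the method table whether f has a method
# matching these argument types, without actually calling f.
halve(x::Int) = x ÷ 2

applicable(halve, 4)       # true: a matching method exists
applicable(halve, "four")  # false: no method for String
```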
Ok, thanks for checking. One other idea to try would be to put a try-catch around a call to the _print function, temporarily replacing the io parameter with an IOBuffer to capture the output if the call succeeds. But I'm not expecting that to necessarily help.
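The suggestion can be sketched generically (names are hypothetical, not JSON.jl code): attempt the specialized writer into a temporary buffer and commit the bytes only on success.

```julia
# Generic sketch of the try/catch idea: call a possibly-missing writer
# into a temporary IOBuffer; commit its output only if the call succeeds.
function try_specialized(io::IO, f, x)
    buf = IOBuffer()
    try
        f(buf, x)
    catch
        return false          # no usable method (or it failed); caller falls back
    end
    write(io, take!(buf))     # success: flush captured bytes to the real stream
    return true
end
```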
Using `try`-`catch` seems to make it even worse. I'll also try a `@noinline` function. If that does not work, then I'm out of ideas.
Just using `applicable` should be the best solution. It's doing a specialized dict lookup anyway. `try`-`catch` is also very expensive.
I meant a `@noinline` function to remove the excess code generated by `depwarn`. I think that might help some, even if it doesn't remove the `applicable` cost.
A `@noinline` function doesn't do much either. Anyway, the problem will be resolved when the deprecation warning is removed.
👍 Should be good to merge.
Other than the one comment, I think this is good to merge.
Define a custom serialization rule for a particular data type. Must return a
value that can be directly serialized; see help for more details. Do **not**
extend `JSON._print`; that is a clunky API and is now deprecated.
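A usage sketch of the documented API (the `Ticket` type and its fields are invented for illustration):

```julia
using JSON

# Hypothetical type used only to illustrate extending JSON.lower.
struct Ticket
    id::Int
    tags::Vector{String}
end

# Return a value JSON already knows how to serialize; JSON.json then
# picks this method up instead of the generic field fallback.
JSON.lower(t::Ticket) = Dict("id" => t.id, "tags" => t.tags)

JSON.json(Ticket(1, ["bug"]))  # serializes the lowered Dict
```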
Maybe some nicer words could be chosen here?

Please note that `JSON._print` is now deprecated in favor of `JSON.lower`.
Fair enough. I do apologize for being so hostile to `JSON._print`; it was the only option, and this package's documentation is quite sparse, so from a consumer's viewpoint it's a very reasonable solution.
Out of curiosity, why is deprecating a function that was never part of the API necessary? Especially if doing so has performance implications? I overwrote

To be honest, looking at the code changes, I might be tempted to just find/replace
Breaking four registered packages, one of which is

Also, performance regressions will be fairly minor, and should disappear by next release.
  if length(a) == 0
      Base.print(io, "{}")
      return
  end
  start_object(io, state, true)
  first = true
- for (key, value) in a
+ for key in keys(a)
I just noticed this, and I'm curious about this change. Is iterating with a tuple more expensive than computing a hash?
Benchmarking shows that the tuple is cheaper (though it allocates three times the memory), but the cost of hashing is peanuts compared to the cost of IO in this loop, so the difference isn't detectable. I switched this because for `AssociativeWrapper`, the "hash" is just a fast `getfield` call; forming the tuple still has allocation issues. After #162, it will be possible to define separate serialization for `AssociativeWrapper` and `Dict`, so I switched back in that PR.
Okay, thanks for the clarification! Not a big deal if it doesn't make much of a difference. The larger memory allocation might be misleading, as it's likely that the allocated memory is being released/reused quickly.
Implements the idea in #150. The following is a quick recap (redundant with #150) of why this is needed:

Lots of packages currently overload `JSON._print` to define custom JSON serialization. This is really bad, because:

This PR implements:

To make review easier, changes are separated into separate commits.