Add JSON.lower for a better serialization API #151

Merged
merged 8 commits into from
Aug 16, 2016

Conversation

TotalVerb
Collaborator

@TotalVerb TotalVerb commented Jun 19, 2016

Implements the idea in #150. The following is a quick recap (redundant with #150) of why this is needed:

Lots of packages currently overload JSON._print to define custom JSON serialization. This is really bad, because:

  • Packages need to reimplement a lot of serialization code, like how to serialize dicts and arrays.
  • Overloads of JSON._print can write anything — including stuff that isn't JSON. This isn't good.
  • Packages tend to depend on implementation details, like what the name of JSON.separator is.
  • Packages work independently of the existing parser system, so they do not benefit from updates to serialization code here.

This PR implements:

  • Introduce JSON.lower, designed to be overloaded, that converts an object to another "primitive" object that JSON can serialize (Dict, Array, etc.).
  • In the generic JSON._print fallback, call JSON.lower instead of the current dumping behaviour
  • Add a generic JSON.lower fallback that does the current dumping behaviour
  • Rename JSON._print to JSON._writejson, and add an additional deprecated fallback to a new JSON._print with no methods defined

To make review easier, changes are separated into separate commits.
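The hook described above can be sketched in plain Julia. The `Point` type and these particular `lower` methods are illustrative stand-ins, not the actual JSON.jl source:

```julia
# Standalone sketch of the lower mechanism (not the actual JSON.jl code).
lower(x) = x                          # default: value is already primitive
lower(s::Symbol) = string(s)          # e.g. lower symbols to strings

struct Point                          # hypothetical user type
    x::Float64
    y::Float64
end

# A package opts in by describing its type as primitive data (Dict, Array,
# String, Number, ...) instead of writing raw bytes to the output stream:
lower(p::Point) = Dict("x" => p.x, "y" => p.y)
```

The serializer then only ever sees primitive values, so overloads cannot emit invalid JSON and automatically benefit from improvements to the built-in writer.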

@quinnj
Member

quinnj commented Jun 30, 2016

Yes please; this looks great

@TotalVerb
Collaborator Author

We still need a way to deprecate these overloads. Unfortunately, @deprecate_binding won't work. I have an idea I'll play around with later.

@TotalVerb
Collaborator Author

TotalVerb commented Jul 9, 2016

#152 is caused by trying to reimplement vector serialization for arrays and not catching all the edge cases (such as the zero-column case). I think it's a good idea to lower multi-dimensional arrays to vectors, so that these issues don't occur again in the future. However, that might come at some performance cost—not sure if that's acceptable.
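The lowering idea for multi-dimensional arrays can be sketched as follows; `lower_matrix` and the slice orientation are illustrative assumptions, not necessarily the package's actual choice:

```julia
# Sketch: lower a matrix to a vector of column slices so the ordinary
# vector serializer handles every edge case, including zero columns.
lower_matrix(m::AbstractMatrix) = [m[:, j] for j in 1:size(m, 2)]

# The zero-column case yields an empty vector rather than an error,
# which is the class of edge case that bit the hand-written code in #152.
lower_matrix(zeros(3, 0))
```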

@tkelman tkelman mentioned this pull request Aug 3, 2016
@TotalVerb TotalVerb changed the title WIP/RFC: Add JSON.lower for a better serialization API RFC: Add JSON.lower for a better serialization API Aug 4, 2016
@TotalVerb
Collaborator Author

TotalVerb commented Aug 4, 2016

This is no longer WIP. To make review easier, I've separated the commits into logical units:

  • The first commit deprecates extending JSON._print (a deprecation warning is printed on use); this commit doesn't actually call _print yet, which happens in the second commit
  • The second commit implements lower, as described in OP, as a replacement
  • The third commit replaces existing _writejson overloads to lower methods
  • The fourth commit documents the lower function in the README
  • The fifth commit restores v0.3 support and fixes issues with PlotlyJS master. I can squash this into the previous commits if needed, or we could just squash the whole PR.

I've verified locally that the JSON tests and the Plotly tests (PlotlyJS overloads JSON._print) pass on the final commit. The Plotly tests do not pass on intermediate commits.

@codecov-io

codecov-io commented Aug 4, 2016

Current coverage is 97.05% (diff: 85.29%)

Merging #151 into master will decrease coverage by 0.26%

@@             master       #151   diff @@
==========================================
  Files             3          3          
  Lines           335        339     +4   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits            326        329     +3   
- Misses            9         10     +1   
  Partials          0          0          

Powered by Codecov. Last update f298f1b...37b0ae4

@TotalVerb
Collaborator Author

I'd love for coverage to not drop, but I don't know how to cover deprecated codepaths without invoking deprecation warnings.

@TotalVerb TotalVerb mentioned this pull request Aug 8, 2016
@EricForgy
Contributor

@malmaud , any thought on this PR? If you like the general direction, I'm happy to do what I can to help out with testing etc. Just tight on time so don't want to test something with no chance of being merged 😅

This seems like a fairly big change to a core package that may be incorporated into Base at some point.

@EricForgy
Contributor

@essenciary, any thoughts?

obj[name] = getfield(a, name)
end
obj
end
Contributor

This definition seems quite broad as a fallback.

For example, if I define a new bitstype, it will fallback here, which is probably not that useful. Maybe this can be restricted to DataTypes? (Not sure...)

Collaborator Author

For what it's worth, this is the same fallback as before. I'll look into restricting it.

Contributor

I hadn't realized it was the same as before. FWIW, I doubt custom bitstypes are that common, so it might not matter in a practical sense.
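The fallback under discussion can be sketched like this; `fields_to_dict`, `Wrapper`, and `MyBits` are hypothetical names, not the package's code:

```julia
# Sketch of the broad fallback: serialize any value by reflecting
# over its fields.
function fields_to_dict(a)
    obj = Dict{Symbol,Any}()
    for name in fieldnames(typeof(a))
        obj[name] = getfield(a, name)
    end
    obj
end

struct Wrapper       # an ordinary composite type: the fallback works well
    a::Int
    b::String
end

primitive type MyBits 8 end
# A custom bitstype has no fields, so the fallback yields an empty Dict,
# i.e. "{}" on output, which is rarely what the author intended.
```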

@kmsquire
Contributor

kmsquire commented Aug 9, 2016

One potential downside of lower (compared with _print) is that it creates a number of intermediate strings.

In contrast, _print accepts an IO argument as its first parameter, and sends the conversion directly there, which is more in line with the way print and friends work in Base Julia.

Compared to most languages I've worked with, the choice to work directly with IO objects instead of constructing and passing around strings seems a little awkward, but it's probably faster... although I don't have any benchmarks to back it up (Cc: @StefanKarpinski, who I believe was the main driver behind the output convention).

@TotalVerb, you've recently sped up the code here in JSON.jl significantly. Have you done any benchmarking here to verify that the introduction of intermediate strings isn't causing much slowdown? Obviously, this would mostly happen for large types which need to be lowered (e.g., Associatives with a lot of members).

Anyway, consider recoding using an IO argument.

@TotalVerb
Collaborator Author

@kmsquire You are very right about the possible performance impact. However,

  • The only package on GitHub whose overload of JSON._print will be slowed down is Plots. The other three packages are creating temporary strings anyway.
  • The original code in this package itself was frequently creating temporary strings. I suspect (but have not measured) that serializing the other types that are supported will not in general be slower. Example:
if isdefined(Base, :Dates)
    function _print(io::IO, state::State, s::Base.Dates.TimeType)
        _print(io, state, string(s))
    end
end

function _print(io::IO, state::State, s::Symbol)
    _print(io, state, string(s))
end
  • Associatives will actually be serialized directly and do not need to be lowered, so there is no performance impact for serializing built-in types.
  • I don't see this as the final solution. This is just a way to stop the bleeding, and stop JSON._print overloading from getting worse.

I do have some ideas for how to improve the performance of JSON serialization using IO arguments; however, I don't think that should stop us from deprecating JSON._print ASAP. (Note that overloading this is the cause of the only non-test failure that arose from JSON v0.5.2.)

@TotalVerb
Collaborator Author

Actually, I'm not fully confident myself. Benchmarks would certainly be useful to either confirm that there is no impact, or bring attention to a real performance problem. I'll prepare a few.

@kmsquire
Contributor

kmsquire commented Aug 9, 2016

I don't see this as the final solution. This is just a way to stop the bleeding, and stop JSON._print overloading from getting worse.

Agree that overloading JSON._print isn't good, and should be deprecated... but after thinking some more, I'm wondering if it wouldn't just be better to rename JSON._print to JSON.print_json, and allow people to override it.

@EricForgy
Contributor

It makes me nervous when you say things like that. As I said, I suspect there are a lot of private repos dependent on JSON (I have lots), so we should really be thoughtful of performance impacts (if any).


@TotalVerb
Collaborator Author

TotalVerb commented Aug 9, 2016

@kmsquire Unfortunately that doesn't solve the following:

  • Overloads of JSON._print can write anything — including stuff that isn't JSON. This isn't good.
  • Packages tend to depend on implementation details, like what the name of JSON.separator is.
  • Packages work independently of the existing parser system, so they do not benefit from updates to serialization code here.

I do have a long-term solution in mind that will improve on the current state of affairs. I need to work out the kinks, but expect a PR soon-ish. Note that this will not replace lower, which is still most convenient in many cases, but will supplement it with a performant alternative that falls back to lower.

@EricForgy Don't worry. I am creating benchmarks and will make sure that performance does not degrade.

Furthermore, I understand that this repository is depended on by a lot of private packages. But private packages overriding and depending on an internal implementation detail is dangerous. Overall, this change is certainly a good thing for private packages.

@kmsquire
Contributor

kmsquire commented Aug 9, 2016

@TotalVerb, thanks--that makes sense.

If there aren't any comments soon, I would think it fine to merge this.

EDIT: To clarify, I've looked through the changes, and LGTM.

@TotalVerb
Collaborator Author

TotalVerb commented Aug 9, 2016

I will finish the benchmarks tonight and report results. If there are any regressions, I suspect they're not difficult to fix. If/when there are no regressions, then I think it's good to merge this.

@TotalVerb
Collaborator Author

I created a benchmark suite with 19 tests for both compact printing and pretty printing (PR incoming). Here are the results:

julia> showall(judge(minimum(results["print"]), minimum(last["print"]); time_tolerance=0.05))
19-element BenchmarkTools.BenchmarkGroup:
  tags: ["serialize"]
  "ascii" => BenchmarkTools.TrialJudgement: 
  time:   +0.00% => invariant (5.00% tolerance)
  memory: +0.00% => invariant (1.00% tolerance)
  "flat-homogenous-array-16" => BenchmarkTools.TrialJudgement: 
  time:   +2.09% => invariant (5.00% tolerance)
  memory: +0.00% => invariant (1.00% tolerance)
  "nested-array-16^2" => BenchmarkTools.TrialJudgement: 
  time:   -4.29% => invariant (5.00% tolerance)
  memory: +0.00% => invariant (1.00% tolerance)
  "unicode" => BenchmarkTools.TrialJudgement: 
  time:   -2.73% => invariant (5.00% tolerance)
  memory: +0.00% => invariant (1.00% tolerance)
  "null" => BenchmarkTools.TrialJudgement: 
  time:   -8.49% => improvement (5.00% tolerance)
  memory: +0.00% => invariant (1.00% tolerance)
  "unicode-1024" => BenchmarkTools.TrialJudgement: 
  time:   -3.42% => invariant (5.00% tolerance)
  memory: +0.00% => invariant (1.00% tolerance)
  "flat-dict-128" => BenchmarkTools.TrialJudgement: 
  time:   -8.25% => improvement (5.00% tolerance)
  memory: +0.00% => invariant (1.00% tolerance)
  "matrix-16" => BenchmarkTools.TrialJudgement: 
  time:   -3.47% => invariant (5.00% tolerance)
  memory: +0.00% => invariant (1.00% tolerance)
  "custom-tree-8" => BenchmarkTools.TrialJudgement: 
  time:   +198.28% => regression (5.00% tolerance)
  memory: +751.81% => regression (1.00% tolerance)
  "small-dict" => BenchmarkTools.TrialJudgement: 
  time:   +25.97% => regression (5.00% tolerance)
  memory: +0.00% => invariant (1.00% tolerance)
  "heterogenous-array" => BenchmarkTools.TrialJudgement: 
  time:   -9.40% => improvement (5.00% tolerance)
  memory: +0.00% => invariant (1.00% tolerance)
  "float" => BenchmarkTools.TrialJudgement: 
  time:   -5.99% => improvement (5.00% tolerance)
  memory: +0.00% => invariant (1.00% tolerance)
  "bool" => BenchmarkTools.TrialJudgement: 
  time:   -1.66% => invariant (5.00% tolerance)
  memory: +0.00% => invariant (1.00% tolerance)
  "flat-homogenous-array-1024" => BenchmarkTools.TrialJudgement: 
  time:   -6.17% => improvement (5.00% tolerance)
  memory: +0.00% => invariant (1.00% tolerance)
  "integer" => BenchmarkTools.TrialJudgement: 
  time:   +0.54% => invariant (5.00% tolerance)
  memory: +0.00% => invariant (1.00% tolerance)
  "date" => BenchmarkTools.TrialJudgement: 
  time:   +59.04% => regression (5.00% tolerance)
  memory: +1.37% => regression (1.00% tolerance)
  "custom-list-128" => BenchmarkTools.TrialJudgement: 
  time:   +167.12% => regression (5.00% tolerance)
  memory: +469.86% => regression (1.00% tolerance)
  "ascii-1024" => BenchmarkTools.TrialJudgement: 
  time:   +1.04% => invariant (5.00% tolerance)
  memory: +0.00% => invariant (1.00% tolerance)
  "nested-array-16^3" => BenchmarkTools.TrialJudgement: 
  time:   +0.87% => invariant (5.00% tolerance)
  memory: +0.00% => invariant (1.00% tolerance)
julia> showall(judge(minimum(results["pretty-print"]), minimum(last["pretty-print"]); time_tolerance=0.05))
19-element BenchmarkTools.BenchmarkGroup:
  tags: ["serialize"]
  "ascii" => BenchmarkTools.TrialJudgement: 
  time:   -3.26% => invariant (5.00% tolerance)
  memory: +0.00% => invariant (1.00% tolerance)
  "flat-homogenous-array-16" => BenchmarkTools.TrialJudgement: 
  time:   +1.46% => invariant (5.00% tolerance)
  memory: +0.00% => invariant (1.00% tolerance)
  "nested-array-16^2" => BenchmarkTools.TrialJudgement: 
  time:   -2.62% => invariant (5.00% tolerance)
  memory: +0.00% => invariant (1.00% tolerance)
  "unicode" => BenchmarkTools.TrialJudgement: 
  time:   -4.19% => invariant (5.00% tolerance)
  memory: +0.00% => invariant (1.00% tolerance)
  "null" => BenchmarkTools.TrialJudgement: 
  time:   -3.55% => invariant (5.00% tolerance)
  memory: +0.00% => invariant (1.00% tolerance)
  "unicode-1024" => BenchmarkTools.TrialJudgement: 
  time:   +4.10% => invariant (5.00% tolerance)
  memory: +0.00% => invariant (1.00% tolerance)
  "flat-dict-128" => BenchmarkTools.TrialJudgement: 
  time:   -3.74% => invariant (5.00% tolerance)
  memory: +0.00% => invariant (1.00% tolerance)
  "matrix-16" => BenchmarkTools.TrialJudgement: 
  time:   +0.20% => invariant (5.00% tolerance)
  memory: +0.00% => invariant (1.00% tolerance)
  "custom-tree-8" => BenchmarkTools.TrialJudgement: 
  time:   +93.14% => regression (5.00% tolerance)
  memory: +332.62% => regression (1.00% tolerance)
  "small-dict" => BenchmarkTools.TrialJudgement: 
  time:   +18.99% => regression (5.00% tolerance)
  memory: +0.00% => invariant (1.00% tolerance)
  "heterogenous-array" => BenchmarkTools.TrialJudgement: 
  time:   +0.18% => invariant (5.00% tolerance)
  memory: +0.00% => invariant (1.00% tolerance)
  "float" => BenchmarkTools.TrialJudgement: 
  time:   +1.30% => invariant (5.00% tolerance)
  memory: +0.00% => invariant (1.00% tolerance)
  "bool" => BenchmarkTools.TrialJudgement: 
  time:   -14.86% => improvement (5.00% tolerance)
  memory: +0.00% => invariant (1.00% tolerance)
  "flat-homogenous-array-1024" => BenchmarkTools.TrialJudgement: 
  time:   -2.45% => invariant (5.00% tolerance)
  memory: +0.00% => invariant (1.00% tolerance)
  "integer" => BenchmarkTools.TrialJudgement: 
  time:   +0.43% => invariant (5.00% tolerance)
  memory: +0.00% => invariant (1.00% tolerance)
  "date" => BenchmarkTools.TrialJudgement: 
  time:   +28.43% => regression (5.00% tolerance)
  memory: +1.37% => regression (1.00% tolerance)
  "custom-list-128" => BenchmarkTools.TrialJudgement: 
  time:   +39.32% => regression (5.00% tolerance)
  memory: +142.96% => regression (1.00% tolerance)
  "ascii-1024" => BenchmarkTools.TrialJudgement: 
  time:   -2.59% => invariant (5.00% tolerance)
  memory: +0.00% => invariant (1.00% tolerance)
  "nested-array-16^3" => BenchmarkTools.TrialJudgement: 
  time:   -3.33% => invariant (5.00% tolerance)
  memory: +0.00% => invariant (1.00% tolerance)

Most of the differences are just noise, but there are definitely real regressions. I don't think they will be too hard to fix, however. Going to add some patches.

@kmsquire
Contributor

👍

@TotalVerb
Collaborator Author

TotalVerb commented Aug 10, 2016

I eliminated all the time regressions (memory regressions are a little harder, but those should mostly go away after the next PR I have planned)—however, that required removing the deprecated code path, which alone explains most of the time regressions found above. (Turns out applicable is pretty expensive, especially when it's called frequently!) I think the deprecation will have to stay, unfortunately, so that packages are not adversely affected. So there will still be regressions after this is merged, though on the order of 30%, not 300% as in the last post.
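A minimal sketch of the applicable-gated fallback being measured here (all names illustrative, not the package's internals):

```julia
# Stand-in for a legacy old-style overload that a package might define.
_legacy_print(io::IO, x::Int) = print(io, x)

function write_value(io::IO, x)
    if applicable(_legacy_print, io, x)   # runtime method query: not free,
        _legacy_print(io, x)              # legacy path (would depwarn here)
    else
        print(io, repr(x))                # new lower-based path
    end
end
```

Because applicable performs a method lookup on every call, paying for it on the hot path of every serialized value adds up quickly, which is consistent with the regressions above.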

I also slightly improved the string serialization, so some timings (on tests with long strings) are better than before. Here are the timings with the deprecation warning excised; I removed ones that are invariant to save space, and I consider the remaining ones (except for the memory regressions) either noise or very small changes. Because of the level of noise, I set the time threshold to 10% for all tests. Overall, things look pretty good.

[ print ]
  "flat-dict-128" => BenchmarkTools.TrialJudgement: 
  time:   -11.62% => improvement (10.00% tolerance) [not real; this code wasn't changed]
  "custom-tree-8" => BenchmarkTools.TrialJudgement: 
  time:   +11.34% => regression (10.00% tolerance)  [don't think this is real, exaggerated if it is]
  memory: +245.49% => regression (1.00% tolerance)  [real]
  "small-dict" => BenchmarkTools.TrialJudgement: 
  memory: -9.09% => improvement (1.00% tolerance)  [real, hex escapes allocate less]
  "heterogenous-array" => BenchmarkTools.TrialJudgement: 
  time:   -18.01% => improvement (10.00% tolerance)  [possibly real, but definitely exaggerated]
  "date" => BenchmarkTools.TrialJudgement: 
  time:   +17.89% => regression (10.00% tolerance)  [don't think this is real]
  "custom-list-128" => BenchmarkTools.TrialJudgement: 
  memory: +151.87% => regression (1.00% tolerance)  [real]
  "ascii-1024" => BenchmarkTools.TrialJudgement: 
  time:   -22.65% => improvement (10.00% tolerance)  [real]
[ pretty-print ]
  "ascii" => BenchmarkTools.TrialJudgement: 
  time:   -14.05% => improvement (10.00% tolerance)  [real]
  "matrix-16" => BenchmarkTools.TrialJudgement: 
  time:   +11.54% => regression (10.00% tolerance)  [not real]
  "custom-tree-8" => BenchmarkTools.TrialJudgement: 
  memory: +108.61% => regression (1.00% tolerance) [real]
  "small-dict" => BenchmarkTools.TrialJudgement: 
  memory: -7.19% => improvement (1.00% tolerance)  [real]
  "bool" => BenchmarkTools.TrialJudgement: 
  time:   -17.74% => improvement (10.00% tolerance)  [not real]
  "custom-list-128" => BenchmarkTools.TrialJudgement: 
  memory: +46.21% => regression (1.00% tolerance)  [real]
  "ascii-1024" => BenchmarkTools.TrialJudgement: 
  time:   -15.96% => improvement (10.00% tolerance)  [real]

function _writejson(io::IO, state::State, a)
    # FIXME: This fallback is harming performance substantially.
    # Remove this fallback when _print is removed.
    if applicable(_print, io, state, a)
Contributor

Since this call is costly, how about moving it to a global constant?

Collaborator Author

You mean to cache the results of the call? I'm a little worried about cache invalidation, but I suppose it's no worse than JuliaLang/julia#265 is.

Collaborator Author
@TotalVerb TotalVerb Aug 11, 2016

I wasn't able to get this to work.

Caching the results in a Dict only saves a tiny portion of the runtime cost, while using more memory, adding compile-time overhead, and also making the entire thing not thread safe. I think we might have to live with the applicable check for now.
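The caching idea that was tried can be sketched as follows (hypothetical names, keyed on the value's type for simplicity):

```julia
# Global memoization cache: not thread safe, and stale if a matching
# method is defined after the first query for a given type.
const _HAS_LEGACY = Dict{Type,Bool}()

legacy_method_exists(f, io, x) =
    get!(() -> applicable(f, io, x), _HAS_LEGACY, typeof(x))

_legacy_print(io::IO, x::Int) = print(io, x)   # stand-in legacy overload
```

The Dict lookup still hashes the type on every call, so it saves little over applicable itself, matching the finding above.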

Contributor

Ok, thanks for checking. One other idea to try would be to put a try-catch around a call to the _print function, temporarily replacing the io parameter with an IOBuffer to capture the output if the call succeeds. But I'm not expecting that to necessarily help.

Collaborator Author
@TotalVerb TotalVerb Aug 12, 2016

Using try-catch seems to make it even worse. I'll also try a @noinline function. If that does not work, then I'm out of ideas.

Contributor

Just using applicable should be the best solution. It's doing a specialized dict lookup anyway, and try-catch is also very expensive.

Collaborator Author

I meant a noinline function to remove the excess code generated by depwarn. I think that might help some, even if it doesn't remove the applicable cost.

Collaborator Author

Noinline function doesn't do much either. Anyway, the problem will be resolved when the deprecation warning is removed.

Contributor

👍 Should be good to merge.

@kmsquire
Contributor

Other than the one comment, I think this is good to merge.


Define a custom serialization rule for a particular data type. Must return a
value that can be directly serialized; see help for more details. Do **not**
extend `JSON._print`; that is a clunky API and is now deprecated.
Contributor

Maybe some nicer words could be chosen here?

Please note that JSON._print is now deprecated in favor of JSON.lower.

Collaborator Author

Fair enough. I do apologize for being so hostile to JSON._print; it was the only option, and this package's documentation is quite sparse, so from a consumer's viewpoint it's a very reasonable solution.

@TotalVerb TotalVerb changed the title RFC: Add JSON.lower for a better serialization API Add JSON.lower for a better serialization API Aug 11, 2016
@EricForgy
Contributor

Out of curiosity, why is deprecating a function that was never part of the API necessary? Especially if doing so has performance implications?

I overwrote JSON._print several times in several private repos, but I always understood it could break at some point because it wasn't part of the API. I suspect any other authors would have similar expectations and can adjust accordingly.

To be honest, looking at the code changes, I might be tempted to just find/replace _print -> _writejson 😄

@TotalVerb
Collaborator Author

TotalVerb commented Aug 12, 2016

Breaking four registered packages, one of which is Plots, is not good.

Also, performance regressions will be fairly minor, and should disappear by next release.

if length(a) == 0
    Base.print(io, "{}")
    return
end
start_object(io, state, true)
first = true
for key in keys(a)    # was: for (key, value) in a
Contributor

I just noticed this, and I'm curious about this change. Is iterating with a tuple more expensive than computing a hash?

Collaborator Author

Benchmarking shows that the tuple is cheaper (though it allocates three times the memory), but the cost of hashing is peanuts compared to the cost of IO in this loop, so the difference isn't detectable. I switched this because for AssociativeWrapper, the "hash" is just a fast getfield call; forming the tuple still has allocation issues. After #162, it will be possible to define separate serialization for AssociativeWrapper and Dict, so I switched back in that PR.

Contributor

Okay, thanks for the clarification! Not a big deal if it doesn't make much of a difference. The larger memory allocation might be misleading, as it's likely that the allocated memory is being released/reused quickly.
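The two iteration styles being compared can be sketched in plain Julia:

```julia
function sum_by_pairs(d)
    t = 0
    for (key, value) in d    # pair iteration: one tuple per entry
        t += value
    end
    t
end

function sum_by_keys(d)
    t = 0
    for key in keys(d)       # key iteration: each value looked up
        t += d[key]          # separately, re-hashing the key for a Dict
    end
    t
end
```

Both produce the same result; the discussion above is purely about the relative cost of the per-entry tuple versus the extra hash lookup.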

@TotalVerb TotalVerb mentioned this pull request Aug 20, 2016