Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Introduce StructuralContext and Serialization for finer-grained serialization control #162

Merged
merged 3 commits into from
Mar 15, 2017
Merged

Conversation

TotalVerb
Copy link
Collaborator

@TotalVerb TotalVerb commented Aug 16, 2016

This fixes the major issues with overriding the old _writejson (née _print):

  • Overloads of show_json are easily constrained to writing valid JSON. Backdoors are available for performance reasons, but situations where invalid JSON can be written are easily caught and fixed.
  • Packages that overloaded _print tended to use low-level implementation details, such as JSON.separator. These packages broke when the low-level implementation changed. This proposal will introduce high-level constructs with a stable, documented API.
  • Upgrades to serialization performance, or bugfixes, are automatically available to packages overriding show_json.
  • Extensions to the formats available to serialize JSON in can be considered in the future, and packages overloading show_json will correctly use those extensions automatically.

What is not covered in this PR is a nicer interface for things. That will come later.

Benchmark results (compared to before #151 was merged; #151 introduced some memory regressions) are looking good so far:

2-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "print" => 19-element BenchmarkTools.BenchmarkGroup:
      tags: ["serialize"]
      "ascii" => BenchmarkTools.TrialJudgement: 
  time:   -38.00% => improvement (10.00% tolerance)
  memory: -66.67% => improvement (1.00% tolerance)
      "flat-homogenous-array-16" => BenchmarkTools.TrialJudgement: 
  time:   -19.53% => improvement (10.00% tolerance)
  memory: -5.79% => improvement (1.00% tolerance)
      "nested-array-16^2" => BenchmarkTools.TrialJudgement: 
  time:   -8.37% => invariant (10.00% tolerance)
  memory: -0.39% => invariant (1.00% tolerance)
      "unicode" => BenchmarkTools.TrialJudgement: 
  time:   +7.31% => invariant (10.00% tolerance)
  memory: -66.67% => improvement (1.00% tolerance)
      "null" => BenchmarkTools.TrialJudgement: 
  time:   -59.07% => improvement (10.00% tolerance)
  memory: -77.78% => improvement (1.00% tolerance)
      "unicode-1024" => BenchmarkTools.TrialJudgement: 
  time:   +21.52% => regression (10.00% tolerance)
  memory: -66.67% => improvement (1.00% tolerance)
      "flat-dict-128" => BenchmarkTools.TrialJudgement: 
  time:   +2.15% => invariant (10.00% tolerance)
  memory: +6.72% => regression (1.00% tolerance)
      "matrix-16" => BenchmarkTools.TrialJudgement: 
  time:   +22.99% => regression (10.00% tolerance)
  memory: +5.04% => regression (1.00% tolerance)
      "custom-tree-8" => BenchmarkTools.TrialJudgement: 
  time:   -22.15% => improvement (10.00% tolerance)
  memory: -0.21% => invariant (1.00% tolerance)
      "small-dict" => BenchmarkTools.TrialJudgement: 
  time:   -40.31% => improvement (10.00% tolerance)
  memory: -75.45% => improvement (1.00% tolerance)
      "heterogenous-array" => BenchmarkTools.TrialJudgement: 
  time:   -25.57% => improvement (10.00% tolerance)
  memory: +0.00% => invariant (1.00% tolerance)
      "float" => BenchmarkTools.TrialJudgement: 
  time:   -40.78% => improvement (10.00% tolerance)
  memory: -80.00% => improvement (1.00% tolerance)
      "bool" => BenchmarkTools.TrialJudgement: 
  time:   -53.16% => improvement (10.00% tolerance)
  memory: -77.78% => improvement (1.00% tolerance)
      "flat-homogenous-array-1024" => BenchmarkTools.TrialJudgement: 
  time:   +12.11% => regression (10.00% tolerance)
  memory: -6.76% => improvement (1.00% tolerance)
      "integer" => BenchmarkTools.TrialJudgement: 
  time:   -67.92% => improvement (10.00% tolerance)
  memory: -43.75% => improvement (1.00% tolerance)
      "date" => BenchmarkTools.TrialJudgement: 
  time:   -3.96% => invariant (10.00% tolerance)
  memory: -9.59% => improvement (1.00% tolerance)
      "custom-list-128" => BenchmarkTools.TrialJudgement: 
  time:   -25.84% => improvement (10.00% tolerance)
  memory: -5.01% => improvement (1.00% tolerance)
      "ascii-1024" => BenchmarkTools.TrialJudgement: 
  time:   +2.64% => invariant (10.00% tolerance)
  memory: -66.67% => improvement (1.00% tolerance)
      "nested-array-16^3" => BenchmarkTools.TrialJudgement: 
  time:   +6.19% => invariant (10.00% tolerance)
  memory: -0.02% => invariant (1.00% tolerance)
  "pretty-print" => 19-element BenchmarkTools.BenchmarkGroup:
      tags: ["serialize"]
      "ascii" => BenchmarkTools.TrialJudgement: 
  time:   -40.90% => improvement (10.00% tolerance)
  memory: -55.56% => improvement (1.00% tolerance)
      "flat-homogenous-array-16" => BenchmarkTools.TrialJudgement: 
  time:   -15.73% => improvement (10.00% tolerance)
  memory: -15.44% => improvement (1.00% tolerance)
      "nested-array-16^2" => BenchmarkTools.TrialJudgement: 
  time:   +22.40% => regression (10.00% tolerance)
  memory: -12.01% => improvement (1.00% tolerance)
      "unicode" => BenchmarkTools.TrialJudgement: 
  time:   -10.27% => improvement (10.00% tolerance)
  memory: -55.56% => improvement (1.00% tolerance)
      "null" => BenchmarkTools.TrialJudgement: 
  time:   -58.58% => improvement (10.00% tolerance)
  memory: -66.67% => improvement (1.00% tolerance)
      "unicode-1024" => BenchmarkTools.TrialJudgement: 
  time:   +9.55% => invariant (10.00% tolerance)
  memory: -55.56% => improvement (1.00% tolerance)
      "flat-dict-128" => BenchmarkTools.TrialJudgement: 
  time:   +12.86% => regression (10.00% tolerance)
  memory: +5.89% => regression (1.00% tolerance)
      "matrix-16" => BenchmarkTools.TrialJudgement: 
  time:   +14.75% => regression (10.00% tolerance)
  memory: -61.11% => improvement (1.00% tolerance)
      "custom-tree-8" => BenchmarkTools.TrialJudgement: 
  time:   +52.81% => regression (10.00% tolerance)
  memory: -55.84% => improvement (1.00% tolerance)
      "small-dict" => BenchmarkTools.TrialJudgement: 
  time:   -28.26% => improvement (10.00% tolerance)
  memory: -79.86% => improvement (1.00% tolerance)
      "heterogenous-array" => BenchmarkTools.TrialJudgement: 
  time:   -15.93% => improvement (10.00% tolerance)
  memory: -26.92% => improvement (1.00% tolerance)
      "float" => BenchmarkTools.TrialJudgement: 
  time:   -38.44% => improvement (10.00% tolerance)
  memory: -70.00% => improvement (1.00% tolerance)
      "bool" => BenchmarkTools.TrialJudgement: 
  time:   -59.65% => improvement (10.00% tolerance)
  memory: -66.67% => improvement (1.00% tolerance)
      "flat-homogenous-array-1024" => BenchmarkTools.TrialJudgement: 
  time:   -4.60% => invariant (10.00% tolerance)
  memory: -6.93% => improvement (1.00% tolerance)
      "integer" => BenchmarkTools.TrialJudgement: 
  time:   -68.68% => improvement (10.00% tolerance)
  memory: -37.50% => improvement (1.00% tolerance)
      "date" => BenchmarkTools.TrialJudgement: 
  time:   -28.60% => improvement (10.00% tolerance)
  memory: -8.22% => improvement (1.00% tolerance)
      "custom-list-128" => BenchmarkTools.TrialJudgement: 
  time:   +181.43% => regression (10.00% tolerance)
  memory: -71.09% => improvement (1.00% tolerance)
      "ascii-1024" => BenchmarkTools.TrialJudgement: 
  time:   -14.48% => improvement (10.00% tolerance)
  memory: -55.56% => improvement (1.00% tolerance)
      "nested-array-16^3" => BenchmarkTools.TrialJudgement: 
  time:   +38.63% => regression (10.00% tolerance)
  memory: -11.78% => improvement (1.00% tolerance)

@codecov-io
Copy link

codecov-io commented Aug 16, 2016

Current coverage is 97.23% (diff: 94.59%)

Merging #162 into master will increase coverage by 0.18%

@@             master       #162   diff @@
==========================================
  Files             3          3          
  Lines           339        325    -14   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
- Hits            329        316    -13   
+ Misses           10          9     -1   
  Partials          0          0          

Powered by Codecov. Last update b064040...31e61aa

@TotalVerb
Copy link
Collaborator Author

TotalVerb commented Aug 20, 2016

I have a few syntax proposals to share:

The imperative style

JSON.begin_array(io)
JSON.show_element(io, elt)
JSON.show_element(io, elt)
JSON.show_element(io, elt)
JSON.end_array(io)

This one is not my favourite. It's too clunky and verbose, and too error-prone. Obviously the other proposals are going to end up compiling down to this, but I seriously hope we can find a better API.

The context style

JSON.array(io) do
    JSON.show_element(io, elt)
    JSON.show_element(io, elt)
    JSON.show_element(io, elt)
end

This one is a little nicer to look at than the last one (for one, it nests properly!), but it's still error-prone. It's also potential compilation overhead.

The cute style

JSON.array(io) do show
    show(elt)
    show(elt)
    show(elt)
end

This is clean and least error-prone (I can't accidentally use the wrong version of show). But I'm sure this has compilation overhead. I don't know if I like that show is shadowed in this example, but it's up to the user to pick the name they like. It's a little more limited though—it makes showing a key and then a value separately a little rough. Also, if the code calls helper functions, they will need to be passed both show and io, instead of just io.

The cute style, macro version

JSON.@arr begin
    show(elt)
    show(elt)
    show(elt)
end

Avoids compilation overhead of extra functions, but it's a little more magical. As with earlier, any helper functions will need to be passed both show and io.

Any other suggestions for syntax are welcome.

@TotalVerb TotalVerb mentioned this pull request Aug 20, 2016
@TotalVerb TotalVerb mentioned this pull request Sep 8, 2016
@TotalVerb
Copy link
Collaborator Author

I think I will implement the context style, with support for the imperative style as a documented API.

@TotalVerb TotalVerb changed the title WIP: Replace _writejson with higher-level IOContext-based interface WIP: Replace _writejson with higher-level IOContext-like interface Sep 10, 2016
@TotalVerb
Copy link
Collaborator Author

After playing around with several approaches, I think I may as well implement #165 in this PR. There's no point implementing something halfway only to rip it apart when an implementation of #165 is needed.

To implement #165, I am looking to adopt a system analogous to Base's show, but for JSON. Here, JSONContext extends IO with functions like begin_array and show_key. Like IOContext and IO, it provides the settings for how things are to be printed. In summary,

Base JSON Base Usage JSON Usage JSON Built-ins
show show_json print third argument (Any) to first argument (IO), using format specified by second argument (MIME) serialize third argument (Any) to first argument (JSONContext) using format specified by second argument (Serialization) n/a
IOContext <: IO JSONContext <: IO an abstract stream supporting write for binary data, / print for text data, optionally with additional settings like :compact, :limit, or :displaysize to control printing behaviour an extension of IO with additional methods specific to JSON data printing; controls e.g. whether JSON should be printed flat or indented CompactContext, PrettyContext
MIME Serialization an object specifying to what format Julia objects should be printed, like text/plain an object specifying to what format Julia objects are to be serialized into JSON objects StandardSerialization

I settled on this strategy primarily because of consistency with Base. Since JSON data is much more structured than linear binary or textual data, the JSONContext is a subtype of IO that provides a larger interface. However, JSONContext has strong parallels with IO: IO allows code to be written generically without regard to what kind of stream the data ends up going to, be it a file or terminal window. Similarly, JSONContext allows JSON serializing code to be written without regard to exactly how the JSON is formatted.

MIME in Base is a passive argument that provides little in terms of data manipulation; it mostly exists to pass around as a tag to determine just how the data should be presented. Serialization serves a similar role: it's a small tag that's recursively passed around for serializing JSON. Like MIME, Serialization encourages simple dispatch behaviour, but also allows for more advanced behaviour through traits like istextmime — though no such traits are implemented in JSON itself.

The overarching goals are:

  • keep JSON.jl lightweight; we will still only support serializing Julia types using the standard serialization to standard JSON
  • keep JSON.jl fast: benchmarks show the new system is much faster than the old system; improvements include reducing the amount of copying and temporary allocation going on
  • make JSON.jl powerful: allow requests like Reading NaN and Inf #168 to be implemented on the user-side, and still reuse functionality implemented here

A few test cases for the new generality (I might implement these as tests, once the underlying implementation is done):

  • Reading NaN and Inf #168: it should be dirt easy to create a new Serialization subtype whose only change is to serialize NaN and Inf as in JavaScript (violating the JSON standard)
  • CSON is similar enough to JSON that it should be possible to create a CSON serializer (not a reader yet, unfortunately) by creating a new subtype of JSONContext

@TotalVerb
Copy link
Collaborator Author

An alternative approach is to combine the Serialization into the JSONContext. Truth be told, I don't know why Base chose to keep the MIME type separate from the IOContext; it seems like it could just as easily be put in there, and that may even simplify some forms of recursive printing. This was my initial approach, but I liked having the strong parallel between show and show_json.

@vtjnash
Copy link
Contributor

vtjnash commented Sep 19, 2016

For the "cute" version, I imagine you could do something like the following with call-overloading to pass around the IO object in a way that should be low overhead. We considered doing something like this for Base.read/write, but it would have been unnecessary code-churn and not necessarily better (you could do something like io = write(io); io("foo") instead of write(io, "foo")).

array(json) do json
    json(elt)
    json(elt)
    json(elt)
end

type JSON{T <: IO} <: IO
    io::T
    parent::JSON{T}
end
array(f, json) = print(json.io, "[", f(json), "]")
(json::JSON)(obj) = print(json.io, '"', escape(repr(obj)), '"')
(json::JSON)(arr) = array(body -> for elt in arr; body(elt); end, json)

I'm not sure I understand the purpose of Serialization. I can see you might choose to intercept show(io, ::MIME"text/json", obj), but it's not immediately obvious to mey why that other formatting wouldn't also be part of the JSONContext object.

@TotalVerb
Copy link
Collaborator Author

Thanks for your comments, @vtjnash. Passing around the IO argument does seem like an interesting idea. Since this PR is getting quite big, I think I will implement a minimal imperative interface and then we can discuss the higher-level abstractions later.

On Serialization, one advantage is that it would not be too easy to write a method so generic that it would work for all serializations, as some way want restrictive serializations that throw an error on types they do not expect. Base would have a similar problem if they folded the MIME type into the IO argument; there would be too many methods that immediately worked on all MIME types, even if it does not make sense.

@EricForgy
Copy link
Contributor

Hi @vtjnash ,

I like that idea too. Any other downsides besides churn? I am building something like a baby virtual DOM in Pages.jl https://github.com/EricForgy/Pages.jl/blob/master/src/api.jl#L66-L79

I'm passing IO objects around a lot and, after seeing your note, am considering making the IO part of the Element type. It's still pretty clumsy, but functional. Feedback/suggestions welcome. I haven't updated the README for the new API stuff. To do 😅

@vtjnash
Copy link
Contributor

vtjnash commented Sep 27, 2016

The other downside is that you only really get one verb, and the name is picked by the user (since it is the argument to the do block on the above example). But it mostly also just shows that there's nothing all that privileged about the first argument - it's symmetric multi-dispatch after all.

@TotalVerb TotalVerb changed the title WIP: Replace _writejson with higher-level IOContext-like interface RFC: Introduce StructuralContext and Serialization for finer-grained serialization control Sep 30, 2016
@TotalVerb
Copy link
Collaborator Author

TotalVerb commented Sep 30, 2016

This is open for review now. According to codecov, this is actually a net reduction in lines of code 😃. I did have to strip out the _print deprecation; hopefully now that all downstream users have moved away, this will not be that big of a deal.

cc @stevengj, @kmsquire (if you have time, of course)

# issue #168: Print NaN and Inf as Julia would
immutable NaNSerialization <: CS end
JSON.show_json(io::SC, ::NaNSerialization, f::AbstractFloat) =
Base.print(io, f)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice, It's hard to imagine custom support for NaN and Inf being simpler than this :-)

@TotalVerb
Copy link
Collaborator Author

Bumping this.

be printed; otherwise there will be no trailing newline.
"""
function show_json(io::IO, s::Serializations.Serialization, obj; indent=nothing)
ctx = indent === nothing ? CompactContext(io) : PrettyContext(io, indent)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't ctx type-unstable?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The previous version, using State{true} or State{false}, was type unstable. This one is actually not (in the sense that the type of ctx only depends on the type of indent, instead of its value). However, the keyword argument sorter is type unstable anyway.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a non-kwargs function, it's type stable:

julia> f(x=nothing) = x === nothing ? 1 : 1.0
f (generic function with 2 methods)

julia> @code_warntype f(nothing)
Variables:
  #self#::#f
  x::Void

Body:
  begin 
      (x::Void === Main.nothing)::Bool
      return 1
  end::Int64

julia> @code_warntype f(1)
Variables:
  #self#::#f
  x::Int64

Body:
  begin 
      (x::Int64 === Main.nothing)::Bool
      goto 3
      3: 
      return 1.0
  end::Float64

@randyzwitch
Copy link
Contributor

Is there anything preventing this from being merged and tagged? I want to use the JS function literals from #170, but I can't without being able to set a minimum JSON package value in REQUIRES

@TotalVerb
Copy link
Collaborator Author

@randyzwitch Because of the scope of this change, I'd like at least one review. Bumping this again.

@TotalVerb TotalVerb changed the base branch from master to fw/common-submodule October 25, 2016 03:01
@TotalVerb
Copy link
Collaborator Author

To slightly simplify review, I split off #177 (which is a non-functional refactoring) into a separate PR.

@TotalVerb TotalVerb changed the base branch from fw/common-submodule to master October 26, 2016 18:56
@TotalVerb
Copy link
Collaborator Author

Sorry for letting this stagnate so long. I've rebased it.

@TotalVerb
Copy link
Collaborator Author

I would like to move forward on this.

Since this does not currently affect any observable behaviour, and since it results in a performance improvement, I plan to merge it in about a week or so unless there are any objections.

@TotalVerb
Copy link
Collaborator Author

Let's do this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants