
Use from_reader_eager instead of from_reader when deserialize Json #547


Closed · wants to merge 1 commit from messense:feature/serde-json-from-str

Conversation

messense
Contributor

serde_json::from_reader is considerably slow for some reason.

serde-rs/json#160

.take(size_limit)
.read_to_string(&mut buf)
.map_err(|e| { error_!("IO Error: {:?}", e); e })
.unwrap();
Contributor Author

I am not sure how to deal with this unwrap().

Member

Unfortunately it doesn't look like Serde's error type can be created from an io::Error, so we'll have to change the error type here so that it returns an io::Error instead. We can then use the existing impl From<serde_json::Error> for io::Error. Alternatively, @dtolnay: would it be possible to add a From<io::Error> for serde_json::error::Error implementation?
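
For illustration, a minimal sketch of that direction, assuming the surrounding code is changed to return io::Result; the read_json helper and the Message type here are hypothetical, and the parse error is converted with serde_json's From<serde_json::Error> for io::Error impl:

use std::io::{self, Read};
use serde::Deserialize;

// Hypothetical payload type; any DeserializeOwned type works the same way.
#[derive(Deserialize)]
struct Message {
    contents: String,
}

// Hypothetical helper: read the body eagerly, then parse it, with both
// failure paths unified as io::Error so no unwrap() is needed.
fn read_json<R: Read>(mut reader: R) -> io::Result<Message> {
    let mut buf = String::new();
    // `?` propagates the read failure as an io::Error directly...
    reader.read_to_string(&mut buf)?;
    // ...and the parse failure is converted via From<serde_json::Error> for io::Error.
    serde_json::from_str(&buf).map_err(io::Error::from)
}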

dtolnay

I would prefer not to have From<io::Error> for serde_json::Error. JSON errors are for Serde to create.

Contributor Author

Switched to io::Error

@messense
Contributor Author

Before this patch:

[profiling screenshot]

After this patch:

[profiling screenshot]

I was using Rocket v0.3.6 when profiling this.

@Dowwie

Dowwie commented Jan 22, 2018

@messense what library are you using to profile?

@messense
Contributor Author

messense commented Jan 22, 2018

@Dowwie Instruments.app, bundled with Xcode.

Relevant post: http://carol-nichols.com/2015/12/09/rust-profiling-on-osx-cpu-time/

@messense
Contributor Author

Just did a profile using BufReader:

diff --git a/contrib/src/json.rs b/contrib/src/json.rs
index a461127..ff2e441 100644
--- a/contrib/src/json.rs
+++ b/contrib/src/json.rs
@@ -1,5 +1,5 @@
 use std::ops::{Deref, DerefMut};
-use std::io::Read;
+use std::io::{Read, BufReader};

 use rocket::outcome::{Outcome, IntoOutcome};
 use rocket::request::Request;
@@ -98,7 +98,8 @@ impl<T: DeserializeOwned> FromData for Json<T> {
         }

         let size_limit = request.limits().get("json").unwrap_or(LIMIT);
-        serde_json::from_reader(data.open().take(size_limit))
+        let reader = BufReader::new(data.open().take(size_limit));
+        serde_json::from_reader(reader)
             .map(|val| Json(val))
             .map_err(|e| { error_!("Couldn't parse JSON body: {:?}", e); e })
             .into_outcome(Status::BadRequest)

[profiling screenshot]

Still much slower than serde_json::from_str.

@SergioBenitez
Member

SergioBenitez commented Jan 23, 2018

That is...surprising. Have you investigated why serde performs so differently? The returned DataStream is already buffered, so I wouldn't expect any gains there. But it's surprising to see such a difference between from_str and from_reader.

@dtolnay

dtolnay commented Jan 23, 2018

We optimize from_str in lots of ways that are not possible for from_reader. For example, from_reader needs to track the line and column position as it goes, in case a parsing error happens later and we need to report the position; from_str can figure out the line and column lazily, only in the case that an error actually happened.
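
Not a rigorous benchmark, but a quick, hypothetical standalone sketch (separate from Rocket) that exercises both entry points on the same in-memory body; absolute numbers will vary by payload and machine:

use std::io::Cursor;
use std::time::Instant;

fn main() {
    // A moderately large JSON body to parse repeatedly.
    let body = format!(r#"{{"contents":"{}"}}"#, "Hello World!".repeat(1000));

    // Streaming path: serde_json parses directly from the reader.
    let start = Instant::now();
    for _ in 0..10_000 {
        let _: serde_json::Value = serde_json::from_reader(Cursor::new(body.as_bytes())).unwrap();
    }
    println!("from_reader: {:?}", start.elapsed());

    // Eager path: the body is already a complete &str.
    let start = Instant::now();
    for _ in 0..10_000 {
        let _: serde_json::Value = serde_json::from_str(&body).unwrap();
    }
    println!("from_str:    {:?}", start.elapsed());
}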

@dtolnay

dtolnay commented Jan 24, 2018

I filed serde-rs/json#407 to support this in serde_json. Let me know if anyone has API suggestions.

if !request.content_type().map_or(false, |ct| ct.is_json()) {
    error_!("Content-Type is not JSON.");
    return Outcome::Forward(data);
}

let size_limit = request.limits().get("json").unwrap_or(LIMIT);
serde_json::from_reader(data.open().take(size_limit))
let mut buf = String::new();
Contributor

Could you use with_capacity here with size_limit to optimize this a little more?

Member

We don't want to over-allocate memory. Perhaps we could choose a safe lower-bound: something like 1k or 4k.

@SergioBenitez
Member

SergioBenitez commented Feb 16, 2018

@messense Let's copy the code from serde-rs/json#407 (https://docs.rs/eager_json/0.1.0/src/eager_json/lib.rs.html#13-24) into Rocket. Then we could do this correctly. I'd be fine with using with_capacity with a safe lower bound as well. Perhaps you could measure the performance implications with different lower bounds and choose a good value.
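
A rough sketch of what such a helper could look like; the name, the 1 KiB floor, and the io::Error mapping are assumptions for illustration, not the final Rocket code:

use std::cmp::min;
use std::io::{self, Read};
use serde::de::DeserializeOwned;

// Assumed lower bound for the initial allocation; the thread above suggests
// something in the 1k-4k range.
const BUF_FLOOR: u64 = 1024;

// from_reader_eager-style helper: buffer the body eagerly (capped at
// size_limit), then hand the complete string to the faster from_str.
fn from_reader_eager<T, R>(reader: R, size_limit: u64) -> io::Result<T>
where
    T: DeserializeOwned,
    R: Read,
{
    // Pre-allocate at most BUF_FLOOR bytes so small bodies don't pay for
    // the full data limit up front.
    let mut buf = String::with_capacity(min(size_limit, BUF_FLOOR) as usize);
    reader.take(size_limit).read_to_string(&mut buf)?;
    serde_json::from_str(&buf).map_err(io::Error::from)
}

The FromData impl could then call this with request.limits().get("json").unwrap_or(LIMIT) and map the resulting error into a BadRequest outcome, as in the diff above.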

`serde_json::from_reader` is considerably slow for some reason.

serde-rs/json#160
@messense force-pushed the feature/serde-json-from-str branch from 203a26e to 4aed87b on February 17, 2018 02:37
@messense changed the title from "Use serde_json::from_str instead of from_reader when deserialize Json" to "Use from_reader_eager instead of from_reader when deserialize Json" on Feb 17, 2018
SergioBenitez pushed a commit that referenced this pull request Feb 19, 2018
Issue #547 identified a performance issue when serde's 'from_reader' is
used to deserialize incoming data. Using 'from_str' resolves the issue.
This commit swaps a use of 'from_reader' in favor of 'from_str' in
rocket_contrib's 'Json' implementation.

Additionally, this commit ensures that un-deserialized JSON data is
discarded as long as it is within the JSON data limit.

Closes #562.
@SergioBenitez
Member

Merged in ae8e902. Thanks so much for finding, reporting, and fixing this @messense! Awesome work. :)

@SergioBenitez added the "pr: merged" label ("This pull request was merged manually.") on Feb 19, 2018
@messense deleted the feature/serde-json-from-str branch on February 20, 2018 12:30
@messense
Contributor Author

I will measure the performance implications with different lower bounds when I have the time.

It would be great to have this patch backported to the 0.3.x releases!

@messense
Contributor Author

messense commented Feb 22, 2018

Regarding the buffer capacity question, I just did a simple Instruments run against the json example crate.

I am using the following Python script to send some requests to the server.

# -*- coding: utf-8 -*-
from concurrent.futures import ThreadPoolExecutor

from requests import Session

session = Session()


def post_json(index):
    # POST a ~12 KB JSON body to the running json example server.
    return session.post(
        'http://localhost:8000/message/{}'.format(index),
        json={
            'contents': 'Hello World!' * 1000
        }
    )


def bench():
    # Fire 10,000 requests from a thread pool and wait for all responses.
    with ThreadPoolExecutor() as executor:
        fs = [executor.submit(post_json, i) for i in range(10000)]
        _rs = [r.result() for r in fs]


if __name__ == '__main__':
    bench()

String::new()

[profiling screenshot]

String::with_capacity(1024)

[profiling screenshot]

String::with_capacity(4096)

[profiling screenshot]

Though I am not sure this is the correct way to measure its performance, it seems to me that using a 1K buffer is reasonable.

SergioBenitez pushed a commit that referenced this pull request Feb 26, 2018