
Refactor header parsing and data structure #61

Merged
merged 6 commits into saviorand:main from header-refactor on Sep 20, 2024

Conversation

bgreni
Contributor

@bgreni bgreni commented Sep 16, 2024

@saviorand I figured I might post a draft PR for this so I can see what you think about my approach before I get much deeper into applying the changes and doing more thorough unit testing.

Main points to review:

  • Replaced RequestHeader and ResponseHeader with a common Headers struct (a rough sketch of the shape follows this list)
    • Headers is essentially a Dict wrapper with the parsing logic included
    • The first-line info (method, protocol, etc.) has been extracted out into HTTPRequest and HTTPResponse
  • Replaced the gojo Reader with an internal implementation. I have reservations about using gojo in general, since it mixes Go-style semantics into a Mojo codebase, but for something as performance-sensitive as request parsing we probably also want full control so we can make specific optimizations later?
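
To make that shape concrete, here is a minimal sketch of the split described above, written against 2024-era Mojo syntax; the type and field names are illustrative guesses, not the exact Lightbug definitions:

from collections import Dict


@value
struct Headers:
    """Thin wrapper around a Dict; the header parsing logic lives here as well."""
    var _inner: Dict[String, String]

    fn __init__(inout self):
        self._inner = Dict[String, String]()


@value
struct HTTPRequest:
    """First-line data lives on the request itself rather than in the header map."""
    var method: String
    var uri: String
    var protocol: String
    var headers: Headers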

Obviously this isn't done yet, so the numbers don't mean much, but some light benchmarking so far shows a significant performance improvement:

BEFORE:
Benchmark Report (ns)
---------------------
Mean: 30527.143607311646
Total: 2441622000.0
Iters: 79982
Warmup Mean: 54500.0
Warmup Total: 109000.0
Warmup Iters: 2
Fastest Mean: 30527.143607311646
Slowest Mean: 30527.143607311646

AFTER:
Benchmark Report (ns)
---------------------
Mean: 1626.186904822687
Total: 2356927000.0
Iters: 1449358
Warmup Mean: 7000.0
Warmup Total: 14000.0
Warmup Iters: 2
Fastest Mean: 1626.186904822687
Slowest Mean: 1626.186904822687

Fixes: #44

@saviorand
Owner

@bgreni this is very impressive!! Love to see how the ~940 line header implementation is condensed to ~80 lines. And the code now feels much more Pythonic, too.

Definitely on board with moving towards a more "idiomatic Mojo" approach; gojo was helpful to move fast, but ideally I'd like to gradually refactor the whole codebase to a more Mojo-native code style. Although how "idiomatic" is defined in Mojo is still uncertain, I think the stdlib code and the Python ecosystem could be a good reference.

Also good point about the importance of control; I'm a bit surprised by the performance increase though, intuitively I'd expect a Dict to be slower than directly accessing a struct. Do you think the performance gain is due to switching to a different Reader implementation, or am I missing something?

@bgreni
Contributor Author

bgreni commented Sep 16, 2024

Although how "idiomatic" is defined in Mojo is still uncertain

Yeah, the idiomatic way of doing things in Mojo is unfortunately still evolving; for example, using Formatter is currently the idiomatic way to write out long strings like we do during request encoding, but frankly it might not stay that way. Personally, from my experience contributing to the stdlib, I think staying as Pythonic as possible is desirable, since that's what Mojo's target audience is expecting and used to.

Also good point about the importance of control; I'm a bit surprised by the performance increase though, intuitively I'd expect a Dict to be slower than directly accessing a struct. Do you think the performance gain is due to switching to a different Reader implementation, or am I missing something?

I think it comes down to skipping the work of actually parsing the individual fields: the original code does a lot of string comparisons, whereas now we simply split each line on the colon and dump the key/value pair into a dict. Note that we're measuring in nanoseconds here, so the absolute difference isn't huge, but it could still be meaningful for server-side performance.
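
As a rough illustration of that split-and-store step (the function name is hypothetical and the String method signatures follow the 2024-era stdlib as best I recall; this is not the exact code in this PR):

from collections import Dict


fn parse_header_line(inout headers: Dict[String, String], line: String) raises:
    # Split on the first colon: key before it, value after it.
    var idx = line.find(":")
    if idx == -1:
        raise Error("malformed header line: " + line)
    # No per-field string comparisons; just store the pair as-is.
    headers[line[:idx]] = line[idx + 1:].strip()

Key normalization (lowercasing) comes up later in the thread; the point here is just that no field-specific comparisons happen during parsing.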

@bgreni
Contributor Author

bgreni commented Sep 17, 2024

@saviorand I've just pushed some new changes, let me know what you think!

  • Propagated the new header, request, and response structs to the places that depend on them
    • Applied similar changes to URI as well
  • Updated tests
  • Updated the benchmarks to copy the pattern currently used in the stdlib

I've done away with most of the String/Bytes conversions, since String has some fun optimizations (like vectorized string comparison) and it heavily simplifies the interfaces.
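
Not Lightbug code, just an illustration of the interface simplification: an equality check on String is a single call whose implementation the stdlib can optimize, versus the hand-rolled comparison a byte-oriented interface tends to end up with.

from collections import List


fn is_keep_alive(value: String) -> Bool:
    # One call; the stdlib compares the underlying bytes for us.
    return value == "keep-alive"


fn bytes_equal(a: List[UInt8], b: List[UInt8]) -> Bool:
    # What the byte-oriented version does by hand.
    if len(a) != len(b):
        return False
    for i in range(len(a)):
        if a[i] != b[i]:
            return False
    return True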

I haven't touched PythonClient yet since I'm actually not sure I understand what it's for?

@saviorand
Owner

@bgreni I added both the PythonClient and the PythonServer to Lightbug initially to have a working implementation quickly; they both use Python libraries (e.g. for socket operations) instead of C calls. I'm still keeping them around in case someone would prefer a Python-based client/server instead of the default one, but they're already a bit outdated. Feel free to ignore them!

Re: String vs bytes, I understand there's a lot of work happening in the stdlib around string optimizations; I'm curious what the performance difference is for your changes compared to the previous implementation. Previously, in an end-to-end benchmark with wrk (sending a bunch of requests at a server within a given timeframe), working with bytes directly was faster, but that could well have changed by now. I can try benchmarking myself so as not to distract you. Thanks for the updates!

@bgreni bgreni force-pushed the header-refactor branch 2 times, most recently from 4602255 to 4cea92b on September 18, 2024 at 20:12
@bgreni
Contributor Author

bgreni commented Sep 18, 2024

@saviorand Cool, thanks for bringing wrk to my attention. I'm getting similar req/s numbers it seems, but the latency numbers with my changes are a bit concerning; let me see if I can work those down a bit.

Using wrk -t12 -c12 -d10s http://127.0.0.1:8080/ against the server in lightbug.🔥

Before:
  12 threads and 12 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     0.95ms    5.71ms 100.00ms   99.13%
    Req/Sec     1.44k   768.31     1.97k    73.68%
  16353 requests in 10.09s, 24.94MB read
Requests/sec:   1620.35
Transfer/sec:      2.47MB

After:
Running 10s test @ http://127.0.0.1:8080/
  12 threads and 12 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     7.59ms   43.30ms 400.83ms   96.88%
    Req/Sec     1.63k   735.81     2.21k    83.84%
  16342 requests in 10.09s, 24.92MB read
Requests/sec:   1619.67
Transfer/sec:      2.47MB

EDIT:
OK, upon further testing I'm sometimes getting results like this, so I'm not sure what the deal is, but it could be reasonable to claim the before/after performance is comparable?

Running 10s test @ http://127.0.0.1:8080/
  12 threads and 12 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   430.73us   78.15us   2.75ms   97.75%
    Req/Sec     1.86k   521.93     2.17k    81.82%
  16344 requests in 10.09s, 24.92MB read
Requests/sec:   1619.60
Transfer/sec:      2.47MB

@bgreni
Contributor Author

bgreni commented Sep 18, 2024

@bgreni I added both the PythonClient and the PythonServer to Lightbug initially to have a working implementation quickly; they both use Python libraries (e.g. for socket operations) instead of C calls. I'm still keeping them around in case someone would prefer a Python-based client/server instead of the default one, but they're already a bit outdated. Feel free to ignore them!

With these changes they don't build anymore, so maybe we have to remove them entirely?

@saviorand
Owner

@bgreni sure, I can remove the Python stuff; I'll push some changes here tomorrow.
For benchmarking I'd also recommend running against bench.mojo, since lightbug.🔥 serves an image and is a bit less representative of the usual plaintext/JSON output one would expect from a web server. The numbers there are a bit higher overall; for example, when we ran the benchmark against the handler in the version of Lightbug from June this year, the results were:

wrk -t1 -c1 -d1s http://0.0.0.0:8080/
Running 1s test @ http://0.0.0.0:8080/
  1 threads and 1 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    33.68us   11.31us 368.00us   96.26%
    Req/Sec    29.48k     2.38k   30.59k    90.91%
  32251 requests in 1.10s, 5.04MB read
Requests/sec:  29320.58
Transfer/sec:      4.59MB

Could be a good baseline to compare against, or we can just try running the same against main. The -t1 -c1 is because there is no concurrency in Lightbug yet, so it's just one connection at a time.

@bgreni
Contributor Author

bgreni commented Sep 19, 2024

@saviorand Ah, it looks like part of the problem was that I didn't notice the server expects the default Connection header for HTTPResponse to be keep-alive, so tcp_keep_alive wasn't having any effect with my changes. After fixing that, here are some preliminary results on my machine (a sketch of that keep-alive check follows the numbers); my results with the existing implementation were similar to what you posted.

Running 1s test @ http://0.0.0.0:8080/
  1 threads and 1 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    28.67us    4.61us 199.00us   87.30%
    Req/Sec    33.96k     1.53k   35.05k    90.91%
  37186 requests in 1.10s, 5.82MB read
Requests/sec:  33807.79
Transfer/sec:      5.29MB
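
For reference, a minimal sketch of the kind of keep-alive check described above (the helper name is hypothetical, not the actual Lightbug server code):

from collections import Dict


fn should_keep_alive(headers: Dict[String, String]) raises -> Bool:
    # HTTP/1.1 defaults to keep-alive unless the Connection header says "close",
    # so a missing header must not disable keep-alive behaviour.
    if "connection" in headers:
        return headers["connection"].lower() != "close"
    return True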

@saviorand
Owner

@bgreni that's great! Marked increase over the previous performance 👏 👏
I think changing the logic to skip the full field-by-field parsing is also completely justified.

@bgreni
Contributor Author

bgreni commented Sep 19, 2024

@saviorand Ok awesome, I'll do some housekeeping (proper comments and such) and undraft the PR when it's ready

@bgreni bgreni force-pushed the header-refactor branch 2 times, most recently from a21430f to 97666f3 on September 20, 2024 at 00:02
…e more Pythonic

Signed-off-by: Brian Grenier <grenierb96@gmail.com>
@bgreni
Contributor Author

bgreni commented Sep 20, 2024

  • Added more benchmark cases
  • Moved bench server into bench_server.mojo
  • Normalized header keys to lower case so lookups are case-insensitive (see the sketch after this list)
  • Removed Python client/server implementation
  • Added more magic tasks for running tests and benchmarks
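
The key normalization mentioned above might look roughly like this (a sketch with a hypothetical type name, not the exact accessors in the PR):

from collections import Dict


struct CaseInsensitiveMap:
    """Lowercase keys on insert and on lookup, so "Content-Type" and "content-type" hit the same entry."""
    var _inner: Dict[String, String]

    fn __init__(inout self):
        self._inner = Dict[String, String]()

    fn __setitem__(inout self, key: String, value: String):
        self._inner[key.lower()] = value

    fn __getitem__(self, key: String) raises -> String:
        return self._inner[key.lower()]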

@bgreni bgreni marked this pull request as ready for review September 20, 2024 00:07
@saviorand
Owner

@bgreni everything looks great, I can feel the quality of this codebase improving with every commit you're making! Looks like you were faster than me with removing the Python client/server, thanks for that.
Pushed a commit to enable the client test, which is just a simple request to httpbin; the related code wasn't compiling previously because the PythonClient was outdated.
Thinking I'll also run mojo format on all files now to see if there are any changes.

var value = r.read_line()
self._inner[to_string(key^).lower()] = to_string(value^)

return (to_string(first^), to_string(second^), to_string(third^))
@saviorand
Owner

Here I think I caught a bug when I uncommented test_client.mojo - it hangs indefinitely when trying to return this tuple. I'm not 100% sure, but my first guess is that it's related to a bug in Mojo I was experiencing before, where returning tuples hangs and the program never exits.
We could file an issue in the stdlib and for now replace this return with something else? What do you think?
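
A minimal sketch of that kind of stopgap, returning a small named struct instead of a Tuple (the names here are hypothetical, not the PR's actual code):

@value
struct RequestLine:
    """Stand-in for the (first, second, third) tuple, e.g. method, URI, protocol."""
    var first: String
    var second: String
    var third: String


fn split_request_line(line: String) raises -> RequestLine:
    # "GET /index.html HTTP/1.1" -> three space-separated parts.
    var parts = line.split(" ")
    if len(parts) != 3:
        raise Error("malformed request line: " + line)
    return RequestLine(parts[0], parts[1], parts[2])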

@bgreni
Contributor Author

There was an error in the creation of the HTTPRequest in that test which seemingly caused httpbin to return nothing; that broke the parsing and caused an infinite loop, so there's still a bug that should be addressed at some point.

@bgreni
Contributor Author

bgreni commented Sep 20, 2024

@bgreni everything looks great, I can feel the quality of this codebase improving with every commit you're making!

Thank you for indulging me on this refactor, I'm glad you like the result!

@saviorand
Owner

@bgreni pushed one more fix for client.mojo and updated the README, ready to merge.

@saviorand saviorand merged commit 805372e into saviorand:main Sep 20, 2024
2 checks passed
@saviorand
Owner

Merged! Huge thanks again for the contribution 🔥

Successfully merging this pull request may close these issues.

Core structs (Request, Response, Header, ResponseHeader, URI) should conform to Stringable