
Refactor header parsing and data structure #61

Merged
merged 6 commits into saviorand:main from header-refactor on Sep 20, 2024

Conversation

bgreni
Contributor

@bgreni bgreni commented Sep 16, 2024

@saviorand I figured I might post a draft PR for this so I can see what you think about my approach before I get much deeper into applying the changes and doing more thorough unit testing.

Main points to review:

  • Replaced RequestHeader and ResponseHeader with a common Headers struct (a rough sketch of the shape follows this list)
    • Headers is essentially a Dict wrapper with the parsing logic included
    • The first-line info (method, protocol, etc.) has been extracted out into HTTPRequest and HTTPResponse
  • Replaced the gojo Reader with an internal implementation. I have reservations about using gojo in general, since it mixes Go-style semantics into a Mojo codebase, but for something as performance-sensitive as request parsing we probably also want full control so we can make specific optimizations later?
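
To make that shape concrete, here is a minimal sketch of the split described above, written against 2024-era Mojo syntax; the type and field names are illustrative guesses, not the exact Lightbug definitions:

from collections import Dict


@value
struct Headers:
    """Thin wrapper around a Dict; the header parsing logic lives here as well."""
    var _inner: Dict[String, String]

    fn __init__(inout self):
        self._inner = Dict[String, String]()


@value
struct HTTPRequest:
    """First-line data lives on the request itself rather than in the header map."""
    var method: String
    var uri: String
    var protocol: String
    var headers: Headers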

Obviously this isn't done yet, so the numbers don't mean much, but some light benchmarking so far shows a significant performance improvement:

BEFORE:
Benchmark Report (ns)
---------------------
Mean: 30527.143607311646
Total: 2441622000.0
Iters: 79982
Warmup Mean: 54500.0
Warmup Total: 109000.0
Warmup Iters: 2
Fastest Mean: 30527.143607311646
Slowest Mean: 30527.143607311646

AFTER:
Benchmark Report (ns)
---------------------
Mean: 1626.186904822687
Total: 2356927000.0
Iters: 1449358
Warmup Mean: 7000.0
Warmup Total: 14000.0
Warmup Iters: 2
Fastest Mean: 1626.186904822687
Slowest Mean: 1626.186904822687

Fixes: #44

@saviorand
Owner

@bgreni this is very impressive!! Love to see how the ~940 line header implementation is condensed to ~80 lines. And the code now feels much more Pythonic, too.

Definitely on board with moving towards a more "idiomatic Mojo" approach; gojo was helpful to move fast, but ideally I'd like to gradually refactor the whole codebase to a more Mojo-native code style. Although how "idiomatic" is defined in Mojo is still uncertain, I think the stdlib code and the Python ecosystem could be a good reference.

Also good point about the importance of control; I'm a bit surprised by the performance increase though, intuitively I'd expect a Dict to be slower than directly accessing a struct. Do you think the performance gain is due to switching to a different Reader implementation, or am I missing something?

@bgreni
Contributor Author

bgreni commented Sep 16, 2024

Although how "idiomatic" is defined in Mojo is still uncertain

Yeah, the idiomatic way of doing things in Mojo is unfortunately still evolving; for example, using Formatter is currently the idiomatic way to write out long strings like we do during request encoding, but frankly it might not stay that way. Personally, from my experience contributing to the stdlib, I think staying as Pythonic as possible is desirable, since that's what Mojo's target audience is expecting and used to.

Also good point about the importance of control; I'm a bit surprised by the performance increase though, intuitively I'd expect a Dict to be slower than directly accessing a struct. Do you think the performance gain is due to switching to a different Reader implementation, or am I missing something?

I think it comes down to skipping the work of actually parsing the individual fields: the original code does a lot of string comparisons, whereas now we simply split each line on the colon and dump the key/value pair into a dict. Note that we're measuring in nanoseconds here, so the absolute difference isn't huge, but it could still be meaningful for server-side performance.
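
As a rough illustration of that split-and-store step (the function name is hypothetical and the String method signatures follow the 2024-era stdlib as best I recall; this is not the exact code in this PR):

from collections import Dict


fn parse_header_line(inout headers: Dict[String, String], line: String) raises:
    # Split on the first colon: key before it, value after it.
    var idx = line.find(":")
    if idx == -1:
        raise Error("malformed header line: " + line)
    # No per-field string comparisons; just store the pair as-is.
    headers[line[:idx]] = line[idx + 1:].strip()

Key normalization (lowercasing) comes up later in the thread; the point here is just that no field-specific comparisons happen during parsing.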

@bgreni
Contributor Author

bgreni commented Sep 17, 2024

@saviorand I've just pushed some new changes, let me know what you think!

  • Propagated the new header, request, and response structs to the places that depend on them
    • Applied similar changes to URI as well
  • Updated tests
  • Updated the benchmarks to copy the pattern currently used in the stdlib

I've done away with most of the String/Bytes conversions, since String has some fun optimizations (like vectorized string comparison) and it heavily simplifies the interfaces.
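
Not Lightbug code, just an illustration of the interface simplification: an equality check on String is a single call whose implementation the stdlib can optimize, versus the hand-rolled comparison a byte-oriented interface tends to end up with.

from collections import List


fn is_keep_alive(value: String) -> Bool:
    # One call; the stdlib compares the underlying bytes for us.
    return value == "keep-alive"


fn bytes_equal(a: List[UInt8], b: List[UInt8]) -> Bool:
    # What the byte-oriented version does by hand.
    if len(a) != len(b):
        return False
    for i in range(len(a)):
        if a[i] != b[i]:
            return False
    return True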

I haven't touched PythonClient yet since I'm actually not sure I understand what it's for?

@saviorand
Owner

@bgreni I added both the PythonClient and the PythonServer to Lightbug initially to have a working implementation quickly; they both use Python libraries (e.g. for socket operations) instead of C calls. I'm still keeping them around in case someone would prefer a Python-based client/server instead of the default one, but they're already a bit outdated. Feel free to ignore them!

Re: String vs bytes, I understand there's a lot of work happening in the stdlib around string optimizations; I'm curious what the performance difference is for your changes compared to the previous implementation. Previously, in an end-to-end benchmark with wrk (sending a bunch of requests at a server within a given timeframe), working with bytes directly was faster, but that could well have changed by now. I can try benchmarking myself so as not to distract you. Thanks for the updates!

@bgreni bgreni force-pushed the header-refactor branch 2 times, most recently from 4602255 to 4cea92b on September 18, 2024 at 20:12
@bgreni
Contributor Author

bgreni commented Sep 18, 2024

@saviorand Cool, thanks for bringing wrk to my attention. I'm getting similar req/s numbers it seems, but the latency numbers with my changes are a bit concerning; let me see if I can work those down a bit.

Using wrk -t12 -c12 -d10s http://127.0.0.1:8080/ against the server in lightbug.🔥

Before:
  12 threads and 12 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     0.95ms    5.71ms 100.00ms   99.13%
    Req/Sec     1.44k   768.31     1.97k    73.68%
  16353 requests in 10.09s, 24.94MB read
Requests/sec:   1620.35
Transfer/sec:      2.47MB

After:
Running 10s test @ http://127.0.0.1:8080/
  12 threads and 12 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     7.59ms   43.30ms 400.83ms   96.88%
    Req/Sec     1.63k   735.81     2.21k    83.84%
  16342 requests in 10.09s, 24.92MB read
Requests/sec:   1619.67
Transfer/sec:      2.47MB

EDIT:
OK, upon further testing I'm sometimes getting results like this, so I'm not sure what the deal is, but it could be reasonable to claim the before/after performance is comparable?

Running 10s test @ http://127.0.0.1:8080/
  12 threads and 12 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   430.73us   78.15us   2.75ms   97.75%
    Req/Sec     1.86k   521.93     2.17k    81.82%
  16344 requests in 10.09s, 24.92MB read
Requests/sec:   1619.60
Transfer/sec:      2.47MB

@bgreni
Contributor Author

bgreni commented Sep 18, 2024

@bgreni I added both the PythonClient and the PythonServer to Lightbug initially to have a working implementation quickly; they both use Python libraries (e.g. for socket operations) instead of C calls. I'm still keeping them around in case someone would prefer a Python-based client/server instead of the default one, but they're already a bit outdated. Feel free to ignore them!

With these changes they don't build anymore, so maybe we have to remove them entirely?

@saviorand
Owner

@bgreni sure, I can remove the Python stuff; I'll push some changes here tomorrow.
For benchmarking I'd also recommend running against bench.mojo, since lightbug.🔥 serves an image and is a bit less representative of the usual plaintext/JSON output one would expect from a web server. The numbers there are a bit higher overall; for example, when we ran the benchmark against the handler in the version of Lightbug from June this year, the results were:

wrk -t1 -c1 -d1s http://0.0.0.0:8080/
Running 1s test @ http://0.0.0.0:8080/
  1 threads and 1 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    33.68us   11.31us 368.00us   96.26%
    Req/Sec    29.48k     2.38k   30.59k    90.91%
  32251 requests in 1.10s, 5.04MB read
Requests/sec:  29320.58
Transfer/sec:      4.59MB

Could be a good baseline to compare against, or we can just try running the same against main. The -t1 -c1 is because there is no concurrency in Lightbug yet, so it's just one connection at a time.

@bgreni
Contributor Author

bgreni commented Sep 19, 2024

@saviorand Ah, it looks like part of the problem was that I didn't notice the server expects the default Connection header for HTTPResponse to be keep-alive, so tcp_keep_alive wasn't having any effect with my changes. After fixing that, here are some preliminary results on my machine (a sketch of that keep-alive check follows the numbers); my results with the existing implementation were similar to what you posted.

Running 1s test @ http://0.0.0.0:8080/
  1 threads and 1 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    28.67us    4.61us 199.00us   87.30%
    Req/Sec    33.96k     1.53k   35.05k    90.91%
  37186 requests in 1.10s, 5.82MB read
Requests/sec:  33807.79
Transfer/sec:      5.29MB
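
For reference, a minimal sketch of the kind of keep-alive check described above (the helper name is hypothetical, not the actual Lightbug server code):

from collections import Dict


fn should_keep_alive(headers: Dict[String, String]) raises -> Bool:
    # HTTP/1.1 defaults to keep-alive unless the Connection header says "close",
    # so a missing header must not disable keep-alive behaviour.
    if "connection" in headers:
        return headers["connection"].lower() != "close"
    return True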

@saviorand
Owner

@bgreni that's great! Marked increase over the previous performance 👏 👏
I think changing the logic to skip the full field-by-field parsing is also completely justified.

@bgreni
Contributor Author

bgreni commented Sep 19, 2024

@saviorand Ok awesome, I'll do some housekeeping (proper comments and such) and undraft the PR when it's ready

@bgreni bgreni force-pushed the header-refactor branch 2 times, most recently from a21430f to 97666f3 on September 20, 2024 at 00:02
…e more Pythonic

Signed-off-by: Brian Grenier <grenierb96@gmail.com>
@bgreni
Contributor Author

bgreni commented Sep 20, 2024

  • Added more benchmark cases
  • Moved bench server into bench_server.mojo
  • Normalized header keys to lower case so lookups are case-insensitive (see the sketch after this list)
  • Removed Python client/server implementation
  • Added more magic tasks for running tests and benchmarks
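
The key normalization mentioned above might look roughly like this (a sketch with a hypothetical type name, not the exact accessors in the PR):

from collections import Dict


struct CaseInsensitiveMap:
    """Lowercase keys on insert and on lookup, so "Content-Type" and "content-type" hit the same entry."""
    var _inner: Dict[String, String]

    fn __init__(inout self):
        self._inner = Dict[String, String]()

    fn __setitem__(inout self, key: String, value: String):
        self._inner[key.lower()] = value

    fn __getitem__(self, key: String) raises -> String:
        return self._inner[key.lower()]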

@bgreni bgreni marked this pull request as ready for review September 20, 2024 00:07
@saviorand
Owner

@bgreni everything looks great, I can feel the quality of this codebase improving with every commit you're making! Looks like you were faster than me with removing the Python client/server, thanks for that.
Pushed a commit to enable the client test, which is just a simple request to httpbin; the related code wasn't compiling previously because the PythonClient was outdated.
Thinking I'll also run mojo format on all files now to see if there are any changes.

var value = r.read_line()
self._inner[to_string(key^).lower()] = to_string(value^)

return (to_string(first^), to_string(second^), to_string(third^))
@saviorand
Owner

Here I think I caught a bug when I uncommented test_client.mojo - it hangs indefinitely when trying to return this tuple. I'm not 100% sure, but my first guess is that it's related to a bug in Mojo I was experiencing before, where returning tuples hangs and the program never exits.
We could file an issue in the stdlib and for now replace this return with something else? What do you think?
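
A minimal sketch of that kind of stopgap, returning a small named struct instead of a Tuple (the names here are hypothetical, not the PR's actual code):

@value
struct RequestLine:
    """Stand-in for the (first, second, third) tuple, e.g. method, URI, protocol."""
    var first: String
    var second: String
    var third: String


fn split_request_line(line: String) raises -> RequestLine:
    # "GET /index.html HTTP/1.1" -> three space-separated parts.
    var parts = line.split(" ")
    if len(parts) != 3:
        raise Error("malformed request line: " + line)
    return RequestLine(parts[0], parts[1], parts[2])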

@bgreni
Contributor Author

There was an error in the creation of the HTTPRequest in that test which seemingly caused httpbin to return nothing; that broke the parsing and caused an infinite loop, so there's still a bug that should be addressed at some point.

@bgreni
Contributor Author

bgreni commented Sep 20, 2024

@bgreni everything looks great, I can feel the quality of this codebase improving with every commit you're making!

Thank you for indulging me on this refactor, I'm glad you like the result!

@saviorand
Owner

@bgreni pushed one more fix for client.mojo and updated the README, ready to merge.

@saviorand saviorand merged commit 805372e into saviorand:main Sep 20, 2024
2 checks passed
@saviorand
Owner

Merged! Huge thanks again for the contribution 🔥

Successfully merging this pull request may close these issues.

Core structs (Request, Response, Header, ResponseHeader, URI) should conform to Stringable