refactor: better python performance #500
Conversation
Do you believe running PyPy would improve Python performance? PyPy + FastAPI = fastapi/fastapi#3944
Maybe, @emerson-proenca; if you make a PR I'd love to run it. I've not had much experience with PyPy, so I consider it esoteric at the moment; but maybe it's very easy and I'm just unclear about the approach. I do know AI was telling me I was barking up the wrong tree with this, but when I looked at all my latencies, they seemed to support me more than the AI review of my work did.
Here are some logs from the k6 test included with this PR, run on an M2 Air laptop:
execution: local
script: ../script.js
output: -
scenarios: (100.00%) 1 scenario, 100 max VUs, 1m0s max duration (incl. graceful stop):
* default: 100 looping VUs for 30s (gracefulStop: 30s)
✓ is status 201
checks.........................: 100.00% ✓ 137626 ✗ 0
data_received..................: 44 MB 1.5 MB/s
data_sent......................: 26 MB 878 kB/s
http_req_blocked...............: avg=2.77µs min=0s med=1µs max=18.46ms p(90)=2µs p(95)=2µs
http_req_connecting............: avg=1.22µs min=0s med=0s max=6.77ms p(90)=0s p(95)=0s
http_req_duration..............: avg=21.58ms min=6.09ms med=18.25ms max=1.45s p(90)=28.56ms p(95)=37.09ms
{ expected_response:true }...: avg=21.58ms min=6.09ms med=18.25ms max=1.45s p(90)=28.56ms p(95)=37.09ms
http_req_failed................: 0.00% ✓ 0 ✗ 137626
http_req_receiving.............: avg=16.37µs min=4µs med=10µs max=5.15ms p(90)=30µs p(95)=40µs
http_req_sending...............: avg=5.25µs min=1µs med=3µs max=4.96ms p(90)=8µs p(95)=12µs
http_req_tls_handshaking.......: avg=0s min=0s med=0s max=0s p(90)=0s p(95)=0s
http_req_waiting...............: avg=21.56ms min=6.02ms med=18.23ms max=1.45s p(90)=28.53ms p(95)=37.08ms
http_reqs......................: 137626 4585.523346/s
iteration_duration.............: avg=21.79ms min=6.47ms med=18.45ms max=1.45s p(90)=28.77ms p(95)=37.32ms
iterations.....................: 137626 4585.523346/s
vus............................: 100 min=100 max=100
vus_max........................: 100 min=100 max=100
Apologies, I realised after re-running the baseline that I'd committed something I should not have. I've commented it out, but basically I doubled the number of queries being run as part of a failed experiment and then accidentally committed it.
antonputra left a comment
Thank you, @Lewiscowles1986, for the PR. Sorry I don't have clear guidelines for pull requests. Would you be able to just create a fastapi-app-v2 folder and keep the original as is? It's just easier to compare in the future.
To be honest, merging isn't the most important thing, as you'll only get at most a few thousand requests (not per second, overall 😄). Thank you for the tutorial; if you'd still like the changes I'll make them.
@Lewiscowles1986 Yes please, I'll keep it. When it's time to refresh the benchmark, I'll go over and test everything again.
Force-pushed from acf0486 to c57dc3f
It now points to a separate folder; sorry, I've been unwell. It's also simpler than the original patch I'd sent over.

Before:

% k6 run ./script.js
execution: local
script: ./script.js
output: -
scenarios: (100.00%) 1 scenario, 100 max VUs, 1m0s max duration (incl. graceful stop):
* default: 100 looping VUs for 30s (gracefulStop: 30s)
✓ is status 201
checks.........................: 100.00% ✓ 124525 ✗ 0
data_received..................: 40 MB 1.3 MB/s
data_sent......................: 24 MB 794 kB/s
http_req_blocked...............: avg=2.93µs min=0s med=1µs max=7.66ms p(90)=2µs p(95)=2µs
http_req_connecting............: avg=1.78µs min=0s med=0s max=7.23ms p(90)=0s p(95)=0s
http_req_duration..............: avg=23.87ms min=6.17ms med=19.8ms max=1.36s p(90)=32.42ms p(95)=42.4ms
{ expected_response:true }...: avg=23.87ms min=6.17ms med=19.8ms max=1.36s p(90)=32.42ms p(95)=42.4ms
http_req_failed................: 0.00% ✓ 0 ✗ 124525
http_req_receiving.............: avg=17.23µs min=4µs med=10µs max=12.88ms p(90)=32µs p(95)=43µs
http_req_sending...............: avg=5.55µs min=1µs med=3µs max=6.5ms p(90)=9µs p(95)=13µs
http_req_tls_handshaking.......: avg=0s min=0s med=0s max=0s p(90)=0s p(95)=0s
http_req_waiting...............: avg=23.84ms min=6.15ms med=19.78ms max=1.36s p(90)=32.4ms p(95)=42.37ms
http_reqs......................: 124525 4149.135507/s
iteration_duration.............: avg=24.08ms min=6.39ms med=20ms max=1.36s p(90)=32.65ms p(95)=42.69ms
iterations.....................: 124525 4149.135507/s
vus............................: 100 min=100 max=100
vus_max........................: 100 min=100 max=100
running (0m30.0s), 000/100 VUs, 124525 complete and 0 interrupted iterations
default ✓ [======================================] 100 VUs 30s

After:

% k6 run ./script.js
execution: local
script: ./script.js
output: -
scenarios: (100.00%) 1 scenario, 100 max VUs, 1m0s max duration (incl. graceful stop):
* default: 100 looping VUs for 30s (gracefulStop: 30s)
✓ is status 201
checks.........................: 100.00% ✓ 133066 ✗ 0
data_received..................: 43 MB 1.4 MB/s
data_sent......................: 26 MB 849 kB/s
http_req_blocked...............: avg=6.84µs min=0s med=1µs max=17.94ms p(90)=1µs p(95)=2µs
http_req_connecting............: avg=5.47µs min=0s med=0s max=14.93ms p(90)=0s p(95)=0s
http_req_duration..............: avg=22.34ms min=8.13ms med=19.96ms max=231.87ms p(90)=28.29ms p(95)=33.88ms
{ expected_response:true }...: avg=22.34ms min=8.13ms med=19.96ms max=231.87ms p(90)=28.29ms p(95)=33.88ms
http_req_failed................: 0.00% ✓ 0 ✗ 133066
http_req_receiving.............: avg=14.72µs min=4µs med=9µs max=5.26ms p(90)=28µs p(95)=37µs
http_req_sending...............: avg=4.94µs min=1µs med=4µs max=4.22ms p(90)=7µs p(95)=12µs
http_req_tls_handshaking.......: avg=0s min=0s med=0s max=0s p(90)=0s p(95)=0s
http_req_waiting...............: avg=22.32ms min=8.03ms med=19.94ms max=231.85ms p(90)=28.26ms p(95)=33.85ms
http_reqs......................: 133066 4434.119736/s
iteration_duration.............: avg=22.53ms min=8.32ms med=20.12ms max=232.2ms p(90)=28.52ms p(95)=34.15ms
iterations.....................: 133066 4434.119736/s
vus............................: 100 min=100 max=100
vus_max........................: 100 min=100 max=100
running (0m30.0s), 000/100 VUs, 133066 complete and 0 interrupted iterations
default ✓ [======================================] 100 VUs 30s
@emerson-proenca I've read up on this, and apparently this should be about as good as it gets using FastAPI under Python without pre-compiling to something else. While Python can have good IO-bound performance, at this scale the overhead of maintaining that seems to be CPU-bound. PyPy would apparently not help; although if you fancy forking any of this code, please do. One thing I do like about the new version is that there's no need for a class: just use the Python module instance for the database pool and let the library manage it.
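In sketch form, that module-level approach looks roughly like this (illustrative only; the names, DSN, and endpoint are placeholders, not the exact code from the PR):

```python
# db.py - sketch of a module-level asyncpg pool: the pool lives on the
# module, no wrapper class, and asyncpg manages the connections inside it.
import asyncpg

pool: asyncpg.Pool | None = None

async def connect(dsn: str = "postgresql://app:app@localhost/app") -> None:
    global pool
    pool = await asyncpg.create_pool(dsn)

async def close() -> None:
    if pool is not None:
        await pool.close()

# main.py - wire the module-level pool into FastAPI's lifespan
from contextlib import asynccontextmanager

from fastapi import FastAPI

@asynccontextmanager
async def lifespan(app: FastAPI):
    await connect()
    yield
    await close()

app = FastAPI(lifespan=lifespan)

@app.post("/devices", status_code=201)  # placeholder route
async def create_device():
    async with pool.acquire() as conn:
        row = await conn.fetchrow("SELECT 1 AS ok")
        return {"ok": row["ok"]}
```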
Eventually I'll try PyPy and other Python interpreters/compilers to see if they can improve performance. To be honest, I agree with you and don't think it'll make a difference; maybe it will if we remove logging?
Many thanks to everyone! I would recommend setting workers explicitly for both Granian and Bun to ensure better fairness in the benchmark; I noticed uvicorn was set to 1 worker in the original test. It's always important to have a benchmark that uses "average code" rather than esoteric operations or configuration.
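Something like this is what I mean (a sketch only; the import string and worker count are placeholders, not the repo's actual settings):

```python
# run.py - sketch: pin the worker count explicitly so both sides of the
# benchmark are configured alike. "main:app" and workers=4 are placeholders.
import uvicorn

if __name__ == "__main__":
    # workers > 1 requires passing the app as an import string
    uvicorn.run("main:app", host="0.0.0.0", port=8000, workers=4)
```

Granian has a similar workers option on its CLI, as far as I can tell.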
@Spitfire1900 I've no idea what you felt was esoteric, so I'm happy for you to point out explicitly what is esoteric about reducing the code; but also, workers defaults to 1 for Granian...
I don't think anything is esoteric at this point, but it's something to keep in mind. It looks like Bun auto-manages its thread count, but supports explicitly setting threads via a hidden env var: https://github.com/oven-sh/bun/pull/14401/files#diff-52c167d9c0701a90829218516da76eefd4a0a05a2c57dd216df4599105c921e9
Another consideration: https://github.com/antonputra/tutorials/pull/500/files#diff-de1516dcd4f2ef07f2807fc25a08c5741d9bda870629052f59f969a9e5b12fd7R37 uses …
@Spitfire1900 FastAPI and Starlette do not support …
So I tried a few things, which get the minimum time in k6 down to microseconds for a request. They set slightly more aggressive timeouts, use Pydantic to marshal the response from the POST endpoint to the DB, and set a 1-second limit on the DB calls made for every request (the SET LOCAL DB call actually creates some overhead). I've not re-run the entire test; instead I've contributed a k6 test, which I used to verify on my local machine, within the also-modified docker-compose YAML.

The migrations are still manual, and you'll probably notice I tried increasing server connections. I think adjusting the min_size of DB connections could be another good way to gain performance. I'm pretty sure it still won't top Bun, but you could get a few more thousand requests per second.
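Roughly, the knobs look like this (a sketch; the DSN, table, and values are illustrative, and the 1-second SET LOCAL statement_timeout is one reading of the timeout change, not the exact code):

```python
# Sketch of the tuning knobs discussed above; all values are illustrative.
import asyncpg
from pydantic import BaseModel

class Device(BaseModel):
    # Pydantic model used to marshal the POST response back to the client
    id: int
    mac: str

async def make_pool() -> asyncpg.Pool:
    # A larger min_size keeps warm connections ready under constant load,
    # so the hot path never pays connection-setup latency.
    return await asyncpg.create_pool(
        "postgresql://app:app@localhost/app",  # placeholder DSN
        min_size=10,
        max_size=20,
        command_timeout=1.0,  # a more aggressive per-query timeout
    )

async def insert_device(pool: asyncpg.Pool, mac: str) -> Device:
    async with pool.acquire() as conn:
        async with conn.transaction():
            # SET LOCAL scopes the timeout to this transaction only; it is
            # also an extra round trip, which is the overhead mentioned.
            await conn.execute("SET LOCAL statement_timeout = '1s'")
            row = await conn.fetchrow(
                "INSERT INTO devices (mac) VALUES ($1) RETURNING id, mac",
                mac,
            )
    return Device(id=row["id"], mac=row["mac"])
```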