API for serving values from the Fibonacci sequence.
If you have Go installed, you can clone the repo, build it, and run it:

```
git clone git@github.com:macintoshpie/fibapi.git &&
cd fibapi &&
go build . &&
./fibapi -port 8080
```
Use a GET request to any of the following endpoints:

- `/current`: get the current index and value
- `/next`: increment the index and get the value
- `/previous`: decrement the index and get the value

Each returns a JSON response with two fields:

- `"index"` (uint32): the index of the request into the sequence
- `"value"` (string): the value of the Fibonacci sequence at that index
The server is a basic `net/http` server, all in `main.go`. Atomic operations are used to manipulate the current index, and responses are written with the standard `json` encoder. The API supports recovery by occasionally writing the current index to a file, which it attempts to read on each startup. Debugging endpoints were added to get info about the cache as well as profiling info.
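To make that concrete, here is a minimal sketch of the approach: an atomically updated index, responses written with `json.NewEncoder`, and periodic persistence. The handler bodies and the naive `fibValue` are illustrative, not the repo's exact code.

```go
package main

import (
	"encoding/json"
	"math/big"
	"net/http"
	"os"
	"strconv"
	"sync/atomic"
	"time"
)

var index uint32 // current position in the sequence

type response struct {
	Index uint32 `json:"index"`
	Value string `json:"value"`
}

// fibValue computes fib(i) naively from the start; the real server
// consults the cache first (see the caching discussion below).
func fibValue(i uint32) string {
	a, b := big.NewInt(0), big.NewInt(1)
	for ; i > 0; i-- {
		a.Add(a, b)
		a, b = b, a
	}
	return a.String()
}

func respond(w http.ResponseWriter, i uint32) {
	json.NewEncoder(w).Encode(response{Index: i, Value: fibValue(i)})
}

func main() {
	// Attempt to recover the last persisted index on startup.
	if b, err := os.ReadFile("index.dat"); err == nil {
		if i, err := strconv.ParseUint(string(b), 10, 32); err == nil {
			atomic.StoreUint32(&index, uint32(i))
		}
	}
	// Occasionally persist the current index for crash recovery.
	go func() {
		for range time.Tick(5 * time.Second) {
			i := atomic.LoadUint32(&index)
			os.WriteFile("index.dat", []byte(strconv.FormatUint(uint64(i), 10)), 0644)
		}
	}()

	http.HandleFunc("/current", func(w http.ResponseWriter, r *http.Request) {
		respond(w, atomic.LoadUint32(&index))
	})
	http.HandleFunc("/next", func(w http.ResponseWriter, r *http.Request) {
		respond(w, atomic.AddUint32(&index, 1))
	})
	http.HandleFunc("/previous", func(w http.ResponseWriter, r *http.Request) {
		respond(w, atomic.AddUint32(&index, ^uint32(0))) // atomic decrement
	})
	http.ListenAndServe(":8080", nil)
}
```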
A few different caches were tried, including fastcache, HashiCorp's LRU, and a simple slice. After some profiling, the simple slice turned out to be the most performant.
For the caching strategy, I decided that caching only the "result values" (those returned in the API response) was a problematic approach. For example, if our cache holds `n` values and we've made `k` calls (`k >> n`) to the `/next` endpoint, we'd have the `n` most recent values in the cache. If we then make `n` calls to `/previous`, any following calls to `/previous` will result in a full cache miss, and the value must be calculated from scratch. This could be very expensive if our current index is very high.
I thought the best approach would be to cache values in pairs, because you only need `fib(i)` and `fib(i-1)` to calculate any of the following values. As a result, the cache is more resistant to "holes" in the sequence that would otherwise require calculating a value from the beginning. All that's required is searching for the largest cached index that's less than your target index, and you can start from there.
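As a rough sketch (using `math/big` as above; the function name is illustrative, not the repo's), stepping forward from a cached pair looks like this:

```go
// fibFromPair advances from a known pair (fib(i-1), fib(i)) to fib(target),
// target >= i, without ever restarting from the beginning of the sequence.
func fibFromPair(prev, cur *big.Int, i, target uint32) *big.Int {
	a := new(big.Int).Set(prev)
	b := new(big.Int).Set(cur)
	for ; i < target; i++ {
		a.Add(a, b) // a becomes fib(i+1)
		a, b = b, a // now a = fib(i), b = fib(i+1)
	}
	return b
}
```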
I also added `cachePadding` so that pairs are cached at intervals rather than at every single index. This reduces memory usage by a factor of the padding and also gives the cache a higher tolerance for `/previous` calls, avoiding recalculating values all the way from the beginning. The downside is that you'll rarely get a direct cache hit, so a bit of computation is still required from some starting pair. I'd be interested in writing fuller tests to determine the ideal cache size and `cachePadding`, but I ran out of time.
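A sketch of how the padded pair cache might look (names are illustrative and synchronization is omitted; `fibFromPair` from above finishes the remaining steps):

```go
const cachePadding = 100 // store a pair only every cachePadding indices

type pair struct {
	prev, cur *big.Int // fib(i-1), fib(i)
}

// cache[k] holds the pair at index k*cachePadding; it is seeded at index 0
// using the extension fib(-1) = 1, so lookups never come up empty.
var cache = []pair{{big.NewInt(1), big.NewInt(0)}}

// nearestPair returns the largest cached index <= target along with its
// pair; the caller runs fibFromPair from there to reach the target.
func nearestPair(target uint32) (uint32, pair) {
	k := int(target / cachePadding)
	if k >= len(cache) {
		k = len(cache) - 1
	}
	return uint32(k) * cachePadding, cache[k]
}

// storePair appends the next padded pair once the sequence walks past it.
func storePair(i uint32, prev, cur *big.Int) {
	if i%cachePadding == 0 && int(i/cachePadding) == len(cache) {
		cache = append(cache, pair{new(big.Int).Set(prev), new(big.Int).Set(cur)})
	}
}
```

Once the cache is warm, a lookup recomputes at most `cachePadding - 1` steps forward from the nearest stored pair.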
I wrote some basic tests in `fib_test.go` for the Fibonacci functionality.
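For illustration, a basic table-driven test against the `fibValue` sketch above might look like this (the repo's actual tests may differ):

```go
package main

import "testing"

// Checks a handful of known Fibonacci values by index.
func TestFibValue(t *testing.T) {
	cases := []struct {
		index uint32
		want  string
	}{
		{0, "0"},
		{1, "1"},
		{10, "55"},
		{50, "12586269025"},
	}
	for _, c := range cases {
		if got := fibValue(c.index); got != c.want {
			t.Errorf("fibValue(%d) = %s, want %s", c.index, got, c.want)
		}
	}
}
```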
For performance testing, I put the app in a Docker container with limited resources (1 CPU, 512 MB memory). I then used wrk and http_load for load testing. Watching `docker stats` for container resource usage, all tests kept the CPU at 100% and stayed under the 512 MB memory limit.
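The container setup looked roughly like this (the image tag is illustrative; the repo's actual Dockerfile may differ):

```
docker build -t fibapi .
docker run --cpus=1 --memory=512m -p 8080:8080 fibapi
```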
I started a server at index 0, then ran wrk for 90 seconds. My API was able to handle ~1,000 requests per second:
```
(base) Teds-MacBook-Pro:wrk tedsummer$ ./wrk -t12 -c12 -d90s http://127.0.0.1:8080/next
Running 2m test @ http://127.0.0.1:8080/next
  12 threads and 12 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    19.94ms   22.63ms 228.14ms   80.45%
    Req/Sec    86.46     59.18   470.00     84.84%
  92993 requests in 1.50m, 0.86GB read
Requests/sec:   1032.45
Transfer/sec:      9.73MB
(base) Teds-MacBook-Pro:wrk tedsummer$ curl http://127.0.0.1:8080/current
{"index":93005,"value":"3515575...<edited out>"}
```
This is the worst case for my API since it results in the most cache misses, meaning we have to brute-force calculate large values. For this test, I set the starting index to 100,000 (with an empty cache) and hit the `/previous` endpoint for 90 seconds. It averaged out to about 700 requests per second.
```
(base) Teds-MacBook-Pro:wrk tedsummer$ ./wrk -t12 -c12 -d90s http://127.0.0.1:8080/previous
Running 2m test @ http://127.0.0.1:8080/previous
  12 threads and 12 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    29.61ms   72.49ms   1.65s    99.01%
    Req/Sec    61.97     26.66   220.00     73.28%
  65904 requests in 1.50m, 0.87GB read
Requests/sec:    731.52
Transfer/sec:      9.89MB
(base) Teds-MacBook-Pro:wrk tedsummer$ curl http://127.0.0.1:8080/current
{"index":34086,"value":"159668982821...<edited out>"}
```
Again I set the starting index to 100,000 with an empty cache, then hit `/current` with wrk for 90 seconds. The results were pretty poor, about 400 rps. This is explained in the bottleneck analysis below.
```
(base) Teds-MacBook-Pro:wrk tedsummer$ ./wrk -t12 -c12 -d90s http://127.0.0.1:8080/current
Running 2m test @ http://127.0.0.1:8080/current
  12 threads and 12 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    36.44ms   67.20ms   1.31s    99.04%
    Req/Sec    38.22     10.79   110.00     70.83%
  40876 requests in 1.50m, 821.17MB read
Requests/sec:    453.65
Transfer/sec:      9.11MB
```
For this test, a random sequence of 1,000 requests to each endpoint was fed to the http_load tool. Unfortunately, http_load doesn't seem to work very well; it reported only about 500 requests per second. I suspect these poor results are due to the tool itself, and that the API is probably capable of at least 1,000 rps here. I'd like to find a better tool for testing multiple URLs (siege seems like a good option).
```
49134 fetches, 12 max parallel, 1.74852e+06 bytes, in 90.0042 seconds
35.5867 mean bytes/connection
545.908 fetches/sec, 19427.1 bytes/sec
msecs/connect: 1.23999 mean, 4095.29 max, 0.06 min
msecs/first-response: 6.07299 mean, 2380.89 max, 0.926 min
44134 bad byte counts
HTTP response codes:
  code 200 -- 49110
```
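For reference, the run was along these lines: a shuffled URL file with 1,000 entries per endpoint, then 90 seconds at 12 parallel connections (the file name and shuffle method here are illustrative):

```
for i in $(seq 1000); do
  echo "http://127.0.0.1:8080/next"
  echo "http://127.0.0.1:8080/previous"
  echo "http://127.0.0.1:8080/current"
done | sort -R > urls.txt
http_load -parallel 12 -seconds 90 urls.txt
```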
Using the debugging endpoints, it was easy to see what was slowing things down. See the top10 below and the `next.cpu.svg` file (a 9-second sample from the `/next` load test).
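Assuming the standard `net/http/pprof` handlers are mounted, these were collected with something like:

```
go tool pprof -top 'http://127.0.0.1:8080/debug/pprof/profile?seconds=9'
go tool pprof -svg 'http://127.0.0.1:8080/debug/pprof/profile?seconds=9' > next.cpu.svg
```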
```
Showing nodes accounting for 2960ms, 89.97% of 3290ms total
Dropped 48 nodes (cum <= 16.45ms)
Showing top 10 nodes out of 56
      flat  flat%   sum%        cum   cum%
    1080ms 32.83% 32.83%     1080ms 32.83%  math/big.mulAddVWW
     650ms 19.76% 52.58%      650ms 19.76%  math/big.subVV
     310ms  9.42% 62.01%     2210ms 67.17%  math/big.nat.divLarge
     280ms  8.51% 70.52%      280ms  8.51%  math/big.divWVW
     200ms  6.08% 76.60%      200ms  6.08%  syscall.Syscall
     150ms  4.56% 81.16%     2690ms 81.76%  math/big.nat.convertWords
     150ms  4.56% 85.71%      150ms  4.56%  time.now
      50ms  1.52% 87.23%       50ms  1.52%  encoding/json.(*encodeState).string
      50ms  1.52% 88.75%       50ms  1.52%  internal/poll.runtime_pollSetDeadline
      40ms  1.22% 89.97%       40ms  1.22%  math/big.greaterThan
```
Converting the `big.Int` values into strings was the most costly operation. After running some tests, I found that writing `[]byte` values could be more than 2x faster, and I implemented my own basic addition on the slices. But in the end my method of adding digits was too slow, so I had to scrap it. I think the solution could be a smarter implementation of the `[]byte` storage and addition, but I'd have to take a deeper look at Go's `big.nat` implementation to see how it accomplishes fast addition.
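The gist of the scrapped approach was to keep decimal digits in a `[]byte` and add with schoolbook carries. A simplified sketch of the idea:

```go
// addDigits adds two little-endian decimal digit slices (least significant
// digit first) with schoolbook carrying; each byte holds one digit 0-9.
func addDigits(a, b []byte) []byte {
	if len(a) < len(b) {
		a, b = b, a // ensure a is the longer slice
	}
	sum := make([]byte, len(a)+1)
	carry := byte(0)
	for i := 0; i < len(a); i++ {
		d := a[i] + carry
		if i < len(b) {
			d += b[i]
		}
		sum[i] = d % 10
		carry = d / 10
	}
	if carry == 0 {
		return sum[:len(a)]
	}
	sum[len(a)] = carry
	return sum
}
```

Storing decimal digits avoids the expensive binary-to-decimal conversion at response time, but one digit per byte makes each addition do far more work than `math/big`'s word-level operations, which is likely why this version lost overall.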