Excessive memory usage and out-of-memory exception #502
Hi, thanks for reporting this issue. It's pretty standard to use Tile38 in the way that you describe, but I generally see TTLs in the 30-second to 1-hour range, so this makes me think that the longer 24-hour TTL might be the problem. I'm going to do some testing on my side, but I want to make sure I use GeoJSON objects that are similar to yours. Could you share an example object?
Hi, really appreciate the fast response :) The points we store usually look like this:
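A hypothetical point of the kind described, a Point geometry carrying a few extra JSON properties (the attribute names here are made up, not the reporter's actual data):

```json
{
  "type": "Feature",
  "geometry": {
    "type": "Point",
    "coordinates": [8.5417, 47.3769]
  },
  "properties": {
    "vehicleId": "truck-4711",
    "heading": 270,
    "status": "moving"
  }
}
```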
Here are the commands to insert those points:
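A representative insert with fields and a 24-hour TTL might look like the following; the key, id, and field names are illustrative, not the reporter's actual values:

```
SET fleet truck-4711 FIELD speed 90 FIELD heading 270 EX 86400 OBJECT {"type":"Feature","geometry":{"type":"Point","coordinates":[8.5417,47.3769]},"properties":{"vehicleId":"truck-4711"}}
```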
Regarding the TTL length: I've run a small trial with a TTL of 1 hour, which ran smoothly over the weekend.
The fact that the 1-hour run went smoothly is a good hint. Thanks for sharing the GeoJSON. I'll goof around and see what I can dig up.
I’ve been able to reproduce the issue and it’s absolutely related to long TTLs. I know the cause and I plan to have a fix in the next day or so. I’ll keep you posted.
You're awesome! Thanks! :)
I just pushed a fix.
This commit fixes an issue where Tile38 was using lots of extra memory to track objects that are marked to expire. This was creating problems for applications that set big TTLs.

How it worked before: Every collection had a unique hashmap that stores expiration timestamps for every object in that collection. Along with the hashmaps, there's also one big server-wide list that gets appended to every time a SET+EX is performed. A background routine loops over this list at least 10 times per second, randomly picking potential candidates that might need expiring. The routine then removes those entries from the list and tests whether the objects matching the entries have actually expired. If so, these objects are deleted from the database. When at least 25% of the 20 candidates are deleted, the loop immediately continues; otherwise the loop backs off with a 100ms pause.

Why this was a problem: The list grows by one entry for every SET+EX. When TTLs are long, like 24 hours or more, it takes at least that much time before an entry is removed. So for databases that have objects that use TTLs and are updated often, this could lead to a very large list.

How it was fixed: The list was removed and the hashmap is now searched randomly. This required a new hashmap implementation, as the built-in Go map does not provide an operation for randomly getting entries. The chosen implementation is a Robin Hood hash, because it uses open addressing, which makes for simple random bucket selections.

Issue #502
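As an illustration of the technique the commit describes (not Tile38's actual code), here is a minimal Go sketch of random candidate selection over an open-addressed table: because entries live directly in the bucket array, the expiration routine can pick a random bucket and scan to the next occupied slot, with no auxiliary candidate list at all.

```go
package main

import (
	"fmt"
	"math/rand"
)

// entry is one slot in an open-addressed table.
type entry struct {
	key     string
	expires int64 // expiration timestamp, as in Tile38's TTL tracking
	used    bool
}

// table is a toy open-addressed hashmap. Real Robin Hood hashing also
// tracks probe distances and shifts entries on insert; that part is
// omitted because it doesn't matter for the random-sampling idea.
type table struct {
	slots []entry
}

func newTable(n int) *table { return &table{slots: make([]entry, n)} }

func (t *table) hash(key string) int {
	h := 0
	for i := 0; i < len(key); i++ {
		h = h*31 + int(key[i])
	}
	if h < 0 {
		h = -h
	}
	return h % len(t.slots)
}

// set inserts with plain linear probing (simplified; no resizing).
func (t *table) set(key string, expires int64) {
	i := t.hash(key)
	for t.slots[i].used && t.slots[i].key != key {
		i = (i + 1) % len(t.slots)
	}
	t.slots[i] = entry{key: key, expires: expires, used: true}
}

// randomEntry picks a random bucket and walks forward to the next
// occupied slot. No per-SET list entry is ever created, so memory use
// stays proportional to the number of live objects, not to the number
// of SET+EX calls performed.
func (t *table) randomEntry(rng *rand.Rand) (entry, bool) {
	start := rng.Intn(len(t.slots))
	for i := 0; i < len(t.slots); i++ {
		e := t.slots[(start+i)%len(t.slots)]
		if e.used {
			return e, true
		}
	}
	return entry{}, false
}

func main() {
	t := newTable(64)
	t.set("truck-1", 1000)
	t.set("truck-2", 2000)
	t.set("truck-3", 3000)
	rng := rand.New(rand.NewSource(42))
	if e, ok := t.randomEntry(rng); ok {
		fmt.Println("expiration candidate:", e.key, e.expires)
	}
}
```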
And here's the test program I used to monitor the growing heap.
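The original program isn't preserved here; the following Go sketch is a stand-in that drives a similar workload over Tile38's HTTP interface (default port 9851) and periodically dumps the SERVER stats, which include the heap size, so growth is visible over time.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
	"time"
)

// tile38 issues one command over Tile38's HTTP interface, which accepts
// commands in the URL path with '+' separating the arguments, and
// returns the raw JSON response.
func tile38(cmd string) (string, error) {
	resp, err := http.Get("http://localhost:9851/" + strings.ReplaceAll(cmd, " ", "+"))
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

func main() {
	i := 0
	for {
		// Repeatedly overwrite a rotating set of points with a long TTL,
		// mimicking the fleet-update workload from the report.
		id := fmt.Sprintf("truck-%d", i%100000)
		cmd := fmt.Sprintf("SET fleet %s EX 86400 POINT 47.37 8.54", id)
		if _, err := tile38(cmd); err != nil {
			fmt.Println("set failed:", err)
			return
		}
		i++
		if i%10000 == 0 {
			// SERVER reports stats, including the heap size; print them
			// so the heap can be watched as the workload runs.
			stats, err := tile38("SERVER")
			if err != nil {
				fmt.Println("server failed:", err)
				return
			}
			fmt.Println(time.Now().Format(time.RFC3339), stats)
		}
	}
}
```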
That's great news!
I'm happy to help. I think everything looks in order now. I'll release a new version shortly.
I just released a new version that includes this fix. 🎉
Hi, we're currently evaluating Tile38 as an in-memory geo-index for monitoring a worldwide fleet.
We ingest ~1500 events/min, most of them updates to existing objects.
Objects are stored as GeoJSON points with some additional JSON attributes and 6 fields, with a tile38_avg_point_size of 22'242.
Inserts happen in batches of 500 events, and all objects have a TTL of 24h.
This setup runs fine with ~100'000 to 130'000 objects in the database, consuming 2-5GB of RAM, but then crashes after about 26h with an out-of-memory exception.
The machine running the server is a 32GB Hetzner VM, so the non-linear memory growth seems a bit excessive.
I also tried the collection-optz branch, which worked slightly better but still crashed. Are there any limitations I might not be aware of? Could the problem be related to expiring objects?
Would it be possible to add a fail-safe mechanism when the memory limit is reached, so it does not crash the entire server?