-
Notifications
You must be signed in to change notification settings - Fork 326
2D shadowcasting optimizations #1864
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
2D shadowcasting optimizations #1864
Conversation
Well that's annoying. Turns out GCC does not like the idea of checking if a template argument is a nullptr. I've just added pragmas to suppress it but there might be a nicer way to write it. |
@@ -780,6 +815,31 @@ bool map::pl_line_of_sight( const tripoint &t, const int max_range ) const | |||
map_cache.camera_cache[t.x][t.y] > 0.0f; | |||
} | |||
|
|||
//This algorithm is highly inaccurate and only suitable for the low (<60) values use in shadowcasting | |||
//A starting constant of 21 and 4 iterations matches 2d 60^2. 16 and 5 matches 3d 60^3. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If speed is so important here, it may be better to use distance^2
in lookups and have a 90^2
wide cache.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't tested this so obviously performance wise it's always a huge guess but I think this will slow things down due to memory latency. As much as the calculation is a large portion of the work of a single tile, it's a very small operation overall, even including the square roots most processors could calculate a good few table entries faster than they can look one up from main memory. We're relying on the L1 cache for the speedup, even being evicted to lower layers hurts a lot. 90^2 is often more than a full L1 by itself and it being sparse is a problem.
That said, I can't help but feel you're onto something. I haved racked my brain and you could do a 60x60 cache that's keyed by offset and avoid squaring as well but I think it's still too big. Whatever we use to get a key it needs to be that much faster than pythagoras to justify however much bigger the cache needs to be.
This is another guess but if you just want to get rid of the new distance algorithm it won't mean much, the difference is pretty minimal. Low enough that somehow it's slower when not using the lookup. The real reason it's here is because I hoped it would help more than it did so I tried it and even if it's only 1-2% it does help.
You can always go further with optimization but at some point we're going to hit the fact that shadowcasting is highly optimized for 1980's processors and a lot of that makes it bad for 2020 processors. All of the clever logic to avoid touching obscured tiles means more data dependencies and unpredictable branches. It's funny but even the first unoptimized non-recursive version of shadowcasting might run better these days.
I think that's a problem for another day though. I am going to do an ad-hoc 2d->3d extension fairly soon. I took a look at a 0d->nd version and making it efficient while handling some dimensions having floors or vehicle holes is gonna be a hot mess of templates. Instead I'm gonna add just a couple more template functions to 2d's ever growing list that let 3d keep track of what it's doing. I imagine at that point 3d's performance will be determined almost entirely by 2d's and hopefully it'll end up quite a bit faster.
src/lightmap.cpp
Outdated
last_intensity = calc( numerator, cumulative_transparency, dist ); | ||
#pragma GCC diagnostic push | ||
#pragma GCC diagnostic ignored "-Waddress" | ||
if( lookup != nullptr ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there no better way to do it than giant pragma spam?
Maybe an inline function would work.
8727906
to
234b882
Compare
Good idea. That's done. |
a = ( a + b ) / 2; | ||
} | ||
|
||
return a; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, I spent a longer while plotting this approximation vs real sqrt
and it looks like up to 72 tiles of distance, it overshoots by 1 exactly 3 times, assuming integer truncation, at distances that are squared to [80, 1088, 1224]
.
So the only potential problem is at points at coords like (8, 4)
from player (8*8 + 4*4 = 64 + 16 = 80
). If it looks OK at those 4 points, then it's good good enough. Will check those.
Not sure how much detail are we losing with the float purge, but it's probably not visible to the naked eye, except to particularly skilled texture enjoyers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found the constants via brute forcing the full 60*60 range to find ones that match exactly. The truncation to an integer happens in the old algorithm. It assigns it to "const int dist" and the template argument is an int. The results should be identical. Actually that's not 100% true, there are two exp algorithms. An accurate one for sight and a fast one for light. The tables, which light will still use, use the accurate version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked the points and didn't notice a difference. It's subtle enough that it shouldn't matter.
Alright, it looks good. I couldn't find any bugs. Readability of the shadowcasting bits has been in shambles for a while now, but it's not worsened that much here. Once the feature freeze is over, it can probably go in with just minimal re-testing ("game loads, vision doesn't look broken"). Unless something big changes in vision code or this PR. |
Another optimization added, this simplifies the calculation of delta.x at the beginning of each row a little and adds a similar calculation for the limit of the row so we can remove the end > trailingEdge condition as well. This means we don't need to calculate the edges of each tile, only the ones that cause splits. |
Conflicts with current |
7ca9b33
to
1d6db84
Compare
is this still in the making or even ready to go? |
Yeah this should be good to go. |
Summary
SUMMARY: Performance "Optimizations to 2d shadowcasting"
Purpose of change
Optimizations for 2d shadowcasting. 2d shadowcasting now supports lookup tables that let it avoid transparency calculations. In practice this means that light and vision now run about 30-40% faster in most situations.
Describe the solution
Two lookup tables are maintained, a constant one for open air and one that changes with the weather. They store a table of distance -> attenuation assuming their constant transparency. When a cast starts the first tile is sampled to see if it's open air or weather to select a table. After that the table will be used until a tile of a different transparency is found. The recursive cast caused by this change will use normal transparency calculations from there on out. So so long as a cast moves through one of open air or weather it can skip a lot of calculation.
There are also a few other small things. There's a faster square root algorithm that exploits the fact that it rounds distances to integers anyway, also moving a tile doesn't always increase the distance and so recalculation is skipped in those cases. Note that both these optimizations are only enabled with lookups on and off respectively, they're slower otherwise.
There is one very small change here. You no longer ignore weather effects on the player's tile. This means casts that only move through the weather always have a cumulative transparency equal to the weather's transparency. This could be changed back at the expense of a little more complexity. You need to factor it in when calculating the tables and use the distance to calculate the real cumulative transparency when you move off the lookup, but it makes very little difference to gameplay.
Describe alternatives you've considered
I did this a long while back but I intended to apply it to 3d shadowcasting before pushing it, afaik 2d isn't really a performance problem for anyone. I've had a bit of a brainwave around shadowcasting though, at the least I'm going to write a 3d version that extends out from the current 2d version, but I might end up redoing the whole thing in a way that builds out from 0d. Either way it's a fairly big job and I'm gonna finish up the monster vision stuff first.