
Conversation

joveeater
Collaborator

Summary

SUMMARY: Performance "Optimizations to 2d shadowcasting"

Purpose of change

Optimizations for 2d shadowcasting. 2d shadowcasting now supports lookup tables that let it avoid transparency calculations. In practice this means that light and vision now run about 30-40% faster in most situations.

Describe the solution

Two lookup tables are maintained: a constant one for open air, and one that changes with the weather. Each stores a table of distance -> attenuation, assuming its constant transparency. When a cast starts, the first tile is sampled to see whether it's open air or weather, and the matching table is selected. That table is then used until a tile of a different transparency is found; the recursive cast triggered by that tile uses normal transparency calculations from there on out. So as long as a cast moves through only open air or weather, it can skip a lot of calculation.
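A minimal sketch of how that scheme might look. The struct, the function names, and the exp()-based attenuation formula are illustrative assumptions by the editor, not the PR's actual code:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical sketch of the two-table scheme; names and the exp()-based
// attenuation formula are illustrative, not the PR's actual code.
struct lookup_table {
    float transparency;       // the constant transparency the table assumes
    std::vector<float> light; // distance -> cumulative attenuation
};

lookup_table make_table( float transparency, int max_dist )
{
    lookup_table t{ transparency, std::vector<float>( max_dist + 1 ) };
    for( int d = 0; d <= max_dist; d++ ) {
        t.light[d] = std::exp( -transparency * d );
    }
    return t;
}

// Sample the first tile to pick a table, then stay on the lookup while
// tiles match its transparency; fall back to per-tile math afterwards.
float cast_attenuation( const std::vector<float> &tiles,
                        const lookup_table &air, const lookup_table &weather )
{
    const lookup_table &lut =
        ( tiles[0] == weather.transparency ) ? weather : air;
    float attenuation = 1.0f;
    bool on_lookup = true;
    for( size_t d = 0; d < tiles.size(); d++ ) {
        if( on_lookup && tiles[d] == lut.transparency ) {
            attenuation = lut.light[d + 1]; // cheap table read, no exp()
        } else {
            on_lookup = false;              // normal calculation from here on
            attenuation *= std::exp( -tiles[d] );
        }
    }
    return attenuation;
}
```

With a uniform cast the whole run costs one table read per tile; the first differing tile permanently drops the cast onto the slow path, matching the recursive-cast behavior described above.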

There are also a few other small things. There's a faster square root algorithm that exploits the fact that shadowcasting rounds distances to integers anyway. Also, moving a tile doesn't always increase the distance, so recalculation is skipped in those cases. Note that the first of these optimizations is only enabled with lookups on, and the second only with lookups off; they're slower otherwise.

There is one very small behavior change here: weather effects on the player's tile are no longer ignored. This means casts that move only through weather always have a cumulative transparency equal to the weather's transparency. This could be changed back at the expense of a little more complexity (you'd need to factor it in when calculating the tables, and use the distance to recover the real cumulative transparency when moving off the lookup), but it makes very little difference to gameplay.

Describe alternatives you've considered

I did this a long while back but I intended to apply it to 3d shadowcasting before pushing it, afaik 2d isn't really a performance problem for anyone. I've had a bit of a brainwave around shadowcasting though, at the least I'm going to write a 3d version that extends out from the current 2d version, but I might end up redoing the whole thing in a way that builds out from 0d. Either way it's a fairly big job and I'm gonna finish up the monster vision stuff first.

@joveeater
Collaborator Author

Well that's annoying. Turns out GCC does not like checking whether a template argument is nullptr. I've just added pragmas to suppress the warning, but there might be a nicer way to write it.

@@ -780,6 +815,31 @@ bool map::pl_line_of_sight( const tripoint &t, const int max_range ) const
map_cache.camera_cache[t.x][t.y] > 0.0f;
}

//This algorithm is highly inaccurate and only suitable for the low (<60) values used in shadowcasting
//A starting constant of 21 and 4 iterations matches 2d 60^2. 16 and 5 matches 3d 60^3.
Member

If speed is so important here, it may be better to use distance^2 in lookups and have a 90^2 wide cache.
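A rough sketch of what that suggestion could look like, assuming the same exp()-style attenuation as elsewhere; the function name and formula are the editor's assumptions:

```cpp
#include <cmath>
#include <vector>

// Sketch of the reviewer's idea: key the table by squared distance so the
// inner loop needs no square root at all. For a 60-tile range the diagonal
// gives dx*dx + dy*dy up to 7200, hence a 90^2 = 8100 entry (sparse) table.
std::vector<float> build_squared_table( float transparency )
{
    std::vector<float> table( 90 * 90, 0.0f );
    for( int dx = 0; dx <= 60; dx++ ) {
        for( int dy = 0; dy <= 60; dy++ ) {
            const int d2 = dx * dx + dy * dy;
            table[d2] = std::exp( -transparency * std::sqrt( float( d2 ) ) );
        }
    }
    return table;
}
// inner loop usage: attenuation = table[dx * dx + dy * dy];
```

The lookup itself becomes a single indexed read, at the cost of a much larger, sparsely used table.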

Collaborator Author

I haven't tested this, so performance-wise it's always a huge guess, but I think this will slow things down due to memory latency. As much as the calculation is a large portion of the work for a single tile, it's a very small operation overall; even including the square roots, most processors can calculate a good few table entries faster than they can look one up from main memory. We're relying on the L1 cache for the speedup, and even being evicted to lower layers hurts a lot. 90^2 entries is often more than a full L1 by itself, and it being sparse is a problem.

That said, I can't help but feel you're onto something. I have racked my brain, and you could do a 60x60 cache keyed by offset and avoid the squaring as well, but I think it's still too big. Whatever we use to get a key, it needs to be that much faster than Pythagoras to justify however much bigger the cache needs to be.

This is another guess, but if you just want to get rid of the new distance algorithm, it won't mean much: the difference is pretty minimal, low enough that it's somehow slower when not using the lookup. The real reason it's here is that I hoped it would help more than it did, so I tried it, and even if it's only 1-2% it does help.

You can always go further with optimization, but at some point we're going to hit the fact that shadowcasting is highly optimized for 1980s processors, and a lot of that makes it bad for 2020s processors. All of the clever logic to avoid touching obscured tiles means more data dependencies and unpredictable branches. It's funny, but even the first unoptimized non-recursive version of shadowcasting might run better these days.

I think that's a problem for another day though. I am going to do an ad-hoc 2d->3d extension fairly soon. I took a look at a 0d->nd version and making it efficient while handling some dimensions having floors or vehicle holes is gonna be a hot mess of templates. Instead I'm gonna add just a couple more template functions to 2d's ever growing list that let 3d keep track of what it's doing. I imagine at that point 3d's performance will be determined almost entirely by 2d's and hopefully it'll end up quite a bit faster.

@Coolthulhu Coolthulhu self-assigned this Sep 22, 2022
src/lightmap.cpp Outdated
last_intensity = calc( numerator, cumulative_transparency, dist );
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Waddress"
if( lookup != nullptr ) {
Member

Is there no better way to do it than giant pragma spam?

Maybe an inline function would work.
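One way an inline helper can sidestep the warning (an editor's sketch, with made-up names, not the PR's code): comparing through a function parameter stops GCC from seeing a compile-time-constant address test.

```cpp
#include <cassert>

// GCC raises -Waddress when a non-null template argument is compared to
// nullptr directly, because the comparison is constant. Routing the check
// through an inline function parameter keeps the code warning-free.
template <typename T>
inline bool is_null( const T *ptr )
{
    return ptr == nullptr;
}

static constexpr float open_air[4] = { 1.0f, 0.9f, 0.8f, 0.7f };

template <const float *Lookup>
float sample( int dist, float fallback )
{
    if( is_null( Lookup ) ) {  // no -Waddress, no pragmas
        return fallback;
    }
    return Lookup[dist];
}
```

Pointer non-type template arguments to internal-linkage objects need C++17, but the constant-null branch still folds away at -O1 and up.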

@joveeater joveeater force-pushed the shadowcast-optimization branch from 8727906 to 234b882 Compare September 23, 2022 22:28
@joveeater
Collaborator Author

Good idea. That's done.

a = ( a + b ) / 2;
}

return a;
Member

Alright, I spent a longer while plotting this approximation vs real sqrt, and it looks like up to 72 tiles of distance it overshoots by 1 exactly 3 times (assuming integer truncation), at distances whose squares are [80, 1088, 1224].
So the only potential problem is at points at coords like (8, 4) from the player (8*8 + 4*4 = 64 + 16 = 80). If it looks OK at those points, then it's good enough. Will check those.

Not sure how much detail we're losing with the float purge, but it's probably not visible to the naked eye, except to particularly skilled texture enjoyers.
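Those three overshoots can be reproduced with a guessed reconstruction of the approximation from the diff fragments (Newton's method in integer arithmetic, starting constant 21, 4 iterations). This is an editor's sketch, not the PR's exact code:

```cpp
// Guessed reconstruction from the diff fragments: Newton's method for sqrt
// in pure integer arithmetic, starting at 21 with 4 iterations, which the
// diff comment says matches the 2d 60^2 range.
int approx_sqrt( int n )
{
    int a = 21;
    for( int i = 0; i < 4; i++ ) {
        const int b = n / a;
        a = ( a + b ) / 2;
    }
    return a;
}
```

Under this reconstruction, n = 80, 1088, and 1224 each come out one higher than the truncated real square root, while values like 64, 100, and 3600 match exactly.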

Collaborator Author

I found the constants by brute-forcing the full 60*60 range to find ones that match exactly. The truncation to an integer happens in the old algorithm too: it assigns to "const int dist", and the template argument is an int, so the results should be identical. Actually that's not 100% true; there are two exp algorithms, an accurate one for sight and a fast one for light. The tables, which light will still use, use the accurate version.

Member

I checked the points and didn't notice a difference. It's subtle enough that it shouldn't matter.

@Coolthulhu
Member

Alright, it looks good. I couldn't find any bugs. Readability of the shadowcasting bits has been in shambles for a while now, but it hasn't worsened much here.

Once the feature freeze is over, it can probably go in with just minimal re-testing ("game loads, vision doesn't look broken"). Unless something big changes in vision code or this PR.

@Coolthulhu Coolthulhu removed their assignment Sep 24, 2022
@Coolthulhu Coolthulhu assigned Coolthulhu and unassigned Coolthulhu Oct 5, 2022
@joveeater
Collaborator Author

Another optimization added: this simplifies the calculation of delta.x at the beginning of each row a little, and adds a similar calculation for the limit of the row so we can remove the end > trailingEdge condition as well. This means we don't need to calculate the edges of each tile, only the ones that cause splits.

@scarf005 scarf005 added src changes related to source code. tests changes related to tests labels Nov 3, 2022
@Coolthulhu Coolthulhu self-assigned this Nov 15, 2022
@Coolthulhu
Member

Conflicts with current upload.

@Coolthulhu Coolthulhu removed their assignment Nov 15, 2022
@joveeater joveeater force-pushed the shadowcast-optimization branch from 7ca9b33 to 1d6db84 Compare November 16, 2022 09:23
@Lamandus
Contributor

Is this still in the making, or even ready to go?

@joveeater
Collaborator Author

Yeah this should be good to go.

@joveeater joveeater merged commit 2e2f35d into cataclysmbnteam:upload Apr 8, 2023