-
Notifications
You must be signed in to change notification settings - Fork 290
Clip stack support #581
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Clip stack support #581
Conversation
☔ The latest upstream changes (presumably #568) made this pull request unmergeable. Please resolve the merge conflicts. |
@kvark Some bugs that may have some relevant test cases:
These may not be 100% relevant to the current task, and they may be caused by other unrelated bugs, but I think this work should be a step towards solving these issues. |
Thanks @glennw ! I'll check out these specifically after making sure the basic tests work as expected. I figured the easiest way to test now is to modify the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I need to run this branch locally to get a better understanding of how it fits together - once it's rebased tomorrow I'll pull it locally and have a more detailed look over it :)
let scrollbar_rect = Rect::new(Point2D::zero(), Size2D::new(10.0, 70.0)); | ||
context.builder.add_solid_rectangle(&scrollbar_rect, | ||
&ClipRegion::simple(&scrollbar_rect), | ||
context.builder.add_solid_rectangle(&stacking_context.bounds, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was the change from a constant rect here deliberate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch! I shouldn't have changed this one
@@ -1723,6 +1794,26 @@ impl ScreenTile { | |||
alpha_task_stack.push(prev_task); | |||
} | |||
} | |||
|
|||
// Create a task for the layer mask, if needed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this do renders for axis aligned regions, that can be handled without a mask?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The clip_cache_info
would be None
for stuff with just rectangle clips and no images/rounded borders. That's what "if needed" part implies here. I'll specify it in more detail.
I'm facing a problem here with cleared tiles. Now that I have clip regions assigned to stacking contexts, they don't draw anything (even their own background) to areas outside of the clip regions. So the pixels in otherwise affected tiles stay the clear color (black), and the pixels in tiles outside the clipping regions are filled up with white by |
I didn't get a chance to take a detailed look at this today, sorry. I did build it though and run a few tests - all the tests I've run so far do pass. Anecdotally it seems like sites are quite a bit slower with this patch than before - it could be worth testing a few sites locally with/without this patch and seeing if you see the same result. I'm not sure I understand how the clip copies stuff works - doesn't it mean that you have the same texture bound for reading from at the same time as you are writing via FBO. If that's the case, although it often kind of works, it's undefined behaviour - but perhaps I just haven't read the code closely enough yet. I'll take another look in the morning and we can discuss further. |
Oh, disregard the above question about the FBO - I can see how it adds a dependent task now. But, like we discussed earlier, that means it's not available to the correct render pass, doesn't it? How does the current patch handle that case? |
Weird, I don't expect the workload change with this PR, at least not until we start using the new API. One thing that it does differently is - pushing/popping a dummy context for the level zero's background rectangle. I'll try to measure the perf difference myself and see.
The clip generation uses the task system API and specifies the dependencies correctly. I consider the extra work that may occur to be an implementation problem of the task system itself. Perhaps, we can relax the constraints of the tasks hashing a bit (i.e. allow tasks reusal from different passes under certain conditions) in order to optimize this and potentially other cases? |
@kvark I've been thinking about a modification to this implementation a bit, which I've described below. Let's discuss over IRC / Vidyo tomorrow and see if it makes sense. First, I need you to check my understanding of the current implementation, which might be wrong. I think that as it stands now, we will require the same number of render target passes as the depth of the worst case clip stack - is that right? If so, I think that could probably lead to some cases that require a lot of render target passes (which are esp. bad on mobile). I suspect it could also lead to us having an extra render target on a lot of pages (which is not ideal since each render target introduces an additional fixed overhead). The other option we talked about briefly was having a shader that can loop over multiple clips and write them in one pass. This has some advantages, but also some disadvantages, as we've discussed. But there's a third option - which sounds bad on the face of it - but hear me out. We render the clip stack for a primitive in a single render task, but multi-pass. So if a primitive has its own clip mask and also clip masks from two nested (rotated) stacking contexts, we would render them all into a single render task spot (so only one render target), but with multiple draw calls (that are batched where possible, of course). So the downside of this is that the stacking context mask ends up getting drawn multiple times, instead of once in a different pass. But consider this (and disagree as you see fit):
I think it's pretty important to minimize the number of render target switches, esp. on mobile / tiled GPU architecturues, and this seems to me like a compromise that gives us the ability to do the clips masks as either single / multi-pass. If we were to do something like this, I'd imagine something like the following:
Thoughts? Does that make any sense? |
@glennw Thanks for thinking about it!
I don't see it "requiring" this number of render targets. As I said in the previous comment, this is up to the task system implementation to decide. It may have N targets, or it may be bouncing between 2 targets, or even have a single target with texture barriers to allow sampling and rendering to it at the same time.
I like the idea. I wish we could have it as an optional path when the clip stack is not too deep, but it sounds like it would be easier to just put it as the only option. I agree (yes, good points there!) with everything except these two points:
If a stacking context happens to accumulate N clips, we'll have to redo them for every child primitive in this 3rd approach, which is certainly incomparable with a simple
While true in general, we aren't talking about a hundred or even ten target switches that currently happen. It only adds up a few switches to what we already had, so I wouldn't worry about the RT switch cost in particular. So, I like your 3rd approach because it closes the gap between the 1st (which is implemented) and the 2nd (uber-clip shader). There is also an optimization on the table that I mentioned - have non-blended clip instances that avoid issuing a Here is what I propose. I'll try to get that non-blended clip version running and update this PR. The idea is that by the time you get in today we can merge this PR at last. And then I'll proceed to implementing your 3rd approach. The benefit of landing now comparing to just rewriting the code straight away to the 3rd is that we'll have more time to adopt the API changes in servo and maybe find some issues that are common between 1st and 3rd paths. How does this sound? |
Sounds like a good plan, thanks! :) |
@kvark I did some initial testing - the tests so far that I've run pass - but there seems to be a timing issue, which will expose those Servo intermittents again. Could you see if you can reproduce the results below locally? Hopefully I was just testing something incorrectly... If you run:
The results I get without this patch:
The results with this patch:
|
Reviewed 5 of 13 files at r1, 3 of 8 files at r2. webrender/src/tiling.rs, line 158 at r2 (raw file):
Why remove this? webrender/src/tiling.rs, line 526 at r2 (raw file):
Ah, this will be causing the slowdown I'm seeing, I expect webrender/src/tiling.rs, line 1042 at r2 (raw file):
nit: spelling (I think) Comments from Reviewable |
Partial review only above - will do a more detailed look once we've got the test performance issue sorted. |
Review status: 8 of 14 files reviewed at latest revision, 5 unresolved discussions. webrender/src/tiling.rs, line 158 at r2 (raw file): Previously, glennw (Glenn Watson) wrote…> Why remove this?webrender/src/tiling.rs, line 526 at r2 (raw file): Previously, glennw (Glenn Watson) wrote…> Ah, this will be causing the slowdown I'm seeing, I expectwebrender/src/tiling.rs, line 1042 at r2 (raw file):
Comments from Reviewable |
…gle. Renamed prim_store::PrimitiveClipSource to mask_cache::ClipSource. Creating dependent tasks for the layers. clip copy shader. Nested clipping and some text for the wr-sample.
…all. Fixed the needs_clipping flag to avoid blending where not necessary.
Time before my PR, before your rectangle no-clip fix, with laptop on battery:
Time before my PR, with your fix, on power:
Time with my PR, after fixing the
Time with my PR, after the clip fix, with your fix, on power:
This PR is now updated, looks like all the issues are resolved. @glennw please have a look. |
☔ The latest upstream changes (presumably #592) made this pull request unmergeable. Please resolve the merge conflicts. |
Clip stack collapse path Based on #581 Implements the 3rd option explained in #581 (comment) as an optional code path, controlled by `CLIP_TASK_COLLAPSE`, enabled by default. <!-- Reviewable:start --> --- This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/webrender/595) <!-- Reviewable:end -->
Shadowed by #595 |
Closes #498
Removes
StackingContext::overflow
in favour ofDisplayItem::clip
.Generates clip tasks for the layers that the primitives may depend on when generating their masks, copies over the affected sub-regions instead of clearing (new
cs_clip_copy
shader).Current status: base tests are passing, new logic test cases are pending
This change is