Clip stack support #581

kvark · 2016-11-21T17:33:04Z

Closes #498

Removes StackingContext::overflow in favour of DisplayItem::clip.
Generates clip tasks for the layers that the primitives may depend on when generating their masks, copies over the affected sub-regions instead of clearing (new cs_clip_copy shader).

Current status: base tests are passing, new logic test cases are pending

This change is

bors-servo · 2016-11-22T00:46:13Z

☔ The latest upstream changes (presumably #568) made this pull request unmergeable. Please resolve the merge conflicts.

glennw · 2016-11-22T02:05:20Z

@kvark Some bugs that may have some relevant test cases:

These may not be 100% relevant to the current task, and they may be caused by other unrelated bugs, but I think this work should be a step towards solving these issues.

kvark · 2016-11-22T04:50:31Z

Thanks @glennw ! I'll check out these specifically after making sure the basic tests work as expected. I figured the easiest way to test now is to modify the wr-sample, so I found a few spots with missing logic and working on it now.

glennw

I think I need to run this branch locally to get a better understanding of how it fits together - once it's rebased tomorrow I'll pull it locally and have a more detailed look over it :)

glennw · 2016-11-22T04:58:08Z

webrender/src/frame.rs

-            let scrollbar_rect = Rect::new(Point2D::zero(), Size2D::new(10.0, 70.0));
-            context.builder.add_solid_rectangle(&scrollbar_rect,
-                                                &ClipRegion::simple(&scrollbar_rect),
+            context.builder.add_solid_rectangle(&stacking_context.bounds,


Was the change from a constant rect here deliberate?

good catch! I shouldn't have changed this one

glennw · 2016-11-22T05:00:42Z

webrender/src/tiling.rs

@@ -1723,6 +1794,26 @@ impl ScreenTile {
                            alpha_task_stack.push(prev_task);
                        }
                    }
+
+                    // Create a task for the layer mask, if needed


Does this do renders for axis aligned regions, that can be handled without a mask?

The clip_cache_info would be None for stuff with just rectangle clips and no images/rounded borders. That's what "if needed" part implies here. I'll specify it in more detail.

kvark · 2016-11-22T16:27:24Z

I'm facing a problem here with cleared tiles. Now that I have clip regions assigned to stacking contexts, they don't draw anything (even their own background) to areas outside of the clip regions. So the pixels in otherwise affected tiles stay the clear color (black), and the pixels in tiles outside the clipping regions are filled up with white by Frame::clear_tiles. This is certainly not desirable, but it's not obvious to me how the semantics of the background color and/or clear tiles has to be adapted. @glennw ?

glennw · 2016-11-23T06:16:37Z

I didn't get a chance to take a detailed look at this today, sorry.

I did build it though and run a few tests - all the tests I've run so far do pass. Anecdotally it seems like sites are quite a bit slower with this patch than before - it could be worth testing a few sites locally with/without this patch and seeing if you see the same result.

I'm not sure I understand how the clip copies stuff works - doesn't it mean that you have the same texture bound for reading from at the same time as you are writing via FBO. If that's the case, although it often kind of works, it's undefined behaviour - but perhaps I just haven't read the code closely enough yet.

I'll take another look in the morning and we can discuss further.

glennw · 2016-11-23T06:27:16Z

Oh, disregard the above question about the FBO - I can see how it adds a dependent task now. But, like we discussed earlier, that means it's not available to the correct render pass, doesn't it? How does the current patch handle that case?

kvark · 2016-11-23T22:12:09Z

@glennw

Anecdotally it seems like sites are quite a bit slower with this patch than before

Weird, I don't expect the workload change with this PR, at least not until we start using the new API. One thing that it does differently is - pushing/popping a dummy context for the level zero's background rectangle. I'll try to measure the perf difference myself and see.

But, like we discussed earlier, that means it's not available to the correct render pass, doesn't it? How does the current patch handle that case?

The clip generation uses the task system API and specifies the dependencies correctly. I consider the extra work that may occur to be an implementation problem of the task system itself. Perhaps, we can relax the constraints of the tasks hashing a bit (i.e. allow tasks reusal from different passes under certain conditions) in order to optimize this and potentially other cases?

glennw · 2016-11-23T23:48:57Z

@kvark I've been thinking about a modification to this implementation a bit, which I've described below. Let's discuss over IRC / Vidyo tomorrow and see if it makes sense.

First, I need you to check my understanding of the current implementation, which might be wrong. I think that as it stands now, we will require the same number of render target passes as the depth of the worst case clip stack - is that right? If so, I think that could probably lead to some cases that require a lot of render target passes (which are esp. bad on mobile). I suspect it could also lead to us having an extra render target on a lot of pages (which is not ideal since each render target introduces an additional fixed overhead).

The other option we talked about briefly was having a shader that can loop over multiple clips and write them in one pass. This has some advantages, but also some disadvantages, as we've discussed.

But there's a third option - which sounds bad on the face of it - but hear me out. We render the clip stack for a primitive in a single render task, but multi-pass. So if a primitive has its own clip mask and also clip masks from two nested (rotated) stacking contexts, we would render them all into a single render task spot (so only one render target), but with multiple draw calls (that are batched where possible, of course).

So the downside of this is that the stacking context mask ends up getting drawn multiple times, instead of once in a different pass. But consider this (and disagree as you see fit):

Rendering a clip mask is (almost) as fast as the cs_copy shader anyway, so it's minimal performance difference in terms of the rendering of the masks.
We save extra render target allocations, clears and remove that fixed overhead, so it's a performance and memory win in that sense.
We can, at our choice, choose whether to render the mask as a special case shader that can render multiple clip masks in one pass, or do it multi-pass (which would be the default).
It fits in nicely to the existing task system :)

I think it's pretty important to minimize the number of render target switches, esp. on mobile / tiled GPU architecturues, and this seems to me like a compromise that gives us the ability to do the clips masks as either single / multi-pass.

If we were to do something like this, I'd imagine something like the following:

Primitives that have their own clip (e.g. rounded corners) use (PrimitiveIndex, TileId) as a key for the render task.
Primitives that don't need their own specific clip, but have clips from stacking contexts would use (StackingContextIndex, TileId) as a key for the task.
The render task data would include (equivalent to) a Vec<> of stacking contexts that they need to draw into this mask region.
Primitives without specific or stacking context clip get a normal dummy item and no render task.

Thoughts? Does that make any sense?

kvark · 2016-11-24T16:12:26Z

@glennw Thanks for thinking about it!

I think that as it stands now, we will require the same number of render target passes as the depth of the worst case clip stack - is that right?

I don't see it "requiring" this number of render targets. As I said in the previous comment, this is up to the task system implementation to decide. It may have N targets, or it may be bouncing between 2 targets, or even have a single target with texture barriers to allow sampling and rendering to it at the same time.

But there's a third option

I like the idea. I wish we could have it as an optional path when the clip stack is not too deep, but it sounds like it would be easier to just put it as the only option. I agree (yes, good points there!) with everything except these two points:

Rendering a clip mask is (almost) as fast as the cs_copy shader anyway, so it's minimal performance difference in terms of the rendering of the masks.

If a stacking context happens to accumulate N clips, we'll have to redo them for every child primitive in this 3rd approach, which is certainly incomparable with a simple cs_clip_copy call.

I think it's pretty important to minimize the number of render target switches, esp. on mobile / tiled GPU architectures

While true in general, we aren't talking about a hundred or even ten target switches that currently happen. It only adds up a few switches to what we already had, so I wouldn't worry about the RT switch cost in particular.

So, I like your 3rd approach because it closes the gap between the 1st (which is implemented) and the 2nd (uber-clip shader). There is also an optimization on the table that I mentioned - have non-blended clip instances that avoid issuing a cs_clip_clear prior to them. It could fix the performance of 90% of cases without modifying any major logic paths.

Here is what I propose. I'll try to get that non-blended clip version running and update this PR. The idea is that by the time you get in today we can merge this PR at last. And then I'll proceed to implementing your 3rd approach. The benefit of landing now comparing to just rewriting the code straight away to the 3rd is that we'll have more time to adopt the API changes in servo and maybe find some issues that are common between 1st and 3rd paths. How does this sound?

glennw · 2016-11-24T18:57:01Z

Sounds like a good plan, thanks! :)

glennw · 2016-11-24T23:30:44Z

@kvark I did some initial testing - the tests so far that I've run pass - but there seems to be a timing issue, which will expose those Servo intermittents again. Could you see if you can reproduce the results below locally? Hopefully I was just testing something incorrectly...

If you run:

time ./mach test-wpt --release --processes=8 --log-raw=wpt.log tests/wpt/mozilla/tests/css/

The results I get without this patch:

real	1m25.866s
user	8m48.928s
sys	0m58.704s

The results with this patch:

real	2m33.484s
user	17m6.348s
sys	0m57.728s

glennw · 2016-11-25T04:09:43Z

Reviewed 5 of 13 files at r1, 3 of 8 files at r2.
Review status: 8 of 14 files reviewed at latest revision, 5 unresolved discussions.

webrender/src/tiling.rs, line 158 at r2 (raw file):

                         device_pixel_ratio: f32) -> bool {
        let metadata = self.get_metadata(prim_index);

Why remove this?

webrender/src/tiling.rs, line 526 at r2 (raw file):

                        let prim_metadata = ctx.prim_store.get_metadata(prim_index);
                        let transform_kind = layer.xf_rect.as_ref().unwrap().kind;
                        let needs_clipping = prim_metadata.clip_cache_info.is_some() || true; //TODO

Ah, this will be causing the slowdown I'm seeing, I expect

webrender/src/tiling.rs, line 1042 at r2 (raw file):

    fn new_mask(actual_rect: DeviceIntRect,
                cache_info: &MaskCacheInfo,
                dependant: Option<&RenderTask>,

nit: spelling (I think)

Comments from Reviewable

glennw · 2016-11-25T04:10:46Z

Partial review only above - will do a more detailed look once we've got the test performance issue sorted.

kvark · 2016-11-25T15:25:45Z

Review status: 8 of 14 files reviewed at latest revision, 5 unresolved discussions.

webrender/src/tiling.rs, line 158 at r2 (raw file):

Previously, glennw (Glenn Watson) wrote…

> Why remove this?

No longer necessary as the culling is now moved [higher up](https://github.com//pull/581/files#diff-a465a0e8f2a0c6ff2d8a8efcf2696279R2583) to `assign_prims_to_screen_tiles`

webrender/src/tiling.rs, line 526 at r2 (raw file):

Previously, glennw (Glenn Watson) wrote…

> Ah, this will be causing the slowdown I'm seeing, I expect

Wow, that did slip by me. That was meant to be temporary. Thanks for spotting! I'll fix it and do a proper benchmark with rebasing on your non-clip rectangle patch.

webrender/src/tiling.rs, line 1042 at r2 (raw file):
Seems to be a difference between British and American English:

According to every British reference resource we checked, British English treats dependant as the noun and dependent as the adjective.

Comments from Reviewable

…gle. Renamed prim_store::PrimitiveClipSource to mask_cache::ClipSource. Creating dependent tasks for the layers. clip copy shader. Nested clipping and some text for the wr-sample.

…all. Fixed the needs_clipping flag to avoid blending where not necessary.

kvark · 2016-11-25T16:01:48Z

Time before my PR, before your rectangle no-clip fix, with laptop on battery:

real	6m18.830s
user	18m52.097s
sys	1m5.355s

Time before my PR, with your fix, on power:

real	3m16.849s
user	9m20.058s
sys	1m7.691s

Time with my PR, after fixing the needs_clipping bug, and without your fix, with laptop on battery:

real	5m58.585s
user	18m50.510s
sys	1m2.438s

Time with my PR, after the clip fix, with your fix, on power:

real	3m17.273s
user	9m12.677s
sys	1m8.231s

This PR is now updated, looks like all the issues are resolved. @glennw please have a look.

bors-servo · 2016-11-28T08:23:03Z

☔ The latest upstream changes (presumably #592) made this pull request unmergeable. Please resolve the merge conflicts.

Clip stack collapse path Based on #581 Implements the 3rd option explained in #581 (comment) as an optional code path, controlled by `CLIP_TASK_COLLAPSE`, enabled by default.  --- This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/webrender/595)

kvark · 2016-11-28T21:11:44Z

Shadowed by #595

glennw reviewed Nov 22, 2016

View reviewed changes

kvark changed the title ~~Clip stack support~~ [WIP] Clip stack support Nov 22, 2016

kvark force-pushed the clip_stack branch from a9983c3 to e8b9cde Compare November 22, 2016 22:27

kvark changed the title ~~[WIP] Clip stack support~~ Clip stack support Nov 22, 2016

kvark added 2 commits November 25, 2016 10:37

Added ClipRegion to StackingContext/Layer instead of the local rectan…

954f72a

…gle. Renamed prim_store::PrimitiveClipSource to mask_cache::ClipSource. Creating dependent tasks for the layers. clip copy shader. Nested clipping and some text for the wr-sample.

Extended clip batcher to combine the first clear+rect into a single c…

7f179b9

…all. Fixed the needs_clipping flag to avoid blending where not necessary.

kvark force-pushed the clip_stack branch from 1c4cc34 to 7f179b9 Compare November 25, 2016 15:39

kvark mentioned this pull request Nov 25, 2016

Clip stack collapse path #595

Merged

kvark closed this Nov 28, 2016

kvark deleted the clip_stack branch November 28, 2016 21:11

Clip stack support #581

Clip stack support #581

Uh oh!

Conversation

kvark commented Nov 21, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

bors-servo commented Nov 22, 2016

Uh oh!

glennw commented Nov 22, 2016

Uh oh!

kvark commented Nov 22, 2016

Uh oh!

glennw left a comment

Choose a reason for hiding this comment

Uh oh!

glennw Nov 22, 2016

Choose a reason for hiding this comment

Uh oh!

kvark Nov 22, 2016

Choose a reason for hiding this comment

Uh oh!

glennw Nov 22, 2016

Choose a reason for hiding this comment

Uh oh!

kvark Nov 22, 2016

Choose a reason for hiding this comment

Uh oh!

kvark commented Nov 22, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

glennw commented Nov 23, 2016

Uh oh!

glennw commented Nov 23, 2016

Uh oh!

kvark commented Nov 23, 2016

Uh oh!

glennw commented Nov 23, 2016

Uh oh!

kvark commented Nov 24, 2016

Uh oh!

glennw commented Nov 24, 2016

Uh oh!

glennw commented Nov 24, 2016

Uh oh!

glennw commented Nov 25, 2016

Uh oh!

glennw commented Nov 25, 2016

Uh oh!

kvark commented Nov 25, 2016

Uh oh!

kvark commented Nov 25, 2016

Uh oh!

bors-servo commented Nov 28, 2016

Uh oh!

kvark commented Nov 28, 2016

Uh oh!

Uh oh!

kvark commented Nov 21, 2016 •

edited

Loading

kvark commented Nov 22, 2016 •

edited

Loading