Skip to content

Commit

Permalink
Bug 1014518 - Document shared MozSurfaces in a separate design docume…
Browse files Browse the repository at this point in the history
…nt. r=sotaro
  • Loading branch information
nical committed Jun 6, 2014
1 parent 5e22a5f commit 8adbb2a
Show file tree
Hide file tree
Showing 2 changed files with 149 additions and 108 deletions.
110 changes: 2 additions & 108 deletions gfx/doc/MozSurface.md
Original file line number Diff line number Diff line change
Expand Up @@ -14,8 +14,6 @@ Nicolas Silva

## Definitions

* Client and Host: In Gecko's compositing architecture, the client process is the producer, while the host process is the consumer side, where compositing takes place.

## Use cases

Drawing web content into a surface and share it with the compositor process to display it on the screen without copies.
Expand All @@ -31,32 +29,9 @@ Drawing web content into a surface and share it with the compositor process to d

## TextureClient and TextureHost

TextureClient and TextureHost are the closest abstractions we currently have to MozSurface.
Inline documentation about TextureClient and TextureHost can be found in:

* [gfx/layers/client/TextureClient.h](http://dxr.mozilla.org/mozilla-central/source/gfx/layers/client/TextureClient.h)
* [gfx/layers/composite/TextureHost.h](http://dxr.mozilla.org/mozilla-central/source/gfx/layers/composite/TextureHost.h)

TextureClient is the client-side handle on a MozSurface, while TextureHost is the equivalent host-side representation. There can only be one TextureClient for a given TextureHost, and one TextureHost for a given TextureClient. Likewise, there can only be one shared object for a given TextureClient/TextureHost pair.

A MozSurface containing data that is shared between a client process and a host process exists in the following form:
TextureClient and TextureHost are the closest abstractions we currently have to MozSurface. The current plan is to evolve TextureClient into MozSurface. In its current state, TextureClient doesn't meet all the requirements and desisgn decisions of MozSurface yet.

```
.
Client process . Host process
.
________________ ______________ ______________
| | | | | |
| TextureClient +----+ <SharedData> +----+ TextureHost |
|________________| |______________| |______________|
.
.
.
Figure 1) A Surface as seen by the client and the host processes
```

The above figure is a logical representation, not a class diagram.
`<SharedData>` is a placeholder for whichever platform specific surface type we are sharing, for example a Gralloc buffer on Gonk or a D3D11 texture on Windows.
In particular, TextureClient/TextureHost are designed around cross-process sharing specifically. See the SharedMozSurface design document for more information about TextureClient and TextureHost.

## Locking semantics

Expand Down Expand Up @@ -109,67 +84,6 @@ MozSurface (TextureClient/TextureHost in its current form) defines ownership rul

These rules rely on the fact that the underlying shared data is strictly owned by the MozSurface. This means that keeping direct references to the shared data is illegal and unsafe.

## Deallocation protocol

The shared data is accessible by both the client-side and the host-side of the MozSurface. A deallocation protocol must be defined to handle which side deallocates the data, and to ensure that it doesn't cause any race condition.
The client side, which contains the web content's logic, always "decides" when a surface is needed or not. So the life time of a MozSurface is driven by the reference count of it's client-side handle (TextureClient).
When a TextureClient's reference count reaches zero, a "Remove" message is sent in order to let the host side that the shared data is not accessible on the client side and that it si safe for it to be deleted. The host side responds with a "Delete" message.


```
client side . host side
.
(A) Client: Send Remove -. .
\ .
\ . ... can receive and send ...
\
Can receive `--> (B) Host: Receive Remove
Can't send |
.-- (C) Host: Send Delete
/
/ . ... can't receive nor send ...
/ .
(D) Client: Receive Delete <--' .
.
Figure 2) MozSurface deallocation handshake
```

This handshake protocol is twofold:

* It defines where and when it is possible to deallocate the shared data without races
* It makes it impossible for asynchronous messages to race with the destruction of the MozSurface.

### Deallocating on the host side

In the common case, the shared data is deallocated asynchronously on the host side. In this case the deallocation takes place at the point (C) of figure 2.

### Deallocating on the client side

In some rare cases, for instance if the underlying implementation requires it, the shared data must be deallocated on the client side. In such cases, deallocation happens at the point (D) of figure 2.

In some exceptional cases, this needs to happen synchronously, meaning that the client-side thread will block until the Delete message is received. This is supported but it is terrible for performance, so it should be avoided as much as possible.
Currently this is needed when shutting down a hardware-decoded video stream with libstagefright on Gonk, because the libstagefright unfortunately assumes it has full ownership over the shared data (gralloc buffers) and crashes if there are still users of the buffers.

### Sharing state

The above deallocation protocol of a MozSurface applies to the common case that is when the surface is shared between two processes. A Surface can also be deallocated while it is not shared.

The sharing state of a MozSurface can be one of the following:

* (1) Uninitialized (it doesn't have any shared data)
* (2) Local (it isn't shared with the another thread/process)
* (3) Shared (the state you would expect it to be most of the time)
* (4) Invalid (when for some rare cases we needed to force the deallocation of the shared data before the destruction of the TextureClient object).

Surfaces can move from state N to state N+1 and be deallocated in any of these states. It could be possible to move from Shared to Local, but we currently don't have a use case for it.

The deallocation protocol above, applies to the Shared state (3).
In the other cases:

* (1) Unitilialized: There is nothing to do.
* (2) Local: The shared data is deallocated by the client side without need for a handshake, since it is not shared with other threads.
* (4) Invalid: There is nothing to do (deallocation has already happenned).

## Internal buffers / direct texturing

Some MozSurface implementations use CPU-side shared memory to share the texture data accross processes, and require a GPU texture upload when interfacing with a TextureSource. In this case we say that the surface has an internal buffer (because it is implicitly equivalent to double buffering where the shared data is the back buffer and the GPU side texture is the front buffer). We also say that it doesn't do "direct texturing" meaning that we don't draw directly into the GPU-side texture.
Expand All @@ -183,18 +97,6 @@ While direct texturing is usually the most efficient way, it is not always avail

## Alternative solutions

### Sending ownership back and forth between the client and host sides through message passing, intead of sharing.

The current design of MozSurface makes the surface accessible from both sides at the same time, forcing us to do Locking and have a hand shake around deallocating the shared data, while using pure message passing and making the surface accessible only from one side at a time would avoid these complications.

Using pure message passing was actually the first approach we tried when we created the first version of TextureClient and TextureHost. This strategy failed in several places, partly because of some legacy in Gecko's architecture, and partly because of some of optimizations we do to avoid copying surfaces.

We need a given surface to be accessible on both the client and host for the following reasons:

* Gecko can at any time require read access on the client side to a surface that is shared with the host process, for example to build a temporary layer manager and generate a screenshot. This is mostly a legacy problem.
* We do some copy-on-write optimizations on surfaces that are shared with the compositor in order to keep invalid regions as small as possible. Out tiling implementation is an example of that.
* Our buffer rotation code on scrollable non-tiled layers also requires a synchronization on the client side between the front and back buffers, while the front buffer is used on the host side.

## Backends

We have MozSurface implementaions (classes inheriting from TextureClient/TextureHost) for OpenGL, Software, D3D9, and D3D11 backends.
Expand All @@ -210,14 +112,6 @@ TODO - How can we make MozSurface more testable and what should we test?

## Future work

### Rename TextureClient/TextureHost

The current terminology is very confusing.

### Unify TextureClient and TextureHost

TextureClient and TextureHost should live under a common interface to better hide the IPC details. The base classe should only expose the non-ipc related methods such as Locking, access through a DrawTarget, access to a TextureSource.

### Using a MozSurface as a source for Drawing

MozSurface should be able to expose a borrowed Moz2D SourceSurface that is valid between Lock and Unlock similarly to how it exposes a DrawTarget.
Expand Down
147 changes: 147 additions & 0 deletions gfx/doc/SharedMozSurface.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,147 @@
Shared MozSurface {#mozsurface}
==========

**This document is work in progress. Some information may be missing or incomplete.**

Shared MozSurfaces represent an important use case of MozSurface, anything that is in the MozSurface design document also applies to shared MozSurfaces.

## Goals

We need to be able to safely and efficiently render web content into surfaces that may be shared accross processes.
MozSurface is a cross-process and backend-independent Surface API and not a stream API.

## Owner

Nicolas Silva

## Definitions

* Client and Host: In Gecko's compositing architecture, the client process is the producer, while the host process is the consumer side, where compositing takes place.

## Use cases

Drawing web content into a surface and share it with the compositor process to display it on the screen without copies.

## Requirement

Shared MozSurfaces represent an important use case of MozSurface, it has the same requirements as MozSurface.

## TextureClient and TextureHost

TextureClient and TextureHost are the closest abstractions we currently have to MozSurface.
Inline documentation about TextureClient and TextureHost can be found in:

* [gfx/layers/client/TextureClient.h](http://dxr.mozilla.org/mozilla-central/source/gfx/layers/client/TextureClient.h)
* [gfx/layers/composite/TextureHost.h](http://dxr.mozilla.org/mozilla-central/source/gfx/layers/composite/TextureHost.h)

TextureClient is the client-side handle on a MozSurface, while TextureHost is the equivalent host-side representation. There can only be one TextureClient for a given TextureHost, and one TextureHost for a given TextureClient. Likewise, there can only be one shared object for a given TextureClient/TextureHost pair.

A MozSurface containing data that is shared between a client process and a host process exists in the following form:

```
.
Client process . Host process
.
________________ ______________ ______________
| | | | | |
| TextureClient +----+ <SharedData> +----+ TextureHost |
|________________| |______________| |______________|
.
.
.
Figure 1) A Surface as seen by the client and the host processes
```

The above figure is a logical representation, not a class diagram.
`<SharedData>` is a placeholder for whichever platform specific surface type we are sharing, for example a Gralloc buffer on Gonk or a D3D11 texture on Windows.

## Deallocation protocol

The shared data is accessible by both the client-side and the host-side of the MozSurface. A deallocation protocol must be defined to handle which side deallocates the data, and to ensure that it doesn't cause any race condition.
The client side, which contains the web content's logic, always "decides" when a surface is needed or not. So the life time of a MozSurface is driven by the reference count of it's client-side handle (TextureClient).
When a TextureClient's reference count reaches zero, a "Remove" message is sent in order to let the host side that the shared data is not accessible on the client side and that it si safe for it to be deleted. The host side responds with a "Delete" message.


```
client side . host side
.
(A) Client: Send Remove -. .
\ .
\ . ... can receive and send ...
\
Can receive `--> (B) Host: Receive Remove
Can't send |
.-- (C) Host: Send Delete
/
/ . ... can't receive nor send ...
/ .
(D) Client: Receive Delete <--' .
.
Figure 2) MozSurface deallocation handshake
```

This handshake protocol is twofold:

* It defines where and when it is possible to deallocate the shared data without races
* It makes it impossible for asynchronous messages to race with the destruction of the MozSurface.

### Deallocating on the host side

In the common case, the shared data is deallocated asynchronously on the host side. In this case the deallocation takes place at the point (C) of figure 2.

### Deallocating on the client side

In some rare cases, for instance if the underlying implementation requires it, the shared data must be deallocated on the client side. In such cases, deallocation happens at the point (D) of figure 2.

In some exceptional cases, this needs to happen synchronously, meaning that the client-side thread will block until the Delete message is received. This is supported but it is terrible for performance, so it should be avoided as much as possible.
Currently this is needed when shutting down a hardware-decoded video stream with libstagefright on Gonk, because the libstagefright unfortunately assumes it has full ownership over the shared data (gralloc buffers) and crashes if there are still users of the buffers.

### Sharing state

The above deallocation protocol of a MozSurface applies to the common case that is when the surface is shared between two processes. A Surface can also be deallocated while it is not shared.

The sharing state of a MozSurface can be one of the following:

* (1) Uninitialized (it doesn't have any shared data)
* (2) Local (it isn't shared with the another thread/process)
* (3) Shared (the state you would expect it to be most of the time)
* (4) Invalid (when for some rare cases we needed to force the deallocation of the shared data before the destruction of the TextureClient object).

Surfaces can move from state N to state N+1 and be deallocated in any of these states. It could be possible to move from Shared to Local, but we currently don't have a use case for it.

The deallocation protocol above, applies to the Shared state (3).
In the other cases:

* (1) Unitilialized: There is nothing to do.
* (2) Local: The shared data is deallocated by the client side without need for a handshake, since it is not shared with other threads.
* (4) Invalid: There is nothing to do (deallocation has already happenned).

## Alternative solutions

### Sending ownership back and forth between the client and host sides through message passing, intead of sharing.

The current design of MozSurface makes the surface accessible from both sides at the same time, forcing us to do Locking and have a hand shake around deallocating the shared data, while using pure message passing and making the surface accessible only from one side at a time would avoid these complications.

Using pure message passing was actually the first approach we tried when we created the first version of TextureClient and TextureHost. This strategy failed in several places, partly because of some legacy in Gecko's architecture, and partly because of some of optimizations we do to avoid copying surfaces.

We need a given surface to be accessible on both the client and host for the following reasons:

* Gecko can at any time require read access on the client side to a surface that is shared with the host process, for example to build a temporary layer manager and generate a screenshot. This is mostly a legacy problem.
* We do some copy-on-write optimizations on surfaces that are shared with the compositor in order to keep invalid regions as small as possible. Out tiling implementation is an example of that.
* Our buffer rotation code on scrollable non-tiled layers also requires a synchronization on the client side between the front and back buffers, while the front buffer is used on the host side.

## Testing

TODO - How can we make shared MozSurfaces more testable and what should we test?

## Future work

### Rename TextureClient/TextureHost

The current terminology is very confusing.

### Unify TextureClient and TextureHost

TextureClient and TextureHost should live under a common interface to better hide the IPC details. The base classe should only expose the non-ipc related methods such as Locking, access through a DrawTarget, access to a TextureSource.

## Comparison with other APIs

0 comments on commit 8adbb2a

Please sign in to comment.