Description
WebDriver is currently unable to simulate the action of an IME (input method editor) in user input.
IMEs are widely used, particularly when inputting scripts that have far more characters than there are keys on a keyboard. That means it's impossible to use WebDriver to adequately test how web applications behave when inputting these scripts. The behaviour in the face of IME input is also an interoperability problem for web browsers, and fixing this is seen as a high priority area for the web.
Conceptually an IME sits between the physical input layer and the application. Typically the IME is activated by some device input, e.g. pressing a key on the keyboard. Once triggered, the IME generates a candidate composed string that may be updated based on further input and is at some point committed. During this time the composed string is typically displayed in the application, but styled in a way that makes it distinct from the final input. There is also typically IME-specific UI to suggest different possible completions, but this is quite platform-specific and will be considered out of scope.
In WebDriver the low-level input handling is done through the actions API. This models user input as a set of virtual input devices, which each have an internal state. At each point in time ("tick") an input device can either do nothing ("pause"), or can have an associated action that updates its internal state and causes the relevant events to be emitted to content (e.g. a `keyDown` action on a `key` input device will update the internal WebDriver state to signify that the key is depressed, and emit a `keydown` event to content).
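The tick model described above can be sketched as follows. This is a minimal illustration, not WebDriver's actual implementation: the `KeyInputSource` class and its `dispatch` method are hypothetical names, and only the event names are returned rather than real DOM events.

```python
from dataclasses import dataclass, field


@dataclass
class KeyInputSource:
    """Hypothetical virtual keyboard; its state is the set of keys held down."""
    source_id: str
    pressed: set = field(default_factory=set)

    def dispatch(self, action: dict) -> list:
        """Apply one action for one tick and return the DOM event names emitted."""
        kind = action["type"]
        if kind == "pause":
            return []  # the device does nothing in this tick
        if kind == "keyDown":
            self.pressed.add(action["value"])  # update internal state
            return ["keydown"]
        if kind == "keyUp":
            self.pressed.discard(action["value"])
            return ["keyup"]
        raise ValueError(f"unknown action type: {kind}")


keyboard = KeyInputSource("keyboard-1")
events = keyboard.dispatch({"type": "keyDown", "value": "a"})
print(events, keyboard.pressed)
```

Each call to `dispatch` stands for one tick on one device: the state change and the emitted events happen together, which is the coupling an IME layer would have to interpose on.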
For a given input on a given device the IME can do nothing (i.e. just let the event pass through) or can intercept the event, update its internal state, and cause different events to be emitted instead. For example, consider pressing the "a" key. In the absence of an IME this will cause a `keydown` event with `keyCode` `65`, a `keypress` event, possibly various `input` events, and finally a `keyup` event also with `keyCode` `65`. However if the IME is activated, we get a `keydown` event with `keyCode` `229`, a `compositionstart` event, a `compositionupdate` event with `data` corresponding to the current IME input selection, `input` events, and finally a `keyup` event with `keyCode` `65`. Note in this example that the content never sees a `keydown` event with `keyCode` `65`: the fact that the IME intercepted the event changes the key events visible on the page.
Later an input (or something not visible to the web page) may cause the composition to be committed, which corresponds to a `compositionend` event.
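To make the difference concrete, the two event sequences for the "a" key can be written out side by side. This is only a sketch of the sequences described above: the function name and the `composition` parameter are hypothetical, the optional `input` events are shown once, and real browsers may differ in detail.

```python
def events_for_key_a(ime_active: bool, composition: str = "a") -> list:
    """Return (event type, detail) pairs for pressing and releasing "a".

    `composition` is a placeholder for whatever string the IME currently
    proposes; the real value depends on the IME.
    """
    if not ime_active:
        return [
            ("keydown", {"keyCode": 65}),
            ("keypress", {}),
            ("input", {}),
            ("keyup", {"keyCode": 65}),
        ]
    return [
        ("keydown", {"keyCode": 229}),  # the IME intercepted the key press
        ("compositionstart", {}),
        ("compositionupdate", {"data": composition}),
        ("input", {}),
        ("keyup", {"keyCode": 65}),
    ]


plain = events_for_key_a(ime_active=False)
composed = events_for_key_a(ime_active=True)
print([name for name, _ in composed])
```

A test harness could compare the events a page actually records against a sequence like this to check how the browser behaves under IME input.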
IMEs can also apply to non-key input, e.g. handwriting recognition is a form of IME that depends on pointer input. An IME may also depend on multiple kinds of input.
In terms of the implementation inside the WebDriver spec, the obvious thing would be to add IME as a new kind of input source for actions. However, the fact that it's a layer between the "physical" input devices and the application makes this more complex. To handle cases like "key is pressed and intercepted by IME, other events happen, key is released" we need to a) specify which other inputs in a given tick are being intercepted by the IME, and b) handle the IME-generated state changes after all other inputs (maybe even right at the end of the tick: for something like `pointerMove`, which can be spread out into multiple events over time, it's not clear how things should work).
So a possible proposal is as follows:
We add a new input source type `ime`. Its internal state is the current composition string.
The `ime` input source has two associated actions: `compositionUpdate` and `compositionEnd`.
`compositionUpdate` is the main action for updating the composed string. It has the following properties:
- `data` - A string containing the updated value of the composition string. If this is null (or the empty string?) we end the composition.
- `clauses` (optional) - These represent sub-parts of the composition string. Each clause has a `length` and a `type`. The lengths must add up to the total length of `data`. Suggested values of `type` are "caret", "rawInput", "converted", "notConverted", "targetConverted" (TODO: clarify the semantics of these). In addition, formatting hints may be specified according to how the IME would like the range to be handled. These are `underlineColor`, `underlineStyle`, `backgroundColor`, `textColor`. If `clauses` is omitted it's assumed that there's a single clause (TODO: details).
- `handles` (optional) - The input source id for the input source that caused this change in the IME state. If this is provided the internal state of the referenced input source is updated, but the DOM events emitted are those appropriate to the IME instead (e.g. for a keyboard the `keyCode` property becomes `229`). If this property is omitted the update to the IME state is not connected to any application-visible input source change (this corresponds to the situation where e.g. the user clicks on a composition string option in a window outside their browser window).
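The constraint that clause lengths add up to the length of `data` could be checked along these lines. This is a sketch under several assumptions that the proposal leaves open: the helper name `normalize_clauses` is hypothetical, the default clause type `"rawInput"` is a guess at the unresolved TODO, an empty string is treated like null, and lengths are counted in Python characters rather than UTF-16 code units.

```python
CLAUSE_TYPES = {"caret", "rawInput", "converted", "notConverted", "targetConverted"}


def normalize_clauses(action: dict, default_type: str = "rawInput") -> list:
    """Return the validated clause list for a compositionUpdate action."""
    data = action["data"]
    if not data:
        # Assumption: null or empty data ends the composition, so no clauses.
        return []
    clauses = action.get("clauses")
    if clauses is None:
        # Assumption: a single clause spanning all of `data`; the proposal
        # leaves the details (including the clause type) as a TODO.
        return [{"length": len(data), "type": default_type}]
    for clause in clauses:
        if clause["type"] not in CLAUSE_TYPES:
            raise ValueError(f"unknown clause type: {clause['type']}")
    total = sum(clause["length"] for clause in clauses)
    if total != len(data):
        raise ValueError(f"clause lengths sum to {total}, expected {len(data)}")
    return clauses


single = normalize_clauses({"type": "compositionUpdate", "data": "abc"})
split = normalize_clauses({
    "type": "compositionUpdate",
    "data": "abc",
    "clauses": [
        {"length": 2, "type": "converted"},
        {"length": 1, "type": "rawInput"},
    ],
})
print(single, split)
```

Rejecting mismatched lengths up front keeps the remote end from having to guess how to style a partially covered composition string.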
A `compositionEnd` action causes the composed string to be emitted. It has the following properties:
- `data` (optional) - The final composed string to insert. If omitted this is given by the `data` property of the previous `compositionUpdate` action.
- `handles` (optional) - As for `compositionUpdate`, if committing the composition happens in response to a content-visible input action, this is a reference to the device id for that action.
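The fallback rule for `data` on `compositionEnd` can be illustrated with a small model of the `ime` input source state. The class and method names here are hypothetical, and raising an error when nothing is being composed is an assumption; the proposal does not say what should happen in that case.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ImeInputSource:
    """Hypothetical model of the proposed `ime` input source."""
    source_id: str
    composition: Optional[str] = None  # None: no composition in progress

    def composition_update(self, data: str) -> None:
        """Record the latest composition string."""
        self.composition = data

    def composition_end(self, data: Optional[str] = None) -> str:
        """Commit the composition and return the string to insert."""
        # Fall back to the data of the previous compositionUpdate if omitted.
        committed = data if data is not None else self.composition
        if committed is None:
            # Assumption: ending with nothing to commit is an error.
            raise ValueError("no composition in progress")
        self.composition = None
        return committed


ime = ImeInputSource("ime-1")
ime.composition_update("abc")
ime.composition_update("ABC")
committed = ime.composition_end()
print(committed)
```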
An example of what this looks like on the wire: we press "a" on the keyboard, which generates the composed string "abc"; this is updated to "ABC" by something outside the browser, and then committed with the space key:
{"actions": [
  {"type": "key",
   "id": "keyboard-1",
   "actions": [
     {"type": "keyDown",
      "value": "a"},
     {"type": "keyUp",
      "value": "a"},
     {"type": "pause"},
     {"type": "keyDown",
      "value": " "},
     {"type": "keyUp",
      "value": " "}
   ]
  },
  {"type": "ime",
   "id": "ime-1",
   "actions": [
     {"type": "compositionUpdate",
      "handles": "keyboard-1",
      "data": "abc"},
     {"type": "pause"},
     {"type": "compositionUpdate",
      "data": "ABC"},
     {"type": "compositionEnd",
      "handles": "keyboard-1"},
     {"type": "pause"}
   ]
  }
 ]
}
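To see how the two action lists line up, the payload can be regrouped by tick. The sketch below is illustrative only: `by_tick` and `intercepted` are hypothetical helpers, and padding shorter lists with a pause is an assumption about how uneven sequences would be treated.

```python
from itertools import zip_longest

payload = {"actions": [
    {"type": "key", "id": "keyboard-1", "actions": [
        {"type": "keyDown", "value": "a"},
        {"type": "keyUp", "value": "a"},
        {"type": "pause"},
        {"type": "keyDown", "value": " "},
        {"type": "keyUp", "value": " "},
    ]},
    {"type": "ime", "id": "ime-1", "actions": [
        {"type": "compositionUpdate", "handles": "keyboard-1", "data": "abc"},
        {"type": "pause"},
        {"type": "compositionUpdate", "data": "ABC"},
        {"type": "compositionEnd", "handles": "keyboard-1"},
        {"type": "pause"},
    ]},
]}


def by_tick(payload: dict) -> list:
    """Return one dict per tick, mapping input source id to its action."""
    sources = payload["actions"]
    rows = zip_longest(*(s["actions"] for s in sources),
                       fillvalue={"type": "pause"})
    return [{s["id"]: action for s, action in zip(sources, row)}
            for row in rows]


def intercepted(tick: dict) -> set:
    """Ids of input sources whose events the IME replaces in this tick."""
    return {action["handles"] for action in tick.values()
            if "handles" in action}


ticks = by_tick(payload)
for number, tick in enumerate(ticks, start=1):
    kinds = {sid: action["type"] for sid, action in tick.items()}
    print(number, kinds, "intercepted:", sorted(intercepted(tick)))
```

Read this way, the keyboard is intercepted in the first tick (the "a" key press that starts the composition) and in the fourth (the space key press that commits it), while the update to "ABC" in the third tick has no `handles` and so is invisible to content as an input device change.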