Symbols implementation tests
To demonstrate displaying symbols as text annotation using <ruby>, a WOFF/WOFF2 SVG colour web test font was created. Because it contains vector SVG glyphs for each supported code point, the images scale to any size without loss of quality, and only one version of each image is needed, unlike bitmap (raster) fonts. As a web font, it can be loaded from the local machine or hosted on a web server, which makes it available to any device connected to the web and eliminates the need for local installation.
The test font was created using Glyphs, a font editor available for Apple Macintosh. The process is quite simple: existing SVG images are dragged and dropped into an SVG layer for a given code point, creating the scalable SVG glyph for that code point. Care must be taken to position each image properly. Other software provides this functionality, such as icomoon.io, fontastic.me, and fontello.com; however, the popular open source application FontForge does not support SVG colour fonts at this time.
ARASAAC does not at this time make SVG versions of its symbols freely available on its website; all the images it provides are in PNG raster (colour bitmap) format. To demonstrate symbols with more complex or detailed graphics than Blissymbols, such as those of ARASAAC, CC-licensed SVG symbols from Jellow and Mulberry were used.
Blissymbols are freely available in SVG format from the BCI website at https://blissymbolics.org/index.php/symbol-files-2024. Some examples were included in the test font.
The current Bliss Unicode draft proposal by Michael Everson is published at https://www.unicode.org/wg2/docs/n5228-blissymbols.pdf. This is not finalized so Michael requests that people do not start using the code points published there until it is. This discussion shifts the code points into the private use area (PUA) in order to comply with that request.
This section documents examples of mapping the Unicode code points for Bliss word parts (Bliss-characters) to the Bliss-words (Blissymbols) for the particular concepts those Bliss-words encode.
The pros and cons of using the (proposed) Blissymbol Unicode code points as the method of identifying symbols and mapping between AAC symbol sets are discussed. To demonstrate them, we assume that these code points are used together with WOFF2 fonts for displaying symbols, and we provide some use cases to illustrate the problems that will arise.
The Bliss-word "drink"
is identified by the BCI-ID 13881. For the Unicode identifier we use the PUA code point of U+E20E. A font containing glyphs for all Bliss-characters would correctly display the Blissymbol for drink. In this case, the Blissymbol for drink is both a Bliss-character and a Bliss-word. An analogy would be the letter "a" in English which is also a meaningful word on its own.
An AAC symbol user who prefers ARASAAC symbols, for example, would have the same ruby annotation displayed as an ARASAAC symbol simply by using a font with the glyph for the Bliss character "drink" replaced by the ARASAAC symbol for "drink". So far, so good.
The Bliss-word "tea"
is identified by the BCI-ID 17511. It is composed of two Bliss-characters, "drink" followed by "leaf". Using U+E451 as the code point for leaf, the Unicode representation of "tea" is a two character string of the code points U+E20E and U+E451. In HTML it can be represented as "&#xE20E;&#xE451;".
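As a minimal sketch of the markup described above, the following Python helper turns a list of code points into HTML hexadecimal character references and wraps them in a ruby annotation. The function names and the `bliss` class name are hypothetical; only the code points U+E20E and U+E451 come from this discussion.

```python
def to_char_refs(code_points):
    """Return HTML hexadecimal character references for the given code points."""
    return "".join(f"&#x{cp:X};" for cp in code_points)


def ruby_annotation(base_text, code_points, css_class="bliss"):
    """Wrap base_text in a <ruby> element annotated with the symbol string.

    css_class is a hypothetical hook for selecting the symbol font in CSS.
    """
    refs = to_char_refs(code_points)
    return f'<ruby>{base_text}<rt class="{css_class}">{refs}</rt></ruby>'


DRINK = 0xE20E  # PUA code point used for "drink" in this discussion
LEAF = 0xE451   # PUA code point used for "leaf" in this discussion

tea_markup = ruby_annotation("tea", [DRINK, LEAF])
print(tea_markup)
```

Swapping the font applied to the annotation would change which glyphs are drawn, while the markup itself stays the same.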
The ARASAAC symbol set — in common with all AAC symbol sets other than Bliss — has a single pictographic symbol for tea. As an aside, there are often multiple alternate symbols for the same concept in non-Bliss AAC symbol sets but a particular one would normally be selected for a particular user. So in an ARASAAC font, how do we map a single image to a sequence of two or more code points?
A mechanism exists in Unicode for character composition, whereby a base character followed by one or more combining characters is equivalent to, and may be replaced by, what is called a precomposed character. This is typically used for diacritical marks, superscripts, subscripts, etc. At first glance this may seem a plausible solution, but it would require almost every Bliss-character to serve as a combining character as well as a non-combining one. That is not how the Bliss encoding is designed, and it would not work without extensive modifications to the currently proposed encoding. In addition, precomposed characters would have to be added to fonts with every new addition to the Bliss vocabulary. In short, this is not how this Unicode mechanism is intended to be used.
Another Unicode mechanism, used in the implementation of Emoji combinations, is the zero width joiner (ZWJ) sequence ligature. This approach could conceivably work for AAC symbol annotation. By inserting a ZWJ code point between elements in a sequence, the rendering engine is instructed to consider the code points in the sequence as a group and display an image that it retrieves from a ligature lookup table. Usually a ligature is a glyph used in some languages when a certain sequence of characters, written together, becomes joined or overlapped in some way. Examples in Latin-script languages are character combinations such as OE becoming Œ in Old English or IJ becoming Ĳ in Dutch. But there is no reason this cannot be used to convert Bliss-words composed of multiple characters to ARASAAC or other AAC symbol sets with a single image for the concept being represented, and this is, in fact, how custom emojis are represented. In this use case example, the Bliss-character sequence of "drink"+ZWJ+"leaf" would be replaced with a ligature glyph associated with that sequence in the particular font being used. In an ARASAAC font, for example, the sequence would be replaced with the image of a cup of tea with a tea bag in it.
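For readers unfamiliar with the composition mechanism, the short Python sketch below shows it at work on a Latin example using the standard `unicodedata` module, and then shows that the PUA code points used in this discussion are left untouched by normalization, since private use characters carry no composition mappings. The variable names are illustrative only.

```python
import unicodedata

# "e" followed by U+0301 COMBINING ACUTE ACCENT is canonically equivalent
# to the precomposed character U+00E9.
decomposed = "e\u0301"
composed = unicodedata.normalize("NFC", decomposed)
print(len(decomposed), len(composed))
print(composed == "\u00e9")

# The two PUA code points used here for "tea" ("drink" + "leaf") have no
# composition mapping, so normalization leaves the string unchanged.
bliss_tea = "\ue20e\ue451"
print(unicodedata.normalize("NFC", bliss_tea) == bliss_tea)
```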
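The ligature lookup can be pictured with a toy model. The Python sketch below is not how a rendering engine is implemented (real fonts perform this substitution through their own ligature tables); it only imitates the effect. The glyph names and both tables are invented for illustration, and the code points are the PUA values used in this discussion.

```python
ZWJ = "\u200d"
DRINK = "\ue20e"
LEAF = "\ue451"

# Hypothetical ligature table for an ARASAAC-style font: a ZWJ-joined
# sequence maps to the name of one replacement glyph.
LIGATURES = {
    DRINK + ZWJ + LEAF: "arasaac_tea",
}
# Hypothetical glyphs for the individual Bliss-characters in the same font.
SINGLE_GLYPHS = {
    DRINK: "arasaac_drink",
    LEAF: "arasaac_leaf",
}


def shape(text):
    """Return glyph names for text, preferring the longest ligature match."""
    glyphs = []
    keys = sorted(LIGATURES, key=len, reverse=True)
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                glyphs.append(LIGATURES[key])
                i += len(key)
                break
        else:
            ch = text[i]
            if ch != ZWJ:  # an unmatched ZWJ has no visible glyph
                glyphs.append(SINGLE_GLYPHS.get(ch, ".notdef"))
            i += 1
    return glyphs


print(shape(DRINK + ZWJ + LEAF))
print(shape(DRINK + LEAF))
```

With the ZWJ present the sequence collapses to the single "tea" glyph; without it, or in a font lacking the ligature, the two component glyphs are drawn separately.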
There are about 1,200 Bliss-characters and currently about 6,400 Blissymbols. Approximately 5,200 Blissymbols (Bliss-words) are composed of sequences of two or more Bliss-characters that would need to have ligatures provided for them for the full current vocabulary to be covered in a non-Bliss AAC annotation font. Just over 600 characters serve as initial characters in multi-character symbols giving an average of about 8.5 ligatures per base character. About 2,000 symbols are composed of three characters, 2,000 have four characters and 900 have five characters. With WAI-Adapt targeting core annotation i.e. not expecting 100% coverage, this could be reduced to a core vocabulary but if there are no technical limitations then the entire vocabulary could be implemented.
In Blissymbolics there is a Bliss-word for "chocolate drink"
with BCI-ID 20772 that is made up of characters "drink" + "bean" + "up". Using HTML hexadecimal notation this would be written as "‍‍" where x200D is the hex value of the ZWJ non-spacing code point.
Why is this a more troublesome case? The reason is not that it is more difficult to create a ligature with a single glyph to be displayed in place of the string of glyphs. The reason is to do with the nature of Bliss as a living language. The symbol for "chocolate drink" illustrates a relatively common occurrence with Bliss that may cause problems in the future. The problem is that Bliss spellings can change over time. In this case, "chocolate" was formerly spelled completely differently as "powder" + "brown" which actually resolves to "powder" + "colour" + "ground" and thus "chocolate drink" was
with hex string "&#xE20E;&#x200D;&#xE3D7;&#x200D;&#xE29F;&#xE4A4;". The language undergoes constant review and revision by users, teachers, and other participants in its development. For backward compatibility the previous spellings are retained and marked with _OLD to indicate that they are deprecated.
However, the implication of this for non-Bliss symbol set developers is that whenever a Bliss spelling changes, they may have to modify their font to support the new spelling in addition to the deprecated one. Also, due to the nature of Bliss, there would be a cascading effect: every other symbol that includes "chocolate", such as "chocolate sauce" and "chocolate spread", would also have to be modified.
It would be preferable for the identifier of each concept to be independent of any particular representation of that concept, so that no symbol set is dependent on (or, in software design terminology, "coupled to") any other symbol set.
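The cascading effect can be illustrated with a small Python sketch. Component names stand in for Bliss-characters. The two spellings of "chocolate drink" follow the text above, while the spellings of "chocolate sauce" and "chocolate spread" (a single hypothetical component placed before "chocolate") are invented purely for illustration.

```python
OLD_CHOCOLATE = ("powder", "colour", "ground")  # deprecated spelling
NEW_CHOCOLATE = ("bean", "up")                  # current spelling

# Components that precede "chocolate" in each Bliss-word. Only the
# "chocolate drink" entry is taken from the text; the others are assumed.
WORD_PREFIXES = {
    "chocolate drink": ("drink",),
    "chocolate sauce": ("sauce",),
    "chocolate spread": ("spread",),
}


def ligature_keys(chocolate):
    """Return the component sequence a font must match for each word."""
    return {word: prefix + chocolate for word, prefix in WORD_PREFIXES.items()}


old_keys = ligature_keys(OLD_CHOCOLATE)
new_keys = ligature_keys(NEW_CHOCOLATE)

# Every word containing "chocolate" needs a new ligature entry.
changed = sorted(word for word in old_keys if old_keys[word] != new_keys[word])
print(changed)

# A font supporting both current and deprecated spellings carries both sets.
all_keys = set(old_keys.values()) | set(new_keys.values())
print(len(all_keys))
```

One respelling of a shared component therefore touches every derived word, and a font that keeps the deprecated spellings has to carry both sets of ligatures.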
The BCI Authorized Vocabulary currently contains about 6,400 entries. It is being expanded all the time, but coming up with new Blissymbols is a time-consuming process. The quality of the language depends on consistent, well thought out strategies for representing increasingly complex concepts. ARASAAC has over 13,000 symbols. Other symbol sets likely have more. At first glance you might think that one of these sets would contain all Blissymbol concepts along with many others, but this is not the case. Bliss tends to have a single symbol for a concept (with the exception of deprecated symbols), whereas purely pictographic symbol sets often have many alternative images for the same concept, with different alternatives representing the idea in a way that is more relatable for a particular user. This is much along the lines of alternative Emoji of humans with different skin tones. Accurate counts of the duplication within a symbol set and of the overlap between symbol sets are not available, but it can be safely assumed that both are significant.
So...what happens when an ARASAAC symbol is required for a user and a corresponding Blissymbol does not exist? What gets coded into the HTML? How long will the ARASAAC user have to wait for the corresponding Blissymbol to be created so there is a spelling to be entered in the ruby annotation so the ARASAAC providers can provide a ligature for it in their font? That could be a very long wait. The people developing Blissymbols naturally respond to the Blissymbol user community with priority. This is the ultimate example of how coupling to the Blissymbolics Unicode encoding would create barriers to other symbol set providers. They would essentially be blocked from providing a particular symbol if an equivalent did not exist in the BCI AV. Browser vendors could provide a workaround but with accessibility already a low priority and the tiny subset of AAC users being an even lower priority it is in the best interests of those users to design a standard from the outset without such barriers to overcome.
The preceding discussion has progressed from the simplest cases to more difficult ones. However, it may be that the vocabulary required for a WAI-Adapt Symbols implementation is restricted to the simplest cases, so that the issues raised above will for the most part not come into play. At the very least, it is safe to say that a very useful core vocabulary could be defined and implemented using the Bliss Unicode encoding, WOFF/WOFF2 fonts, and <ruby>.
A quick comparison of the set of proposed Unicode Bliss characters with Ogden's Basic English shows an exact match of 372 Bliss characters. Not a bad start although many ligature glyphs, i.e. multi-character strings, will still be required for many symbols.
At face value it seems attractive to use the proposed Blissymbol Unicode encoding as a registry reference for AAC symbol annotation. There is a definite positive in that the registry would effectively be moved into Unicode and thus would not have to be maintained by W3C.
The other advantage of using the proposed Bliss Unicode encoding along with a Ruby-like mechanism and fonts for display is that most of the work is already done. SVG fonts with ligatures would have to be created by AAC symbol set vendors but for the browser AAC annotation display mechanism, very minimal work by browser vendors would be required.
The main negative for this approach is the coupling of all other AAC symbol sets to Blissymbols. But with the goal being support of just a core vocabulary, this becomes less of a barrier.
An alternative approach, which would still rely on the Unicode-font-Ruby implementation, would be to actually use the Unicode Private Use Area (PUA). Every symbol would be assigned one, and only one, code point. This lexicon-based approach entirely removes the problems discussed in the previous sections. The registry would provide the mapping of concepts to code points and would immediately support all 6,400+ Blissymbols. In fact, because the BCI-IDs are based on a previous ISO/IEC 2022 standard (ISO-IR 169), it would simply be a matter of agreeing on an offset into the PUA and the work is done. In addition, a large number of symbol sets have already been mapped to Bliss and to each other through the work of GlobalSymbols.com. This would accomplish the goal with a minimum of effort. The W3C registry would need to be maintained, but it would also be naturally mirrored and collaborated on by BCI and Global Symbols as well as others. This approach also removes the requirement for a Blissymbol to be created before another symbol set can use the concept, because a PUA code point can be assigned to a concept before any implementation of it exists, unlike the situation when using the proposed Bliss Unicode spellings.
The demonstration HTML, which uses ruby elements to annotate text with symbols implemented as a web font, works well. There could be a practical layout issue when symbols annotate text whose original layout did not anticipate annotation. This could also be an issue for ruby annotation of Japanese text for pronunciation, but that use of ruby is more likely to be designed in from the start than added after the fact, as will be the case for symbol annotation. In addition, whereas ruby text is usually smaller than the base text, symbols will usually need to be larger than the base text, which raises the probability of layout and formatting issues.
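The offset idea can be sketched in a few lines of Python. The choice of offset here is an assumption made only for illustration, not a value proposed in this discussion: the Basic Multilingual Plane PUA (U+E000 to U+F8FF) holds 6,400 code points, so adding BCI-IDs as large as those quoted above directly to U+E000 would overflow it, and the sketch uses the start of Supplementary Private Use Area-A (U+F0000) instead. The function names are hypothetical.

```python
PUA_A_START = 0xF0000  # first code point of Supplementary Private Use Area-A
PUA_A_END = 0xFFFFD    # last code point of Supplementary Private Use Area-A
OFFSET = PUA_A_START   # hypothetical agreed offset, assumed for this sketch


def bci_id_to_code_point(bci_id):
    """Map a BCI-ID to a PUA code point by adding the agreed offset."""
    cp = OFFSET + bci_id
    if not PUA_A_START <= cp <= PUA_A_END:
        raise ValueError(f"BCI-ID {bci_id} does not fit in the chosen PUA range")
    return cp


def code_point_to_bci_id(cp):
    """Recover the BCI-ID from a PUA code point."""
    return cp - OFFSET


# BCI-IDs quoted earlier in this discussion.
for name, bci_id in [("drink", 13881), ("tea", 17511), ("chocolate drink", 20772)]:
    cp = bci_id_to_code_point(bci_id)
    print(f"{name}: BCI-ID {bci_id} -> U+{cp:X} -> &#x{cp:X};")
```

Under this scheme each concept, including multi-character Bliss-words such as "tea", is a single code point, so a font for any symbol set needs one glyph per concept and no ligatures.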
If ruby is not typically used in a language then there are no potential collision issues of using ruby both as originally intended and as displaying symbol annotation as suggested.
In languages that do use ruby, usually for pronunciation annotation, there is the potential for conflict with simultaneously using ruby for symbol annotation. However, ruby has been designed to support multiple annotations, so there may very well not be a conflict. Further testing is required, but rudimentary initial experimentation gave reasonably satisfactory results. Multiple annotations are definitely possible; the only questions are what constitutes acceptable layout and how an algorithm would achieve it when annotating with symbols.
This is needed so that users who are not using symbols do not see such content.