Special Handling of Directions and their Aliases #24
-
Thanks, @tajmone, for a very detailed description. I know you often think in broad and general terms, but I wish you had started with a few very concrete examples of the problem ;-) It took me some time to examine and understand the tables, but some guesswork and induction later I think I have gotten to the core of your issue, so let me know if I'm completely off track.

Command Parser

The Alan command parser is not a complete natural language parser, so there are of course some shortcuts taken that prevent some constructs from being parsed correctly, or even constructed. Allowing some combinations of word usages but not others is one such shortcut.

Word Classes and Clashes

So I'm guessing these are the things you are referring to that prevent you from doing what you wanted (again, I would really like concrete examples):
Some word types are possible to combine, some are not. A quick look in the compiler sources shows these are the ones it prohibits (this should be a valuable addition to the manual, too):
Other combinations are (presumably) allowed. E.g. a quick check shows that a directional word works perfectly as an adjective. So again, I would appreciate some more concrete examples of what does not work the way you want, or is not possible.

There are also errors for redefinition of identifiers in the Alan game source. But I think those are not a part of the problem here as they are "author symbols", not "player words". Note that there is a small overlap here though, since directions automatically define themselves as directional player words. (Actually I don't think they really need to be "author symbols" since the author cannot use them for anything. Or am I forgetting something here...)

Directional Command Precedence

The core of the parsing algorithm is
If you think directional commands should have precedence, it would (probably) be a small change. I have not tried it yet, so I don't know the impact of such a change w.r.t. game play compatibility. I could easily do that with the games that are in the test suite, but first I'd like to see the reasoning behind your preference.

Reply

Is this a problem with synonyms only? Assuming you could not use any shorthands or synonyms at all (and all players were content with typing full words all the time), would there still be a problem? I think this is an important question to answer, because if not, then we can focus on the problem of not being able to define synonyms in a good way, and not be derailed by the red herring of directions ;-)
-
Sorry for that, it was because I wasn't quite sure of the underlying mechanisms, on the one hand, and the correct terminology to adopt on the other. A practical example would be trying to define the Italian directions as in the above table (first one, first column), and then creating synonyms (as in 2nd column) — here is how ALAN Italian defines directions and their synonyms:
and this is the definition of the Italian
If I were to follow the direction shorthands proposed in the table, it wouldn't compile due to the reported conflicts (last column).
Well, various types:
If we can't define a direction shorthand via
Well, the problem would be that players won't be willing to type the full words all the time, because directions are quite verbose in Italian, and playing would become quite tiresome. Also, IF players have strong expectations that all adventures should share very similar command conventions, regardless of the system they were created with, so chances are that they will automatically keep using the shorthands they are accustomed to (over and over again).
Honestly, I've never fully understood the logic behind why ALAN sometimes lets you use the same words in some constructs but not in others (I didn't find much explanation of this in the Manual). Another practical example... The preposition "from" in Italian can take many forms, depending on number and gender, but to simplify
but in the above definition I had to omit one specific form, "dai", because it would clash with the common
which, in practical terms, means that every time authors define a
Again, all this is because if I defined the preposition "dai" as a synonym of "da" (as it should be) then I wouldn't be able to implement the verb "dai" (give), because it wouldn't compile. So, the problem with synonyms not allowing other types of same-named IDs or words spans the whole system, and is not specific to directions. But for directions I simply couldn't find a solution, whereas for prepositions it's possible to work around the problem by burdening authors with the need to manually write syntax alternatives in each verb (not elegant, but not a tragedy either). I hope this helps clarify the problem for you; since it's not fully clear to me, I struggle to define it well (it's like a black box to me, right now).
-
Thank you, Tristano! Those are good examples and show that my feeling of this being primarily a synonyms problem was (probably) right.
This was not meant as a suggestion, but as a way to get to the core of the problem by stating a hypothetical situation. I certainly know that not using shorthands is not viable. I was hoping that you could imagine that it was possible... But the rest of what you have answered has given me the information I hoped for anyway...

There are a couple of nuances here that cloud the vision, but I think we need to focus first on the problem that you cannot define words that are both synonyms and another type of word.

I don't know how versed you are in parsing techniques, but the simple answer to why synonyms become a problem is, as it is expressed in the manual, "synonyms are always interchangeable". The interpreter knows about synonyms, and if it encounters one in the player input, it always replaces it with the "original" word. In short, this is done already in the "scanning" phase in the interpreter. Then the actual parsing of the command starts.

By re-imagining the handling of synonyms it might be possible to at least alleviate the problem. I think there are alternatives to the current approach, in which synonyms are propagated to the interpreter. E.g. it might be possible to make synonyms strictly a concern for the compiler, not propagating synonyms to the interpreter. This is just an early idea, so no commitments yet. If this is possible, it would mean that the compiler just creates extra words of the correct types (which would in principle be the same as expanding the original word with all its synonyms in the syntax, exit or wherever the definition was made).

I'll try to construct an example using the problematic da/dai case, so that we have something concrete to discuss, and see where that leads us. I'll get back to you.
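The compiler-side idea described above can be sketched roughly as follows. This is only an illustration in Python, not Alan's actual compiler code: the `WordClass` flags, the `expand_synonyms` function and the sample dictionary are all hypothetical names invented for the sketch. It shows how turning each synonym into a full word entry that inherits the classes of its original word would let "dai" be both a preposition (via "da") and a verb.

```python
from enum import Flag, auto


class WordClass(Flag):
    """Hypothetical word classes; a word may belong to several at once."""
    NOUN = auto()
    ADJECTIVE = auto()
    VERB = auto()
    PREPOSITION = auto()
    DIRECTION = auto()


def expand_synonyms(dictionary, synonyms):
    """Return a new dictionary in which every synonym is an ordinary word
    entry that shares the classes of its original word.

    No synonym table is left over for the interpreter to apply."""
    expanded = dict(dictionary)
    for synonym, original in synonyms.items():
        inherited = dictionary[original]
        # Merge with any classes the synonym already has as a word of its own.
        expanded[synonym] = expanded.get(synonym, WordClass(0)) | inherited
    return expanded


# Assumed sample data: "da" is a preposition, "dai" is already a verb (give).
dictionary = {"da": WordClass.PREPOSITION, "dai": WordClass.VERB}
synonyms = {"dal": "da", "dalla": "da", "dai": "da"}

expanded = expand_synonyms(dictionary, synonyms)
```

In this sketch `expanded["dai"]` carries both the `VERB` and `PREPOSITION` flags, so the decision about which reading applies is left to the command parser, where grammatical context is available.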
-
I've created a gist that tries to mimic an Italian game with the example of "dai" not being possible to use as a synonym for "da". My knowledge of Italian grammar is not good enough to figure out an example noun that would warrant the use of "dai" in the Once we have that I can experiment with various solutions to the problem that it is not allowed to add "dai" to the synonyms for "da". (For my curiosity, are most of those synonyms also contractions? I was thinking that e.g. "dalla" would be a contraction for "da la" when the noun is feminine, because you don't use the definite article in that context, right? If so, would "dai" be used when the noun is plural? Kind of guessing here, but interested since I have a long term plan to learn Italian...)
-
I have a general understanding of parsing techniques, but fail to understand the details of everything that deals with manipulating the generated AST (still studying the topic, and finding it hard).
I was aware of that, but I still don't know the details about the way that ALAN stores the different types of words (object names, verb IDs, What I fail to understand is why some duplicate words are allowed, while other types are not. Also, most IF parsers allow directions with the same names as other object types, so I believe that directions are usually handled as a special case of one-word player inputs in most IF systems.
This sounds like a significant bloat in the generated story files. Keep in mind how many prepositions there are in languages like Italian, which have variations based on gender, noun and number (GNA). I originally thought that the interpreter managed all these synonyms as independent lists/maps, one for each word category/type, and that the parser would handle alternative interpretations through some disambiguating mechanism, with scores for the likelihood of each interpretation being the pertinent one.

Also, I was wondering if The Inform Designer's Manual (aka DM4) might contain some clues in this respect, for it usually contains very detailed accounts of how the various parts of IF systems have to deal with the challenges of different languages, and usually does so with very practical examples. It's thanks to DM4 that I learned how the GNA system works, which guided me in laying the foundations of the ALAN Italian module. So, it might be worth looking into it for inspiration, especially regarding the potential pitfalls of languages which we don't know but for which end users might wish to create a new i18n ALAN library in the future.
-
Basically, the interpreter dictionary structure for "real words" combines information for multiple types in one structure, which also indicates which types of word it can be. This e.g. allows a noun and an adjective to occupy "the same data space". Obviously we also need to store both the strings, otherwise we can't recognize them in the player input. As I mentioned above, synonyms are a completely different breed of words. A synonym is not a "word class" but a simple-minded substitution that is carried out before command parsing begins. And since the synonym replacement is a completely separate phase with no parsing information available, words that are synonyms cannot be any other type of word, because they would always be replaced by their "original" word. So the compiler currently prohibits this. This is why I currently think that getting rid of synonym substitution in the interpreter would be a good idea.
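To make the scanning-phase behaviour concrete, here is a minimal sketch in Python. It is not the interpreter's real code: the `scan` function, the synonym table and the verb set are hypothetical, and the table deliberately includes "dai", which the compiler would currently reject. It shows why a word that is substituted before parsing can never be seen by the parser as anything else.

```python
def scan(player_input, synonyms):
    """Scanning phase: blindly replace every synonym with its original word.

    No grammatical information is available at this point, so the
    substitution cannot depend on the word's position or role."""
    return [synonyms.get(word, word) for word in player_input.lower().split()]


# Hypothetical synonym table that includes "dai" as a synonym of "da".
synonyms = {"dal": "da", "dalla": "da", "dai": "da"}
# Assumed verb words known to the parser; "dai" means "give".
verbs = {"dai", "prendi"}

tokens = scan("dai il libro", synonyms)
# By the time parsing starts, the verb has already been rewritten.
verb_recognized = tokens[0] in verbs
```

Here `tokens` is `["da", "il", "libro"]` and `verb_recognized` is `False`: the verb "dai" is unreachable because the scanner has already turned it into the preposition "da".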
It would not. Remember that the bulk of the information in a game is the texts, including the strings in the dictionary. The number, and size, of such strings would actually be exactly the same in both cases, since the same strings would have to be stored. Also, in some cases having a synonym expand into two words would avoid a "syntax synonym", which is more expensive. And we are talking about a few bytes per word here, so even if we did just add the dictionary information for them (disregarding the strings) it would add a few kB in a game that is probably at least 1MB (Wyldkynd) in size. I'm not sure how big the current synonym table that is a part of the interpreter data is, but that would also not be needed anymore.
That's a good pointer. I have "always" wanted to read the DM, but never got around to it. Maybe it is time. Although I don't want to inflate this issue to "let's make Alan's command parsing fit any language". If there is a problem that can be fixed with reasonable effort, I'm all for it. If we don't see an example of a potential problem in the wild (yet), let's leave it for now. That's why I'm more interested in the concrete cases.
-
Here are some additions that can fit into the original example:
It's worth noting here that "dei tubi" (some pipes, when described with indefinite article, e.g. "You can see some pipes") also conflicts with "dei" as in belonging to, e.g. "Il libro dei maestri" (the teachers' book). In this case "dei" is one of the Synonyms of the
They indeed are, but in modern Italian you can't use their uncontracted form anymore, unless you're referring to a name that starts with a definite article — e.g. "La Stampa" (a famous Italian newspaper), you don't say "della La Stampa" but "de La Stampa", which sounds very odd and archaic, in fact many people today would just say/write "della Stampa" instead (but it's incorrect). The complex part here is that some nouns that have same GNA can take different articles, based on their initial syllable. E.g. "giganti" and "studenti" (giants and students, both masc. plur.):
Well, don't hesitate to ask if you need help. Also, if you like I could send you some Italian magazines, comics and books, which are helpful tools to learn the language (especially magazines, which have photos, captions and titles, which help grasping the context intuitively).

This excellent article by Max Bianchi (aka torredifuoco) covers the topic from an IF point of view, and in an excellent manner:

Note that I had written some Wiki pages on the various problems and techniques in porting ALAN to Italian, a long time ago, all of which are thoroughly commented, with source examples and external links:
Even if they are a bit old, and might not match 100% the current state of the Italian module, the key concepts are still the same and valid.
-
I thought that in
does every word get stored? i.e. "climb", "out" and "of"?
How would the new system work on the authors' side? Would it be just as before, except for the fact that I often wondered whether having a special notation for inline text-alternatives might mitigate the problem on the author's side. E.g.:
This could be just syntactic sugar to avoid having to define a
It's really worth it, because it gathers real case examples from the various languages for which there are actually many IF games (English, French, German, Italian, Spanish, etc.) and that were contributed by the IF community over decades of Inform development. So it does contain some rare i18n gems which are based on practical examples and development strategies. Also, Nelson is very good at explaining things, both from a linguistic and grammatical point of view and from the implementation perspective.

Enable Discussions and Move This Issue There

PS: I think you should enable Discussions on this repository, so we can move there the Issues which have grown too long, and keep Issues uncluttered, using them only for practical maintenance/dev tasks. Discussions are a fairly new feature, but you can now freely convert any Issue into a Discussion, and you can manage different categories too, which makes Discussions easier to navigate. An added benefit of Discussions is that they are more forum-like, allowing structured replies, whereas Issues are linear, making it hard to keep related posts together.
-
@thoni56, I wanted to expose a problem I've encountered with the ALAN Italian project, regarding the fact that ALAN doesn't allow having same-worded aliases, verbs, directions or special classes of known words (e.g. noise words, etc.).
The Problem
The problem mostly affects direction shorthands, which clash with other common verbs or special words. To better illustrate my example, here's a table with all the main directions in Italian and English.
AND_WORD.

Besides the above-mentioned clashes, which affect ALAN Italian in practice, the direction shorthands "se" and "o" could also potentially clash with other Italian constructs: "se" means if, and "o" means or. Although I didn't encounter practical cases of the latter conflicts in real IF games, it does highlight the extent to which direction shorthands can collide with other useful and common Italian words.
I couldn't come up with any practical solution for providing direction shorthands in Italian adventures; every possible solution seems to solve some conflicts but introduce new ones. E.g. if I decide to include an extra letter:
I've tried many other solutions, but it simply doesn't seem possible to avoid clashes while attempting to use a coherent system. Since in most languages short words represent commonly used particles, adverbs, etc., it might just as well be that similar problems affect other locales too, and that it is a lucky exception that in English the direction shorthands don't clash with other words that are important to IF gameplay.
The Solution
From an IF point of view, directional commands always contain just the direction, so I was wondering if it would be possible to change how ALAN treats directions and their aliases, compared to other game objects.
If ALAN were to handle directions and their aliases as a separate category of words, when faced with a single word input the parser could first assume that it's a direction command, or an alias thereof, before attempting to match it with a verb.
This wouldn't solve the conflict between "no" the direction (northwest) and "no" the reply (to a yes/no question), but only if the "no" reply was implemented in an adventure as a raw reply — whereas if implemented as reply no or reply yes, there wouldn't be any conflict since the reply would no longer be a single word input.
As for "e" (short for east) conflicting with "e" the AND_WORD, if the parser were to consider first the possibility of it being a direction alias, the conflict would be solved; also, it wouldn't make sense to use an AND word in an input sentence with fewer than three words.

I'm not sure how complex these changes would be, or whether they might have an unexpected impact on backward compatibility, but having a separate list for direction words and their aliases, and the parser attempting to first match a directional command for single-word inputs, seems a reasonable change. Also, the parser could always check if there's also a same-worded verb and throw a disambiguation request if this was the case, but as mentioned above, authors could easily avoid these edge cases by formulating fuller verbs (e.g. reply no, answer no).
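The proposed lookup order can be illustrated with a short Python sketch. This is not how the ALAN parser is implemented; the `classify` function, the direction table and the verb set are hypothetical, chosen only to show the idea of checking a separate direction list first for single-word inputs and asking for disambiguation when a same-worded verb exists.

```python
def classify(tokens, directions, verbs):
    """Classify a tokenized player input.

    Single-word inputs are checked against the direction list first;
    anything longer goes to the ordinary command parser."""
    if len(tokens) == 1:
        word = tokens[0]
        if word in directions and word in verbs:
            # Same word is both a direction alias and a verb: ask the player.
            return ("ambiguous", word)
        if word in directions:
            return ("direction", directions[word])
    return ("command", tokens)


# Assumed direction aliases, each mapped to its full Italian direction name.
directions = {"e": "est", "est": "est", "no": "nordovest", "o": "ovest"}
# Assumed verbs; "no" stands for a raw yes/no reply implemented as a verb.
verbs = {"prendi", "no"}

east = classify(["e"], directions, verbs)
reply = classify(["no"], directions, verbs)
take = classify(["prendi", "libro", "e", "lampada"], directions, verbs)
```

With these assumptions, a lone "e" is treated as the direction east, a lone "no" triggers a disambiguation request, and "e" inside a longer sentence is left for the normal parser to handle as an AND word.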
As for other direction commands, like go no, the parser already strips the NOISE_WORD "go" from the input, and the same can be achieved in any locale; also, I believe most players will type just the direction anyhow.