Currently, responses are always atomic. However, for some applications it would be nice to represent labels (categorical responses) with limited structure such that features are extracted over parts of the structure, as well as the full label. We are still talking about making a single prediction, not structured prediction; the proposal is simply to enable a richer space of features over labels.
For example, in building a part-of-speech classifier, one could have features that score the full, fine-grained POS tag, as well as features that group together related tags into coarser categories to share statistical strength.
Define a composite label as a categorical response that is made up of multiple categorical parts, or components. The components could be characters in a string (such as a bit string), or in an explicit structure (such as a JSON data structure).
In the model, there will be a feature for every pairing of an input characteristic (percept) with a full (simple or composite) label. In addition, when a percept is scored with a composite label, one feature per component will fire, conjoining the percept with that component. So if every label is a POS tag consisting of two components, a coarse component and a fine component, three features will fire for the label for every percept: one with the coarse component, one with the fine component, and one with the full label.
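To make the feature count concrete, here is a minimal sketch of which features fire for one percept and one label. The function name `label_features` and the tuple layout of the feature keys are hypothetical, chosen only for illustration; the proposal does not prescribe a representation.

```python
def label_features(percept, full_label, components):
    """List the feature keys that fire when `percept` is scored with a label.

    `components` is empty for a simple label. The key layout
    (percept, kind, value) is an assumption made for this sketch.
    """
    # One feature conjoining the percept with the full label.
    features = [(percept, "full", full_label)]
    # One extra feature per component of a composite label.
    features.extend((percept, "part", component) for component in components)
    return features


# A POS tag with a coarse and a fine component: three features fire.
composite = label_features("suffix=ing", "PN", [("coarse", "P"), ("fine", "N")])

# A simple (atomic) label: only the full-label feature fires.
simple = label_features("suffix=ing", "PN", [])
```

With two components, `composite` holds three keys and `simple` holds one, matching the count described above.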
We assume the output space of the classifier will not be affected by the use of composite labels—only full labels (simple or composite) seen during training will be candidates for prediction.
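The restriction on the output space could be sketched as follows. This assumes a linear scoring model with a weight per feature and string-mode positional components; neither is stated in the proposal, and the names `build_candidates`, `score`, and `predict` are hypothetical.

```python
from collections import defaultdict


def build_candidates(training_labels):
    """Map each full label seen in training to its positional components.

    Assumes string mode: components are (offset, character) pairs.
    """
    return {label: list(enumerate(label)) for label in training_labels}


def score(weights, percepts, label, components):
    """Sum the weights of all full-label and component features that fire."""
    total = 0.0
    for percept in percepts:
        total += weights[(percept, "full", label)]
        total += sum(weights[(percept, "part", c)] for c in components)
    return total


def predict(weights, percepts, candidates):
    """Pick the best-scoring label among those seen in training only."""
    return max(
        candidates,
        key=lambda label: score(weights, percepts, label, candidates[label]),
    )


# Hypothetical weights that favor "N" in slot 0 and "S" in slot 1.
weights = defaultdict(float)
weights[("word=dogs", "part", (0, "N"))] = 1.0
weights[("word=dogs", "part", (1, "S"))] = 0.8

# "NS" never occurs in training, so it is not a candidate.
candidates = build_candidates(["NN", "PN", "PS", "NN"])
best = predict(weights, ["word=dogs"], candidates)
```

Here the component weights would favor the unseen combination `NS`, but `predict` only ranks `NN`, `PN`, and `PS`, so component features influence the scores without enlarging the label set.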
Interface
Information about label structure could be (a) inferred automatically from the name of the label, (b) specified in the response file, in place of a single string name for the label, or (c) specified in some other file as a mapping of label names to richer structures. The interface proposed here will allow (a) or (b).
Let the option --composite-labels [json|string] [positional|bag] enable this feature:
If json (the default format) is specified, then all responses will be read as JSON objects. There are three allowed types of responses: JSON strings, lists of strings, and maps from strings to strings. JSON strings are interpreted as simple labels; in a list of strings, each string is a component; and in a map, the key-value pairs are components.
If string is specified, then all responses will be read as unquoted strings and treated as composite; the components are individual characters.
If positional (the default ordering) is specified, then any sequential composite labels (the label name in string mode, lists in json mode) are treated as ordered slot-fillers; i.e., each component is conjoined with its offset in the sequence.
If bag is specified, then any sequential composite labels are interpreted as bags of components; within a label, any repetition of a component will trigger an error. JSON maps are always treated as bags of key-value pairs.
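The rules above can be sketched as a small response parser. This is an illustration under assumptions, not the tool's implementation: the function name `parse_response` is hypothetical, and using a canonical JSON serialization as the identity of a composite label is a choice made here that the proposal leaves open.

```python
import json


def parse_response(raw, fmt="json", ordering="positional"):
    """Return (full_label, components) for one response line.

    Components are (offset, part) pairs in positional mode, bare parts in
    bag mode, and (key, value) pairs for JSON maps.
    """
    if fmt == "string":
        # Unquoted string: always composite, one component per character.
        full, sequence = raw, list(raw)
    elif fmt == "json":
        value = json.loads(raw)
        if isinstance(value, str):
            return value, []  # simple label, no components
        # Assumption: the canonical serialization names the full label.
        full = json.dumps(value, sort_keys=True)
        if isinstance(value, dict):
            if not all(isinstance(v, str) for v in value.values()):
                raise ValueError("map values must be strings")
            # Maps are always bags of key-value pairs.
            return full, sorted(value.items())
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            raise ValueError("expected a string, list of strings, or map")
        sequence = value
    else:
        raise ValueError(f"unknown format: {fmt}")

    if ordering == "positional":
        # Conjoin each component with its offset in the sequence.
        return full, list(enumerate(sequence))
    if ordering == "bag":
        if len(set(sequence)) != len(sequence):
            raise ValueError("repeated component in bag label")
        return full, sorted(sequence)
    raise ValueError(f"unknown ordering: {ordering}")


from_string = parse_response("PN", fmt="string", ordering="positional")
from_list = parse_response('["P", "N"]', fmt="json", ordering="positional")
from_map = parse_response('{"coarse": "P", "fine": "N"}', fmt="json")
```

In this sketch `from_string` and `from_list` yield the same components, `(0, "P")` and `(1, "N")`, while `from_map` yields the named pairs `("coarse", "P")` and `("fine", "N")`.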
Examples
If all labels are length-2 POS tags like NN = noun singular, NS = noun plural, PN = pronoun singular, PS = pronoun plural, etc., the following are equivalent ways to specify the response:
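PN with --composite-labels string positional (note that bag would conflate the two possible uses of N, as the coarse category "noun" and as the fine category "singular", and would reject NN as a repeated component)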
["P", "N"] with --composite-labels json positional
{"coarse": "P", "fine": "N"} with --composite-labels json
If all labels are fixed-length bitstrings, the following are equivalent:
01011 with --composite-labels string positional
["0", "1", "0", "1", "1"] with --composite-labels json positional
{"0": "0", "1": "1", "2": "0", "3": "1", "4": "1"} with --composite-labels json
If the labels are clusters of morphosyntactic attributes, then with --composite-labels json bag, the two labels ["noun", "singular", "accusative"] and ["verb", "past", "singular", "causative"] would share one component in common: features associated with the "singular" component would fire for both.
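A short sketch of why the ordering option matters for this example. The helper names `bag_components` and `positional_components` are hypothetical; the two labels are the ones given above.

```python
def bag_components(label):
    """Components of a sequential label in bag mode (repetition is an error)."""
    if len(set(label)) != len(label):
        raise ValueError("repeated component in bag label")
    return frozenset(label)


def positional_components(label):
    """Components of a sequential label in positional mode: (offset, part)."""
    return frozenset(enumerate(label))


noun = ["noun", "singular", "accusative"]
verb = ["verb", "past", "singular", "causative"]

# Bag mode: "singular" is one shared component, so its features fire for both.
shared_bag = bag_components(noun) & bag_components(verb)

# Positional mode: "singular" sits at offset 1 in one label and offset 2 in
# the other, so the two labels share no component.
shared_positional = positional_components(noun) & positional_components(verb)
```

Under bag ordering the intersection contains only `"singular"`; under positional ordering it is empty, because the shared attribute occupies different slots in the two labels.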