Tagging mixed number as #Value #1085
Comments
hey Tom, yep - if I remember, we still do some of this number-range stuff out of the box, but shied away from some of it that resembled algebra or subtraction. This is a real doozy, and I agree it's a cool thing to opt in to, and we should support any unambiguous 'and a half' stuff as much as we can. You can see some of the fractions tests we pass, and the ones we avoid for this, here. PRs welcome if you can improve on it in any way. ps i enjoyed your blog. |
@spencermountain Thank you Spencer! I just realized something that looks like a bug. When
> nlp('15-ounce (15-ounce)').debug()
┌─────────
│ '15' - Value, Cardinal, NumericValue, Hyphenated
│ 'ounce' - Noun, Unit, Singular, Hyphenated
│ '15-ounce' - Infinitive, Verb, PresentTense
sidebar: is there a way we can convert verbose number ranges (2 to 3) to hyphenated number ranges (2-3)? That would enable me to tap into the same #NumberRange tag for a match.
> nlp('2 to 3 people').debug()
┌─────────
│ '2' - Value, Cardinal, NumericValue
│ 'to' - Conjunction
│ '3' - Value, Cardinal, NumericValue
│ 'people' - Noun, Plural, Actor
> nlp('2-3 people').debug()
┌─────────
│ '[2]' - Value, Cardinal, NumericValue, NumberRange
│ '[to]' - Conjunction, NumberRange
│ '[3]' - Value, Cardinal, NumericValue, NumberRange
│ 'people' - Noun, Plural, Actor
edit: also happy to split these concerns into separate issues/discussions if you prefer |
hey Tom, apologies for the delay.
let doc = nlp('2 to 3 people')
let { before, prep } = doc.match('[<before>#Value] [<prep>to] #Value').groups()
before.post('') //remove '2' whitespace
doc.match(prep).replaceWith('-').post('') //remove '-' whitespace
console.log(doc.text()) //2-3 people
in short, some of this is weird. You may benefit from using replace() with some term methods like This |
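For comparison, the same "2 to 3" → "2-3" rewrite can be sketched on a plain string before it is handed to the parser. This is only a rough pre-processing idea, not part of compromise: `hyphenateRanges` is a hypothetical helper name, and it assumes ranges are written with digits (it will not touch "two to three").

```javascript
// Hypothetical helper (not a compromise API): rewrite digit ranges
// written as "N to M" into the hyphenated form "N-M".
function hyphenateRanges(text) {
  // \b keeps us from matching inside longer tokens; \s+ tolerates extra spaces
  return text.replace(/\b(\d+)\s+to\s+(\d+)\b/g, '$1-$2')
}

console.log(hyphenateRanges('2 to 3 people')) // 2-3 people
```

The rewritten string could then be parsed as usual, on the assumption that the hyphenated form is what picks up the #NumberRange tag shown in the debug output earlier in the thread.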
hey @track0x1 , this is fixed in
let doc = nlp('10-ounce (12-ounce)')
doc.terms().length // 4
cheers |
You're the best! Thank you |
Mixed numbers are a common way to express a value, like ‘1-1/2 cups’, and are sometimes written without the hyphen separator, as in ‘1 1/2 cups’. When I used compromise v11 I was able to make a plugin with a regex to try to tag these as #Value, but it doesn’t seem to work in the latest release. Because it’s so common, should this be tagged out of the box?
My purpose here is to match all types of values (including mixed number values) for capturing.
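To make the target concrete, here is a minimal sketch of the kind of regex involved, run on plain text rather than through a compromise plugin. `MIXED_NUMBER` and `findMixedNumbers` are hypothetical names, and the pattern is an assumption about what counts as a mixed number: a whole number, then a hyphen or whitespace, then a simple fraction. It is deliberately naive and would also match the start of something like a date written '3 1/2/2020'.

```javascript
// Hypothetical helper (not a compromise API): find mixed numbers in plain text.
// Pattern: whole number, then "-" or whitespace, then numerator/denominator.
const MIXED_NUMBER = /\b(\d+)(?:-|\s+)(\d+)\/(\d+)\b/g

function findMixedNumbers(text) {
  const results = []
  for (const m of text.matchAll(MIXED_NUMBER)) {
    const whole = Number(m[1])
    const numerator = Number(m[2])
    const denominator = Number(m[3])
    if (denominator === 0) continue // ignore malformed fractions like "1 1/0"
    results.push({ text: m[0], value: whole + numerator / denominator })
  }
  return results
}

console.log(findMixedNumbers('Add 1-1/2 cups flour and 2 1/4 cups milk'))
```

Both spellings from the description ('1-1/2' and '1 1/2') are captured, while a hyphenated unit like '15-ounce' is left alone because no fraction follows the hyphen.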