Tagging mixed number as #Value #1085

Closed
track0x1 opened this issue Feb 1, 2024 · 5 comments

Comments

@track0x1
Contributor

track0x1 commented Feb 1, 2024

Mixed numbers are a common way to express a value like ‘1-1/2 cups’, sometimes written without the hyphen separator: ‘1 1/2 cups’. When I used compromise v11 I was able to make a plugin with a regex to tag these as #Value, but it doesn’t seem to work in the latest release. Because it’s so common, should this be tagged out of the box?
My purpose here is to match all types of values (including mixed number values) for capturing.
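
For illustration, here is a minimal sketch of the kind of regex the question describes, written in plain JavaScript with no compromise calls. The names `MIXED_NUMBER` and `findMixedNumbers` are hypothetical and are not part of the library. It assumes a mixed number is a whole number, then a hyphen or whitespace, then a simple `a/b` fraction.

```javascript
// Hypothetical helper, not part of compromise: find mixed numbers in plain text.
// Matches a whole number, then a hyphen or whitespace, then a fraction,
// e.g. '1-1/2' or '1 1/2'.
const MIXED_NUMBER = /\b(\d+)(?:-|\s+)(\d+)\/(\d+)\b/g

function findMixedNumbers(text) {
  const results = []
  for (const m of text.matchAll(MIXED_NUMBER)) {
    const whole = Number(m[1])
    const numerator = Number(m[2])
    const denominator = Number(m[3])
    if (denominator === 0) continue // skip malformed input such as '1 1/0'
    results.push({
      text: m[0], // the matched substring
      index: m.index, // where it starts in the input
      value: whole + numerator / denominator, // numeric value, e.g. 1.5
    })
  }
  return results
}

console.log(findMixedNumbers('add 1-1/2 cups flour and 2 3/4 cups milk'))
```

A pattern like this is deliberately loose: it cannot tell a mixed number from a whole number that merely sits next to a fraction or a date, which is the same sort of ambiguity that makes this hard to tag by default.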

@spencermountain
Owner

hey Tom, yep - if I remember right, we still do some of this number-range stuff out of the box, but shied away from the parts that resembled algebra or subtraction. This is a real doozy, and I agree it's a cool thing to opt in to, and we should support any unambiguous 'and a half' stuff as much as we can.

You can see some of the fractions tests we pass, and the ones we avoid, here. PRs welcome if you can improve on it in any way.

ps i enjoyed your blog.
cheers

@track0x1
Contributor Author

track0x1 commented Feb 6, 2024

@spencermountain Thank you Spencer! I just realized something that looks like a bug. When 15-ounce is wrapped in parentheses it's tagged as a single term and, as a result, gets the wrong tags.

> nlp('15-ounce (15-ounce)').debug()

  ┌─────────
   '15'       - Value, Cardinal, NumericValue, Hyphenated
   'ounce'    - Noun, Unit, Singular, Hyphenated
   '15-ounce'  - Infinitive, Verb, PresentTense

sidebar: is there a way we can convert verbose number ranges (2 to 3) to hyphenated number ranges (2-3)? that would enable me to tap into the same #NumberRange tag for a match.

> nlp('2 to 3 people').debug()

  ┌─────────
   '2'        - Value, Cardinal, NumericValue
   'to'       - Conjunction
   '3'        - Value, Cardinal, NumericValue
   'people'   - Noun, Plural, Actor

> nlp('2-3 people').debug()

  ┌─────────
   '[2]'      - Value, Cardinal, NumericValue, NumberRange
   '[to]'     - Conjunction, NumberRange
   '[3]'      - Value, Cardinal, NumericValue, NumberRange
   'people'   - Noun, Plural, Actor

edit: also happy to split these concerns into separate issues/discussions if you prefer

@spencermountain
Owner

hey Tom, apologies for the delay.
yeah, there's an ugly way:

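import nlp from 'compromise'

let doc = nlp('2 to 3 people')
// capture the first number and the 'to' as named groups
let { before, prep } = doc.match('[<before>#Value] [<prep>to] #Value').groups()
before.post('') // remove the whitespace after '2'
doc.match(prep).replaceWith('-').post('') // swap 'to' for '-' and remove the whitespace after it
console.log(doc.text()) // 2-3 people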

in short, some of this is weird. You may benefit from using replace() with some term methods like @hasDash or @hasHyphen
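
A different route, sketched here as an assumption and not something proposed in the thread, is to rewrite the raw string before handing it to `nlp()`. Going by the debug output earlier in the thread, the hyphenated form `2-3` is what picks up the #NumberRange tag. The helper name `hyphenateRanges` is hypothetical and it uses only plain JavaScript string methods.

```javascript
// Hypothetical pre-processing helper, not part of compromise.
// Rewrites '<digits> to <digits>' as '<digits>-<digits>' in a plain string.
function hyphenateRanges(text) {
  return text.replace(/\b(\d+)\s+to\s+(\d+)\b/g, '$1-$2')
}

console.log(hyphenateRanges('serves 2 to 3 people')) // serves 2-3 people
```

This is blunt: it also rewrites phrases such as '9 to 5' where a range may not be intended, and it ignores spelled-out numbers like 'two to three', so the match-based approach above remains the safer choice when the tagger's judgment matters.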

This nlp('15-ounce (15-ounce)').debug() one is a doozie. Haven't got it yet, but will.

@spencermountain
Owner

hey @track0x1 , this is fixed in 14.12.0:

let doc = nlp('10-ounce (12-ounce)')
doc.terms().length // 4

cheers

@track0x1
Contributor Author

You're the best! Thank you
