You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
This is a small feature request on the phoneNumbers() method exposed.
I'm noting some offset weirdness.
Example that offsets correctly:
const json = nlp('My co-worker's number is +1 212-456-7890.').phoneNumbers(0).json({ offset: true })[0];
console.log(json.offset);
// offset: { index: 4, start: 25, length: 16 }
console.log('My co-worker\'s number is +1 212-456-7890.'.substr(25,16));
// '+1 212-456-7890.'
Example that offsets incorrectly:
const json = nlp('My phone number is (201) 111-2222.').phoneNumbers(0).json({ offset: true })[0];
console.log(json.offset);
// offset: { index: 4, start: 20, length: 15 }
console.log('My phone number is (201) 111-2222.'.substr(20,15));
// '201) 111-2222.'
I would expect the opening ( to have been included in the top level offset to be consistent with the first response. The closing . (period) could also be omitted in both cases, but that seems more of a global issue and unrelated (and i think you've mentioned using other helpers to remove those post-facto).
I would have contributed this change directly, but the fix appears non obvious, so i've added a simple unit test instead that may help indicate the suggested fix. priley86@5182385
hey Patrick, thank you for the good issue. This is a doozy.
The issue is from an awkward decision in the offset-calculator. When a word is split into it's whitespace (both pre & post), and it's text, the offsets don't include the whitespace characters. You can see the same issue here:
the reasoning for this is that offsets are used in text-highlighting applications, so it looked awkward highlighting spaces and things.
but yeah, this is awkward because the ( character is central to the phonenumber, and should not be thought of as whitespace.
Will keep this open as a bug. Maybe phone-number classification should happen up at the top, when words get split up. Maybe the offsets results needs more verbose values. Maybe both. I'm not sure what is best.
cheers
hey there again @spencermountain !
This is a small feature request on the
phoneNumbers()
method exposed.I'm noting some offset weirdness.
Example that offsets correctly:
Example that offsets incorrectly:
I would expect the opening
(
to have been included in the top level offset to be consistent with the first response. The closing.
(period) could also be omitted in both cases, but that seems more of a global issue and unrelated (and i think you've mentioned using other helpers to remove those post-facto).I would have contributed this change directly, but the fix appears non obvious, so i've added a simple unit test instead that may help indicate the suggested fix.
priley86@5182385
Could this an extension of the
toJson
method on the phoneNumbers View defined here?https://github.com/spencermountain/compromise/blob/master/src/3-three/misc/selections/index.js#L39
Sorry, totally unfamiliar here but trying to help 😄
The text was updated successfully, but these errors were encountered: