USA is getting labeled with a feature code 'BLDG' #77

Closed
shahvaiz opened this issue Apr 9, 2020 · 3 comments · Fixed by #81

Comments

@shahvaiz

shahvaiz commented Apr 9, 2020

Hello. If "USA" appears in a sentence, it gets tagged as a building. See below. Thanks so much.

```python
from mordecai import Geoparser

geo = Geoparser()
geo.geoparse("We traveled to the USA")
```

```
[{'word': 'USA',
  'spans': [{'start': 19, 'end': 22}],
  'country_predicted': 'USA',
  'country_conf': 0.9998105,
  'geo': {'admin1': 'California',
   'lat': '34.00474',
   'lon': '-117.33588',
   'country_code3': 'USA',
   'geonameid': '7195491',
   'place_name': 'Sgi-Usa Riverside Community Center',
   'feature_class': 'S',
   'feature_code': 'BLDG'}}]
```

@ahalterman
Member

Thanks for the report. Mordecai was really intended to geolocate subnational place names rather than country names, but that's definitely not a good result to return.
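Since Mordecai targets subnational place names, one possible client-side workaround is to intercept mentions that are really country names and override the point feature chosen for them. This is a hypothetical post-processing step, not part of Mordecai: `fix_country_mentions` and the small `COUNTRY_ALIASES` table are illustrative assumptions, and the sketch only relies on the `word` and `geo` keys seen in the output above.

```python
# Hypothetical workaround, not part of Mordecai: map mentions that are
# country names onto a country-level entry instead of a point feature.

# Tiny illustrative alias table (lower-cased mention -> ISO 3166-1 alpha-3).
COUNTRY_ALIASES = {
    "usa": "USA",
    "united states": "USA",
    "france": "FRA",
    "china": "CHN",
}


def fix_country_mentions(results):
    """Return new result dicts; country-name mentions get a PCLI 'geo' entry."""
    fixed = []
    for res in results:
        res = dict(res)  # shallow copy so the caller's dicts are not mutated
        iso3 = COUNTRY_ALIASES.get(res["word"].strip().lower())
        if iso3 is not None:
            res["country_predicted"] = iso3
            # No lat/lon or geonameid here: coordinates are not guessed.
            res["geo"] = {
                "country_code3": iso3,
                "place_name": res["word"],
                "feature_class": "A",
                "feature_code": "PCLI",
            }
        fixed.append(res)
    return fixed


# Trimmed version of the "USA" result reported above.
results = [{
    "word": "USA",
    "country_predicted": "USA",
    "geo": {"place_name": "Sgi-Usa Riverside Community Center",
            "feature_class": "S", "feature_code": "BLDG"},
}]
print(fix_country_mentions(results)[0]["geo"]["feature_code"])  # PCLI
```

Coordinates and `geonameid` are deliberately left out of the replacement entry; a proper fix would look up the actual country record in the gazetteer.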

@ylieder

ylieder commented Jun 9, 2020

Same behavior occurs with "France":

```python
geo.geoparse("We traveled to France")
```

```
[{
  "word": "France",
  "spans": [
    {
      "start": 15,
      "end": 21
    }
  ],
  "country_predicted": "FRA",
  "country_conf": 0.9998105,
  "geo": {
    "admin1": "\u00cele-de-France",
    "lat": 48.94956,
    "lon": 2.5684,
    "country_code3": "FRA",
    "geonameid": "2971874",
    "place_name": "Tremblay-en-France",
    "feature_class": "P",
    "feature_code": "PPL",
    "country_code2": "FR"
  }
}]
```

Maybe it would be possible to weight records with the feature code A.PCLI (independent political entity) more heavily; these currently cover 192 countries.
I don't know how the search is implemented, but both "France" and "USA" are listed as aliases for their respective countries in the GeoNames gazetteer.

Reference: http://download.geonames.org/export/dump/featureCodes_en.txt
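The weighting suggested above could look roughly like the following. This is a hypothetical sketch, not Mordecai's actual ranking code: the candidate dictionaries, their `score` field, `rerank` and `PCLI_BOOST` are all assumptions for illustration. It boosts a GeoNames record with feature class `A` and feature code `PCLI` only when the mention exactly matches its name or one of its alternate names.

```python
# Hypothetical reranking sketch, not Mordecai's ranking code. Candidate
# dicts, the "score" field and PCLI_BOOST are assumptions for illustration.

PCLI_BOOST = 10.0  # arbitrary; chosen to outrank buildings, schools, towns


def rerank(mention, candidates):
    """Return candidates sorted best first, boosting exact-match PCLI records."""
    target = mention.strip().lower()

    def score(cand):
        names = {cand["name"].lower()}
        names.update(alt.lower() for alt in cand.get("alternatenames", []))
        s = cand.get("score", 0.0)
        is_country = cand["feature_class"] == "A" and cand["feature_code"] == "PCLI"
        if is_country and target in names:
            s += PCLI_BOOST
        return s

    return sorted(candidates, key=score, reverse=True)


# Illustrative candidates with made-up base scores.
candidates = [
    {"name": "Sgi-Usa Riverside Community Center",
     "feature_class": "S", "feature_code": "BLDG", "score": 2.3},
    {"name": "United States", "alternatenames": ["USA", "US"],
     "feature_class": "A", "feature_code": "PCLI", "score": 1.1},
]
print(rerank("USA", candidates)[0]["name"])  # United States
```

Requiring an exact name or alias match keeps the boost from pulling a mention such as "Tremblay-en-France" over to the country record.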

@akankshanb

1. The same behavior is seen for China as well:

   ```python
   geo.geoparse('We traveled to China')
   ```

   ```
   [{'country_conf': 0.68758196,
     'country_predicted': 'CHN',
     'geo': {'admin1': 'Hubei',
             'country_code3': 'CHN',
             'feature_class': 'S',
             'feature_code': 'SCHC',
             'geonameid': '6620465',
             'lat': '30.52047',
             'lon': '114.39637',
             'place_name': 'China University of Geosciences'},
     'spans': [{'end': 20, 'start': 15}],
     'word': 'China'}]
   ```

2. It predicts Canada as the country if "U.S" is used in the sentence:

   ```python
   geo.geoparse('We traveled to the U.S')
   ```

   ```
   [{'country_conf': 0.28868943,
     'country_predicted': 'CAN',
     'spans': [{'end': 22, 'start': 19}],
     'word': 'U.S'}]
   ```
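The "U.S" result comes back with a low `country_conf` (about 0.29). Until the underlying issue is fixed, one hedged client-side workaround is to discard country predictions below a confidence threshold. `drop_low_confidence` and the 0.5 cut-off are hypothetical choices, not Mordecai settings.

```python
# Hypothetical post-filter, not a Mordecai option: clear country guesses
# whose confidence falls below an arbitrary threshold.

MIN_COUNTRY_CONF = 0.5  # assumption; tune on your own data


def drop_low_confidence(results, threshold=MIN_COUNTRY_CONF):
    """Return new result dicts with low-confidence country guesses cleared."""
    cleaned = []
    for res in results:
        res = dict(res)  # shallow copy; leave the caller's results untouched
        if res.get("country_conf", 0.0) < threshold:
            res["country_predicted"] = None
            res.pop("geo", None)  # a location under a doubtful country is dropped too
        cleaned.append(res)
    return cleaned


# The "U.S" result reported above.
results = [{"word": "U.S", "country_predicted": "CAN",
            "country_conf": 0.28868943,
            "spans": [{"start": 19, "end": 22}]}]
print(drop_low_confidence(results)[0]["country_predicted"])  # None
```

This only suppresses the wrong guess; it does not recover "USA" for "U.S", and it would not catch the China result, whose `country_conf` of about 0.69 clears the threshold.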

Kindly look into this. Thanks!

4 participants