"parking lot" should find amenity=parking #461

Open
matkoniecz opened this issue May 17, 2022 · 21 comments
Labels
question Further information is requested

Comments

@matkoniecz
Contributor

(I may implement it, after #337 is reviewed/rejected/merged)

@Hufkratzer
Contributor

How do you intend to do that? In the default language "en" it is already found:

[Screenshot: parking_lot]

Only in the language "en_GB" it is not found, because it was translated as "Car Park".

@matkoniecz
Contributor Author

How do you intend to do that?

Add an alias, as in, say, #335.

@tyrasd tyrasd added the question Further information is requested label May 17, 2022
@tyrasd
Member

tyrasd commented May 17, 2022

I guess this should probably be fixed in iD's code directly, because this is likely an issue with all presets whose name has been translated in any (non-US) dialect of English: any preset name could also automatically be included in the search terms for all English locales. This would be the only "good" solution if you ask me, because otherwise we would potentially need to include every preset's name in its respective list of terms (see the readme section for context).

@matkoniecz
Contributor Author

Such a systematic approach is likely superior to fixing individual terms as they are spotted, though I am not going to promise to implement this wider scope.

@westnordost
Contributor

westnordost commented May 23, 2022

Well, another idea here: instead of always also matching against the en-US default localization, match against the tag value. E.g. if you type "parking", it will also find amenity=parking regardless of the user's locale, because that's the OSM tag value of the preset.

@matkoniecz
Contributor Author

matkoniecz commented May 24, 2022

Would it be possible to add these English dialect aliases to the output files?

Note that if such aliasing is done on the iD side, StreetComplete, GoMap!!, Every Door by Zverik and so on would each have to implement the same mechanism.

Implementing it in the build script, by contrast, would allow adding it once and having it apply to all other users.

For example, see westnordost/osmfeatures#13, which appears to be a result of the same issue.

alcohol shop finds shop=alcohol in en-GB and fails to find it in en-US

https://github.com/openstreetmap/id-tagging-schema/blob/main/dist/translations/en-GB.json#L3453
https://github.com/openstreetmap/id-tagging-schema/blob/main/dist/translations/en.json#L7137

@matkoniecz
Contributor Author

The build script seems to be in https://github.com/ideditor/schema-builder/blob/main/lib/build.js, and I could try implementing this as part of improving StreetComplete (which would also improve iD, GoMap!!, Every Door and maybe other editors).

@tyrasd
Member

tyrasd commented May 24, 2022

Would it be possible to add these English dialect aliases to the output files? […] and I could try implementing this

That sounds like a good idea. 👍 Your contribution would be very welcome.

@westnordost
Contributor

westnordost commented May 24, 2022

@matkoniecz That does not make sense in my opinion (I am the author of osmfeatures library):

The primary reason is that en.json is actually en-US, i.e. there is no en-US.json. Mateusz's idea is premised on the translations being organized in a form like

  • pt - Portuguese translations: contains common translations
  • pt-PT - Portuguese translations (Portugal dialect)
  • pt-BR - Portuguese translations (Brazil dialect)

If the translations were organized in that manner, merging pt into pt-PT for distribution would make sense. But the translations are not organized in that manner. We have

  • pt - all Portuguese translations (implicitly Portugal dialect)
  • pt-BR - all Portuguese translations (Brazil dialect)

The same holds for other languages that have significant dialects in different countries, e.g. en. So, merging together pt and pt-BR will then just include the Brazilian words in the localization for Portugal. E.g. highway=bus_stop will be named both "Paragem de autocarro" (correct) and "Ponto de ônibus" (but that's Brazilian Portuguese).

In the end, whether to fall back or even merge (in)to another locale should remain a client-side decision. I.e. if you decide that merging en-US into en-GB may not be the cleanest solution, but nevertheless it improves things (may want to consult with British users though), then it is a decision you should make for iD and not for any user of this preset data.
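To make the "fall back" option concrete, here is a minimal sketch of a client-side fallback chain. The function names, the `translations` layout and the preset key `highway/bus_stop` are assumptions made for illustration only; this is not code from iD, osmfeatures or the schema builder.

```python
def fallback_chain(locale: str, default: str = "en") -> list:
    """Build the lookup order for a locale, e.g. 'pt-BR' -> ['pt-BR', 'pt', 'en']."""
    chain = [locale]
    # Strip one subtag at a time: 'pt-BR' -> 'pt'.
    while "-" in chain[-1]:
        chain.append(chain[-1].rsplit("-", 1)[0])
    if default not in chain:
        chain.append(default)
    return chain


def localized_name(preset_id: str, translations: dict, locale: str):
    """Return the first name found along the fallback chain, or None."""
    for code in fallback_chain(locale):
        name = translations.get(code, {}).get(preset_id, {}).get("name")
        if name:
            return name
    return None


# Hypothetical data mirroring the layout described above: there is no pt-PT file.
translations = {
    "pt": {"highway/bus_stop": {"name": "Paragem de autocarro"}},
    "pt-BR": {"highway/bus_stop": {"name": "Ponto de ônibus"}},
}

name_pt_pt = localized_name("highway/bus_stop", translations, "pt-PT")
name_pt_br = localized_name("highway/bus_stop", translations, "pt-BR")
```

With this layout a pt-PT user falls back to pt, while a pt-BR user never sees the Portugal wording. Whether to merge instead of falling back stays a choice of the client.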

@matkoniecz
Contributor Author

matkoniecz commented May 24, 2022

My premise is that if a term is used for something in one dialect of English, then it will be a useful alias in any other dialect of English.

I am aware that en is actually en-US.

Note that it would be only an alias: not something shown as a label, and it would matter only when someone typed this term on their own.

Let's take a fictional example and say that in en-AU the name for a parking lot is "foobar". Is it useful to show the parking lot when a user searches for "foobar" while using en-GB?

How likely is it that (1) the alias would also be used in other dialects, maybe less commonly, or (2) someone would be mixing multiple dialects and using terms from different ones at once (especially common with people learning English as a foreign language)?

How likely is it that an alias would result in actively misleading/confusing/unwanted matches? Like #237?

Summoning @ZeLonewolf (hope that it is OK) as I am NOT a native speaker of English.

E.g. highway=bus_stop will both be named "Paragem de autocarro" (correct) and "Ponto de ônibus" (but that's Brazilian Portuguese).

I am not proposing that. I am proposing that it would be named "Paragem de autocarro", but would also be findable if someone typed "Ponto de ônibus" (or "Ponto" or "ônibus"), as "Ponto de ônibus" would be listed as an alias.

@westnordost
Contributor

westnordost commented May 24, 2022

So, your idea is to (taking en and en-GB, en-AU, ... as an example):

  • append en+en-AU,... name of preset to en-GB aliases (except duplicates)
  • append en+en-AU,... aliases of preset to en-GB aliases (except duplicates)
  • append en+en-AU,... terms of preset to en-GB terms (except duplicates)

And then probably the same the other way round, i.e. merge en-GB into en(-US) etc.
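The three steps above could be sketched roughly as follows. This is only an illustration under assumptions: the function name `merge_dialect` and the per-preset structure (a `name` string plus `aliases` and `terms` lists) are hypothetical, and the example terms are invented.

```python
def merge_dialect(target: dict, other: dict) -> dict:
    """Merge another dialect's strings for one preset into the target dialect.

    The other dialect's name and aliases become aliases of the target,
    its terms are appended to the target's terms, and duplicates are skipped.
    The displayed name of the target is left unchanged.
    """
    def unique(items, exclude=()):
        seen = {item.lower() for item in exclude}
        result = []
        for item in items:
            if item.lower() not in seen:
                seen.add(item.lower())
                result.append(item)
        return result

    own_names = [target["name"]] if target.get("name") else []
    other_names = [other["name"]] if other.get("name") else []
    merged = dict(target)
    merged["aliases"] = unique(
        target.get("aliases", []) + other_names + other.get("aliases", []),
        exclude=own_names,  # never list the target's own name as an alias
    )
    merged["terms"] = unique(target.get("terms", []) + other.get("terms", []))
    return merged


# Illustrative values only; the terms are not taken from the real files.
en_gb = {"name": "Car Park", "terms": ["parking"]}
en = {"name": "Parking Lot", "terms": ["parking", "car lot"]}

merged_gb = merge_dialect(en_gb, en)  # en(-US) merged into en-GB
merged_us = merge_dialect(en, en_gb)  # and the other way round
```

In both directions the label shown to the user stays the locale's own name; only the set of strings that can match a search grows.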

How likely is it that an alias would result in actively misleading/confusing/unwanted matches? Like #237?

I don't know, but this is the reason why I argued that it should be a client-side decision.

Also, note that aliases are not implemented in iD yet.

@tyrasd
Member

tyrasd commented May 24, 2022

I would propose to only do the following:

  • append en name of preset to en-* terms (except duplicates)
  • append en aliases of preset to en-* terms (except duplicates)
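A minimal sketch of what such a build step might look like, assuming translation data shaped like the en-GB.json excerpts (preset id mapped to `name` and a comma-separated `terms` string). The function name is hypothetical, and `aliases` is modelled as a plain list purely for the sketch; this is not the actual schema-builder code.

```python
def add_default_english_terms(en: dict, dialect: dict) -> dict:
    """Return a copy of an en-* dialect where each preset's terms also
    contain the en name and en aliases of that preset (duplicates skipped)."""
    result = {}
    for preset_id, strings in dialect.items():
        default = en.get(preset_id, {})
        terms = [t.strip() for t in strings.get("terms", "").split(",") if t.strip()]
        candidates = ([default["name"]] if "name" in default else [])
        candidates += list(default.get("aliases", []))
        # The dialect's own name already matches a search, so skip it too.
        known = {t.lower() for t in terms} | {strings.get("name", "").lower()}
        for candidate in candidates:
            term = candidate.strip().lower()
            if term and term not in known:
                terms.append(term)
                known.add(term)
        merged = dict(strings)
        if terms:
            merged["terms"] = ",".join(terms)
        result[preset_id] = merged
    return result


# Hypothetical excerpt: en-GB translated the name but has no terms of its own.
en = {"amenity/parking": {"name": "Parking Lot"}}
en_gb = {"amenity/parking": {"name": "Car Park"}}

out = add_default_english_terms(en, en_gb)
```

Here `out["amenity/parking"]` keeps the name "Car Park" and gains the term "parking lot", so every consumer of the built files would find the preset without implementing its own merging.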

@ZeLonewolf
Contributor

I don't really have an opinion on how the language dialects are structured. There are certainly examples of words that mean one thing in en_GB and something else in en_US. For example, en_GB "chips" means en_US "french fries" and en_US "chips" means en_GB "crisps". I'm not sure how many of these cases would apply to OSM features, however...

@westnordost
Contributor

westnordost commented May 24, 2022

pavement comes to mind. That's the usually paved walk for pedestrians at the side of the street in British English. And in American English, it is the main part of the street that has pavement, i.e. often everything except the sidewalk.

@tyrasd
Member

tyrasd commented May 24, 2022

If that were an issue in practice, we would need to rethink how we handle preset terms in English: Currently, all English dialects share the same list of search terms (see readme). Yes, this will result in a search for pavement showing both the preset for highway=sidewalk and the one for area:highway=* [1]. But users would always get shown the corresponding (more "precise") preset name (and preset icon) in the results list, letting them choose what they actually intended to search for.

IMHO that is fine and working as intended. But maybe I'm overlooking something that could be problematic?

Footnotes

  1. Luckily, in this example the presets only apply to different geometry types, so the theoretically possible situation that the search shows both results does not actually happen here.

@westnordost
Contributor

Hm well, I just want to point to the other possible solution, which would also solve the use case as described: #461 (comment)
This would be a feature in iD of course, not in this schema.

@tyrasd
Member

tyrasd commented May 24, 2022

match against the tag value.

iD already does support this (see openstreetmap/iD#8869 (comment)).

While this helps in some cases (e.g. when searching for parking), in this particular case (searching for parking lot) it doesn't do the trick.
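The difference between the two queries can be shown with a small sketch. The `matches` function and the preset structure are assumptions for illustration; this is not iD's actual search implementation.

```python
def matches(preset: dict, query: str) -> bool:
    """Return True if the query matches the localized name or equals a tag value."""
    q = query.strip().lower()
    # Localized name, e.g. "Car Park" in en-GB.
    if q in preset["name"].lower():
        return True
    # Tag values are locale-independent, e.g. "parking" or "bus_stop".
    return any(q == value.lower().replace("_", " ") for value in preset["tags"].values())


# The en-GB view of the preset: the name is translated, the tag is not.
car_park = {"name": "Car Park", "tags": {"amenity": "parking"}}

found_by_tag_value = matches(car_park, "parking")
found_by_us_name = matches(car_park, "parking lot")
```

"parking" is found through the tag value, but "parking lot" is neither the en-GB name nor a tag value, so tag matching alone does not cover it.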

@bgo-eiu

bgo-eiu commented May 27, 2022

I agree with using overlapping terms/aliases for all other English dialects if you have one locale selected. Not everyone I know even speaks the same dialect of English so sometimes I forget which ones might be locale specific.

The tag names themselves are also already a mix of dialects.

@ZeLonewolf
Contributor

pavement comes to mind. That's the usually paved walk for pedestrians at the side of the street in British English. And in American English, it is the main part of the street that has pavement, i.e. often everything except the sidewalk.

In en_US, pavement refers to any paved area. The street has pavement, the sidewalk has pavement, etc.

@1ec5
Contributor

1ec5 commented May 28, 2022

Parking lots are pavement too. 😉

If either the build scripts or individual clients like iD mix terms from different dialects, these additional terms should be weighted much less than terms from the current dialect.
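One way such weighting could work is sketched below. The function names, the preset ids and the weight of 0.25 are arbitrary assumptions chosen for illustration, not values used by any existing client.

```python
def score(query: str, own_terms: list, other_dialect_terms: list,
          other_weight: float = 0.25) -> float:
    """Score a query: full weight for the current dialect, reduced weight otherwise."""
    q = query.strip().lower()
    if q in {t.lower() for t in own_terms}:
        return 1.0
    if q in {t.lower() for t in other_dialect_terms}:
        return other_weight
    return 0.0


def rank(query: str, presets: list) -> list:
    """Return ids of matching presets, best match first."""
    scored = [(score(query, p["own_terms"], p["other_dialect_terms"]), p["id"])
              for p in presets]
    return [pid for s, pid in sorted(scored, key=lambda pair: -pair[0]) if s > 0]


# Hypothetical en-GB view: "pavement" is a native term for the sidewalk preset
# and only a term borrowed from another dialect for the road surface preset.
presets = [
    {"id": "road_surface", "own_terms": ["carriageway"], "other_dialect_terms": ["pavement"]},
    {"id": "sidewalk", "own_terms": ["pavement"], "other_dialect_terms": ["sidewalk"]},
]

ranking = rank("pavement", presets)
```

Both presets remain findable, but the one matching in the user's own dialect is listed first.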

Also, there would already be use cases for language-specific tweaks to the preset search algorithm: openstreetmap/iD#8242 (comment). Maybe any change related to this request could be scoped to just English for now, where the impact would be better understood.

@1ec5
Contributor

1ec5 commented May 31, 2022

Currently, all English dialects share the same list of search terms (see readme).

If this is the case, are the terms in the non-American English localizations ignored when using those locales? Or do you mean that the non-American English localizations don’t have many terms translated yet?

"shop/cheese": {
    "name": "Cheese Shop",
    "terms": "cheesemonger"
},
"amenity/vehicle_inspection": {
    "terms": "car inspection,mot test,mot centre"
},
"shop/convenience": {
    "name": "Convenience Store / Dairy",
    "terms": "dairy,superette,convenience store,convenience shop,corner store,corner shop"
},
#475 is adding quite a few non-American English words for things, but if that’s already handled through the Transifex workflow, then that would save @westnordost some work.

7 participants