Upload Wizard cannot find 和泉 (杉並区) #5794
Comments
As far as I understand, that is expected, as the Commons app does not search Wikipedia. See apps-android-commons/app/src/main/java/fr/free/nrw/commons/upload/depicts/DepictsInterface.kt, lines 21 to 28 in 190135d; it only fetches 25 elements (line 24 in 190135d).
In your case, it searches for this which does not include
Unfortunately I cannot read Japanese script, so I cannot tell whether this specific case would be OK, but if it describes the city by one of its alternative names, it should be. More guidance may be found at: https://www.wikidata.org/wiki/Help:Aliases
However, mass-importing data that has not been verified by a human is not likely to be OK (and should definitely be discussed with the Wikidata admins first, even if it sounds like a good idea). See https://www.wikidata.org/wiki/Wikidata:Data_Import_Guide for general considerations.
That is correct. For any popular name with more than 25 matches, if the specific string you search for does not occur in the top 25, you won't find a match 😢
However, as that API supports paginated search, this could be supported in a way similar to the idea proposed for category search in #3179 (comment), in the second bullet point, i.e. add
(etc. You get the idea, but the match from this specific issue would already be found.) That way, you'd be able to find your popular search term in all cases. Alternatives to
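As a rough illustration of the paging idea above, here is a minimal Python sketch (not the app's Kotlin code). It assumes the Wikidata `wbsearchentities` action with its `limit` and `continue` parameters and the `search-continue` field in the response. `fetch_page` and `is_match` are hypothetical callables supplied by the caller; nothing here is taken from the Commons app itself.

```python
from typing import Callable, Optional

PAGE_SIZE = 25  # the page size mentioned in this thread


def build_params(term: str, language: str, offset: int) -> dict:
    """Query parameters for one wbsearchentities page starting at `offset`."""
    return {
        "action": "wbsearchentities",
        "format": "json",
        "type": "item",
        "search": term,
        "language": language,
        "limit": PAGE_SIZE,
        "continue": offset,
    }


def search_until_match(
    term: str,
    language: str,
    fetch_page: Callable[[dict], dict],  # hypothetical: performs the HTTP call
    is_match: Callable[[dict], bool],    # hypothetical: decides if an item fits
    max_pages: int = 4,
) -> Optional[dict]:
    """Fetch successive pages until `is_match` accepts an item or pages run out."""
    offset = 0
    for _ in range(max_pages):
        response = fetch_page(build_params(term, language, offset))
        for item in response.get("search", []):
            if is_match(item):
                return item
        next_offset = response.get("search-continue")
        if next_offset is None:
            return None  # the API reported no further pages
        offset = next_offset
    return None
```

In an interactive picker the same offset logic would more likely sit behind a "load more" action than an automatic loop; the cap on `max_pages` is only there to bound the number of requests.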
@mnalis Thanks for the link https://www.wikidata.org/wiki/Help:Aliases ! An English equivalent could be Paris, Texas: https://www.wikidata.org/wiki/Q830149
If you are talking about the "we could batch-add article titles as aliases" idea, it looks like that is prohibited by step 1 of the import guide that I linked to. If, however, you are talking about fixing this one specific example only, it would be best if you asked about it (I don't even read the script, much less can I translate it or weigh its nuances).
Perhaps, if we use some third API to search Wikipedia articles by exact title. But note that it would likely rarely help, as Wikipedia titles are finicky, and IIRC the user would somehow have to specify in advance which Wikipedia language to search. E.g. I don't think that searching for "Thành phố Hồ Chí Minh" in the titles of English Wikipedia would work, and searching for "Ho Chi Minh" on English Wikipedia won't work either if you only match on exact title, as the article is named "Ho Chi Minh City". If you go after partial title matches, there will be many more than 0 or 1 results (there are likely many articles starting with "Ho"), and you'd still need to do paging (which is more complex when you need to page two different APIs at the same time!).
Given that just paging on your original query would've solved this issue, I think that should be the first step (you'd likely have to implement it for the more complex solutions too).
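For reference, one way the exact-title lookup discussed above could look, as a hedged Python sketch. It assumes the Wikidata `wbgetentities` action with its `sites` and `titles` parameters, and it assumes the response keys found entities by QID while a missing title appears under a non-QID key such as `-1`. The helper names are made up for illustration, and the caller still has to choose the wiki (here `jawiki`) in advance, which is the limitation noted above.

```python
from typing import Optional
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def title_lookup_url(title: str, site: str) -> str:
    """URL asking Wikidata for the item linked to `title` on wiki `site`."""
    params = {
        "action": "wbgetentities",
        "format": "json",
        "sites": site,      # e.g. "jawiki" for Japanese Wikipedia
        "titles": title,    # must be the exact article title
        "props": "labels",  # keep the response small
    }
    return f"{WIKIDATA_API}?{urlencode(params)}"


def extract_qid(response: dict) -> Optional[str]:
    """Return the first QID in the response, or None if the title was not found."""
    for entity_id in response.get("entities", {}):
        if entity_id.startswith("Q"):
            return entity_id
    return None  # assumed shape: missing titles are keyed by "-1", not a QID
```

Because this only resolves exact titles, it would complement paging on the original query rather than replace it.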
In the Wikidata website's search results, the town is in 3rd place for 杉並 和泉 (or Suginami[space]Izumi). So just adding the right terms for disambiguation seems to help, and that's what I would do manually if I didn't find it with just 和泉 (Izumi).
More broadly, perhaps we could filter and rerank the raw search results to prominently show items that are more likely to be the target of a depiction. In this case, the same term (in written form) 和泉 can also refer to family names, but names are unlikely to be depicted in a photo. Izumi (Q13495859) has a location, so we could theoretically check how close it is to the user's location and use that to boost it in the reranking.
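The proximity idea above could be sketched roughly as follows, in Python for brevity. This is an illustration under assumptions, not a proposal for the app's actual data model: each candidate is assumed to be a dict with an optional `coord` entry holding `(latitude, longitude)` in degrees, and `rerank_by_proximity` is a hypothetical helper name.

```python
import math
from typing import Optional, Tuple

Coord = Tuple[float, float]  # (latitude, longitude) in degrees


def haversine_km(a: Coord, b: Coord) -> float:
    """Great-circle distance between two points, in kilometres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def rerank_by_proximity(candidates: list, user_location: Coord) -> list:
    """Sort candidates by distance to the user; items without coordinates go last."""
    def distance(candidate: dict) -> float:
        coord: Optional[Coord] = candidate.get("coord")
        return haversine_km(coord, user_location) if coord else math.inf

    # sorted() is stable, so items at equal distance keep their search order.
    return sorted(candidates, key=distance)
```

Pushing items without coordinates (such as family names) to the end is one possible policy; a gentler variant would only boost nearby items instead of fully reordering the list.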
Great finding! We should use the same search API URL as the desktop website, as it gets results where we get nothing. This sounds much easier to implement than the solutions we had considered above. The proximity idea is great for a subsequent phase.
Search results from the desktop website: [image]
Surprisingly, the mobile website's search is not good: [image]
I took a picture in the 和泉 neighborhood:
https://ja.wikipedia.org/wiki/%E5%92%8C%E6%B3%89_%28%E6%9D%89%E4%B8%A6%E5%8C%BA%29
Because there are many more famous towns and people with the same name, it does not make it to the suggestions:
No surprise so far.
But when I type the exact full Wikipedia article name, 和泉 (杉並区), I get nothing:
To select the correct depiction, the user has to navigate to the Wikidata item (https://m.wikidata.org/wiki/Q13495859), which is a pain to do on mobile, copy its QID, then paste it into our app's depiction search textbox:
It is not a Japanese-specific issue, it can happen for any language.
Maybe the Wikidata search API has an option to also match via article titles?
If not, this will be difficult to implement: we might have to call an additional, different API to get potential Wikidata items via Wikipedia article titles. Or we could batch-add article titles as aliases, if that is OK from an editorial point of view.