Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Broken OSM city shape #358

Open
c0c0n3 opened this issue Sep 11, 2020 · 11 comments
Open

Broken OSM city shape #358

c0c0n3 opened this issue Sep 11, 2020 · 11 comments

Comments

@c0c0n3
Copy link
Member

c0c0n3 commented Sep 11, 2020

Describe the bug

Geocoding of city locations is currently broken.

To Reproduce

Run test_entity_add_city_shape in the geocoding.tests.test_geocoding module. If you debug it, you'll see that the call to OSM

info = geocoder.osm(key, maxRows=10)

returns an info object with a current_result.osm_type == 'node' whereas we expect current_result.osm_type == 'relation'.

Expected behavior

QuantumLeap should extract the city polygon from OSM's result, create a location attribute for it, and add the location to the entity being processed in the notification.

Environment

Additional context

PR #357 disabled test_entity_add_city_shape to get a clean build. We'll keep it disabled until we fix this issue, then re-enable it after the fix.

@c0c0n3 c0c0n3 self-assigned this Sep 11, 2020
@chicco785
Copy link
Contributor

this rather a change in the database of osm, if you try for example Milan:
https://nominatim.openstreetmap.org/search?q=Milano&format=jsonv2&addressdetails=1&limit=10

@c0c0n3
Copy link
Member Author

c0c0n3 commented Sep 11, 2020

either way, it's broken :-)
perhaps we shouldn't consider the osm_type since it isn't part of the API. I'm not even sure what their API actually is, so I'd have to dig deeper before fixing this. But to me, operation inputs and outputs are an integral part of an API. So we should see if API outputs are clearly documented somewhere, then stick to that...

@github-actions
Copy link
Contributor

Stale issue message

@Ravisaketi
Copy link
Contributor

Hi @chicco785 @c0c0n3 we are reproducing this issue in our environment,on debugging we found that for the first iteration in

if i.raw['osm_type'] == 'relation':
osm_type = 'node' but in the second iteration we got osm_type = 'relation'.
We are assuming that this issue is for the first iteration only.

Please confirm our understanding.

@c0c0n3
Copy link
Member Author

c0c0n3 commented May 6, 2022

Hi @Necravisaketi!

Thanks for helping with this :-)

Not quite sure what you mean by "first" and "second" iteration. Two runs of the same test case? If so which one? Or do you mean something else?

On another note, I must admit I'm just as clueless as you are. I don't actually know the details of the OSM API and if the meaning of 'osm_type' has changed over time, it's related to data changes as @chicco785 pointed out, or something else entirely. The first step here would be to get familiar with the API, dig into the docs and possibly ask the question on an OSM forum?

@Ravisaketi
Copy link
Contributor

Hi @c0c0n3 on debugging we found that for the first iteration in


if i.raw['osm_type'] == 'relation':

the output for the first iteration of i.raw['osm_type'] is 'node'
if i.raw['osm_type'] == 'relation':
(Pdb) pp i.raw['osm_type']
'node'

the output for the second iteration of i.raw['osm_type'] is 'relation'
if i.raw['osm_type'] == 'relation':
(Pdb) pp i.raw['osm_type']
'relation'

@c0c0n3
Copy link
Member Author

c0c0n3 commented May 6, 2022

oh right, thanks so much for clarifying, I missed that. But like I said we should probably get hold of the API spec to see what the returned data structure is supposed to look like. Once we know that we can plan a fix. It's entirely possible we'll have to rewrite part of that logic. Also, can you copy-paste in here an example of the info dictionary you get?

@Ravisaketi
Copy link
Contributor

Ravisaketi commented May 9, 2022

Hi @c0c0n3 i got this list of locations in the info

[Location(Ciudad de México, 06060, México, (19.4326296, -99.1331785, 0.0)),
Location(Ciudad de México, México, (19.3207722, -99.15151506199419, 0.0)),
Location(Ciudad de México, Maradunas, Lomas de Barrillas, Coatzacoalcos, Veracruz, 96535, México, (18.1510321, -94.5181526, 0.0)),
Location(Pozoleria "La Troje", Iztacalco, Calle Plan de Ayala, Colonia Nueva Santa Anita, Iztacalco, Ciudad de México, 08210, México, (19.3997047, -99.1248161, 0.0)),
Location(CNDH Héctor Fix Zamudio, Ciudad de México, 1922, Boulevard Adolfo López Mateos, Colonia Los Alpes, Álvaro Obregón, Ciudad de México, 01049, México, (19.3575358, -99.1951691, 0.0))]

@c0c0n3
Copy link
Member Author

c0c0n3 commented May 13, 2022

Hi @Necravisaketi, thanks for this but I'm struggling to see how this list of locations relates to the info map? e.g. where's the osm_type key-value pair?

@Ravisaketi
Copy link
Contributor

Hi @c0c0n3
we got this in the first iteration i.raw where i is in info

{'boundingbox': ['19.2726296', '19.5926296', '-99.2931785', '-98.9731785'],
 'class': 'place',
 'display_name': 'Ciudad de México, 06060, México',
 'geojson': {'coordinates': [-99.1331785, 19.4326296], 'type': 'Point'},
 'icon': 'https://nominatim.openstreetmap.org/ui/mapicons//poi_place_city.p.20.png',
 'importance': 1.149923932441765,
 'lat': '19.4326296',
 'licence': 'Data © OpenStreetMap contributors, ODbL 1.0. '
            'https://osm.org/copyright',
 'lon': '-99.1331785',
 'osm_id': 62270270,
 'osm_type': 'node',
 'place_id': 286135844,
 'type': 'city'}

in the second iteration i.raw is

{'boundingbox': ['19.0487187', '19.5927572', '-99.3649242', '-98.9403028'],
 'class': 'boundary',
 'display_name': 'Ciudad de México, México',
 'geojson': {'coordinates': [[[-99.3649242, 19.2777693],
                              [-99.3619219, 19.2757419],
                              [-99.3561284, 19.2749894],
                              [-99.3482137, 19.2739609],
                              .
                              . 
                              .
                              .
                              .
                              etc
'icon': 'https://nominatim.openstreetmap.org/ui/mapicons//poi_boundary_administrative.p.20.png',
 'importance': 1.0538840457137142,
 'lat': '19.3207722',
 'licence': 'Data © OpenStreetMap contributors, ODbL 1.0. '
            'https://osm.org/copyright',
 'lon': '-99.15151506199419',
 'osm_id': 1376330,
 'osm_type': 'relation',
 'place_id': 285199565,
 'type': 'administrative'}

@c0c0n3
Copy link
Member Author

c0c0n3 commented May 20, 2022

Hi @Necravisaketi thank you so much for helping out with this, much appreciated!

To sum up, there's at least two JSON objects in the info array returned by OSM:

info = [
    { 'osm_id': 62270270, 'osm_type': 'node', ...},
    { 'osm_id': 1376330, 'osm_type': 'relation', ...}
]

Thanks again for that info, but I reckon to get to the bottom of this, we've got to dig a bit more. Here's some pseudo-code to capture the logic of the flow we're interested in.

add_locations: [e1, e2, ...]
    return [add_location(e1), add_location(e2), ...]

add_location(e)
    key, osm_type = get_address_key_and_type(e)
    info = asyncio.run(geocode(key))

    if osm_type == TYPE_POINT:
        loc = _extract_point(info)
    elif osm_type == TYPE_WAY:
        for i in info:  # street
            if i.raw['osm_type'] == 'way':
                loc = i.raw['geojson']
                break
    elif osm_type == TYPE_RELATION:
        for i in info:  # state or country
            if i.raw['osm_type'] == 'relation':
                loc = i.raw['geojson']
                break

    _do_add_location(e, loc)

To understand what's happening, we need to know what are the input entities, i.e. the content of the array passed to add_locations and for each entity in there what's the corresponding key and osm_type the get_address_key_and_type call returns.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants