Posted in Gitter:
Hi. I encountered a very weird problem when using osm4scala that I cannot really explain :-(.
I have a PBF file that contains a node with id 5103977631:
# osmium getid input.osm.pbf n5103977631 -o /tmp/output.osm.pbf
[======================================================================] 100%
# osmium cat /tmp/output.osm.pbf -f osm
<?xml version='1.0' encoding='UTF-8'?>
<osm version="0.6" generator="osmium/1.13.1">
<bounds minlat="-90" minlon="-180" maxlat="90" maxlon="180"/>
<node id="5103977631" version="1" timestamp="2017-09-13T19:57:39Z" uid="74746" changeset="52018502" lat="26.1914693" lon="-81.689915"/>
</osm>
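For completeness, the node also shows up when iterating the file with the core (non-Spark) osm4scala API. A minimal sketch, assuming the EntityIterator interface from the osm4scala README; the path is the same one used below:

// Sketch: cross-check with the core (non-Spark) osm4scala iterator.
import java.io.FileInputStream
import com.acervera.osm4scala.EntityIterator.fromPbf
import com.acervera.osm4scala.model.NodeEntity

val pbfIS = new FileInputStream("/mnt/data/input.osm.pbf")
try {
  fromPbf(pbfIS)
    .collect { case n: NodeEntity if n.id == 5103977631L => n }
    .foreach(println) // prints the node if it exists at the entity level
} finally {
  pbfIS.close()
}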
When reading the same input PBF with osm4scala, the node is read just fine:
spark.read.format("osm.pbf")
  .load("/mnt/data/input.osm.pbf")
  .filter("type == 0")
  .select("id","type","latitude","longitude","nodes","relations","tags")
  .filter("id == 5103977631")
  .show()
+----------+----+-----------------+------------------+-----+---------+----+
| id|type| latitude| longitude|nodes|relations|tags|
+----------+----+-----------------+------------------+-----+---------+----+
|5103977631| 0|26.1914693 |-81.689915 | []| []| {}|
+----------+----+-----------------+------------------+-----+---------+----+
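For context, the schema the reader reports (in particular the nested "info" struct) can be inspected with the same reader before selecting anything:

// Print the schema of the frame to see how "info" is declared.
spark.read.format("osm.pbf")
  .load("/mnt/data/input.osm.pbf")
  .printSchema()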
However, when I add the "info" column to the select clause, I get this:
spark.read.format("osm.pbf")
  .load("/mnt/data/input.osm.pbf")
  .filter("type == 0")
  .select("id","type","latitude","longitude","nodes","relations","tags","info")
  .filter("id == 5103977631")
  .show()
+---+----+--------+---------+-----+---------+----+----+
| id|type|latitude|longitude|nodes|relations|tags|info|
+---+----+--------+---------+-----+---------+----+----+
+---+----+--------+---------+-----+---------+----+----+
=> suddenly, the node can no longer be found?
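One diagnostic that comes to mind (a sketch, based only on the guess that the column order of the pruned schema matters) is to keep the same columns but put "info" before "tags":

// Hypothetical diagnostic: same columns, but "info" before "tags".
// If the result changes, the pruned-schema column order is the trigger.
spark.read.format("osm.pbf")
  .load("/mnt/data/input.osm.pbf")
  .filter("type == 0")
  .select("id","type","latitude","longitude","nodes","relations","info","tags")
  .filter("id == 5103977631")
  .show()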
So you would assume something is wrong with the "info" column, right? Let's try removing the "tags" column while keeping the "info" column:
spark.read.format("osm.pbf")
  .load("/mnt/data/input.osm.pbf")
  .filter("type == 0")
  .select("id","type","latitude","longitude","nodes","relations","info")
  .filter("id == 5103977631")
  .show()
+----------+----+-----------------+------------------+-----+---------+--------------------+
| id|type| latitude| longitude|nodes|relations| info|
+----------+----+-----------------+------------------+-----+---------+--------------------+
|5103977631| 0|26.1914693 | -81.689915 | []| []|{1, 2017-09-13 19...|
+----------+----+-----------------+------------------+-----+---------+--------------------+
The node can be found again??? Huh! :-D
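Until the cause is clear, one hypothetical workaround is to cache the fully-loaded frame, so later projections read from the in-memory plan instead of pushing column pruning into the osm.pbf source:

// Hypothetical workaround: cache first, project afterwards.
val nodes = spark.read.format("osm.pbf")
  .load("/mnt/data/input.osm.pbf")
  .filter("type == 0")
  .cache()

nodes
  .select("id","type","latitude","longitude","nodes","relations","tags","info")
  .filter("id == 5103977631")
  .show()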
Some environment details:
- Azure Databricks (runtime 8.3)
- Spark 3.1.1
- Scala 2.12
- osm4scala com.acervera.osm4scala:osm4scala-spark3_2.12:1.0.8
Does anybody have an idea what I'm doing wrong?