
Introduce writeMeta and use for Nodes, Relations and Ways #83

Merged
merged 10 commits into from
Jan 13, 2025

Conversation

lehmann-4178656ch
Member

writeMeta adds facts about changeset, timestamp, user, version and visible. If one of these values is not set, a default is written instead, depending on the key: 0, 1970-01-01 00:00:00, or "".

@lehmann-4178656ch
Member Author

Partial example output for Liechtenstein:

...
osmnode:26863444 rdf:type osm:node .
osmnode:26863444 osmmeta:changeset "141388466"^^xsd:integer .
osmnode:26863444 osmmeta:timestamp "2023-09-17T16:00:53"^^xsd:dateTime .
osmnode:26863444 osmmeta:user "tg4567" .
osmnode:26863444 osmmeta:version "7"^^xsd:integer .
osmnode:26863444 osmmeta:visible "yes" .
osmnode:26863444 osmkey:wikipedia "en:Kuhgrat" .
osmnode:26863444 osm2rdfkey:wikipedia <https://en.wikipedia.org/wiki/Kuhgrat> .
osmnode:26863444 osmkey:wikimedia_commons "Category:Kuegrat" .
osmnode:26863444 osmkey:wikidata "Q4244296" .
osmnode:26863444 osm2rdfkey:wikidata wd:Q4244296 .
osmnode:26863444 osmkey:natural "peak" .
osmnode:26863444 osmkey:name "Kuhgrat" .
osmnode:26863444 osmkey:ele "2122" .
osmnode:26863444 osm2rdf:facts "8"^^xsd:integer .
osmnode:26863444 geo:hasGeometry osm2rdfgeom:osm_node_26863444 .
osm2rdfgeom:osm_node_26863444 geo:asWKT "POINT(9.5608307 47.1666716)"^^geo:wktLiteral .
osmnode:26863444 osm2rdfgeom:convex_hull "POLYGON((9.5608307 47.1666716,9.5608307 47.1666716,9.5608307 47.1666716,9.5608307 47.1666716,9.5608307 47.1666716))"^^geo:wktLiteral .
osmnode:26863444 osm2rdfgeom:envelope "POLYGON((9.5608307 47.1666716,9.5608307 47.1666716,9.5608307 47.1666716,9.5608307 47.1666716,9.5608307 47.1666716))"^^geo:wktLiteral .
osmnode:26863444 osm2rdfgeom:obb "POLYGON((9.5608307 47.1666716,9.5608307 47.1666716,9.5608307 47.1666716,9.5608307 47.1666716,9.5608307 47.1666716))"^^geo:wktLiteral .
osmnode:30122806 rdf:type osm:node .
osmnode:30122806 osmmeta:changeset "145813790"^^xsd:integer .
osmnode:30122806 osmmeta:timestamp "2024-01-02T20:31:52"^^xsd:dateTime .
osmnode:30122806 osmmeta:user "tf66" .
osmnode:30122806 osmmeta:version "11"^^xsd:integer .
osmnode:30122806 osmmeta:visible "yes" .
osmnode:30122806 osmkey:ref "7" .
osmnode:30122806 osmkey:name "Haag" .
...

@patrickbr
Member

patrickbr commented Jan 8, 2025

I have merged the latest master and made the following changes:

  • Use generateLiteralUnsafe() for generating the changeset and version literals
  • Do not write the user and changeset triples if they are empty
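The second change can be sketched as follows. This is a minimal, self-contained illustration and not the code from this PR: ObjectMeta and metaTriples are hypothetical names, triples are built as plain strings instead of going through the osm2rdf writer, and treating changeset 0 as "empty" is an assumption.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for the metadata carried by an OSM object.
struct ObjectMeta {
  std::uint64_t changeset = 0;  // assumption: 0 means "not set"
  std::string user;             // empty means "not set"
  std::uint32_t version = 0;
};

// Returns the metadata triples for `subj` as Turtle lines. The changeset
// and user triples are dropped entirely when their values are not set.
// Note: the user name is not escaped here; a real writer has to do that.
std::vector<std::string> metaTriples(const std::string& subj,
                                     const ObjectMeta& meta) {
  std::vector<std::string> out;
  if (meta.changeset != 0) {
    out.push_back(subj + " osmmeta:changeset \"" +
                  std::to_string(meta.changeset) + "\"^^xsd:integer .");
  }
  if (!meta.user.empty()) {
    out.push_back(subj + " osmmeta:user \"" + meta.user + "\" .");
  }
  out.push_back(subj + " osmmeta:version \"" +
                std::to_string(meta.version) + "\"^^xsd:integer .");
  return out;
}
```

With this shape, an object that has only a version yields a single triple, rather than additional triples with placeholder values such as 0 or "".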

osm2rdf::ttl::constants::IRI__OSMMETA_VERSION =
    generateIRI(osm2rdf::ttl::constants::NAMESPACE__OSM_META, "version");
osm2rdf::ttl::constants::IRI__OSMMETA_VISIBLE =
    generateIRI(osm2rdf::ttl::constants::NAMESPACE__OSM_META, "visible");

visible is guaranteed to be either true or false in the OSM XML format, so the corresponding triples could have Boolean values instead of strings. That said, visible will always be true in any normal extract of OSM or OHM data unless you specifically load an extract that contains full history.

Member

I agree, I changed it to boolean values. Also, the visibility triples are now only written if visibility is false.
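A minimal sketch of that rule, under stated assumptions: appendVisibleTriple is a hypothetical helper, the triple is built as a plain string, and the exact literal form ("false"^^xsd:boolean) is an assumption about the output rather than something confirmed in this thread.

```cpp
#include <string>
#include <vector>

// Appends a visibility triple for `subj` only when the object is not
// visible. Visible objects (the normal case in non-history extracts)
// produce no triple at all.
void appendVisibleTriple(std::vector<std::string>& out,
                         const std::string& subj, bool visible) {
  if (visible) {
    return;  // default case: nothing is written
  }
  // Assumed literal form: a typed xsd:boolean instead of the string "no".
  out.push_back(subj + " osmmeta:visible \"false\"^^xsd:boolean .");
}
```

Writing only the exceptional value keeps the output smaller for regular extracts, where every object is visible.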

…ll normal datasets), use boolean value for visible and also for hasCompleteGeometry triples
@patrickbr
Member

This PR now also adds the option --no-osm-metadata, which completely drops timestamp, user, version, changeset, and visibility information from the output.
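How such a switch might gate the metadata output can be sketched as below. Only the flag name --no-osm-metadata comes from this PR; Config, addOsmMetadata and parseArgs are hypothetical names, and the actual option handling in osm2rdf may look quite different.

```cpp
#include <string>
#include <vector>

// Hypothetical configuration holder; metadata is written by default.
struct Config {
  bool addOsmMetadata = true;
};

// Scans the command-line arguments and disables metadata output when
// --no-osm-metadata is present. All other arguments are ignored here.
Config parseArgs(const std::vector<std::string>& args) {
  Config cfg;
  for (const std::string& arg : args) {
    if (arg == "--no-osm-metadata") {
      cfg.addOsmMetadata = false;
    }
  }
  return cfg;
}
```

A writer would then check cfg.addOsmMetadata once per object and skip the timestamp, user, version, changeset, and visibility triples when it is false.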

@hannahbast
Member

At https://qlever.cs.uni-freiburg.de/osm-test you can find a SPARQL endpoint for all objects in the bounding box of Germany (extracted from planet-241230.osm.pbf via osmium-extract, and converted to TTL using osm2rdf with the new PR).

For an example query, see https://qlever.cs.uni-freiburg.de/osm-test/4gtzjU . Note that every object has a version and a changeset, but a few objects have no timestamp or user, hence the two OPTIONALs in the query.

@1ec5 Does this now contain all the information you would expect or is there something important still missing?


@1ec5 1ec5 left a comment


Yes, I think this is what I’d intuitively expect when metadata happens to be available in the source dataset. The visibility flag has me wondering what it would look like to support historical queries in the future, but you’d probably like to save that idea for some other day. 😉

// avoid writing empty users, drop entire triple
if (!object.user().empty()) {
  _writer->writeTriple(subj, IRI__OSMMETA_USER,
                       _writer->generateLiteral(object.user(), ""));

@1ec5 1ec5 Jan 9, 2025


Just for your awareness, one common attribute hasn’t been implemented yet: uid is the user’s numeric identifier. Looking at the alternative query engines, OverpassQL has a uid() operator, while Sophox doesn’t expose this information at all. Wikidata has a lightly used OpenStreetMap numeric user ID (P8754) property (only for otherwise notable users), which is expected to have a website username or ID (P554) qualifier.

A UID has somewhat different privacy implications than a user name. A user can change their user name at any time, but the UID remains constant. If the user deletes their account, the UID remains on any element they edited until someone else edits that element.

User names and UIDs are primarily used for quality assurance, or for mappers to track their own contributions somewhat crudely. I suspect people use user names more frequently than UIDs when crafting Overpass API queries by hand, since UIDs are somewhat hidden on the main OSM website. However, anything that needs to be stable despite user renaming, such as countervandalism tools, would probably use UIDs instead.

Member

The UID is now also written (predicate osmmeta:uid).

Also, I added proper prefixes for OSM users (https://www.openstreetmap.org/user/) and changesets (https://www.openstreetmap.org/changeset/).

Member

> Also, I added proper prefixes for OSM users (https://www.openstreetmap.org/user/)

Reverting this to simply using strings for user names again because of some strange UTF-8 encoding issues when creating the user IRIs for flamboyant user names.
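The thread does not say what exactly went wrong, so the following is only a generic illustration of one step that building an IRI from an arbitrary user name usually involves: percent-encoding every byte outside the RFC 3986 "unreserved" set. percentEncode is a hypothetical helper and not a function from osm2rdf.

```cpp
#include <string>

// Percent-encodes every byte of `s` that is not an RFC 3986 "unreserved"
// character (ALPHA / DIGIT / "-" / "." / "_" / "~"). Multi-byte UTF-8
// sequences are encoded byte by byte, which is what IRIs mapped to URIs
// expect.
std::string percentEncode(const std::string& s) {
  static const char* kHex = "0123456789ABCDEF";
  std::string out;
  for (unsigned char c : s) {
    const bool unreserved = (c >= 'A' && c <= 'Z') ||
                            (c >= 'a' && c <= 'z') ||
                            (c >= '0' && c <= '9') || c == '-' ||
                            c == '.' || c == '_' || c == '~';
    if (unreserved) {
      out += static_cast<char>(c);
    } else {
      out += '%';
      out += kHex[c >> 4];
      out += kHex[c & 0x0F];
    }
  }
  return out;
}
```

For example, a user name containing a space or a non-ASCII letter would be appended to https://www.openstreetmap.org/user/ only after passing through such a function. Keeping user names as plain string literals, as done here, sidesteps the problem.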


@1ec5 1ec5 Jan 10, 2025


Thank you. Out of curiosity, what kind of problematic sequences occurred in the user names that prevented UTF-8 encoding? That might be useful for the OSM Wiki to document for other data consumers too.

@patrickbr patrickbr merged commit 1df3cef into master Jan 13, 2025
8 checks passed
@patrickbr patrickbr deleted the object-metadata branch January 13, 2025 12:03