Error raised for Spatial Data Service category keywords using CharacterString #419

Closed
AntoRot opened this issue Oct 14, 2020 · 26 comments
Labels
deployed in reference validator Solution deployed in production

@AntoRot
Contributor

AntoRot commented Oct 14, 2020

I tested a Network Service metadata record (i.e. RER-view.xml.zip) where the Spatial Data Service category keyword (coming from Part D 4 “Classification of Spatial Data Services” of Regulation 1205/2008) is given using the language-neutral value in the gmd:keyword/gco:CharacterString element, i.e.:

   <gmd:keyword>
      <gco:CharacterString>humanGeographicViewer</gco:CharacterString>
   </gmd:keyword>

An error is raised stating that the keyword should be given using the gmx:Anchor element.
This is the Test Report: https://inspire.ec.europa.eu/validator/v2/TestRuns/EID9c1a5f9b-847c-46e9-85ba-1f13a017afcb.html#EIDd80b9cf5-24c5-4db1-b101-eb8ce8b4fc38
This is the Assertion URI: https://inspire.ec.europa.eu/validator//v2/TestRuns/EID9c1a5f9b-847c-46e9-85ba-1f13a017afcb.html?lang=it#EIDd80b9cf5-24c5-4db1-b101-eb8ce8b4fc38

TG Requirement 3.4 doesn't mandate the use of the gmx:Anchor element for the Spatial Data Service category; its use is only recommended (see TG Recommendation 3.2).
Consequently, no error should be raised when gco:CharacterString is used, as in the metadata record tested.
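
For readers comparing the two encodings, here is a minimal sketch that reads the keyword text from either form using Python's standard library. It is not the validator's code: the function name `keyword_text` is hypothetical, and the `xlink:href` value in the Anchor sample is an assumed example URI.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Namespace URIs used by ISO 19139 metadata records.
NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
    "gmx": "http://www.isotc211.org/2005/gmx",
    "xlink": "http://www.w3.org/1999/xlink",
}

# Encoding used in the tested record (free text).
CHARACTER_STRING = """
<gmd:keyword xmlns:gmd="http://www.isotc211.org/2005/gmd"
             xmlns:gco="http://www.isotc211.org/2005/gco">
  <gco:CharacterString>humanGeographicViewer</gco:CharacterString>
</gmd:keyword>
"""

# Encoding expected by the failing test; the href is an assumed example.
ANCHOR = """
<gmd:keyword xmlns:gmd="http://www.isotc211.org/2005/gmd"
             xmlns:gmx="http://www.isotc211.org/2005/gmx"
             xmlns:xlink="http://www.w3.org/1999/xlink">
  <gmx:Anchor xlink:href="http://inspire.ec.europa.eu/metadata-codelist/SpatialDataServiceCategory/humanGeographicViewer">humanGeographicViewer</gmx:Anchor>
</gmd:keyword>
"""


def keyword_text(keyword_xml: str) -> Optional[str]:
    """Return the keyword text from either encoding, or None if absent."""
    root = ET.fromstring(keyword_xml.strip())
    for tag in ("gco:CharacterString", "gmx:Anchor"):
        element = root.find(tag, NS)
        if element is not None and element.text:
            return element.text.strip()
    return None
```

Both samples carry the same language-neutral value, which is the point of the report: only the wrapping element differs.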

@iuriemaxim

iuriemaxim commented Oct 15, 2020

@AntoRot I also had a look at the metadata file and the test results.

I think that the validator is not able to correctly indicate the error in the metadata file.

First of all, it does not trigger an error in relation to humanGeographicViewer encoded as free text, but it indicates:

image

There is no error indicating a problem with the humanGeographicViewer text, but the validator does indicate errors in relation to the SDS Category keywords.

I think that the problem related to humanGeographicViewer is that this value comes from a controlled vocabulary:

And therefore this requirement applies as well:

image

Unfortunately the originating controlled vocabulary (thesaurus name element) is missing for the humanGeographicViewer keyword.

I do not think that the validator will always be able to indicate to the user exactly where the error is, but the error message provided is quite strange.

Probably the validator needed to communicate that it could not determine the Spatial Data Service Category because the corresponding originating controlled vocabulary is not present in the metadata file. But most probably it can't trigger such an error, as there is no requirement stipulating which controlled vocabulary should be indicated:

image

I would be in favor of the first.

Another topic would be in relation to the value humanGeographicViewer that is used for a WMS in the provided metadata file.

I just thought that for WMS it is necessary to use infoMapAccessService.

Hope it helps.

@AntoRot
Contributor Author

AntoRot commented Oct 15, 2020

@iuriemaxim the error is indeed raised neither for the value used (i.e. humanGeographicViewer) nor because the thesaurus citation is missing, but for the use of gco:CharacterString instead of gmx:Anchor.
To understand the error, you can see the ATS for that metadata element, Spatial Data Service category, here: https://github.com/inspire-eu-validation/metadata/blob/2.0/sds/sds-category.md.

If the gmx:Anchor element is used in the metadata file (see the updated file RER-view_Anchor.xml.zip), that error is no longer raised, as you can see in the test report: https://inspire.ec.europa.eu/validator/v2/TestRuns/EID8ec54369-1291-45d0-80e9-c643c8aaefd4.html.

I think that the use of gco:CharacterString should be allowed and consequently no error should be raised.

@iuriemaxim

@AntoRot You are right, as I did not check what happens if humanGeographicViewer is encoded using gmx:Anchor in the metadata file provided.

Indeed the ATS is incorrect.

But shouldn't the validator also trigger an error that the thesaurus was not indicated for this value, which comes from a controlled vocabulary?

@carlospzurita
Contributor

Dear @AntoRot

Thank you for opening this issue and providing the test report and the resources tested. We are reviewing the Metadata TG for the usage of gco:CharacterString.

Also, we are checking the issue mentioned by @iuriemaxim on the thesaurus.

@AntoRot
Contributor Author

AntoRot commented Oct 15, 2020

Dear @carlospzurita

Thank you very much.

@iuriemaxim the validator shall check the metadata records only based on TG requirements.
Apart from the general TG Requirement C.15, no thesaurus is mandated to be cited by TG Requirement 3.4 (specific to the Spatial Data Services category). That Requirement only mandates giving the keyword value, unlike, e.g., Requirement 1.4 for the Spatial Data Themes keywords (where there are specific requirements on how the thesaurus shall be documented).
The citation of the thesaurus for the Spatial Data Services category is only recommended by TG Recommendation 3.2.

I would avoid imposing further effort on data and service providers, who would have to update their metadata records again, by adding tests not based on TG Requirements.

@iuriemaxim

iuriemaxim commented Oct 15, 2020

@AntoRot I am reading the same TG and, based on Requirement C.15, I indicated the thesaurus for all keywords that come from a thesaurus, including thesauri that are not INSPIRE, EC or EEA related, such as keywords coming from the World Meteorological Organization thesauri.

Requirement 3.4 does not contradict Requirement C.15. Recommendation 3.2 does not contradict Requirement C.15 either; it just recommends how to encode the thesaurus that is required by C.15. As I mentioned in the post above, I would prefer to indicate the INSPIRE registry as the thesaurus and not the IR as recommended in 3.2, but I still need to indicate the thesaurus.

image

@carlospzurita Thank you for mentioning that the requirement C.15 will be analysed in conjunction with req 3.2, so I will not open a new issue.

@iuriemaxim

@carlospzurita It would also be useful to clarify which keywords are expected for a WMS, a WFS, an ATOM, a WCS, a CSW ..., as I see that I used infoMapAccessService, while @AntoRot used humanGeographicViewer.

It is quite strange to impose the use of a keyword to indicate the category of the SDS when it is not clear which keywords can be used, at least for the most common services such as WMS, WFS and ATOM. Otherwise Requirement 3.4 does not make much sense. I know that there is no requirement in the TGs on this topic, but most probably there are some requirements in the OGC standards, as I remember that this topic was covered in a previous issue.

@AntoRot
Contributor Author

AntoRot commented Oct 15, 2020

@iuriemaxim based on TG Requirement 3.4 I wouldn't know which thesaurus to cite, as no specific thesaurus is mandated for the Spatial Data Services category. If a specific thesaurus is not explicitly mandated by a Requirement, the validator shall not check it.
Unless each data provider may choose which thesaurus to cite (between, e.g., the INSPIRE registry and the IR) to conform to TG Requirement C.15, since the one suggested in TG Recommendation 3.2 is indeed only a recommendation. In this case the validator would check only the presence of the metadata elements for the thesaurus citation, but not their content.
But, if so, interoperability will not be ensured.
That's all from my side.

@iuriemaxim

iuriemaxim commented Oct 15, 2020

@AntoRot No thesaurus is mandated for the WMO keywords in the TG either. But there is a requirement stating that if a keyword is taken from a thesaurus, that thesaurus should be mentioned.

For example, if I am using the keyword "Relative humidity", I need to indicate that it comes from "WMO Codes Registry, Register: BUFR4 Code and Flag table, version 416", and I can specify, even if I am not obliged to, that the registry is available at http://codes.wmo.int/bufr4/codeflag.

The value humanGeographicViewer is taken from a thesaurus; otherwise ZZZZ could be written as a keyword to indicate the Spatial Data Services category. If the value was taken from an ISO standard, then that ISO standard should be mentioned. If the value was taken from the IR, then the specific IR should be mentioned. If the value was taken from the INSPIRE Registry, then the INSPIRE Registry should be mentioned. The same keyword can reside in many thesauri. Luckily, humanGeographicViewer resides in the INSPIRE Registry.

I can't agree that a general requirement should be treated as a recommendation, based on the text of a recommendation, because no specific detailed requirement exists. The effect would be that I have no obligation to indicate the WMO registry when using keywords such as "Relative humidity".

@iuriemaxim

iuriemaxim commented Oct 15, 2020

It seems that the Spatial Data Service Category has posed a lot of problems for the validator, so the following are also relevant to this issue:
https://github.com/inspire-eu-validation/community/issues/319
https://github.com/inspire-eu-validation/community/issues/213

Although there is no requirement or recommendation in any TG regarding which Spatial Data Service Category to use for the various Network Services:

  • for WMS and WMTS the keyword infoMapAccessService should be used,
  • for WFS the keyword infoFeatureAccessService should be used,
  • for WCS the keyword infoCoverageAccessService should be used,
  • for SOS the keyword infoSensorDescriptionService should be used,
  • for CSW the keyword infoCatalogueService should be used.
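
The list above could be expressed as a simple lookup table. The sketch below is only an illustration of this suggestion: the names `EXPECTED_CATEGORY` and `category_matches` are hypothetical, and no TG requirement or validator test defines this mapping.

```python
from typing import Optional

# Hypothetical lookup table built from the list above: service type
# abbreviation -> suggested Spatial Data Service Category keyword.
EXPECTED_CATEGORY = {
    "WMS": "infoMapAccessService",
    "WMTS": "infoMapAccessService",
    "WFS": "infoFeatureAccessService",
    "WCS": "infoCoverageAccessService",
    "SOS": "infoSensorDescriptionService",
    "CSW": "infoCatalogueService",
}


def category_matches(service_type: str, keyword: str) -> Optional[bool]:
    """Compare a declared keyword with the suggested one.

    Returns None when the service type is not in the table, so that an
    unknown service type is never reported as a mismatch.
    """
    expected = EXPECTED_CATEGORY.get(service_type.upper())
    if expected is None:
        return None
    return keyword == expected
```

With this table, a WMS record declaring humanGeographicViewer would be flagged as a mismatch, while a service type that is not listed (for example ATOM) would not be judged at all.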

image

image

A screenshot from the INSPIRE proxybrowser illustrates which Spatial Data Service Categories are declared most often:

image

Maybe it would be appropriate to take all these aspects into consideration, even if no requirements or recommendations exist in the TGs, given that the descriptions of the Spatial Data Service Categories are clear enough to deduce which one should be used for which service.

@carlospzurita
Contributor

Dear @AntoRot

We modified the check on the Conformance Class to compare values in either a CharacterString or an Anchor to the ones present in the registry. The test will keep checking that there is at least one valid element, but it will allow these elements to be defined using either the keyword value directly or the URI.

Please check this addition on staging.
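
As a rough illustration of the modified check described above, here is a minimal sketch under stated assumptions. It is not the validator's implementation: the function names are hypothetical, `KNOWN_CATEGORIES` is only the subset of values named in this thread, and the assumption that an Anchor's `xlink:href` is the codelist URL followed by the keyword is mine.

```python
import xml.etree.ElementTree as ET

NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
    "gmx": "http://www.isotc211.org/2005/gmx",
    "xlink": "http://www.w3.org/1999/xlink",
}

# Codelist location quoted in this thread, without the scheme. The URI
# layout "<codelist>/<keyword>" for Anchor hrefs is an assumption.
CODELIST_PATH = "inspire.ec.europa.eu/metadata-codelist/SpatialDataServiceCategory"

# Illustrative subset only: the keywords named in this thread.
KNOWN_CATEGORIES = frozenset({
    "humanGeographicViewer",
    "infoMapAccessService",
    "infoFeatureAccessService",
    "infoCoverageAccessService",
    "infoSensorDescriptionService",
    "infoCatalogueService",
})


def is_known_category(keyword: ET.Element) -> bool:
    """True if a gmd:keyword element carries a known category value."""
    plain = keyword.findtext("gco:CharacterString", default="", namespaces=NS)
    if plain.strip() in KNOWN_CATEGORIES:
        return True
    anchor = keyword.find("gmx:Anchor", NS)
    if anchor is None:
        return False
    href = anchor.get("{%s}href" % NS["xlink"], "")
    # Drop the scheme so that http:// and https:// hrefs compare equal.
    path = href.split("://", 1)[-1].rstrip("/")
    return any(path == f"{CODELIST_PATH}/{value}" for value in KNOWN_CATEGORIES)


def has_valid_category(record_xml: str) -> bool:
    """True if at least one keyword in the record is a known category."""
    root = ET.fromstring(record_xml.strip())
    return any(is_known_category(k) for k in root.iter("{%s}keyword" % NS["gmd"]))
```

The sketch accepts a record as soon as one keyword matches, by value for a CharacterString or by URI for an Anchor, which mirrors the "at least one valid element" wording above.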

Dear @iuriemaxim

We are checking the issue for the thesaurus. For now, the test by default executes the validations using the values from https://inspire.ec.europa.eu/metadata-codelist/SpatialDataServiceCategory , even if that vocabulary is not declared in the metadata.

@carlospzurita carlospzurita added ready for testing Solution provided to reporter or developed & deployed in staging (or beta), waiting for testing and removed under analysis labels Oct 22, 2020
@AntoRot
Contributor Author

AntoRot commented Oct 22, 2020

Dear @carlospzurita,

Now the test is OK for me.

The text of the test method in the ATS (i.e. https://github.com/inspire-eu-validation/metadata/blob/2.0/sds/sds-category.md) should be revised accordingly once the change is available in the production environment.

Thank you!

@iuriemaxim

@carlospzurita Ok, please keep this issue open for the following related issues:

1 - No error is triggered even if the thesaurus is missing for a keyword that is taken from a controlled vocabulary (REQ C.15)
2 - Keyword values are not verified to be relevant for the type of service (e.g. the keyword should be infoFeatureAccessService for WFS). There is no REQ in the INSPIRE TG, but the keyword should still be correctly indicated in order not to mislead. Otherwise Requirement 3.4 is useless.

If necessary I can open two other issues on these topics, but I think it is appropriate to keep them here in order not to repeat the entire explanation.

@carlospzurita
Contributor

Dear @iuriemaxim

After some internal discussion, we reached the conclusion that the test is correct as it is. It is true that Requirement C.15 is not being met completely, but to be able to distinguish a value from a controlled vocabulary from a value from a custom vocabulary, we would need to be able to control the semantics of the keyword, which we can't at the moment.

The same applies to the second part of your issue: to be able to know if the keyword is relevant, we would need to know the semantics of the keyword, and we do not have a list of possible values available right now.

@carlospzurita carlospzurita added this to the v2021.0 milestone Nov 3, 2020
@MichaelOstling

Will this be updated in the production environment before Dec 16th?
We have Network service records that now validate properly in the staging validator
http://staging-inspire-validator.eu-west-1.elasticbeanstalk.com/etf-webapp

But not in the production server
http://inspire.ec.europa.eu/validator/about/

eg
https://www.geodata.se/geodataportalen/srv/eng/csw-inspire?request=GetRecordById&service=CSW&version=2.0.2&elementSetName=full&outputSchema=csw:IsoRecord&id=23d3d9a1-baa5-4155-83d5-bcd761de9587

Since the rule is not correctly implemented in the production environment,
all Network services in Sweden will fail validation unless this is updated.

@MarcoMinghini
Contributor

Dear @MichaelOstling,
as explained in the release plan of the INSPIRE Reference Validator, which was published in February 2020, the release of the Validator used for the 2020 Monitoring and Reporting is the September one (v2020.3). This was announced already in the 61st MIG-T meeting in March as well as in all the following MIG and MIG-T meetings.

As a reminder, one of the requests from Member States after the 2019 Monitoring and Reporting was to have a more transparent and clear release plan for the INSPIRE tools used in the Monitoring and Reporting (i.e. the Geoportal and the Validator), with the request to have the tools ready 3 months before the deadline.

According to the release schedule, all the changes which are currently available in the staging instance will be included in production with the next release v2021.0 foreseen for mid-January 2021.

@AntoRot
Contributor Author

AntoRot commented Dec 10, 2020

Dear @MarcoMinghini,

I agree that the release schedule shall be followed. But in this case, the bug in the validator, unfortunately noticed after the latest release, risks affecting the monitoring results although the metadata records comply with the INSPIRE Regulation and TG Requirements. Consequently, I think that an exception would be needed.

Unless the monitoring results include a declaration that the metadata records fail due to a validator bug and not due to a lack of conformity to the INSPIRE requirements.

@MichaelOstling

Dear @MarcoMinghini,

I agree with the comment from Antonio (@AntoRot).
We cannot change our metadata to comply with invalid rules in the validator.
I think the monitoring must then remove this rule from the monitoring results.

/Michael

@iuriemaxim

iuriemaxim commented Dec 10, 2020

@AntoRot As I already explained, the validator on the staging environment now has a bug, as it is not checking Requirement C.15.
Unfortunately those metadata do not comply with Requirement C.15, as the keywords used are present in the INSPIRE Registry. So even if the validator in production is reporting the wrong error, the MD files still have errors in relation to the test that is performed; it is just that the error indicated is not the one that should be indicated.

@carlospzurita please keep this issue open, as it is not fixed yet. I suppose that it is not a constructive option to open another bug related to C.15 applied to all keywords that exist in the INSPIRE Registry. Technically it is possible to check all keywords, and for those that are present in the INSPIRE Registry an error should be triggered if the INSPIRE Registry is not mentioned as the vocabulary from which the keyword was taken.
Another option is to remove the Requirement C.15.

@carlospzurita
Contributor

We have added a fix for this issue for Requirement 3.4, as the values in CharacterString were not checked against the registry values. Now they are checked in the same way that Anchors are tested.

However, we can't rely on any thesaurus that the user may have defined, because C.15 states that the multiplicity of the Citation element is zero or more, and so you may reference codelist values without adding a citation element.

@iuriemaxim

iuriemaxim commented Jan 21, 2021

@carlospzurita The fix is not OK, if I understand the comment correctly. Of course the multiplicity is zero or more for the citation element. Zero could be for the case where a keyword is not found in any vocabulary (e.g. many keywords in Romanian will not be found in any vocabulary, unless the country creates those vocabularies or translations in all EU languages are added to the INSPIRE Registry).

But of course this is not the case for the keyword describing the Spatial Data Service Category, as all those keywords are and should be present in vocabularies. And of course they are present in the INSPIRE Registry.

As I do not see why the implementation is not done correctly, and there are so many misinterpretations of Requirement C.15, you may see that the current fix contradicts Commission Regulation 1205/2008, so please consider implementing a fix that does not contradict EU legislation.

image

As I understand, the keywords indicating the Spatial Data Service category are now checked against the INSPIRE registry values, so in this case it is clear that the keyword is found in a vocabulary. Therefore, due to the metadata implementing rules stated in Commission Regulation 1205/2008, the data provider should include the citation of the controlled vocabulary, and the citation shall include at least the title and a reference date.

IR 1205/2008:
<<If the keyword value originates from a controlled vocabulary (thesaurus, ontology), for example GEMET, the citation of the originating controlled vocabulary shall be provided.
This citation shall include at least the title and a reference date
(date of publication, date of last revision or of creation) of the originating controlled vocabulary.
>>

And now maybe it is clearer why the multiplicity is zero or more. "Zero" is because the sentence in the Implementing Rule starts with "If". "More" is because "more" keywords can originate from the same controlled vocabulary. So there is no need to provide the same vocabulary again and again for each keyword, as it is possible to provide only one element with all the keywords that come from the same vocabulary, and the vocabulary is provided only once. In this case there is a one-to-many relationship between vocabulary and keyword elements (one vocabulary - many keywords).
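
The one-to-many grouping and the "title and reference date" rule quoted above can be sketched as follows. This is a minimal illustration, not validator code: the function name `citation_status` is hypothetical, the title and date in `SAMPLE` are placeholder values, and the date type element that a full citation would carry is left out for brevity.

```python
import xml.etree.ElementTree as ET

NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
    "gmx": "http://www.isotc211.org/2005/gmx",
}

# One keyword group: several keywords share a single thesaurus citation.
# Title and date are placeholders for illustration only.
SAMPLE = """
<gmd:MD_Keywords xmlns:gmd="http://www.isotc211.org/2005/gmd"
                 xmlns:gco="http://www.isotc211.org/2005/gco">
  <gmd:keyword><gco:CharacterString>first keyword</gco:CharacterString></gmd:keyword>
  <gmd:keyword><gco:CharacterString>second keyword</gco:CharacterString></gmd:keyword>
  <gmd:thesaurusName>
    <gmd:CI_Citation>
      <gmd:title><gco:CharacterString>Example vocabulary</gco:CharacterString></gmd:title>
      <gmd:date>
        <gmd:CI_Date>
          <gmd:date><gco:Date>2000-01-01</gco:Date></gmd:date>
        </gmd:CI_Date>
      </gmd:date>
    </gmd:CI_Citation>
  </gmd:thesaurusName>
</gmd:MD_Keywords>
"""


def citation_status(md_keywords_xml: str) -> str:
    """Classify the thesaurus citation of one keyword group.

    Returns "missing" (no citation), "incomplete" (title or date
    absent) or "complete" (both a title and a reference date).
    """
    root = ET.fromstring(md_keywords_xml.strip())
    citation = root.find("gmd:thesaurusName/gmd:CI_Citation", NS)
    if citation is None:
        return "missing"
    title = (citation.findtext("gmd:title/gco:CharacterString", "", NS)
             or citation.findtext("gmd:title/gmx:Anchor", "", NS))
    date = (citation.findtext("gmd:date/gmd:CI_Date/gmd:date/gco:Date", "", NS)
            or citation.findtext("gmd:date/gmd:CI_Date/gmd:date/gco:DateTime", "", NS))
    return "complete" if title.strip() and date.strip() else "incomplete"
```

A group with no citation at all yields "missing", which corresponds to the "zero" end of the multiplicity discussed above.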

@carlospzurita please provide a fix according to the EU legislation. Of course the check is not expected to be done against every vocabulary, as they can't be known if they are not declared or registered somewhere (however, please note that the Commission Regulation gives GEMET as an example). But validation against the INSPIRE Registry can be done and was already implemented through the fix. Therefore it is now necessary to check that the citation is included for those specific keywords that are found in the INSPIRE Registry.

@carlospzurita carlospzurita modified the milestones: v2021.0, v2021.1 Feb 9, 2021
@carlospzurita
Contributor

Dear @iuriemaxim

We understand your point on this, but we have no consistent way to enforce Requirements C.15 and 3.4 such that, if the keyword comes from a vocabulary, there is a corresponding citation for that originating thesaurus. We can check the value of the keyword for 3.4 against the INSPIRE registry because the possible values are already restricted by the requirement itself. But that should not invalidate the implementation that is already done for C.15.

Following your proposal, if we add a check for a citation element referencing the INSPIRE registry for every keyword declared, what would be the result in the case that the keyword does not come from INSPIRE? To mark it as failed, we would need to make sure to check every known vocabulary looking for that keyword to know if they are coming from them or not. That is not practically feasible in the validator.

We think that this is the rationale behind this relaxed implementation of Requirement C.15, and for now we do not foresee changing it.

Dear @AntoRot

Can you please check the current implementation on the staging instance of requirement 3.4 and share your feedback on the published fix? Thank you

@AntoRot
Contributor Author

AntoRot commented Feb 10, 2021

Dear @carlospzurita,

Now the validation is fine with me, as no error is raised when the gco:CharacterString element is used to provide the Spatial Data Service category keyword.

Thank you very much,
Antonio

@carlospzurita carlospzurita added solved Solution developed and accepted, not yet deployed and removed ready for testing Solution provided to reporter or developed & deployed in staging (or beta), waiting for testing labels Feb 24, 2021
@carlospzurita
Contributor

We are marking this issue as solved. It will be deployed on the next release in the reference validator.

@iuriemaxim

iuriemaxim commented Feb 24, 2021

> Following your proposal, if we add a check for a citation element referencing the INSPIRE registry for every keyword declared, what would be the result in the case that the keyword does not come from INSPIRE? To mark it as failed, we would need to make sure to check every known vocabulary looking for that keyword to know if they are coming from them or not. That is not practically feasible in the validator.

It is really not a big technical issue to check all keywords for which no vocabulary is given.
Firstly, they are just a few in most cases, as for most keywords a vocabulary is provided and is even checked. Secondly, checking 100 keywords in the file against 10000 keywords in a (cached) registry takes less than a second.
And thirdly, checking at least those keywords that must be provided limits the problem to fewer than 100 expected keywords that should be checked against fewer than 100 keywords in the registry.

Users are obliged to provide one of these 78 keywords, and they can be validated against the registry at this endpoint:

https://inspire.ec.europa.eu/metadata-codelist/SpatialDataServiceCategory

If none of those keywords is present, an error is triggered. If more than one such keyword is present in the file, an error should be triggered. If the vocabulary is not provided for that keyword, an error should be triggered.

And if you don't want to do the validation against all the provided keywords, which I can understand, at least the validation should be made against these 78 keywords, because there is an obligation to provide one and only one of them in the file, and for that one a dictionary should be provided.

Therefore I would say that it should be fixed or, if not, a new issue should be opened.
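
The three rules proposed in this comment can be sketched as a small function. This illustrates the commenter's proposal only, not what the validator does (the maintainers declined to change the C.15 check). The names `Keyword`, `category_errors` and `CODELIST_SUBSET` are hypothetical, and the subset stands in for the full codelist, which is not reproduced here.

```python
from typing import FrozenSet, Iterable, List, NamedTuple


class Keyword(NamedTuple):
    """A keyword as extracted from a metadata record (assumed shape)."""
    value: str
    has_thesaurus: bool  # True if its keyword group cites a vocabulary


# Stand-in for the cached codelist: only values named in this thread.
CODELIST_SUBSET: FrozenSet[str] = frozenset({
    "humanGeographicViewer",
    "infoMapAccessService",
    "infoFeatureAccessService",
    "infoCoverageAccessService",
    "infoSensorDescriptionService",
    "infoCatalogueService",
})


def category_errors(keywords: Iterable[Keyword],
                    codelist: FrozenSet[str] = CODELIST_SUBSET) -> List[str]:
    """Apply the three proposed rules and return the error messages."""
    # Set membership is a constant-time lookup per keyword.
    matches = [k for k in keywords if k.value in codelist]
    errors: List[str] = []
    if not matches:
        errors.append("no Spatial Data Service Category keyword found")
    if len(matches) > 1:
        errors.append("more than one Spatial Data Service Category keyword found")
    for k in matches:
        if not k.has_thesaurus:
            errors.append(f"no vocabulary cited for {k.value}")
    return errors
```

Because the codelist is held as a set, checking each keyword in a record is a single hash lookup, which is in line with the performance argument made above.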

@iuriemaxim

@carlospzurita I made a few updates to the text above


6 participants