Support different (older) PAGE namespaces #14
Comments
That's the question. If we have a required version attribute (which I think we should), it's a non-compatible change. In that case I'd say use the 2019 namespace. |
The idea was to give it a default value |
But old applications will not work. They see a 2018 XML file, use the old 2018 schema to validate, and that fails because of the version attribute or any other new attribute/element. |
Applications need to update anyway. But documents can stay as they are –
|
Yes, applications need updating, but there are many copies out there already that would break. At the moment they would say "not supported" for newer versions (namespaces). I really think it has to be 2019-07-15 and from now on we have the option for minor versions. I know this is inconvenient |
Sorry, but you still do not have me convinced this is necessary at all. If we adopt |
No, the difference is that, with a new namespace, old applications know they don't support that XML file. If we don't change the namespace, old applications think they support new files, but they don't. |
True. I understand now. You want them to say "unsupported" and not "PcGts/@schemaVersion is invalid". |
Yes, that's one scenario. Other tools (and I admit we might have some of those) might silently remove the new things when saving an XML file with the old namespace. |
Good point! It's better to be very careful here. I don't think it's that much of a problem to once again have everyone move to a new namespace. |
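The behaviour argued for above, where an application reports "unsupported" for a namespace it does not know instead of failing later during validation, can be sketched roughly as follows. This is only an illustration in Python using the standard library; the names `KNOWN_NAMESPACES`, `root_namespace` and `is_supported` are hypothetical and not taken from any PAGE tool, and the set of known namespaces is an assumption (it lists only the 2019-07-15 URI that appears in this thread).

```python
import xml.etree.ElementTree as ET

# Assumption: the namespaces this (hypothetical) application was built for.
KNOWN_NAMESPACES = {
    "http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15",
}


def root_namespace(xml_text):
    """Return the namespace URI of the root element, or "" if it has none."""
    root = ET.fromstring(xml_text)
    # ElementTree stores qualified names as "{namespace-uri}localname".
    if root.tag.startswith("{"):
        return root.tag[1:].split("}", 1)[0]
    return ""


def is_supported(xml_text):
    """True only if the document's root namespace is one the application knows."""
    return root_namespace(xml_text) in KNOWN_NAMESPACES
```

With a check like this, a document in a newer namespace is rejected up front. If the namespace stayed the same across schema changes, the same check would pass and the application would only discover the mismatch at validation time, or not at all.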
Just had a brief discussion with Stefan (Mr PAGE) and he had some concerns with the change. |
Sorry, I do not know enough about our |
generateDS is a code generation tool we use to create an API to PAGE from the schema. We ship only one version of that generated code, based on the latest schema. Currently, this means breaking backwards-compatibility when a new version is released because the namespace changes and the code won't work with documents with older namespaces and cannot generate documents with only the new namespace. Now, we could version our code generation and devise some sort of selection mechanism based on the data. But then we'd also need to upgrade documents dynamically. E.g. adding Another reason for non-namespace versioning is XSLT scripts. We use a lot of those, most of them using only features that have been part of PAGE-XML for years. Here we employ different workarounds to support older versions, like using |
I started a document with some of the mentioned points. Feel free to add. |
The Google Doc states No changes in existing software required (with regard to XML handling) as a pro for the current approach. However,
I.e., <?xml version="1.0" encoding="UTF-8"?>
<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">
<Metadata>
<Creator>OCR-D/core 1.0.0b11</Creator>
<Created>2019-07-31T10:44:05.838637</Created>
<LastChange>2019-07-31T10:44:05.838637</LastChange>
</Metadata>
<Page imageFilename="https://digital.slub-dresden.de/data/kitodo/Brsfded_39946221X-18750125/Brsfded_39946221X-18750125_tif/jpegs/00000001.tif.original.jpg" imageWidth="1992" imageHeight="2450">
<Border>
<Coords points="83,89 1917,89 1917,2361 83,2361"/>
</Border>
</Page>
</PcGts> |
Yes, by that I meant we don't have to change the XML handling procedure. New versions of the XML still have to be supported by the readers and writers. But no changes are required in terms of handling the version number, the storage of the schema files (online and offline) etc. |
@chris1010010 your draft already captures all the issues at hand IMV. I have 3 comments though: I would like to point out that on the downsides of the current approach, it is not necessarily only software which needs to be updated when a new release comes out, but (under certain circumstances) also the data (PAGE instances) themselves. That is because certain kinds of PAGE-processing software (like Moreover, your last point on the downsides of the proposed approach, that changes in the schema would now have to be examined for backward compatibility, I don't think this can be counted as a downside at all. This is really a question of who carries the burden: Having a conscious, consensual decision once per release about whether or not the changes break existing semantics, which is then made totally visible by an increase in either version or namespace, is actually an upside for implementors and data providers. It might be considered a minor downside for standardizers, but usually they will be implementors themselves, so they have that burden already! Regarding a revised namespace URI hosting scheme (i.e. namespace document), I don't think there is much of an industry standard in that area. There is a good discussion in this XMLVS proposal BTW. |
Updated the Google Doc. I'll talk to Stefan |
We discussed this again at length and came to the conclusion that we would like to keep the current versioning scheme. We really appreciate the interesting discussion and we understand the frustration this might cause. This is our reasoning:
We don't intend to make many additions ourselves (most recent additions came from the OCR-D side). So hopefully we can provide a stable format that works for most users.
Moving XMLs to a new namespace should be straightforward via a stylesheet, see for example: https://stackoverflow.com/questions/46533579/copying-elements-to-a-new-namespace-with-xslt |
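For readers who work in Python rather than XSLT, the same idea (copy every element into the new namespace, leave everything else alone) might look like the sketch below. It swaps the stylesheet for the standard library's `xml.etree.ElementTree`; the function name `move_to_namespace` is hypothetical, and the sketch assumes that only element names need to change. It does not touch attribute values such as a schema location, and it drops comments and the XML declaration.

```python
import xml.etree.ElementTree as ET


def move_to_namespace(xml_text, old_ns, new_ns):
    """Return xml_text with all elements from old_ns renamed into new_ns."""
    root = ET.fromstring(xml_text)
    old_prefix = "{" + old_ns + "}"
    for elem in root.iter():
        # Qualified tags look like "{namespace-uri}localname".
        if isinstance(elem.tag, str) and elem.tag.startswith(old_prefix):
            elem.tag = "{" + new_ns + "}" + elem.tag[len(old_prefix):]
    # Serialise new_ns as the default namespace (no "ns0:" prefixes).
    # Note: this changes ElementTree's global prefix registry.
    ET.register_namespace("", new_ns)
    return ET.tostring(root, encoding="unicode")
```

A call such as `move_to_namespace(text, old_uri, "http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15")` only renames elements. Whether the result is valid against the newer schema still depends on what changed between the two schema versions.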
Copied from github.com/OCR-D/core/issues/67