Stack overflow resulting from large maxOccurs values #31
This is correct; we could do so, but that also means you deviate from the EXI specification, which requires you to build all 998 grammars (which are the same) plus the last one, which is different. Did you try to increase the stack size?
I did try to increase the stack size (setting it to 20M, which I believe is 20x the default). This simply made it take longer before overflowing. I was unaware that the EXI spec mandated building all of the grammars. Would it be possible to simply reuse the same grammar for the first 998 and then create one more for the last one, or would that also be considered breaking from the spec?
One may argue it is an implementation issue. I am pretty sure one could also implement the way grammars work in EXIficient differently. In its current fashion, grammars may point to other grammar states without counting how often a given element has been seen before.

Comment: you could also consider using a slightly different schema without such high maxOccurs limits. Moreover, I don't think it is good practice to use such high limits. Let's look at Xerces, which is maybe the most prominent XML validator in the Java world. My Oxygen IDE behaves similarly.

Note: I looked at https://github.com/DFDLSchemas/NITF and the schemas you provided. They are not complete; a referenced schema file appears to be missing.
@danielpeintner the missing JPEG schema is at https://github.com/DFDLSchemas/JPEG. (Lots of our schemas are layered like this; in the container/payload idiom, one schema includes/imports numerous others that come from completely separate projects.)

Re: unbounded vs. 999 or another finite limit. In the cyber security world, our products must satisfy certifying authorities who use tools that inspect XML schemas and reject anything unbounded as a potential denial-of-service vulnerability. For example, the https://github.com/DFDLSchemas/mil-std-2045 schema has 7 uses of "unbounded", and we are required to remove them all.

In the short term it would be perfectly fine if EXIficient had a threshold above which a maxOccurs value is treated as if it were 'unbounded'. I don't know the code base, but this seems like a relatively small change, along with a way to turn this behavior on. Does this sound feasible to you?
Unfortunately this has turned out to be an issue and was (is?) on the "ideas" list for an updated EXI version. Whether there will ever be a new version I cannot say.
Technically it is not a big issue. In fact, it could easily be done by replacing any "larger number" with "unbounded" when reading XML schema files. Note: there is still the very low likelihood that an encoding with maxOccurs="1000" (changed to unbounded) differs in the case there are actually 1000 appearances.
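The "replace any larger number with unbounded while reading the schema" step suggested above could be prototyped as a plain DOM preprocessing pass over the XSD before handing it to the grammar builder. This is a hedged sketch, not EXIficient's actual API; the class name `MaxOccursRewriter` and the threshold of 100 are illustrative assumptions:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class MaxOccursRewriter {
    // Illustrative cutoff: any finite maxOccurs above this is relaxed to "unbounded".
    static final int THRESHOLD = 100;

    /** Replace numeric maxOccurs attributes above THRESHOLD with "unbounded". */
    static void rewrite(Document doc) {
        NodeList all = doc.getElementsByTagName("*");
        for (int i = 0; i < all.getLength(); i++) {
            Element e = (Element) all.item(i);
            String v = e.getAttribute("maxOccurs");
            if (!v.isEmpty() && !v.equals("unbounded")) {
                try {
                    if (Integer.parseInt(v) > THRESHOLD) {
                        e.setAttribute("maxOccurs", "unbounded");
                    }
                } catch (NumberFormatException ignore) {
                    // leave malformed values untouched
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        String xsd = "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
                   + "<xs:element name='Image' minOccurs='0' maxOccurs='999'/>"
                   + "</xs:schema>";
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xsd)));
        rewrite(doc);
        Element el = (Element) doc.getElementsByTagName("xs:element").item(0);
        System.out.println(el.getAttribute("maxOccurs")); // prints "unbounded"
    }
}
```

The rewritten DOM would then be serialized (or fed directly) to the schema-informed grammar construction, so only the single self-looping grammar is ever built.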
@danielpeintner can you explain this statement just a bit further: "There is still the very low likelihood that an encoding with maxOccurs="1000" (changed to unbounded) differs in the case there are actually 1000 appearances." I looked at that W3C page, and that discussion does not mention such an issue (or I didn't see it). It suggests that treating all maxOccurs > 1 as 'unbounded' is plausible.
Let me use an XML example element (of an XML schema) to describe the differences.
The EXI spec states clearly that for each occurrence count there is a grammar: g0, g1, ..., g999.
Note: the last grammar, g999, is different since it ONLY allows for EE (EndElement); there is no further SE (StartElement) for the element. Let's now change the example from maxOccurs="999" to maxOccurs="unbounded".
Now the set of grammars is different and very simple (it always points to itself).
Coming back to the issue I described: since the LAST grammar g999 looks slightly different, it also results in a different EXI encoding. Having said that, this is ONLY the case for the last grammar; all other grammars, like g998, look exactly the same and result in the same encoding. I hope this clarifies the issue I was pointing at.
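The difference described above can be modeled with a toy sketch (these are illustrative classes, not EXIficient's actual grammar implementation): with maxOccurs="999" there is a chain of 1000 grammar states where only the last one forbids SE, whereas with "unbounded" a single state points back to itself:

```java
import java.util.ArrayList;
import java.util.List;

public class GrammarSketch {
    /** Toy grammar state: may allow SE (another occurrence) and always allows EE. */
    static final class Grammar {
        final boolean allowsSE;
        Grammar next; // state reached after an SE event
        Grammar(boolean allowsSE) { this.allowsSE = allowsSE; }
    }

    /** maxOccurs = n: a chain of n+1 states; only the last one forbids SE. */
    static List<Grammar> bounded(int n) {
        List<Grammar> states = new ArrayList<>();
        for (int i = 0; i <= n; i++) states.add(new Grammar(i < n));
        for (int i = 0; i < n; i++) states.get(i).next = states.get(i + 1);
        return states;
    }

    /** maxOccurs = unbounded: one state that loops back to itself. */
    static Grammar unbounded() {
        Grammar g = new Grammar(true);
        g.next = g;
        return g;
    }

    public static void main(String[] args) {
        List<Grammar> chain = bounded(999);
        System.out.println(chain.size());            // 1000 states (g0..g999)
        System.out.println(chain.get(998).allowsSE); // true: g998 still allows SE
        System.out.println(chain.get(999).allowsSE); // false: g999 only allows EE
        System.out.println(unbounded().next == unbounded().next.next); // true: self-loop
    }
}
```

Since g0..g998 are structurally identical, only the transition out of the final state distinguishes the two variants, which is why a stream with exactly 1000 occurrences can encode differently under "unbounded".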
Excellent explanation. Thanks!
I am aware of this old ticket on Sourceforge: https://sourceforge.net/p/exificient/bugs/31/
Your suggested workaround of setting the maxOccurs to unbounded will not work for our application, as the schemas we are creating are used in a secure environment and need to be as restrictive as possible. Also, even significant increases to the stack size did not prevent the overflow from occurring.
From an outsider's perspective it would seem that a relatively easy fix for this would be to treat maxOccurs values above some threshold (5 or 10?) as unbounded internally.
An example of one of our schemas that triggers this error can be found here: https://github.com/DFDLSchemas/NITF
It has an element with a maxOccurs="999".
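A minimal fragment of the problematic shape might look like this (element and type names are illustrative, not taken from the actual NITF schema):

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Document">
    <xs:complexType>
      <xs:sequence>
        <!-- a large finite maxOccurs forces the EXI grammar builder to
             create one grammar state per possible occurrence -->
        <xs:element name="Segment" type="xs:string"
                    minOccurs="0" maxOccurs="999"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```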