ConstructingParser does not tolerate start of file whitespace #532

Open
@mbeckerle

Description

We use the constructing parser so as to get file/line/column information added to parsed XML, as well as for proper handling of CDATA regions.

However, we've run into cases where the parser is stricter than we need, and we have had to make it more tolerant.

In particular, we discovered that it requires the first character of an XML file to be "<", starting an XML prolog, a comment, a DTD, or the root element.

We have numerous XML files that begin with whitespace, e.g., a blank line followed by comments, other processing instructions, etc.

We also have numerous XML files that begin with "<?xml" where that is NOT an XML prolog but an ordinary processing instruction whose target happens to start with "xml", as in:

<?xml-model href="...." ... ?>

These things are all tolerated by standard Xerces.

So we've enhanced the constructing parser to be tolerant of these things.
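To make the two relaxations concrete, here is a minimal sketch in plain Scala. It is not the code from DaffodilConstructingLoader.scala and does not use any scala-xml API; the object and method names are hypothetical and exist only for illustration. It assumes XML whitespace means space, tab, CR, or LF, and that an XML declaration is "<?xml" followed by whitespace.

```scala
// Illustrative sketch only: hypothetical helper names, not scala-xml or Daffodil code.
object LeadingInputSketch {

  /** XML whitespace: space, tab, carriage return, line feed. */
  def isXmlSpace(c: Char): Boolean =
    c == ' ' || c == '\t' || c == '\r' || c == '\n'

  /** Number of leading whitespace characters a tolerant parser would skip. */
  def leadingSpaceCount(text: String): Int =
    text.takeWhile(isXmlSpace).length

  /**
   * True only when the text starts with an XML declaration, i.e. "<?xml"
   * followed by whitespace. "<?xml-model ...?>" is an ordinary processing
   * instruction whose target merely starts with "xml", so it returns false.
   */
  def startsWithXmlDecl(text: String): Boolean =
    text.startsWith("<?xml") && text.length > 5 && isXmlSpace(text.charAt(5))
}
```

A tolerant parser would consume the leading whitespace itself rather than have the caller trim the input, since trimming beforehand would shift the line and column positions reported for the rest of the file.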

Our constructing parser method overrides are all in this file:

project: https://github.com/apache/daffodil

file: daffodil-lib/src/main/scala/org/apache/daffodil/xml/DaffodilConstructingLoader.scala

I can create a PR with suggested changes, but before doing so I wanted to run the whole idea past the maintainers of scala-xml. Is there a reason it should not be enhanced in this way?
