Skip to content

proposal: encoding/xml: reject ill-formed XML #68299

Open
@DemiMarie

Description

@DemiMarie

Proposal Details

encoding/xml has multiple problems:

  1. It does not check for XML well-formedness constraints: encoding/xml: allows a colon before or after an XML name #68294, encoding/xml: does not reject duplicate attribute names #68295
  2. It does not check for XML namespace constraints: encoding/xml: does not check namespace constraints that do not require keeping extra state #68296, encoding/xml: does not reject multiple attributes with the same namespace URL and local part #68297
  3. Its handling of XML namespaces is known to be buggy.

This proposal covers all issues linked by #68293. These can all be fixed internally to encoding/xml, without changes to the API. However, there will be new API on xml.Decoder:

const (
	AllowLeadingColons = 1 << iota
	AllowTrailingColons
	AllowDuplicateAttributes
	// possibly other flags
)
	
/*
	Sets whether the parser allows ill-formed XML.
	Prior to 1.22, the parser always allowed ill-formed XML.
	Starting in 1.23, ill-formed XML is not allowed by default,
	but it can be re-enabled by calling decoder.AllowIllFormed(-1).
	The Allow* flags can be used for more fine-grained control.
*/
func (d *xml.Decoder) AllowIllFormed(flags int64)

and a GODEBUG flag allow-ill-formed-xml=<bitmask> for course-grained global control.

In the future, xml.Decoder will reject ill-formed XML. If it is found to accept ill-formed XML, this will be considered a bug and fixed, with a new flag so that applications can opt-in to the old behavior.

Debug flags for encoding/xml may be removed with not less than two major versions notice.

My understanding is that this change is enough to guarantee round-trip stability for RawToken users (not for Token users). I believe that is possible to implement a namespace-aware, round-trip-stable parser correctly on top of the RawToken API, but the standard library does not currently implement such a parser.

This is essentially “merge #48641 + debug flags”. this will require additional changes beyond this PR.

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    Status

    Incoming

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions