I am trying to implement a ScalaCheck XML generator whose output round-trips through writing and parsing. I've run into a discrepancy between the name character sets accepted by scala-xml and those in the JVM internals. Is scala-xml's alphabet expected to target a specific version of the XML spec? I'm finding that it matches neither the JVM's idea of XML 1.0 nor its idea of XML 1.1.
I tried to make this a scala-cli script, but I couldn't get it to accept the com.sun.org imports. It has to run on Java 8 (specifically, I used 1.8.0_292) to avoid trouble with the module system.
import com.sun.org.apache.xml.internal.utils.XMLChar
import com.sun.org.apache.xml.internal.utils.XML11Char
import scala.xml.Utility

object Chars extends App {
  // Every 16-bit Char value; supplementary code points are not covered.
  val allChars = Char.MinValue to Char.MaxValue

  // Name-start and name-char predicates from scala-xml and the JVM internals.
  val charSets = Map(
    "scala-xml-start" -> ((c: Char) => Utility.isNameStart(c)),
    "xml-1.0-start" -> ((c: Char) => XMLChar.isNameStart(c)),
    "xml-1.1-start" -> ((c: Char) => XML11Char.isXML11NameStart(c)),
    "scala-xml" -> ((c: Char) => Utility.isNameChar(c)),
    "xml-1.0" -> ((c: Char) => XMLChar.isName(c)),
    "xml-1.1" -> ((c: Char) => XML11Char.isXML11Name(c)),
  )

  // Print how many characters set `a` accepts that set `b` rejects, plus the first ten.
  def compare(a: String, b: String) = {
    val diff = allChars.filter(charSets(a)).filterNot(charSets(b))
    println(s"In ${a}, not ${b}: ${diff.size}")
    println(diff.take(10))
    println()
  }

  compare("scala-xml-start", "xml-1.0-start")
  compare("xml-1.0-start", "scala-xml-start")
  compare("scala-xml-start", "xml-1.1-start")
  compare("xml-1.1-start", "scala-xml-start")
  compare("scala-xml", "xml-1.0")
  compare("xml-1.0", "scala-xml")
  compare("scala-xml", "xml-1.1")
  compare("xml-1.1", "scala-xml")
}
In scala-xml-start, not xml-1.0-start: 13800
Vector(ª, µ, º, Ĳ, ĳ, Ŀ, ŀ, ŉ, ſ, Ǆ)
In xml-1.0-start, not scala-xml-start: 11
Vector(ʻ, ʼ, ʽ, ʾ, ʿ, ˀ, ˁ, ՙ, ۥ, ۦ)
In scala-xml-start, not xml-1.1-start: 3
Vector(ª, µ, º)
In xml-1.1-start, not scala-xml-start: 5700
Vector(ʰ, ʱ, ʲ, ʳ, ʴ, ʵ, ʶ, ʷ, ʸ, ʹ)
In scala-xml, not xml-1.0: 14993
Vector(ª, µ, º, Ĳ, ĳ, Ŀ, ŀ, ŉ, ſ, Ǆ)
In xml-1.0, not scala-xml: 4
Vector(·, , ۞, ℮)
In scala-xml, not xml-1.1: 3
Vector(ª, µ, º)
In xml-1.1, not scala-xml: 4021
Vector(˂, ˃, ˄, ˅, ˒, ˓, ˔, ˕, ˖, ˗)
I think I can limit my generators to characters that pass both the JVM's and scala-xml's predicates, but I'm curious whether this difference is known and intentional. Thanks!
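For reference, here is a minimal sketch of that workaround: enumerate only the characters that every predicate accepts. The names SafeNameChars, intersect and allowed are hypothetical, and the demo uses stand-in predicates so that it runs without scala-xml or the JDK-internal classes. It is an illustration under those assumptions, not something either library provides.

```scala
// Sketch: keep only the characters accepted by every supplied predicate.
// SafeNameChars, intersect and allowed are hypothetical names for illustration.
object SafeNameChars {
  type CharPred = Char => Boolean

  // A character passes only if all predicates accept it.
  def intersect(preds: CharPred*): CharPred =
    c => preds.forall(p => p(c))

  // Enumerate every 16-bit Char value that passes all predicates.
  def allowed(preds: CharPred*): IndexedSeq[Char] =
    (Char.MinValue to Char.MaxValue).filter(intersect(preds: _*))
}

object SafeNameCharsDemo {
  def main(args: Array[String]): Unit = {
    // Stand-in predicates (assumptions). With the libraries from the script
    // above, these would be e.g. (c: Char) => Utility.isNameStart(c) and
    // (c: Char) => XMLChar.isNameStart(c).
    val isAscii: Char => Boolean = c => c < 128
    val isLetter: Char => Boolean = c => Character.isLetter(c)

    val safeStart = SafeNameChars.allowed(isAscii, isLetter)
    println(s"${safeStart.size} characters pass every predicate")
  }
}
```

With ScalaCheck, the resulting sequence could then be handed to Gen.oneOf to draw name characters, as long as it is non-empty.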