Sassone is an XML SAX parser and encoder in Elixir that focuses on speed, usability and standards compliance.
Sassone was born as a fork of the great saxy library to fix bugs, address some limitations with XML standards compliance and add some missing features like namespaces and mapping to structs.
Complies with Extensible Markup Language (XML) 1.0 (Fifth Edition).
- An incredibly fast XML 1.0 SAX parser.
- An extremely fast XML encoder.
- Native support for streaming parsing large XML files.
- Support for automatically building and parsing XML with structs.
Add :sassone to your mix.exs.
def deps() do
  [
    {:sassone, "~> 1.0"}
  ]
end
Full documentation is available on HexDocs.
If you have never worked with a SAX parser before, please check out this guide.
A SAX event handler implementation is required before starting parsing.
defmodule MyEventHandler do
  @behaviour Sassone.Handler

  # Each clause prepends the event to the state, so the final state
  # lists the events in reverse order.
  @impl Sassone.Handler
  def handle_event(:start_document, prolog, state) do
    IO.inspect("Start parsing document")
    {:ok, [{:start_document, prolog} | state]}
  end

  @impl Sassone.Handler
  def handle_event(:end_document, _data, state) do
    IO.inspect("Finish parsing document")
    {:ok, [{:end_document} | state]}
  end

  @impl Sassone.Handler
  def handle_event(:start_element, {namespace, name, attributes}, state) do
    IO.inspect("Start parsing element #{namespace}:#{name} with attributes #{inspect(attributes)}")
    {:ok, [{:start_element, {namespace, name, attributes}} | state]}
  end

  @impl Sassone.Handler
  def handle_event(:end_element, {namespace, name}, state) do
    IO.inspect("Finish parsing element #{namespace}:#{name}")
    {:ok, [{:end_element, {namespace, name}} | state]}
  end

  @impl Sassone.Handler
  def handle_event(:characters, chars, state) do
    IO.inspect("Receive characters #{chars}")
    {:ok, [{:characters, chars} | state]}
  end

  @impl Sassone.Handler
  def handle_event(:cdata, cdata, state) do
    IO.inspect("Receive CData #{cdata}")
    {:ok, [{:cdata, cdata} | state]}
  end
end
Then start parsing XML documents with:
iex> xml = "<?xml version='1.0' ?><foo bar='value'></foo>"
iex> Sassone.parse_string(xml, MyEventHandler, [])
{:ok,
 [{:end_document},
  {:end_element, {nil, "foo"}},
  {:start_element, {nil, "foo", [{nil, "bar", "value"}]}},
  {:start_document, [version: "1.0"]}]}
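Because the handler state is threaded through every event, a handler can accumulate any value, not only a list of events. The sketch below is a hypothetical ElementCounter module (not part of Sassone) that counts how often each element name is opened. It assumes the :start_element payload has the {namespace, name, attributes} shape used in MyEventHandler above.

```elixir
defmodule ElementCounter do
  # Hypothetical handler: the state is a map of element name => count.
  @behaviour Sassone.Handler

  @impl Sassone.Handler
  def handle_event(:start_element, {_namespace, name, _attributes}, counts) do
    # Increment the counter for this element name, starting at 1.
    {:ok, Map.update(counts, name, 1, &(&1 + 1))}
  end

  # Every other event leaves the state untouched.
  def handle_event(_event, _data, counts), do: {:ok, counts}
end
```

It would be used the same way as MyEventHandler, with an empty map as the initial state, for example Sassone.parse_string(xml, ElementCounter, %{}).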
Sassone also accepts a file stream as input:
File.stream!("/path/to/file")
|> Sassone.parse_stream(MyEventHandler, initial_state)
It even supports parsing a normal stream.
File.stream!("/path/to/file")
|> Stream.filter(&(&1 != "\n"))
|> Sassone.parse_stream(MyEventHandler, initial_state)
Sassone can parse an XML document partially. This feature is useful when the document cannot be turned into a stream, e.g. when it is received over a socket.
alias Sassone.Partial

{:ok, partial} = Partial.new(MyEventHandler, initial_state)
{:cont, partial} = Partial.parse(partial, "<foo>")
{:cont, partial} = Partial.parse(partial, "<bar></bar>")
{:cont, partial} = Partial.parse(partial, "</foo>")
{:ok, state} = Partial.terminate(partial)
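When chunks arrive in a loop, the repeated Partial.parse/2 calls above can be folded into a reduction. The following is a minimal sketch with a hypothetical ChunkFeeder helper (not part of Sassone). It takes the parse function as an argument and only assumes the {:cont, partial} return shown above. Any other return value is passed back to the caller unchanged, since the source does not list those shapes.

```elixir
defmodule ChunkFeeder do
  # Hypothetical helper: feeds chunks to an incremental parser one at a time.
  # `parse_fun` is expected to behave like Partial.parse/2 in the example
  # above, returning {:cont, partial} while more input can be accepted.
  def feed(partial, chunks, parse_fun) do
    Enum.reduce_while(chunks, {:cont, partial}, fn chunk, {:cont, acc} ->
      case parse_fun.(acc, chunk) do
        # Keep going with the updated partial.
        {:cont, next} -> {:cont, {:cont, next}}
        # Stop at the first result that is not {:cont, _} and return it as is.
        other -> {:halt, other}
      end
    end)
  end
end
```

With Sassone loaded, a call would look like ChunkFeeder.feed(partial, ["<foo>", "<bar></bar>", "</foo>"], &Partial.parse/2), followed by Partial.terminate/1 on the returned partial.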
Use Sassone.XML to build and compose XML simple form, then Sassone.encode!/2 to encode the built element into XML binary.
iex> import Sassone.XML
iex> element = element("person", [gender: "female"], [characters("Alice")])
{nil, "person", [{"gender", "female"}], [{:characters, "Alice"}]}
iex> Sassone.encode!(element, [])
"<?xml version=\"1.0\" encoding=\"utf-8\"?><person gender=\"female\">Alice</person>"
See Sassone.XML for the full XML building API documentation.
You can derive or implement Sassone.Builder for your structs to automatically generate the parsers and builders for them.
defmodule Person do
  @derive {
    Sassone.Builder,
    root_element: "person",
    fields: [gender: [type: :attribute], name: [type: :content]]
  }
  defstruct [:gender, :name]
end
You can then generate an XML document from your struct by calling:
iex> Sassone.Builder.build(%Person{gender: "female", name: "Alice"}) |> Sassone.encode!()
"<?xml version=\"1.0\" encoding=\"utf-8\"?><person gender=\"female\">Alice</person>"
And you can now parse an XML document and obtain a map by calling:
iex> {:ok, {struct, map}} = Sassone.parse_string(data, Sassone.Builder.handler(%Person{}), nil)
{:ok, {Person, %{gender: "female", name: "Alice"}}}
You can then use the map to create the struct you need:
iex> struct(struct, map)
%Person{gender: "female", name: "Alice"}
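One detail of this step is worth knowing: Kernel.struct/2 silently drops keys that the struct does not define, while Kernel.struct!/2 raises a KeyError. The sketch below uses a hypothetical Example.Person struct and a made-up :nickname key to show the difference. Whether the map returned by the generated handler can ever contain extra keys is not stated here, so treat this purely as an illustration of the two standard library functions.

```elixir
defmodule Example.Person do
  # Hypothetical stand-in for the Person struct above.
  defstruct [:gender, :name]
end

# :nickname is not a field of Example.Person.
map = %{gender: "female", name: "Alice", nickname: "Al"}

# struct/2 ignores the unknown key.
person = struct(Example.Person, map)

# struct!/2 raises KeyError for the same input.
strict_result =
  try do
    struct!(Example.Person, map)
  rescue
    KeyError -> :unknown_key
  end
```

Choosing struct!/2 makes unexpected keys fail loudly, which can help catch mismatches between a document and the struct definition early.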
In case of deeply nested data or custom data types, this can prove difficult. In that case, you can use a library to handle the conversion to struct. Ecto with embedded schemas is great to cast and validate data.
For example, assuming you defined Person as an embedded Ecto schema with a changeset/2 function:
defmodule Person do
  use Ecto.Schema
  import Ecto.Changeset

  @derive {
    Sassone.Builder,
    root_element: "person",
    fields: [gender: [type: :attribute], name: [type: :content]]
  }
  embedded_schema do
    field :gender
    field :name
  end

  # Cast only the fields that are mapped from the XML document.
  def changeset(person, params) do
    person
    |> cast(params, [:gender, :name])
  end
end
iex> struct.changeset(struct(struct), map) |> Ecto.Changeset.apply_action(:cast)
{:ok, %Person{gender: "female", name: "Alice"}}
See Sassone.Builder for the full Builder API documentation.
At its core, Sassone is a SAX parser, so it does not, and likely will not, offer any XPath functionality.
SweetXml is a wonderful library to work with XPath. However, :xmerl, the library used by SweetXml, is not always memory efficient and speedy. You can combine the best of both worlds with Saxmerl, which is a Saxy extension that converts XML documents into a SweetXml-compatible format. Please check that library out for more information.
Sassone is an Italian word with two different meanings, depending on how you pronounce it:
- Sàssone is the equivalent of the English word Saxon, a member of a people that inhabited parts of central and northern Germany from Roman times, many of whom conquered and settled in much of southern England in the 5th–6th centuries.
- Sassòne is a big rock (sasso in Italian), e.g. "Va che bel sassone!" roughly translates to "What a nice big rock!" in English.
Note that benchmarking XML parsers is difficult and the results depend heavily on the complexity of the documents being parsed. Even though I try hard to make the benchmarking suite fair, it is hard to avoid bias when choosing the documents to benchmark against.
Therefore the conclusions in this section are for reference only. Please feel free to benchmark against your target documents. The benchmark suite can be found in bench/.
A rule of thumb is to compare apples to apples. Some XML parsers target only specific types of XML, so the test suite provides some indicators of how fair each benchmark result is.
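For a quick first impression on your own documents, before reaching for the full suite in bench/, a rough timing helper can be enough. The sketch below is a hypothetical QuickBench module built only on :timer.tc/1 from Erlang/OTP. It is an assumption for illustration, not how the benchmark suite itself is implemented, and it does not measure memory.

```elixir
defmodule QuickBench do
  # Hypothetical helper: returns the median wall-clock time in microseconds
  # of calling the zero-arity function `fun` `runs` times.
  def median_us(fun, runs \\ 20) when runs > 0 do
    # :timer.tc/1 returns {elapsed_microseconds, result}.
    times = for _ <- 1..runs, do: elem(:timer.tc(fun), 0)
    Enum.at(Enum.sort(times), div(runs, 2))
  end
end
```

With a document loaded into xml, a call such as QuickBench.median_us(fn -> Sassone.parse_string(xml, MyEventHandler, []) end) gives a number that can be compared with the same measurement for another parser on the same input.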
Some quick and biased conclusions from the benchmark suite:
- For SAX parsing, Sassone is usually 1.4 times faster than Erlsom. With deeply nested documents, Sassone is noticeably faster (4 times faster).
- For XML building and encoding, Sassone is usually 10 to 30 times faster than XML Builder. With deeply nested documents, it can be 180 times faster.
- Sassone uses significantly less memory than XML Builder (4 to 25 times less).
- Sassone uses significantly less memory than Xmerl, Erlsom and Exomler (1.4 to 10 times less).
- XSD is not supported.
- DTD is not supported. When Sassone encounters a <!DOCTYPE, it skips it.
- Only UTF-8 encoding is supported.
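Given the UTF-8 restriction, input in another encoding has to be converted before it is parsed. The sketch below assumes the input is ISO-8859-1 (Latin-1) and converts it with :unicode.characters_to_binary/3 from Erlang/OTP. The sample document is made up for illustration.

```elixir
# Bytes for "<name>café</name>" in Latin-1: é is the single byte 0xE9.
latin1 = <<"<name>caf", 0xE9, "</name>">>

# Re-encode the Latin-1 bytes as UTF-8.
utf8 = :unicode.characters_to_binary(latin1, :latin1, :utf8)

# utf8 is now a valid UTF-8 binary that can be handed to the parser.
```

If the document declares its original encoding in the XML prolog, that declaration no longer matches the converted bytes, so it may need to be adjusted as well.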
If you have any issues or ideas, feel free to write to https://github.com/sibill-it/sassone/issues.
To start developing:
- Fork the repository.
- Write your code and related tests.
- Create a pull request at https://github.com/sibill-it/sassone/pulls.
Copyright (c) 2018-2024 Cẩm Huỳnh
Copyright (c) 2024 Luca Corti
This software is licensed under the MIT license.