Sassone

Sassone is an XML SAX parser and encoder in Elixir that focuses on speed, usability and standard compliance.

Sassone was born as a fork of the great saxy library to fix bugs, address some limitations with XML standards compliance and add some missing features like namespaces and mapping to structs.

Sassone complies with the Extensible Markup Language (XML) 1.0 (Fifth Edition) specification.

Feature highlights

  • An incredibly fast XML 1.0 SAX parser.
  • An extremely fast XML encoder.
  • Native support for streaming parsing large XML files.
  • Support for automatically building and parsing XML with structs.

Installation

Add :sassone to your list of dependencies in mix.exs:

def deps() do
  [
    {:sassone, "~> 1.0"}
  ]
end

Overview

Full documentation is available on HexDocs.

If you have never worked with a SAX parser before, please check out this guide.

SAX parser

You need to implement a SAX event handler before you can start parsing.

defmodule MyEventHandler do
  @behaviour Sassone.Handler

  @impl Sassone.Handler
  def handle_event(:start_document, prolog, state) do
    IO.inspect("Start parsing document")
    {:ok, [{:start_document, prolog} | state]}
  end

  @impl Sassone.Handler
  def handle_event(:end_document, _data, state) do
    IO.inspect("Finish parsing document")
    {:ok, [{:end_document} | state]}
  end

  @impl Sassone.Handler
  def handle_event(:start_element, {namespace, name, attributes}, state) do
    IO.inspect("Start parsing element #{namespace}:#{name} with attributes #{inspect(attributes)}")
    {:ok, [{:start_element, {namespace, name, attributes}} | state]}
  end

  @impl Sassone.Handler
  def handle_event(:end_element, {namespace, name}, state) do
    IO.inspect("Finish parsing element #{namespace}:#{name}")
    {:ok, [{:end_element, {namespace, name}} | state]}
  end

  @impl Sassone.Handler
  def handle_event(:characters, chars, state) do
    IO.inspect("Receive characters #{chars}")
    {:ok, [{:characters, chars} | state]}
  end

  @impl Sassone.Handler
  def handle_event(:cdata, cdata, state) do
    IO.inspect("Receive CData #{cdata}")
    {:ok, [{:cdata, cdata} | state]}
  end
end

Then start parsing XML documents with:

iex> xml = "<?xml version='1.0' ?><foo bar='value'></foo>"
iex> Sassone.parse_string(xml, MyEventHandler, [])
{:ok,
 [{:end_document},
  {:end_element, {nil, "foo"}},
  {:start_element, {nil, "foo", [{nil, "bar", "value"}]}},
  {:start_document, [version: "1.0"]}]}
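
The handler state does not have to be a list of events; it can be any term. As a minimal sketch, assuming the event shapes shown above, the hypothetical ElementCounter module below (not part of Sassone) keeps a map that counts start tags by element name and ignores every other event:

```elixir
defmodule ElementCounter do
  # Hypothetical handler for illustration; it is not part of Sassone.
  @behaviour Sassone.Handler

  # Count each start tag by element name; the state is a plain map.
  @impl Sassone.Handler
  def handle_event(:start_element, {_namespace, name, _attributes}, counts) do
    {:ok, Map.update(counts, name, 1, &(&1 + 1))}
  end

  # Ignore every other event and pass the state through unchanged.
  def handle_event(_event, _data, counts), do: {:ok, counts}
end
```

It would be used like the handler above, with an empty map as the initial state: Sassone.parse_string(xml, ElementCounter, %{}).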

Streaming parser

Sassone also accepts a file stream as input:

File.stream!("/path/to/file")
|> Sassone.parse_stream(MyEventHandler, initial_state)

It also supports parsing any regular stream:

File.stream!("/path/to/file")
|> Stream.filter(&(&1 != "\n"))
|> Sassone.parse_stream(MyEventHandler, initial_state)

Partial parsing

Sassone can parse an XML document partially. This is useful when the document cannot be turned into a stream, e.g. when it is received over a socket.

alias Sassone.Partial

{:ok, partial} = Partial.new(MyEventHandler, initial_state)
{:cont, partial} = Partial.parse(partial, "<foo>")
{:cont, partial} = Partial.parse(partial, "<bar></bar>")
{:cont, partial} = Partial.parse(partial, "</foo>")
{:ok, state} = Partial.terminate(partial)
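
When chunks arrive in a loop rather than as three known strings, the same calls can be folded over the incoming data. The sketch below is a hypothetical helper (ChunkFeeder is not part of Sassone). It assumes the partial parser lives in Sassone.Partial and that parse/2 returns {:cont, partial} while more input is expected, as in the example above. Any other return value stops the loop and is handed back unchanged. The parse function is injectable, so the loop can be exercised without a real parser.

```elixir
defmodule ChunkFeeder do
  # Hypothetical helper for illustration; it is not part of Sassone.
  # `parse` defaults to Sassone.Partial.parse/2 (module name assumed).
  def feed(partial, chunks, parse \\ &Sassone.Partial.parse/2) do
    Enum.reduce_while(chunks, {:cont, partial}, fn chunk, {:cont, partial} ->
      case parse.(partial, chunk) do
        # The parser wants more input: keep feeding chunks.
        {:cont, partial} -> {:cont, {:cont, partial}}
        # Anything else (for example an error) stops the loop and is returned as-is.
        other -> {:halt, other}
      end
    end)
  end
end
```

If every chunk is accepted, feed/3 returns {:cont, partial}, and the partial can then be passed to Partial.terminate/1 as shown above.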

Generate XML

Use Sassone.XML to build and compose XML in simple form, then Sassone.encode!/2 to encode the built element into an XML binary.

iex> import Sassone.XML
iex> element = element("person", [gender: "female"], [characters("Alice")])
{nil, "person", [{"gender", "female"}], [{:characters, "Alice"}]}
iex> Sassone.encode!(element, [])
"<?xml version=\"1.0\" encoding=\"utf-8\"?><person gender=\"female\">Alice</person>"

See Sassone.XML for the full XML building API documentation.

Struct driven XML parsing and generation

You can derive or implement Sassone.Builder for your structs to automatically generate the parsers and builders for them.

defmodule Person do
  @derive {
    Sassone.Builder,
    root_element: "person",
    fields: [gender: [type: :attribute], name: [type: :content]]
  }
  defstruct [:gender, :name]
end

You can then generate an XML document for your struct by calling:

iex> Sassone.Builder.build(%Person{gender: "female", name: "Alice"}) |> Sassone.encode!()
"<?xml version=\"1.0\" encoding=\"utf-8\"?><person gender=\"female\">Alice</person>"

You can also parse an XML document and obtain a map by calling:

iex> {:ok, {struct, map}} = Sassone.parse_string(data, Sassone.Builder.handler(%Person{}), nil)
{:ok, {Person, %{gender: "female", name: "Alice"}}}

You can then use the map to create the struct you need:

iex> struct(struct, map)
%Person{gender: "female", name: "Alice"}

In case of deeply nested data or custom data types, this can prove difficult. In that case, you can use a library to handle the conversion to struct. Ecto with embedded schemas is great to cast and validate data.

For example, assuming you defined Person as an embedded Ecto schema with a changeset/2 function:

defmodule Person do
  use Ecto.Schema
  import Ecto.Changeset

  @derive {
    Sassone.Builder,
    root_element: "person",
    fields: [gender: [type: :attribute], name: [type: :content]]
  }
  # No primary key, so the struct only has the two fields below.
  @primary_key false
  embedded_schema do
    field :gender
    field :name
  end

  def changeset(person, params) do
    person
    |> cast(params, [:gender, :name])
  end
end

iex> struct.changeset(struct(struct), map) |> Ecto.Changeset.apply_action(:cast)
{:ok, %Person{gender: "female", name: "Alice"}}

See Sassone.Builder for the full Builder API documentation.

FAQs about Sassone and XML

Does Sassone work with XPath?

Sassone is at its core a SAX parser; therefore it does not, and likely will not, offer any XPath functionality.

SweetXml is a wonderful library for working with XPath. However, :xmerl, the library used by SweetXml, is not always memory-efficient or fast. You can combine the best of both worlds with Saxmerl, a Saxy extension that converts XML documents into a SweetXml-compatible format. Please check that library out for more information.

Sassone! Where did the name come from?

Sassone is an Italian word with two different meanings, depending on how you pronounce it:

  1. Sàssone is the equivalent of the English word Saxon, a member of a people that inhabited parts of central and northern Germany from Roman times, many of whom conquered and settled in much of southern England in the 5th–6th centuries.
  2. Sassòne is a big rock (sasso in Italian). For example, "Va che bel sassone!" roughly translates to "What a nice big rock!" in English.

Benchmarking

Note that benchmarking XML parsers is difficult and depends heavily on the complexity of the documents being parsed. Even though I try hard to make the benchmarking suite fair, it is hard to avoid bias when choosing the documents to benchmark against.

Therefore, the conclusions in this section are for reference only. Please feel free to benchmark against your target documents. The benchmark suite can be found in bench/.

As a rule of thumb, we should compare apples to apples. Some XML parsers target only specific types of XML, so the test suite provides some indicators of how fair the benchmark results are.

Some quick and biased conclusions from the benchmark suite:

  • For SAX parsing, Sassone is usually 1.4 times faster than Erlsom. With deeply nested documents, Sassone is noticeably faster (4 times faster).
  • For XML building and encoding, Sassone is usually 10 to 30 times faster than XML Builder. With deeply nested documents, it can be 180 times faster.
  • Sassone uses significantly less memory than XML Builder (4 to 25 times less).
  • Sassone uses significantly less memory than Xmerl, Erlsom and Exomler (1.4 to 10 times less).

Limitations

  • No XSD support.
  • No DTD support: when Sassone encounters a <!DOCTYPE declaration, it skips it.
  • Only UTF-8 encoding is supported.

Contributing

If you have any issues or ideas, feel free to open an issue at https://github.com/sibill-it/sassone/issues.

To start developing:

  1. Fork the repository.
  2. Write your code and related tests.
  3. Create a pull request at https://github.com/sibill-it/sassone/pulls.

Copyright and License

Copyright (c) 2018-2024 Cẩm Huỳnh
Copyright (c) 2024 Luca Corti

This software is licensed under the MIT license.
