Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inline DTD allows XML bomb attack #71

Closed
nlevnaut opened this issue Mar 30, 2019 · 15 comments
Closed

Inline DTD allows XML bomb attack #71

nlevnaut opened this issue Mar 30, 2019 · 15 comments
Assignees

Comments

@nlevnaut
Copy link

nlevnaut commented Mar 30, 2019

This is Wiki page for the vulnerability, it is a very well known XML parser vulnerability: https://en.wikipedia.org/wiki/Billion_laughs_attack.

To replicate this issue in SweetXml, you can do the following in an iex session and watch in the observer:

xmerl_xml_bomb

4 hours later, and it's still running! Memory usage is slowly climbing past 900MB, scheduler 1 utilization hovers 70%.

I was looking into xmerl a bit to see if there's a way to disable inline DTD when using xpath before opening this issue, but I'm not familiar enough with it yet. Hoping someone else may be able to chime in. The closest thing I could find was in the release notes for xmerl 1.2.3 there's an option to turn off external DTD parsing. That sounds like that wouldn't solve this issue though, because internal DTD is the problem.

Maybe something can be done by changing the arguments passed to xmerl_xpath.string/3?

One suggestion to make this safer from @ellispritchard is to set the max heap size for the process calling xmerl functions: http://erlang.org/doc/man/erlang.html#process_flag_max_heap_size

@voltone
Copy link

voltone commented Nov 10, 2019

Using 'xmerl_scan' on XML from an untrusted source is generally a bad idea anyway: it creates atoms from XML tag and attribute names, which can bring down the VM by exhausting atom table space.

Anyway, FWIW, it is possible to block DTD entity definitions altogether by using a custom set of rules, and raising on creation of an entity:

  def parse([c | _] = doc, options) when is_integer(c) do
    ets_table = :ets.new(:sweet_xml_rules, [:set, :public])

    try do
      {parsed_doc, _} = :xmerl_scan.string(doc, [no_dtd_entities_rules(ets_table) | options])
      parsed_doc
    after
      :ets.delete(ets_table)
    end
  end

  defp no_dtd_entities_rules(ets_table) do
    {:rules, &no_dtd_entities_rules_read/3, &no_dtd_entities_rules_write/4, ets_table}
  end

  defp no_dtd_entities_rules_write(:entity, _name, _value, _state) do
    raise "DTD entities not allowed"
  end

  defp no_dtd_entities_rules_write(context, name, value, state) do
    ets_table = :xmerl_scan.rules_state(state)

    case :ets.lookup(ets_table, {context, name}) do
      [] ->
        :ets.insert(ets_table, {{context, name}, value})

      _ ->
        :ok
    end

    state
  end

  defp no_dtd_entities_rules_read(context, name, state) do
    ets_table = :xmerl_scan.rules_state(state)

    case :ets.lookup(ets_table, {context, name}) do
      [] ->
        :undefined

      [{_, value}] ->
        value
    end
  end

@axelson
Copy link
Contributor

axelson commented Apr 1, 2020

Are there any mitigations for this?

@ludwikbukowski
Copy link

hey, any update on this?

@benbarber
Copy link

Has there been any progress on this?

@Shakadak
Copy link
Member

Hi benbarber and everyone, we are discussing this internally. We'll come back to you next week.

@Shakadak
Copy link
Member

Hi everyone, here are a few vectors that I gathered to help make a decision:

  • SweetXml is first and foremost a small DSL intending to translate DOM based data into Elixir's structures, via xpath and its nesting.
    • This means we want to keep it minimal.
    • This means we don't want to do the parsing ourselves.
      • Because of this, we are limited in the scope of what we can or are willing to change.
    • This means the API is not tied to xmerl, altough the implementation currently is.
      • Thanks to this, we could in the future support a different parser, and allow the user to choose wich one to use. But I personally don't know yet how to make the extra dependencies work with the goal of keeping the lib minimal.
  • We want to keep backward compatibility as much as possible.
    • This means we won't make the current API restrictive by default.

From all that, I'm deciding that:

  • We will add options to parse/2, xpath/2, so that the user can restrict the behavior of the parser.
  • We will update the doc so that it recommends heavily to set the options for more safety.
  • We won't work on the atom exhaustion problem in the near future.
  • I personally haven't investiguated stream and stream_tags, but since it is based of :xmerl_scan, the same changes will probably be applied to them.

On a last note, it is still possible to open an issue directly for xmerl to bring changes to it, on https://bugs.erlang.org/secure/Dashboard.jspa

@Shakadak
Copy link
Member

I did not have much available time to work on it yet, but it's still present in my mind. I'll give an update next week, then try to give updates every two weeks at worst.

Shakadak added a commit that referenced this issue Mar 3, 2021
dtd option extraction
    handling dtd: [:all, :none, :internal_only, only: [allowed entities]]
more tests needed
possibly evolve options handling
@Shakadak
Copy link
Member

Shakadak commented Mar 5, 2021

Hi everyone,

I've made a first implementation in issue-71, but it still need to be documented, real-world tested, and possibly iterated upon before I'm merging this in master

@Shakadak Shakadak self-assigned this Mar 9, 2021
@Shakadak
Copy link
Member

Hi everyone, I haven't had time to work on the doc yet, but I've set up a date to do it, I'll update you next week

Shakadak added a commit that referenced this issue Mar 25, 2021
@Shakadak
Copy link
Member

Hi everyone, I'm practically done with the documentation. I haven't published a release candidate, but you can test it with:

{:sweet_xml, github: "kbrw/sweet_xml", tag: "v0.7.0-rc.1"},

I'll get a few teams to test it on our side as well

@oliver-kriska
Copy link

Hi, any news from testing?

@Shakadak
Copy link
Member

Hi oliver-kriska, I haven't had much time to allocate on this recently, so unfortunately no. That made me think about it again though. I'll get in contact with a specific team and give an update here, by next friday, on whether I could bring the new version to their project

@Shakadak
Copy link
Member

Shakadak commented May 4, 2021

Hello everyone, the update is currently running on a forked xml flux with the option dtd: :none. This is parsing files that don't use the advanced features of xml so all I can ensure is that the new option won't cause trouble when the parsed files are already safe. I'll try to see if one of our projects has some relevant flux that could make use of the other possible arguments to :dtd, but that is quite unlikely.
In the meantime, I invite you all to try parsing some file with the different possible options if you have some ready at hand :)

@jc00ke
Copy link

jc00ke commented Aug 19, 2021

I'm unclear if this should be closed by 088b3be?

@Shakadak
Copy link
Member

Hey @jc00ke, the version 0.7.0 is indeed published to hex.pm. I didn't close the issue because I wanted it to act as a reminder for Kbrw to update our project to the new version, but I suppose an automated reminder would be a better way.

So yes, the issue should be resolved, and obviously if there are problems with the update feedbacks are welcome.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants