Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

defusedxml & lxml - thoughts #221

Closed
c00kiemon5ter opened this issue Mar 4, 2018 · 4 comments
Closed

defusedxml & lxml - thoughts #221

c00kiemon5ter opened this issue Mar 4, 2018 · 4 comments

Comments

@c00kiemon5ter
Copy link

Hi there,

I am opening this as a discussion around lxml and defusedxml and their relation. Feel free to close it at anytime. It seems that many projects are using defusedxml but the project itself seems to be unresponsive for a year now.

In relation to lxml, it states that the defusedxml.lxml module is an example. This is one thing to keep in mind.

This is combined with the other comment:

Additionally the package has an untested function to monkey patch all stdlib modules with defusedxml.defuse_stdlib().

The other thing that I see (also in relation to the duo vulnerability and the fix) is that lxml has a parser option to discard comments, namely remove_comments. I would feel much better with that than watching out for .text related operations and treating them specially. This is what defusedxml does, it replaces the default parser, but unfortunately it omits that option.

Finally, requesting defusedxml does not request lxml, but lxml is used throughout this project. Atm, lxml is installed through dm.xmlsec.binding, but I would add it as a dependency for this project explicitly.

@c00kiemon5ter
Copy link
Author

c00kiemon5ter commented Mar 4, 2018

Also funny that the README states on the dependencies section:

  • defusedxml XML bomb protection for Python stdlib modules

while the defusedxml.lxml module says The code has NO protection against decompression bombs.

@pitbulk
Copy link
Contributor

pitbulk commented Mar 4, 2018

Hi @c00kiemon5ter

I saw it was adopted by pysaml2 and you included it there too last year.

I remember I started to use it time ago, when someone reported XEE issues on php-saml ...then I searched a solution for python... and found that library, to be honest I saw its readme, tried some attack vectors and review its code for a while, but I have not reviewed it deeper.

Are you guys planing to use something different?

Related to the sentence of the description of defusexml, "XML bomb protection for Python stdlib modules", is its description on pypi

The fix for the vulnerability was a patch to solve the issue asap. I plan to analyze if a redesign on the toolkit is possible following leif recommendations.

@c00kiemon5ter
Copy link
Author

No plans yet, though, I will try to explain what I am thinking.

It is my opinion that every library that one's project is using, is effectively the project's code (even if it happens through indirection - a dependency). At this point, I am very skeptical of whether the way the parser works should be delegated to an external library, or if it should be configured by the project to meet the project's specific needs. I am leaning to the latter, which effectively means that we should define a module that explicitly defines the way that the xml parser works. This would be duplicating some of the things that defusedxml is doing now, but the settings would be under the project's control and aligned with the project's needs.

To go further, I would define a module that represents the actions on xml documents that I need, and proxy those to the selected xml implementation (lxml, cElementTree, etc), effectively defining an API that fits the project's needs, but also making it easy to transition between different implementations.

btw, pysaml2 uses xml.etree.cElementTree (and fallbacks to xml.etree.ElementTree). It does not use lxml, and that is why it is not affected - the xml parsers provided by default strip comments.

I went through many projects the last couple of days, to see how things in python and xml have evolved, and I am not particularly excited. I will try to find time to try some things out and try to report back with more information.

@pitbulk
Copy link
Contributor

pitbulk commented Mar 4, 2018

Thanks for taking the time for such research.

pitbulk added a commit that referenced this issue Jan 7, 2021
…defused. Parser will ignore comments and processing instructions and by default have deactivated huge_tree, DTD and access to external documents
pitbulk added a commit that referenced this issue Jan 12, 2021
See #221 and #267. Custom lxml parser based on the one defined at xmldefused
@pitbulk pitbulk closed this as completed Dec 26, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants