Skip to content

Convert ext/xml to use object instead of resource #3526

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from

Conversation

nikic
Copy link
Member

@nikic nikic commented Sep 16, 2018

The primary motivation here is to fix https://bugs.php.net/bug.php?id=72793 and https://bugs.php.net/bug.php?id=76874, which cannot be done with a resource, as resources are not part of cycle GC.

The object is called XmlParser. An open question is what xml_parser_free() is supposed to do. Right now I've left it as essentially a no-op.

Targeting PHP 7.4 based on precedent. ext/gmp was converted in 5.6, ext/hash in 7.2, so we seem to be fine with doing these resource->object conversions in minor versions.

  • Implement proper get_gc
  • Forbid explicit construction of the object
  • Decide what to do with xml_parser_free()

@KalleZ
Copy link
Member

KalleZ commented Sep 16, 2018

Whats the motivation behind improving ext/xml over ext/xmlreader? May sound silly but I have been out of the XML loop for way too long

@cmb69
Copy link
Member

cmb69 commented Sep 16, 2018

@KalleZ Of course, we may consider to deprecate ext/xml and move it to PECL, but still it would be nice to fix the memory leakage issue beforehand.

@nikic Thanks for tackling this. I guess we should have an RFC, though, like we had for ext/gmp and ext/hash.

Regarding the failing tests: bug68912.phpt needs an updated warning message or perhaps more reasonably some other resource. schema085.phpt shouldn't hardcode the object number anyway (i.e. regardless of this PR).

@KalleZ
Copy link
Member

KalleZ commented Sep 16, 2018

@cmb69 I agree that we should fix the memory leakage, thats a no brainer but it just seems odd to introduce a class behind the scenes if there are other alternatives thats perhaps something we would want to promote more.

Is there not another way around this without changing it to use objects by continuing to re-use the resource system?

@cmb69
Copy link
Member

cmb69 commented Sep 16, 2018

@KalleZ The leak occurs if the resource references an object, and this object references that resource. The cycle can't be broken, since the cyclic GC doesn't track resources.

schema085.phpt shouldn't hardcode the object number anyway (i.e. regardless of this PR).

Fixed with 2568594.

@nikic
Copy link
Member Author

nikic commented Sep 16, 2018

@KalleZ ext/xml is a push-based streaming parser. ext/xmlreader is a pull-based non-streaming parser. They have different use-cases. If you can use ext/xmlreader, it's the easier option, but it does not cover all use-cases.

@cmb69
Copy link
Member

cmb69 commented Sep 21, 2018

Forbid explicit construction of the object

Why not make __construct() an alias of xml_parser_create() and xml_parser_create_ns(), respectively (depending on whether $separator is given or not).

Decide what to do with xml_parser_free()

Leave as no-op, and deprecate the function (maybe “is deprecated; unset the parser instance instead”).

@php-pulls
Copy link

Comment on behalf of petk at php.net:

Labelling

@staabm
Copy link
Contributor

staabm commented Oct 12, 2018

quoting nikic:

ext/xml is a push-based streaming parser. ext/xmlreader is a pull-based non-streaming parser. They have different use-cases. If you can use ext/xmlreader, it's the easier option, but it does not cover all use-cases.

@cmb69 is this something worth noting in the docs?

@nikic nikic changed the base branch from master to PHP-7.4 February 12, 2019 14:13
Use an XmlParser object instead of a resource.
These leaks have been resolved by the introduction of a proper
get_gc() implementation.
@nikic nikic force-pushed the xml_parser_object branch from 4bc797f to 250028a Compare February 12, 2019 14:37
@nikic
Copy link
Member Author

nikic commented Feb 12, 2019

Rebased onto PHP-7.4.

Why not make __construct() an alias of xml_parser_create() and xml_parser_create_ns(), respectively (depending on whether $separator is given or not).

I think if we want to have that, we should also be offering the rest of the API as methods as well. It seems weird to me to allow new XmlParser($foo, $bar) but then force the use of xml_parser_* for everything else. Adding an OO API for this is certainly a possibility, just not something I personally want to pursue (this is mostly about fixing GC for me).

Leave as no-op, and deprecate the function (maybe “is deprecated; unset the parser instance instead”).

I think that's what we should do longer term. I don't want to deprecate this function in the same release as we change to objects (as previously the use of xml_parser_free was even recommended), but maybe introduce the deprecation one release later.

@cmb69
Copy link
Member

cmb69 commented Feb 12, 2019

I don't want to deprecate this function in the same release as we change to objects (as previously the use of xml_parser_free was even recommended), but maybe introduce the deprecation one release later.

I think, if xml_parser_free() becomes essentially a no-op, it is reasonable to deprecate it right away. Otherwise users might continue to use it even for new code, although it makes no sense to call it. If a deprecation is too strong for now, at least we could trigger an E_NOTICE.

@bukka
Copy link
Member

bukka commented Mar 5, 2019

I'm -1 on doing this in 7.4 as the check for resource is used in Pear XML parser that is included to quite a few projects in GitHub and I guess it might be even more in closed source. I don't think it's worth the break in minor version though. I'm fine with this going to 8.0.

@cmb69
Copy link
Member

cmb69 commented Mar 5, 2019

Funnily, XML_Parser is subject to bug #72793 (i.e. it leaks memory), which would be fixed by this PR …

@bukka
Copy link
Member

bukka commented Mar 6, 2019

Yeah but it will also break lots of applications. I guess the leak is not such a big issue for all cases and it might have been there for some time already. I think that postponing that to 8.0 would be better though.

@nikic
Copy link
Member Author

nikic commented Apr 8, 2019

Merged as e35a3cb into PHP 8 only.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants