Hey @technosophos,
Before going into my issue, I just wanted to say I love your work on QueryPath!
As for the issue, I was wondering if you have any advice on what I could be doing wrong, and why QueryPath seems to ignore the fact that a string is valid UTF-8.
<?php
// Parse the HTML using QueryPath
$qp_options = array(
  'convert_from_encoding' => 'UTF-8',
  'convert_to_encoding' => 'UTF-8',
  'strip_low_ascii' => FALSE,
);
// Taxonomy
$this->qp = htmlqp($dbRow->BreadCrumbHTML, NULL, $qp_options);
$taxonomy = $this->qp->top()->find('ul li:last')->text();

Where the content of $dbRow->BreadCrumbHTML is:
<ul><li style="display:inline;"><a href="/fr/index.html">Accueil</a></li> > <li><a href="/fr/roads_trans/index.html">Routes et transports</a></li> > <li>Vélo</li></ul>

and the string I get returned for $taxonomy is:
"Vélo"
If I don't use QueryPath and just get the whole text, the UTF-8 is maintained. I also checked that mb_convert_encoding is being called: it works, and the string is still valid UTF-8 at that point according to xdebug (PHP 5.3.9). Would you have any sage advice on particular routes to debug this further?
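One route for narrowing this down, as a minimal sketch: compare the raw bytes of the expected text with the bytes that come back, since a validity check alone cannot tell a correct string from a double-encoded one. This assumes, without confirmation, that the text is being read as ISO-8859-1 somewhere and re-encoded to UTF-8. The helper describe_bytes() is hypothetical and not part of QueryPath; only mb_check_encoding, mb_convert_encoding and bin2hex are real PHP functions.

```php
<?php
// Hypothetical debugging helper (not part of QueryPath): report whether a
// string is valid UTF-8 and show its raw bytes as hex.
function describe_bytes($s) {
  return array(
    'valid_utf8' => mb_check_encoding($s, 'UTF-8'),
    'hex' => bin2hex($s),
  );
}

// "Vélo" written out as explicit UTF-8 bytes, so the encoding of this source
// file does not matter.
$expected = "V\xC3\xA9lo";

// Assumption for illustration: simulate double encoding by treating each
// UTF-8 byte as an ISO-8859-1 character and converting that to UTF-8 again.
$double_encoded = mb_convert_encoding($expected, 'UTF-8', 'ISO-8859-1');

$a = describe_bytes($expected);
$b = describe_bytes($double_encoded);

// Both strings pass the validity check, so only the hex dump tells them apart.
echo 'expected:       ', $a['hex'], ' valid=', var_export($a['valid_utf8'], TRUE), "\n";
echo 'double-encoded: ', $b['hex'], ' valid=', var_export($b['valid_utf8'], TRUE), "\n";
```

Running describe_bytes() on $dbRow->BreadCrumbHTML before the htmlqp() call and on $taxonomy afterwards would show at which step the bytes change, if they change at all.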