xpath not working? #129

Open
pyoio opened this issue Oct 21, 2014 · 6 comments

Comments

@pyoio
pyoio commented Oct 21, 2014

I've tried searching around for this and I've come to the conclusion I must be doing something crazy. I have the following XML:

<?xml version="1.0" encoding="UTF-8"?><?xml-stylesheet type="text/xsl" href="..."?>
<nitro xsi:schemaLocation="..." xmlns="..." xmlns:xsi="...">
    <results page="1" page_size="10" total="1" more_than="0">
        <episode>
            <pid>wcrhl7thx1w</pid>
        </episode>
    </results>
</nitro>

Now when I do the following:

$(document).find("results").find("episode").find("pid")

I get the expected result, a Match whose .text() is wcrhl7thx1w.

However, when I do:

$(document).find("results episode pid")

or

$(document).xpath("//results//episode//pid")

I get back an empty Match object. I've also tried //pid, //results and a variety of other XPath expressions, and nothing comes back. The only XPath I can get anything back for is //*.

Is there something amiss in 1.2.0 or have I been looking at this too long and missed something?

p.s. Thank you for the library, I love it.

@lukaseder
Member

Hmm, yes, there's an implementation difference in Impl.find() between "simple" selectors (those matching [\\w-]+) and non-simple ones:

    @Override
    public final Impl find(final String selector) {

        // The * selector is evaluated using the standard DOM API
        if ("*".equals(selector)) {
            List<NodeList> result = new ArrayList<NodeList>();

            for (Element element : elements) {
                result.add(element.getElementsByTagName(selector));
            }

            return new Impl(document, namespaces, this).addNodeLists(result);
        }

        // Simple selectors are valid XML element names without namespaces. They
        // are fetched using a namespace-stripping filter.

        // [#107] Note, Element.getElementsByTagNameNS() cannot be used, as the
        // underlying document may not be namespace-aware!
        else if (SIMPLE_SELECTOR.matcher(selector).matches()) {
            return find(JOOX.tag(selector, true));
        }

        // CSS selectors are transformed to XPath expressions
        else {
            return new Impl(document, namespaces, this).addElements(xpath(css2xpath(selector, isRoot())).get());
        }
    }
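For illustration, here is a minimal sketch of what a "namespace-stripping" match on simple selectors amounts to, written against the standard DOM API rather than jOOX. It is not jOOX's actual JOOX.tag() implementation. The class and method names are made up, and urn:example:nitro is a hypothetical stand-in for the elided xmlns="..." value. Because only the local name is compared, the chained find("results").find("episode").find("pid") calls succeed even though every element sits in a default namespace.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class LocalNameFind {

    // Hypothetical namespace URI standing in for the elided xmlns="..." value
    static final String XML = "<nitro xmlns=\"urn:example:nitro\"><results><episode>"
        + "<pid>wcrhl7thx1w</pid></episode></results></nitro>";

    static Document parse(boolean namespaceAware) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(namespaceAware);
            return factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Element name without any "prefix:" part, for namespace-aware and non-aware DOMs alike
    static String localName(Element element) {
        String local = element.getLocalName(); // null when the DOM is not namespace-aware
        if (local != null) {
            return local;
        }
        String tag = element.getTagName();
        return tag.substring(tag.indexOf(':') + 1); // whole tag if there is no colon
    }

    // All descendants of the context elements whose local name equals 'name';
    // the namespace URI is deliberately ignored
    static List<Element> find(List<Element> context, String name) {
        List<Element> result = new ArrayList<>();
        for (Element element : context) {
            NodeList descendants = element.getElementsByTagName("*");
            for (int i = 0; i < descendants.getLength(); i++) {
                Element candidate = (Element) descendants.item(i);
                if (localName(candidate).equals(name)) {
                    result.add(candidate);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Element> root = List.of(parse(true).getDocumentElement());
        List<Element> pids = find(find(find(root, "results"), "episode"), "pid");
        System.out.println(pids.size() + " " + pids.get(0).getTextContent()); // 1 wcrhl7thx1w
    }
}
```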

The difference exists for performance reasons, but it seems to produce different results depending on which namespaces are in use. I suspect this is the formally correct way to use jOOX with namespaces:

$(document)
    .namespace("my-prefix", "...") // put your namespace URL here, as in xmlns="..."
    .xpath("//my-prefix:results//my-prefix:episode//my-prefix:pid");

Namespaces currently don't seem to be supported when using find() with CSS selectors. This should be fixed.
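The same effect can be reproduced with plain JAXP XPath, outside jOOX. In XPath 1.0 an unprefixed name test only matches elements in no namespace, so on a namespace-aware DOM an expression like //results//episode//pid cannot match elements that inherit a default xmlns. Binding a prefix to the namespace URI, which is what the .namespace(...) call above does, fixes that. This sketch assumes a hypothetical URI urn:example:nitro and prefix n; the class name is made up.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class NamespacedXPath {

    // Hypothetical namespace URI; the issue elides the real one as "..."
    static final String NS = "urn:example:nitro";
    static final String XML = "<nitro xmlns=\"" + NS + "\"><results><episode>"
        + "<pid>wcrhl7thx1w</pid></episode></results></nitro>";

    // Binds the single prefix "n" to NS; XPath 1.0 has no default element namespace,
    // so unprefixed names in expressions never pick this up
    static final NamespaceContext CONTEXT = new NamespaceContext() {
        public String getNamespaceURI(String prefix) {
            return "n".equals(prefix) ? NS : XMLConstants.NULL_NS_URI;
        }
        public String getPrefix(String uri) {
            return NS.equals(uri) ? "n" : null;
        }
        public Iterator<String> getPrefixes(String uri) {
            return NS.equals(uri)
                ? Collections.singletonList("n").iterator()
                : Collections.<String>emptyIterator();
        }
    };

    // Number of nodes the expression selects in a namespace-aware parse of XML
    static int count(String expression) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(CONTEXT);
            NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(count("//results//episode//pid"));       // 0: names are in no namespace
        System.out.println(count("//n:results//n:episode//n:pid")); // 1: names are in NS
    }
}
```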

@ccudennec
I just ran into the same issue. What do you think about matching on the local name instead of the tag name in CSS2XPath, e.g. "//*[local-name() = 'foo']"?
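A rough sketch of that suggestion, restricted to the simplest case of descendant selectors made of plain element names. The helper toLocalNameXPath is hypothetical and is not jOOX's CSS2XPath; it only shows what the generated expression could look like and that it matches regardless of the default namespace. The tradeoff is that local-name() also matches same-named elements from any other namespace.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

public class LocalNameXPath {

    // Hypothetical helper: turns a descendant-only selector such as "results episode pid"
    // into an XPath that compares local-name() and therefore ignores element namespaces.
    // Only whitespace-separated plain names are handled (no classes, attributes, ">" etc.).
    static String toLocalNameXPath(String selector) {
        return Stream.of(selector.trim().split("\\s+"))
            .map(name -> "//*[local-name() = '" + name + "']")
            .collect(Collectors.joining());
    }

    // String value of the first matching element (empty string if nothing matches)
    static String evaluate(String xml, String selector) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return XPathFactory.newInstance().newXPath().evaluate(toLocalNameXPath(selector), doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<nitro xmlns=\"urn:example:nitro\"><results><episode>"
            + "<pid>wcrhl7thx1w</pid></episode></results></nitro>";
        System.out.println(toLocalNameXPath("results episode pid"));
        System.out.println(evaluate(xml, "results episode pid")); // wcrhl7thx1w
    }
}
```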

lukaseder modified the milestones: Version 1.4.0, Version 1.5.0 May 3, 2016
@Geraldf
Copy link

Geraldf commented Apr 7, 2017

I have an issue as well. I'm trying to get the values of all href attributes using the following XPath string:
"//a[contains(@href, 'wiki/Mathe_f')]/@href/text()"
This returns an empty selection, while
"//a[contains(@href, 'wiki/Mathe_f')]"
returns all relevant "a" elements.

@lukaseder
Member

@Geraldf: jOOX can only "Match" XML elements, not attributes or text nodes, unfortunately. You could write this to get the same result, though:

$(xml).xpath("//a[contains(@href, 'wiki/Mathe_f')]").attr("href")
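Independently of jOOX, note that in the XPath data model attribute nodes have no children, so an expression ending in /@href/text() always selects nothing; selecting /@href itself returns the attribute nodes. A small JAXP sketch of the difference (the class name and the sample markup are made up for illustration):

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class HrefValues {

    // Evaluates 'expression' and returns getNodeValue() of every selected node
    // (for attribute and text nodes that is the attribute value or the text)
    static List<String> values(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getNodeValue());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample markup
        String xml = "<html><body>"
            + "<a href=\"/wiki/Mathe_fuer_A\">A</a>"
            + "<a href=\"/wiki/Other\">B</a>"
            + "<a href=\"/wiki/Mathe_fuer_B\">C</a>"
            + "</body></html>";
        // Attributes have no text-node children, so this prints []
        System.out.println(values(xml, "//a[contains(@href, 'wiki/Mathe_f')]/@href/text()"));
        // Selecting the attribute nodes prints [/wiki/Mathe_fuer_A, /wiki/Mathe_fuer_B]
        System.out.println(values(xml, "//a[contains(@href, 'wiki/Mathe_f')]/@href"));
    }
}
```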

lukaseder modified the milestones: Version 1.6.0, Version 1.7.0 Nov 30, 2017
pyoio closed this as completed Aug 7, 2019
lukaseder reopened this Aug 13, 2019
lukaseder modified the milestones: Version 2.0.0, Version 2.1.0 Dec 8, 2021
@moaxcp
moaxcp commented Oct 18, 2023

I have run into the same issue when using a default namespace. After checking the code, I saw that jOOX creates its documents with setNamespaceAware(true). That gave me the idea to parse the document myself with a default DocumentBuilder and pass it to jOOX instead.

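// DocumentBuilderFactory.newInstance() is not namespace-aware unless setNamespaceAware(true) is called
var domFactory = DocumentBuilderFactory.newInstance();
var builder = domFactory.newDocumentBuilder();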
var document = builder.parse(new ByteArrayInputStream("""
    <VAST version="4.2" xmlns="http://www.iab.com/VAST">
            
    </VAST>
    """.getBytes()));
var vast42version = $(document).xpath("/VAST").attr("version");

assertThat(vast42version).isEqualTo("4.2");

This worked for me, but when the document is modified, jOOX's own builder is used again to parse the XML fragments, and the appended elements end up with an empty namespace attribute (xmlns=""):

$(document).append("\n<Pricing model=\"cpm\" currency=\"USD\"><![CDATA[ 25.00 ]]></Pricing>\n");
var transformer = TransformerFactory.newInstance().newTransformer();
StringWriter writer = new StringWriter();
transformer.transform(new DOMSource(document), new StreamResult(writer));
assertThatXml(writer.toString())
    .and("""
        <?xml version="1.0" encoding="UTF-8" standalone="no"?>
        <VAST xmlns="http://www.iab.com/VAST" version="4.2">
            <Pricing currency="USD" model="cpm" xmlns=""><![CDATA[ 25.00 ]]></Pricing>
        </VAST>
        """)
    .ignoreWhitespace()
    .areIdentical();

What I believe may work, but have not tried yet, is building a Document and appending an Element instead of a String.

Edit:

OK, even building the document myself and appending elements ends up with an empty xmlns attribute. What worked for me was to use the default document created by jOOX and, instead of modifying the XML with strings, to pass elements to jOOX. I took some code from jOOX and added the namespace to the wrapper document. This is adapted from Util.createContent:

public Element[] modifyContent(String content) {
    // Wrap the content in a dummy root that declares the VAST default namespace, so the
    // parsed elements are in that namespace rather than in no namespace
    String wrapped = "<dummy xmlns=\"http://www.iab.com/VAST\">" + content + "</dummy>";
    Document parsed = null;
    try {
        parsed = JOOX.builder().parse(new InputSource(new StringReader(wrapped)));
    } catch (SAXException | IOException e) {
        return new Element[0];
    }
    DocumentFragment fragment = parsed.createDocumentFragment();
    NodeList children = parsed.getDocumentElement().getChildNodes();

    // appendChild removes children also from NodeList!
    while (children.getLength() > 0) {
        fragment.appendChild(children.item(0));
    }

    // Import into the target document (here a field of the enclosing class) so the
    // resulting elements can be appended to it
    fragment = (DocumentFragment) document.importNode(fragment, true);
    return JOOX.list(fragment.getChildNodes()).toArray(new Element[0]);
}
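A plausible explanation for the empty xmlns attribute: an element that ends up with no namespace URI (for example one created with createElementNS(null, ...), or parsed from a fragment that doesn't declare the VAST namespace) must be serialized with xmlns="" once it sits under a parent that has a default namespace, otherwise it would silently change namespace. Creating the element in the parent's namespace avoids this. The sketch below uses only standard DOM and JAXP calls; the class and method names are hypothetical, and the boolean switch exists only to contrast the two cases.

```java
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class AppendInDefaultNamespace {

    // Parses 'xml' namespace-aware, appends a <Pricing> child to the root element and
    // serializes the result. With useParentNamespace, <Pricing> is created in the root's
    // namespace; otherwise it is created in no namespace (null URI).
    static String appendPricing(String xml, boolean useParentNamespace) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document document = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Element root = document.getDocumentElement();

            String namespace = useParentNamespace ? root.getNamespaceURI() : null;
            Element pricing = document.createElementNS(namespace, "Pricing");
            pricing.setAttribute("model", "cpm");
            pricing.setAttribute("currency", "USD");
            pricing.appendChild(document.createCDATASection(" 25.00 "));
            root.appendChild(pricing);

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter writer = new StringWriter();
            transformer.transform(new DOMSource(document), new StreamResult(writer));
            return writer.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String vast = "<VAST version=\"4.2\" xmlns=\"http://www.iab.com/VAST\"></VAST>";
        System.out.println(appendPricing(vast, false)); // <Pricing> undeclares the default: xmlns=""
        System.out.println(appendPricing(vast, true));  // <Pricing> inherits the VAST namespace
    }
}
```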

@lukaseder
Member

@moaxcp: I'm not sure whether your comment is a question, a bug report, or a feature request. In any case, to track things properly (as this issue has already been closed), can you please create a new issue? It may or may not be related to this one...
