Commit 0233c5a

committed
blog7
1 parent 1b6101c commit 0233c5a

4 files changed

+63
-0
lines changed

blogs/images/select uml.png

62.4 KB

blogs/images/streetfighter.jpg

106 KB
Loading

blogs/jsoup7.md

Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
Jsoup Code Walkthrough, Part 7: Implementing a CSS Selector
-----

![street fighter][1]

Ta-da! We have finally reached Jsoup's signature feature: CSS selectors. Selectors are also a major focus of [webmagic](https://github.com/code4craft/webmagic) development. Here is a Street Fighter picture, in the hope that webmagic will one day be able to take on Jsoup!

The W3C CSS selector specification: [http://www.w3.org/TR/CSS2/selector.html](http://www.w3.org/TR/CSS2/selector.html)

The classes in Jsoup's select package are structured as follows:

![uml][2]

The core of Jsoup's select package is `Evaluator`. `Evaluator` is an abstract class with a single method:

```java
public abstract boolean matches(Element root, Element element);
```
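
As a rough illustration of this contract, here is a self-contained sketch that does not use Jsoup at all. `Node`, `Tag`, `HasAncestor`, `And`, and `collect` are hypothetical names invented for this example, not Jsoup classes, and the logic is a simplified model rather than Jsoup's actual source. It shows one plausible reason for passing `root`: an evaluator that walks up the tree can stop at the element the search started from.

```java
import java.util.ArrayList;
import java.util.List;

final class EvaluatorSketch {

    /** Stand-in for an element node; NOT Jsoup's Element class. */
    static final class Node {
        final String tag;
        final Node parent;
        final List<Node> children = new ArrayList<>();

        Node(String tag, Node parent) {
            this.tag = tag;
            this.parent = parent;
            if (parent != null) {
                parent.children.add(this);
            }
        }
    }

    /** Same shape as the method quoted above, over the stand-in Node type. */
    abstract static class Evaluator {
        abstract boolean matches(Node root, Node element);
    }

    /** Matches on tag name only; it never needs root. */
    static final class Tag extends Evaluator {
        private final String tag;

        Tag(String tag) { this.tag = tag; }

        @Override boolean matches(Node root, Node element) {
            return element.tag.equals(tag);
        }
    }

    /** Matches when an ancestor, no higher than root, satisfies the inner evaluator. */
    static final class HasAncestor extends Evaluator {
        private final Evaluator inner;

        HasAncestor(Evaluator inner) { this.inner = inner; }

        @Override boolean matches(Node root, Node element) {
            if (element == root) return false;
            for (Node p = element.parent; p != null; p = p.parent) {
                if (inner.matches(root, p)) return true;
                if (p == root) break; // do not climb above the search root
            }
            return false;
        }
    }

    /** Matches when both evaluators match the same element. */
    static final class And extends Evaluator {
        private final Evaluator left;
        private final Evaluator right;

        And(Evaluator left, Evaluator right) { this.left = left; this.right = right; }

        @Override boolean matches(Node root, Node element) {
            return left.matches(root, element) && right.matches(root, element);
        }
    }

    /** Depth-first walk from root, keeping every node the evaluator accepts. */
    static List<Node> collect(Evaluator eval, Node root) {
        List<Node> out = new ArrayList<>();
        walk(eval, root, root, out);
        return out;
    }

    private static void walk(Evaluator eval, Node root, Node current, List<Node> out) {
        if (eval.matches(root, current)) out.add(current);
        for (Node child : current.children) walk(eval, root, child, out);
    }

    public static void main(String[] args) {
        Node html = new Node("html", null);
        Node body = new Node("body", html);
        Node div = new Node("div", body);
        new Node("table", div);

        // A model of "body div": a div that has a body ancestor.
        Evaluator bodyDiv = new And(new Tag("div"), new HasAncestor(new Tag("body")));
        System.out.println(collect(bodyDiv, html).size()); // prints 1
        System.out.println(collect(bodyDiv, div).size());  // prints 0
    }
}
```

In this model the second search finds nothing because it starts at the `div` itself, so the `body` ancestor lies above the root and is never visited.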

Note that `matches` also receives `root`, which is needed in cases where the evaluator has to traverse the tree. After we call `document.select(css)`, Jsoup will

[1]: http://static.oschina.net/uploads/space/2013/0830/180244_r1Vb_190591.jpg

[2]: http://static.oschina.net/uploads/space/2013/0830/184337_j85b_190591.png
Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
package us.codecraft.learning.select;

import org.jsoup.nodes.Document;
import org.jsoup.parser.Parser;
import org.jsoup.select.Elements;

/**
 * @author code4crafter@gmail.com
 */
public class SelectorTest {

    public static void main(String[] args) {
        String html = "<body>\n" +
                " <textarea>\n" +
                " &lt;!-- Text --&gt;\n" +
                " xxx\n" +
                " </textarea> \n" +
                " <div> \n" +
                " <table> \n" +
                " <!-- InTable --> \n" +
                " <!-- InTableText --> xxx \n" +
                " <tbody> \n" +
                " <tr> \n" +
                " <!-- InRow --> \n" +
                " <td> \n" +
                " <!-- InCell --> </td> \n" +
                " </tr> \n" +
                " </tbody> \n" +
                " </table> \n" +
                " </div> \n" +
                "</body>";
        // Parse the sample markup with Jsoup's HTML parser and an empty base URI.
        Parser parser = Parser.htmlParser();
        Document document = parser.parseInput(html, "");
        // Select every div that is a descendant of body and print the result.
        Elements select = document.select("body div");
        System.out.println(select);
    }
}
