4 files changed, +63 -0 lines changed
src/main/java/us/codecraft/learning/select
+ Jsoup Code Walkthrough, Part 7: Implementing a CSS Selector
+ -----
+
+ ![street fighter][1]
+
+ Ta-da! We have finally reached Jsoup's signature feature: the CSS Selector. Selectors are also a major focus of [webmagic](https://github.com/code4craft/webmagic) development. Here is a Street Fighter picture, in the hope that webmagic can one day challenge Jsoup!
+
+ The W3C CSS Selector specification: [http://www.w3.org/TR/CSS2/selector.html](http://www.w3.org/TR/CSS2/selector.html)
+
+ The class structure of Jsoup's select package is as follows:
+
+ ![uml][2]
+
+ The core of Jsoup's select package is `Evaluator`. `Evaluator` is an abstract class with a single method:
+
+ ```java
+ public abstract boolean matches(Element root, Element element);
+ ```
+
+ Note that `root` is passed in so that the tree can be traversed in certain cases. After we call `document.select(css)`, Jsoup will
+
+
+
+ [1]: http://static.oschina.net/uploads/space/2013/0830/180244_r1Vb_190591.jpg
+
+ [2]: http://static.oschina.net/uploads/space/2013/0830/184337_j85b_190591.png
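
The `matches(Element root, Element element)` signature quoted in the notes above can be illustrated with a small standalone sketch. This is not Jsoup's code: `Node`, `Tag`, `Descendant`, `collect`, and `sampleTree` are hypothetical stand-ins written only for this illustration, and the only thing taken from the notes is the shape of the `matches` method. The sketch assumes that a descendant-style evaluator walks up from the candidate element and stops at `root`, which is one plausible reason a `root` parameter is needed.

```java
import java.util.ArrayList;
import java.util.List;

public class EvaluatorSketch {

    /** Hypothetical stand-in for an element node; NOT Jsoup's Element class. */
    static class Node {
        final String tag;
        Node parent;
        final List<Node> children = new ArrayList<>();

        Node(String tag) { this.tag = tag; }

        Node append(Node child) {
            child.parent = this;
            children.add(child);
            return this;
        }
    }

    /** Same shape as the single abstract method quoted in the notes. */
    static abstract class Evaluator {
        public abstract boolean matches(Node root, Node element);
    }

    /** Matches on tag name only; root is not needed here. */
    static class Tag extends Evaluator {
        private final String tag;
        Tag(String tag) { this.tag = tag; }
        @Override public boolean matches(Node root, Node element) {
            return element.tag.equals(tag);
        }
    }

    /** Matches if 'self' accepts the element and 'ancestor' accepts one of its ancestors up to root. */
    static class Descendant extends Evaluator {
        private final Evaluator ancestor;
        private final Evaluator self;
        Descendant(Evaluator ancestor, Evaluator self) {
            this.ancestor = ancestor;
            this.self = self;
        }
        @Override public boolean matches(Node root, Node element) {
            if (!self.matches(root, element)) return false;
            Node p = element;
            // Walk upwards, but never past root: this is where root is used.
            while (p != root && p.parent != null) {
                p = p.parent;
                if (ancestor.matches(root, p)) return true;
            }
            return false;
        }
    }

    /** Depth-first walk from root, keeping every node the evaluator accepts. */
    static List<Node> collect(Evaluator eval, Node root) {
        List<Node> out = new ArrayList<>();
        walk(eval, root, root, out);
        return out;
    }

    private static void walk(Evaluator eval, Node root, Node current, List<Node> out) {
        if (eval.matches(root, current)) out.add(current);
        for (Node child : current.children) walk(eval, root, child, out);
    }

    static List<String> tags(List<Node> nodes) {
        List<String> names = new ArrayList<>();
        for (Node n : nodes) names.add(n.tag);
        return names;
    }

    /** body > (textarea, div > table > tbody), loosely following the test HTML in this commit. */
    static Node sampleTree() {
        Node table = new Node("table").append(new Node("tbody"));
        Node div = new Node("div").append(table);
        return new Node("body").append(new Node("textarea")).append(div);
    }

    public static void main(String[] args) {
        Node body = sampleTree();
        Node div = body.children.get(1);
        Evaluator bodyDiv = new Descendant(new Tag("body"), new Tag("div"));
        System.out.println(tags(collect(bodyDiv, body))); // prints [div]
        System.out.println(tags(collect(bodyDiv, div)));  // prints []
    }
}
```

In this sketch the same evaluator gives different results depending on the root: searching from `body` finds the `div`, while searching from the `div` itself finds nothing, because the upward walk is not allowed to leave the subtree under `root`.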
+ package us.codecraft.learning.select;
+
+ import org.jsoup.nodes.Document;
+ import org.jsoup.parser.Parser;
+ import org.jsoup.select.Elements;
+
+ /**
+  * @author code4crafter@gmail.com
+  */
+ public class SelectorTest {
+
+     public static void main(String[] args) {
+         String html = "<body>\n" +
+                 "  <textarea>\n" +
+                 "    <!-- Text -->\n" +
+                 "    xxx\n" +
+                 "  </textarea>\n" +
+                 "  <div>\n" +
+                 "    <table>\n" +
+                 "      <!-- InTable -->\n" +
+                 "      <!-- InTableText --> xxx\n" +
+                 "      <tbody>\n" +
+                 "        <tr>\n" +
+                 "          <!-- InRow -->\n" +
+                 "          <td>\n" +
+                 "            <!-- InCell --> </td>\n" +
+                 "        </tr>\n" +
+                 "      </tbody>\n" +
+                 "    </table>\n" +
+                 "  </div>\n" +
+                 "</body>";
+         Parser parser = Parser.htmlParser();
+         Document document = parser.parseInput(html, "");
+         Elements select = document.select("body div");
+         System.out.println(select);
+     }
+ }