Commit 49099f4

steven.devijver committed
Small bug fixes + convenience main class + start of documentation.
1 parent 006180a commit 49099f4

11 files changed: +646 -55 lines changed

.classpath

Lines changed: 1 addition & 0 deletions
@@ -6,5 +6,6 @@
 	<classpathentry kind="lib" path="lib/junit.jar"/>
 	<classpathentry kind="con" path="org.eclipse.jdt.launching.JRE_CONTAINER"/>
 	<classpathentry kind="lib" path="lib/commons-lang.jar"/>
+	<classpathentry kind="lib" path="lib/commons-cli-1.1.jar"/>
 	<classpathentry kind="output" path="target/classes"/>
 </classpath>

doc/wiki-guide.html

Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
+<div><h1> Wiki notation guide </h1><h2> Introduction </h2><p>Welcome to the wiki notation guide for the Java Wikipedia parser.</p><p>The Wikipedia parser supports a subset of the Wikipedia wiki notation. Its intended use is to add wiki-notation support to applications. It can parse some wiki pages from Wikipedia, but certainly not all of them.</p><h2> Wiki notation </h2><h3> Typeface modifiers </h3><p>The following typeface modifiers are supported:</p><p><table border="1"><caption> Typeface modifiers</caption><tr><th> &nbsp;Wiki notation&nbsp; </th><th> &nbsp;Result in HTML&nbsp;</th></tr><tr><td align="center"><pre>''Italics''</pre></td><td align="center"><i>Italics</i></td></tr><tr><td align="center"><pre>'''Bold'''</pre></td><td align="center"><b>Bold</b></td></tr><tr><td align="center"><pre>'''''Bold and italics'''''</pre></td><td align="center"><b><i>Bold and italics</i></b></td></tr><tr><td align="center"><pre>'''''Italics'' and bold'''</pre></td><td align="center"><b><i>Italics</i> and bold</b></td></tr></table></p><p>The typeface modifiers can be used anywhere except in table attributes. They are ignored inside the &lt;nowiki&gt; tag.</p><h3> Lists </h3><p>The Wikipedia parser supports ordered and unordered lists. They can be combined in any way and nested to any depth. Each list item starts with a # or * character at the start of the line.</p><h4> Ordered lists </h4><p><table border="1"><tr><th> &nbsp;Wiki notation&nbsp; </th><th> &nbsp;Result in HTML&nbsp;</th></tr><tr><td><p><pre># Apple
+# Lemon
+# Orange</pre></p></td><td><ol><li>Apple</li><li>Lemon</li><li>Orange</li></ol></td></tr><tr><td><p><pre>#Fruits
+##Apple
+##Lemon
+##Orange
+#Vegetables
+##Garlic
+##Onion
+##Leek</pre></p></td><td><ol><li>Fruits<ol><li>Apple</li><li>Lemon</li><li>Orange</li></ol></li><li>Vegetables<ol><li>Garlic</li><li>Onion</li><li>Leek</li></ol></li></ol></td></tr></table></p><h4> Unordered lists </h4><p><table border="1"><tr><th> &nbsp;Wiki notation&nbsp; </th><th> &nbsp;Result in HTML&nbsp;</th></tr><tr><td><p><pre>* Apple
+* Lemon
+* Orange</pre></p></td><td><ul><li>Apple</li><li>Lemon</li><li>Orange</li></ul></td></tr><tr><td><p><pre>*Fruits
+**Apple
+**Lemon
+**Orange
+*Vegetables
+**Garlic
+**Onion
+**Leek</pre></p></td><td><ul><li>Fruits<ul><li>Apple</li><li>Lemon</li><li>Orange</li></ul></li><li>Vegetables<ul><li>Garlic</li><li>Onion</li><li>Leek</li></ul></li></ul></td></tr></table></p><h4> Combined ordered and unordered lists </h4><p><table border="1"><tr><th> &nbsp;Wiki notation&nbsp; </th><th> &nbsp;Result in HTML&nbsp;</th></tr><tr><td><p><pre>#Fruits
+#*Apple
+#*Lemon
+#*Orange
+#Vegetables
+#*Garlic
+#*Onion
+#*Leek</pre></p></td><td><ol><li>Fruits<ul><li>Apple</li><li>Lemon</li><li>Orange</li></ul></li><li>Vegetables<ul><li>Garlic</li><li>Onion</li><li>Leek</li></ul></li></ol></td></tr><tr><td><p><pre>*Fruits
+*#Apple
+*#Lemon
+*#Orange
+*Vegetables
+*#Garlic
+*#Onion
+*#Leek</pre></p></td><td><ul><li>Fruits<ol><li>Apple</li><li>Lemon</li><li>Orange</li></ol></li><li>Vegetables<ol><li>Garlic</li><li>Onion</li><li>Leek</li></ol></li></ul></td></tr></table></p><h3> Literals </h3><p>The Wikipedia parser supports two types of literals:</p><ul><li>Literals with wiki notations</li><li>Literals without wiki notations</li></ul><h4> Literals with wiki notations </h4><p>Wiki notation inside these literals is parsed by the Wikipedia parser. Each line of the literal starts with two space characters.</p><p><b>Wiki notation example</b>:</p><p><pre>
+  '''public class''' MyClass
+  {
+  }
+</pre></p><p><b>Result in HTML</b>:</p><pre>
+<b>public class
+</b> MyClass
+{
+}
+</pre></div>
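
The typeface table in the guide can be reproduced with a few regular expressions. The following is a minimal sketch only, not the parser from this commit: the class name `TypefaceSketch` and its `toHtml` method are hypothetical, and the approach (three ordered `replaceAll` passes, longest quote run first) is an assumption chosen for brevity. It ignores `<nowiki>` and table attributes, which the guide says the real parser treats specially.

```java
/**
 * Hypothetical illustration of the typeface modifiers from the guide.
 * This is NOT the parser in be.devijver.wikipedia; it only mirrors the table.
 */
final class TypefaceSketch {

    /** Converts ''italics'', '''bold''' and '''''bold italics''''' to HTML tags. */
    static String toHtml(String wikitext) {
        String html = wikitext;
        // Longest quote run first, so five quotes are not consumed as 3 + 2.
        html = html.replaceAll("'''''(.+?)'''''", "<b><i>$1</i></b>");
        // Bold next; any ''...'' pairs inside the bold span are left for the last pass.
        html = html.replaceAll("'''(.+?)'''", "<b>$1</b>");
        // Finally the remaining italics markers.
        html = html.replaceAll("''(.+?)''", "<i>$1</i>");
        return html;
    }
}
```

Calling `TypefaceSketch.toHtml("'''''Italics'' and bold'''")` yields `<b><i>Italics</i> and bold</b>`, the same markup as the last row of the table. Inputs with unbalanced quote runs are outside what this sketch handles.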

doc/wiki-guide.wiki

Lines changed: 166 additions & 0 deletions
@@ -0,0 +1,166 @@
+== Wiki notation guide ==
+
+=== Introduction ===
+
+Welcome to the wiki notation guide for the Java Wikipedia parser.
+
+The Wikipedia parser supports a subset of the Wikipedia wiki notation. Its intended use is to add wiki-notation support to applications. It can parse some wiki pages from Wikipedia, but certainly not all of them.
+
+=== Wiki notation ===
+
+==== Typeface modifiers ====
+
+The following typeface modifiers are supported:
+
+{| border="1"
+|+ Typeface modifiers
+|-
+! &nbsp;Wiki notation&nbsp; !! &nbsp;Result in HTML&nbsp;
+|-
+| align="center"|<pre><nowiki>''Italics''</nowiki></pre>
+| align="center"|''Italics''
+|-
+| align="center"|<pre><nowiki>'''Bold'''</nowiki></pre>
+| align="center"|'''Bold'''
+|-
+| align="center"|<pre><nowiki>'''''Bold and italics'''''</nowiki></pre>
+| align="center"|'''''Bold and italics'''''
+|-
+| align="center"|<pre><nowiki>'''''Italics'' and bold'''</nowiki></pre>
+| align="center"|'''''Italics'' and bold'''
+|}
+
+The typeface modifiers can be used anywhere except in table attributes. They are ignored inside the <nowiki><nowiki></nowiki> tag.
+
+==== Lists ====
+
+The Wikipedia parser supports ordered and unordered lists. They can be combined in any way and nested to any depth. Each list item starts with a # or * character at the start of the line.
+
+===== Ordered lists =====
+
+{| border="1"
+|-
+! &nbsp;Wiki notation&nbsp; !! &nbsp;Result in HTML&nbsp;
+|-
+|<pre><nowiki># Apple
+# Lemon
+# Orange</nowiki></pre>
+|<multi># Apple
+# Lemon
+# Orange</multi>
+|-
+|<pre><nowiki>#Fruits
+##Apple
+##Lemon
+##Orange
+#Vegetables
+##Garlic
+##Onion
+##Leek</nowiki></pre>
+|<multi>#Fruits
+##Apple
+##Lemon
+##Orange
+#Vegetables
+##Garlic
+##Onion
+##Leek</multi>
+|}
+
+===== Unordered lists =====
+
+{| border="1"
+|-
+! &nbsp;Wiki notation&nbsp; !! &nbsp;Result in HTML&nbsp;
+|-
+|<pre><nowiki>* Apple
+* Lemon
+* Orange</nowiki></pre>
+|<multi>* Apple
+* Lemon
+* Orange</multi>
+|-
+|<pre><nowiki>*Fruits
+**Apple
+**Lemon
+**Orange
+*Vegetables
+**Garlic
+**Onion
+**Leek</nowiki></pre>
+|<multi>*Fruits
+**Apple
+**Lemon
+**Orange
+*Vegetables
+**Garlic
+**Onion
+**Leek</multi>
+|}
+
+===== Combined ordered and unordered lists =====
+
+{| border="1"
+|-
+! &nbsp;Wiki notation&nbsp; !! &nbsp;Result in HTML&nbsp;
+|-
+|<pre><nowiki>#Fruits
+#*Apple
+#*Lemon
+#*Orange
+#Vegetables
+#*Garlic
+#*Onion
+#*Leek</nowiki></pre>
+|<multi>#Fruits
+#*Apple
+#*Lemon
+#*Orange
+#Vegetables
+#*Garlic
+#*Onion
+#*Leek</multi>
+|-
+|<pre><nowiki>*Fruits
+*#Apple
+*#Lemon
+*#Orange
+*Vegetables
+*#Garlic
+*#Onion
+*#Leek</nowiki></pre>
+|<multi>*Fruits
+*#Apple
+*#Lemon
+*#Orange
+*Vegetables
+*#Garlic
+*#Onion
+*#Leek</multi>
+|}
+
+==== Literals ====
+
+The Wikipedia parser supports two types of literals:
+
+* Literals with wiki notations
+* Literals without wiki notations
+
+===== Literals with wiki notations =====
+
+Wiki notation inside these literals is parsed by the Wikipedia parser. Each line of the literal starts with two space characters.
+
+'''Wiki notation example''':
+
+<pre><nowiki>
+  '''public class''' MyClass
+  {
+  }
+</nowiki></pre>
+
+'''Result in HTML''':
+
+  '''public class''' MyClass
+  {
+  }
+
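
The nesting rule in the Lists section (each leading `#` or `*` adds one level, and the two markers can be mixed) can be sketched by comparing the marker prefix of each line with the prefix of the previous line. The code below is an illustration under assumptions, not the parser's actual algorithm: `ListSketch` and `toHtml` are hypothetical names, lines without a list marker are simply skipped, and item text is emitted without escaping.

```java
/**
 * Hypothetical sketch of nested list rendering from '#' and '*' prefixes.
 * Not taken from the be.devijver.wikipedia sources.
 */
final class ListSketch {

    private static String open(char marker)  { return marker == '#' ? "<ol>" : "<ul>"; }
    private static String close(char marker) { return marker == '#' ? "</ol>" : "</ul>"; }

    static String toHtml(String wikitext) {
        StringBuilder out = new StringBuilder();
        String previous = ""; // marker prefix of the previous list line
        for (String line : wikitext.split("\n")) {
            int depth = 0;
            while (depth < line.length()
                    && (line.charAt(depth) == '#' || line.charAt(depth) == '*')) {
                depth++;
            }
            if (depth == 0) {
                continue; // assumption: non-list lines are ignored in this sketch
            }
            String prefix = line.substring(0, depth);
            String text = line.substring(depth).trim();

            // Number of leading markers shared with the previous line.
            int common = 0;
            while (common < previous.length() && common < prefix.length()
                    && previous.charAt(common) == prefix.charAt(common)) {
                common++;
            }
            // Close every level of the previous line that is not shared.
            for (int i = previous.length(); i > common; i--) {
                out.append("</li>").append(close(previous.charAt(i - 1)));
            }
            if (common == prefix.length()) {
                out.append("</li><li>"); // sibling item at an already open level
            } else {
                for (int i = common; i < prefix.length(); i++) {
                    out.append(open(prefix.charAt(i))).append("<li>"); // new nested levels
                }
            }
            out.append(text);
            previous = prefix;
        }
        // Close whatever is still open after the last line.
        for (int i = previous.length(); i > 0; i--) {
            out.append("</li>").append(close(previous.charAt(i - 1)));
        }
        return out.toString();
    }
}
```

For the input `#Fruits`, `#*Apple`, `#*Lemon`, `#Vegetables` (one item per line) this produces `<ol><li>Fruits<ul><li>Apple</li><li>Lemon</li></ul></li><li>Vegetables</li></ol>`, which has the same shape as the combined-list rows in the guide.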

lib/commons-cli-1.1.jar

35.3 KB
Binary file not shown.
Lines changed: 56 additions & 0 deletions
@@ -0,0 +1,56 @@
+package be.devijver.wikipedia;
+
+import java.io.BufferedReader;
+import java.io.FileReader;
+import java.io.FileWriter;
+import java.io.IOException;
+import java.io.OutputStreamWriter;
+import java.io.Reader;
+import java.io.Writer;
+
+import org.apache.commons.cli.CommandLine;
+import org.apache.commons.cli.CommandLineParser;
+import org.apache.commons.cli.Options;
+import org.apache.commons.cli.ParseException;
+import org.apache.commons.cli.PosixParser;
+
+public class MainClass {
+
+    public static void main(String[] args) throws ParseException, IOException {
+        Options options = new Options();
+
+        options.addOption("f", true, "Wikitext file to parse");
+        options.addOption("o", true, "HTML file to write");
+
+        CommandLineParser parser = new PosixParser();
+
+        CommandLine cmdLine = parser.parse(options, args);
+
+        String fileName = cmdLine.getOptionValue("f"); // null if -f is omitted
+
+        Reader reader = new FileReader(fileName);
+
+        BufferedReader buf = new BufferedReader(reader);
+
+        String wikitext = "";
+        String line;
+        while ((line = buf.readLine()) != null) {
+            wikitext += line + "\n"; // line endings are normalized to \n
+        }
+
+        buf.close();
+
+        Writer out;
+        if (cmdLine.getOptionValue("o") != null) {
+            out = new FileWriter(cmdLine.getOptionValue("o"), false); // overwrite, do not append
+        } else {
+            out = new OutputStreamWriter(System.out); // default: write the HTML to stdout
+        }
+
+        try {
+            Parser.toHtml(wikitext, null, out, true);
+        } finally {
+            out.close();
+        }
+    }
+}
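
As committed, the main class passes the `-f` value straight to `FileReader`, so a missing option fails with a `NullPointerException`, and the file is accumulated with repeated string concatenation. A possible follow-up, sketched here as an assumption and not as part of this commit, is to validate the option and read through a `StringBuilder`. The names `WikitextInput`, `readAll` and `requireFileName` are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;

/** Hypothetical helpers; not part of the committed MainClass. */
final class WikitextInput {

    /** Reads all lines from the reader, ending each one with "\n", and closes the reader. */
    static String readAll(Reader source) throws IOException {
        StringBuilder text = new StringBuilder();
        try (BufferedReader buf = new BufferedReader(source)) {
            String line;
            while ((line = buf.readLine()) != null) {
                text.append(line).append('\n');
            }
        }
        return text.toString();
    }

    /** Fails with a readable message when the -f option value is missing. */
    static String requireFileName(String optionValue) {
        if (optionValue == null || optionValue.isEmpty()) {
            throw new IllegalArgumentException("Missing required option: -f <wikitext file>");
        }
        return optionValue;
    }
}
```

In `main`, this would be used roughly as `WikitextInput.readAll(new FileReader(WikitextInput.requireFileName(cmdLine.getOptionValue("f"))))`. The try-with-resources statement requires Java 7 or later.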

src/java/be/devijver/wikipedia/html/HtmlVisitor.java

Lines changed: 8 additions & 4 deletions
@@ -120,9 +120,11 @@ public void endItalics() {
         output.append("</i>");
     }

+    private boolean inLiteral = false;
     public void endLiteral() {
         output.append("</pre>");
         output.flush();
+        inLiteral = false;
     }

     public void endNormalLinkWithCaption() {
@@ -157,7 +159,8 @@ public void endUnorderedListItem() {
     }

     public void handleString(String s) {
-        output.append(characterEncoder.encode(s) + "\n");
+        output.append(characterEncoder.encode(s));
+        if (inLiteral) output.append("\n");
     }

     public void startBold() {
@@ -202,6 +205,7 @@ public void startItalics() {

     public void startLiteral() {
         output.append("<pre>\n");
+        inLiteral = true;
     }

     public void startNormalLinkWithCaption(String s) {
@@ -247,7 +251,7 @@ public void startUnorderedListItem() {
     }

     public void handleNowiki(String nowiki) {
-        output.append(nowiki);
+        output.append(characterEncoder.encode(nowiki));
     }

     public void handleNormalLinkWithoutCaption(String string) {
@@ -260,11 +264,11 @@ public void handleSmartLinkWithoutCaption(String string) {
     }

     public void endPre() {
-        output.append("\n</pre>");
+        output.append("</pre>");
     }

     public void startPre() {
-        output.append("<pre>\n");
+        output.append("<pre>");
     }

     public void endTable() {
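
The combined effect of the `HtmlVisitor` hunks is that a newline is appended after a string only while a literal is open, and that `<nowiki>` content is now encoded before it is written. The sketch below isolates that behavior in a standalone class so it can be run without the rest of the project. `LiteralAwareSink`, `escape` and `result` are hypothetical names, and `escape` is only a stand-in for the project's `characterEncoder`, whose exact behavior is not shown in this diff.

```java
/** Hypothetical, reduced model of the changed HtmlVisitor methods. */
final class LiteralAwareSink {

    private final StringBuilder output = new StringBuilder();
    private boolean inLiteral = false;

    /** Stand-in for characterEncoder.encode(...); the real encoder may differ. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    void startLiteral() {
        output.append("<pre>\n");
        inLiteral = true;
    }

    void endLiteral() {
        output.append("</pre>");
        inLiteral = false;
    }

    /** Appends a newline only inside a literal, as in the patched handleString. */
    void handleString(String s) {
        output.append(escape(s));
        if (inLiteral) {
            output.append("\n");
        }
    }

    /** Encodes nowiki content instead of writing it raw, as in the patched handleNowiki. */
    void handleNowiki(String nowiki) {
        output.append(escape(nowiki));
    }

    String result() {
        return output.toString();
    }
}
```

With this model, text outside a literal no longer picks up a trailing newline, and markup inside `<nowiki>` is shown as text instead of being interpreted by the browser.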
