Change the project.
gy910210 committed May 23, 2015
1 parent e0062dd commit b70e580
Showing 22 changed files with 509 additions and 563 deletions.
28 changes: 13 additions & 15 deletions README.md
@@ -13,35 +13,33 @@ In the project "SpellCorrectorBuild" root directory, use Ant command to build th

+ ## How to use
2. Prepare the dictionary file "words.txt", the original spelling data file "final.out", and the parameter file "parameter", and put "SpellCorrectorBuild.jar" in the same directory.
3. Create a "/tmp/" directory to hold the intermediate files: "train_data.txt", "corpus_data.txt", "error_data.txt", "count_data.txt".
4. Run the project with the command "java -jar SpellCorrectorBuild.jar".
5. The run produces "test_result.txt", which stores the test information, and "channel_data.txt", which stores the noisy channel model parameters.
3. Run the project with the command "java -jar SpellCorrectorBuild.jar".
4. The run produces "test_result.txt", which stores the test information, and "channel_data.txt", which stores the noisy channel model parameters.

+ ## File descriptions
- ### "parameter"
A JSON-format file:

```
{
"tmp_dir": "tmp/",
"channel_file": "channel_data.txt",
"input_file": "final.out",
"words_file": "words.txt",
"test_file": "tmp/train_data.txt",
"equal_prob": 1.0,
"smooth_prob": 0.00000005,
"model_file": "channel_data.txt",
"train_file": "final.out",
"dic_file": "words.txt",
"equal_prob": 0.9,
"smooth_prob": -1,
"most_dis": 2,
"context_num": 2,
"top_num": 3
"top_num": 3,
"train": "yes"
}
```

- ### "channel_data.txt"
Each line of the file has the format:
(word_slice \t key_slice \t log_probability)
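
As an illustration of the three-column layout described above, here is a minimal Java sketch that parses one line of the model file. The class name `ChannelEntry`, its field names, and the decision to allow empty slices are assumptions made for this example; they are not taken from the project's source.

```java
/**
 * Hypothetical holder for one row of the channel model file:
 * word_slice \t key_slice \t log_probability
 */
final class ChannelEntry {
    final String wordSlice;
    final String keySlice;
    final double logProbability;

    ChannelEntry(String wordSlice, String keySlice, double logProbability) {
        this.wordSlice = wordSlice;
        this.keySlice = keySlice;
        this.logProbability = logProbability;
    }

    /**
     * Parses a single tab-separated line.
     * Throws IllegalArgumentException if the line does not have exactly
     * three fields or if the third field is not a valid number.
     */
    static ChannelEntry parse(String line) {
        // A limit of -1 keeps trailing empty fields, so an empty slice
        // is preserved instead of being silently dropped (assumption:
        // empty slices are legal in the file).
        String[] fields = line.split("\t", -1);
        if (fields.length != 3) {
            throw new IllegalArgumentException(
                "Expected 3 tab-separated fields, got " + fields.length);
        }
        // NumberFormatException is a subclass of IllegalArgumentException.
        double logProb = Double.parseDouble(fields[2]);
        return new ChannelEntry(fields[0], fields[1], logProb);
    }
}
```

Calling `ChannelEntry.parse` on each line read from the file would give an in-memory list of entries; how the project itself loads the file may differ.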

+ ## Tips
+ If the original spelling data has not changed, you can reuse the intermediate files in the tmp directory to speed up training.
+ The noisy channel model parameter file "channel_data.txt" can be reused across runs.

+ ## Reference
The main method is based on the noisy channel model and an improved method from Microsoft Research:
http://ucrel.lancs.ac.uk/acl/P/P00/P00-1037.pdf

+ The main method is based on the noisy channel model and an improved method from Microsoft Research: http://ucrel.lancs.ac.uk/acl/P/P00/P00-1037.pdf
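
For readers unfamiliar with the noisy channel model named in the reference, the general idea is to pick the candidate word that maximizes the sum of a log prior and a log channel probability for the typed string. The sketch below shows only that generic ranking step. The class `NoisyChannelRanker`, the method `best`, and the two input maps are hypothetical names for this example and do not describe how this project computes its scores.

```java
import java.util.Map;

/** Generic noisy-channel ranking sketch (not the project's implementation). */
final class NoisyChannelRanker {

    /**
     * Returns the candidate with the highest
     * logPrior(candidate) + logChannel(candidate) score,
     * or null if no candidate appears in both maps.
     *
     * logPrior:   candidate word -> log P(word)
     * logChannel: candidate word -> log P(typed string | word)
     */
    static String best(Map<String, Double> logPrior,
                       Map<String, Double> logChannel) {
        String bestWord = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> entry : logPrior.entrySet()) {
            Double channel = logChannel.get(entry.getKey());
            if (channel == null) {
                continue; // no channel estimate for this candidate
            }
            double score = entry.getValue() + channel;
            if (score > bestScore) {
                bestScore = score;
                bestWord = entry.getKey();
            }
        }
        return bestWord;
    }
}
```

Working in log space turns the product of the two probabilities into a sum, which avoids underflow when the probabilities are small.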
2 changes: 1 addition & 1 deletion build.xml
@@ -5,7 +5,7 @@
<property name="classes.dir" value="bin" />
<property name="output.dir" value="out" />
<property name="jarname" value="SpellCorrector.jar" />
<property name="mainclass" value="cootek.spell.main.Run" />
<property name="mainclass" value="main.Run" />
<!-- Path to third-party JAR files -->
<path id="lib-classpath">
<fileset dir="${lib.dir}">
11 changes: 11 additions & 0 deletions parameter
@@ -0,0 +1,11 @@
{
"model_file": "channel_data.txt",
"train_file": "final.out",
"dic_file": "words.txt",
"equal_prob": 0.9,
"smooth_prob": -1,
"most_dis": 2,
"context_num": 2,
"top_num": 3,
"train": "yes"
}
95 changes: 0 additions & 95 deletions src/cootek/spell/main/PredictModel.java

This file was deleted.

52 changes: 0 additions & 52 deletions src/cootek/spell/main/TrainModel.java

This file was deleted.

90 changes: 0 additions & 90 deletions src/cootek/spell/model/CountModel.java

This file was deleted.

76 changes: 0 additions & 76 deletions src/cootek/spell/model/ErrorModel.java

This file was deleted.
