-
Notifications
You must be signed in to change notification settings - Fork 1
/
CHANGELOG
125 lines (102 loc) · 6.73 KB
/
CHANGELOG
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
110505 (done by Huy)
- New features: Sectlabel
- New sectlabel xml model with more training data (resources/sectLabel/sectLabel.modelXml.v2)
- Major changes: Sectlabel
- Addded Omni (lib/Omni/Omnitable.pm lib/Omni/Omnicell.pm lib/Omni/Omnidd.pm lib/Omni/Omniframe.pm lib/Omni/Traversal.pm)
- ParsCit now uses the reference output from Sectlabel as its input
- Minor changes: Sectlabel and ParsCit
- Cleanup all temporary files in /tmp properly
- Minor changes: bug fixes
- Biblioscript: patch from Tim Brody
110121 (done by Huy)
- Major changes: Omni classes for Omnipage XML process using XML::Twig and XML::Writer instead of regular expression
- Added Omni (lib/Omni/Config.pm lib/Omni/Omnicol.pm lib/Omni/Omnidoc.pm lib/Omni/Omniline.pm lib/Omni/Omnipage.pm lib/Omni/Omnipara.pm lib/Omni/Omnirun.pm lib/Omni/Omniword.pm)
- Use the new Omni classes in Parscit module
- Major changes: improve Parscit performance when there's no reference marker
- Added new crf++ XML model, template and train data (resources/parsCit.split.model crfpp/traindata/parsCit/parsCit.split.template crfpp/traindata/parsCit/parsCit.split.train.data)
- Use the new Parscit model when there's no reference marker
- Minor changes: bug fixes
- Parscit post-process: stripPunctuation function removes the semi-colon in XML special characters, e.g. & Thanks to Qasemizadeh, Behrang for reporting these issues.
- Parscit controller: terminate properly when no reference is found (normBodyText size 1 != posArray size 0)
- Parscit post-process: fix the volume number truncation, e.g. vol 5(1) becomes vol 5; Thanks to Lennart Borgman for reporting these issues
- Minor changes: Parscit
- Add bin/xml2train.pl: extract reference text and XML information from Omnipage and save it into Parscit train file's format
100901 (done by Thang)
- Incorporate BiblioScrip (http://github.com/mromanello/BiblioScript) and BibUtils (http://www.scripps.edu/~cdputnam/software/bibutils/)
100401e (done by Min on 100725)
- Minor changes to paths and to make it work again from wing.nus directory
(moved from forecite, due to restructuring of WING server)
100401d
- Minor changes to documentation and ParsHed library updating.
100401c
- Minor changes for correcting errors with punctuation and XML
entities in reference string parsing. Reported by Cheong Chi Hong
and Mario Lipinski. Fixed by Minh-Thang Luong.
100401b
- Minor changes (bug fixes) to section labeler model.
100401 (done by Thang)
- Major Change: Added SectLabel module (due to Minh-Thang Luong and Thuy Dung Nguyen)
- Added Iconip training data from Cheong Chi Hong
- Updated default model to include Iconip data
- Updated CGI demo to call new SectLabel module as well
- Updated documentation
- Corrected small regexp error in lib/ParsCit/PreProcess.pm
- Corrected small problem with training data in mixed-humanities
- Added SectLabel (bin/sectExtract.pl, bin/sectExtract/, resource/sectExtract, lib/SectExtract)
- Modified bin/citeExtract.pl, lib/ParsCit/PostProcess.pm, lib/ParsHed/PostProcess.pm to combine ParsCit, ParsHed, SectLabel, and standardize XML output
- Added test/ for testing purpose with 12 samples documents and standard-output of citeExtract.pl in 5 modes (citations, header, section, meta, and all) using both txt and XML inputs
- Added SectLabel annotated data doc/sectLabel.tagged.txt and doc/sectLabelXml.tagged.txt (40 documents fully annotated)
- Added in crfpp/traindata CRF++ feature files (sectLabel.train.dataXml, sectLabel.train.data) and templates (sectLabel.templateXml, sectLabel.template)
- Incorporated works by Emma
- Added GenericSect code (bin/sectLabel/genericSectExtract.rb, crfpp/traindata/genericSect.train.data, bin/sectLabel/genericSect/) into SectLabel
- Added GenericSect annotated data doc/genericSect.tagged.txt (211 documents with headers annotated)
- Added in crfpp/traindata CRF++ feature file (genericSect.train.data) and template (genericSect.template)
090625b
- Updated documentation only. no change to executables
- Released on 30 September 2009
090625 (due to Minh-Thang Luong)
- Standardized and improved ParsHed model with line-level classification instead of token-level as previously.
- Add a post-processing module for ParsHed to normalize field data, e.g. authors, email, etc.
- Detailed changes are reflected as follows:
* Added resources/parsHed - all parsHed-related including models (old models in resources/parsHed/archive), template file, and top frequent keyword files.
* Added lib/ParsHed - similar architect as lib/ParsCit to modularizes and faciliates line-level training in ParsHed.
lib/ParsHed/Tr2crf.pm: line-level CRF feature extractor
lib/ParsHed/PostProcess.pm: post-processing of field data
* Added bin/parsHed - all parsHed-related scripts (redo.parsHed.pl, and tr2crffpp_parsHed.pl).
Includes parseXmlHeader.pl and convert2TokenLevel.pl used by redo.parsHed.pl to convert output from line to token-level.
* Updated bin/headExtract.pl - to use the new model lib/ParsHed, as well as old model (with -tokenLevel flag).
* Bug fixes in Citation.pm and Preprocess.pm (see doc/v090625-Artemy-issues.txt). Thanks to Artemy Kolchinsky for reporting these issues.
* Reordered the CHANGELOG into descending chronological order.
090625 (due to Min-Yen KAN)
- Deprecated and unified ParsHedClient.rb into ParsCitClient.rb
- Deprecated and unified ParsHedServer.rb into ParsCitServer.rb
- Added wsdl/forecite.wsdl which describes the ParsCit portion of the services
- Added bin/ParsCitClientWSDL.rb which demonstrates the use of forecite.wsdl
090316:
- Adds ParsHed module, updates to:
* resources/ - parsHed.*.model model files for binaries
* doc/ - svm_headerparse.tagged.txt (from CiteSeer; 935 headers)
* bin/ - headExtract.pl, phOutput2xml.pl, redo.parsCit.pl, redo.parsHed.pl
* crfpp/traindata/ - parsHed.train.data (converted from svm_headerparse.tagged.txt)
Changes doc/index.html, doc/parsCit.cgi to handle parsHed call (not
yet entirely integrated, still separate module).
081201:
- Bug fixes from Scienstein.org team
* Added context positions
* Handle reference patterns such as [1-5]
* Handles context references within same window
* See doc/v081201-sciensteinEmail.txt for detailed notes
- Updated ParsCit.cgi to update for context position output
080917:
- Added new training data from Matteo Romanello
- Fixed Preprocess.pm bug (thanks to Dain Kaplan)
- Upgraded CRF++ model to 0.51 (now bundled with CRF++ source just in case it is no longer available on sourceforge)
- Added bin/redo.pl script to retrain model
080402:
- Added re-tagged data from FLUX-CiM
- Added conlleval.pl evaluation script
- Added output2xml.pl transformation script
- Corrected warning in parseRefString.pl (thanks to Ayeh Bandeh-Ahmadi)
080310:
- First released version to Peter Weiland
- Web services working for wing machines at NUS