Commit

readme
Annette Rios committed Jun 28, 2017
1 parent 540cd4c commit be84d1e
Showing 2 changed files with 139 additions and 51 deletions.
68 changes: 17 additions & 51 deletions FreeLingModules/example.cfg
100644 → 100755
@@ -43,81 +43,47 @@ TraceModule=0x0000
## formats are a choice of the main program using the library, as well
## as the responsibility of calling only the required modules.
## Valid input/output formats are: plain, token, splitted, morfo, tagged, parsed
InputFormat=plain
OutputFormat=tagged
#InputFormat=plain
InputFormat=text
OutputFormat=crf
InputLevel=text
OutputLevel=morfo

# consider each newline as a sentence end
AlwaysFlush=yes

#### Tokenizer options
TokenizerFile=$HOME/squoia/google_squoia/FreeLingModules/grammar_es/tokenizer.dat
TokenizerFile=your-path-to-squoia/FreeLingModules/grammar_es/tokenizer.dat

#### Splitter options
SplitterFile=$HOME/squoia/google_squoia/FreeLingModules/grammar_es/splitter.dat
SplitterFile=your-path-to-squoia/FreeLingModules/grammar_es/splitter.dat

#### Morfo options
AffixAnalysis=yes
MultiwordsDetection=yes
NumbersDetection=yes
PunctuationDetection=yes
DatesDetection=yes
QuantitiesDetection=no
QuantitiesDetection=yes
DictionarySearch=yes
ProbabilityAssignment=yes
OrthographicCorrection=no
#OrthographicCorrection=no
DecimalPoint=,
ThousandPoint=.
LocutionsFile=$HOME/squoia/google_squoia/FreeLingModules/grammar_es/locucions_laura.dat
QuantitiesFile=$HOME/squoia/google_squoia/FreeLingModules/grammar_es/quantities.dat
AffixFile=$HOME/squoia/google_squoia/FreeLingModules/grammar_es/afixos_desr.dat
#ProbabilityFile=$HOME/squoia/google_squoia/FreeLingModules/grammar_es/probabilitats.dat_old
ProbabilityFile=$HOME/squoia/google_squoia/FreeLingModules/grammar_es/probabilitats.dat
#ProbabilityFile=$FREELINGSHARE/es/probabilitats.dat
DictionaryFile=$HOME/squoia/google_squoia/FreeLingModules/grammar_es/dicc_desr.src
PunctuationFile=/opt/share/freeling/common/punct.dat
LocutionsFile=your-path-to-squoia/FreeLingModules/grammar_es/locucions_squoia.dat
QuantitiesFile=your-path-to-squoia/FreeLingModules/grammar_es/quantities.dat
AffixFile=your-path-to-squoia/FreeLingModules/grammar_es/afixos_desr.dat
ProbabilityFile=your-path-to-squoia/FreeLingModules/grammar_es/probabilitats.dat
DictionaryFile=your-path-to-squoia/FreeLingModules/grammar_es/dicc_squoia.src
PunctuationFile=your-path-to-squoia/FreeLingModules/common/punct.dat
ProbabilityThreshold=0.001

# NER options
NERecognition=yes
NPDataFile=$HOME/squoia/google_squoia/FreeLingModules/grammar_es/np_desr.dat
## comment line above and uncomment that below, if you want
## a better NE recognizer (higher accuracy, lower speed)
#NPDataFile=$HOME/squoia/google_squoia/FreeLingModules/grammar_es/ner/ner-ab.dat

#Spelling Corrector config file
CorrectorFile=$HOME/squoia/google_squoia/FreeLingModules/grammar_es/corrector/corrector.dat

## Phonetic encoding of words.
Phonetics=no
PhoneticsFile=$HOME/squoia/google_squoia/FreeLingModules/grammar_es/phonetics.dat
NPDataFile=your-path-to-squoia/FreeLingModules/grammar_es/np_desr.dat

## NEC options
NEClassification=yes
NECFile=$FREELINGSHARE/es/nerc/nec/nec-ab-poor1.dat
#NECFile=$HOME/squoia/google_squoia/FreeLingModules/grammar_es/nec/nec-svm.dat

## Sense annotation options (none,all,mfs,ukb)
SenseAnnotation=none
SenseConfigFile=$HOME/squoia/google_squoia/FreeLingModules/grammar_es/senses.dat
UKBConfigFile=$HOME/squoia/google_squoia/FreeLingModules/grammar_es/ukb.dat

#### Tagger options
Tagger=relax
TaggerHMMFile=$HOME/squoia/google_squoia/FreeLingModules/grammar_es/tagger.dat
TaggerRelaxFile=$HOME/squoia/google_squoia/FreeLingModules/grammar_es/constr_gram_nora.dat
TaggerRelaxMaxIter=500
TaggerRelaxScaleFactor=670.0
TaggerRelaxEpsilon=0.001
TaggerRetokenize=yes
TaggerForceSelect=tagger

##TODO: don't need this, but has to be specified, otherwise analyzer crashes!!
#### Parser options
GrammarFile=$FREELINGSHARE/es/chunker/grammar-chunk.dat
NECFile=path-to-your-freeling-installation/share/freeling/es/nerc/nec/nec-ab-rich.dat

#### Dependence Parser options
DepTxalaFile=$FREELINGSHARE/es/dep/dependences.dat

#### Coreference Solver options
CoreferenceResolution=no
CorefFile=$FREELINGSHARE/es/coref/coref.dat
122 changes: 122 additions & 0 deletions Readme.md
@@ -0,0 +1,122 @@
### Installation

Note: this code is not maintained.

## FreeLing

`git clone https://github.com/TALP-UPC/freeling`

Install FreeLing from source (the headers are needed); see: https://talp-upc.gitbooks.io/freeling-user-manual/content/installation.html

compile the FreeLing analyzer with CRF output format for Wapiti:
```
export FREELING_INSTALLATION_DIR=path-to-your-freeling-installation
export SQUOIA_DIR=your-path-to-squoia
g++ -c -o output_crf.o output_crf.cc -I$FREELING_INSTALLATION_DIR/include -I$SQUOIA_DIR/FreeLingModules/config_squoia
g++ -c -o analyzer_client.o analyzer_client.cc -I$FREELING_INSTALLATION_DIR/include -I$SQUOIA_DIR/FreeLingModules/config_squoia
g++ -std=gnu++11 -c -o server_squoia.o server_squoia.cc -I$FREELING_INSTALLATION_DIR/include -I$SQUOIA_DIR/FreeLingModules/config_squoia
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$FREELING_INSTALLATION_DIR/lib
g++ -O3 -Wall -o server_squoia server_squoia.o output_crf.o -L$FREELING_INSTALLATION_DIR/lib -lfreeling -lboost_program_options -lboost_system -lboost_filesystem -lpthread
```
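The compile commands above only work if both variables point at real directories. A minimal sketch of a pre-flight check, assuming the layout implied by the `-I` and `-L` flags (`include/` and `lib/` under the FreeLing prefix, `FreeLingModules/config_squoia` under the squoia checkout). `check_build_dirs` is a hypothetical helper, not part of FreeLing or squoia:

```shell
# Hypothetical helper (not shipped with FreeLing or squoia):
# verify the directories used by the -I and -L flags before compiling.
# $1: FreeLing installation prefix, $2: squoia checkout
check_build_dirs() {
    if [ ! -d "$1/include" ]; then
        echo "missing headers: $1/include" >&2
        return 1
    fi
    if [ ! -d "$1/lib" ]; then
        echo "missing libraries: $1/lib" >&2
        return 1
    fi
    if [ ! -d "$2/FreeLingModules/config_squoia" ]; then
        echo "missing directory: $2/FreeLingModules/config_squoia" >&2
        return 1
    fi
    return 0
}
```

It could be called as `check_build_dirs "$FREELING_INSTALLATION_DIR" "$SQUOIA_DIR"` before the first `g++` line; a non-zero exit status means one of the paths needs fixing.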

named entity classification:
```
g++ -std=gnu++11 -o nec nec.cc -I$FREELING_INSTALLATION_DIR/include -I$SQUOIA_DIR/FreeLingModules/config_squoia -L$FREELING_INSTALLATION_DIR/lib -lfreeling -lboost_program_options -lboost_system -lboost_filesystem -lpthread
```

analyzer_client:

```
g++ -O3 -Wall -o analyzer_client analyzer_client.o -L$FREELING_INSTALLATION_DIR/local/lib -lfreeling
export FREELINGSHARE=$FREELING_INSTALLATION_DIR/share/freeling
```

once compiled, you can test the server:
```
./server_squoia -f $SQUOIA_DIR/FreeLingModules/es_squoia.cfg --server --port=$PORT 2> logtagging &
echo "eso es una prueba" |./analyzer_client $PORT
```

Link server_squoia, analyzer_client and nec into $SQUOIA_DIR/bin (optional; if you do not link them, change the paths in es.cfg):

```
cd $SQUOIA_DIR/bin
ln -s ../FreeLingModules/server_squoia .
ln -s ../FreeLingModules/analyzer_client .
ln -s ../FreeLingModules/nec .
```

For system-wide use, either link the client and server into a directory on your $PATH (e.g. `/usr/local/bin`), or add their location to $PATH.
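
For the second option, one possible sketch is a small function that appends a directory to a PATH value only if it is not already there. `path_with` is a hypothetical name, and the commented usage assumes `SQUOIA_DIR` is set as above:

```shell
# Hypothetical helper: print a PATH value with a directory appended,
# unless that directory is already one of its entries.
# $1: directory to add, $2: current PATH value
path_with() {
    case ":$2:" in
        *":$1:"*) printf '%s\n' "$2" ;;
        *)        printf '%s\n' "$2:$1" ;;
    esac
}

# Possible use in a shell startup file (assumes SQUOIA_DIR is set):
# PATH=$(path_with "$SQUOIA_DIR/bin" "$PATH"); export PATH
```

Checking for an existing entry keeps $PATH from growing each time the startup file is sourced.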


## Wapiti

https://wapiti.limsi.fr/

follow installation instructions, then adapt path to wapiti in es.cfg


## MaltParser

http://www.maltparser.org/download.html

follow installation instructions, see http://www.maltparser.org/install.html

set maltPath in es.cfg to your installation of maltparser

compile the server-client modules ($MALTPARSER_DIR = path to your MaltParser installation):

```
cd $SQUOIA_DIR/maltparser_tools/src
javac -cp $MALTPARSER_DIR/maltparser-1.8/maltparser-1.8.jar MPClient.java
javac -cp $MALTPARSER_DIR/maltparser-1.8/maltparser-1.8.jar MaltParserServer.java
```

move binaries to ../bin:
`mv MaltParserServer.class MPClient.class ../bin/`


## libsvm
https://www.csie.ntu.edu.tw/~cjlin/libsvm

## foma
https://bitbucket.org/mhulden/foma

## kenlm
https://kheafield.com/code/kenlm

compile squoia module for language model:

```
cd $SQUOIA_DIR/MT_systems/squoia/esqu
g++ -o outputSentences outputSentences.cpp -Ipath-to-your-kenlm/ -DKENLM_MAX_ORDER=6 -Lpath-to-your-kenlm/lib/ -lkenlm path-to-your-foma/libfoma.a -lz -lboost_regex -pthread -lboost_thread -lboost_system
```

## Perl modules required:
```
Getopt::Long
Storable
File::Basename
File::Spec::Functions
XML::LibXML
List::MoreUtils
Algorithm::SVM
```
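
To see which of these modules still need to be installed, something like the following could be used. It relies on `perl -M<module> -e 1` exiting with a non-zero status when a module cannot be loaded; `missing_perl_modules` is a hypothetical helper name:

```shell
# Hypothetical helper: print the name of every module that fails to load.
missing_perl_modules() {
    for mod in "$@"; do
        # perl -M<module> -e 1 loads the module and runs an empty program
        if ! perl -M"$mod" -e 1 >/dev/null 2>&1; then
            printf '%s\n' "$mod"
        fi
    done
}

# Check the modules listed above; no output means all of them load.
missing_perl_modules Getopt::Long Storable File::Basename \
    File::Spec::Functions XML::LibXML List::MoreUtils Algorithm::SVM
```

If `perl` itself is not on $PATH, every module is reported as missing.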

adapt paths in $SQUOIA_DIR/FreeLingModules/example.cfg
adapt paths in $SQUOIA_DIR/MT_systems/esqu/es_qu.cfg
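
For example.cfg, the paths can be adapted by hand or with a substitution along these lines. This is a sketch only: it assumes the two placeholders used in example.cfg (`your-path-to-squoia` and `path-to-your-freeling-installation`) and paths that contain neither `|` nor `&`. `fill_cfg_paths` is a hypothetical helper:

```shell
# Hypothetical helper: replace the two placeholders used in example.cfg.
# Assumes the replacement paths contain no '|' or '&' characters.
# $1: config file, $2: squoia checkout, $3: FreeLing installation prefix
fill_cfg_paths() {
    sed -e "s|your-path-to-squoia|$2|g" \
        -e "s|path-to-your-freeling-installation|$3|g" \
        "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# Possible use (assumes both variables are set as in the FreeLing section):
# fill_cfg_paths "$SQUOIA_DIR/FreeLingModules/example.cfg" \
#     "$SQUOIA_DIR" "$FREELING_INSTALLATION_DIR"
```

Writing to a temporary file and moving it into place avoids depending on `sed -i`, whose syntax differs between implementations.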

use translate.pm to process text:

```
cd $SQUOIA_DIR/MT_systems
./translate.pm -f infile -i input-format -o output-format
```

use
```
./translate.pm -h
```
to get a list of options.

