We use the “jump and stay” method to extract proper verb-centered constructions (pVCCs): (1) from UD corpora in nine languages (Czech, Dutch, English, Finnish, German, Hungarian, Norwegian, Turkish and Wolof), showing that the method is language-independent; (2) from Hungarian corpora of different genres, dependency-annotated by the e-magyar system, showing that the resulting pVCCs appropriately represent the topic of each corpus.
To run the method, type:
make INPUT=<inputfile>
where <inputfile> is the name of the input file placed in the input/ directory.
The output files, named <inputfile>_<verb>.test.out3.pVCC, are created in the result/ directory.
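Because there is one output file per verb, it can be handy to collect them programmatically. The following is a minimal sketch, not part of the repo: the helper name pvcc_outputs is hypothetical, and it assumes only the filename pattern <inputfile>_<verb>.test.out3.pVCC described above.

```python
from pathlib import Path

# Suffix of the per-verb output files, as described above:
# <inputfile>_<verb>.test.out3.pVCC
SUFFIX = ".test.out3.pVCC"


def pvcc_outputs(result_dir, inputfile):
    """Map each verb to its output file for one input (hypothetical helper)."""
    prefix = inputfile + "_"
    outputs = {}
    for path in sorted(Path(result_dir).iterdir()):
        name = path.name
        # Keep only files produced for this input with the expected suffix
        if name.startswith(prefix) and name.endswith(SUFFIX):
            verb = name[len(prefix):-len(SUFFIX)]
            outputs[verb] = path
    return outputs
```

For example, `pvcc_outputs("result", "mycorpus")` returns a dict from verb to file path for the input mycorpus. Note that if one input filename is a prefix of another followed by an underscore, their outputs cannot be told apart by name alone.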
To compare the newly created output with the saved original version, run:
make result_pVCC_diff
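If you want a similar comparison outside make, a rough Python counterpart might look like this. It is a sketch only: the actual paths compared by the result_pVCC_diff target are defined in the Makefile, so here both directories are plain arguments (an assumption), and the helper name diff_pvcc_dirs is hypothetical.

```python
import difflib
from pathlib import Path


def diff_pvcc_dirs(new_dir, saved_dir, suffix=".pVCC"):
    """Return unified-diff lines between same-named *.pVCC files in two dirs.

    Files present in only one of the directories are reported too.
    An empty list means the two sets of outputs are identical.
    """
    new_files = {p.name: p for p in Path(new_dir).glob("*" + suffix)}
    saved_files = {p.name: p for p in Path(saved_dir).glob("*" + suffix)}
    report = []
    for name in sorted(new_files.keys() | saved_files.keys()):
        if name not in saved_files:
            report.append(f"only in {new_dir}: {name}\n")
            continue
        if name not in new_files:
            report.append(f"only in {saved_dir}: {name}\n")
            continue
        saved_lines = saved_files[name].read_text(encoding="utf-8").splitlines(keepends=True)
        new_lines = new_files[name].read_text(encoding="utf-8").splitlines(keepends=True)
        # "-" lines come from the saved version, "+" lines from the new output
        report.extend(difflib.unified_diff(
            saved_lines, new_lines,
            fromfile=f"saved/{name}", tofile=f"new/{name}"))
    return report
```

Printing `"".join(diff_pvcc_dirs("result", saved_dir))` shows the differences; `saved_dir` stands for wherever the saved version is kept.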
Tested on Debian Linux. (It may work on other operating systems.)
Requirements: python3 and make.
To process the UD corpora, type:
make INPUT=??
as the UD corpus filenames consist of two letters. To process the e-magyar corpora, type:
make INPUT=????
as the e-magyar corpus filenames consist of four letters.
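The two commands work because ? is a shell-style wildcard matching exactly one character, so ?? selects two-character names and ???? selects four-character names. The sketch below illustrates this with Python's fnmatch, assuming the Makefile matches the pattern against filenames in the input directory; the filenames and the helper select_inputs are made up for the example.

```python
import fnmatch


def select_inputs(filenames, pattern):
    """Return the filenames that a shell-style pattern such as '??' matches."""
    # fnmatchcase: '?' matches exactly one character, no case folding
    return sorted(name for name in filenames if fnmatch.fnmatchcase(name, pattern))


# Made-up input filenames, for illustration only
names = ["cs", "en", "hu", "wo", "news", "tech", "mycorpus"]
print(select_inputs(names, "??"))    # ['cs', 'en', 'hu', 'wo']
print(select_inputs(names, "????"))  # ['news', 'tech']
```

A file such as mycorpus matches neither pattern, which is why it has to be named explicitly with INPUT.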
To process your own Hungarian text:
- dependency-analyse the text with e-magyar using the tok,morph,pos,conv-morph,dep,conll modules (see e-magyar for details);
- put the analysed file (e.g. mycorpus) into the input/ directory;
- run:
make INPUT=mycorpus
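The last two steps can be scripted. This is a hedged sketch, not part of the repo: the helper name run_on_own_corpus is hypothetical, it assumes it is called with the repository root as repo_dir, and by default it only returns the make command instead of running it.

```python
import shutil
import subprocess
from pathlib import Path


def run_on_own_corpus(analysed_file, repo_dir=".", dry_run=True):
    """Copy an e-magyar-analysed file into input/ and build it with make.

    Hypothetical helper automating the manual steps listed above.
    Returns the make command; runs it only when dry_run is False.
    """
    analysed = Path(analysed_file)
    input_dir = Path(repo_dir) / "input"
    input_dir.mkdir(exist_ok=True)
    target = input_dir / analysed.name
    # Avoid copying a file onto itself if it is already in input/
    if not target.exists() or analysed.resolve() != target.resolve():
        shutil.copyfile(analysed, target)
    # INPUT is the bare filename, e.g. make INPUT=mycorpus
    cmd = ["make", f"INPUT={analysed.name}"]
    if not dry_run:
        subprocess.run(cmd, cwd=repo_dir, check=True)
    return cmd
```

Calling `run_on_own_corpus("mycorpus", dry_run=False)` from the repository root copies the file and then runs make INPUT=mycorpus.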
process_conll.py preprocesses the UD/e-magyar corpora so that the “jump and stay” implementation, impl.py, can be run on them.
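The actual logic of process_conll.py is not described here, so the following is only a generic illustration of the kind of input involved: it reads CoNLL-U (the format of UD corpora) and, for every VERB token, lists its direct dependents as (relation, lemma) pairs, the raw material verb-centered constructions are built from. The function names read_conllu and verb_frames are hypothetical, and the code does not reproduce what impl.py expects as input.

```python
def read_conllu(lines):
    """Yield sentences as lists of token dicts from CoNLL-U formatted lines."""
    sentence = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:  # a blank line ends a sentence
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):  # sentence-level comments
            continue
        cols = line.split("\t")
        # Skip multiword-token ranges (e.g. 1-2) and empty nodes (e.g. 1.1)
        if "-" in cols[0] or "." in cols[0]:
            continue
        sentence.append({"id": int(cols[0]), "lemma": cols[2], "upos": cols[3],
                         "head": int(cols[6]), "deprel": cols[7]})
    if sentence:
        yield sentence


def verb_frames(sentence):
    """For each VERB token, list its direct dependents as (deprel, lemma)."""
    frames = []
    for tok in sentence:
        if tok["upos"] == "VERB":
            deps = [(d["deprel"], d["lemma"]) for d in sentence if d["head"] == tok["id"]]
            frames.append((tok["lemma"], deps))
    return frames


SAMPLE = (
    "# text = Peter reads a book.\n"
    "1\tPeter\tPeter\tPROPN\t_\t_\t2\tnsubj\t_\t_\n"
    "2\treads\tread\tVERB\t_\t_\t0\troot\t_\t_\n"
    "3\ta\ta\tDET\t_\t_\t4\tdet\t_\t_\n"
    "4\tbook\tbook\tNOUN\t_\t_\t2\tobj\t_\t_\n"
    "5\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_\n"
)
for sent in read_conllu(SAMPLE.splitlines(keepends=True)):
    print(verb_frames(sent))
    # [('read', [('nsubj', 'Peter'), ('obj', 'book'), ('punct', '.')])]
```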
impl.py and Makefile.jands are taken from the “jump and stay” method repo (commit f3ca1ec):
https://github.com/sassbalint/double-cube-jump-and-stay/commit/f3ca1ec
If you want to use this, please cite the paper below and contact me. :) No warranty, sorry.
If you use this, please cite one of the following papers: