What is Juman++

A new morphological analyser that considers the semantic plausibility of word sequences by using a recurrent neural network language model (RNNLM). Version 2 is more accurate than the original Juman++ and analyses text much faster (more than 100x).

Installation

System Requirements

OS: Linux or macOS. Windows is not supported yet.
Compiler: C++14 compatible (the requirement will be lowered to C++11 later)
- For example, GCC 5.1+ or Clang 3.4+
- We test on GCC and Clang
CMake: v3.1 or later

Building from a package

Download the package from Releases

$ tar xf jumanpp-<version>.tar.xz # decompress the package
$ cd jumanpp-<version> # move into the directory
$ mkdir bld # make a subdirectory for build
$ cd bld
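$ cmake .. \
  -DCMAKE_BUILD_TYPE=Release \
  -DCMAKE_INSTALL_PREFIX=<prefix>
$ make install -j<parallelism>

Use -DCMAKE_BUILD_TYPE=Release for good performance, and set <prefix> to the directory where Juman++ should be installed. Do not put comments after the trailing backslashes: the shell would then end the command early.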

Building from git

The main differences between the package and this repository are that the package includes a prebuilt model (this repository does not) and omits some development scripts.

$ mkdir cmake-build-dir # in-source builds are not supported
$ cd cmake-build-dir
$ cmake .. -DCMAKE_BUILD_TYPE=Release
$ make -j<parallelism>

Usage

Quick start

% echo "魅力がたっぷりと詰まっている" | jumanpp
魅力 みりょく 魅力 名詞 6 普通名詞 1 * 0 * 0 "代表表記:魅力/みりょく カテゴリ:抽象物"
が が が 助詞 9 格助詞 1 * 0 * 0 NIL
たっぷり たっぷり たっぷり 副詞 8 * 0 * 0 * 0 "自動認識"
と と と 助詞 9 格助詞 1 * 0 * 0 NIL
詰まって つまって 詰まる 動詞 2 * 0 子音動詞ラ行 10 タ系連用テ形 14 "代表表記:詰まる/つまる ドメイン:料理・食事 自他動詞:他:詰める/つめる"
いる いる いる 接尾辞 14 動詞性接尾辞 7 母音動詞 1 基本形 2 "代表表記:いる/いる"
EOS
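Each output line describes one morpheme in the JUMAN format, and EOS marks the end of the sentence. The columns are: surface form, reading, base form, part of speech and its ID, POS subcategory and its ID, conjugation type and its ID, conjugation form and its ID, and finally either NIL or a quoted, space-separated list of semantic features. The Python sketch below parses that format. The `Morpheme` class and the function names are hypothetical helpers, not part of Juman++, and the sketch assumes that only the last column can contain spaces.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass
class Morpheme:
    surface: str
    reading: str
    base_form: str
    pos: str
    pos_id: int
    subpos: str
    subpos_id: int
    conj_type: str
    conj_type_id: int
    conj_form: str
    conj_form_id: int
    features: Dict[str, Optional[str]]


def parse_features(field: str) -> Dict[str, Optional[str]]:
    """Parse the last column: NIL, or a quoted list such as "key:value flag"."""
    if field == "NIL":
        return {}
    result: Dict[str, Optional[str]] = {}
    for item in field.strip('"').split(" "):
        # Split on the first colon only; values may contain further colons.
        key, sep, value = item.partition(":")
        result[key] = value if sep else None  # bare flags get None
    return result


def parse_line(line: str) -> Morpheme:
    # Assumption: the first 11 columns never contain spaces, so split at most 11 times.
    cols = line.rstrip("\n").split(" ", 11)
    if len(cols) != 12:
        raise ValueError(f"expected 12 columns, got {len(cols)}: {line!r}")
    return Morpheme(
        cols[0], cols[1], cols[2],
        cols[3], int(cols[4]),
        cols[5], int(cols[6]),
        cols[7], int(cols[8]),
        cols[9], int(cols[10]),
        parse_features(cols[11]),
    )


def parse_sentence(lines: Iterable[str]) -> List[Morpheme]:
    """Parse morpheme lines up to (not including) the EOS marker."""
    morphemes = []
    for line in lines:
        if line.rstrip("\n") == "EOS":
            break
        morphemes.append(parse_line(line))
    return morphemes
```

For example, parsing the 詰まって line above gives `base_form == "詰まる"` and maps the feature key `自他動詞` to `他:詰める/つめる`.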

Main options

usage: jumanpp [options] 
  -s, --specifics              lattice format output (unsigned int [=5])
  --beam <int>                 set beam width used in analysis (unsigned int [=5])
  -v, --version                print version
  -h, --help                   print this message
  --model <file>               specify a model location

A more complete description of all options will come later.
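The --beam option sets the beam width used in analysis. In beam search, only the best few partial paths are kept at each position of the lattice of candidate words, so a narrower beam is faster but can discard the path that would have scored best overall. The toy Python sketch below illustrates that trade-off in general terms. It is not Juman++'s implementation: the lattice, the `BONUS` table and the `score` function are invented for illustration, whereas the real analyser scores candidates with its trained models, including the RNNLM.

```python
import heapq
from typing import Callable, Dict, List, Tuple

Path = Tuple[str, ...]

TEXT = "外国人参政権"
# Toy lattice: start offset -> candidate words that begin at that offset.
LATTICE: Dict[int, List[str]] = {
    0: ["外国", "外国人"],
    2: ["人", "人参"],
    3: ["参政", "参政権"],
    4: ["政権"],
    5: ["権"],
}
# Invented bigram bonuses standing in for a real model's plausibility scores.
BONUS: Dict[Tuple[str, str], float] = {
    ("外国", "人"): 1.5,
    ("外国人", "参政権"): 2.0,
    ("外国", "人参"): -1.0,
}


def score(path: Path, word: str) -> float:
    prev = path[-1] if path else "<s>"
    # Every word costs 1.0, so fewer and more plausible words are preferred.
    return BONUS.get((prev, word), 0.0) - 1.0


def beam_search(text: str, lattice: Dict[int, List[str]],
                score_fn: Callable[[Path, str], float], beam: int = 5) -> Path:
    # partial[i] collects (total score, path) for paths that end at offset i.
    partial: Dict[int, List[Tuple[float, Path]]] = {0: [(0.0, ())]}
    for start in range(len(text)):
        # Every path ending at `start` is known by now; keep only the best `beam`.
        frontier = heapq.nlargest(beam, partial.pop(start, []))
        for total, path in frontier:
            for word in lattice.get(start, []):
                if text.startswith(word, start):
                    end = start + len(word)
                    partial.setdefault(end, []).append(
                        (total + score_fn(path, word), path + (word,)))
    finals = partial.get(len(text), [])
    if not finals:
        raise ValueError("no path in the lattice covers the whole text")
    return max(finals)[1]
```

With `beam=5` the search returns `('外国人', '参政権')`. With `beam=1` only the locally better prefix 外国/人 survives at offset 3, so the search returns `('外国', '人', '参政権')` instead.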

Input

Juman++ accepts only UTF-8 encoded text as input. Lines beginning with # are interpreted as comments.
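Text stored in another encoding has to be converted to UTF-8 before it is passed to Juman++, and a line of real text that happens to start with # will be treated as a comment instead of being analysed. The Python sketch below covers both checks. The function names are illustrative, not part of Juman++, and EUC-JP is only an example source encoding.

```python
from typing import List


def to_utf8(raw: bytes, source_encoding: str) -> bytes:
    # Decoding is strict: invalid bytes raise UnicodeDecodeError instead of
    # silently producing garbled input for the analyser.
    return raw.decode(source_encoding).encode("utf-8")


def comment_line_numbers(text: str) -> List[int]:
    # 1-based numbers of lines starting with '#', which would be
    # interpreted as comments rather than analysed.
    return [n for n, line in enumerate(text.splitlines(), start=1)
            if line.startswith("#")]
```

For example, `to_utf8(data, "euc_jp")` re-encodes EUC-JP bytes, and the result can then be piped to jumanpp.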

Other

DEMO

You can play around with our web demo, which displays a subset of the whole lattice. The demo still uses v1, but it will be updated to v2 soon.

Performance Notes

To get the best performance, you need to build with extended instruction sets enabled. If you plan to run Juman++ only on the machine you build it on, specify -DCMAKE_CXX_FLAGS="-march=native". The resulting binary may not run on CPUs that lack the same extensions.

Works best on Intel Haswell and newer processors (because of FMA and BMI instruction set extensions).
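On Linux, you can check whether your CPU advertises these extensions by reading the flags line of /proc/cpuinfo. The Python sketch below is an illustrative helper, not a Juman++ tool, and it checks only the fma, bmi1 and bmi2 flags.

```python
from typing import List, Sequence, Set

WANTED = ("fma", "bmi1", "bmi2")


def cpu_flags(cpuinfo: str) -> Set[str]:
    """Return the feature flags on the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_extensions(cpuinfo: str, wanted: Sequence[str] = WANTED) -> List[str]:
    """List the wanted extensions that the CPU does not report."""
    flags = cpu_flags(cpuinfo)
    return [name for name in wanted if name not in flags]
```

Passing the contents of /proc/cpuinfo to `missing_extensions` returns an empty list when all three extensions are reported.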

Model

See: Hajime Morita, Daisuke Kawahara, Sadao Kurohashi. Morphological Analysis for Unsegmented Languages using Recurrent Neural Network Language Model. EMNLP 2015.

Authors

Arseny Tolmachev <arseny at kotonoha.ws>
Hajime Morita <hmorita at i.kyoto-u.ac.jp>
Daisuke Kawahara <dk at i.kyoto-u.ac.jp>
Sadao Kurohashi <kuro at i.kyoto-u.ac.jp>

Acknowledgement

The list of all libraries used by JUMAN++ is here.

Notice

This is a branch for the Juman++ rewrite. The original version lives in the legacy branch.
