forked from chromium/chromium
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathREADME
176 lines (121 loc) · 5.04 KB
/
README
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
About Hunspell
==============
NOTICE: Verison 2 is in the works. For contributing see
[version 2 specification][v2spec] and the folder `src/hunspell2`.
[v2spec]: https://github.com/hunspell/hunspell/wiki/Version-2-Specification
Hunspell is a spell checker and morphological analyzer library and program
designed for languages with rich morphology and complex word compounding or
character encoding. Hunspell interfaces: Ispell-like terminal interface
using Curses library, Ispell pipe interface, C++ class and C functions.
Hunspell's code base comes from the OpenOffice.org MySpell
(http://lingucomponent.openoffice.org/MySpell-3.zip). See README.MYSPELL,
AUTHORS.MYSPELL and license.myspell files.
Hunspell is designed to eventually replace Myspell in OpenOffice.org.
Main features of Hunspell spell checker and morphological analyzer:
- Unicode support (affix rules work only with the first 65535 Unicode
characters)
- Morphological analysis (in custom item and arrangement style) and stemming
- Max. 65535 affix classes and twofold affix stripping (for agglutinative
languages, like Azeri, Basque, Estonian, Finnish, Hungarian, Turkish, etc.)
- Support complex compoundings (for example, Hungarian and German)
- Support language specific features (for example, special casing of
Azeri and Turkish dotted i, or German sharp s)
- Handle conditional affixes, circumfixes, fogemorphemes,
forbidden words, pseudoroots and homonyms.
- Free software. Versions 1.x are licenced under LGPL, GPL, MPL tri-license.
Version 2 is licenced only under GNU LGPL.
Compiling on Unix/Linux and others
==================================
autoreconf -vfi
./configure
make
make install #if neccesary prefix with sudo
ldconfig #not needed on windows, on linux sudo may be needed
For dictionary development, use the --with-warnings option of configure.
For interactive user interface of Hunspell executable, use the --with-ui option.
The developer packages you need to compile Hunspell's interface:
autoconf automake autopoint libtool g++
optional developer packages:
- ncurses (need for --with-ui), eg. libncursesw5 for UTF-8
- readline (for fancy input line editing,
configure parameter: --with-readline)
- locale and gettext (but you can also use the
--with-included-gettext configure parameter)
Compiling on Windows
====================
1. Compiling with Mingw64 and MSYS2
-----------------------------------
Download Msys2, update everything and install the following packages:
pacman -S base-devel mingw-w64-x86_64-toolchain mingw-w64-x86_64-libtool
Open Mingw-w64 Win64 prompt and compile the same way as on Linux, see above.
2. Compiling in Cygwin environment
----------------------------------
Download and install Cygwin environment for Windows with the following
extra packages:
- make
- automake
- autoconf
- gcc-g++ development package
- ncurses, readline (for user interface)
- iconv (character conversion)
###3.1. Cygwin1.dll dependent compiling
Same as on Linux.
Testing
========
Testing Hunspell (see tests in tests/ subdirectory):
make check
or with Valgrind debugger:
make check
VALGRIND=[Valgrind_tool] make check
For example:
make check
VALGRIND=memcheck make check
Documentation
=============
features and dictionary format:
man 5 hunspell
man hunspell
hunspell -h
http://hunspell.github.io/
Usage
=====
The src/tools dictionary contains ten executables after compiling:
- affixcompress: dictionary generation from large (millions of words)
vocabularies
- analyze: example of spell checking, stemming and morphological analysis
- chmorph: example of automatic morphological generation and conversion
- example: example of spell checking and suggestion
- hunspell: main program for spell checking and others (see manual)
- hunzip: decompressor of hzip format
- hzip: compressor of hzip format
- makealias: alias compression (Hunspell only, not back compatible with MySpell)
- munch: dictionary generation from vocabularies (it needs an affix file, too).
- unmunch: list all recognized words of a MySpell dictionary
- wordforms: word generation (Hunspell version of unmunch)
After compiling and installing (see INSTALL) you can
run the Hunspell spell checker (compiled with user interface)
with a Hunspell or Myspell dictionary:
hunspell -d en_US text.txt
or without interface:
hunspell
hunspell -d en_UK -l <text.txt
Dictionaries consist of an affix and dictionary file, see tests/
or http://wiki.services.openoffice.org/wiki/Dictionaries.
Using Hunspell library with GCC
===============================
Including in your program:
#include <hunspell.hxx>
Linking with Hunspell static library:
g++ -lhunspell example.cxx
Dictionaries
------------
Myspell & Hunspell dictionaries:
- http://extensions.libreoffice.org
- http://cgit.freedesktop.org/libreoffice/dictionaries
- http://extensions.openoffice.org
- http://wiki.services.openoffice.org/wiki/Dictionaries
Aspell dictionaries (need some conversion):
- ftp://ftp.gnu.org/gnu/aspell/dict
Conversion steps: see relevant feature request at http://hunspell.github.io/ .
László Németh
nemeth at numbertext org