@@ -24,10 +24,12 @@ sist2 (Simple incremental search tool)
* Recursive scan inside archive files \*\*
* OCR support with tesseract \*\*\*
* Stats page & disk utilisation visualization
+ * Named-entity recognition (client-side) \*\*\*\*

\* See [format support](#format-support)
\*\* See [Archive files](#archive-files)
\*\*\* See [OCR](#ocr)
+ \*\*\*\* See [Named-Entity Recognition](#NER)

## Getting Started

@@ -56,7 +58,7 @@ services:
entrypoint: python3 /root/sist2-admin/sist2_admin/app.py
```

- Navigate to http://localhost:8080/ to configure sist2-admin.
+ Navigate to http://localhost:8080/ to configure sist2-admin.

### Using the executable file *(Linux/WSL only)*

@@ -67,10 +69,9 @@ Navigate to http://localhost:8080/ to configure sist2-admin.
docker run -d -p 9200:9200 -e "discovery.type=single-node" elasticsearch:7.17.9
```

- 2. Download the [latest sist2 release](https://github.com/simon987/sist2/releases).
- Select the file corresponding to your CPU architecture and mark the binary as executable with `chmod +x`.
- 3. See [usage guide](docs/USAGE.md) for command line usage.
-
+ 2. Download the [latest sist2 release](https://github.com/simon987/sist2/releases).
+ Select the file corresponding to your CPU architecture and mark the binary as executable with `chmod +x`.
+ 3. See [usage guide](docs/USAGE.md) for command line usage.

Example usage:

@@ -124,7 +125,7 @@ The `simon987/sist2` image comes with common languages
(hin, jpn, eng, fra, rus, spa, chi_sim, deu) pre-installed.

You can use the `+` separator to specify multiple languages. The language
- name must be identical to the `*.traineddata` file installed on your system
+ name must be identical to the `*.traineddata` file installed on your system
(use `chi_sim` rather than `chi-sim`).

Examples:
@@ -135,6 +136,29 @@ sist2 scan --ocr-images --ocr-lang eng ~/Images/Screenshots/
sist2 scan --ocr-ebooks --ocr-images --ocr-lang eng+chi_sim ~/Chinese-Bilingual/
```

+ ### NER
+
+ sist2 v3.0.4+ supports named-entity recognition (NER). Simply add a supported repository URL to
+ **Configuration** > **Machine learning options** > **Model repositories**
+ to enable it.
+
+ The text processing is done in your browser, no data is sent to any third-party services.
+ See [simon987/sist2-ner-models](https://raw.githubusercontent.com/simon987/sist2-ner-models/main/repo.json) for more details.
+
+ #### List of available repositories:
+
+ | URL | Maintainer | Purpose |
+ |---------------------------------------------------------------------------------------------------------|-----------------------------------------|---------|
+ | [simon987/sist2-ner-models](https://raw.githubusercontent.com/simon987/sist2-ner-models/main/repo.json) | [simon987](https://github.com/simon987) | General |
+
+
+ <details>
+ <summary>Screenshot</summary>
+
+
+
+ </details>
+
## Build from source

You can compile **sist2** by yourself if you don't want to use the pre-compiled binaries