This is a Go module for processing text. Currently it only supports word splitting.
https://godoc.org/github.com/yusufsyaifudin/txtproc
go get -v github.com/yusufsyaifudin/txtproc
Then use the module name ysf/txtproc as the import path.
Example:
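package main

import (
	"context"
	"fmt"
	"log"

	"ysf/txtproc"
)

func main() {
	text := "This is a words collection."

	// Split the text into words on spaces, tabs, and new lines.
	words, err := txtproc.WordSeparator(context.Background(), text)
	if err != nil {
		// log.Fatal exits the program, so no return is needed.
		log.Fatal(err)
	}

	// Print each word as it appeared in the input.
	for _, word := range words {
		fmt.Println(word.GetOriginalText())
	}
}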
It should print:
This
is
a
words
collection.
Benchmark on a MacBook Pro (16 GB, quad-core Intel Core i5, 2.4 GHz):
go test -bench=.
goos: darwin
goarch: amd64
pkg: ysf/txtproc
BenchmarkWordSeparator_1Word-8 2270744 531 ns/op
BenchmarkWordSeparator_100Words-8 18622 62381 ns/op
BenchmarkWordSeparator_200Words-8 9567 125014 ns/op
PASS
ok ysf/txtproc 5.811s
- Word Splitter (split by space, tab, new line)
- Word Replacer (replace words with your own Replacer function)
- Profanity Filter
- Testing/Mock implementation of interface
- Word-level n-grams: for example, if the text is written as "a s u" it will not be detected, but with n-grams we can detect it through the variants "a s u", "as u", "a su" and "asu"
- Leet speak