- Bugfixes
- New snowball stemmer
- A vector of strings is equivalent to a corpus for DTM, inverse index, lexicon
- Lexicon, inverse index creation functions
- ngram complexity specification support
- Sparse/Frequent terms stripping fixes
- Tokenization fixes
- Stemmer fixes
- Bugfix release
- Bugfix release
- Improved LSA embedding performance
- AbstractMetadata support
- All forms of DTVs are sparse
- DTMs, COOMs are immutable
- Performance improvements
- DTM document vectors are columns
- Tokenizer can be specified in some methods
- Regex-based DTVs
- Additional documentation
- svd fallback in LSA
- COOM performance improvement
- Bugfixes
- Added `:count` option to LSA, RP models
- No projection hack for RP models
- More documentation
- Added Co-occurrence matrix
- Refined LSA, RP models
- More embedding methods
- Small bugfixes, improvements
- Added sparse random projections
- Bugfixes
- Preprocessing improvements
- Additional documentation
- LSA models can be saved/loaded
- Small additions
- Improved LSA
- Expanded online documentation
- Improved latent semantic analysis (LSA)
- Online documentation with Documenter.jl
- Typing improvements
- Added support for Vector element type in DTV iteration
- Made `AbstractDocument` a parametric type
- Extended test coverage
- Bugfixes
- Fixed many bugs and inconsistencies
- Added bm25 ranking, tweaked tf-idf
- Extended tokenization and stemming methods
- Extended pre-processing API
- Extended document metadata
- Extended test coverage
- Simplified API, i.e. removed sentiment analysis and many dependencies
- Initial version, very similar to TextAnalysis, commit:8517fe2
- Not released