A toolset for Amharic language preprocessing. It includes an Amharic stemmer, transliterator, stopword remover, lexical analyzer, corpus indexer, and term weighter.
Updated May 27, 2023 · TypeScript
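The sketch below is a rough illustration of what the stopword remover and term weighter stages involve. It is not the toolset's code. It rests on three assumptions:

* The tokenizer splits on whitespace and on Ethiopic punctuation (U+1361 to U+1368).
* The two-entry `STOPWORDS` list is a hypothetical placeholder.
* TF-IDF is the weighting scheme.

```typescript
// Minimal sketch (not the toolset's API): stopword removal + TF-IDF term weighting.
// Assumption: tokens are separated by whitespace or Ethiopic punctuation
// (U+1361 wordspace, U+1362 full stop, U+1363 comma, ... U+1368 paragraph separator).
// Assumption: STOPWORDS is a hypothetical placeholder list, not a curated Amharic list.
const STOPWORDS: ReadonlySet<string> = new Set(["እና", "ነው"]);

// Split raw text into non-empty tokens.
function tokenize(text: string): string[] {
  return text.split(/[\s\u1361-\u1368]+/u).filter((t) => t.length > 0);
}

// Drop tokens that appear in the stopword list.
function removeStopwords(tokens: string[]): string[] {
  return tokens.filter((t) => !STOPWORDS.has(t));
}

// Weight each term per document as (count / docLength) * ln(N / df),
// where N is the number of documents and df the number of documents containing the term.
function tfidf(docs: string[][]): Map<string, number>[] {
  const n = docs.length;
  const df = new Map<string, number>();
  for (const doc of docs) {
    for (const term of new Set(doc)) {
      df.set(term, (df.get(term) ?? 0) + 1);
    }
  }
  return docs.map((doc) => {
    const counts = new Map<string, number>();
    for (const term of doc) counts.set(term, (counts.get(term) ?? 0) + 1);
    const weights = new Map<string, number>();
    for (const [term, c] of counts) {
      weights.set(term, (c / doc.length) * Math.log(n / df.get(term)!));
    }
    return weights;
  });
}
```

With this unsmoothed idf, a term that occurs in every document gets weight 0. Many systems use a smoothed idf variant to avoid that.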
BlackLab Frontend, a feature-rich corpus search interface for BlackLab.
Online parallel text alignment tool.
Quickly check how something is translated into a conlang (constructed language).
Assisted close reading based on doc2vec, with support for abstract concept-based search and context-based search.
GenAi Tokenizer is an interactive tokenizer playground for exploring how text breaks into tokens, how unique token IDs are assigned, and how decoding works, all powered by a custom tokenizer.
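The toy TypeScript tokenizer below makes the encode/decode round trip concrete. It assigns each new piece of text the next unused integer ID. This is a hypothetical illustration, not GenAi Tokenizer's custom tokenizer, which may well be subword-based (for example BPE). The class name `ToyTokenizer` and its splitting rule are assumptions.

```typescript
// Hypothetical toy tokenizer: word / whitespace / punctuation pieces, IDs assigned on first sight.
class ToyTokenizer {
  private readonly tokenToId = new Map<string, number>();
  private readonly idToToken: string[] = [];

  // Every character is a word char, whitespace, or neither, so the pieces cover the
  // whole input with no gaps and decoding is lossless.
  private split(text: string): string[] {
    return text.match(/\w+|\s+|[^\w\s]/gu) ?? [];
  }

  // Map each piece to its ID, growing the vocabulary for unseen pieces.
  encode(text: string): number[] {
    return this.split(text).map((piece) => {
      let id = this.tokenToId.get(piece);
      if (id === undefined) {
        id = this.idToToken.length;
        this.tokenToId.set(piece, id);
        this.idToToken.push(piece);
      }
      return id;
    });
  }

  // Look each ID up in the vocabulary and concatenate the pieces.
  decode(ids: number[]): string {
    return ids
      .map((id) => {
        const tok = this.idToToken[id];
        if (tok === undefined) throw new RangeError(`unknown token id ${id}`);
        return tok;
      })
      .join("");
  }
}
```

Because whitespace runs are kept as pieces, `decode(encode(s))` on the same instance returns `s` unchanged. Real subword tokenizers give up this simplicity in exchange for a fixed-size vocabulary.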
The user interface for the Corpus & Repository of Writing, built in Angular.
🤖 Build a document-based AI assistant that uses a RAG architecture for accurate technical responses and combines local processing with cloud-based services.
🧠 Explore tokenization with GenAi-Tokenizer, a user-friendly tool for decoding text, learning vocabulary, and visualizing token types effortlessly.