Linguistic search for large annotated text corpora, based on Apache Lucene
Updated Dec 12, 2025 - Java
Reading the data from OPIEC - an Open Information Extraction corpus
Text conversion tool (from e.g. Word, HTML, or txt) to the corpus formats TEI or FoLiA
A Naive Bayes classifier that uses the Bernoulli and multinomial models to classify text documents as ham or spam.
📖 Probabilistic model and Deep Learning based Korean NLP Engine
A text management tool for linguistic purposes...
This repository contains program source code of a converter that can transform Kiel Corpus files into standardised TEI-XML files.
QuoVadis: annotation of Entities and Relations, initial Ph.D. work
Uses Markov chains and a corpus of text to respond to conversation
Code for my BSc thesis: Cleaning of Parallel Texts for Machine Translation
A search engine for classical Chinese poetry
⛏️📄 Script to scrape all files linked on a textfiles.com page