Corpus Query Language Engine

Presentation

This repository hosts the code for a simple CQL processor. CQL (Corpus Query Language) is a language for linguistic queries over large corpora.

Pip install

pip3 install corpus-query-language

Usage

Two main functions are implemented:

match: checks whether a pattern occurs in the corpus, stopping at the first match. Returns a boolean.
findall: finds the positions of all matches. Returns a list of tuples holding the start and end position of each match.
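To make the difference between the two functions concrete, here is a minimal pure-Python sketch. It is not the library's implementation: it assumes a query has already been compiled into a list of per-token predicates (a hypothetical representation), and it records a span as the indices of the first and last matched token, which may not be the library's end-position convention. The standalone functions `match` and `findall` only mirror the names of the `CQLEngine` methods.

```python
from typing import Callable, Dict, List, Optional, Tuple

Token = Dict[str, str]
# Hypothetical compiled query: one predicate per token position
Pattern = List[Callable[[Token], bool]]


def _end_of_match(corpus: List[Token], start: int, pattern: Pattern) -> Optional[int]:
    """Return the index of the last matched token if pattern matches at start, else None."""
    if not pattern or start + len(pattern) > len(corpus):
        return None
    for offset, predicate in enumerate(pattern):
        if not predicate(corpus[start + offset]):
            return None
    return start + len(pattern) - 1


def findall(corpus: List[Token], pattern: Pattern) -> List[Tuple[int, int]]:
    """Scan every start position and collect all (start, end) spans."""
    spans = []
    for start in range(len(corpus)):
        end = _end_of_match(corpus, start, pattern)
        if end is not None:
            spans.append((start, end))
    return spans


def match(corpus: List[Token], pattern: Pattern) -> bool:
    """Same scan, but any() short-circuits on the first successful start position."""
    return any(_end_of_match(corpus, start, pattern) is not None for start in range(len(corpus)))
```

The shared helper shows why `match` can be cheaper than `findall`: both scan start positions in order, but `match` returns as soon as one succeeds.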

The corpus should take the form of a list of dictionaries, one per token, mapping each annotation class to its value:

[
  {"word": "Da",
   "lemma": "dar",
   "pos": "VERB",
   "morph": "Mood=Imp|Number=Sing|Person=2|Polite=Infm|VerbForm=Fin"},
  {"word": "paz",
   "lemma": "paz",
   "pos": "NOUN",
   "morph": "Gender=Masc|Number=Sing"}
]
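The corpus is loaded from a JSON file. A minimal standard-library sketch for producing and checking such a file follows; it assumes the file simply contains a JSON array of token objects shaped like the example above. `save_corpus` and `load_corpus` are hypothetical helpers, not part of the package.

```python
import json
from pathlib import Path
from typing import Dict, List

Token = Dict[str, str]


def save_corpus(corpus: List[Token], path: Path) -> None:
    """Write the token list as a JSON array; ensure_ascii=False keeps accented forms readable."""
    path.write_text(json.dumps(corpus, ensure_ascii=False, indent=2), encoding="utf-8")


def load_corpus(path: Path) -> List[Token]:
    """Read a JSON array of token dictionaries and check its basic shape."""
    data = json.loads(path.read_text(encoding="utf-8"))
    if not isinstance(data, list) or not all(isinstance(token, dict) for token in data):
        raise ValueError(f"{path} does not contain a list of token dictionaries")
    return data


corpus = [
    {"word": "Da", "lemma": "dar", "pos": "VERB",
     "morph": "Mood=Imp|Number=Sing|Person=2|Polite=Infm|VerbForm=Fin"},
    {"word": "paz", "lemma": "paz", "pos": "NOUN",
     "morph": "Gender=Masc|Number=Sing"},
]
```

For example, `save_corpus(corpus, Path("corpus.json"))` writes a file whose path can then be given to `CQL.utils.import_corpus`.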

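import corpus_query_language as CQL

# Example query: the lemma "rey" followed, within 5 tokens, by the lemma "santo"
query = "[lemma='rey'][]{,5}[lemma='santo']"
corpus = CQL.utils.import_corpus("path/to/json/corpus.json")
engine = CQL.core.CQLEngine()

# All matches, as a list of (start, end) position tuples
matches = engine.findall(corpus, query)
# True if the pattern occurs at least once (stops at the first match)
found = engine.match(corpus, query)
print(found, matches)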

Implemented CQL functions

parsing of any annotation class: word, lemma, pos, morph
combination of annotations (logical AND): [lemma='rey' & pos='NCMP000']
optional token (zero or one occurrence): [lemma='rey']? (partially implemented; may produce errors)
distance between tokens: [lemma='rey'][]{,5}[lemma='santo'] (at most 5 arbitrary tokens in between)
regular expressions in annotation values: [lemma='reye?s?']
alternatives: ([lemma='rey']|[lemma='príncipe'])[]{,5}[lemma='santo'] (may produce errors)
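Several of these features can be illustrated with a small, self-contained sketch. It is not the library's parser: it assumes a simplified grammar (only attr='value' pairs joined by &), assumes, as in CQP, that a regular expression must match the whole annotation value, and `parse_token`, `token_matches` and `find_with_gap` are hypothetical names.

```python
import re
from typing import Dict, List, Tuple

Token = Dict[str, str]
Conditions = List[Tuple[str, re.Pattern]]

# One attr='value' pair, e.g. lemma='reye?s?' (hypothetical, simplified grammar)
CONDITION = re.compile(r"(\w+)\s*=\s*'([^']*)'")


def parse_token(spec: str) -> Conditions:
    """Parse "[lemma='rey' & pos='NCMP000']" into (attribute, compiled regex) pairs."""
    spec = spec.strip()
    if not (spec.startswith("[") and spec.endswith("]")):
        raise ValueError(f"not a token specification: {spec!r}")
    conditions = []
    # Naive split on '&': a regex value containing '&' is not supported here
    for part in spec[1:-1].split("&"):
        part = part.strip()
        if not part:
            continue  # "[]" yields no conditions, i.e. any token
        found = CONDITION.fullmatch(part)
        if found is None:
            raise ValueError(f"cannot parse condition: {part!r}")
        conditions.append((found.group(1), re.compile(found.group(2))))
    return conditions


def token_matches(token: Token, conditions: Conditions) -> bool:
    """All conditions must hold (logical AND); each regex must cover the whole value."""
    return all(
        attr in token and pattern.fullmatch(token[attr]) is not None
        for attr, pattern in conditions
    )


def find_with_gap(corpus: List[Token], first: Conditions, second: Conditions,
                  max_gap: int) -> List[Tuple[int, int]]:
    """Spans for first []{,max_gap} second: at most max_gap arbitrary tokens in between."""
    spans = []
    for start, token in enumerate(corpus):
        if not token_matches(token, first):
            continue
        # A gap of g tokens puts the second token at index start + g + 1
        for end in range(start + 1, min(start + max_gap + 2, len(corpus))):
            if token_matches(corpus[end], second):
                spans.append((start, end))
    return spans
```

`find_with_gap` reports every qualifying end point for a given start, whereas a real engine may apply a shortest- or longest-match strategy. Optional tokens and alternatives are not covered by this sketch.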

Repository: matgille/CQL