Copyright (C) 2011-2017 mailto:onlyuser@gmail.com
parse-english is a minimum viable English parser implemented in LexYacc. It explores, in parallel, every interpretation of an English sentence that the grammar accepts, and generates an abstract syntax tree for each successful parse. The algorithm is completely deterministic and requires no training data.
See old version here: NatLang
input:
the quick brown fox jumps over the lazy dog.
output:
cd ./demo/0_parse-english_full_nlp && ./demo.sh "the quick brown fox jumps over the lazy dog"
| Switch | Description |
|---|---|
| -e SENTENCE | input sentence |
| -l | Lisp mode |
| -g | graph mode (slow for deep trees) |
| -d | dot mode |
| -x | extract ontology mode |
| -q | quiet mode |
| -m | memory debug |
| -n | indent lisp |
Unix tools and third-party components (accessible from $PATH):
gcc, flex, bison
- Parallel reentrant parsing
- Lisp / graph / dot output (multiple trees)
- Present tense
- Progressive tense
- Future tense
- Past tense
- Past perfect tense
- Passive voice
- Questions
- Conditionals
- Imperative mood
- Comparisons
- Hard-coded grammar and vocabulary.
- A brute-force algorithm tries every supported interpretation of a sentence, so parsing is slow for long sentences.
- BNF rules are suitable for specifying constituent-based phrase structure grammars, but are a poor fit for expressing non-local dependencies.
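
To make the brute-force limitation above concrete, here is a minimal Python sketch of how the number of candidate interpretations can grow. It is not the project's implementation: the `LEXICON` table, its tag names, and the helpers `tag_sequences` and `count_sequences` are all hypothetical, and the sketch only enumerates part-of-speech assignments rather than running a grammar over them.

```python
from itertools import product
from math import prod

# Hypothetical toy lexicon (NOT the project's vocabulary):
# each word maps to the part-of-speech tags it could take.
LEXICON = {
    "the": ["Det"],
    "quick": ["Adj"],
    "brown": ["Adj", "N"],
    "fox": ["N", "V"],
    "jumps": ["V", "N"],
    "over": ["Prep", "Adv"],
    "lazy": ["Adj"],
    "dog": ["N", "V"],
}


def split_words(sentence):
    """Lowercase the sentence, drop a trailing period, split on whitespace."""
    return sentence.lower().rstrip(".").split()


def tag_sequences(sentence):
    """Yield every possible (word, tag) assignment for the sentence."""
    words = split_words(sentence)
    choices = [LEXICON[word] for word in words]
    for tags in product(*choices):
        yield list(zip(words, tags))


def count_sequences(sentence):
    """Count the assignments without enumerating them."""
    return prod(len(LEXICON[word]) for word in split_words(sentence))


if __name__ == "__main__":
    sentence = "the quick brown fox jumps over the lazy dog."
    print(count_sequences(sentence))  # 32 with this toy lexicon
    for assignment in list(tag_sequences(sentence))[:3]:
        print(assignment)
```

With this toy lexicon, five of the nine words are two-way ambiguous, giving 2^5 = 32 candidate assignments. Each additional ambiguous word multiplies the count again, which is why an exhaustive search becomes slow on long sentences.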
| target | action |
|---|---|
| all | make binaries |
| test | all + run tests |
| pure | test + use valgrind to check for memory leaks |
| dot | test + generate .png graph for tests |
| lint | use cppcheck to perform static analysis on .cpp files |
| doc | use doxygen to generate documentation |
| xml | test + generate .xml for tests |
| import | test + use ticpp to serialize-to/deserialize-from xml |
| clean | remove all intermediate files |
- "Part-of-speech tagging"
- http://en.wikipedia.org/wiki/Part-of-speech_tagging
- "Princeton WordNet"
- http://wordnet.princeton.edu/
- "Syntactic Theory: A Unified Approach"
- ISBN: 0340706104
- "Enju - A fast, accurate, and deep parser for English"
- http://www.nactem.ac.uk/enju/
Natural Language Processing, English parser, Yacc, BNF

