This project provides a CLI and a Python package for generating unnatural language.
It uses Markov chains to build a directed graph of words and then generates sentences from it.
Each edge stores a count of how many times the two nodes it connects were found together in the text.
During text generation, this count is used to randomly choose the next word.
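The idea can be sketched in a few lines of Python. This is a minimal illustration, not the package's actual code: the names build_graph and choose_next are hypothetical, and it assumes words are split on whitespace and that the count acts as a weight in the random choice.

```python
import random
from collections import defaultdict


def build_graph(text):
    """Count how often each word is directly followed by another word."""
    graph = defaultdict(lambda: defaultdict(int))
    words = text.split()  # assumption: plain whitespace tokenization
    for current, following in zip(words, words[1:]):
        graph[current][following] += 1
    return graph


def choose_next(graph, word, rng=random):
    """Pick a successor of `word`, weighted by edge count; None if it has none."""
    edges = graph.get(word)
    if not edges:
        return None
    successors = list(edges)
    counts = [edges[s] for s in successors]
    return rng.choices(successors, weights=counts, k=1)[0]


graph = build_graph("he is here and he was there and he is gone")
print(dict(graph["he"]))  # {'is': 2, 'was': 1}
print(choose_next(graph, "he"))  # 'is' is twice as likely as 'was'
```

A word such as "gone" that only appears at the end of the input has no outgoing edges, so choose_next returns None for it.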
The CLI has two main commands: generate and parse.
$ textgen --help
Usage: textgen [OPTIONS] COMMAND [ARGS]...

Options:
  --help  Show this message and exit.

Commands:
  generate  generate text from a graph
  parse     generate a graph from text files
The parse command requires at least one text file to process and creates a JSON file with the graph's contents.
$ textgen parse --help
Usage: textgen parse [OPTIONS]

Options:
  -in, --input FILE    text files to parse (can be repeated)  [required]
  -out, --output FILE  path to file to export graph into  [required]
  -g, --graph FILE     path to an existing graph file
  -o, --order INTEGER  order of graph (read README to know more)  [default: 1]
  --help               Show this message and exit.
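To illustrate what a graph stored as JSON could look like, here is a rough Python sketch. The file layout shown (a nested {word: {next_word: count}} mapping) and the helper merge_counts are assumptions made for this example. The actual format written by textgen, and how it uses an existing graph passed with --graph, may differ.

```python
import json
from collections import Counter


def merge_counts(existing, new):
    """Add edge counts from `new` to those in `existing`.

    Both arguments use the hypothetical layout {word: {next_word: count}}.
    """
    merged = {word: Counter(edges) for word, edges in existing.items()}
    for word, edges in new.items():
        merged.setdefault(word, Counter()).update(edges)
    return {word: dict(edges) for word, edges in merged.items()}


# A previously exported graph (hypothetical contents) and counts from new text.
old_graph = json.loads('{"he": {"is": 24, "was": 19}}')
new_graph = {"he": {"is": 1, "may": 5}, "may": {"be": 11}}

merged = merge_counts(old_graph, new_graph)
print(json.dumps(merged, indent=2, sort_keys=True))
```

Keeping raw counts in the file, rather than probabilities, means that counts from additional text can simply be added on top.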
The generate command requires an existing JSON graph file to generate a block of text. You can specify the number of words to generate.
$ textgen generate --help
Usage: textgen generate [OPTIONS]

Options:
  -g, --graph FILE     path to the graph file  [required]
  -w, --words INTEGER  number of words to generate  [default: 255]
  -o, --order INTEGER  order of graph (read README to know more)  [default: 1]
  --help               Show this message and exit.
Order is the number of words a single node contains. Increasing the order usually improves the grammatical correctness of the generated text, but it also makes text harder to generate, since some nodes will not have outgoing edges.
For example, a graph of order 1 looks like this:
he -> is 24
he -> was 19
he -> may 5
...
may -> be 11
may -> have 2
...
A graph of order 3 looks like this:
with him to -> go 5
with him to -> run 1
...
him to go -> away 3
him to go -> somewhere 1
...
With a higher order, the text looks more natural, but more input text is needed for generation to work. With a lower order, you can generate a lot of nonsense text.
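The effect of the order can be sketched as follows. This is an illustrative example under stated assumptions, not the package's implementation: nodes are modelled as tuples of words, and the names build_graph and dead_ends are hypothetical.

```python
from collections import defaultdict


def build_graph(words, order):
    """Map each run of `order` consecutive words to counts of the word that follows."""
    graph = defaultdict(lambda: defaultdict(int))
    for i in range(len(words) - order):
        node = tuple(words[i:i + order])
        graph[node][words[i + order]] += 1
    return graph


def dead_ends(words, order):
    """Return nodes that occur in the text but have no outgoing edge."""
    graph = build_graph(words, order)
    nodes = {tuple(words[i:i + order]) for i in range(len(words) - order + 1)}
    return nodes - set(graph)


words = "with him to go away and with him to run".split()
graph = build_graph(words, 3)
print(dict(graph[("with", "him", "to")]))  # {'go': 1, 'run': 1}
print(dead_ends(words, 3))  # {('him', 'to', 'run')}
```

With a small input, most order-3 nodes have a single successor and the last node has none, which is why higher orders need more input text.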
Thanks to @danbst for the idea.
Cool article that shows how to do the same thing: yurichev.com
Download .txt books here: Gutenberg library