Anton Antonov
MathematicaForPrediction at GitHub
MathematicaVsR at GitHub
January 2017
The file "src/Trie.java" contains the definition of the class Trie.
The file "src/TrieFunctions.java" has implementations of a variety of functions that can be used over tries.
The file "src/Experiments.java" is used only for sanity checks of the implementations.
We call a list of strings a trie "word".
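To make the notion of a trie "word" concrete, here is a minimal sketch of a trie with frequencies whose words are lists of strings. The class name `FrequencyTrieSketch` and its fields are hypothetical and are not taken from "src/Trie.java"; the sketch assumes that each node's value counts the words passing through that node.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; NOT the Trie class defined in src/Trie.java.
class FrequencyTrieSketch {
    final String key;   // label of this node
    double value = 0;   // assumed meaning: number of words passing through this node
    final Map<String, FrequencyTrieSketch> children = new LinkedHashMap<>();

    FrequencyTrieSketch(String key) {
        this.key = key;
    }

    // Insert one trie "word" (a list of strings), incrementing the counts along its path.
    void insert(List<String> word) {
        FrequencyTrieSketch node = this;
        node.value += 1;
        for (String k : word) {
            node = node.children.computeIfAbsent(k, FrequencyTrieSketch::new);
            node.value += 1;
        }
    }

    public static void main(String[] args) {
        FrequencyTrieSketch root = new FrequencyTrieSketch("");
        root.insert(Arrays.asList("b", "a", "r"));
        root.insert(Arrays.asList("b", "a", "t"));
        root.insert(Arrays.asList("to", "be")); // keys need not be single characters
        System.out.println(root.value);                                      // 3.0
        System.out.println(root.children.get("b").children.get("a").value); // 2.0
    }
}
```

Because a word is a list of strings rather than a single string, the same structure can hold character sequences, token sequences, or any other sequence of string keys.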
The Mathematica package JavaTriesWithFrequencies.m provides functions for utilizing the implemented Java Trie functionalities.
The test file JavaTriesWithFrequencies-Unit-Tests.wlt provides unit tests for JavaTriesWithFrequencies.m.
To use the package, the corresponding .jar file must be built first; see the next section.
In order to use the defined Java functions in Mathematica the following steps have to be taken.
In the local directory "src" execute the following commands:
src> mkdir build
src> javac -d ./build *.java; cd build; jar cvf ../../TriesWithFrequencies.jar *; cd ../
(Skip the first line if you have the directory "src/build" already.)
$JavaTriesWithFrequenciesPath = "<<path>>/MathematicaForPrediction/Java/TriesWithFrequencies";
Needs["JLink`"];
AddToClassPath[$JavaTriesWithFrequenciesPath];
ReinstallJava[JVMArguments->"-Xmx2g"]
LoadJavaClass["java.util.Collections"];
LoadJavaClass["java.util.Arrays"];
LoadJavaClass["Trie"];
LoadJavaClass["TrieFunctions"];
Get dictionary words starting with "b":
dWords = DictionaryLookup["b*"];
Length[dWords]
(* 4724 *)
Create a trie with the words:
Block[{},
(* Make a list of words. *)
jWords = MakeJavaObject[dWords];
jWords = Arrays`asList[jWords];
(* Make a string object (that represents a splitting regexp pattern). *)
jSp = MakeJavaObject[""];
(* Create the trie specifying the words to be split into characters. *)
jTr = TrieFunctions`createBySplit[jWords, jSp];
(* Optionally convert the node frequencies into probabilities. *)
(*jTr=TrieFunctions`nodeProbabilities[jTr]*)
];
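The empty string used as the splitting pattern above is what turns each dictionary word into a list of characters. The following sketch shows that effect with the standard `String.split` method (Java 8 or later); the class `SplitSketch` and the method `splitAll` are hypothetical names, and whether `createBySplit` uses `String.split` internally is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; it only illustrates the effect of a splitting regexp.
class SplitSketch {
    // Split each input string with the given regexp; each result is one trie "word".
    static List<List<String>> splitAll(List<String> texts, String regex) {
        List<List<String>> words = new ArrayList<>();
        for (String t : texts) {
            words.add(Arrays.asList(t.split(regex)));
        }
        return words;
    }

    public static void main(String[] args) {
        // The empty pattern splits between every pair of characters.
        System.out.println(splitAll(Arrays.asList("bark", "bar"), ""));
        // prints [[b, a, r, k], [b, a, r]]

        // A whitespace pattern would give tries over tokens instead of characters.
        System.out.println(splitAll(Arrays.asList("to be or"), "\\s+"));
        // prints [[to, be, or]]
    }
}
```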
Get the sub-trie that corresponds to "bark":
jSubTr = TrieFunctions`retrieve[jTr, Arrays`asList[MakeJavaObject[Characters["bark"]]]]
(* JLink`Objects`vm4`JavaObject17330643155288065 *)
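Retrieval of a sub-trie can be pictured as following the keys of the word from the root, one child per key. The sketch below is a hedged illustration of that idea only: `RetrieveSketch`, `Node`, and `child` are hypothetical names, and returning `null` for a missing path is a choice made for this sketch, not a statement about what `TrieFunctions.retrieve` does.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; NOT the implementation in src/TrieFunctions.java.
class RetrieveSketch {
    static class Node {
        final double value;
        final Map<String, Node> children = new HashMap<>();

        Node(double value) {
            this.value = value;
        }

        // Add a child under the given key and return it, so that paths can be chained.
        Node child(String key, double childValue) {
            Node c = new Node(childValue);
            children.put(key, c);
            return c;
        }
    }

    // Follow the keys of `word` from `root`; return the node reached,
    // or null (sketch-only convention) if some key is missing.
    static Node retrieve(Node root, List<String> word) {
        Node node = root;
        for (String k : word) {
            node = node.children.get(k);
            if (node == null) {
                return null;
            }
        }
        return node;
    }

    public static void main(String[] args) {
        Node root = new Node(3);
        root.child("b", 3).child("a", 3).child("r", 2).child("k", 1);
        System.out.println(retrieve(root, Arrays.asList("b", "a", "r")).value); // 2.0
        System.out.println(retrieve(root, Arrays.asList("b", "o")));            // null
    }
}
```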
Get JSON form of the sub-trie:
ImportString[jSubTr@toJSON[], "JSON"]
(* {"value" -> 10., "key" -> "k",
"children" -> {{"value" -> 1., "key" -> "s",
"children" -> {}}, {"value" -> 7., "key" -> "e",
"children" -> {{"value" -> 2., "key" -> "r",
"children" -> {{"value" -> 1., "key" -> "s",
"children" -> {}}}}, {"value" -> 1., "key" -> "d",
"children" -> {}}, {"value" -> 4., "key" -> "e",
"children" -> {{"value" -> 4., "key" -> "p",
"children" -> {{"value" -> 1., "key" -> "s",
"children" -> {}}, {"value" -> 2., "key" -> "e",
"children" -> {{"value" -> 2., "key" -> "r",
"children" -> {{"value" -> 1., "key" -> "s",
"children" -> {}}}}}}}}}}}}, {"value" -> 1.,
"key" -> "i",
"children" -> {{"value" -> 1., "key" -> "n",
"children" -> {{"value" -> 1., "key" -> "g",
"children" -> {}}}}}}}} *)
If we load the package TriesWithFrequencies.m:
Import["https://raw.githubusercontent.com/antononcube/MathematicaForPrediction/master/TriesWithFrequencies.m"]
we can visualize the obtained sub-trie (Java object) using the functions ToTrieFromJSON and TrieForm:
TrieForm@ToTrieFromJSON@ImportString[jSubTr@toJSON[], "JSON"]
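When reading the node values of the sub-trie for "bark" (in the JSON form printed earlier), note that the root "k" has value 10 while its children "s", "e", and "i" have values 1, 7, and 1. The difference, 1, is consistent with one word ending at that node ("bark" itself). The sketch below computes that difference; `EndCountSketch` and `Node` are hypothetical names, and the interpretation assumes frequencies (not the optional node probabilities).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch for interpreting node frequencies.
class EndCountSketch {
    static class Node {
        final String key;
        final double value;
        final List<Node> children = new ArrayList<>();

        Node(String key, double value, Node... kids) {
            this.key = key;
            this.value = value;
            this.children.addAll(Arrays.asList(kids));
        }
    }

    // Number of words ending at `node`: its value minus the sum of its children's values.
    static double endingHere(Node node) {
        double childSum = 0;
        for (Node c : node.children) {
            childSum += c.value;
        }
        return node.value - childSum;
    }

    public static void main(String[] args) {
        // Top level of the "bark" sub-trie; deeper levels are omitted for brevity.
        Node k = new Node("k", 10, new Node("s", 1), new Node("e", 7), new Node("i", 1));
        System.out.println(endingHere(k)); // 1.0
    }
}
```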
TBD...