Visualization Interface Files and CountingGridsPy Library

Output Files

Internally, BrowseCloud requires several files to start up. These files contain serialized data about the model parameters, the training data set, and how to color the words in the visualization.

File in <path_to_output_folder>	Description
colors_browser.txt	Overlaying feature maps on top of the word clustering requires a custom implementation. One way to do this is to rewrite the main driver script so that it repeats the preprocessing work but skips learning. Otherwise, the default behavior is to color every region blue. Format: row, column, red, green, blue. Example: 1 1 0.37702 0.62298 0
correspondences.txt	Mapping from each raw word/token to its lemmatized form (e.g. "did" => "do"). Format: raw word, word, word id
database.txt	id, title, abstract (AKA verbatim), link, image, layer
docmap.txt	The model has a distribution over mappings from documents to locations in the grid. Format: row:<row_number_indexed_by_1> col:<col_number_indexed_by_1> [<doc_id>:<q_probability>:<layer>]
top_pi.txt	The three-dimensional tensor, called pi, holds a discrete probability distribution over the words at each position in the grid. In this file, we write down the most likely words and their corresponding probabilities at each position, as long as their probability is above some threshold.
top_pi_layers.txt	The layered extension of top_pi.txt.
legend.txt (optional)	A legend object reads this data and displays a legend in the visualization UI. Using two points, we find a linear function that maps a real number to RGB, which traces a curve in 3D space. Format: <label_1> <label_2>. Example: Dev IC 0.9882352941176471 0.984313725490196 0.9921568627450981 Dev Lead 0.24705882352941178 0.0 0.49019607843137253
vocabulary.txt	The words used in the corpus. Format: word id, word. Example: 313 camera
words.txt	Format: id:<word_id> [<doc_id>:<count>]. Example: id:1 412:1 747:1 997:1 998:1 1094:1 1173:1 1400:1 1653:2
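
To make the line formats above concrete, here is a minimal Python sketch that reads single lines of words.txt and colors_browser.txt and interpolates between two legend colors. It is not taken from BrowseCloud or CountingGridsPy: the function names are hypothetical, the fields are assumed to be whitespace-delimited (as in the examples above), the bracketed `<doc_id>:<count>` pair is assumed to repeat once per document, and the legend mapping is assumed to send 0 to the first color and 1 to the second.

```python
from typing import Dict, Tuple

RGB = Tuple[float, float, float]


def parse_words_line(line: str) -> Tuple[int, Dict[int, int]]:
    """Parse one words.txt line: 'id:<word_id> <doc_id>:<count> ...'.

    Assumption: fields are separated by whitespace and the
    <doc_id>:<count> pair repeats once per document.
    """
    head, *pairs = line.split()
    word_id = int(head.split(":", 1)[1])
    counts: Dict[int, int] = {}
    for pair in pairs:
        doc_id, count = pair.split(":")
        counts[int(doc_id)] = int(count)
    return word_id, counts


def parse_color_line(line: str) -> Tuple[int, int, RGB]:
    """Parse one colors_browser.txt line: 'row column red green blue'."""
    row, col, red, green, blue = line.split()
    return int(row), int(col), (float(red), float(green), float(blue))


def lerp_rgb(first: RGB, second: RGB, t: float) -> RGB:
    """Linearly map a real number t to RGB using two legend colors.

    Assumption: t = 0 gives `first` and t = 1 gives `second`; the
    actual scaling used by the visualization is not documented here.
    """
    r, g, b = (a + (c - a) * t for a, c in zip(first, second))
    return (r, g, b)


if __name__ == "__main__":
    # Example lines copied from the table above.
    print(parse_words_line("id:1 412:1 747:1 997:1 998:1 1094:1 1173:1 1400:1 1653:2"))
    print(parse_color_line("1 1 0.37702 0.62298 0"))
    dev_ic = (0.9882352941176471, 0.984313725490196, 0.9921568627450981)
    dev_lead = (0.24705882352941178, 0.0, 0.49019607843137253)
    print(lerp_rgb(dev_ic, dev_lead, 0.5))
```

Using `str.split()` with no arguments accepts both spaces and tabs, which keeps the sketch tolerant of either delimiter. The docmap.txt and top_pi.txt formats are left out because their exact field layout is not fully specified above.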

Other artifacts are created for caching purposes.

cg-processed.csv
CountingGridDataMatrices.mat
cached_correspondences.tsv
