This repository was archived by the owner on Jul 22, 2024. It is now read-only.
Visualization Interface Files and CountingGridsPy Library
Spencer Buja edited this page Jun 22, 2019
Internally, BrowseCloud requires several files to start up. These files contain serialized data about the model parameters, the training data set, and how to color the words in the visualization.
| File in `<path_to_output_folder>` | Description |
| --- | --- |
| colors_browser.txt | Feature maps on top of the word clustering require a custom implementation. One way to do this is to rewrite the main driver script so that it repeats the preprocessing work but skips learning. Otherwise, the default behavior is to color every region blue. Fields: row, column, red, green, blue. Example: `1 1 0.37702 0.62298 0` |
| correspondences.txt | Mapping from each raw word/token to its lemmatized form (e.g. "did" => "do"). Fields: raw word, word, word id. |
| database.txt | Fields: id, title, abstract (AKA verbatim), link, image, layer. |
| docmap.txt | The model has a distribution over mappings from documents to locations in the grid. Format: `row:<row_number_indexed_by_1> col:<col_number_indexed_by_1> [<doc_id>:<q_probability>:<layer>]` |
| top_pi.txt | The three-dimensional tensor, called pi, holds a discrete probability distribution over the words at each position in the grid. This file lists the most likely words at each position, with their probabilities, keeping only words whose probability is above some threshold. |
| top_pi_layers.txt | The layered extension of top_pi.txt. |
| legend.txt (optional) | A legend object reads this data and displays a legend in the visualization UI. From the two points, we find a linear function that maps a real number to RGB, which traces a curve in 3D space. Format: `<label_1>` and `<label_2>`, each followed by its RGB values as in the example. Example: `Dev IC 0.9882352941176471 0.984313725490196 0.9921568627450981` `Dev Lead 0.24705882352941178 0.0 0.49019607843137253` |
| vocabulary.txt | The words used in the corpus. Fields: word id, word. Example: `313 camera` |
| words.txt | Format: `id:<word_id> [<doc_id>:<count>]`. Example: `id:1 412:1 747:1 997:1 998:1 1094:1 1173:1 1400:1 1653:2` |
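The line formats in the table can be read with a few lines of Python. The sketch below is illustrative only and is not part of CountingGridsPy: the function names `parse_words_line`, `parse_docmap_line`, and `legend_color` are hypothetical. It assumes that entries in docmap.txt are whitespace-separated in the same way as the words.txt example, and that the legend's real number is normalized to the range 0 to 1. Neither assumption is confirmed by this page.

```python
from typing import Dict, List, Tuple

RGB = Tuple[float, float, float]


def parse_words_line(line: str) -> Tuple[int, Dict[int, int]]:
    """Parse one words.txt line such as 'id:1 412:1 1653:2'."""
    head, *pairs = line.split()
    if not head.startswith("id:"):
        raise ValueError(f"expected a leading 'id:<word_id>' token, got {head!r}")
    word_id = int(head[len("id:"):])
    counts: Dict[int, int] = {}
    for pair in pairs:
        doc_id, count = pair.split(":")
        counts[int(doc_id)] = int(count)
    return word_id, counts


def parse_docmap_line(line: str) -> Tuple[int, int, List[Tuple[int, float, int]]]:
    """Parse one docmap.txt line.

    Assumption: entries are whitespace-separated, as in words.txt.
    Returns (row, col, [(doc_id, probability, layer), ...]); row and col are 1-indexed.
    """
    row_token, col_token, *entries = line.split()
    row = int(row_token.split(":", 1)[1])
    col = int(col_token.split(":", 1)[1])
    docs = []
    for entry in entries:
        doc_id, prob, layer = entry.split(":")
        docs.append((int(doc_id), float(prob), int(layer)))
    return row, col, docs


def legend_color(t: float, start: RGB, end: RGB) -> RGB:
    """Linearly interpolate between the two legend colors.

    Assumption: t is normalized so that 0 maps to `start` and 1 maps to `end`.
    """
    r, g, b = (a + t * (z - a) for a, z in zip(start, end))
    return (r, g, b)


# The words.txt example from the table (shortened).
print(parse_words_line("id:1 412:1 747:1 1653:2"))

# An invented docmap.txt line, for illustration only.
print(parse_docmap_line("row:1 col:2 17:0.25:0 42:0.75:1"))

# The two legend colors from the legend.txt example.
dev_ic = (0.9882352941176471, 0.984313725490196, 0.9921568627450981)
dev_lead = (0.24705882352941178, 0.0, 0.49019607843137253)
print(legend_color(0.5, dev_ic, dev_lead))
```

Keeping each parser to a single line of input makes it easy to stream large files with `for line in open(path)` rather than loading them whole.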
Other artifacts are created for caching purposes.
- cg-processed.csv
- CountingGridDataMatrices.mat
- cached_correspondences.tsv