This repository was archived by the owner on Jul 22, 2024. It is now read-only.

BrowseCloud Input Examples

Spencer Buja edited this page Jun 22, 2019 · 6 revisions

There are two BrowseCloud file types: Simple Input and Metadata Input.

For the site accessible to Microsoft employees, input files must contain between 1,000 and 20,000 lines. Run time grows with the number of lines and the number of unique words in the data set.
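
The line limits can be checked before uploading. The following is a minimal Python sketch, not part of BrowseCloud: the function name `check_line_count` is hypothetical, and it assumes both limits are inclusive and that every line in the file counts toward them (including a header row, if present).

```python
MIN_LINES = 1_000   # minimum stated for the Microsoft-internal site
MAX_LINES = 20_000  # maximum stated for the Microsoft-internal site


def check_line_count(text):
    """Return (is_within_limits, line_count) for the given file contents.

    Assumption: limits are inclusive and every line counts.
    """
    line_count = len(text.splitlines())
    return MIN_LINES <= line_count <= MAX_LINES, line_count
```

To use it, read the file contents into a string (for example with `open(path, encoding="utf-8").read()`) and pass the string to `check_line_count`.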

Simple Input

The recommended way to train a BrowseCloud model is with a "simple input" file: a tab-delimited file with three columns and a ".txt" file extension. One easy way to produce it is to export from Excel as tab-delimited text. Do not include a header row. Column order matters.

Columns:

  1. Title - A brief description or context of the content. For survey data, this is often the survey question. This text is included in the model.
  2. Content - This is the free-form text that you wish to model with BrowseCloud.
  3. Link - Please provide a link to metadata.

Here is an example.
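
For illustration only, a file in this shape could also be generated with a short Python sketch like the one below. The sample rows, the example.com URL, and the helper name `to_simple_input` are made up for this sketch and are not part of BrowseCloud. Collapsing tabs and newlines inside a field is an assumption made here to keep each record on one three-column line.

```python
# Hypothetical rows in the required order: (title, content, link).
rows = [
    ("What do you like about your team?",
     "Great collaboration and clear goals.",
     "https://example.com/response/1"),
    # Title and link may be left blank if you have none.
    ("", "Flexible hours.", ""),
]


def to_simple_input(rows):
    """Return tab-delimited text: three columns per line, no header row."""
    lines = []
    for title, content, link in rows:
        # Tabs or newlines inside a field would break the column layout,
        # so collapse any run of whitespace into a single space.
        fields = [" ".join(str(field).split()) for field in (title, content, link)]
        lines.append("\t".join(fields))
    return "\n".join(lines) + "\n"


simple_text = to_simple_input(rows)
```

Writing `simple_text` to a file with a ".txt" extension (for example `open("input.txt", "w", encoding="utf-8").write(simple_text)`) would produce a candidate simple input file. The file name and encoding are assumptions.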

Metadata Input

Another input file type is "metadata input": a CSV file with three or more columns and a ".csv" file extension. Use this file type if you want to understand the relationship between other variables and the documents in the visualization. Include a header row. Column order does not matter.

Columns:

  1. title - A brief description or context of the content. For survey data, this is often the survey question. This text is included in the model.
  2. abstract - This is the free-form text that you wish to model with BrowseCloud.
  3. link - Please provide a link to metadata.

Types of metadata that we can plot:

  1. Categorical data with two categories (e.g. "hotdog" vs. "not hotdog")
  2. Ordinal data expressed in numeric form (e.g. 3-level Likert-scale data where ["Dissatisfied","Neutral","Satisfied"] => [1,2,3])
  3. Numerical data
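
As a rough sketch of how such a file might be assembled in Python, the example below writes the three named columns plus one ordinal metadata column, encoding the Likert labels as 1 to 3 as described above. The column name `satisfaction`, the sample responses, and the helper name `to_metadata_csv` are hypothetical. Only `title`, `abstract`, and `link` come from this page.

```python
import csv
import io

# Ordinal encoding from the Likert example above.
LIKERT = {"Dissatisfied": 1, "Neutral": 2, "Satisfied": 3}

# Hypothetical survey responses; "satisfaction" is a made-up metadata column.
responses = [
    {"title": "How was the event?", "abstract": "Loved the keynote.",
     "link": "", "satisfaction": "Satisfied"},
    {"title": "How was the event?", "abstract": "The room was too cold.",
     "link": "", "satisfaction": "Dissatisfied"},
]


def to_metadata_csv(responses):
    """Return CSV text with a header row and a numeric ordinal column."""
    buffer = io.StringIO()
    fieldnames = ["title", "abstract", "link", "satisfaction"]
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for response in responses:
        row = dict(response)
        # Replace the text label with its numeric rank.
        row["satisfaction"] = LIKERT[response["satisfaction"]]
        writer.writerow(row)
    return buffer.getvalue()


metadata_csv = to_metadata_csv(responses)
```

Saving `metadata_csv` with a ".csv" extension would give a candidate metadata input file. The `csv` module quotes any field that contains a comma, so free-form text in `abstract` stays in one column.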

All

If you have no title or link, leave them blank.
