BrowseCloud Input Examples
There are two BrowseCloud file types: Simple Input and Metadata Input.
For the site accessible by Microsoft employees, the maximum file size is 20,000 lines, and the minimum file size is 1,000 lines. Processing time increases with the number of lines and the number of unique words in the data set.
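As a rough pre-upload check, the line limits above can be tested locally. This is a minimal sketch, not part of BrowseCloud: the names `count_lines` and `check_line_count` are hypothetical, and it assumes the limits are inclusive and that every line in the file counts toward them.

```python
MIN_LINES = 1_000    # minimum stated above
MAX_LINES = 20_000   # maximum stated above


def count_lines(path):
    """Count the lines in a text file (hypothetical helper)."""
    with open(path, encoding="utf-8") as f:
        return sum(1 for _ in f)


def check_line_count(n_lines):
    """Return True if n_lines is within the stated limits (assumed inclusive)."""
    return MIN_LINES <= n_lines <= MAX_LINES
```

For example, `check_line_count(count_lines("survey.txt"))` would report whether a hypothetical `survey.txt` is within range.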
The recommended way to train a BrowseCloud model is with a "simple input" file. It is a tab-delimited file with three columns and a ".txt" file extension. You can produce one easily by exporting from Excel. Do not include a header row; column order matters.
Columns:
- Title - A brief description or context of the content. For survey data, this is often the survey question. This text is included in the model.
- Content - This is the free-form text that you wish to model with BrowseCloud.
- Link - Please provide a link to metadata.
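If the data is not coming from Excel, a simple input file can also be written from a script. The sketch below is an illustration only: `to_simple_input_line` and `write_simple_input` are hypothetical helpers, and collapsing tabs and newlines inside a field to single spaces is an assumption made here so that each document stays on one line with exactly three columns.

```python
def to_simple_input_line(title, content, link):
    """Build one tab-delimited line in the order Title, Content, Link."""
    # Collapse internal whitespace (tabs, newlines) so a field cannot
    # introduce an extra column or an extra line. This is an assumption,
    # not documented BrowseCloud behavior.
    fields = [" ".join((field or "").split()) for field in (title, content, link)]
    return "\t".join(fields)


def write_simple_input(rows, path):
    """Write (title, content, link) tuples to a .txt file with no header row."""
    with open(path, "w", encoding="utf-8", newline="") as f:
        for title, content, link in rows:
            f.write(to_simple_input_line(title, content, link) + "\n")
```

A missing title or link can be passed as an empty string or `None`, which leaves that column blank.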
The other input file type is "metadata input". It is a CSV file with three or more columns and a ".csv" file extension. Use this file type if you want to understand the relationship between other variables and the documents in the visualization. Include a header row; column order does not matter.
Columns:
- title - A brief description or context of the content. For survey data, this is often the survey question. This text is included in the model.
- abstract - This is the free-form text that you wish to model with BrowseCloud.
- link - Please provide a link to metadata.
- Categorical Data with two categories (e.g. "hotdog" vs. "not hotdog")
- Ordinal Data expressed in numeric form (e.g. 3-level Likert-scale data where ["Dissatisfied","Neutral","Satisfied"] => [1,2,3])
- Numerical data
If you have no title or link, leave them blank.
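A metadata input file can be produced with Python's standard `csv` module. The following is a minimal sketch under stated assumptions: `write_metadata_input` is a hypothetical helper, and the `satisfaction` column name is invented for illustration. The Likert mapping follows the example given in the column list.

```python
import csv
import io

REQUIRED_COLUMNS = ["title", "abstract", "link"]

# Ordinal mapping taken from the Likert example in the column list.
LIKERT = {"Dissatisfied": 1, "Neutral": 2, "Satisfied": 3}


def write_metadata_input(rows, out, extra_columns):
    """Write dict rows as CSV with a header row; missing fields are left blank."""
    fieldnames = REQUIRED_COLUMNS + list(extra_columns)
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="", lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


# Usage: one response with no title or link, plus a hypothetical ordinal column.
buffer = io.StringIO()
write_metadata_input(
    [{"abstract": "Good, overall", "satisfaction": LIKERT["Satisfied"]}],
    buffer,
    extra_columns=["satisfaction"],
)
```

To write to disk instead of a buffer, open the target with `open(path, "w", newline="", encoding="utf-8")` and pass the file object as `out`.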