This repository has been archived by the owner on Sep 3, 2023. It is now read-only.

Write Data Statement for the Gold Dataset #64

Open
4 tasks
malteserteresa opened this issue Dec 15, 2019 · 0 comments
Assignees
Labels
documentation Improvements or additions to documentation high priority large task
malteserteresa commented Dec 15, 2019

Objective
Write a README.md for our gold dataset.

Description
Our datasets are currently combined, which makes them confusing. We need to write a basic data statement, following the guidance in this paper. A basic data statement should include:

  • Curation: the search terms and the Twitter API used to collect the data
  • Annotator Demographic
  • Speech Situation: time, place, modality, scripted, edited, async/sync
  • Text Characteristics: genre, topic
  • Other: such as collector demographics
  • Provenance: other datasets included in this dataset
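
As a starting point, the sections listed above could be captured in a small script that renders a README.md skeleton. This is only a sketch: the `DataStatement` class, the `render_readme` function, the default title, and the `TODO` placeholders are all hypothetical and would need to be filled in with the real details of the gold dataset.

```python
from dataclasses import dataclass, fields


@dataclass
class DataStatement:
    """Sections of a basic data statement. All defaults are placeholders."""
    curation: str = "TODO"               # search terms, API used for collection
    annotator_demographic: str = "TODO"
    speech_situation: str = "TODO"       # time, place, modality, scripted, edited, async/sync
    text_characteristics: str = "TODO"   # genre, topic
    other: str = "TODO"                  # e.g. collector demographics
    provenance: str = "TODO"             # other datasets included in this one


def render_readme(statement, title="Gold Dataset: Data Statement"):
    """Render the statement as Markdown, one second-level heading per section."""
    lines = [f"# {title}", ""]
    for field in fields(statement):
        heading = field.name.replace("_", " ").title()
        lines += [f"## {heading}", "", getattr(statement, field.name), ""]
    return "\n".join(lines)


if __name__ == "__main__":
    # Print the skeleton; redirect to README.md once the sections are filled in.
    print(render_readme(DataStatement()))
```

Keeping the sections in one structure makes it easy to see which parts of the statement are still unfilled.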

Skills

Dependencies
gold dataset

Time Estimation

Tasks

  • Find search terms used to collect data, annotator demographics
  • Run topic modeling on scripts to identify text characteristics
  • Find out the date of collection
  • Find out what data is contained within this dataset
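
Before running a full topic model, a quick term-frequency pass can give a first impression of the text characteristics. The sketch below is a simple word count, not topic modeling, and it uses only the standard library. The `top_terms` function, the stopword list, and the assumption that the data is available as a list of tweet strings are all hypothetical.

```python
import re
from collections import Counter

# A deliberately tiny, illustrative stopword list (assumption, not the project's).
STOPWORDS = {"the", "and", "for", "that", "this", "with", "was", "are", "you"}


def top_terms(texts, n=10):
    """Return the n most frequent terms across texts as (term, count) pairs."""
    counts = Counter()
    for text in texts:
        # Drop URLs and @mentions so they do not dominate the counts.
        cleaned = re.sub(r"https?://\S+|@\w+", " ", text.lower())
        tokens = re.findall(r"[a-z']+", cleaned)
        # Skip stopwords and very short tokens such as "rt".
        counts.update(t for t in tokens if t not in STOPWORDS and len(t) > 2)
    return counts.most_common(n)


if __name__ == "__main__":
    sample = ["RT @user the match was great http://t.co/x", "great match today"]
    print(top_terms(sample, n=2))
```

The resulting term list can then inform the genre and topic entries of the data statement, or serve as a sanity check for a proper topic model later.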
@malteserteresa malteserteresa modified the milestones: 0.1, 0.2 Dec 15, 2019
@malteserteresa malteserteresa changed the title [WIP] Write Basic Data Statement Write Basic Data Statement Dec 15, 2019
@andra-pumnea andra-pumnea self-assigned this Mar 8, 2020
@andra-pumnea andra-pumnea changed the title Write Basic Data Statement Write Data Statement for the Gold Dataset Mar 8, 2020
andra-pumnea pushed a commit that referenced this issue Mar 8, 2020
@lucie-docs lucie-docs added the documentation Improvements or additions to documentation label Apr 27, 2020

3 participants