Polishes and streamlines the record-level metadata section #24

Merged 1 commit on Mar 15, 2017
104 changes: 40 additions & 64 deletions Record-level_metadata.md
---
title: Record-level Metadata
teaching: 15
exercises: 20
objectives:
- Evaluate and rank the quality of existing metadata records.
- Describe the types of record-level metadata and their importance.
- Compose an appropriate set of descriptive keywords for a given text.
keypoints:
- _TODO_
---

# Creating Record-Level Metadata

## Metadata quality: Good - Better - Best

### Exercise 1 - rank these Zenodo entries in terms of metadata quality (7 minutes)

This is a continuation of Exercise 3 in the [Documentation](documentation.md) section. Rank these from 1 (most helpful/informative) to 3 (least helpful/informative):

* MS Salmanpour. (2016). Data set [Data set]. Zenodo. http://doi.org/10.5281/zenodo.193025
* Solange Duruz. (2016). Simulated breed for GENMON [Data set]. Zenodo. http://doi.org/10.5281/zenodo.220887
* Zichen Wang, Avi Ma'ayan. (2016). Zika-RNAseq-Pipeline v0.1. Zenodo. http://doi.org/10.5281/zenodo.56311

Place your green sticky up when you are done.

### Group discussion (5 minutes)

Tally for who ranked each record for being the most informative:

* [record number...]
* [record number...]
* [record number...]

Tally for who ranked each record for being the least informative:

* [record number...]
* [record number...]
* [record number...]

Discuss the results. Specifically, consider the following questions:

* What were the criteria that you used to rank?
* What was missing?
* What was the most helpful?
* What was the most critical piece of information?


## The metadata in your life

You're used to metadata within your research: you've got metadata about specific data points, observations, samples, etc. But there is much more to metadata than that.

The information that you were looking at in the Zenodo records is also metadata: metadata about the dataset (record) on that page. Let's take a look at the pieces of these pages.

Point out where these pieces are:

This information is important because:

* People need to find your stuff
* People need to know what your stuff is
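
To make these pieces concrete, here is a minimal sketch, in Python, of the kind of record-level metadata a deposit might carry. The field names are illustrative rather than the exact schema of Zenodo or any other repository, and the values are placeholders.

```python
# Illustrative only: generic field names, not the exact schema of any repository.
record_metadata = {
    "title": "Gapminder analysis: notebooks and derived data",
    "creators": ["Lastname, Firstname"],      # who made the deposit
    "description": ("Jupyter Notebooks and cleaned data supporting an analysis "
                    "of life expectancy and GDP in the gapminder dataset."),
    "keywords": ["gapminder", "reproducible research", "Jupyter"],
    "upload_type": "dataset",                 # e.g. dataset, software, publication
    "license": "CC-BY-4.0",
    "publication_date": "2017-03-15",
}

print(record_metadata["keywords"])
```

Every one of these fields is something a searcher can match on, and most of them appear somewhere on the record's landing page.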

Let's think about the workflow of discovery. The user:

1. Searches for something
2. Reviews the results - is this the kind of thing I was looking for, and if so, is it worth studying further?
3. Might add some filters to reduce and refine the results
4. Selects a record to review and goes to that record's page
5. Reviews the new information on this page, including the fuller description, keywords, and other readme/documentation files.
This person would continue to move through these steps so long as the information they find remains relevant and useful.
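
As a rough illustration of how that workflow leans on the metadata, here is a toy search-and-filter sketch over a hypothetical list of records shaped like the dictionary above. The keywords attached to each record are invented for the example, and this is not a real repository API.

```python
# Toy example: the keywords below are invented for illustration;
# real repositories index much richer metadata than this.
records = [
    {"title": "Simulated breed for GENMON",
     "keywords": ["simulation", "conservation genetics"]},
    {"title": "Data set",                      # a sparsely described record
     "keywords": []},
    {"title": "Zika-RNAseq-Pipeline v0.1",
     "keywords": ["RNA-seq", "pipeline", "reproducible research"]},
]

def search(records, term, required_keyword=None):
    """Return titles whose title or keywords mention `term`, optionally
    restricted to records that carry `required_keyword`."""
    hits = []
    for rec in records:
        text = " ".join([rec["title"]] + rec["keywords"]).lower()
        if term.lower() in text:
            if required_keyword is None or required_keyword in rec["keywords"]:
                hits.append(rec["title"])
    return hits

print(search(records, "rna"))       # found via its informative title and keywords
print(search(records, "data"))      # matches, but gives the searcher nothing to judge
print(search(records, "pipeline", required_keyword="reproducible research"))
```

A record with a generic title and no keywords can still match a query, but it leaves the searcher with nothing to decide on at steps 2 through 5.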

## Keywords, best friend/worst enemy (7 minutes)

There are many places where you might need to add tags and keywords to items. You do this when organizing your pictures, your electronic notes, or your issue-tracker tickets.

The keywords you add should be:
* not so general that they fail to distinguish the item from anything else,
* nor so specific that they would never be searched for.

The same keyword may count as too general, too specific, or just right depending on the platform you are using.

For example, tagging a task in your ticketing system as "Python" might be relevant if the task is about developing a Python lesson and you normally work in other languages. The same tag is much less useful on a ticket in a project where everything is written in Python.

### Exercise 2: What makes a good keyword?

Assume you have a series of Jupyter Notebooks that you are going to deposit in an archive in support of a manuscript you are publishing on the population genetics of BRCA1 alleles. Which of the following keywords would be useful or not useful?

* iPython
* reproducible research
* computer code
* BRCA1
* population genetics
* cancer

What kind of context could change your answers?

### Exercise 3: Picking keywords for the gapminder data

Imagine that you are finishing up a project on the gapminder dataset that we've been using over the course of the workshop. You are preparing to deposit the dataset and the Jupyter Notebooks into an archive such as Zenodo. The submission interface allows you to provide a set of keywords to describe your deposit, and you want to maximize the impact of your deposit by making it easy for the people who would find it useful to discover it.

You will work with a partner (or a small group).

* Step 1: On your own, pick a set of at most 5 keywords. (3 minutes)
* Step 2: Compare with your partner, and together decide on a new set of at most 5 keywords. You are free to mix and match between your two keyword sets, or you may create new ones. You must decide and agree on your new keywords. (3 minutes)
* Step 3: Each pair (or small group) places their keywords into the etherpad, and then the room reviews them.

The entire room now decides on a single set of 5 keywords. Again, this may be some form of union or new creation.
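
If it helps to see the "union or new creation" step concretely, here is a minimal sketch using Python sets; the keyword choices are hypothetical.

```python
# Hypothetical keyword sets from two partners in Exercise 3.
mine    = {"gapminder", "life expectancy", "Jupyter", "data visualization", "python"}
partner = {"gapminder", "GDP", "demography", "Jupyter", "reproducible research"}

shared    = mine & partner   # keywords you already agree on
all_ideas = mine | partner   # the full pool to mix and match from

print(sorted(shared))
print(sorted(all_ideas))
# From all_ideas (or brand-new suggestions), negotiate down to at most five
# keywords that balance generality and specificity.
```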