| title | teaching | exercises | objectives | keypoints |
|---|---|---|---|---|
| Record-level Metadata | 15 | 20 | | |
This is a continuation of Exercise 3 in the Documentation section. Rank these records from 1 (most helpful/informative) to 3 (least helpful/informative):
- MS Salmanpour. (2016). Data set [Data set]. Zenodo. http://doi.org/10.5281/zenodo.193025
- Solange Duruz. (2016). Simulated breed for GENMON [Data set]. Zenodo. http://doi.org/10.5281/zenodo.220887
- Zichen Wang, Avi Ma'ayan. Zika-RNAseq-Pipeline v0.1. Zenodo; 2016. http://doi.org/10.5281/zenodo.56311
Discuss the results. Specifically, answer and discuss the following questions:
- What were the criteria that you used to rank?
- What was missing?
- What was the most helpful?
- What was the most critical piece of information?
You're used to metadata within your research: metadata about specific data points, observations, samples, and so on. But metadata exists at more levels than that.
The information you were looking at in the Zenodo records is also metadata: metadata about the dataset (the record) described on that page. Let's take a look at the pieces of these pages; after the list below, a short sketch shows how the same fields appear in the record's machine-readable metadata.
Point out where these pieces are:
- Title
- Authors
- Description
- Keywords
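These same fields are exposed in the record's machine-readable metadata. Below is a minimal sketch in Python, assuming the layout of the JSON returned by Zenodo's public records endpoint (`https://zenodo.org/api/records/<id>`); the record id used here is the Zika-RNAseq-Pipeline record cited above, and field names may vary slightly between records.

```python
# Minimal sketch: fetch one Zenodo record and print its record-level metadata.
# Assumes Zenodo's public REST records endpoint and its usual JSON layout.
import requests

record_id = "56311"  # the Zika-RNAseq-Pipeline record cited above
response = requests.get(f"https://zenodo.org/api/records/{record_id}")
response.raise_for_status()
metadata = response.json().get("metadata", {})

# The same pieces you just located on the web page:
print("Title:      ", metadata.get("title"))
print("Authors:    ", [c.get("name") for c in metadata.get("creators", [])])
print("Keywords:   ", metadata.get("keywords", []))
print("Description:", (metadata.get("description") or "")[:200])
```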
This information is important because:
- People need to find your stuff
- People need to know what your stuff is
Let's think about the workflow of discovery from the user's point of view. The user:
- Searches for something
- Reviews the results: is this the kind of thing I was looking for, and if so, is it worth studying further?
- Might add some filters to reduce and refine the results
- Selects a record to review and goes to that record's page
- Reviews the new information on this page, including the fuller description, keywords, and other readme/documentation files.
- Downloads and digs into the data files
This person would continue to move through these steps so long as the information continues to look sufficiently interesting.
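The same workflow can be expressed as calls against a search API. Here is a rough sketch in Python, assuming Zenodo's public search endpoint (`https://zenodo.org/api/records`) and its "hits"-style response layout; the query string is only an example.

```python
# Rough sketch of the discovery workflow as API calls against Zenodo's
# search endpoint. The query and result handling are illustrative only.
import requests

# Step 1: search for something
query = "population genetics"
resp = requests.get("https://zenodo.org/api/records",
                    params={"q": query, "size": 5})
resp.raise_for_status()
hits = resp.json().get("hits", {}).get("hits", [])

# Step 2: review the results -- is this the kind of thing I was looking for?
for hit in hits:
    md = hit.get("metadata", {})
    print(md.get("title"), "--", md.get("keywords", []))

# Later steps (filtering further, opening a record page, reading its
# description and documentation, downloading files) work the same way:
# each decision is driven by the record-level metadata printed here.
```

Notice that every decision up to the last step is made from record-level metadata alone; the data files themselves are only downloaded at the very end.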
<iframe src="https://zenodo.org/record/158943" width="700" height="350"></iframe>

There are many places where you might need to add tags and keywords to items. You already do this when organizing your pictures, perhaps your electronic notes, or your issue-tracker tickets.
The keywords you add need to be:
- not so general that they fail to distinguish the item from everything else,
- nor so specific that no one would ever search for them.
The same keyword may count as too general, too specific, or just right depending on the platform you are using.
For example, assume you have a series of Jupyter Notebooks that you are going to deposit in an archive in support of a manuscript you are publishing on the population genetics of BRCA1 alleles. Which of the following keywords would be useful, and which would not?
- iPython
- reproducible research
- computer code
- BRCA1
- population genetics
- cancer
What kind of context could change your answers?
Imagine that you are finishing up a project on the gapminder dataset that we've been using over the course of the workshop. You are preparing to deposit the dataset and the Jupyter Notebooks into an archive such as Zenodo. The submission interface allows you to provide a set of keywords to describe your deposit, and you want to maximize its impact by making it easy for the people who would find it useful to discover it.
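As an aside, the web submission form is just one way to supply this metadata. The following is a hedged sketch, assuming Zenodo's deposition API (create a draft, then attach metadata); the token, title, author, and keyword values are hypothetical placeholders, not the set you will choose in the exercise.

```python
# Hedged sketch: attach keywords to a Zenodo draft deposit programmatically.
# All metadata values below are placeholders for illustration only.
import requests

ZENODO_TOKEN = "..."  # placeholder: a personal access token from your Zenodo account

# Step 1: create an empty draft deposit.
resp = requests.post("https://zenodo.org/api/deposit/depositions",
                     params={"access_token": ZENODO_TOKEN}, json={})
resp.raise_for_status()
deposition_id = resp.json()["id"]

# Step 2: attach descriptive metadata, including the keywords you chose.
metadata = {
    "title": "Gapminder analysis notebooks and derived dataset",   # hypothetical
    "upload_type": "dataset",
    "description": "Jupyter Notebooks and data supporting the workshop project.",
    "creators": [{"name": "Doe, Jane"}],                            # hypothetical author
    "keywords": ["gapminder", "Jupyter Notebook", "reproducible research"],
}
resp = requests.put(f"https://zenodo.org/api/deposit/depositions/{deposition_id}",
                    params={"access_token": ZENODO_TOKEN},
                    json={"metadata": metadata})
resp.raise_for_status()
print("Draft deposit updated:", deposition_id)
```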
You will work with a partner (or a small group).
- Step 1: You pick (on your own) a set of at most 5 keywords (3 minutes)
- Step 2: Compare with your partner, and together decide on a new set of at most 5 keywords. You are free to mix and match between either of your keyword sets, or you might want to create new ones. You must decide and agree on your new keywords. (3 minutes)
- Step 3: Each pair (or small group) places their keywords in the Etherpad, and then the whole room reviews them.
The entire room now decides on a single set of 5 keywords. Again, this may be a combination of the existing sets or something entirely new.