katiebuntic/intro-to-research-data-management-carpentries

Introduction to Research Data Management Lesson

🚧 This lesson is under development.

Visit the lesson built from this repository.

Lesson Description

This lesson teaches researchers who are just starting out how to manage their data and files.

Target Audience

  • Masters/PhD/Postdoc researchers at the beginning of their projects.
  • Basic digital skills required (e.g., file management, Excel, some version control exposure).
  • No programming experience necessary.

Prerequisites

  • Basic Excel use (open/save tables)
  • File/folder management on a computer
  • A research project or dataset in progress

Learning Objectives

After completing this course, learners should be able to:

  • Define research data and distinguish between different data types
  • Structure research materials using clear file naming conventions and a logical folder hierarchy
  • Describe methods of data collection that make data cleaner and easier to analyse
  • Detect inconsistencies and errors in a tabular dataset ("dirty data")
  • Use a set of basic techniques to remove or correct errors and inconsistencies in tabular data ("cleaning data")
  • Use version control to track different versions of files and switch between them

Maintainer(s)

Current Maintainers of this lesson are:

Dataset & Narrative

Dataset: the MET dataset. The lesson uses a subset of the original dataset.

  • Link to original dataset: https://github.com/metmuseum/openaccess
  • Size:
  • Types: CSV file including string and numerical data
  • Requires noise/messiness injection for teaching
  • Licensing: CC0 1.0 Universal

Episodes

🚧 This needs some work.

1. What is Research Data?

  • Data types
  • Sources of data
  • What is research data management (collection, storage, organisation, sharing, etc)?

Need to write objectives

2. Structuring research materials

  • Naming conventions
  • Folder structures
  • Version control
  • Introduction to version control software (Git/GitHub)

Objectives

After following this episode, learners will be able to:

  • Organise their research data into a standard folder structure
  • Name files with a consistent naming convention
  • Explain why version control is important, and how to incorporate it into their naming conventions
  • Explain why version control software such as Git/GitHub can be useful for certain types of data
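
As an illustration of the first two objectives, here is a minimal Python sketch that creates a folder hierarchy and builds consistently named files. The folder names in `FOLDERS` and the `date_project_description_version` naming pattern are assumptions made for this example, not conventions prescribed by the lesson; the lesson itself requires no programming.

```python
import re
import tempfile
from datetime import date
from pathlib import Path

# Hypothetical folder layout; the lesson may recommend a different one.
FOLDERS = ["data/raw", "data/processed", "docs", "results"]


def create_project(root: Path) -> list[Path]:
    """Create the folder hierarchy under root and return the created paths."""
    created = []
    for name in FOLDERS:
        path = root / name
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created


def slug(text: str) -> str:
    """Lowercase the text and replace runs of other characters with a hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def build_filename(project: str, description: str, when: date,
                   version: int, ext: str) -> str:
    """Build a file name using an illustrative date_project_description_vNN pattern."""
    return f"{when.isoformat()}_{slug(project)}_{slug(description)}_v{version:02d}.{ext}"


# Try it out in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    for folder in create_project(Path(tmp)):
        print(folder.relative_to(tmp))

print(build_filename("MET", "Object counts", date(2024, 3, 1), 2, "csv"))
```

Putting an ISO 8601 date first and zero-padding the version number means that files sort chronologically and by version in an ordinary file browser.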

3. Tabular data collection

  • Have a look at a 'dirty' data set
  • Is there a standard set of responses?
  • Is it free text?
  • How do you control what data is being collected?
  • Asking the right questions
  • Data dictionaries

Objectives

After following this episode, learners will be able to:

  • List variable types and formats
  • Identify inconsistencies in data that can cause problems during analysis
  • Describe methods that can be used during data collection and data entry that can prevent inconsistencies
  • Write guidance for how to collect and enter data
  • Create a data dictionary describing a dataset
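
To make the last objective concrete, the following sketch drafts a skeleton data dictionary from a small CSV table using only the Python standard library. The sample rows and column names are invented for illustration and are not taken from the lesson dataset; the `description` field is left blank on purpose, because it has to be written by someone who knows the data.

```python
import csv
import io

# Invented sample data; column names are hypothetical.
SAMPLE_CSV = """object_id,department,object_begin_date
1,Egyptian Art,-1500
2,egyptian art,
3,Asian Art,1750
"""


def infer_type(values):
    """Return 'integer', 'number', 'string', or 'unknown' for a list of text values."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "unknown"
    for label, cast in (("integer", int), ("number", float)):
        try:
            for v in non_empty:
                cast(v)
            return label
        except ValueError:
            continue
    return "string"


def draft_data_dictionary(csv_text):
    """Return one entry per column with its type, missing count, and an example."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    dictionary = []
    for column in rows[0].keys():
        values = [row[column].strip() for row in rows]
        dictionary.append({
            "variable": column,
            "type": infer_type(values),
            "missing": sum(1 for v in values if v == ""),
            "example": next((v for v in values if v != ""), ""),
            "description": "",  # to be filled in by hand
        })
    return dictionary


for entry in draft_data_dictionary(SAMPLE_CSV):
    print(entry)
```

The inferred types are only a starting point: a column that parses as an integer may still be a code or an identifier rather than a quantity, which is exactly the kind of detail a data dictionary should record.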

4. How to clean a tabular dataset (using Excel)

  • Finding inconsistencies
  • Missing data
  • Capitalisation
  • Spelling mistakes
  • Pros and cons of Excel

Objectives

After following this episode, learners will be able to:

  • Describe what data cleaning is and why it is important
  • Find and resolve inconsistencies within a tabular dataset programmatically (e.g., datetime formats, numeric precision)
  • Identify missing values within a tabular dataset using filters
  • Correct spelling mistakes using spell check tools and find and replace
  • Standardise text formats using spreadsheet functions
  • Describe the pros and cons of using spreadsheets for data collection and cleaning
  • [Note: update for using R?]
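
The episode teaches these steps in Excel. For comparison, here is a hedged Python sketch of two of them: finding missing values and standardising capitalisation and whitespace. The messy sample rows and column names are invented for this example and do not come from the lesson dataset.

```python
import csv
import io

# Invented messy data; column names are hypothetical.
MESSY_CSV = """object_id,department,medium
1,Egyptian Art,Bronze
2, egyptian art ,bronze
3,Asian Art,
4,ASIAN ART,Silk
"""


def load_rows(csv_text):
    """Read CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def find_missing(rows, column):
    """Return the object_id of each row whose value in column is blank."""
    return [row["object_id"] for row in rows if not row[column].strip()]


def standardise_text(value):
    """Trim whitespace, collapse repeated spaces, and convert to title case."""
    return " ".join(value.split()).title()


def distinct_values(rows, column):
    """Return the sorted set of values that appear in column."""
    return sorted({row[column] for row in rows})


rows = load_rows(MESSY_CSV)
before = distinct_values(rows, "department")
for row in rows:
    row["department"] = standardise_text(row["department"])
after = distinct_values(rows, "department")

print("Rows with missing medium:", find_missing(rows, "medium"))
print("Distinct departments before:", len(before))
print("Distinct departments after:", after)
```

Listing the distinct values before and after a change is a quick way to confirm that a cleaning step merged the variants it was meant to merge. Note that `str.title()` capitalises the letter after an apostrophe, so real data may need a more careful rule.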

5. Introduction to R

Need to write objectives

Contributing

Please see CONTRIBUTING.md for contribution guidelines and details on how to get involved with this project.

Also see the current list of issues for ideas for contributing. Look for the tag good_first_issue. This indicates that the issue does not require in-depth knowledge of the project and lesson infrastructure, and is a good opportunity for a new contributor to get involved.

The help wanted tag indicates issues that we would particularly appreciate contributions to fix.

To learn more about how this lesson site is built and how you can edit the pages, see the Introduction to The Carpentries Workbench.

Citation

See CITATION.cff for citation information, including a list of authors.

License

Lesson content is published under a CC-BY license.

Contact

Please get in touch with any of the maintainers above with any questions about this lesson.
