FAIR Software & Data

The following material is paraphrased from the NIST-internal Data Sponsorship repository by @tkphd.

FAIR principles

What does FAIR even mean? The following sections reproduce the summary from Go FAIR, based on the original FAIR paper.

Think you know FAIR? Please use this tool to check your awareness!

Findable

The first step in (re)using data is to find them. Metadata and data should be easy to find for both humans and computers. Machine-readable metadata are essential for automatic discovery of datasets and services, so this is an essential component of the FAIRification process.

F1. (Meta)data are assigned a globally unique and persistent identifier
F2. Data are described with rich metadata (defined by R1 below)
F3. Metadata clearly and explicitly include the identifier of the data they describe
F4. (Meta)data are registered or indexed in a searchable resource
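
As a minimal sketch of what machine-readable metadata might look like, the Python below builds a small JSON record that carries the identifier of the thing it describes and reports any fields left empty. The key names follow CodeMeta and schema.org conventions, which is an assumption made here rather than a requirement of FAIR; the function names, the list of required fields, and the values in the usage note are hypothetical.

```python
import json

# Illustrative choice of fields; FAIR itself does not prescribe this list.
REQUIRED_FIELDS = ("identifier", "name", "description", "keywords")


def build_metadata(identifier, name, description, keywords):
    """Return a record that explicitly includes the identifier of what it describes."""
    return {
        "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
        "@type": "SoftwareSourceCode",
        "identifier": identifier,
        "name": name,
        "description": description,
        "keywords": list(keywords),
    }


def missing_fields(record):
    """List the required fields that are absent or empty."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


def to_json(record):
    """Serialize the record so that both humans and programs can read it."""
    return json.dumps(record, indent=2)
```

For example, `to_json(build_metadata("https://doi.org/10.0000/example", "demo", "A demo.", ["fair"]))` yields a file body that could be saved next to the data; the DOI shown is a placeholder, not a registered identifier.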

Accessible

Once the user finds the required data, they need to know how the data can be accessed, possibly including authentication and authorisation.

A1. (Meta)data are retrievable by their identifier using a standardised communications protocol
A1.1. The protocol is open, free, and universally implementable
A1.2. The protocol allows for an authentication and authorisation procedure, where necessary
A2. Metadata are accessible, even when the data are no longer available
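
One common way to retrieve metadata by identifier over an open protocol is HTTP content negotiation against the DOI resolver. The sketch below assumes the identifier is a DOI and that its registration agency serves the `application/vnd.citationstyles.csl+json` media type; the function names are hypothetical, and `fetch_metadata` needs network access.

```python
import json
import urllib.request


def metadata_request(doi, media_type="application/vnd.citationstyles.csl+json"):
    """Build a request that asks the DOI resolver for metadata, not the landing page."""
    return urllib.request.Request(
        f"https://doi.org/{doi}",
        headers={"Accept": media_type},
    )


def fetch_metadata(doi):
    """Resolve the DOI over HTTPS and return the metadata as a dict (needs network)."""
    with urllib.request.urlopen(metadata_request(doi), timeout=30) as response:
        return json.load(response)
```

Calling `fetch_metadata("10.1038/sdata.2016.18")` requests the record for the original FAIR paper. No authentication is involved here; a protected resource would add credentials to the same request.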

Interoperable

The data usually need to be integrated with other data. In addition, the data need to interoperate with applications or workflows for analysis, storage, and processing.

I1. (Meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation
I2. (Meta)data use vocabularies that follow FAIR principles
I3. (Meta)data include qualified references to other (meta)data

Reusable

The ultimate goal of FAIR is to optimise the reuse of data. To achieve this, metadata and data should be well-described so that they can be replicated and/or combined in different settings.

R1. (Meta)data are richly described with a plurality of accurate and relevant attributes
R1.1. (Meta)data are released with a clear and accessible data usage license
R1.2. (Meta)data are associated with detailed provenance
R1.3. (Meta)data meet domain-relevant community standards

Make It FAIR in Ten Easy Steps

Library Carpentry has a summary of 10 "easy" steps to make your software FAIR (PDF). An annotated list follows. Note that while the list is software-centric, it applies equally to data.

1. Create a description of your software.
   Write this in README.md with supporting tables, charts, images, etc. Include its dependencies, installation instructions, and citations of any work it builds upon.
2. Register your software in a software registry.
   MIDAS is the go-to where Data Sponsorship is concerned, but is not the only option.
3. Use a unique and persistent identifier for your software.
   Any registry compliant with NIST O 5702 will provide you with a persistent handle.
4. Make sure that people can download your software.
   If the software or its data sit behind a firewall or paywall, provide an alternative site. Wherever they live, check back from time to time to make sure the links are still valid.
5. Explain the functionality of your software.
   Write this into a "Usage" section of README.md, or similar, with examples of how to configure, launch, and interact with the software, and of the output to expect.
6. Use standard (community-agreed) formats for inputs and outputs.
   While open standards are preferred, if a proprietary format is the lingua franca of the field, focus on that. Create open versions if possible.
7. Document your software.
   This goes beyond README.md and in-line comments. Place documentation, or its build scripts, in a folder named doc with its own README.md describing how to build the docs and what to expect.
8. Give your software a license.
   If all members of the development team are Federal employees, use the standard NIST Disclaimer of Copyright and Warranty for your LICENSE.md. Otherwise, decide on an appropriate license.
9. State how to cite your software.
   This can be done in README.md, or as a separate CITATION.md using a BibTeX-styled code block.
10. Follow best practices for software development.
    Broadly speaking, this starts with version control using git or similar, linting your code, and following some type of branching workflow when multiple developers are involved. The regularly scheduled Software Carpentry workshops at NIST teach the basics of some of these concepts.
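
Several of the steps above come down to particular files being present. As a rough sketch, the Python below reports which of the files named in the annotations are missing from a project directory. The file list and its descriptions are read off the annotations and are illustrative rather than an official checklist; `missing_files` is a hypothetical helper.

```python
import tempfile
from pathlib import Path

# Files mentioned in the annotated steps, with the purpose each one serves.
EXPECTED_FILES = {
    "README.md": "description, usage, and dependencies",
    "LICENSE.md": "license",
    "CITATION.md": "how to cite (may live in README.md instead)",
    "doc/README.md": "how to build the documentation",
}


def missing_files(project_root):
    """Return the expected files that are absent from the project."""
    root = Path(project_root)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # Demonstrate on a throwaway directory that only has a README.
    with tempfile.TemporaryDirectory() as scratch:
        (Path(scratch) / "README.md").write_text("# Example\n", encoding="utf-8")
        for name in missing_files(scratch):
            print(f"missing {name}: {EXPECTED_FILES[name]}")
```

A check like this only confirms that the files exist; whether their contents are useful still needs a human reader.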

What's good enough?

Good Enough Practices in Scientific Computing is an excellent paper outlining what you need to do to produce good science in a FAIR frame of mind.

tl;dr follows.

Data management

Save the raw data.
Ensure that raw data are backed up in more than one location.
Create the data you wish to see in the world.
Create analysis-friendly data.
Record all the steps used to process data.
Anticipate the need to use multiple tables, and use a unique identifier for every record.
Submit data to a reputable DOI-issuing repository so that others can access and cite it.
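
One possible way to record processing steps, and to confirm later that a backup of the raw data is intact, is to log a checksum of every input and output. The sketch below appends one JSON line per step; the log format and the function names are assumptions made for illustration, not something the paper prescribes.

```python
import hashlib
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_step(log_path, description, inputs, outputs):
    """Append one processing step, with file checksums, to a JSON Lines log."""
    entry = {
        "when": datetime.now(timezone.utc).isoformat(),
        "what": description,
        "inputs": {str(path): sha256_of(path) for path in inputs},
        "outputs": {str(path): sha256_of(path) for path in outputs},
    }
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(entry) + "\n")
    return entry


if __name__ == "__main__":
    # Demonstrate in a throwaway directory with a tiny raw file.
    with tempfile.TemporaryDirectory() as scratch:
        raw = Path(scratch) / "raw.csv"
        raw.write_text("id,value\n1,2.5\n", encoding="utf-8")
        print(record_step(Path(scratch) / "steps.jsonl", "saved raw data", [raw], []))
```

Comparing `sha256_of` for the original and for each backup copy is a cheap way to check that the copies still match.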

Software

Place a brief explanatory comment at the start of every program.
Decompose programs into functions.
Be ruthless about eliminating duplication.
Always search for well-maintained software libraries that do what you need.
Test libraries before relying on them.
Give functions and variables meaningful names.
Make dependencies and requirements explicit.
Do not comment and uncomment sections of code to control a program's behavior.
Provide a simple example or test data set.
Submit code to a reputable DOI-issuing repository.
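
Several of these recommendations can be seen together in a very small program. The sketch below is a hypothetical example, not code from the paper: it opens with an explanatory comment, splits the work into named functions, ships a tiny example data set, and uses a command-line flag where one might otherwise comment code in and out.

```python
"""Report the mean of the numbers in a one-column text file.

Run without arguments to use the built-in example data.
"""
import argparse

# A tiny example data set so the program can be tried without any input file.
EXAMPLE_LINES = ["1.0", "2.0", "6.0"]


def parse_values(lines, skip_header=False):
    """Convert text lines to floats, ignoring blank lines."""
    if skip_header:
        lines = lines[1:]
    return [float(line) for line in lines if line.strip()]


def mean(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    if not values:
        raise ValueError("no values to average")
    return sum(values) / len(values)


def main(argv=None):
    """Parse the command line, print the mean, and return it."""
    parser = argparse.ArgumentParser(description="Report the mean of a column of numbers.")
    parser.add_argument("path", nargs="?", help="input file (default: built-in example)")
    # A flag replaces commenting code in and out to change behavior.
    parser.add_argument("--skip-header", action="store_true", help="ignore the first line")
    args = parser.parse_args(argv)
    if args.path is None:
        lines = EXAMPLE_LINES
    else:
        with open(args.path, encoding="utf-8") as handle:
            lines = handle.read().splitlines()
    result = mean(parse_values(lines, args.skip_header))
    print(result)
    return result


if __name__ == "__main__":
    main()
```

The only dependency is the Python standard library, which keeps the requirements easy to state explicitly.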

Collaboration

Create an overview of your project.
Create a shared "to-do" list for the project.
Decide on communication strategies.
Make the license explicit.
Make the project citable.

Project organization

Put each project in its own directory, which is named after the project.
Put text documents associated with the project in the doc directory.
Put raw data and metadata in a data directory and files generated during cleanup and analysis in a results directory.
Put project source code in the src directory.
Put external scripts or compiled programs in the bin directory.
Name all files to reflect their content or function.
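
The layout above can be created in one step. This is a minimal sketch using `pathlib`; `create_project` and the project name in the demonstration are hypothetical.

```python
import tempfile
from pathlib import Path

# Subdirectories named in the list above.
SUBDIRECTORIES = ("doc", "data", "results", "src", "bin")


def create_project(parent, name):
    """Create a directory named after the project, with the standard subdirectories."""
    project = Path(parent) / name
    for sub in SUBDIRECTORIES:
        # exist_ok makes the call safe to repeat on an existing project.
        (project / sub).mkdir(parents=True, exist_ok=True)
    return project


if __name__ == "__main__":
    # Demonstrate in a throwaway directory.
    with tempfile.TemporaryDirectory() as scratch:
        project = create_project(scratch, "example-project")
        print(sorted(path.name for path in project.iterdir()))
        # prints ['bin', 'data', 'doc', 'results', 'src']
```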

Keeping track of changes

Back up (almost) everything created by a human being as soon as it is created.
Keep changes small.
Share changes frequently.
Create, maintain, and use a checklist for saving and sharing changes to the project.
Store each project in a folder that is mirrored off the researcher's working machine.
Add a file called CHANGELOG.md to the project's docs subfolder.
Copy the entire project whenever a significant change has been made.
Use a version control system.
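
For projects that track changes by hand, a dated changelog can be kept with a few lines of code. The sketch below puts each new note at the top of CHANGELOG.md so that the newest entry comes first; that ordering, the entry layout, and the `add_changelog_entry` helper are assumptions made for illustration. The folder name defaults to `docs` as in the list above and can be changed to match the project's layout. A version control system makes this bookkeeping largely automatic.

```python
import tempfile
from datetime import date
from pathlib import Path


def add_changelog_entry(project_root, note, when=None, docs_dir="docs"):
    """Insert a dated note at the top of CHANGELOG.md, newest first."""
    when = when or date.today()
    changelog = Path(project_root) / docs_dir / "CHANGELOG.md"
    changelog.parent.mkdir(parents=True, exist_ok=True)
    previous = changelog.read_text(encoding="utf-8") if changelog.exists() else ""
    entry = f"## {when.isoformat()}\n\n- {note}\n\n"
    changelog.write_text(entry + previous, encoding="utf-8")
    return changelog


if __name__ == "__main__":
    # Demonstrate in a throwaway directory.
    with tempfile.TemporaryDirectory() as scratch:
        add_changelog_entry(scratch, "Imported raw data.")
        path = add_changelog_entry(scratch, "Cleaned column names.")
        print(path.read_text(encoding="utf-8"))
```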

Manuscripts

Write manuscripts using online tools with rich formatting, change tracking, and reference management.
Write the manuscript in a plain text format that permits version control.

Links

F-UJI (tool): analyze a repository and get a report of its FAIR compliance, with an overall score and a checklist
FAIRaware quiz/checklist of understanding
FAIR for Research Software (FAIR4RS), a proposed modification of the FAIR principles specifically for software.
10 easy things to make your software FAIR! (PDF) from Library Carpentry
4 Simple Recommendations for Open-Source Software walk-through lesson, Carpentries-style

Some more general FAIR resources:

Research Data Alliance: Top 10 FAIR data and software things
Library Carpentry: Top 10 FAIR Research Software Things
NLeSC FAIR Software
NLeSC howfairis
Data and Software Sharing Guidance for Authors Submitting to AGU journals
FAIRSharing and FAIRShake from the Preservation Quality Tool (PresQT)
Registry of Research Data Repositories also has software locations
Automating the Monitoring of Research Software FAIR Metrics
Proposal for software indicators in the Open Science Monitor
National Plan for Open Science (France, 2021-2024). Theme Three: Opening Up and Promoting Source Code Produced by Research
The Turing Way
Software Citation Guide
Managing Research Software Projects
Guides for several groups (researchers, managers, developers, ...) from the Software Sustainability Institute
Chorus Software Citation Policies Index
Software Discovery Through Registries
Awesome FAIR Data: a list of FAIR data resources.
Awesome Research Software Registries
CodeMeta Standard
- CodeMeta Generator (tool): generate a complete set of CodeMeta-compliant metadata for your research software and/or data with this handy form. Exports to JSON.
  Note: codemeta.json is not the same as codemeta.yaml: the former is a nascent general schema, while the latter is only used to help index and link NIST websites.
SOftware Metadata Extraction Framework (SOMEF)
Good Enough Practices in Scientific Computing
Citation File Format docs from GitHub
Software REUSE Specification
FAIR Computational Workflows (paper): Data doesn't just happen. Record the workflow that created it to be super FAIR.

Institutional guidance:

DLR: Software Engineering Initiative
MIT: Software Citation and Publishing
MIT workshop: Managing your research code
TU Delft Guidelines on Research Software: Licensing, Registration and Commercialisation
TU Delft: Choosing a Repository Manager
Helmholtz: Guidelines for Sustainable Research Software
- Checklist for Helmholtz Guidelines
NIH: Best Practices for Sharing Research Software
