<h2 class="unnumbered anchored" data-anchor-id="ten-reasons-to-implement-code-management-practices-early-in-a-research-group">Ten reasons to implement code management practices early in a research group</h2>
271
271
<p>Would the problem be solved if future new members of the lab arrived with better training in data science? No. We believe the research group should still define its priorities when it comes to managing code.</p>
272
-
<p>There are several benefits to defining clear minimum guidelines and basic computational skills from the moment new members join the lab:</p>
272
+
<p>Leaving code management entirely to early-career researchers can lead to inconsistent practices. Instead, establishing group-wide policies ensures consistency and sustainability. There are several benefits to defining clear minimum guidelines and basic computational skills from the moment new members join the lab:</p>
273
273
<ol type="1">
274
-
<li><p><strong>Avoid messy projects from the start</strong>.<br>
275
-
Centralizing data analyses on a GitHub Organization and creating standards for pushing code promotes improved repository structuring, version control, and better-documented code, ensuring reproducibility from the project’s inception.</p></li>
276
-
<li><p><strong>Implement minimum documentation and project management best practices</strong>.<br>
277
-
Defining group-level criteria for code and data management facilitates collaboration, saving time and avoiding errors.</p></li>
278
-
<li><p><strong>Focus on domain-specific skills first</strong>.<br>
279
-
Identifying domain-specific computational skills can save time for new researchers. This knowledge is sometimes shared in publications tailored to each discipline but is too specific to be addressed by general training courses and tutorials for scientists, being the only exception we know Data Carpentry.</p></li>
280
-
<li><p><strong>Early peer review</strong>.<br>
281
-
In this manual, we suggest creating early private repositories that are only visible among team members. Sharing analyses with team members in private repositories allows for valuable feedback. This practice could help researchers gain confidence in making their code publicly accessible once published.</p></li>
282
-
<li><p><strong>Define a consistent set of practices from all the different schools of thought</strong>.<br>
283
-
Different educational materials and training tutorials offer diverse sets of good management practices. Researchers, coming from varied backgrounds, may adopt practices they deem best. Therefore, providing clear guidelines on expectations and methodologies ensures consistency in management practices throughout the project.</p></li>
284
-
<li><p><strong>More efficient use of time</strong>.<br>
285
-
Taking a workshop on a computational tool may occur at an advanced stage of the project. As a result, decisions about code organization, documentation, and file structure could have been made more effectively from the beginning, saving valuable time.</p></li>
286
-
<li><p><strong>Maintain the group’s research history</strong>.<br>
287
-
This approach helps create and standardize a historical archive of the group’s data analyses, ensuring continuity and avoiding dependence on researchers leaving behind their code and data when they move on.</p></li>
288
-
<li><p><strong>Facilitate exchange of ideas about data and code management among team members</strong>.<br>
289
-
Creating guidelines helps build a body of knowledge that can be improved over time with contributions from students/researchers, allowing for discussions on which practices should be added, prioritized, and/or removed.</p></li>
290
-
<li><p><strong>Make informed decisions about what to learn next</strong>.<br>
291
-
A researcher may hear that they should learn to use GitHub. By explaining from the beginning what GitHub is and the minimum knowledge required, it becomes easier for them to assess if they should focus on learning additional skills or not. Supporting new members of the research group in adopting basic computational techniques from the start lowers the barrier for researchers to explore other tools early.</p></li>
292
-
<li><p><strong>Adoption of open science practices</strong>.<br>
293
-
If the group aims to begin making research code available, these guidelines and training will effectively promote leaving the code open source.</p></li>
274
+
<li><p><strong>Set a solid foundation to avoid messy projects.</strong><br>
275
+
Define the file formats to be used and establish a basic file structure to ensure reproducibility from the project’s inception. Additionally, outline how the data will be managed and integrated into the analysis.</p></li>
276
+
<li><p><strong>Define a consistent set of practices from all the different schools of thought.</strong><br>
277
+
Educational materials and training tutorials present various management practices, and researchers from different backgrounds may adopt different approaches. Therefore, providing clear guidelines ensures consistency in management practices across the projects.</p></li>
278
+
<li><p><strong>Focus on domain-specific skills first.</strong><br>
279
+
Identifying domain-specific computational skills can save time for new researchers. This knowledge is sometimes shared in publications tailored to each discipline but is too specific to be addressed by general training courses and tutorials for scientists.</p></li>
280
+
<li><p><strong>Early peer review.</strong><br>
281
+
In this manual, we suggest creating private repositories that are visible only to team members. Sharing analyses within these private repositories allows for valuable feedback. This practice could help researchers gain confidence in making their code publicly accessible once published and benefit from unpublished analyses conducted in the lab.</p></li>
282
+
<li><p><strong>Standardize documentation practices.</strong><br>
283
+
For example, there could be a README template that all researchers use, making it easy to understand what can be found in a repository. This saves time, facilitates access to materials for all team members, increases project reproducibility, and makes it easier to identify repositories with older analyses.</p></li>
284
+
<li><p><strong>Optimize time management.</strong><br>
285
+
Taking a workshop on a computational tool may occur at an advanced stage of a project. As a result, decisions about code organization, documentation, and file structure could have been made more effectively from the beginning, saving valuable time.</p></li>
286
+
<li><p><strong>Maintain the group’s research history.</strong><br>
287
+
Centralizing data analyses on a repository hosting organization, such as a GitHub Organization, creates a historical archive of the group’s data analyses, ensuring continuity and avoiding dependence on researchers leaving behind their code and data when they move on.</p></li>
288
+
<li><p><strong>Facilitate exchange of ideas about data and code management among team members.</strong><br>
289
+
Creating guidelines helps build a body of knowledge that can be improved over time with contributions from students/researchers, allowing for discussions on which practices should be added, prioritized, or removed.</p></li>
290
+
<li><p><strong>Make informed decisions about what to learn next.</strong><br>
291
+
A researcher might hear that they need to learn Git but have no idea what this tool is for. A brief introduction to Git and clear guidance on where to begin make it easier to assess whether learning additional skills will be useful. Supporting new members in adopting basic computational techniques from the beginning lowers the barrier for researchers to explore other tools early.</p></li>
292
+
<li><p><strong>Adoption of open science practices.</strong><br>
293
+
If the group embraces open science, adopting these practices early will ensure that a high percentage of the code generated remains open source.</p></li>
294
294
</ol>
295
-
<p>Finally, there is still an eleventh reason. By demonstrating how the software will be mantained and managed throughout the project lifecycle, we highlight long-term sustainability and encourage funding agencies to invest in similar future projects.</p>
295
+
<p>These ten reasons can serve as a starting point for opening a discussion on how to approach these topics within the research group. Leaders do not need to be experts in software development. Guiding principal investigators to select the essential tools and practices maximizes the benefits of making key decisions for the team without requiring large investments in learning.</p>
296
+
<p>At the same time, the existence of a research group manual allows younger researchers to share, propose, and contribute improvements on how the code is managed based on their expertise in the research area and the training they will receive. Eventually, the manual should include the criteria for publishing code and how to recognize the need to create a software package that can be used in the lab to facilitate the group’s work.</p>
297
+
<p>Finally, beyond these ten reasons, there is an additional benefit: demonstrating how software will be maintained throughout the project lifecycle strengthens the case for long-term sustainability. This transparency encourages funding agencies to invest in similar future projects.</p>
<p>We thank <a href="https://hykelvinlee.com/">Kelvin Lee</a> for their time and thoughtful feedback. Their insights and suggestions have improved the quality of this manual.</p>
314
+
<p>Thanks to <a href="https://hykelvinlee.com/">Kelvin Lee</a> for the time and thoughtful feedback. The insights and suggestions provided have improved the quality of this blog post.</p>
"section": "Ten reasons to implement code management practices early in a research group",
21
-
"text": "Ten reasons to implement code management practices early in a research group\nWould the problem be solved if future new members of the lab arrived with better training in data science? No. We believe the research group should still define its priorities when it comes to managing code.\nThere are several benefits to defining clear minimum guidelines and basic computational skills from the moment new members join the lab:\n\nAvoid messy projects from the start.\nCentralizing data analyses on a GitHub Organization and creating standards for pushing code promotes improved repository structuring, version control, and better-documented code, ensuring reproducibility from the project’s inception.\nImplement minimum documentation and project management best practices.\nDefining group-level criteria for code and data management facilitates collaboration, saving time and avoiding errors.\nFocus on domain-specific skills first.\nIdentifying domain-specific computational skills can save time for new researchers. This knowledge is sometimes shared in publications tailored to each discipline but is too specific to be addressed by general training courses and tutorials for scientists, being the only exception we know Data Carpentry.\nEarly peer review.\nIn this manual, we suggest creating early private repositories that are only visible among team members. Sharing analyses with team members in private repositories allows for valuable feedback. This practice could help researchers gain confidence in making their code publicly accessible once published.\nDefine a consistent set of practices from all the different schools of thought.\nDifferent educational materials and training tutorials offer diverse sets of good management practices. Researchers, coming from varied backgrounds, may adopt practices they deem best. 
Therefore, providing clear guidelines on expectations and methodologies ensures consistency in management practices throughout the project.\nMore efficient use of time.\nTaking a workshop on a computational tool may occur at an advanced stage of the project. As a result, decisions about code organization, documentation, and file structure could have been made more effectively from the beginning, saving valuable time.\nMaintain the group’s research history.\nThis approach helps create and standardize a historical archive of the group’s data analyses, ensuring continuity and avoiding dependence on researchers leaving behind their code and data when they move on.\nFacilitate exchange of ideas about data and code management among team members.\nCreating guidelines helps build a body of knowledge that can be improved over time with contributions from students/researchers, allowing for discussions on which practices should be added, prioritized, and/or removed.\nMake informed decisions about what to learn next.\nA researcher may hear that they should learn to use GitHub. By explaining from the beginning what GitHub is and the minimum knowledge required, it becomes easier for them to assess if they should focus on learning additional skills or not. Supporting new members of the research group in adopting basic computational techniques from the start lowers the barrier for researchers to explore other tools early.\nAdoption of open science practices.\nIf the group aims to begin making research code available, these guidelines and training will effectively promote leaving the code open source.\n\nFinally, there is still an eleventh reason. By demonstrating how the software will be mantained and managed throughout the project lifecycle, we highlight long-term sustainability and encourage funding agencies to invest in similar future projects.\n\n\n\n\n\n\nHow to cite this book?\n\n\n\nD’Andrea, F., & Stringhini, S. 
Code Management Guidelines: R and GitHub Starter Kit for New Team Members. https://github.com/StringhiniLab/GitHubProceduresLab. Available at: https://stringhinilab.github.io/GitHubProceduresLab/ DOI: https://doi.org/10.5281/zenodo.14510774"
21
+
"text": "Ten reasons to implement code management practices early in a research group\nWould the problem be solved if future new members of the lab arrived with better training in data science? No. We believe the research group should still define its priorities when it comes to managing code.\nLeaving code management entirely to early-career researchers can lead to inconsistent practices. Instead, establishing group-wide policies ensures consistency and sustainability. There are several benefits to defining clear minimum guidelines and basic computational skills from the moment new members join the lab:\n\nSet a solid foundation to avoid messy projects.\nDefine the file formats to be used and establish a basic file structure to ensure reproducibility from the project’s inception. Additionally, outline how the data will be managed and integrated into the analysis.\nDefine a consistent set of practices from all the different schools of thought.\nEducational materials and training tutorials present various management practices, and researchers from different backgrounds may adopt different approaches. Therefore, providing clear guidelines ensures consistency in management practices across the projects.\nFocus on domain-specific skills first.\nIdentifying domain-specific computational skills can save time for new researchers. This knowledge is sometimes shared in publications tailored to each discipline but is too specific to be addressed by general training courses and tutorials for scientists.\nEarly peer review.\nIn this manual, we suggest creating private repositories that are visible only to team members. Sharing analyses within these private repositories allows for valuable feedback. 
This practice could help researchers gain confidence in making their code publicly accessible once published and benefit from unpublished analyses conducted in the lab.\nStandardize documentation practices.\nFor example, there could be a README template that all researchers use, making it easy to understand what can be found in a repository. This saves time, facilitates access to materials for all team members, increases project reproducibility, and makes it easier to identify repositories with older analyses.\nOptimize time management.\nTaking a workshop on a computational tool may occur at an advanced stage of a project. As a result, decisions about code organization, documentation, and file structure could have been made more effectively from the beginning, saving valuable time.\nMaintain the group’s research history.\nCentralizing data analyses on a repository hosting organization, such as a GitHub Organization, creates a historical archive of the group’s data analyses, ensuring continuity and avoiding dependence on researchers leaving behind their code and data when they move on.\nFacilitate exchange of ideas about data and code management among team members.\nCreating guidelines helps build a body of knowledge that can be improved over time with contributions from students/researchers, allowing for discussions on which practices should be added, prioritized, or removed.\nMake informed decisions about what to learn next.\nA researcher might hear that they need to learn Git but have no idea what this tool is for. A brief introduction to Git and clear guidance on where to begin make it easier to assess whether learning additional skills will be useful. 
Supporting new members in adopting basic computational techniques from the beginning lowers the barrier for researchers to explore other tools early.\nAdoption of open science practices.\nIf the group embraces open science, adopting these practices early will ensure that a high percentage of the code generated remains open source.\n\nThese ten reasons can serve as a starting point for opening a discussion on how to approach these topics within the research group. Leaders do not need to be experts in software development. Guiding principal investigators to select the essential tools and practices maximizes the benefits of making key decisions for the team without requiring large investments in learning.\nAt the same time, the existence of a research group manual allows younger researchers to share, propose, and contribute improvements on how the code is managed based on their expertise in the research area and the training they will receive. Eventually, the manual should include the criteria for publishing code and how to recognize the need to create a software package that can be used in the lab to facilitate the group’s work.\nFinally, beyond these ten reasons, there is an additional benefit: demonstrating how software will be maintained throughout the project lifecycle strengthens the case for long-term sustainability. This transparency encourages funding agencies to invest in similar future projects.\n\n\n\n\n\n\nHow to cite this book?\n\n\n\nD’Andrea, F., & Stringhini, S. Code Management Guidelines: R and GitHub Starter Kit for New Team Members. https://github.com/StringhiniLab/GitHubProceduresLab. Available at: https://stringhinilab.github.io/GitHubProceduresLab/ DOI: https://doi.org/10.5281/zenodo.14510774"
22
22
},
23
23
{
24
24
"objectID": "index.html#acknowledgments",
25
25
"href": "index.html#acknowledgments",
26
26
"title": "Code Management Guidelines",
27
27
"section": "Acknowledgments",
28
-
"text": "Acknowledgments\nWe thank Kelvin Lee for their time and thoughtful feedback. Their insights and suggestions have improved the quality of this manual."
28
+
"text": "Acknowledgments\nThanks to Kelvin Lee for the time and thoughtful feedback. The insights and suggestions provided have improved the quality of this blog post."