diff --git a/README.md b/README.md
index d36eeb7f..f0bd7cc9 100644
--- a/README.md
+++ b/README.md
@@ -135,14 +135,40 @@ That means, when you submit your paper, the reviewers and the rest of the world
 the analyses from raw data all the way to final results. If you are trying to be efficient, you will likely perform
 some summarization/data analysis steps before the data can be considered tidy.
+The ideal thing for you to do when performing summarization is to create a computer script (in R, Python, or something else)
+that takes the raw data as input and produces the tidy data you are sharing as output. You can try running your script
+a couple of times and see if the code produces the same output.
+In many cases, the person who collected the data has an incentive to make it tidy for a statistician to speed the process
+of collaboration. They may not know how to code in a scripting language. In that case, what you should provide the statistician
+is something called pseudocode. It should look something like:
+1. Step 1 - take the raw file, run version 3.1.2 of summarize software with parameters a=1, b=2, c=3
+2. Step 2 - run the software separately for each sample
+3. Step 3 - take column three of outputfile.out for each sample and that is the corresponding row in the output data set
+
+You should also include information about which system (Mac/Windows/Linux) you used the software on and whether you
+tried it more than once to confirm it gave the same results. Ideally, you will run this by a fellow student/labmate
+to confirm that they can obtain the same output file you did.
 What you should expect from a statistician
 ====================
-
-
+When you turn over a properly tidied data set, it dramatically decreases the workload on the statistician. So hopefully
+they will get back to you much sooner. But most careful statisticians will check your recipe, ask questions about
+steps you performed, and try to confirm that they can obtain the same tidy data that you did with, at minimum, spot
+checks.
+
+You should then expect from the statistician:
+1. An analysis script that performs each of the analyses (not just instructions)
+2. The exact computer code they used to run the analysis
+3. All output files/figures they generated.
+
+This is the information you will use in the supplement to establish reproducibility and precision of your results. Each
+of the steps in the analysis should be clearly explained and you should ask questions when you don't understand
+what the analyst did. It is the responsibility of both the statistician and the scientist to understand the statistical
+analysis. You may not be able to perform the exact analyses without the statistician's code, but you should be able
+to explain why the statistician performed each step to a labmate/your principal investigator.
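
The raw-to-tidy script described earlier, together with the advice to run it more than once and compare the output, could be sketched as follows. This is only an illustration under stated assumptions: the column names (`sample`, `value`), the `summarize` function, the per-sample mean, and the tiny inline data set are all hypothetical and are not taken from the README or from any real summarization software.

```python
import csv
import hashlib
import io
from collections import defaultdict


def summarize(raw_csv: str) -> str:
    """Read raw rows (sample, value) and return tidy CSV text, one row per sample.

    Hypothetical summarization step: the mean of each sample's values.
    """
    values = defaultdict(list)
    for row in csv.DictReader(io.StringIO(raw_csv)):
        values[row["sample"]].append(float(row["value"]))

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["sample", "mean_value"])
    # Sort the samples so the row order is identical on every run.
    for sample in sorted(values):
        vals = values[sample]
        writer.writerow([sample, round(sum(vals) / len(vals), 6)])
    return out.getvalue()


def checksum(text: str) -> str:
    """Fingerprint the tidy output so two runs can be compared exactly."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Made-up raw data standing in for a real raw file.
RAW = "sample,value\ns1,1.0\ns1,3.0\ns2,5.0\n"

first = summarize(RAW)
second = summarize(RAW)

# Running the script twice should give byte-identical tidy data.
assert checksum(first) == checksum(second), "output changed between runs"
print(first)
```

Sharing the checksum alongside the tidy file gives a labmate or the statistician a quick way to confirm that rerunning the script reproduces the same output file.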