Update stats plots, add longitudinal sample size calculation #98

Open
wants to merge 42 commits into
base: master

Collaborator

@PaulBautin PaulBautin commented Jan 25, 2021

This PR intends to homogenize notations and conventions between the graphs presented in the manuscript and the "csa_atrophy" repo.

Done:

  • Update plots: sample_size, error_function_of_csa
  • Update legends: boxplot_atrophy, boxplot_csa
  • Draw the bisector automatically on the boxplot_atrophy graph
  • Add graph showing error as a function of CSA
  • Add longitudinal study sample size computation and update ref for sample size computation
  • Update README

FIX #95, FIX #92, FIX #100, FIX #80, FIX #103

- update plots: sample_size, error_function_of_csa
- update legends: boxplot_atrophy, boxplot_csa
@PaulBautin PaulBautin marked this pull request as draft January 25, 2021 17:22
- expected trend and mean values on CSA boxplot
- mean values on atrophy boxplot
- add function for adding pearson's r and p-value stats
- add diff and std diff column in dataframe for sample size computation
- add longitudinal study sample size

change:
- correct difference between means (atrophy_% * CSA) on sample size plot
- correct x_label on error_function_of_CSA
- correct x_label on error_function_of_intra_cov
- automate the detection of outliers on error_function_of_intra_cov_outlier
@PaulBautin PaulBautin marked this pull request as ready for review March 11, 2021 18:27
@jcohenadad jcohenadad changed the title Update plots to match manuscript Update stats plots, add longitudinal sample size calculation Mar 17, 2021
@PaulBautin
Collaborator Author

PaulBautin commented May 13, 2021

Until now, the computed longitudinal sample sizes were too small (< 1). After investigation, we found that the SD of the differences in measured CSA across subjects was computed from the mean CSA across transformations, e.g.:
diff(sI, rX) = Mean[CSA(sI, r1, :)] - Mean[CSA(sI, rX, :)] (1)
As a consequence, the SD of the differences did not account for the variability due to transformations.

As implemented in 7874439, and contrary to formula (1), the program no longer averages CSA across transformations but randomly samples a CSA value for each subject, e.g.:
diff(sI, rX) = CSA(sI, r1, tY) - CSA(sI, rX, tZ) (2)
The results with this method are much closer to what was previously reported in the literature. However, it also reveals a large variability in longitudinal sample sizes due to transformations. This is surprising because the mean intra-subject SD is relatively small (hence, I would not expect the transformation-related variability to have such an influence).
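
To illustrate why formula (1) underestimates the SD of the differences, here is a minimal sketch on synthetic data. All function names (`simulate_csa`, `sd_diff_mean`, `sd_diff_sampled`) and all numeric values are hypothetical and are not taken from the repo; only the two difference formulas come from this comment.

```python
import random
import statistics


def simulate_csa(n_subjects=200, n_transforms=20, rescale=0.95,
                 transform_sd=2.0, seed=0):
    """Synthetic CSA values (mm^2): one list per subject and per scan.

    Assumption: CSA scales with rescale**2 and each transformation adds
    independent Gaussian noise. All values are made up for illustration.
    """
    rng = random.Random(seed)
    subjects = []
    for _ in range(n_subjects):
        true_csa = rng.gauss(80.0, 8.0)
        subjects.append({
            "r1": [true_csa + rng.gauss(0.0, transform_sd)
                   for _ in range(n_transforms)],
            "rX": [true_csa * rescale ** 2 + rng.gauss(0.0, transform_sd)
                   for _ in range(n_transforms)],
        })
    return subjects


def sd_diff_mean(subjects):
    """Formula (1): difference of the means across transformations."""
    diffs = [statistics.mean(s["r1"]) - statistics.mean(s["rX"])
             for s in subjects]
    return statistics.stdev(diffs)


def sd_diff_sampled(subjects, seed=1):
    """Formula (2): difference of one randomly drawn transformation per scan."""
    rng = random.Random(seed)
    diffs = [rng.choice(s["r1"]) - rng.choice(s["rX"]) for s in subjects]
    return statistics.stdev(diffs)


subjects = simulate_csa()
sd_mean = sd_diff_mean(subjects)
sd_sampled = sd_diff_sampled(subjects)
print(f"SD of differences, formula (1): {sd_mean:.2f} mm^2")
print(f"SD of differences, formula (2): {sd_sampled:.2f} mm^2")
```

Under these assumptions, averaging over transformations shrinks the transformation noise before the difference is taken, so the SD from formula (1) comes out smaller than the SD from formula (2).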

df_sub['perc_error'] = 100 * (df_sub['mean'] - df_sub['theoretic_csa']).div(df_sub['theoretic_csa'])
diff = []
for rescale, group in df.groupby('rescale'):
    for sub, subgroup in group.groupby('subject'):
Member

add comment/explanation

Collaborator Author

I've added explanations for the sample size function in commit f50f18a

@PaulBautin
Collaborator Author

PaulBautin commented May 20, 2021

With commit 05d7fcd using difference formula (tY and tZ are two different transforms):

diff(sI, rX) = CSA(sI, r1, tY) - CSA(sI, rX, tZ) (2)

the variability of the longitudinal sample size is relatively large:

[Figure: longitudinal_sample_size]

Also note that, when looking at between-group differences (vs. paired differences as described above), the formula was also updated as follows:

CSA(sI, rX) = CSA(sI, rX, tZ) and not CSA(sI, rX) = MEAN[CSA(sI, rX, :)]

Results vary much less in this case: the SD of the sample size is no more than 3% of the sample size (between groups).

Therefore, my best guess is that the large variability found when computing longitudinal sample sizes is mostly due to the variability of CSA measures between scalings (which has already been shown in the article).

@jcohenadad, should we continue with these results? My idea is now to keep the Monte Carlo simulations for both sample size computations.
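
For context, the Monte Carlo estimate of the longitudinal sample size could look roughly like the sketch below. It assumes the usual normal-approximation formula for a paired design, n = ((z_(1-alpha/2) + z_power) * SD_diff / delta)^2, which may differ from the reference used in this PR. The function names, the synthetic data and the default values of alpha and power are hypothetical.

```python
import random
import statistics
from statistics import NormalDist


def paired_sample_size(sd_diff, delta, alpha=0.05, power=0.8):
    """Normal-approximation sample size for a paired (longitudinal) design.

    Assumed formula, not necessarily the one referenced in the PR.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return ((z_alpha + z_power) * sd_diff / delta) ** 2


def monte_carlo_sample_size(subjects, n_iter=200, seed=0):
    """Repeat the random draw of formula (2); return mean and SD of n."""
    rng = random.Random(seed)
    sizes = []
    for _ in range(n_iter):
        # One random transformation per scan, as in formula (2)
        diffs = [rng.choice(s["r1"]) - rng.choice(s["rX"]) for s in subjects]
        sizes.append(paired_sample_size(statistics.stdev(diffs),
                                        statistics.mean(diffs)))
    return statistics.mean(sizes), statistics.stdev(sizes)


# Synthetic data (made-up values): 100 subjects, 20 transformations per scan
data_rng = random.Random(42)
subjects = []
for _ in range(100):
    base = data_rng.gauss(80.0, 8.0)
    subjects.append({
        "r1": [base + data_rng.gauss(0.0, 2.0) for _ in range(20)],
        "rX": [0.95 ** 2 * base + data_rng.gauss(0.0, 2.0) for _ in range(20)],
    })

mean_n, sd_n = monte_carlo_sample_size(subjects)
print(f"longitudinal sample size: {mean_n:.2f} +/- {sd_n:.2f}")
```

The SD returned by `monte_carlo_sample_size` is the quantity whose size is being questioned here: it measures how much the sample size changes from one random draw of transformations to the next.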

@jcohenadad
Member

@PaulBautin This is an interesting investigation but I need more guidance to understand the formula described in #98 (comment). Without the context of the code I cannot advise on what is the most appropriate solution. I suggest we discuss it in a meeting.

- normalize by the square of rescale for df['Normalized CSA in mm²']
- use poly1d when plotting trends in plots
- append fake values for pearson and p_value for rescale = 1
- un-comment concatenating csv files
- change iteration number for computing sample size and print message
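
A minimal sketch of the normalization mentioned in the first commit message above, assuming that the rescaling multiplies the cross-sectional area by rescale². The helper name `normalize_csa` and the numeric values are hypothetical; the actual code operates on the df['Normalized CSA in mm²'] column.

```python
def normalize_csa(csa_mm2, rescale):
    """Divide a measured CSA by rescale**2 to bring it back to the rescale = 1 scale."""
    if rescale <= 0:
        raise ValueError("rescale must be strictly positive")
    return csa_mm2 / rescale ** 2


# Made-up example: a CSA of 80 mm^2 measured after a 0.95 rescaling
measured = 80.0 * 0.95 ** 2
print(f"measured: {measured:.2f} mm^2, "
      f"normalized: {normalize_csa(measured, 0.95):.2f} mm^2")
```

With this normalization, values measured at different rescale factors can be compared on the same axis.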
@PaulBautin
Collaborator Author

@jcohenadad, could you review? I think this PR is ready to be merged (plots in PR match plots in article).

@PaulBautin
Collaborator Author

@jcohenadad, could you review? This PR should be merged into master because plots and stats for the article are based on this PR.

@jcohenadad
Member

sorry-- realistically i will not have time to review

3 participants