Split Markdown cell at heading by default? #130
Thanks Matthew, that's much appreciated! From the above I guess you are using the
to be rendered as a total of four cells, is that right? (Or three, with the last one starting at the H2 header?) Maybe we could think of an option that renders Markdown to cells using a single blank line as the cell separator. This should (hopefully) still preserve round-trip conversions, but would split every Markdown cell that contains a blank line into multiple cells. Would that be fine with you? Or would you prefer something closer to your suggestion: create a new cell when a header follows a blank line?
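To make the single-blank-line proposal concrete, here is a minimal sketch of what such a splitting rule would do to a block of Markdown text. `split_on_blank_lines` is a hypothetical helper written for illustration, not Jupytext's implementation, and for brevity it ignores fenced code blocks (a blank line inside a fence would also start a new cell).

```python
from typing import List


def split_on_blank_lines(markdown: str) -> List[str]:
    """Hypothetical sketch: treat every blank line as a cell separator."""
    cells: List[str] = []
    current: List[str] = []
    for line in markdown.splitlines():
        if line.strip() == "":
            # A blank line closes the current cell, if it has any content.
            if current:
                cells.append("\n".join(current))
                current = []
        else:
            current.append(line)
    # Flush the last cell.
    if current:
        cells.append("\n".join(current))
    return cells
```

With this rule, `split_on_blank_lines("# Title\nSome text\n\nMore text\n## Section\nDetails")` returns `["# Title\nSome text", "More text\n## Section\nDetails"]`: a heading that is not preceded by a blank line still ends up in the middle of a cell.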
I find I often want a cell that has blank lines, so I don't think I would use an option that splits on a single blank line. But I think I always want to break a cell at a heading. So I never want a cell that goes:
Yes, I am proposing that this cell be split into two:
So, yes, in your example, the last cell would start at the H2 header.
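The rule discussed here (always start a new Markdown cell at a heading, whether or not a blank line precedes it) can be sketched as follows. This is a minimal illustration assuming ATX-style `#` headings only; `split_at_headings` is a hypothetical helper, not the code that ships in Jupytext, and its fence tracking is deliberately simple (it does not check that a closing fence matches the opening one).

```python
import re
from typing import List

# ATX heading: up to three spaces of indent, 1-6 '#', then whitespace or end of line.
HEADING = re.compile(r"^ {0,3}#{1,6}(\s|$)")
# Opening or closing code fence (three or more backticks or tildes).
FENCE = re.compile(r"^ {0,3}(`{3,}|~{3,})")


def split_at_headings(markdown: str) -> List[str]:
    """Hypothetical sketch: start a new cell at every heading outside code fences."""
    cells: List[List[str]] = []
    current: List[str] = []
    in_fence = False
    for line in markdown.splitlines():
        if FENCE.match(line):
            in_fence = not in_fence
        # Only split if the current cell already holds non-blank content,
        # so a heading on the first line does not create an empty cell.
        if HEADING.match(line) and not in_fence and any(l.strip() for l in current):
            cells.append(current)
            current = []
        current.append(line)
    if current:
        cells.append(current)
    # Drop leading/trailing blank lines in each cell, and any empty cells.
    joined = ("\n".join(cell).strip("\n") for cell in cells)
    return [cell for cell in joined if cell]
```

For example, `split_at_headings("Some text\n# Title\nIntro\n\n## Section\nBody")` returns `["Some text", "# Title\nIntro", "## Section\nBody"]`, while a `# comment` line inside a fenced code block does not trigger a split.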
Hello @matthew-brett, I have started working on this in this branch. Would you like to give it a try? Still pending: an update of the documentation, and the
Sorry to be slow. Yes, that branch does exactly what I was hoping, thanks.
Hello @matthew-brett, this is on its way for Jupytext 1.0. Could you please test the release candidate:
and review the documentation? Thanks!
Thanks, yes, it works as you describe. The option was a good idea; I can see it would be confusing otherwise. I opened a PR with a more detailed description of the option; please feel free to ignore it if it's too verbose. Meanwhile, I found that I could not edit my existing .Rmd files with the RC; I got this message:
Should that get a doc update too? I wasn't immediately sure how to fix it. Deleting the
Thanks @matthew-brett for your feedback, and for the doc update. The issue you encountered is caused by a version bump of the Rmd format that existed only in development version 0.9.0, when we were considering making this behavior the default. Since it is finally implemented as an option, incrementing the Rmd format version number was not necessary. I recommend that you remove the version information from all your Markdown files. That may be a case where the new
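The thread does not spell out how to remove the version information, so here is a minimal sketch, assuming that it refers to the `format_version` and `jupytext_version` entries in the YAML header at the top of the file (as in the header quoted later in this thread). `strip_version_info` is a hypothetical helper, not a Jupytext function; it only touches lines inside the leading `---` block.

```python
import re

# Assumption: these two keys are the "version information" to remove.
VERSION_KEYS = re.compile(r"^\s*(format_version|jupytext_version):.*$")


def strip_version_info(text: str) -> str:
    """Hypothetical sketch: drop version keys from a leading YAML header."""
    lines = text.split("\n")
    if lines[0].strip() != "---":
        return text  # No YAML header: leave the file unchanged.
    out = [lines[0]]
    in_header = True
    for line in lines[1:]:
        if in_header and line.strip() == "---":
            # Closing delimiter of the header; keep everything after it as-is.
            in_header = False
            out.append(line)
            continue
        if in_header and VERSION_KEYS.match(line):
            continue  # Skip version entries inside the header only.
        out.append(line)
    return "\n".join(out)
```

Text after the closing `---` is never modified, so a body line that happens to mention `format_version:` is kept.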
@mwouts I tried this, and unfortunately, it did not work.
Something else to try?

````
---
jupyter:
  jupytext:
    cell_metadata_filter: -all
    split_at_heading: true
    text_representation:
      extension: .Rmd
      format_name: rmarkdown
      format_version: '1.2'
      jupytext_version: 1.4.2
  kernelspec:
    display_name: R
    language: R
    name: ir
---

```{r}
---
title: BSTA 513 Midterm Notes
author: Ariel Balter
output:
  html_notebook:
    toc: true
    toc_float: true
    theme: simplex
---
```

```{r}
library(rmarkdown)
library(tidyverse)
library(magrittr)
library(ggplot2)
library(beeswarm)
```

# Variables

The three main types of data are _categorical_, _continuous_, and _count_. These are mostly well-defined. In some cases one can argue about how to categorize a particular variable.

## Continuous Variables

Continuous variables take on numbers that can be integers or real numbers. Integer continuous variables live in a grey area with _count_ data.

Examples of continuous, real-valued numbers are: *weight*, *age*, *concentration*, *volume*, *mass*, *time*, etc.

Continuous, integer-valued numbers are numbers that are sequential, but do not accumulate the way _count_ data does. Sometimes these could be equivalent to counts. Sometimes they could be equivalent to _ordinal_ variables. Main examples would be numbers that have been rounded: *age in years*, *number of weeks*, etc. Another example could be a record of events that happen at regular intervals but without any fixed bound such as *first*, *second*, *third*, *fourth*.

## Categorical Variables

Categorical variables are _polychotomous_, meaning they take on a finite set of discrete values. In general, there are two types of categorical variabgles: _nominal_, and _ordinal_.

### Nominal

Nominal variables generally _name_ something such as *race*, *sex*, *gender*, *color*, *tissue type*, *genotype*, etc.

### Ordinal

Ordinal variables are discrete values that also have a natural order. Examples are:

* Likert scale choices or ratings such as {*none*, *little*, *some*, *a lot*}, or "rate on a scale of 1 to 5.".
* Continuous variables broken into chunks. Examples *age* {*young*, *middle*, *old*}, *weight* {*underweight* , *normal*, *overweight*, *obese*}, *height* {*short*, *medium*, *tall*}. These are discrete, but have a natural order.
* Numbers representing the order in which something happened. Examples: *order of children* {*first*, *second*, ...}, {*before treatment*, *after treatment*},

## Count Data

Count data usually record an _accumulation_ of items or observations through events. However, some would consider anything you count as being count data, rather than integer, continuous data. Counts can arrive randomly or continuously.

Examples of count data are *number of seizurs*, *electrical pulses on a nerve*, *number of sequences for a gene detected in a sample*, *how many bacteria have grown vs. time*, *number of children*.

# Contingency Table

## Definition

A contingency table records the counts of two categorical variables. In general, it can contain any number of cells. However the two-by-two table is extremely common. A table with $R$ rows and $C$ columns is called an $R \times C$ table. This would represent one categorical variable that has $R$ possible values and another that has $C$ possible values. Looking at the units of measurement in your data, typically patients or experimental subjects, you count the number of units that have the value $i$ of the variable along the rows and value $j$ along the columns, and you write that number in the $i,j$ cell of the table.

Example: a group has 10 people. 5 are male sex and 4 are female sex, and one is intersex (XXY). Of the 5 male, 4 identify as male gender and one as binary. Of the 4 female, 3 identify as female gender, and one as male. The intersex person identifies as male.

```{r}
sex_gender = data.frame(
  sex = c(rep("Male", 5), rep("Female", 4), "Intersex(XXY)"),
  gender = c("Male", "Male", "Male", "Male", "Binary", "Female", "Female", "Female", "Male", "Male")
)

table(sex_gender) %>% addmargins()
```

The row and column totals are called "marginal" values as they fall in the margins of the (extended) table.

## Association

- Know the name and distribution for test of general association for a 2 X 2 contingency table. Be able to write down the null and alternative hypothesis. Know how to compute the test statistic for a 2 X 2 table. Know the distribution of the test statistic under the null hypothesis. Know how to interpret R and SAS output for the test.
- Know when to use Pearson’s chi-square test and when to use Fisher’s exact test.

## Homogeneity

- Know the test of association for contingency table from a matched-pair study. Be able to write down the null and alternative hypothesis. Know how to compute the test statistic for a 2 X 2 table. Know the distribution of the test statistic under the null hypothesis. Know how to interpret R and SAS output for the test.
- Know the name and distribution for test of general association for a R X C contingency table. Understand under what situation such a test is appropriate. Be able to write down the null and alternative hypothesis. Know the distribution of the test statistic under the null hypothesis. Know how to interpret R and SAS output for the test.

## Trend

- Know the name and distribution for test of trend for a 2 X C ( or R X 2) contingency table. Understand under what situation such a test is appropriate. Be able to write down the null and alternative hypothesis. Know the distribution of the test statistic under the null hypothesis. Know how to interpret R and SAS output for the test.
- Know the name and distribution for test of trend for a R X C contingency table. Understand under what situation such a test is appropriate. Be able to write down the null and alternative hypothesis. Know the distribution of the test statistic under the null hypothesis. Know how to interpret R and SAS output for the test.

## Sample Size

- Know how to compute by hand for required sample size for one-proportion, two-proportion, and matched-pair studies.

# Bsic Analysis Procedures (Steps)

## 1. Present Data

* Tables
* Graphs or charts

## 2. Descriptive analysis

* Summary data such as mean, standard deviation, median, min, max, etc.
* Proportions or percentages
* Can include estimation meaning confidence intervals are included.

## 3. Univariate analysis

### Categorical Response and Categorical Explanatory Variables

#### Tests of association:

* Chi-Square
* Fisher's Exact

#### Level of association

* Relative Risk
* Odds Ratio

### Categorical Response and Categorical Explanatory Variables

**Univariate Testing**

* T-test
* Wilcoxon/Mann-Whitney/Kruskal-Wallace
* AN(C)OVA
* Report the overall mean, standard deviation, median, min, max and the distribution as well in each categorized group (if necessary).

## 4. Model Based Methods (Regression, Classification, Machine Learning)

* Evaluate association between explanatory variables and a response variable
* If the dependent variable is categorical from **cohort** or **cross-sectional** study, use Logistic regression.
* If the dependent variable is categorical from **case-control** study, use conditional logistic regression.
* If the dependent variable is **counts data**, may use Poisson regression or negative binomial regression.
* If the dependent variable is categorical that are measured multiple times, generalized estimating equation (GEE) method may be used.

# Types of Study

## Cohort Study

• A cohort study is a type of observational study that follows a group of people who do not have the disease.
– The goal is to find risk factors that are associated with the disease.
• A prospective cohort study follows subjects over time.
• A retrospective cohort study takes a look back at events that already have taken place.
• The methodology of prospective and retrospective cohort studies is fundamentally the same, but the retrospective study is performed post-hoc, as the cohort is followed retrospectively.

* Cohort study
  – Prospective cohort study
  – Retrospective cohort study
* Case-control study
* Cross-sectional study
* Experimental study

## Case-Control Study

• A case-control study is also an observational study, but looks at two existing groups with and without a certain disease.
– The group with the disease is called case group, and the group without the disease is called the control group.
– The goal is to identify factors that may contribute to the disease.

## Cross-Sectional Study

• A cross-sectional study dose not follow subjects over time. In stead, it evaluates subjects at one specific point time.
– The goal is often to provide data on the entire population, for example, the prevalence of an illness.

## Experimental Study

• An experimental study intentionally introduce one or more treatments, procedures, or programs, and then observe the outcome.
– The goal is to assess the effect of intervention (treatments) on clinical outcomes.
– It often involves random allocation of subjects to different interventions.
• Example: Clinical Trials

## Matched Pair

# Distributions

## Continuous

### Normal

### T

### Fisher

### Chi-Square

- Binomial (normal approximation)
- normal
- t
- chi-square

## Types

Types
Parts of Models
Test
Uses

## Distributions

- Binomial (normal approximation)
- normal
- t
- chi-square

## Hypothesis testing and inference

- Understand inference and hypothesis testing for proportion, for both one-sample and two-sample studies.
- Understand likelihood ratio test.

List of tests and uses

## Association

## Homogeneity

Lecture 3: Measurement of association for contingency table (I)

# Odds Ratio and Risk

## Risk Difference

## Relative Risk

- Know how to compute and interpret estimated risk difference, relative risk and odds ratio with their associated 95% confidence interval from a 2 X 2 contingency table.
- Know when OR and/or RR is estimable and when RR is not estimable. Understand the relationship between RR and OR.
- Know how to compute and interpret estimated odds ratio with its associated 95% confidence interval from a matched-pair study.
- Know how to compute and interpret adjusted OR using Mantel-Haenszel method. Know when it is appropriate to report Mantel-Haenszel adjusted OR. Understand Breslow-Day test of homogeneity. Be able to write down the null and alternative hypothesis. Know the distribution of the test statistic under the null hypothesis. Know how to interpret R and SAS output for the test.
- Understand Cochran-Mantel-Haenszel chi-square test. Be able to write down the null and alternative hypothesis. Know the distribution of the test statistic under the null hypothesis. Know how to interpret R and SAS output for the test.

## Attributable Risk

- Know how to compute and interpret sensitivity, specificity, the positive predictive value (PV+), and the negative predictive value (PV-). Understand the relationship between (PV+, PV-) and (sensitivity, specificity).

# Inter-rater agreement

- Understand Cohen’s Kappa statistics as a measure of agreement among raters. Know how to compute and test for Cohen’s kappa. Be able to write down the null and alternative hypothesis. Know the distribution of the test statistic under the null hypothesis. Know how to interpret R and SAS output for the test.

# Confounding

- What is confounding? Understand the difference between a positive confounder and a negative confounder.
- What is population attributable risk? Know how to compute and interpret population attributable risk for cohort studies and case-control studies.

## Sensitivity and Specificity

- What is a ROC curve? What is the widely used summary of the overall diagnostic accuracy of a test in a ROC curve analysis? How to interpret that?

# Maximum Likliehood Estimation

# Logistic Regression

## Definition and use

- Understand the three components of generalized linear models.
- Understand why ordinary linear regression is not suitable for a categorical outcome variable.
- Know the logistic regression function. Know how to write a logistic regression model.
- Know how the logistic regression model can be expressed in the three components of GLM.

## Hypothesis Testing and Inference

### MLE

- Know MLE and its nice properties. Be able to write down the likelihood function for a logistic regression model.

### Tests of Significance

- Know the three tests for the significance of the coefficients.

### Devience

- Know what deviance is. Know how to compute likelihood ratio test from outputs of SAS and R. Be able to write down the null and alternative hypothesis. Know the distribution of the test statistic under the null hypothesis. Know how to interpret STATA and SAS output for the test.

### Confidence Intervals

#### Types

- Know how to compute and interpret the confidence intervals for estimated coefficients in simple logistic regression.
- Know that an alternative confidence interval to Wald confidence interval is the profile likelihood confidence interval, and know the difference of that from the Wald CI.

## Analyzing Results

- Know how to compute the predicted probability and its associated 95% confidence interval using R or SAS output.
- Understand when and how to create design variables. Know there are different ways of defining design variables. Computing estimated odds ratio from the estimated coefficients depends on how the design variables are defined.
- Understand likelihood ratio test and the Wald test in the multiple logistic regression model setting. Know how to use the tests to decide which variable(s) should stay in the model.
- Know how to compute and interpret predicted probability and the corresponding confidence interval in multiple logistic regression model, including both univariate and multivariable logistic regression.
- Know how to interpret the coefficients for fitted logistic regression models, including dichotomous independent variables. Know how to compute estimated odds ratio from the estimated coefficients.

```{r}

```
````
Hello @abalter, well, I gave it a try, and indeed I found two things that do not work as expected:
I'll fix the first point, but maybe I'd prefer not to change the behavior of the
I think the line before a heading would probably make a document easier to read anyway. Sounds great!
I'm having an excellent time using Jupytext, thanks again.
I notice that I often write text, with headings. If I forget to put a double carriage return before the heading, the resulting Markdown cell looks a bit goofy, with a heading in the middle. Do you think it would be sensible to split text cells automatically at headings? Or provide an option to do that?
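One way to avoid a heading landing in the middle of a Markdown cell is to make sure every heading is preceded by a blank line. Below is a minimal sketch, assuming ATX-style `#` headings only; `blank_line_before_headings` is a hypothetical helper, not part of Jupytext, and it leaves lines inside fenced code blocks untouched.

```python
import re
from typing import List

# ATX heading: up to three spaces of indent, 1-6 '#', then whitespace or end of line.
HEADING = re.compile(r"^ {0,3}#{1,6}(\s|$)")
# Opening or closing code fence (three or more backticks or tildes).
FENCE = re.compile(r"^ {0,3}(`{3,}|~{3,})")


def blank_line_before_headings(markdown: str) -> str:
    """Hypothetical sketch: insert a blank line before any heading that lacks one."""
    out: List[str] = []
    in_fence = False
    for line in markdown.splitlines():
        if FENCE.match(line):
            in_fence = not in_fence
        elif HEADING.match(line) and not in_fence and out and out[-1].strip():
            out.append("")  # The missing blank line.
        out.append(line)
    return "\n".join(out)
```

For example, `"Some text\n## Heading\nMore"` becomes `"Some text\n\n## Heading\nMore"`. Because the function uses `splitlines`, a trailing newline at the end of the input is not preserved.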