# Vignettes and bugs (#27)
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged

**Commits (16, all by ryan-odea):**

- `4f7b83f` rename data_checker
- `181ee84` Update __init__.py
- `7dec189` fixed a bug with selection removing all treatment levels
- `238786b` fixed a naming clash
- `a151909` fixed impure DF going into the bootstrap process
- `3a7eb20` added a clamp to edit out of bounds values (risk/survival)
- `205388d` black formatting
- `ee1196a` added basic vignettes
- `40438de` added output example vignette
- `10d584c` bump version
- `481af15` Update docs/vignettes/exploring_results.md
- `5af3844` Update docs/vignettes/exploring_results.md
- `734585b` Update docs/vignettes/getting_started.md
- `818ed11` Update docs/vignettes/getting_started.md
- `cc48b5d` Update docs/vignettes/more_advanced_models.md
- `7e17a8f` added collection step into output vignette
# Exploring Results

Recall our previous example, {doc}`~vignettes/more_advanced_models`, where we finalized and collected our results with

```python
my_output = my_analysis.collect()
my_output.to_md()
```
Let's now walk through what the markdown dump looks like and explore our output in further detail.
## SEQuential Analysis: {date}: censoring

## Weighting
We begin by exploring the weight models. This section gives general information about the numerator and denominator models, as well as weight statistics before any limits are applied. If you recall, we imposed weight bounds at the 99th percentile, which means that in the outcome model our weights will be bounded to [0.273721, 423.185]. Note that in a real analysis we would hope the weights are further stabilized; the generated data, especially with an excused analysis, often results in larger-than-intended weights.
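As a quick illustration of what percentile-based weight bounds do, here is a minimal NumPy sketch; this is illustrative only, not the package's internal implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical inverse-probability weights with a heavy right tail
weights = rng.lognormal(mean=0.0, sigma=1.5, size=10_000)

# Bound the weights at the 1st and 99th percentiles
lo, hi = np.percentile(weights, [1, 99])
clipped = np.clip(weights, lo, hi)
```

Values below the 1st percentile are raised to it and values above the 99th are lowered to it; everything in between is untouched.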
We should also note that in an excused-censoring analysis, our adherence models hold `switch` as the dependent variable. In all non-excused cases, this would instead be your treatment value.
### Numerator Model

```
                          MNLogit Regression Results
==============================================================================
Dep. Variable:                 switch   No. Observations:                65375
Model:                        MNLogit   Df Residuals:                    65366
Method:                           MLE   Df Model:                            8
Date:                Wed, 10 Dec 2025   Pseudo R-squ.:                0.008332
Time:                        10:18:38   Log-Likelihood:                -13986.
converged:                       True   LL-Null:                       -14103.
Covariance Type:            nonrobust   LLR p-value:                 2.560e-46
===============================================================================
   switch=1       coef    std err          z      P>|z|      [0.025      0.975]
-------------------------------------------------------------------------------
  Intercept    -0.8797      0.692     -1.271      0.204      -2.236       0.477
   sex[T.1]    -0.0461      0.035     -1.325      0.185      -0.114       0.022
      N_bas     0.0026      0.003      0.741      0.459      -0.004       0.009
      L_bas     0.3368      0.032     10.553      0.000       0.274       0.399
      P_bas    -0.1864      0.073     -2.556      0.011      -0.329      -0.043
   followup    -0.0211      0.006     -3.822      0.000      -0.032      -0.010
followup_sq     0.0001      0.000      0.637      0.524      -0.000       0.000
      trial    -0.0624      0.014     -4.430      0.000      -0.090      -0.035
   trial_sq     0.0004      0.000      2.309      0.021    5.78e-05       0.001
===============================================================================
```
### Denominator Model

```
                          MNLogit Regression Results
==============================================================================
Dep. Variable:                 switch   No. Observations:                65375
Model:                        MNLogit   Df Residuals:                    65363
Method:                           MLE   Df Model:                           11
Date:                Wed, 10 Dec 2025   Pseudo R-squ.:                 0.01586
Time:                        10:18:38   Log-Likelihood:                -13880.
converged:                       True   LL-Null:                       -14103.
Covariance Type:            nonrobust   LLR p-value:                 5.374e-89
===============================================================================
   switch=1       coef    std err          z      P>|z|      [0.025      0.975]
-------------------------------------------------------------------------------
  Intercept    -1.4384      0.715     -2.012      0.044      -2.840      -0.037
   sex[T.1]    -0.0447      0.035     -1.281      0.200      -0.113       0.024
          N    -0.0195      0.003     -5.655      0.000      -0.026      -0.013
          L     0.3719      0.062      6.025      0.000       0.251       0.493
          P     0.9362      0.139      6.723      0.000       0.663       1.209
      N_bas     0.0023      0.003      0.674      0.501      -0.004       0.009
      L_bas    -0.1703      0.092     -1.842      0.066      -0.351       0.011
      P_bas    -0.9966      0.139     -7.166      0.000      -1.269      -0.724
   followup     0.0906      0.022      4.164      0.000       0.048       0.133
followup_sq    -0.0007      0.000     -3.678      0.000      -0.001      -0.000
      trial    -0.0695      0.014     -4.934      0.000      -0.097      -0.042
   trial_sq     0.0006      0.000      3.788      0.000       0.000       0.001
===============================================================================
```
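For intuition, a stabilized weight at each interval is the ratio of the numerator model's predicted probability to the denominator model's, accumulated over follow-up. A sketch with hypothetical probabilities, not the package's code:

```python
import numpy as np

# Hypothetical per-interval predicted probabilities of the observed
# adherence status from each model
p_num = np.array([0.90, 0.85, 0.95])  # numerator (baseline covariates only)
p_den = np.array([0.80, 0.90, 0.90])  # denominator (adds time-varying covariates)

# Stabilized weight: cumulative product of the per-interval ratios
w = np.cumprod(p_num / p_den)
```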
### Weighting Statistics

| weight_min | weight_max | weight_mean | weight_std | weight_p01 | weight_p25 | weight_p50 | weight_p75 | weight_p99 |
|-----------:|-----------:|------------:|-----------:|-----------:|-----------:|-----------:|-----------:|-----------:|
| 3.11308e-08 | 9.08003e+30 | 1.97762e+26 | 3.39822e+28 | 0.260691 | 0.853367 | 1.02192 | 1.28444 | 30119 |
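A summary table like the one above can be reproduced from any weight column with pandas; the column names here are illustrative, not the package's internals:

```python
import pandas as pd

w = pd.Series([0.2, 0.9, 1.0, 1.1, 1.3, 5.0, 30.0])  # hypothetical weights

stats = {
    "weight_min": w.min(),
    "weight_max": w.max(),
    "weight_mean": w.mean(),
    "weight_std": w.std(),
}
for q in (0.01, 0.25, 0.50, 0.75, 0.99):
    stats[f"weight_p{round(q * 100):02d}"] = w.quantile(q)

summary = pd.DataFrame([stats])  # one-row summary frame
```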
## Outcome

After the weight information, we begin to gather information about the outcome model itself. This comes from `fit`, whereas survival information (or risk/incidence, depending on your specifications) comes from `survival`.
### Outcome Model

```
                 Generalized Linear Model Regression Results
==============================================================================
Dep. Variable:                outcome   No. Observations:               658971
Model:                            GLM   Df Residuals:                   658961
Model Family:                Binomial   Df Model:                            9
Link Function:                  Logit   Scale:                          1.0000
Method:                          IRLS   Log-Likelihood:                -2844.7
Date:                Wed, 10 Dec 2025   Deviance:                       5689.4
Time:                        10:18:38   Pearson chi2:                 6.80e+05
No. Iterations:                    11   Pseudo R-squ. (CS):          0.0001638
Covariance Type:            nonrobust
====================================================================================
                       coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------------
Intercept          -23.1102      2.697     -8.569      0.000     -28.396     -17.824
tx_init_bas[T.1]    -0.2221      0.185     -1.203      0.229      -0.584       0.140
sex[T.1]            -0.5588      0.113     -4.942      0.000      -0.780      -0.337
followup             0.0060      0.015      0.416      0.678      -0.022       0.035
followup_sq          0.0001      0.000      0.465      0.642      -0.000       0.001
trial                0.3377      0.054      6.274      0.000       0.232       0.443
trial_sq            -0.0022      0.001     -4.223      0.000      -0.003      -0.001
N_bas               -0.0007      0.011     -0.066      0.947      -0.022       0.021
L_bas               -0.3595      0.072     -4.962      0.000      -0.501      -0.217
P_bas                1.6141      0.281      5.752      0.000       1.064       2.164
====================================================================================
```
### Survival

If we enable `km_curves` in our options, we can extract risk information between treatment values, returned in the tables below. Additionally, any plots you create will be stored here.
Note that we have a risk plot here. If you would like a different plot, you can specify another plot type when calling the class method {py:meth}`~pySEQTarget.SEQuential.plot`; this can be done on any `SEQuential` object, or at collection time. You can also access the data used to create these plots with

```python
survival_data = my_output.retrieve_data("km_data")
```
#### Risk Differences

| A_x | A_y | Risk Difference | RD 95% LCI | RD 95% UCI |
|----:|----:|----------------:|-----------:|-----------:|
|   0 |   1 |      0.00859802 |  -0.169438 |   0.186634 |
|   1 |   0 |     -0.00859802 |  -0.186634 |   0.169438 |
#### Risk Ratios

| A_x | A_y | Risk Ratio | RR 95% LCI | RR 95% UCI |
|----:|----:|-----------:|-----------:|-----------:|
|   0 |   1 |    1.24069 |  0.0121904 |    126.272 |
|   1 |   0 |   0.806005 | 0.00791939 |     82.032 |
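The point estimates in these tables are simple contrasts of end-of-follow-up cumulative risks between the two arms (the confidence intervals come from the bootstrap). A sketch with hypothetical risks:

```python
# Hypothetical end-of-follow-up cumulative risks per treatment arm
risk = {0: 0.045, 1: 0.036}

for a_x, a_y in [(0, 1), (1, 0)]:
    rd = risk[a_x] - risk[a_y]  # risk difference
    rr = risk[a_x] / risk[a_y]  # risk ratio
    print(f"A_x={a_x} A_y={a_y} RD={rd:+.4f} RR={rr:.3f}")
```

This is also why the RD rows above flip sign between the two orderings while the RR rows are reciprocals of each other.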
#### Survival Curves


## Diagnostic Tables

After all of our primary results, we are met with a few diagnostic tables. These contain useful information about the expanded dataset. A table titled 'unique' indicates that each ID can contribute to a count only once: e.g., if ID A101 in the expanded framework has an outcome in Trials 1 and 2 while on treatment regime = 1, it contributes a single count in the unique case, whereas both trials are counted in the non-unique case.
Because we ran an excused-censoring analysis, we are also provided with information about switches away from treatment, as well as how many of these switches were excused.
### Unique Outcomes

| tx_init | outcome | len |
|--------:|--------:|----:|
|       0 |       0 | 249 |
|       1 |       1 |   8 |
|       0 |       1 |   4 |
|       1 |       0 | 715 |
### Nonunique Outcomes

| tx_init | outcome |    len |
|--------:|--------:|-------:|
|       0 |       1 |     73 |
|       1 |       0 | 546644 |
|       1 |       1 |    227 |
|       0 |       0 | 117007 |
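The unique versus non-unique distinction can be sketched with pandas; the frame below is illustrative, not the package's internals:

```python
import pandas as pd

# Hypothetical expanded data: ID A101 has the outcome in two trials
df = pd.DataFrame({
    "ID":      ["A101", "A101", "B202", "B202"],
    "trial":   [1, 2, 1, 2],
    "tx_init": [1, 1, 0, 0],
    "outcome": [1, 1, 0, 0],
})

# Non-unique: every expanded trial row contributes to the count
nonunique = df.groupby(["tx_init", "outcome"]).size().rename("len").reset_index()

# Unique: each ID contributes at most once per (tx_init, outcome) cell
unique = (
    df.drop_duplicates(["ID", "tx_init", "outcome"])
      .groupby(["tx_init", "outcome"]).size().rename("len").reset_index()
)
```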
### Unique Switches

| tx_init | isExcused | switch | len |
|--------:|:----------|-------:|----:|
|       0 | True      |      1 |  30 |
|       1 | False     |      1 |  47 |
|       0 | False     |      1 |  91 |
|       1 | True      |      1 |  32 |
|       0 | False     |      0 | 132 |
|       1 | False     |      0 | 644 |
### Nonunique Switches

| tx_init | isExcused | switch |    len |
|--------:|:----------|-------:|-------:|
|       0 | True      |      0 |  22056 |
|       0 | False     |      1 |   3724 |
|       1 | False     |      1 |   1256 |
|       1 | False     |      0 | 527107 |
|       1 | True      |      0 |  18508 |
|       0 | False     |      0 |  91300 |
# Getting Started

Getting started with SEQuential is hopefully quite easy. The primary flow is to define your options through `SEQopts`, and then build and modify the state of the `SEQuential` class. Let's move through a basic tutorial.
## A Simple Analysis

Let's create a motivating example: we are primarily interested in a treatment's effectiveness, based on the initial treatment assignment, and how this differs by `sex` in our fabricated cohort. Assuming we already have the package installed and accessible to our Python environment, we can dive into building our options.

A full list of options is available in the documentation under {py:class}`~pySEQTarget.SEQopts`.
## Setup

```python
from pySEQTarget import SEQopts

my_options = SEQopts(subgroup_colname="sex",
                     km_curves=True)
```

We don't have many options to set here because we are running an ITT analysis. Except in certain cases, an ITT analysis is unweighted, and weighting is what many of the options interact with.
## Initializing our primary 'Driver'

Now we begin our analysis - this amounts to creating and modifying the state of our `SEQuential` class. Nothing is returned until a call to {py:meth}`~pySEQTarget.SEQuential.collect` is made, which will return all results created up to the point of collection.

```python
from pySEQTarget import SEQuential
from pySEQTarget.data import load_data

# Load sample data
data = load_data("SEQdata")

# Initialize the class
my_analysis = SEQuential(data,
                         id_col="ID",
                         time_col="time",
                         eligible_col="eligible",
                         treatment_col="tx_init",
                         outcome_col="outcome",
                         time_varying_cols=["N", "L", "P"],
                         fixed_cols=["sex"],
                         method="ITT",
                         parameters=my_options)
```
## Building our analysis

Now that we've initialized our class, a few things have happened: our covariates have been created and stored, and our parameters have been checked. If there is no error, we are ready to build our analysis!
### Creating the nested target trial framework

```python
my_analysis.expand()
```

In this code snippet, we access the class method {py:meth}`~pySEQTarget.SEQuential.expand`, which builds our target trial framework. This internally creates a `DT` attribute (our expanded data).
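Conceptually, the expansion clones each person's remaining follow-up once for every time at which they were eligible, so each eligible time starts its own nested 'trial'. A rough pandas sketch of the idea (the real `expand` does considerably more bookkeeping):

```python
import pandas as pd

# One person observed at times 0..2, eligible at times 0 and 1 (hypothetical)
df = pd.DataFrame({
    "ID":       ["A", "A", "A"],
    "time":     [0, 1, 2],
    "eligible": [1, 1, 0],
})

pieces = []
for t in df.loc[df["eligible"] == 1, "time"]:
    trial = df[df["time"] >= t].copy()      # follow-up from trial start onward
    trial["trial"] = t                      # which nested trial this copy joins
    trial["followup"] = trial["time"] - t   # time since trial start
    pieces.append(trial)
expanded = pd.concat(pieces, ignore_index=True)
```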
### Fitting our model

```python
my_analysis.fit()
```

Since this is a relatively simple model, we can immediately move to fitting our model. Like most other Python packages, this is done by calling {py:meth}`~pySEQTarget.SEQuential.fit`. This again doesn't return anything, but will add the outcome model to our internal class state.

At this point there are results to collect, so we could inspect them; however, let's save that for after building our survival curves and risk data.
### 'Predicting' from our Model

Canonically in Python, we would call a `predict` method. `SEQuential` handles this internally: instead of the usual `predict`, survival, risk, and incidence rates are derived from {py:meth}`~pySEQTarget.SEQuential.survival`. Again, at this point we could collect and have the majority of our results; however, `SEQuential` will also plot our data for us.

```python
my_analysis.survival()
my_analysis.plot()
```
### Collecting our results

Now that we've reached the end of our analysis, we can call {py:meth}`~pySEQTarget.SEQuential.collect`. Note that you can call collect at any step along the way to check results as they are built; you can also do this by accessing the internal state of the class directly. Formally, collection sends all results created so far into an output class, {py:class}`~pySEQTarget.SEQoutput`, which has some handy tools for accessing results.

```python
my_output = my_analysis.collect()
```

Now that we have an object of our output class, the most immediate way to recover results is to dump everything to markdown or PDF using {py:meth}`~pySEQTarget.SEQoutput.to_md` or {py:meth}`~pySEQTarget.SEQoutput.to_pdf`, respectively.

```python
my_output.to_md()
```
# More Advanced Analysis

In Getting Started, we covered the basics of getting up and running with a simple analysis, but there are many options stored within `SEQuential`, or more aptly, many more parameters to play with in {py:class}`~pySEQTarget.SEQopts`. Let's cover a more in-depth analysis.

In this case, let's go over a censoring analysis with excused conditions and stabilized weighting, limiting weights to the 99th percentile and adjusting for losses to follow-up. Furthermore, we are interested in bootstrapping our results to get a risk estimate with confidence bounds, and for ease of computation we are going to randomly sample 30% of the trials which did not initiate treatment. Because we are downsampling, we are additionally going to turn off the lag condition for our adherence weights.

If you are coming from the R version, many arguments have been streamlined or inferred. Take R's `bootstrap` and `bootstrap.nboot`: these have been merged such that any `bootstrap_nboot` over 0 automatically enables bootstrapping.
## Setting up our analysis

In similar fashion to our process in Getting Started, we begin by setting up our `SEQopts`:

```python
from pySEQTarget import SEQopts
from pySEQTarget.data import load_data

data = load_data("SEQdata_LTFU")
my_options = SEQopts(
    bootstrap_nboot=20,           # 20 bootstrap iterations
    cense_colname="LTFU",         # control for losses-to-followup as a censor
    excused=True,                 # allow excused treatment swapping
    excused_colnames=["excusedZero", "excusedOne"],
    km_curves=True,               # run survival estimates
    selection_random=True,        # randomly sample treatment non-initiators
    selection_sample=0.30,        # sample 30% of treatment non-initiators
    weighted=True,                # enable weighting
    weight_lag_condition=False,   # turn off lag condition when weighting for adherence
    weight_p99=True,              # bound weights by the 1st and 99th percentiles
    weight_preexpansion=False     # weights are predicted using post-expansion data as a stabilizer
)
```
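For intuition about `selection_random` and `selection_sample`, downsampling keeps every trial that initiated treatment and randomly samples a fraction of those that did not. A sketch with an illustrative frame; the package handles this internally:

```python
import pandas as pd

# Hypothetical trial-level data with a baseline treatment indicator
trials = pd.DataFrame({
    "trial_id": range(10),
    "tx_init":  [1, 0, 0, 0, 1, 0, 0, 0, 0, 0],
})

initiators = trials[trials["tx_init"] == 1]
non_initiators = trials[trials["tx_init"] == 0].sample(frac=0.30, random_state=1)

selected = pd.concat([initiators, non_initiators])
```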
## Running our Analysis

Now that we have our setup, it is time to repeat the analytical pipeline. From here on, not much differs.

```python
from pySEQTarget import SEQuential

my_analysis = SEQuential(data,
                         id_col="ID",
                         time_col="time",
                         eligible_col="eligible",
                         treatment_col="tx_init",
                         outcome_col="outcome",
                         time_varying_cols=["N", "L", "P"],
                         fixed_cols=["sex"],
                         method="censoring",
                         parameters=my_options)

# Expand the data
my_analysis.expand()
```
### A quick note about bootstrapping

The key difference when bootstrapping is that you will additionally have to call {py:meth}`~pySEQTarget.SEQuential.bootstrap`. This initializes the underlying randomization with replacement. Note that if you forgot to enable bootstrapping in your `SEQopts`, you can do so here as well.

```python
my_analysis.bootstrap()
```
## Back to our analysis

Now that the underlying bootstrap structure is in place, we can simply continue as we would in simpler models: fit, survival, plot, collect, and dump.

```python
my_analysis.fit()
my_analysis.survival()
my_analysis.plot()

my_output = my_analysis.collect()
my_output.to_md()
```
## That's it?

Yes! There are very few differences between the code for straightforward and more difficult analyses with this package. Our hope is that, by driving the analysis almost entirely through `SEQopts`, the process stays streamlined and easy to adapt.
> **Review comment:** For me it would be worth adding a python chunk here with the code that generated the results - because without that a user has a harder time using this vignette.