# Vignettes and bugs (#27)
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged

**Commits (16, all by ryan-odea):**

- `4f7b83f` rename data_checker
- `181ee84` Update __init__.py
- `7dec189` fixed a bug with selection removing all treatment levels
- `238786b` fixed a naming clash
- `a151909` fixed impure DF going into the bootstrap process
- `3a7eb20` added a clamp to edit out of bounds values (risk/survival)
- `205388d` black formatting
- `ee1196a` added basic vignettes
- `40438de` added output example vignette
- `10d584c` bump version
- `481af15` Update docs/vignettes/exploring_results.md
- `5af3844` Update docs/vignettes/exploring_results.md
- `734585b` Update docs/vignettes/getting_started.md
- `818ed11` Update docs/vignettes/getting_started.md
- `cc48b5d` Update docs/vignettes/more_advanced_models.md
- `7e17a8f` added collection step into output vignette
# Exploring Results

Recall our previous example, {doc}`~vignettes/more_advanced_models`, where we finalized and collected our results with

```python
my_output = my_analysis.collect()
my_output.to_md()
```
Let's now walk through what the markdown dump looks like and explore our output in further detail.
## SEQuential Analysis: {date}: censoring

## Weighting
We begin by exploring the weight models. This section gives general information about the numerator and denominator models, as well as weight statistics before any limits are applied. If you recall, we imposed weight bounds at the 99th percentile, which means that in the outcome model our weights will be bounded to [0.273721, 423.185]. Note that in a real analysis we would hope the weights are further stabilized; the generated data, especially with an excused analysis, often results in larger-than-intended weights.
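As a quick illustration of what percentile-based weight bounds do, here is a minimal NumPy sketch; this is illustrative only, not the package's internal implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical inverse-probability weights with a heavy right tail
weights = rng.lognormal(mean=0.0, sigma=1.5, size=10_000)

# Bound the weights at the 1st and 99th percentiles
lo, hi = np.percentile(weights, [1, 99])
clipped = np.clip(weights, lo, hi)
```

Values below the 1st percentile are raised to it and values above the 99th are lowered to it; everything in between is untouched.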
We should also note that in an excused-censoring analysis, our adherence models hold `switch` as the dependent variable. In all non-excused cases, this would instead be your treatment value.
### Numerator Model

```
                          MNLogit Regression Results
==============================================================================
Dep. Variable:                 switch   No. Observations:                65375
Model:                        MNLogit   Df Residuals:                    65366
Method:                           MLE   Df Model:                            8
Date:                Wed, 10 Dec 2025   Pseudo R-squ.:                0.008332
Time:                        10:18:38   Log-Likelihood:                -13986.
converged:                       True   LL-Null:                       -14103.
Covariance Type:            nonrobust   LLR p-value:                 2.560e-46
===============================================================================
   switch=1       coef    std err          z      P>|z|      [0.025      0.975]
-------------------------------------------------------------------------------
  Intercept    -0.8797      0.692     -1.271      0.204      -2.236       0.477
   sex[T.1]    -0.0461      0.035     -1.325      0.185      -0.114       0.022
      N_bas     0.0026      0.003      0.741      0.459      -0.004       0.009
      L_bas     0.3368      0.032     10.553      0.000       0.274       0.399
      P_bas    -0.1864      0.073     -2.556      0.011      -0.329      -0.043
   followup    -0.0211      0.006     -3.822      0.000      -0.032      -0.010
followup_sq     0.0001      0.000      0.637      0.524      -0.000       0.000
      trial    -0.0624      0.014     -4.430      0.000      -0.090      -0.035
   trial_sq     0.0004      0.000      2.309      0.021    5.78e-05       0.001
===============================================================================
```
### Denominator Model

```
                          MNLogit Regression Results
==============================================================================
Dep. Variable:                 switch   No. Observations:                65375
Model:                        MNLogit   Df Residuals:                    65363
Method:                           MLE   Df Model:                           11
Date:                Wed, 10 Dec 2025   Pseudo R-squ.:                 0.01586
Time:                        10:18:38   Log-Likelihood:                -13880.
converged:                       True   LL-Null:                       -14103.
Covariance Type:            nonrobust   LLR p-value:                 5.374e-89
===============================================================================
   switch=1       coef    std err          z      P>|z|      [0.025      0.975]
-------------------------------------------------------------------------------
  Intercept    -1.4384      0.715     -2.012      0.044      -2.840      -0.037
   sex[T.1]    -0.0447      0.035     -1.281      0.200      -0.113       0.024
          N    -0.0195      0.003     -5.655      0.000      -0.026      -0.013
          L     0.3719      0.062      6.025      0.000       0.251       0.493
          P     0.9362      0.139      6.723      0.000       0.663       1.209
      N_bas     0.0023      0.003      0.674      0.501      -0.004       0.009
      L_bas    -0.1703      0.092     -1.842      0.066      -0.351       0.011
      P_bas    -0.9966      0.139     -7.166      0.000      -1.269      -0.724
   followup     0.0906      0.022      4.164      0.000       0.048       0.133
followup_sq    -0.0007      0.000     -3.678      0.000      -0.001      -0.000
      trial    -0.0695      0.014     -4.934      0.000      -0.097      -0.042
   trial_sq     0.0006      0.000      3.788      0.000       0.000       0.001
===============================================================================
```
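For intuition, a stabilized weight at each interval is the ratio of the numerator model's predicted probability to the denominator model's, accumulated over follow-up. A sketch with hypothetical probabilities, not the package's code:

```python
import numpy as np

# Hypothetical per-interval predicted probabilities of the observed
# adherence status from each model
p_num = np.array([0.90, 0.85, 0.95])  # numerator (baseline covariates only)
p_den = np.array([0.80, 0.90, 0.90])  # denominator (adds time-varying covariates)

# Stabilized weight: cumulative product of the per-interval ratios
w = np.cumprod(p_num / p_den)
```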
### Weighting Statistics

| weight_min | weight_max | weight_mean | weight_std | weight_p01 | weight_p25 | weight_p50 | weight_p75 | weight_p99 |
|-----------:|-----------:|------------:|-----------:|-----------:|-----------:|-----------:|-----------:|-----------:|
| 3.11308e-08 | 9.08003e+30 | 1.97762e+26 | 3.39822e+28 | 0.260691 | 0.853367 | 1.02192 | 1.28444 | 30119 |
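A summary table like the one above can be reproduced from any weight column with pandas; the column names here are illustrative, not the package's internals:

```python
import pandas as pd

w = pd.Series([0.2, 0.9, 1.0, 1.1, 1.3, 5.0, 30.0])  # hypothetical weights

stats = {
    "weight_min": w.min(),
    "weight_max": w.max(),
    "weight_mean": w.mean(),
    "weight_std": w.std(),
}
for q in (0.01, 0.25, 0.50, 0.75, 0.99):
    stats[f"weight_p{round(q * 100):02d}"] = w.quantile(q)

summary = pd.DataFrame([stats])  # one-row summary frame
```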
## Outcome

After the weight information, we begin to gather information about the outcome model itself. This comes from `fit`, whereas survival information (or risk/incidence, depending on your specifications) comes from `survival`.
### Outcome Model

```
                 Generalized Linear Model Regression Results
==============================================================================
Dep. Variable:                outcome   No. Observations:               658971
Model:                            GLM   Df Residuals:                   658961
Model Family:                Binomial   Df Model:                            9
Link Function:                  Logit   Scale:                          1.0000
Method:                          IRLS   Log-Likelihood:                -2844.7
Date:                Wed, 10 Dec 2025   Deviance:                       5689.4
Time:                        10:18:38   Pearson chi2:                 6.80e+05
No. Iterations:                    11   Pseudo R-squ. (CS):          0.0001638
Covariance Type:            nonrobust
====================================================================================
                       coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------------
Intercept          -23.1102      2.697     -8.569      0.000     -28.396     -17.824
tx_init_bas[T.1]    -0.2221      0.185     -1.203      0.229      -0.584       0.140
sex[T.1]            -0.5588      0.113     -4.942      0.000      -0.780      -0.337
followup             0.0060      0.015      0.416      0.678      -0.022       0.035
followup_sq          0.0001      0.000      0.465      0.642      -0.000       0.001
trial                0.3377      0.054      6.274      0.000       0.232       0.443
trial_sq            -0.0022      0.001     -4.223      0.000      -0.003      -0.001
N_bas               -0.0007      0.011     -0.066      0.947      -0.022       0.021
L_bas               -0.3595      0.072     -4.962      0.000      -0.501      -0.217
P_bas                1.6141      0.281      5.752      0.000       1.064       2.164
====================================================================================
```
### Survival

If we enable `km_curves` in our options, we can extract risk information between treatment values, returned in the tables below. Additionally, any plots you create will be stored here.
Note that we have a risk plot here. If you would like a different plot, you can specify another plot type when calling the class method {py:meth}`~pySEQTarget.SEQuential.plot`; this can be done on any `SEQuential` object, or at collection time. You can also access the data used to create these plots with

```python
survival_data = my_output.retrieve_data("km_data")
```
#### Risk Differences

| A_x | A_y | Risk Difference | RD 95% LCI | RD 95% UCI |
|----:|----:|----------------:|-----------:|-----------:|
|   0 |   1 |      0.00859802 |  -0.169438 |   0.186634 |
|   1 |   0 |     -0.00859802 |  -0.186634 |   0.169438 |
#### Risk Ratios

| A_x | A_y | Risk Ratio | RR 95% LCI | RR 95% UCI |
|----:|----:|-----------:|-----------:|-----------:|
|   0 |   1 |    1.24069 |  0.0121904 |    126.272 |
|   1 |   0 |   0.806005 | 0.00791939 |     82.032 |
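The point estimates in these tables are simple contrasts of end-of-follow-up cumulative risks between the two arms (the confidence intervals come from the bootstrap). A sketch with hypothetical risks:

```python
# Hypothetical end-of-follow-up cumulative risks per treatment arm
risk = {0: 0.045, 1: 0.036}

for a_x, a_y in [(0, 1), (1, 0)]:
    rd = risk[a_x] - risk[a_y]  # risk difference
    rr = risk[a_x] / risk[a_y]  # risk ratio
    print(f"A_x={a_x} A_y={a_y} RD={rd:+.4f} RR={rr:.3f}")
```

This is also why the RD rows above flip sign between the two orderings while the RR rows are reciprocals of each other.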
#### Survival Curves


## Diagnostic Tables

After all of our primary results, we are met with a few diagnostic tables. These contain useful information about the expanded dataset. A table titled 'unique' indicates that each ID can contribute to a count only once: e.g., if ID A101 in the expanded framework has an outcome in Trials 1 and 2 while on treatment regime = 1, it contributes a single count in the unique case, whereas both trials are counted in the non-unique case.
Because we ran an excused-censoring analysis, we are also provided with information about switches away from treatment, as well as how many of these switches were excused.
### Unique Outcomes

| tx_init | outcome | len |
|--------:|--------:|----:|
|       0 |       0 | 249 |
|       1 |       1 |   8 |
|       0 |       1 |   4 |
|       1 |       0 | 715 |
### Nonunique Outcomes

| tx_init | outcome |    len |
|--------:|--------:|-------:|
|       0 |       1 |     73 |
|       1 |       0 | 546644 |
|       1 |       1 |    227 |
|       0 |       0 | 117007 |
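The unique versus non-unique distinction can be sketched with pandas; the frame below is illustrative, not the package's internals:

```python
import pandas as pd

# Hypothetical expanded data: ID A101 has the outcome in two trials
df = pd.DataFrame({
    "ID":      ["A101", "A101", "B202", "B202"],
    "trial":   [1, 2, 1, 2],
    "tx_init": [1, 1, 0, 0],
    "outcome": [1, 1, 0, 0],
})

# Non-unique: every expanded trial row contributes to the count
nonunique = df.groupby(["tx_init", "outcome"]).size().rename("len").reset_index()

# Unique: each ID contributes at most once per (tx_init, outcome) cell
unique = (
    df.drop_duplicates(["ID", "tx_init", "outcome"])
      .groupby(["tx_init", "outcome"]).size().rename("len").reset_index()
)
```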
### Unique Switches

| tx_init | isExcused | switch | len |
|--------:|:----------|-------:|----:|
|       0 | True      |      1 |  30 |
|       1 | False     |      1 |  47 |
|       0 | False     |      1 |  91 |
|       1 | True      |      1 |  32 |
|       0 | False     |      0 | 132 |
|       1 | False     |      0 | 644 |
### Nonunique Switches

| tx_init | isExcused | switch |    len |
|--------:|:----------|-------:|-------:|
|       0 | True      |      0 |  22056 |
|       0 | False     |      1 |   3724 |
|       1 | False     |      1 |   1256 |
|       1 | False     |      0 | 527107 |
|       1 | True      |      0 |  18508 |
|       0 | False     |      0 |  91300 |
# Getting Started

Getting started with SEQuential is hopefully quite easy. The primary flow is to define your options through `SEQopts`, and then build and modify the state of the `SEQuential` class. Let's move through a basic tutorial.
## A Simple Analysis

Let's create a motivating example: we are primarily interested in a treatment's effectiveness, based on the initial treatment assignment, and how this differs by `sex` in our fabricated cohort. Assuming we already have the package installed and accessible to our Python environment, we can dive into building our options.

A full list of options is available in the documentation under {py:class}`~pySEQTarget.SEQopts`.
## Setup

```python
from pySEQTarget import SEQopts

my_options = SEQopts(subgroup_colname="sex",
                     km_curves=True)
```

We don't have many options to set here because we are running an ITT analysis. Except in certain cases, an ITT analysis is unweighted, and weighting is what many of the options interact with.
## Initializing our primary 'Driver'

Now we begin our analysis - this amounts to creating and modifying the state of our `SEQuential` class. Nothing is returned until a call to {py:meth}`~pySEQTarget.SEQuential.collect` is made, which will return all results created up to the point of collection.

```python
from pySEQTarget import SEQuential
from pySEQTarget.data import load_data

# Load sample data
data = load_data("SEQdata")

# Initialize the class
my_analysis = SEQuential(data,
                         id_col="ID",
                         time_col="time",
                         eligible_col="eligible",
                         treatment_col="tx_init",
                         outcome_col="outcome",
                         time_varying_cols=["N", "L", "P"],
                         fixed_cols=["sex"],
                         method="ITT",
                         parameters=my_options)
```
## Building our analysis

Now that we've initialized our class, a few things have happened: our covariates have been created and stored, and our parameters have been checked. If there is no error, we are ready to build our analysis!
### Creating the nested target trial framework

```python
my_analysis.expand()
```

In this code snippet, we access the class method {py:meth}`~pySEQTarget.SEQuential.expand`, which builds our target trial framework. This internally creates a `DT` attribute (our expanded data).
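Conceptually, the expansion clones each person's remaining follow-up once for every time at which they were eligible, so each eligible time starts its own nested 'trial'. A rough pandas sketch of the idea (the real `expand` does considerably more bookkeeping):

```python
import pandas as pd

# One person observed at times 0..2, eligible at times 0 and 1 (hypothetical)
df = pd.DataFrame({
    "ID":       ["A", "A", "A"],
    "time":     [0, 1, 2],
    "eligible": [1, 1, 0],
})

pieces = []
for t in df.loc[df["eligible"] == 1, "time"]:
    trial = df[df["time"] >= t].copy()      # follow-up from trial start onward
    trial["trial"] = t                      # which nested trial this copy joins
    trial["followup"] = trial["time"] - t   # time since trial start
    pieces.append(trial)
expanded = pd.concat(pieces, ignore_index=True)
```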
### Fitting our model

```python
my_analysis.fit()
```

Since this is a relatively simple model, we can immediately move to fitting our model. Like most other Python packages, this is done by calling {py:meth}`~pySEQTarget.SEQuential.fit`. This again doesn't return anything, but will add the outcome model to our internal class state.

At this point there are results to collect, so we could inspect them; however, let's save that for after building our survival curves and risk data.
### 'Predicting' from our Model

Canonically in Python, we would call a `predict` method. `SEQuential` handles this internally: instead of the usual `predict`, survival, risk, and incidence rates are derived from {py:meth}`~pySEQTarget.SEQuential.survival`. Again, at this point we could collect and have the majority of our results; however, `SEQuential` will also plot our data for us.

```python
my_analysis.survival()
my_analysis.plot()
```
### Collecting our results

Now that we've reached the end of our analysis, we can call {py:meth}`~pySEQTarget.SEQuential.collect`. Note that you can call collect at any step along the way to check results as they are built; you can also do this by accessing the internal state of the class directly. Formally, collection sends all results created so far into an output class, {py:class}`~pySEQTarget.SEQoutput`, which has some handy tools for accessing results.

```python
my_output = my_analysis.collect()
```

Now that we have an object of our output class, the most immediate way to recover results is to dump everything to markdown or PDF using {py:meth}`~pySEQTarget.SEQoutput.to_md` or {py:meth}`~pySEQTarget.SEQoutput.to_pdf`, respectively.

```python
my_output.to_md()
```
# More Advanced Analysis

In Getting Started, we covered the basics of getting up and running with a simple analysis, but there are many options stored within `SEQuential`, or more aptly, many more parameters to play with in {py:class}`~pySEQTarget.SEQopts`. Let's cover a more in-depth analysis.

In this case, let's go over a censoring analysis with excused conditions and stabilized weighting, limiting weights to the 99th percentile and adjusting for losses to follow-up. Furthermore, we are interested in bootstrapping our results to get a risk estimate with confidence bounds, and for ease of computation we are going to randomly sample 30% of the trials which did not initiate treatment. Because we are downsampling, we are additionally going to turn off the lag condition for our adherence weights.

If you are coming from the R version, many arguments have been streamlined or inferred. Take R's `bootstrap` and `bootstrap.nboot`: these have been merged such that any `bootstrap_nboot` over 0 automatically enables bootstrapping.
## Setting up our analysis

In similar fashion to our process in Getting Started, we begin by setting up our `SEQopts`:

```python
from pySEQTarget import SEQopts
from pySEQTarget.data import load_data

data = load_data("SEQdata_LTFU")
my_options = SEQopts(
    bootstrap_nboot=20,           # 20 bootstrap iterations
    cense_colname="LTFU",         # control for losses-to-followup as a censor
    excused=True,                 # allow excused treatment swapping
    excused_colnames=["excusedZero", "excusedOne"],
    km_curves=True,               # run survival estimates
    selection_random=True,        # randomly sample treatment non-initiators
    selection_sample=0.30,        # sample 30% of treatment non-initiators
    weighted=True,                # enable weighting
    weight_lag_condition=False,   # turn off lag condition when weighting for adherence
    weight_p99=True,              # bound weights by the 1st and 99th percentiles
    weight_preexpansion=False     # weights are predicted using post-expansion data as a stabilizer
)
```
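For intuition about `selection_random` and `selection_sample`, downsampling keeps every trial that initiated treatment and randomly samples a fraction of those that did not. A sketch with an illustrative frame; the package handles this internally:

```python
import pandas as pd

# Hypothetical trial-level data with a baseline treatment indicator
trials = pd.DataFrame({
    "trial_id": range(10),
    "tx_init":  [1, 0, 0, 0, 1, 0, 0, 0, 0, 0],
})

initiators = trials[trials["tx_init"] == 1]
non_initiators = trials[trials["tx_init"] == 0].sample(frac=0.30, random_state=1)

selected = pd.concat([initiators, non_initiators])
```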
## Running our Analysis

Now that we have our setup, it is time to repeat the analytical pipeline. From here on, not much differs.

```python
from pySEQTarget import SEQuential

my_analysis = SEQuential(data,
                         id_col="ID",
                         time_col="time",
                         eligible_col="eligible",
                         treatment_col="tx_init",
                         outcome_col="outcome",
                         time_varying_cols=["N", "L", "P"],
                         fixed_cols=["sex"],
                         method="censoring",
                         parameters=my_options)

# Expand the data
my_analysis.expand()
```
### A quick note about bootstrapping

The key difference when bootstrapping is that you will additionally have to call {py:meth}`~pySEQTarget.SEQuential.bootstrap`. This initializes the underlying randomization with replacement. Note that if you forgot to enable bootstrapping in your `SEQopts`, you can do so here as well.

```python
my_analysis.bootstrap()
```
## Back to our analysis

Now that the underlying bootstrap structure is in place, we can simply continue as we would in simpler models: fit, survival, plot, collect, and dump.

```python
my_analysis.fit()
my_analysis.survival()
my_analysis.plot()

my_output = my_analysis.collect()
my_output.to_md()
```
## That's it?

Yes! There are very few differences between the code for straightforward and more difficult analyses with this package. Our hope is that, by driving the analysis almost entirely through `SEQopts`, the process stays streamlined and easy to adapt.
> **Review comment:** For me it would be worth adding a python chunk here with the code that generated the results - because without that a user has a harder time using this vignette.