10 changes: 9 additions & 1 deletion docs/index.rst
@@ -5,8 +5,16 @@
.. toctree::
:maxdepth: 2
:hidden:
:caption: Contents:
:caption: Interface:

sequential/seqopts
sequential/sequential
sequential/seqoutput

.. toctree::
:maxdepth: 2
:caption: Vignettes:

vignettes/getting_started
vignettes/more_advanced_models
vignettes/exploring_results
Binary file added docs/vignettes/SEQuential_results.png
188 changes: 188 additions & 0 deletions docs/vignettes/exploring_results.md
@@ -0,0 +1,188 @@
# Exploring Results

> **Reviewer comment (Contributor):** For me it would be worth adding a Python chunk here with the code that generated the results, because without that a user has a harder time using this vignette.

Recall our previous example, {doc}`~vignettes/more_advanced_models`, where we finalized and collected our results with

```python
my_output = my_analysis.collect()
my_output.to_md()
```
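
For reference (and so this page stands on its own), here is a condensed sketch of the setup and pipeline that produced those results; it simply repeats the excused-censoring analysis from the previous vignette:

```python
from pySEQTarget import SEQopts, SEQuential
from pySEQTarget.data import load_data

# Options for the excused-censoring analysis (see the previous vignette)
data = load_data("SEQdata_LTFU")
my_options = SEQopts(
    bootstrap_nboot=20,
    cense_colname="LTFU",
    excused=True,
    excused_colnames=["excusedZero", "excusedOne"],
    km_curves=True,
    selection_random=True,
    selection_sample=0.30,
    weighted=True,
    weight_lag_condition=False,
    weight_p99=True,
    weight_preexpansion=False,
)

my_analysis = SEQuential(data,
                         id_col="ID",
                         time_col="time",
                         eligible_col="eligible",
                         treatment_col="tx_init",
                         outcome_col="outcome",
                         time_varying_cols=["N", "L", "P"],
                         fixed_cols=["sex"],
                         method="censoring",
                         parameters=my_options)

# Build, bootstrap, fit, and derive survival/risk before collecting
my_analysis.expand()
my_analysis.bootstrap()
my_analysis.fit()
my_analysis.survival()
my_analysis.plot()
```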
Let us now go over what the Markdown dump looks like and explore our output in further detail.

## SEQuential Analysis: {date}: censoring

## Weighting

We begin by exploring the weight models. This gives us general information about the numerator and denominator models, as well as weight statistics before any limits are applied. If you recall, we imposed weight bounds at the 99th percentile, which means that in the outcome model our weights will be bounded at [0.273721, 423.185]. Note that in a real analysis we would hope the weights are better stabilized; the generated data, particularly in the excused analysis, often produce larger-than-intended weights.

We should also note here that in an excused-censoring analysis, our adherence models hold `switch` as the dependent variable. In all non-excused cases, this would normally be your treatment value.
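
To make the percentile bounding concrete, here is a minimal sketch of clipping weights at the 1st and 99th percentiles, which is what `weight_p99 = True` requests; this is an illustration with synthetic weights, not the package's internal code.

```python
import numpy as np

# Synthetic stand-in for the unbounded IP weights from the weight models
rng = np.random.default_rng(0)
weights = rng.lognormal(mean=0.0, sigma=1.5, size=10_000)

# weight_p99 = True bounds the weights at the 1st and 99th percentiles
lo, hi = np.percentile(weights, [1, 99])
bounded_weights = np.clip(weights, lo, hi)
print(f"bounds: [{lo:.4f}, {hi:.4f}]")
```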

### Numerator Model

```
MNLogit Regression Results
==============================================================================
Dep. Variable: switch No. Observations: 65375
Model: MNLogit Df Residuals: 65366
Method: MLE Df Model: 8
Date: Wed, 10 Dec 2025 Pseudo R-squ.: 0.008332
Time: 10:18:38 Log-Likelihood: -13986.
converged: True LL-Null: -14103.
Covariance Type: nonrobust LLR p-value: 2.560e-46
===============================================================================
switch=1 coef std err z P>|z| [0.025 0.975]
-------------------------------------------------------------------------------
Intercept -0.8797 0.692 -1.271 0.204 -2.236 0.477
sex[T.1] -0.0461 0.035 -1.325 0.185 -0.114 0.022
N_bas 0.0026 0.003 0.741 0.459 -0.004 0.009
L_bas 0.3368 0.032 10.553 0.000 0.274 0.399
P_bas -0.1864 0.073 -2.556 0.011 -0.329 -0.043
followup -0.0211 0.006 -3.822 0.000 -0.032 -0.010
followup_sq 0.0001 0.000 0.637 0.524 -0.000 0.000
trial -0.0624 0.014 -4.430 0.000 -0.090 -0.035
trial_sq 0.0004 0.000 2.309 0.021 5.78e-05 0.001
===============================================================================
```

### Denominator Model

```
MNLogit Regression Results
==============================================================================
Dep. Variable: switch No. Observations: 65375
Model: MNLogit Df Residuals: 65363
Method: MLE Df Model: 11
Date: Wed, 10 Dec 2025 Pseudo R-squ.: 0.01586
Time: 10:18:38 Log-Likelihood: -13880.
converged: True LL-Null: -14103.
Covariance Type: nonrobust LLR p-value: 5.374e-89
===============================================================================
switch=1 coef std err z P>|z| [0.025 0.975]
-------------------------------------------------------------------------------
Intercept -1.4384 0.715 -2.012 0.044 -2.840 -0.037
sex[T.1] -0.0447 0.035 -1.281 0.200 -0.113 0.024
N -0.0195 0.003 -5.655 0.000 -0.026 -0.013
L 0.3719 0.062 6.025 0.000 0.251 0.493
P 0.9362 0.139 6.723 0.000 0.663 1.209
N_bas 0.0023 0.003 0.674 0.501 -0.004 0.009
L_bas -0.1703 0.092 -1.842 0.066 -0.351 0.011
P_bas -0.9966 0.139 -7.166 0.000 -1.269 -0.724
followup 0.0906 0.022 4.164 0.000 0.048 0.133
followup_sq -0.0007 0.000 -3.678 0.000 -0.001 -0.000
trial -0.0695 0.014 -4.934 0.000 -0.097 -0.042
trial_sq 0.0006 0.000 3.788 0.000 0.000 0.001
===============================================================================
```
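
Conceptually, the stabilized weight on each person-time row is the cumulative ratio of the numerator model's predicted probability of the observed `switch` value to the denominator model's. A rough sketch, where `p_num`, `p_denom`, and `ID` are illustrative column names rather than the package's internals:

```python
import pandas as pd

# Illustrative person-time rows; p_num / p_denom are the predicted probabilities
# of the observed switch value under the numerator and denominator models.
df = pd.DataFrame({
    "ID":      [1, 1, 1, 2, 2],
    "p_num":   [0.95, 0.90, 0.92, 0.97, 0.93],
    "p_denom": [0.93, 0.85, 0.90, 0.96, 0.88],
})

# Stabilized weight: cumulative product of the ratio within each subject
df["weight"] = (df["p_num"] / df["p_denom"]).groupby(df["ID"]).cumprod()
print(df)
```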

### Weighting Statistics

| weight_min | weight_max | weight_mean | weight_std | weight_p01 | weight_p25 | weight_p50 | weight_p75 | weight_p99 |
|-------------:|-------------:|--------------:|-------------:|-------------:|-------------:|-------------:|-------------:|-------------:|
| 3.11308e-08 | 9.08003e+30 | 1.97762e+26 | 3.39822e+28 | 0.260691 | 0.853367 | 1.02192 | 1.28444 | 30119 |

## Outcome

After the weight information, we begin to gather information about the outcome model itself. This comes from the `fit` call, whereas survival information (or risk/incidence, depending on your specifications) comes from `survival`.

### Outcome Model

```
Generalized Linear Model Regression Results
==============================================================================
Dep. Variable: outcome No. Observations: 658971
Model: GLM Df Residuals: 658961
Model Family: Binomial Df Model: 9
Link Function: Logit Scale: 1.0000
Method: IRLS Log-Likelihood: -2844.7
Date: Wed, 10 Dec 2025 Deviance: 5689.4
Time: 10:18:38 Pearson chi2: 6.80e+05
No. Iterations: 11 Pseudo R-squ. (CS): 0.0001638
Covariance Type: nonrobust
====================================================================================
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------------
Intercept -23.1102 2.697 -8.569 0.000 -28.396 -17.824
tx_init_bas[T.1] -0.2221 0.185 -1.203 0.229 -0.584 0.140
sex[T.1] -0.5588 0.113 -4.942 0.000 -0.780 -0.337
followup 0.0060 0.015 0.416 0.678 -0.022 0.035
followup_sq 0.0001 0.000 0.465 0.642 -0.000 0.001
trial 0.3377 0.054 6.274 0.000 0.232 0.443
trial_sq -0.0022 0.001 -4.223 0.000 -0.003 -0.001
N_bas -0.0007 0.011 -0.066 0.947 -0.022 0.021
L_bas -0.3595 0.072 -4.962 0.000 -0.501 -0.217
P_bas 1.6141 0.281 5.752 0.000 1.064 2.164
====================================================================================
```
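
For orientation, a model with this structure could be fit outside the package with statsmodels roughly as follows. This is a sketch only, using a synthetic stand-in for the expanded data; the formula simply mirrors the covariates in the summary above, and supplying the IP weights via `var_weights` is one reasonable choice, not necessarily the package's internal call.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Small synthetic stand-in for the expanded person-trial-time data
rng = np.random.default_rng(0)
n = 500
expanded = pd.DataFrame({
    "outcome": rng.binomial(1, 0.05, n),
    "tx_init_bas": rng.binomial(1, 0.5, n),
    "sex": rng.binomial(1, 0.5, n),
    "followup": rng.integers(0, 30, n),
    "trial": rng.integers(0, 20, n),
    "N_bas": rng.normal(50, 10, n),
    "L_bas": rng.normal(1, 0.3, n),
    "P_bas": rng.normal(0.5, 0.1, n),
    "weight": rng.lognormal(0, 0.5, n),
})
expanded["followup_sq"] = expanded["followup"] ** 2
expanded["trial_sq"] = expanded["trial"] ** 2

# Weighted pooled logistic regression, mirroring the covariates in the summary
model = smf.glm(
    "outcome ~ C(tx_init_bas) + C(sex) + followup + followup_sq"
    " + trial + trial_sq + N_bas + L_bas + P_bas",
    data=expanded,
    family=sm.families.Binomial(),
    var_weights=expanded["weight"],
).fit()
print(model.summary())
```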

### Survival

If we enable `km_curves` in our options, we can extract risk information between treatment values. These will be returned in the tables below. Additionally, any plots you create will be stored here.

To note, you can see here we have a risk plot. If you would like a different plot, you can specify another plot to be made when calling the class method {py:meth}`~pySEQTarget.SEQuential.plot`; this can be done on any `SEQuential` class object or when collecting. You can also access the data used to create these plots with

```python
survival_data = my_output.retrieve_data("km_data")
```
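
From there you can build your own figure. A rough matplotlib sketch, assuming the retrieved frame has follow-up, survival, and treatment columns (the column names used here are assumptions, so inspect `survival_data.columns` first):

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
for tx, grp in survival_data.groupby("tx_init"):  # assumed column names
    ax.plot(grp["followup"], grp["survival"], label=f"tx_init = {tx}")
ax.set_xlabel("Follow-up time")
ax.set_ylabel("Survival probability")
ax.legend()
fig.savefig("my_survival_curves.png")
```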

#### Risk Differences

| A_x | A_y | Risk Difference | RD 95% LCI | RD 95% UCI |
|------:|------:|------------------:|-------------:|-------------:|
| 0 | 1 | 0.00859802 | -0.169438 | 0.186634 |
| 1 | 0 | -0.00859802 | -0.186634 | 0.169438 |

#### Risk Ratios

| A_x | A_y | Risk Ratio | RR 95% LCI | RR 95% UCI |
|------:|------:|-------------:|-------------:|-------------:|
| 0 | 1 | 1.24069 | 0.0121904 | 126.272 |
| 1 | 0 | 0.806005 | 0.00791939 | 82.032 |
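
Reading each row as arm `A_x` contrasted against arm `A_y`, these quantities are simple functions of the end-of-follow-up cumulative risks in the two arms, with the confidence limits coming from the bootstrap replicates. A quick arithmetic sketch with made-up risks:

```python
# Made-up end-of-follow-up cumulative risks, for illustration only
risk_x = 0.044   # cumulative risk in arm A_x
risk_y = 0.035   # cumulative risk in arm A_y

risk_difference = risk_x - risk_y   # analogous to the "Risk Difference" column
risk_ratio = risk_x / risk_y        # analogous to the "Risk Ratio" column
```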

#### Survival Curves

![Kaplan-Meier Survival Curves](SEQuential_results.png)

## Diagnostic Tables

After all of our primary results, we are met with a few diagnostic tables. These contain useful information about the expanded dataset. Tables titled 'unique' indicate that each ID can contribute only once to a count; e.g., if ID A101 in the expanded framework has an outcome in Trials 1 and 2 while on treatment regime = 1, it contributes to only one count in the unique case, whereas both trials are included in the non-unique case.

Because we have an excused-censoring analysis, we are also provided with information about switches from treatment as well as how many of these switches were excused.
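
As a hedged illustration of the unique/non-unique distinction (not the package's code), tables like these could be produced from the expanded data roughly as follows, with toy data and illustrative column names:

```python
import pandas as pd

# Toy expanded data: ID A101 has an outcome in Trials 1 and 2 on tx_init = 1
expanded = pd.DataFrame({
    "ID":      ["A101", "A101", "B202", "B202"],
    "trial":   [1, 2, 1, 2],
    "tx_init": [1, 1, 0, 0],
    "outcome": [1, 1, 0, 0],
})

# Non-unique: every (person, trial) row counts
nonunique = expanded.groupby(["tx_init", "outcome"]).size().rename("len").reset_index()

# Unique: each ID contributes at most once per (tx_init, outcome) combination
unique = (
    expanded.drop_duplicates(["ID", "tx_init", "outcome"])
    .groupby(["tx_init", "outcome"]).size().rename("len").reset_index()
)
print(unique)
print(nonunique)
```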

### Unique Outcomes

| tx_init | outcome | len |
|----------:|----------:|------:|
| 0 | 0 | 249 |
| 1 | 1 | 8 |
| 0 | 1 | 4 |
| 1 | 0 | 715 |

### Nonunique Outcomes

| tx_init | outcome | len |
|----------:|----------:|-------:|
| 0 | 1 | 73 |
| 1 | 0 | 546644 |
| 1 | 1 | 227 |
| 0 | 0 | 117007 |

### Unique Switches

| tx_init | isExcused | switch | len |
|----------:|:------------|---------:|------:|
| 0 | True | 1 | 30 |
| 1 | False | 1 | 47 |
| 0 | False | 1 | 91 |
| 1 | True | 1 | 32 |
| 0 | False | 0 | 132 |
| 1 | False | 0 | 644 |

### Nonunique Switches

| tx_init | isExcused | switch | len |
|----------:|:------------|---------:|-------:|
| 0 | True | 0 | 22056 |
| 0 | False | 1 | 3724 |
| 1 | False | 1 | 1256 |
| 1 | False | 0 | 527107 |
| 1 | True | 0 | 18508 |
| 0 | False | 0 | 91300 |
85 changes: 85 additions & 0 deletions docs/vignettes/getting_started.md
@@ -0,0 +1,85 @@
# Getting Started

Getting started with SEQuential is hopefully quite easy. The primary flow is to define your options through `SEQopts`, and then build and modify the state of the `SEQuential` class. Let's move through a basic tutorial.

## A Simple Analysis

Let's create a motivating example: we are primarily interested in a treatment's effectiveness, based on the initial treatment assignment, and how this differs by `sex` in our fabricated cohort. Assuming we already have the package installed and accessible to our Python environment, we can dive into building our options.

A full list of options is available in the documentation under {py:class}`~pySEQTarget.SEQopts`.

## Setup

```python
from pySEQTarget import SEQopts
my_options = SEQopts(subgroup_colname = "sex",
km_curves = True)
```

We don't need many options here because we are running an ITT analysis. Except in certain cases, an ITT analysis is unweighted, and weighting is what many of the options interact with.

## Initializing our primary 'Driver'

Now, we begin our analysis; this amounts to creating and modifying the state of our `SEQuential` class. Nothing is returned until a call to {py:meth}`~pySEQTarget.SEQuential.collect` is made, which returns all results created up to the point of collection.

```python
from pySEQTarget import SEQuential
from pySEQTarget.data import load_data

# Load sample data
data = load_data("SEQdata")

# Initialize the class
my_analysis = SEQuential(data,
id_col="ID",
time_col="time",
eligible_col="eligible",
treatment_col="tx_init",
outcome_col="outcome",
time_varying_cols=["N", "L", "P"],
fixed_cols=["sex"],
method="ITT",
parameters=my_options)
```

## Building our analysis

Now that we've initialized our class, a few things have happened: our covariates have been created and stored, and our parameters have been checked. If there is no error, we are ready to build our analysis!

### Creating the nested target trial framework

```python
my_analysis.expand()
```

In this code snippet, we access the class method {py:meth}`~pySEQTarget.SEQuential.expand`, which builds our target trial framework. This internally creates a `DT` attribute (our expanded data).
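
If you want to sanity-check the expansion, you can peek at the expanded data directly; assuming `DT` behaves like a pandas DataFrame (an assumption for this sketch), something like:

```python
# Peek at the expanded person-trial-time data stored on the class
print(my_analysis.DT.shape)   # rows grow quickly with the number of trials
my_analysis.DT.head()
```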

### Fitting our model

```python
my_analysis.fit()
```

Since this is a relatively simple model, we can immediately move to fitting our model. Like most other Python packages, this is done by calling {py:meth}`~pySEQTarget.SEQuential.fit`. This again doesn't return anything, but it will add the outcome model to our internal class state.
At this point there are results to collect, so we could inspect them; however, let's save that for after building our survival curves and risk data.

### 'Predicting' from our Model
Canonically in Python, we usually call a `predict` method. `SEQuential` handles this internally: instead of the usual `predict`, survival, risk, and incidence rates are derived from {py:meth}`~pySEQTarget.SEQuential.survival`. Again, at this point we could collect and have the majority of our results; however, `SEQuential` will also plot our data for us.

```python
my_analysis.survival()
my_analysis.plot()
```

### Collecting our results

Now that we've reached the end of our analysis, we can call {py:meth}`~pySEQTarget.SEQuential.collect`. Note that you can call collect at any step along the way if you want to check results as they are being built, or you can access the internal state of the class directly. Formally, the collection sends all results made so far into an output class, {py:class}`~pySEQTarget.SEQoutput`, which has some handy tools for accessing results.

```python
my_output = my_analysis.collect()
```
Now that we have created an object of our output class, the most immediate way to recover results is to dump everything to Markdown or PDF using {py:meth}`~pySEQTarget.SEQoutput.to_md` or {py:meth}`~pySEQTarget.SEQoutput.to_pdf`, respectively.

```python
my_output.to_md()
```
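
A PDF dump works the same way, via {py:meth}`~pySEQTarget.SEQoutput.to_pdf` (assuming any PDF rendering dependencies it needs are installed):

```python
my_output.to_pdf()
```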
77 changes: 77 additions & 0 deletions docs/vignettes/more_advanced_models.md
@@ -0,0 +1,77 @@
# More Advanced Analysis
In Getting Started, we covered some of the basics for getting up and running on a simple analysis, but there are many more options stored within `SEQuential`, or more aptly, many more parameters to play with in {py:class}`~pySEQTarget.SEQopts`. Let's cover a more in-depth analysis.

In this case, let's go over a censoring analysis with excused conditions and stabilized weighting, limiting weights to the 99th percentile and adjusting for losses to follow-up. Furthermore, we are interested in bootstrapping our results to get a risk estimate with confidence bounds, and for ease of computation we are going to randomly sample 30% of trials that did not initiate treatment. Because we are downsampling, we are additionally going to turn off the lag condition for our adherence weights.

If you are coming from the R version, many arguments have been streamlined or inferred. Take R's `bootstrap` and `bootstrap.nboot`: these have been merged such that any `bootstrap_nboot` over 0 automatically enables bootstrapping.

## Setting up our analysis

In similar fashion to our process in Getting Started, we begin by setting up our `SEQopts`:

```python
from pySEQTarget import SEQopts
from pySEQTarget.data import load_data

data = load_data("SEQdata_LTFU")
my_options = SEQopts(
bootstrap_nboot = 20, # 20 bootstrap iterations
cense_colname = "LTFU", # control for losses-to-followup as a censor
excused = True, # allow excused treatment swapping
excused_colnames = ["excusedZero", "excusedOne"],
km_curves = True, # run survival estimates
selection_random = True, # randomly sample treatment non-initiators
selection_sample = 0.30, # sample 30% of treatment non-initiators
weighted = True, # enables the weighting
weight_lag_condition=False, # turn off lag condition when weighting for adherence
weight_p99 = True, # bounds weights by the 1st and 99th percentile
weight_preexpansion = False # weights are predicted using post-expansion data as a stabilizer
)
```

## Running our Analysis

Now that we have our setup, it is time to repeat the analytical pipeline. From here on, not much differs.

```python
from pySEQTarget import SEQuential

my_analysis = SEQuential(data,
id_col="ID",
time_col="time",
eligible_col="eligible",
treatment_col="tx_init",
outcome_col="outcome",
time_varying_cols=["N", "L", "P"],
fixed_cols=["sex"],
method="censoring",
parameters=my_options)

# Expand the data
my_analysis.expand()
```

### A quick note about bootstrapping

The key difference when bootstrapping is that you will additionally have to call {py:meth}`~pySEQTarget.SEQuential.bootstrap`. This initializes the underlying resampling with replacement. Note that if you've forgotten to enable bootstrapping initially in your `SEQopts`, you can do so here as well.

```python
my_analysis.bootstrap()
```

## Back to our analysis

Now that the underlying bootstrap structure is in place, we can simply continue as we would with simpler models: fit, survival, plot, collect, and dump.

```python
my_analysis.fit()
my_analysis.survival()
my_analysis.plot()

my_output = my_analysis.collect()
my_output.to_md()
```

## That's it?

Yes! There are very few differences between the code for straightforward and more involved analyses using this package. Our hope is that, by driving your analysis almost entirely through `SEQopts`, the process stays streamlined and easy to adapt.