-
Notifications
You must be signed in to change notification settings - Fork 26
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
0 parents
commit a5d66d7
Showing
151 changed files
with
40,517 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
# Sphinx build info version 1 | ||
# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done. | ||
config: 7b035083ce19390472c8abc4f4e52395 | ||
tags: 645f666f9bcd5a90fca523b33c5a78b7 |
Empty file.
Binary file not shown.
346 changes: 346 additions & 0 deletions
346
_downloads/2f5ef11b55acdc3f7cfe565f64c3e9dd/example_01_basic_usage.ipynb
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,346 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"collapsed": false | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"# Set the environment such that multiple R processes do not crash the kernel\nimport os\nos.environ['KMP_DUPLICATE_LIB_OK']='True'%matplotlib inline" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"\n# 1. Basic Usage Guide\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
":code:`pymer4` comes with sample data for testing purposes which we'll utilize for most of the tutorials.\nThis sample data has:\n\n- Two kinds of dependent variables: *DV* (continuous), *DV_l* (dichotomous)\n- Three kinds of independent variables: *IV1* (continuous), *IV2* (continuous), *IV3* (categorical)\n- One grouping variable for multi-level modeling: *Group*.\n\nLet's check it out below:\n\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"collapsed": false | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"# import some basic libraries\nimport os\nimport pandas as pd\n\n# import utility function for sample data path\nfrom pymer4.utils import get_resource_path\n\n# Load and checkout sample data\ndf = pd.read_csv(os.path.join(get_resource_path(), \"sample_data.csv\"))\nprint(df.head())" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Standard regression models\nFitting a standard regression model is accomplished using the :code:`Lm` model class in :code:`pymer4`. All we need to do is initialize a model with a formula, some data, and call its :code:`.fit()` method.\n\nBy default the output of :code:`.fit()` has been formated to be a blend of :code:`summary()` in R and :code:`.summary()` from [statsmodels](http://www.statsmodels.org/dev/index.html/). This includes metadata about the model, data, and overall fit as well as estimates and inference results of model terms.\n\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"collapsed": false | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"# Import the linear regression model class\nfrom pymer4.models import Lm\n\n# Initialize model using 2 predictors and sample data\nmodel = Lm(\"DV ~ IV1 + IV2\", data=df)\n\n# Fit it\nprint(model.fit())" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"All information about the model as well as data, residuals, estimated coefficients, etc are saved as attributes and can be accessed like this:\n\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"collapsed": false | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"# Print model AIC\nprint(model.AIC)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"collapsed": false | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"# Look at residuals (just the first 10)\nprint(model.residuals[:10])" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"A copy of the dataframe used to estimate the model with added columns for residuals and fits are are available at :code:`model.data`. Residuals and fits can also be directly accessed using :code:`model.residuals` and :code:`model.fits` respectively\n\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"collapsed": false | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"# Look at model data\nprint(model.data.head())" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"This makes it easy to assess overall model fit visually, for example using seaborn\n\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"collapsed": false | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"# import dataviz\nimport seaborn as sns\n\n# plot model predicted values against true values\nsns.regplot(x=\"fits\", y=\"DV\", data=model.data, fit_reg=True)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Robust and WLS estimation\n:code:`Lm` models can also perform inference using robust-standard errors or perform weight-least-squares (experimental feature) for models with categorical predictors (equivalent to Welch's t-test).\n\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"collapsed": false | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"# Refit previous model using robust standard errors\nprint(model.fit(robust=\"hc1\"))" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"collapsed": false | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"# Since WLS is only supported with 2 groups for now, filter the data first\ndf_two_groups = df.query(\"IV3 in [0.5, 1.0]\").reset_index(drop=True)\n\n# Fit new a model using a categorical predictor with unequal variances (WLS)\nmodel = Lm(\"DV ~ IV3\", data=df_two_groups)\nprint(model.fit(weights=\"IV3\"))" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Multi-level models\nFitting a multi-level model works similarly and actually just calls :code:`lmer` or :code:`glmer` in R behind the scenes. The corresponding output is also formatted to be very similar to output of :code:`summary()` in R.\n\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"collapsed": false | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"# Import the lmm model class\nfrom pymer4.models import Lmer\n\n# Initialize model instance using 1 predictor with random intercepts and slopes\nmodel = Lmer(\"DV ~ IV2 + (IV2|Group)\", data=df)\n\n# Fit it\nprint(model.fit())" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Similar to :code:`Lm` models, :code:`Lmer` models save details in model attributes and have additional methods that can be called using the same syntax as described above.\n\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"collapsed": false | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"# Get population level coefficients\nprint(model.coefs)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"collapsed": false | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"# Get group level coefficients (just the first 5)\n# Each row here is a unique intercept and slope\n# which vary because we parameterized our rfx that way above\nprint(model.fixef.head(5))" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"collapsed": false | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"# Get group level deviates from population level coefficients (i.e. rfx)\nprint(model.ranef.head(5))" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
":code:`Lmer` models also have some basic plotting abilities that :code:`Lm` models do not\n\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"collapsed": false | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"# Visualize coefficients with group/cluster fits overlaid (\"forest plot\")\nmodel.plot_summary()" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Plot coefficients for each group/cluster as separate regressions\n\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"collapsed": false | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"model.plot(\"IV2\", plot_ci=True, ylabel=\"predicted DV\")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Because :code:`Lmer` models rely on R, they have also some extra arguments to the :code:`.fit()` method for controlling things like optimizer behavior, as well as additional methods such for post-hoc tests and ANOVAs. See tutorial 2 for information about this functionality.\n\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Two-stage summary statistics models\nFitting :code:`Lm2` models are also very similar\n\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"collapsed": false | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"# Import the lm2 model class\nfrom pymer4.models import Lm2\n\n# This time we use the 'group' argument when initializing the model\nmodel = Lm2(\"DV ~ IV2\", group=\"Group\", data=df)\n\n# Fit it\nprint(model.fit())" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Like :code:`Lmer` models, :code:`Lm2` models also store group/cluster level estimates and have some basic plotting functionality\n\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"collapsed": false | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"# Get group level coefficients, just the first 5\nprint(model.fixef.head(5))" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"collapsed": false | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"# Visualize coefficients with group/cluster fits overlaid (\"forest plot\")\nmodel.plot_summary()" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Model Persistence\nAll pymer4 models can be saved and loaded from disk. Doing so will persist *all* model attributes and data i.e. anything accessible with the '.' syntax. Models are saved and loaded using [Joblib](https://joblib.readthedocs.io/en/latest/persistence.html#persistence) Therefore all filenames must end with :code:`.joblib`. For :code:`Lmer` models, an additional file ending in :code:`.rds` will be saved in the same directory as the HDF5 file. This is the R model object readable in R using :code:`readRDS`.\n\nPrior to version 0.8.1 models were saved to HDF5 files using [deepdish](https://github.com/uchicago-cs/deepdish/) but this library is no longer maintained. If you have old models saved as :code:`.h5` or :code:`.hdf5` files you should use the same version of pymer4 that you used to estimate those models.\n\nTo persist models you can use the dedicated :code:`save_model` and :code:`load_model` functions from the :code:`pymer4.io` module\n\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"collapsed": false | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"# Import functions\nfrom pymer4.io import save_model, load_model\n\n# Save the Lm2 model above\nsave_model(model, \"mymodel.joblib\")\n# Load it back up\nmodel = load_model(\"mymodel.joblib\")\n# Check that it looks the same\nprint(model)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Wrap Up\nThis was a quick overview of the 3 major model classes in :code:`pymer4`. However, it's highly recommended to check out the API to see *all* the features and options that each model class has including things like permutation-based inference (:code:`Lm` and :code:`Lm2` models) and fine-grain control of optimizer and tolerance settings (:code:`Lmer` models).\n\n" | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "Python 3", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.8.19" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 0 | ||
} |
Oops, something went wrong.