statsmodels is an excellent project and an important part of the Python scientific stack. But due to resource constraints, its maintainers cannot push out bugfixes often enough for my needs. sm2 is a fork focused on bugfixes and addressing technical debt.
Ideally sm2 will be a drop-in replacement for statsmodels. In places where this fails, feel free to open an issue.
With luck, fixes made here will eventually be ported upstream.
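Because sm2 aims to be a drop-in replacement, calling code can be written to use whichever of the two packages is installed. The following is a minimal sketch of that idea, not part of sm2 itself: the helper name `import_first_available` is hypothetical, and it assumes the packages are importable under the top-level names `sm2` and `statsmodels`.

```python
import importlib


def import_first_available(names):
    """Return the first module in ``names`` that can be imported.

    Hypothetical helper for illustration; it is not provided by sm2
    or statsmodels.
    """
    for name in names:
        try:
            return importlib.import_module(name)
        except ImportError:
            # This candidate is not installed; try the next one.
            continue
    raise ImportError(
        "none of these modules could be imported: %s" % ", ".join(names)
    )
```

With this in place, `backend = import_first_available(["sm2", "statsmodels"])` would prefer sm2 and fall back to statsmodels. Since sm2 only contains a subset of statsmodels, some attributes of the fallback may not exist on the preferred module.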
- sm2 contains only a subset of the functionality of statsmodels; the first big difference is that statsmodels is more feature-complete.
- Test coverage statistics reported for sm2 are meaningful (:issue:4331)
- An enormous amount of code cleanup has been done in sm2. Thousands of lines of unused, untested, or deprecated code have been removed. Many thousands of flake8 formatting issues have been cleaned up.
- MultinomialResults.params and predict will have correct column and row labels (:issue:4541)
- VARResults.cov_params will correctly return a DataFrame instead of raising ValueError.
- VARResults.acf will return correct results (:issue:4572)
- The ArmaProcess class does not have a nobs attribute.
- tsa.stattools.acf will always return (acf, confint, qstat, pvalue) here instead of a different subset of these depending on the inputs.
- stats.diagnostic.acorr_ljungbox will always return (qljungbox, pval, qboxpierce, pvalbp) here instead of a different subset of these depending on the inputs.
- summary2 methods have not been ported from upstream and will raise NotImplementedError.
- VARResults.test_whiteness has been superseded upstream by test_whiteness_new, as the older method was not an actual statistical test (:issue:4036). sm2 replaces the older version entirely and keeps only the name test_whiteness.
- ARModel.fit incorrectly sets model.df_resid upstream. That has been fixed here.
- GenericLikelihoodModelResults.__init__ incorrectly sets model.df_resid and model.df_model. That has been fixed here.
- GeneralizedLinearModel.fit incorrectly sets self.mu and self.scale. This has been fixed here. (:issue:4032)
- LikelihoodModelResults._get_robustcov_results incorrectly ignores the use_self argument. This has been fixed here. (:issue:4401)
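The fixed return shape of tsa.stattools.acf matters mostly for code that has to run against both packages. The sketch below shows one way such code could pad an upstream-style result to the four-element form. It rests on assumptions: the helper `normalize_acf_result` is hypothetical, the upstream shapes (keyed on the `alpha` and `qstat` arguments) reflect statsmodels as I understand it around the time of the fork, and padding with `None` is a choice of this sketch, since how sm2 fills entries that were not requested is not stated above.

```python
def normalize_acf_result(result, alpha=None, qstat=False):
    """Pad an upstream-style acf result to (acf, confint, qstat, pvalue).

    Hypothetical helper. ``alpha`` and ``qstat`` must be the values that
    were passed to the upstream ``acf`` call that produced ``result``.
    Entries that upstream did not return are filled with None
    (an assumption of this sketch).
    """
    has_confint = alpha is not None
    if not has_confint and not qstat:
        # Assumed upstream shape: the bare acf array.
        return (result, None, None, None)
    if has_confint and not qstat:
        # Assumed upstream shape: (acf, confint)
        acf_vals, confint = result
        return (acf_vals, confint, None, None)
    if not has_confint and qstat:
        # Assumed upstream shape: (acf, qstat, pvalue)
        acf_vals, q, pvalue = result
        return (acf_vals, None, q, pvalue)
    # Assumed upstream shape: (acf, confint, qstat, pvalue)
    acf_vals, confint, q, pvalue = result
    return (acf_vals, confint, q, pvalue)
```

Callers can then always unpack four values, which is the calling convention the sm2 version offers directly.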
Issues and Pull Requests are welcome. If you are looking for a place to start, here are some suggestions:
- Search for comments starting with # TODO: or # FIXME:
  - Some comments are copied from upstream and should have these labels but are missing them. If you find a comment that should have one of these labels (or is just unclear), add the label.
- Many tests from upstream are marked with pytest.mark.not_vetted to reflect the fact that they haven't been reviewed since being ported from statsmodels. To "vet" a test, try to determine:
  - Is this a "smoke test"? If so, it should be marked with pytest.mark.smoke.
  - Is this a test for a specific bug? Can an Issue reference (e.g. # GH#1234) be included?
  - Is there something specific being tested? If so, the test name should be made informative and often a comment should be added (e.g. # test function foo.bar in case where baz argument is near-singular)
  - Is this testing results produced by statsmodels/sm2 against results produced by another package? If so, it should be clear how those results were produced. The original authors put a lot of effort into producing these comparisons; they should be reproducible.
- There are some spots where tests are meager and could use some attention:
  - tsa.vector_ar.irf
  - regression._prediction
  - stats.sandwich_covariance
- As of 2018-03-19 there are still 390 flake8 warnings/errors. For many of these, fixing them requires figuring out what the writer's intention was upstream.
- As of 2018-03-19 about 20% of statsmodels has been ported to sm2 (though a much larger percentage of the usable, non-redundant, non-deprecated code). If there are portions of statsmodels that you want or need, don't be shy.
- If there is a change you particularly like, make a Pull Request upstream to get it implemented directly in statsmodels.
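For the first suggestion above, a short script can list the labelled comments in a checkout. This is a minimal sketch using only the standard library; the function name `find_labelled_comments` and the choice to scan only `.py` files are assumptions of the example, not project tooling.

```python
import os
import re

# Matches "# TODO:" or "# FIXME:", allowing any whitespace after the "#".
LABEL_RE = re.compile(r"#\s*(TODO|FIXME):")


def find_labelled_comments(root):
    """Return (path, line number, label, stripped line) for each match.

    Walks ``root`` recursively and inspects every ``.py`` file.
    The match is purely textual, so a label inside a string literal
    is reported as well.
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in sorted(filenames):
            if not filename.endswith(".py"):
                continue
            path = os.path.join(dirpath, filename)
            with open(path, encoding="utf-8", errors="replace") as handle:
                for lineno, line in enumerate(handle, start=1):
                    match = LABEL_RE.search(line)
                    if match:
                        hits.append((path, lineno, match.group(1), line.strip()))
    return hits
```

Calling `find_labelled_comments(".")` from the repository root returns the list of hits; comments that lack a label will not be found this way and still need to be spotted by reading the code.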