Description
ergm is a big package. Based on the rough count in NAMESPACE, it currently exports 86 functions and declares 118 S3 methods, as well as implementing at least 205 terms, proposals, constraints, and references. R CMD check complains about its code size.
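A rough count of this kind can be approximated from a loaded namespace. The following is a minimal sketch, not the method used for the figures above: the helper name namespace_size() is hypothetical, and it queries the installed namespace rather than the NAMESPACE file itself, so the numbers may differ slightly. It is shown on stats only because that package ships with R; substitute "ergm" if it is installed.

```r
# Hypothetical helper: summarise the size of an installed package's namespace.
namespace_size <- function(pkg) {
  ns <- asNamespace(pkg)
  c(
    # Number of exported objects
    exports = length(getNamespaceExports(ns)),
    # Number of registered S3 methods (one row per method)
    s3_methods = NROW(getNamespaceInfo(ns, "S3methods"))
  )
}

size <- namespace_size("stats")
print(size)
```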
This means that the package takes a long time to build and test, no matter how small the change. Releasing a fix or enhancement for any part of ergm to CRAN requires testing the whole of ergm and its reverse dependencies. Continuous integration does not provide immediate feedback, and every release to CRAN is a big to-do.
In light of this, we may want to consider splitting ergm up into hierarchically dependent components. Based on some discussions, here is one possible split. In the following list, the later packages list the earlier packages under Depends, Imports, and/or LinkingTo.
- ergm.core: Core functions of ergm, including ergm() itself, simulate(), and the functions they need to run. ergm.core may be further split into two packages:
  - ergm.core.api: The C API and the low-level R functions needed to initialise models and proposals and call the C code, such as the terms API, ergm_model(), ergm.pl(), ergm_MCMC_sample(), as well as the nodal attributes API.
  - ergm.core.ui: Core front-end functions such as ergm() and simulate(), as well as functions involved in estimation.
- ergm.terms.core: The terms, proposals, constraints, and references currently in ergm. (Basically, a big ergm.userterms package.)
- ergm.post: Utilities used for postprocessing and diagnostic results, such as mcmc.diagnostics(), gof(), predict.ergm(), and perhaps godfather().
- ergm: A metapackage that Depends on the latest version of all of the above and contains few or no functions of its own but houses all of the vignettes. A typical end-user would still type library(ergm).
The datasets can be housed in any of these, though ergm seems like a natural place.
Notably, while circular Depends and Imports are a problem, a package (e.g., ergm.core) can Suggest a package that Depends on it (e.g., ergm.terms.core), which it can load for the purposes of testing. For example, ergm currently Suggests ergm.count, which it uses to test the valued userterms API.
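In a test file, a Suggested package is usually guarded with requireNamespace() so that the test is skipped rather than failing when the package is absent. The sketch below illustrates that idiom; the helper run_if_available() is hypothetical and is not part of ergm, and stats stands in for a Suggested package such as ergm.terms.core only so that the example runs anywhere.

```r
# Hypothetical helper: run a test only if a Suggested package is installed.
run_if_available <- function(pkg, test) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    message("Skipping test: package '", pkg, "' is not installed.")
    return(invisible(FALSE))
  }
  test()
  invisible(TRUE)
}

# "stats" is always installed, so this test body runs.
ran <- run_if_available("stats", function() {
  stopifnot(stats::median(c(1, 2, 3)) == 2)
})

# A package name that is assumed not to exist: the test body is skipped.
skipped <- run_if_available("no.such.package.xyz", function() {
  stop("this body should never run")
})
```

With testthat, the same effect is typically achieved by calling skip_if_not_installed() at the top of the test.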
The actual process of splitting up a package is not especially difficult, particularly with Roxygen managing the namespace and the documentation files, though it can be tedious. It consists of copying the ergm repository (with full history) and deleting the functions that do not belong in the particular subpackage. This is how tergm was split out of ergm and rle out of statnet.common.
One interesting question is whether ergm should reexport functions from the packages it Depends on. From the point of view of the end-user, it doesn't make a difference; but from the point of view of a developer depending on ergm, it does. The advantage of reexporting is that a developer can import from ergm without worrying where the function actually lives. This is not elegant, but it would certainly smooth the transition and make their lives easier. A disadvantage is that it commits us to maintaining up-to-date reexports, though it may be possible to automatically generate the code to do this by scanning the NAMESPACE files of the Imported packages. Also, it is probably not practical to similarly "reexport" the C API.
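As a rough illustration of what such automatic generation might look like, the sketch below emits Roxygen reexport stubs for every exported object of a package. It is only one possible approach under stated assumptions: the function reexport_stubs() is hypothetical, it reads the exports of an installed namespace rather than parsing the NAMESPACE file, it skips S4 metadata entries and non-syntactic names, and it is demonstrated on stats rather than on any of the proposed subpackages.

```r
# Hypothetical generator: build Roxygen reexport stubs for a package's exports.
reexport_stubs <- function(pkg) {
  exports <- sort(getNamespaceExports(pkg))
  # Drop internal S4 metadata entries (e.g., ".__C__className").
  exports <- exports[!startsWith(exports, ".__")]
  # Keep only syntactic names; operators and the like would need backticks.
  exports <- exports[exports == make.names(exports)]
  sprintf("#' @importFrom %s %s\n#' @export\n%s::%s\n",
          pkg, exports, pkg, exports)
}

stubs <- reexport_stubs("stats")
# Show the first two generated stubs.
cat(head(stubs, 2), sep = "\n")
```

The output could be written to a file in the metapackage's R/ directory and regenerated whenever a subpackage changes its exports.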
This issue is not a high priority, but I believe it to be something worth doing in the long run, and so I am opening this ticket to flag the issue and record my current thoughts on the matter.