Split ergm into a meta package and modules? #186

Open
@krivit

ergm is a big package. Based on the rough count in NAMESPACE, it currently exports 86 functions and declares 118 S3 methods, as well as implementing at least 205 terms, proposals, constraints, and references. R CMD check complains about its code size.

This means that the package takes a long time to build and test, no matter how small the change. Releasing a fix or enhancement for any part of ergm to CRAN requires testing the whole of ergm and its reverse dependencies. Continuous integration does not provide immediate feedback, and every release to CRAN is a big to-do.

In light of this, we may want to consider splitting ergm up into hierarchically dependent components. Based on some discussions, here is one possible split. In the following list, the later packages Depend on, Import from, and/or are LinkingTo the earlier packages.

  1. ergm.core: Core functions of ergm, including ergm() itself, simulate(), and the functions they need to run. ergm.core may be further split into two packages:
    1. ergm.core.api: The C API and the low-level R functions needed to initialise models and proposals and call the C code, such as the terms API, ergm_model(), ergm.pl(), ergm_MCMC_sample(), as well as the nodal attributes API.
    2. ergm.core.ui: Core front-end functions such as ergm() and simulate(), as well as functions involved in estimation.
  2. ergm.terms.core: The terms, proposals, constraints, and references currently in ergm. (Basically, a big ergm.userterms package.)
  3. ergm.post: Utilities for postprocessing and diagnosing results, such as mcmc.diagnostics(), gof(), predict.ergm(), and perhaps godfather().
  4. ergm: A metapackage that Depends on the latest version of all of the above and contains few or no functions of its own but houses all of the vignettes. A typical end-user would still type library(ergm).
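
To make the ordering constraint concrete, here is a rough sketch that records the proposed split as a named list and checks that every package depends only on packages listed before it. The package names come from the list above, but the specific dependency edges are assumptions for illustration; the proposal does not fix them.

```r
# Proposed packages, in the order listed above. The exact edges are
# ASSUMED for illustration only; the proposal leaves them open.
deps <- list(
  ergm.core.api   = character(0),
  ergm.core.ui    = "ergm.core.api",
  ergm.terms.core = c("ergm.core.api", "ergm.core.ui"),
  ergm.post       = c("ergm.core.api", "ergm.core.ui"),
  ergm            = c("ergm.core.api", "ergm.core.ui",
                      "ergm.terms.core", "ergm.post")
)

# TRUE if each package's hard dependencies all appear earlier in the list,
# i.e., there are no circular Depends/Imports/LinkingTo.
is_hierarchical <- function(deps) {
  ok <- vapply(seq_along(deps), function(i) {
    earlier <- names(deps)[seq_len(i - 1)]
    all(deps[[i]] %in% earlier)
  }, logical(1))
  all(ok)
}

hierarchy_ok <- is_hierarchical(deps)
```

Suggests relationships are deliberately left out of `deps`, since they are allowed to point "backwards" in this ordering.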

The datasets can be housed in any of these, though ergm seems like a natural place.

Notably, while circular Depends and Imports are a problem, a package (e.g., ergm.core) can Suggest a package that Depends on it (e.g., ergm.terms.core), which it can load for the purposes of testing. For example, ergm currently Suggests ergm.count, which it uses to test the valued userterms API.
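
A minimal sketch of how such a test might guard on the Suggested package, using only base R. The helper name `run_if_available()` is hypothetical, and `ergm.terms.core` does not exist yet; it stands in for whatever package ends up holding the terms.

```r
# Hypothetical helper for a test file in ergm.core: run a test body only
# if a Suggested package is installed, and skip with a message otherwise.
run_if_available <- function(pkg, test) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    message("Skipping: suggested package '", pkg, "' is not installed.")
    return(invisible(FALSE))
  }
  test()
  invisible(TRUE)
}

# Usage: the body would exercise terms that live in the Suggested package.
run_if_available("ergm.terms.core", function() {
  # e.g., fit or simulate a small model using terms from ergm.terms.core
})
```

With testthat, `skip_if_not_installed()` serves the same purpose.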

The actual process of splitting up a package is not especially difficult, particularly with Roxygen managing the namespace and the documentation files, though it can be tedious. It consists of copying the ergm repository (with full history) and deleting the functions that do not belong in the particular subpackage. This is how tergm was split out of ergm and rle out of statnet.common.

One interesting question is whether ergm should reexport functions from the packages it Depends on. From the point of view of the end-user, it doesn't make a difference; but from the point of view of a developer depending on ergm, it does. The advantage of reexporting is that a developer can import from ergm without worrying about where the function actually lives. This is not elegant, but it would certainly smooth the transition and make developers' lives easier. A disadvantage is that it commits us to maintaining up-to-date reexports, though it may be possible to automatically generate the code to do this by scanning the NAMESPACE files of the Imported packages. Also, it is probably not practical to similarly "reexport" the C API.
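
As a rough illustration of the automatic generation idea, the sketch below emits one Roxygen reexport stanza per exported object of a package. It is an assumption-laden starting point rather than a worked-out tool: `make_reexports()` is a hypothetical name, it queries the installed package via `getNamespaceExports()` instead of parsing the NAMESPACE file directly, and it uses the base package tools as a stand-in because the proposed subpackages do not exist yet.

```r
# Hypothetical generator: one Roxygen reexport stanza per export of `pkg`.
make_reexports <- function(pkg) {
  exports <- sort(getNamespaceExports(pkg))
  # Drop internal S4 metadata entries, which are not ordinary objects.
  exports <- exports[!startsWith(exports, ".__")]
  # Backtick-quote names that are not syntactic identifiers (e.g., operators).
  quoted <- ifelse(exports == make.names(exports),
                   exports, paste0("`", exports, "`"))
  paste0("#' @export\n", pkg, "::", quoted, "\n")
}

# Stand-in demonstration; in practice `pkg` would be, e.g., "ergm.core".
stanzas <- make_reexports("tools")
cat(head(stanzas, 3), sep = "\n")
```

The output could be written to a file such as R/reexports.R in the metapackage and regenerated whenever a subpackage changes its exports. S3 method registrations are not covered by this sketch.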

This issue is not a high priority, but I believe it to be something worth doing in the long run, and so I am opening this ticket to flag the issue and record my current thoughts on the matter.
