kaefa is an R package that implements an automated exploratory factor analysis (aefa) engine. The core is a greedy search workflow that:
- Evaluates multiple model candidates.
- Selects the best model by information criteria (AIC, BIC, DIC).
- Assesses item fit and removes poorly fitting items.
- Iterates until convergence.
The package exposes a programmatic API and an optional Shiny interface.
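The greedy loop described above can be sketched in base R. This is a minimal illustration, not kaefa's implementation: `greedy_prune()`, the `fit_candidates` and `item_misfit` callbacks, and the misfit threshold are hypothetical stand-ins, and the toy callbacks return fixed numbers instead of fitting real models.

```r
# Hypothetical sketch of a greedy select-and-prune loop (not kaefa's code).
greedy_prune <- function(items, fit_candidates, item_misfit,
                         threshold, min_items = 3L, max_iter = 20L) {
  best <- NA_character_
  for (i in seq_len(max_iter)) {
    # 1. Evaluate candidates: named vector of criterion values, lower is better.
    criteria <- fit_candidates(items)
    best <- names(criteria)[which.min(criteria)]
    # 2. Assess item fit under the selected model.
    misfit <- item_misfit(items, best)
    worst <- names(misfit)[which.max(misfit)]
    # 3. Stop when every item fits or too few items remain.
    if (misfit[[worst]] <= threshold || ncol(items) <= min_items) {
      return(list(model = best, items = colnames(items), iterations = i))
    }
    # 4. Otherwise drop the worst-fitting item and repeat.
    items <- items[, colnames(items) != worst, drop = FALSE]
  }
  list(model = best, items = colnames(items), iterations = max_iter)
}

# Toy stand-ins: fixed criterion values and fixed per-item misfit scores.
toy_items <- matrix(0, nrow = 10, ncol = 5,
                    dimnames = list(NULL, paste0("i", 1:5)))
toy_misfit <- c(i1 = 0.1, i2 = 0.9, i3 = 0.2, i4 = 0.6, i5 = 0.3)

result <- greedy_prune(
  toy_items,
  fit_candidates = function(items) {
    c(one_factor = 100 + ncol(items), two_factor = 90 + ncol(items))
  },
  item_misfit = function(items, model) toy_misfit[colnames(items)],
  threshold = 0.5
)
print(result$items)
```

With these toy values the loop drops `i2`, then `i4`, and stops on the third pass once the worst remaining misfit is below the threshold.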
- `R/kaefa.R`: Core engine initialization, parallel and remote cluster logic, and primary workflow functions.
- `R/kaefa-package.r`: Package-level documentation and namespace imports.
- `R/newEngine.R`: Automated EFA workflow implementation and supporting utilities.
- `R/utils.R`: Helper utilities.
- `inst/`: Shiny application assets and runtime files.
- `vignettes/` and `README.Rmd`: User documentation and examples.
- Automated EFA Engine: Runs model search, fit evaluation, and iterative pruning.
- Parallel Execution: Uses `future` and cluster helpers to distribute work across cores or remote nodes.
- Remote Cluster Support: SSH-based host probing with load and memory checks to select nodes.
- Shiny UI: Provides a point-and-click interface for data upload, configuration, and export.
- Theta Prior Calibration: Optional `fitdistrplus` integration for empirical prior estimation.
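Distributing candidate evaluation across local cores can be sketched with base R's `parallel` package, which the package lists among its core dependencies. The sketch swaps in `parallel::parLapply()` for the `future`-based helpers, and `score_candidate()` with its toy criterion is a hypothetical stand-in for fitting a model.

```r
library(parallel)

# Hypothetical stand-in for fitting one candidate and returning a criterion value.
score_candidate <- function(n_factors) {
  (n_factors - 2)^2 + 10   # toy criterion, smallest at two factors
}

candidates <- 1:4
cl <- makeCluster(2L)        # two local worker processes
scores <- tryCatch(
  unlist(parLapply(cl, candidates, score_candidate)),
  finally = stopCluster(cl)  # always release the workers
)
names(scores) <- paste0("factors_", candidates)

# Pick the candidate with the lowest criterion value.
best <- names(scores)[which.min(scores)]
print(best)
```

Wrapping the parallel call in `tryCatch(..., finally = ...)` keeps worker processes from lingering if a candidate fails.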
- Primary API: `aefa()` (automated exploratory factor analysis workflow).
- Cluster Setup: `aefaInit()` for local or remote cluster configuration.
- Shiny UI: `launchAEFA()` to start the interactive application.
- Theta Prior Utilities: `fitThetaPrior()` estimates distribution parameters, `testThetaPriorCalibration()` evaluates calibration, and `applyThetaPrior()` attaches the estimated parameters as metadata (it does not inject priors into mirt's calibration). Automatic application to mirt would require future configuration or upgraded support.
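The three-step pattern behind the theta prior utilities (estimate, check, attach as metadata) can be illustrated with base R alone. This sketch does not call kaefa's functions, whose signatures are not shown here. It assumes a normal prior, uses `stats::ks.test()` as the calibration check, and stores the estimates in a hypothetical `theta_prior` attribute.

```r
# Deterministic, roughly normal toy scores (illustrative only).
scores <- qnorm(ppoints(50))

# 1. Estimate distribution parameters (a normal prior is assumed here).
prior <- list(dist = "norm", mean = mean(scores), sd = sd(scores))

# 2. Check calibration with a Kolmogorov-Smirnov test against the fitted normal.
#    Estimating the parameters from the same data makes this p-value conservative.
ks <- ks.test(scores, "pnorm", mean = prior$mean, sd = prior$sd)

# 3. Attach the estimates as metadata; nothing is passed to a model here.
responses <- data.frame(item1 = c(0, 1, 1), item2 = c(1, 0, 1))
attr(responses, "theta_prior") <- prior

str(attr(responses, "theta_prior"))
```

Keeping the prior as an attribute mirrors the metadata-only behaviour described above: downstream code can read it, but calibration is unaffected unless it is wired in explicitly.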
- Data inputs: item response data in R objects (e.g., data frames, matrices) and optional CSV/RDS via Shiny.
- Model configuration: factor extraction counts, rotation methods, and criteria selection.
- Parallel configuration: local core counts or remote host list and SSH key paths.
- Package options: `kaefaServers` option for preconfigured remote hosts.
`fitThetaPrior()` requires raw score inputs with at least 3 non-missing numeric observations and raises an error if fewer are provided. Data can be supplied as R objects (data frames, matrices) or via CSV/RDS inputs in Shiny.
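The minimum-observation rule can be expressed as a small guard. `check_raw_scores()` is a hypothetical helper written for illustration. It is not part of kaefa and only mirrors the stated requirement.

```r
# Hypothetical guard: keep non-missing numeric scores, require at least min_n.
check_raw_scores <- function(x, min_n = 3L) {
  if (!is.numeric(x)) stop("raw scores must be numeric")
  usable <- x[!is.na(x)]
  if (length(usable) < min_n) {
    stop("need at least ", min_n, " non-missing observations, got ",
         length(usable))
  }
  usable
}

valid <- check_raw_scores(c(12, NA, 15, 9))   # three usable values remain

# Two usable values are not enough, so the guard signals an error.
too_few <- tryCatch(
  check_raw_scores(c(1, NA, 2)),
  error = function(e) conditionMessage(e)
)
print(too_few)
```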
`kaefaServers`: Character vector of hostnames used as the default `RemoteClusters` argument for `aefaInit()`. Example: `options(kaefaServers = c("localhost", "node1", "node2"))`. SSH key paths are provided separately via `aefaInit(sshKeyPath = ...)` as a vector aligned with `kaefaServers` (or a named list keyed by host). For SSH keys, prefer absolute paths, validate paths before use, restrict file permissions (e.g., `chmod 600`), and keep keys encrypted or in a secrets manager with regular rotation and least-privilege access. See Security and Privacy. Example usage with `aefaInit()`:

```r
# vector aligned with RemoteClusters
aefaInit(
  RemoteClusters = c("node1", "node2"),
  sshKeyPath = c("~/.ssh/id_rsa_node1", "~/.ssh/id_rsa_node2")
)

# named list keyed by host
aefaInit(
  RemoteClusters = c("node1", "node2"),
  sshKeyPath = list(
    node1 = "~/.ssh/id_rsa_node1",
    node2 = "~/.ssh/id_rsa_node2"
  )
)
```
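The advice on validating key paths and restricting permissions can be turned into a check that runs before any host is configured. `check_ssh_key()` is a hypothetical helper, not a kaefa function, and the permission test assumes a POSIX file system where `file.info()` reports Unix mode bits.

```r
# Hypothetical pre-flight check for an SSH private key (POSIX systems assumed).
check_ssh_key <- function(path) {
  path <- path.expand(path)                  # resolve a leading "~"
  if (!file.exists(path)) stop("SSH key not found: ", path)
  mode <- as.integer(file.info(path)$mode)
  if (bitwAnd(mode, 63L) != 0L) {            # 63 is octal 077: group/other bits
    stop("SSH key is readable by group or others: ", path)
  }
  normalizePath(path)                        # return an absolute path
}

# Demonstration on a throwaway file rather than a real key.
key <- tempfile("id_demo_")
file.create(key)
Sys.chmod(key, mode = "600", use_umask = FALSE)
ok <- check_ssh_key(key)

Sys.chmod(key, mode = "644", use_umask = FALSE)
loose <- tryCatch(check_ssh_key(key), error = function(e) conditionMessage(e))
unlink(key)
print(loose)
```

Returning the normalized path lets the caller pass an absolute path onward, in line with the preference for absolute paths stated above.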
- Selected best-fit model object.
- Fit metrics and item statistics for model comparison.
- Shiny UI export artifacts (tables, reports) as configured by the user.
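Comparing candidates by information criteria reduces to a small table. The log-likelihoods, parameter counts, and sample size below are made-up numbers for illustration. AIC and BIC use their standard definitions, and DIC is left out of the sketch.

```r
# Illustrative fit summaries for three hypothetical candidates.
fits <- data.frame(
  model  = c("one_factor", "two_factor", "three_factor"),
  logLik = c(-1250, -1200, -1195),
  npar   = c(20, 29, 37),
  stringsAsFactors = FALSE
)
n_obs <- 500

# AIC = -2 logL + 2k; BIC = -2 logL + k log(n).
fits$AIC <- -2 * fits$logLik + 2 * fits$npar
fits$BIC <- -2 * fits$logLik + log(n_obs) * fits$npar

best_by_aic <- fits$model[which.min(fits$AIC)]
best_by_bic <- fits$model[which.min(fits$BIC)]
print(fits[order(fits$BIC), ])
```

BIC penalizes each extra parameter by `log(n)` rather than 2, so it tends to favour smaller models than AIC as the sample grows.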
- Core: `mirt` (>= 1.27), `psych`, `future`, `progress`, `listenv`, `parallel`, `NCmisc`, `plyr`.
- UI: `shiny` (>= 1.7.0), `DT` (>= 0.20).
- Optional: `fitdistrplus` (theta prior calibration), `goftest` (required for `testThetaPriorCalibration()` when using `cvm` or `ad`; `ks` works without `goftest`).
- Model search complexity scales with the number of candidate factor structures and items.
- Parallel execution is recommended for moderate to large datasets.
- Remote cluster selection uses load and memory thresholds to reduce resource contention.
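Threshold-based node selection can be sketched as a filter over probe results. The host statistics, column names, and cutoffs here are hypothetical, and the SSH probing that would produce them is not shown.

```r
# Hypothetical probe results, one row per candidate host.
probe <- data.frame(
  host        = c("node1", "node2", "node3"),
  load_1min   = c(0.4, 3.2, 0.9),
  n_cores     = c(8, 4, 8),
  free_mem_gb = c(12, 20, 1.5),
  stringsAsFactors = FALSE
)

# Assumed thresholds: per-core load ceiling and minimum free memory.
max_load_per_core <- 0.5
min_free_mem_gb   <- 4

# Keep hosts that are both lightly loaded and have enough free memory.
eligible <- probe$load_1min / probe$n_cores <= max_load_per_core &
  probe$free_mem_gb >= min_free_mem_gb
selected <- probe$host[eligible]
print(selected)
```

In this toy data `node2` is rejected for load and `node3` for memory, leaving only `node1`.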
- R CMD check on Windows, macOS, and Linux in CI.
- Unit tests in `tests/` for core logic and regressions.
- Example workflows in README and vignettes for smoke validation.
- Remote cluster execution uses SSH; restrict key permissions, validate paths, and rotate keys regularly (see README for a checklist).
- No telemetry or external data upload beyond user-controlled Shiny sessions.
- CI workflows run standard R CMD checks and dependency review.
- Releases should update `NEWS.md` and `DESCRIPTION` version fields.
- Define recommended dataset size thresholds for local vs remote execution. Issue: #34. Target: 2026 Q2. Workaround: start locally, then move to remote clusters if runtime or memory use becomes a bottleneck.
- Document minimal Shiny UI configuration required for advanced models. Issue: #35. Target: 2026 Q3. Workaround: use the R API for advanced settings until the UI guidance is documented.