
Commit 38da1e0

Maximilian-Stefan-Ernst, alyst (Alexey Stukalov), and aaronpeikert authored
Release/v0.4.0 (#255)
* CommutationMatrix type: replace comm_matrix helper functions with a CommutationMatrix and overloaded linalg ops
* simplify elimination_matrix()
* simplify duplication_matrix()
* add tests for commutation/duplication/elimination matrices
* small unit test fixes
* commutation_matrix * vec method
* more comm_matrix tests
* SemSpecification base type
* SemSpecification: use in methods
* rename identifier -> param
* identifier() -> param_indices() (Dict{Symbol, Int})
* get_identifier_indices() -> param_to_indices() (Vector{Int})
* parameters -> params (Vector{Symbol})
* ParTable: columns[:identifier] => columns[:param]
* getindex(EnsParTable, i) instead of get_group()
* replace no-op ctors with convert(T, obj): convert() is the proper method to call to avoid unnecessary construction; ctor semantics requires that a new object is constructed
* ParamTable: convert vars from Dict to fields, make the type immutable
* ParamTable: update StenoGraph-based ctor
  - use graph as a main parameter
  - simplify rows processing
  - don't reallocate table.columns
  Co-authored-by: Maximilian-Stefan-Ernst <34346372+Maximilian-Stefan-Ernst@users.noreply.github.com>
* rename Base.sort() to sort_vars(): because the ParTable contains rows and columns, it is not clear what sort() actually sorts
  Co-authored-by: Maximilian-Stefan-Ernst <34346372+Maximilian-Stefan-Ernst@users.noreply.github.com>
* don't import ==
* don't import push!()
* don't import DataFrame
* remove no-op push!()
* ParTable ctor: simplify rows code
  - use named tuples
  - reduce code duplication
  - use colnames vector instead of position_names Dict
* ParTable: full support for Iterator iface
* RAMConstant: simplify
  - declare RAMConstant field types
  - refactor constants collection to avoid code duplication
* RAMMatrices: optimize F_indices init
* RAMMatrices: declare types for all fields
* RAMMatrices: option to keep zero constants
* nonunique() helper function
* add check_vars() and check_params()
* RAMMatrices ctor: dims and vars checks
* RAMMatrices: cleanup params index
* simplify parameters() function to return just a vector of params
* RAMMatrices ctor: use check_params()
* include RAMMatrices before EnsParTable
* fix EnsParTable to Dict{RAMMatrices} convert
  - this method is not a RAMMatrices ctor, it is a Dict{K, RAMMatrices} convert
  - use a comprehension to construct the dict
* DataFrame(EnsParTable)
* params() API method
* remove n_par.jl
* remove identifier.jl
* EnsParTable ctor: enforce same params in tables
* fix EnsParTable container to Dict{Symbol, ParTable}
* don't use keywords for main params, as it complicates dispatch
  Co-authored-by: Maximilian-Stefan-Ernst <34346372+Maximilian-Stefan-Ernst@users.noreply.github.com>
* formatting fixes
* ParTable ctor: allow providing columns data
* update_partable!() cleanup + docstring
* update_partable!(): SemFit methods use basic one
* ParTable: add explicit params field
* n_par() -> nparams() for clarity and aligning to Julia naming conventions
* param_values(ParTable)
  Co-authored-by: Maximilian-Stefan-Ernst <34346372+Maximilian-Stefan-Ernst@users.noreply.github.com>
* lavaan_param_values(lav_fit, partable)
* compare_estimates() -> test_estimates()
  - do tests inside
  - use param_values()/lavaan_param_values()
* update_partable!(): dict-based generic version
  Co-authored-by: Maximilian-Stefan-Ernst <34346372+Maximilian-Stefan-Ernst@users.noreply.github.com>
* ParTable: getindex() returns NamedTuple, so the downstream code doesn't rely on the order of tuple elements
* ParTable: graph-based ctor supports params= kw
* rename parameter_type to relation for clarity
* sem_summary(): cleanup filters
* fix sem_summary method for partable
* show(ParTable): suppress NaNs
* sort_vars!(ParTable): cleanup
* Project.toml: disable SymbolicUtils 1.6: it causes problems with sparsehessian(); a temporary fix until the compatibility issues are resolved in Symbolics.jl
* Project.toml: support StenoGraphs 0.3
* RAM ctor: better error for missing meanstructure
* add function param_indices
* start fixing docs
* fix regularization docs
* introduce formatting error
* update_start(): fix docstring typo
  Co-authored-by: Maximilian-Stefan-Ernst <34346372+Maximilian-Stefan-Ernst@users.noreply.github.com>
* push!(::ParTable, Tuple): check keys compat
  Co-authored-by: Maximilian-Stefan-Ernst <34346372+Maximilian-Stefan-Ernst@users.noreply.github.com>
* SemObsCov ctor: restrict n_obs to integer, don't allow missing n_obs
* fixup param_indices()
* common.jl: common vars API methods
* SemSpecification: vars API
* RAMMatrices: vars API
* ParamTable: vars API
* SemImply: vars and params API
* RAM imply: use vars API
* RAMSymbolic: use vars API
* start_simple(): use vars API
* starts_fabin3: use vars API
* remove get_colnames(), replaced by observed_vars()
* remove get_n_nodes(), replaced by nvars()
* get_data() -> samples(), and add default implementation samples(::SemObserved)
* SemObsData: remove rowwise
  - it is unused
  - if rowwise access were ever required, it could be done with eachrow(data) without allocation
* AbstractSemSingle: vars API
* rename n_obs() -> nsamples()
* rename n_man() -> nobserved_vars(); for missing data pattern: nobserved_vars() -> nmeasured_vars(), obs_cov/obs_mean -> measured_cov/measured_mean
* move Sem methods out of types.jl
* rows(::SemObservedMissing) -> pattern_rows()
* fix formatting
* samples(SemObsCov) throws an exception
* SemObserved tests: refactor and add var API tests
* ParTable(graph): group is only valid for ensemble
* ParTable(graph): fix NaN modif detection
* export vars, params and observed APIs
* refactor SemSpec tests
* add Sem unit tests
* don't allow fixed and labeled parameters
* add test for labeled and fixed parameters
* remove get_observed(): it does not seem to be used anywhere; also the method signature does not match Julia conventions
* fix ridge eval
* MeanStructure, HessianEvaluation traits: replace has_meanstructure and approximate_hessian fields with trait-like type params
* remove methods for has_meanstructure-based dispatch
* obj/grad/hess: refactor evaluation API, so that the evaluation code does not have to be duplicated across functions that calculate different combinations of objective, gradient and hessian
  - introduce EvaluationTargets class that handles selection of what to evaluate
  - add evaluate!(EvalTargets, ...) methods for loss and imply objs that evaluate only what is required
  - objective!(), obj_grad!() etc. calls are just wrappers of evaluate!() with proper targets
* se_hessian(): rename hessian -> method for clarity
* se_hessian!(): optimize calc, explicitly use Cholesky factorization
* H_scaling(): cleanup, remove unnecessary arguments
* SemOptOptim: remove redundant sem_fit() by dispatching over optimizer
* SemOptNLopt: remove redundant sem_fit() by dispatching over optimizer
* SemOptOptim: use evaluate!() directly, no wrapper required
* SemOptNLopt: use evaluate!() directly
* SemWLS: dim checks
* fixup formatting
* WLS: use 5-arg mul!() to reduce allocations
* ML: use 5-arg mul!() to reduce allocations
* FIML: use 5-arg mul!() to avoid extra allocation
* fix the error message
  Co-authored-by: Maximilian Ernst <34346372+Maximilian-Stefan-Ernst@users.noreply.github.com>
* HessianEvaluation -> HessianEval
* MeanStructure -> MeanStruct
* SemImply: replace common type params with fields
* close #216
* close #205
* update EnsembleParameterTable docs and add methods for par table equality
* close #213
* close #157
* add method for
* format
* increase test sample size
* Project.toml: update Symbolics deps
* tests/examples: import -> using; no declarations, so import is not required
* add ParamsArray: replaces RAMMatrices indices and constants vectors with a dedicated class that encapsulates this logic, resulting in an overall cleaner interface
  - A_ind, S_ind, M_ind become ParamsArray
  - F_ind becomes SparseMatrixCSC
  - parameters.jl is no longer required and is removed
* materialize!(Symm/LowTri/UpTri)
* ParamsArray: faster sparse materialize!
* ParamsArray: use Iterators.flatten() (faster)
* Base.hash(::ParamsArray)
* colnames -> vars
* update_partable!(): better params unique check
* start_fabin3: check obs_mean data & meanstructure
* params/vars API tweaks and tests
* generic imply: keep F sparse
* tests helper: is_extended_tests() to consolidate ENV variable check
* Optim sem_fit(): use provided optimizer
* prepare_start_params(): arg-dependent dispatch
  - convert to argument type-dependent dispatch
  - replace start_val() function with prepare_start_params()
  - refactor start_parameter_table() into prepare_start_params(start_val::ParameterTable, ...) and use the SEM model param indices
  - unify processing of starting values by all optimizers
  - support dictionaries of values
* prepare_param_bounds() API for optim
* u/l_bounds support for Optim.jl
* SemOptimizer(engine = ...) ctor
* SEMNLOptExt for NLopt
* NLopt: sem_fit(): use provided optimizer
* SEMProximalOptExt for Proximal opt
* merge diff/*.jl optimizer code into optimizer/*.jl
* Optim: document u/l bounds
* remove unused options field from Proximal optimizer
* decouple optimizer from Sem model
  Co-authored-by: Maximilian Ernst <34346372+Maximilian-Stefan-Ernst@users.noreply.github.com>
* fix inequality constraints test: the NLopt minimum was 18.11, below what the test expected
* add ProximalSEM tests
* optim/documentation.jl: rename to abstract.jl
* ext: change folder layout
* Project.toml: fix ProximalOperators ID
* docs: fix nsamples, nobserved_vars
* cleanup data columns reordering: define a single source_to_dest_perm() function
* SemObservedCov: define as an alias of SemObservedData
  - reduces code duplication; also annotate types of ctor args
  - now samples(SemObsCov) returns nothing
* SemObserved: store observed_vars, add observed_vars(data::SemObserved)
* nsamples(observed::SemObserved): unify
* FIML: simplify index generation
* SemObservedMissing: refactor
  - use SemObsMissingPattern struct to simplify code
  - replace O(Nvars^2) common pattern detection with Dict{}
  - don't store row-wise, store sub-matrices of non-missing data instead
  - use StatsBase.mean_and_cov()
* remove cov_and_mean(): not used anymore, StatsBase.mean_and_cov() is used instead
* SemObserved: unify data preparation; SemObservedData: parameterize by cov/mean eltype instead of the whole container types
  Co-authored-by: Maximilian Ernst <34346372+Maximilian-Stefan-Ernst@users.noreply.github.com>
* tests: update SemObserved tests to match the updated data preparation behaviour
* prep_data: warn if obs_vars order doesn't match spec
* SemObsData: observed_var_prefix kwarg to specify the prefix of the generated observed_vars if none could be inferred from the provided data; defaults to :obs
* ParTable: add graph-based kw-only constructor
* Project.toml: fix ProximalAlgorithms to 0.5; v0.7 changed the diff interface (v0.6 was skipped)
* switch to ProximalAlgorithms.jl v0.7; also drop ProximalOperators and ProximalCore weak deps
* move params() to common.jl: it is available for many SEM types, not just SemSpec
* RAM ctor: use random parameters instead of NaNs to initialize RAM matrices; simplify check_acyclic()
* move check_acyclic() to abstract.jl, add verbose parameter
* AbstractSem: improve imply/observed API redirect
* imply -> implied, SemImply -> SemImplied
* imply -> implied: file renames
* close #158
* close #232
* Update ext/SEMProximalOptExt/ProximalAlgorithms.jl
* suppress uninformative warnings during package testing
* turn simplification of symbolic terms off by default
* new version of StenoGraphs results in fewer deprecation notices
* fix exporting structs from package extensions
* fix NLopt extension
* fix Proximal extension
* fix printing
* fix regularization docs
* start reworking docs
* finish rewriting docs
* rm ProximalSEM from docs deps
* fix docs
* fix docs
* try to fix svgs for docs
* try to fix svgs for docs
* update README
* bump version
* give macOS some slack and format
* Rename params (#253) (#257)
  - first sweep of renaming
  - fix destroyed types
  - parameter table column renamed to label
  - param and param_labels, params!, seem to work
  - allow partial execution of unit tests
  - remove non-existing tests
  - fix model unit tests
  - remove unnecessary test layer
  - finish replacing
  - all unit tests passed
  - rename param_values -> params
  - add StatsAPI as dep
  - add coef and coefnames
  - rename df => dof (#254)
  - rename df => dof
  - import dof from StatsAPI
  - rename dof file
  - rename sem_fit => fit
  - typo
  - add nobs and fix tests
  - add coeftable
  - fix proximal tests
  - fix exports and StatsAPI docstrings
  - fix tests
  - fix tests
  - thx evie for the typo :)
  - fix coeftable
  Co-authored-by: Aaron Peikert <aaron.peikert@posteo.de>
* add param_labels docstring
* fix docs
* fix docs
* fix docs

Co-authored-by: Alexey Stukalov <astukalov@seer.bio>
Co-authored-by: Alexey Stukalov <astukalov@gmail.com>
Co-authored-by: Alexey Stukalov <astukalol@seer.bio>
Co-authored-by: Alexey Stukalov <alyst@users.noreply.github.com>
Co-authored-by: Aaron Peikert <aaron.peikert@posteo.de>
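Editor's note: the headline user-facing change in this release is the StatsAPI-aligned fitting interface. Below is a minimal before/after sketch assembled from the renames listed in the commit message; `model` stands for any previously constructed model and is assumed, not defined here. This is not the package's verbatim code.

```julia
# Hedged sketch of the v0.4.0 renames (names taken from the commit message above).
using StructuralEquationModels

model_fit = fit(model)   # was: sem_fit(model)

nparams(model)           # was: n_par(model)
nsamples(model)          # was: n_obs(model)
nobserved_vars(model)    # was: n_man(model)

dof(model_fit)           # was: df(model_fit); dof is now imported from StatsAPI
coef(model_fit)          # StatsAPI methods added in this release
coeftable(model_fit)
```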
1 parent c8d788c commit 38da1e0


65 files changed: +499 −395 lines

Project.toml

Lines changed: 1 addition & 0 deletions
@@ -18,6 +18,7 @@ PrettyTables = "08abe8d2-0d0c-5749-adfa-8a2ac140af0d"
 Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
 SparseArrays = "2f01184e-e22b-5df5-ae63-d93ebab69eaf"
 Statistics = "10745b16-79ce-11e8-11f9-7d13ad32a3b2"
+StatsAPI = "82ae8749-77ed-4fe6-ae5f-f523153014b0"
 StatsBase = "2913bbd2-ae8a-5f71-8c99-4fb6c76f3a91"
 StenoGraphs = "78862bba-adae-4a83-bb4d-33c106177f81"
 Symbolics = "0c5d862f-8b57-4792-8d23-62f2024744c7"

README.md

Lines changed: 1 addition & 1 deletion
@@ -18,7 +18,7 @@ Models you can fit include
 - Multigroup SEM
 - Sums of arbitrary loss functions (everything the optimizer can handle).
 
-# What are the merrits?
+# What are the merits?
 
 We provide fast objective functions, gradients, and for some cases hessians as well as approximations thereof.
 As a user, you can easily define custom loss functions.

docs/src/developer/loss.md

Lines changed: 5 additions & 5 deletions
@@ -79,7 +79,7 @@ model = SemFiniteDiff(
     loss = (SemML, myridge)
 )
 
-model_fit = sem_fit(model)
+model_fit = fit(model)
 ```
 
 This is one way of specifying the model - we now have **one model** with **multiple loss functions**. Because we did not provide a gradient for `Ridge`, we have to specify a `SemFiniteDiff` model that computes numerical gradients with finite difference approximation.
@@ -117,17 +117,17 @@ model_new = Sem(
     loss = (SemML, myridge)
 )
 
-model_fit = sem_fit(model_new)
+model_fit = fit(model_new)
 ```
 
 The results are the same, but we can verify that the computational costs are way lower (for this, the Julia package `BenchmarkTools` has to be installed):
 
 ```julia
 using BenchmarkTools
 
-@benchmark sem_fit(model)
+@benchmark fit(model)
 
-@benchmark sem_fit(model_new)
+@benchmark fit(model_new)
 ```
 
 The exact results of those benchmarks are of course highly dependent on your system (processor, RAM, etc.), but you should see that the median computation time with analytical gradients drops to about 5% of the computation without analytical gradients.
@@ -241,7 +241,7 @@ model_ml = SemFiniteDiff(
     loss = MaximumLikelihood()
 )
 
-model_fit = sem_fit(model_ml)
+model_fit = fit(model_ml)
 ```
 
 If you want to differentiate your own loss functions via automatic differentiation, check out the [AutoDiffSEM](https://github.com/StructuralEquationModels/AutoDiffSEM) package.

docs/src/developer/optimizer.md

Lines changed: 3 additions & 3 deletions
@@ -34,7 +34,7 @@ algorithm(optimizer::SemOptimizerName) = optimizer.algorithm
 options(optimizer::SemOptimizerName) = optimizer.options
 ```
 
-Note that your optimizer is a subtype of `SemOptimizer{:Name}`, where you can choose a `:Name` that can later be used as a keyword argument to `sem_fit(engine = :Name)`.
+Note that your optimizer is a subtype of `SemOptimizer{:Name}`, where you can choose a `:Name` that can later be used as a keyword argument to `fit(engine = :Name)`.
 Similarly, `SemOptimizer{:Name}(args...; kwargs...) = SemOptimizerName(args...; kwargs...)` should be defined as well as a constructor that uses only keyword arguments:
 
 ```julia
@@ -46,10 +46,10 @@ SemOptimizerName(;
 ```
 A method for `update_observed` and additional methods might be useful, but are not necessary.
 
-Now comes the substantive part: We need to provide a method for `sem_fit`:
+Now comes the substantive part: We need to provide a method for `fit`:
 
 ```julia
-function sem_fit(
+function fit(
     optim::SemOptimizerName,
     model::AbstractSem,
     start_params::AbstractVector;
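Editor's note: for context, a sketch of the full custom-optimizer pattern that the diff above edits. Only the `SemOptimizer{:Name}` subtyping, the accessors, and the `fit` signature are taken from the docs being patched; the struct fields, default values, and the body are illustrative assumptions.

```julia
# Hedged sketch of a custom optimizer backend; fields and defaults are assumed.
struct SemOptimizerName <: SemOptimizer{:Name}
    algorithm::Symbol
    options::Dict{Symbol, Any}
end

# Keyword-only constructor and the SemOptimizer{:Name} forwarding ctor.
SemOptimizerName(; algorithm = :default, options = Dict{Symbol, Any}(), kwargs...) =
    SemOptimizerName(algorithm, options)
SemOptimizer{:Name}(args...; kwargs...) = SemOptimizerName(args...; kwargs...)

algorithm(optimizer::SemOptimizerName) = optimizer.algorithm
options(optimizer::SemOptimizerName) = optimizer.options

function fit(
    optim::SemOptimizerName,
    model::AbstractSem,
    start_params::AbstractVector;
    kwargs...,
)
    # Run the backend's optimization loop here and wrap the result in a SemFit.
end
```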

docs/src/internals/files.md

Lines changed: 1 addition & 1 deletion
@@ -11,7 +11,7 @@ Source code is in the `"src"` folder:
 - `"types.jl"` defines all abstract types and the basic type hierarchy
 - `"objective_gradient_hessian.jl"` contains methods for computing objective, gradient and hessian values for different model types as well as generic fallback methods
 - The four folders `"observed"`, `"implied"`, `"loss"` and `"diff"` contain implementations of specific subtypes (for example, the `"loss"` folder contains a file `"ML.jl"` that implements the `SemML` loss function).
-- `"optimizer"` contains connections to different optimization backends (aka methods for `sem_fit`)
+- `"optimizer"` contains connections to different optimization backends (aka methods for `fit`)
   - `"optim.jl"`: connection to the `Optim.jl` package
 - `"frontend"` contains user-facing functions
   - `"specification"` contains functionality for model specification

docs/src/performance/mixed_differentiation.md

Lines changed: 3 additions & 3 deletions
@@ -19,15 +19,15 @@ model_ridge = SemFiniteDiff(
 
 model_ml_ridge = SemEnsemble(model_ml, model_ridge)
 
-model_ml_ridge_fit = sem_fit(model_ml_ridge)
+model_ml_ridge_fit = fit(model_ml_ridge)
 ```
 
 The results of both methods will be the same, but we can verify that the computation costs differ (the package `BenchmarkTools` has to be installed for this):
 
 ```julia
 using BenchmarkTools
 
-@benchmark sem_fit(model)
+@benchmark fit(model)
 
-@benchmark sem_fit(model_ml_ridge)
+@benchmark fit(model_ml_ridge)
 ```

docs/src/performance/mkl.md

Lines changed: 2 additions & 2 deletions
@@ -27,9 +27,9 @@ To check the performance implications for fitting a SEM, you can use the [`Bench
 ```julia
 using BenchmarkTools
 
-@benchmark sem_fit($your_model)
+@benchmark fit($your_model)
 
 using MKL
 
-@benchmark sem_fit($your_model)
+@benchmark fit($your_model)
 ```

docs/src/performance/simulation.md

Lines changed: 1 addition & 1 deletion
@@ -100,7 +100,7 @@ models = [model1, model2]
 fits = Vector{SemFit}(undef, 2)
 
 Threads.@threads for i in 1:2
-    fits[i] = sem_fit(models[i])
+    fits[i] = fit(models[i])
 end
 ```

docs/src/performance/starting_values.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,9 +1,9 @@
11
# Starting values
22

3-
The `sem_fit` function has a keyword argument that takes either a vector of starting values or a function that takes a model as input to compute starting values. Current options are `start_fabin3` for fabin 3 starting values [^Hägglund82] or `start_simple` for simple starting values. Additional keyword arguments to `sem_fit` are passed to the starting value function. For example,
3+
The `fit` function has a keyword argument that takes either a vector of starting values or a function that takes a model as input to compute starting values. Current options are `start_fabin3` for fabin 3 starting values [^Hägglund82] or `start_simple` for simple starting values. Additional keyword arguments to `fit` are passed to the starting value function. For example,
44

55
```julia
6-
sem_fit(
6+
fit(
77
model;
88
start_val = start_simple,
99
start_covariances_latent = 0.5

docs/src/tutorials/collection/multigroup.md

Lines changed: 2 additions & 2 deletions
@@ -81,8 +81,8 @@ model_ml_multigroup = SemEnsemble(
 We now fit the model and inspect the parameter estimates:
 
 ```@example mg; ansicolor = true
-fit = sem_fit(model_ml_multigroup)
-update_estimate!(partable, fit)
+sem_fit = fit(model_ml_multigroup)
+update_estimate!(partable, sem_fit)
 details(partable)
 ```
