Commit 1e86a5b

Replace infix ~ for formulas with a model macro (#9)
1 parent b4f435b commit 1e86a5b

7 files changed, +90 -77 lines changed

docs/src/contrasts.md

+1 -1
@@ -22,7 +22,7 @@ The default contrast coding system is `DummyCoding`. To override this, use
 the `contrasts` argument when constructing a `ModelFrame`:
 
 ```julia
-mf = ModelFrame(y ~ 1 + x, df, contrasts = Dict(:x => EffectsCoding()))
+mf = ModelFrame(@formula(y ~ 1 + x), df, contrasts = Dict(:x => EffectsCoding()))
 ```
 
 To change the contrast coding for one or more variables in place, use

docs/src/formula.md

+16 -6
@@ -21,13 +21,16 @@ goal is to support any tabular data format that adheres to a minimal API,
 ## The `Formula` type
 
 The basic conceptual tool for this is the `Formula`, which has a left side and a
-right side, separated by `~`:
+right side, separated by `~`. Formulas are constructed using the `@formula` macro:
 
 ```jldoctest
-julia> y ~ 1 + a
+julia> @formula(y ~ 1 + a)
 Formula: y ~ 1 + a
 ```
 
+Note that the `@formula` macro **must** be called with parentheses to ensure that
+the formula is parsed properly.
+
 The left side of a formula conventionally represents *dependent* variables, and
 the right side *independent* variables (or regressors). *Terms* are separated
by `+`. Basic terms are the integers `1` or `0`—evaluated as the presence or
@@ -43,7 +46,7 @@ It's often convenient to include main effects and interactions for a number of
 variables. The `*` operator does this, expanding in the following way:
 
 ```jldoctest
-julia> Formula(StatsModels.Terms(y ~ 1 + a*b))
+julia> Formula(StatsModels.Terms(@formula(y ~ 1 + a*b)))
 Formula: y ~ 1 + a + b + a & b
 ```
 
@@ -54,21 +57,28 @@ This applies to higher-order interactions, too: `a*b*c` expands to the main
 effects, all two-way interactions, and the three way interaction `a&b&c`:
 
 ```jldoctest
-julia> Formula(StatsModels.Terms(y ~ 1 + a*b*c))
+julia> Formula(StatsModels.Terms(@formula(y ~ 1 + a*b*c)))
 Formula: y ~ 1 + a + b + c + a & b + a & c + b & c + &(a,b,c)
 ```
 
 Both the `*` and the `&` operators act like multiplication, and are distributive
 over addition:
 
 ```jldoctest
-julia> Formula(StatsModels.Terms(y ~ 1 + (a+b) & c))
+julia> Formula(StatsModels.Terms(@formula(y ~ 1 + (a+b) & c)))
 Formula: y ~ 1 + c & a + c & b
 
-julia> Formula(StatsModels.Terms(y ~ 1 + (a+b) * c))
+julia> Formula(StatsModels.Terms(@formula(y ~ 1 + (a+b) * c)))
 Formula: y ~ 1 + a + b + c + c & a + c & b
 ```
 
+You may be wondering why formulas in Julia require a macro, while in R they appear "bare."
+R supports nonstandard evaluation, allowing the formula to remain an unevaluated object
+while its terms are parsed out. Julia uses a much more standard evaluation mechanism,
+making this impossible using normal expressions. However, unlike R, Julia provides macros to
+explicitly indicate when code itself will be manipulated before it's evaluated. By constructing
+a formula using a macro, we're able to provide convenient, R-like syntax and semantics.
+
 ## The `ModelFrame` and `ModelMatrix` types
 
 The main use of `Formula`s is for fitting statistical models based on tabular
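
Note on the paragraph added to docs/src/formula.md: a macro receives the formula as unevaluated code. The sketch below shows roughly what that code looks like when quoted by hand. It assumes Julia 1.x, where `y ~ 1 + a` parses as an ordinary call to `~`; the Julia version this commit targets may instead produce a `@~` macro call, which would explain why `@formula` checks for both forms. The variable names are illustrative only.

```julia
# Quoting keeps the formula as an expression; `y` and `a` need not be defined.
ex = :(y ~ 1 + a)

separator = ex.args[1]   # the operator symbol `~`
lhs = ex.args[2]         # left side of the formula
rhs = ex.args[3]         # right side, itself a nested expression

println("head:      ", ex.head)
println("separator: ", separator)
println("lhs:       ", lhs)
println("rhs:       ", rhs)
```

A macro such as `@formula` is handed this same expression tree, so it can pull out the two sides without evaluating them.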

src/StatsModels.jl

+1 -1
@@ -9,7 +9,7 @@ using NullableArrays
 using CategoricalArrays
 
 
-export @~,
+export @formula,
        Formula,
        ModelFrame,
        ModelMatrix,

src/formula.jl

+14 -13
@@ -1,6 +1,6 @@
 # Formulas for representing and working with linear-model-type expressions
-# Original by Harlan D. Harris. Later modifications by John Myles White
-# and Douglas M. Bates.
+# Original by Harlan D. Harris. Later modifications by John Myles White,
+# Douglas M. Bates, and other contributors.
 
 ## Formulas are written as expressions and parsed by the Julia parser.
 ## For example :(y ~ a + b + log(c))
@@ -12,16 +12,19 @@
 ## The rhs of a formula can be 1
 
 type Formula
-    lhs::@compat(Union{Symbol, Expr, Void})
-    rhs::@compat(Union{Symbol, Expr, Integer})
+    lhs::Union{Symbol, Expr, Void}
+    rhs::Union{Symbol, Expr, Integer}
 end
 
-macro ~(lhs, rhs)
-    ex = Expr(:call,
-              :Formula,
-              Base.Meta.quot(lhs),
-              Base.Meta.quot(rhs))
-    return ex
+macro formula(ex)
+    if (ex.head === :macrocall && ex.args[1] === Symbol("@~")) || (ex.head === :call && ex.args[1] === :(~))
+        length(ex.args) == 3 || error("malformed expression in formula")
+        lhs = Base.Meta.quot(ex.args[2])
+        rhs = Base.Meta.quot(ex.args[3])
+    else
+        error("expected formula separator ~, got $(ex.head)")
+    end
+    return Expr(:call, :Formula, lhs, rhs)
 end
 
 """
@@ -46,9 +49,7 @@ end
 Base.:(==)(t1::Terms, t2::Terms) = all(getfield(t1, f)==getfield(t2, f) for f in fieldnames(t1))
 
 function Base.show(io::IO, f::Formula)
-    print(io,
-          string("Formula: ",
-                 f.lhs == nothing ? "" : f.lhs, " ~ ", f.rhs))
+    print(io, "Formula: ", f.lhs === nothing ? "" : f.lhs, " ~ ", f.rhs)
 end
 
 # special operators in formulas
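
Note on the new `@formula` macro in src/formula.jl: the following is a minimal, self-contained sketch of the same pattern, written for Julia 1.x rather than the older syntax used in this commit (`type`, `Void`). `SketchFormula` and `@sketch_formula` are hypothetical stand-ins, not names from the package, and only the call form of `~` is handled here.

```julia
# Hypothetical stand-in for the package's `Formula` type (Julia 1.x syntax).
struct SketchFormula
    lhs::Union{Symbol, Expr, Nothing}
    rhs::Union{Symbol, Expr, Integer}
end

# Hypothetical macro following the structure of `@formula` in this commit.
macro sketch_formula(ex)
    # Assumption: on Julia 1.x, `y ~ x` arrives as the call expression `~(y, x)`.
    if ex isa Expr && ex.head === :call && ex.args[1] === :(~)
        length(ex.args) == 3 || error("malformed expression in formula")
        # Quote both sides so they are stored as code instead of being evaluated.
        lhs = Base.Meta.quot(ex.args[2])
        rhs = Base.Meta.quot(ex.args[3])
        return Expr(:call, :SketchFormula, lhs, rhs)
    end
    error("expected formula separator ~")
end

f = @sketch_formula(y ~ 1 + a)
g = @sketch_formula(y ~ 0)

println(f.lhs, " ~ ", f.rhs)
println(g.lhs, " ~ ", g.rhs)
```

Quoting each side with `Base.Meta.quot` is what lets the constructed object hold `:y` and `:(1 + a)` as data; without it, the generated call would try to look up variables named `y` and `a`.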

test/formula.jl

+21 -19
@@ -12,8 +12,7 @@ using Compat
 # - support more transformations with I()?
 
 ## Formula parsing
-import StatsModels: @~, Formula
-import StatsModels.Terms
+import StatsModels: @formula, Formula, Terms
 
 ## totally empty
 t = Terms(Formula(nothing, 0))
@@ -23,87 +22,90 @@ t = Terms(Formula(nothing, 0))
 @test t.eterms == []
 
 ## empty RHS
-t = Terms(y ~ 0)
+t = Terms(@formula(y ~ 0))
 @test t.intercept == false
 @test t.terms == []
 @test t.eterms == [:y]
-t = Terms(y ~ -1)
+t = Terms(@formula(y ~ -1))
 @test t.intercept == false
 @test t.terms == []
 
 ## intercept-only
-t = Terms(y ~ 1)
+t = Terms(@formula(y ~ 1))
 @test t.response == true
 @test t.intercept == true
 @test t.terms == []
 @test t.eterms == [:y]
 
 ## terms add
-t = Terms(y ~ 1 + x1 + x2)
+t = Terms(@formula(y ~ 1 + x1 + x2))
 @test t.intercept == true
 @test t.terms == [:x1, :x2]
 @test t.eterms == [:y, :x1, :x2]
 
 ## implicit intercept behavior:
-t = Terms(y ~ x1 + x2)
+t = Terms(@formula(y ~ x1 + x2))
 @test t.intercept == true
 @test t.terms == [:x1, :x2]
 @test t.eterms == [:y, :x1, :x2]
 
 ## no intercept
-t = Terms(y ~ 0 + x1 + x2)
+t = Terms(@formula(y ~ 0 + x1 + x2))
 @test t.intercept == false
 @test t.terms == [:x1, :x2]
 
-@test t == Terms(y ~ -1 + x1 + x2) == Terms(y ~ x1 - 1 + x2) == Terms(y ~ x1 + x2 -1)
+@test t == Terms(@formula(y ~ -1 + x1 + x2)) == Terms(@formula(y ~ x1 - 1 + x2)) == Terms(@formula(y ~ x1 + x2 -1))
 
 ## can't subtract terms other than 1
-@test_throws ErrorException Terms(y ~ x1 - x2)
+@test_throws ErrorException Terms(@formula(y ~ x1 - x2))
 
-t = Terms(y ~ x1 & x2)
+t = Terms(@formula(y ~ x1 & x2))
 @test t.terms == [:(x1 & x2)]
 @test t.eterms == [:y, :x1, :x2]
 
 ## `*` expansion
-t = Terms(y ~ x1 * x2)
+t = Terms(@formula(y ~ x1 * x2))
 @test t.terms == [:x1, :x2, :(x1 & x2)]
 @test t.eterms == [:y, :x1, :x2]
 
 ## associative rule:
 ## +
-t = Terms(y ~ x1 + x2 + x3)
+t = Terms(@formula(y ~ x1 + x2 + x3))
 @test t.terms == [:x1, :x2, :x3]
 
 ## &
-t = Terms(y ~ x1 & x2 & x3)
+t = Terms(@formula(y ~ x1 & x2 & x3))
 @test t.terms == [:((&)(x1, x2, x3))]
 @test t.eterms == [:y, :x1, :x2, :x3]
 
 ## distributive property of + and &
-t = Terms(y ~ x1 & (x2 + x3))
+t = Terms(@formula(y ~ x1 & (x2 + x3)))
 @test t.terms == [:(x1&x2), :(x1&x3)]
 
 ## FAILS: ordering of expanded interaction terms is wrong
 ## (only has an observable effect when both terms are categorical and
 ## produce multiple model matrix columns that are multiplied together...)
 ##
-## t = Terms(y ~ (x2 + x3) & x1)
+## t = Terms(@formula(y ~ (x2 + x3) & x1))
 ## @test t.terms == [:(x2&x1), :(x3&x1)]
 
 ## three-way *
-t = Terms(y ~ x1 * x2 * x3)
+t = Terms(@formula(y ~ x1 * x2 * x3))
 @test t.terms == [:x1, :x2, :x3,
                   :(x1&x2), :(x1&x3), :(x2&x3),
                   :((&)(x1, x2, x3))]
 @test t.eterms == [:y, :x1, :x2, :x3]
 
 ## Interactions with `1` reduce to main effect. All fail at the moment.
-## t = Terms(y ~ 1 & x1)
+## t = Terms(@formula(y ~ 1 & x1))
 ## @test t.terms == [:x1] # == [:(1 & x1)]
 ## @test t.eterms == [:y, :x1]
 
-## t = Terms(y ~ (1 + x1) & x2)
+## t = Terms(@formula(y ~ (1 + x1) & x2))
 ## @test t.terms == [:x2, :(x1&x2)] # == [:(1 & x1)]
 ## @test t.eterms == [:y, :x1, :x2]
 
+# Incorrect formula separator
+@test_throws ErrorException eval(:(@formula(y => x + 1)))
+
 end
