Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

added docstring for predict #227

Merged
merged 5 commits into from
Feb 10, 2017

Conversation

mkborregaard
Copy link
Contributor

Requested by @nalimilan here JuliaStats/GLM.jl#171

@@ -145,11 +145,18 @@ end

const adjr² = adjr2


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated change here. Would be great if you could remove it, though it's not a huge deal.

abstract RegressionModel <: StatisticalModel

fitted(obj::RegressionModel) = error("fitted is not defined for $(typeof(obj)).")
model_response(obj::RegressionModel) = error("model_response is not defined for $(typeof(obj)).")
residuals(obj::RegressionModel) = error("residuals is not defined for $(typeof(obj)).")

"""
predict(obj::RegressionModel [, newX])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For consistency with the other docstrings, it should be , [newX].

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, fixed

Copy link
Contributor Author

@mkborregaard mkborregaard Dec 17, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though this http://docs.julialang.org/en/stable/manual/documentation/ seems to disagree, see e.g. the second code box, or point 1 in the first numbered list.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've done the docstrings in this package slightly differently. IIRC it's not entirely consistent in Base.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool - I already pushed the change

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thanks :)

@ararslan
Copy link
Member

In what scenario would a one-argument predict make sense? Presumably you're always predicting a response based on some given values, which requires at least two arguments: the model from which to obtain the estimates and the data from which the response is predicted.

It would make sense to me to change the function signature to

"""
    predict(obj::RegressionModel, newX)

Predict the response of the regression model `obj` from the values in `newX`.
"""
predict(obj::RegressionModel, newX) = error(...)

Having newX be optional seems bizarre to me. What are your thoughts?

@nalimilan
Copy link
Member

Thanks. It can be useful to get the model predictions for the original dataset, e.g. when plotting a smoothed line (LOESS) -- though it's maybe redundant with fitted in that case. Anyway, the docstring should say what's the default for newX.

Would also be nice to explain what's the expected type/format for newX. AFAICT we support either a matrix or a data frame, right?

@mkborregaard
Copy link
Contributor Author

mkborregaard commented Dec 17, 2016

@ararslan Yes, as @nalimilan said it is nice to just get the fitted values when not supplying newx. Also note that it is not redundaint with fitted now that the function will have confidence intervals when JuliaStats/GLM.jl#171 is merged!

@nalimilan It is my impression that the interface is always to use a DataFrame, see the troubles I was having here: JuliaStats/GLM.jl#165 . I think that the DataFrames module has a predict function that generates the newx matrix and calls the functions in GLM.

@ararslan
Copy link
Member

If it were me, I wouldn't specify the format of newX here, then any package that extends that method would document how they use the argument. We don't use it here, we just define it here, so it seems a little iffy to say here what the format is or should be.

@nalimilan
Copy link
Member

I don't think packages should extend this method to support other input types for newX: we really don't want some model families to support some input formats, and some others to support different formats. New models should only implement the low-level version taking a matrix, and StatsModels will provide the convenience method for AbstractTable. Currently this method is implemented in DataFrames, but this is only temporary.

So for now I would document both the matrix and data frame formats.

@mkborregaard
Copy link
Contributor Author

mkborregaard commented Dec 19, 2016

Just to state though that I find the current format really confusing - the user is supposed to supply a DataFrame, but the only docstring brought up by ?predict is the GLM low-level version using AbstractMatrix, which is not supposed to be used by the user. I suggest having a DataFrame/(or really StatModels) version of the docstring specifying that this is the format expected to be used by the main user, and the low-level ones are for package developers.

@mkborregaard
Copy link
Contributor Author

I tried to incorporate a description of newX now.

Form the predicted response of model `obj`. New covariate values `newX` can be supplied as a `DataFrame`
with the same names as the original predictors, or as an AbstractMatrix (for low-level use) with a column
for the value of each predictor (including a `Vector` of 1s if an intercept was fitted).
"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is the place to be specific about newX. I think the right place to add documentation is here https://github.com/JuliaStats/DataFrames.jl/blob/master/src/statsmodels/statsmodel.jl#L76

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The point is, all these methods are going to be moved to StatsModels. So I don't think it's worth the trouble of adding partial docstrings everywhere just to remove them in a few weeks.

@mkborregaard
Copy link
Contributor Author

This PR was requested in another PR (on GLM) that was merged Jan 5. Should this be merged as well now?

@andreasnoack
Copy link
Member

Sorry for a negative comment but I still don't think this is the right place for the doc string. It belongs where the functionality is implemented.

@mkborregaard
Copy link
Contributor Author

No problem, I am just following up. If you can direct me to where this should go, in your view (if anywhere) I am happy to close this and make a new PR there.

@nalimilan
Copy link
Member

Let's just document here the method taking AbstractMatrix then, and the other one taking a data frame can go to DataFrames and StatsModels.

@andreasnoack
Copy link
Member

The method here just throws an error. The reason to have it here is to be able to import the name such that downstream packages can all use it without conflicts. If we add a docstring here, I think it should simply state that reason. Specific implementations can then add docstrings. There is already one in GLM, right?

@mkborregaard
Copy link
Contributor Author

There is one in GLM (https://github.com/JuliaStats/GLM.jl/blob/master/src/lm.jl#L161) but the problem is that it only uses AbstractMatrix (because that is what gets called here), whereas the user is supposed to call the DataFrames version (JuliaStats/GLM.jl#165 (comment)). The preferred DataFrames version is not documented, which is super confusing for the user, so I think the attempt was to remedy this.

@nalimilan
Copy link
Member

Specific implementations can then add docstrings. There is already one in GLM, right?

@andreasnoack Yet we don't want all modeling packages to write their own docstrings, since they should all follow the same pattern for the most part for consistency. There should be a minimal docstring in StatsBase/StatsModels.

Then we can document the generic data frame method in DataFrames. That one cannot be documented by each modeling package, since they should only define an AbstractMatrix method.

@andreasnoack
Copy link
Member

I think many modeling packages would need to write their own docstring to give details about the implementation but I've been convinced that it could be helpful to have a general description of how predict is supposed to work here. I would like that the use of DataFrames was changed a bit though. The desciption here should allow for many kinds of data input where DataFrames and AbstractMatrix are just examples and that this is up to downstream packages to define these.

@mkborregaard
Copy link
Contributor Author

mkborregaard commented Feb 9, 2017

Great, what is the easiest way forward - can you edit the PR directly, or do you want to merge (or close) and do the edit?
... or do you want me to give it a try?

@mkborregaard
Copy link
Contributor Author

OK, I have tried to change the docstring according to @andreasnoack 's suggestion

@andreasnoack andreasnoack merged commit addbe72 into JuliaStats:master Feb 10, 2017
@mkborregaard mkborregaard deleted the docstring-for-predict branch February 10, 2017 12:23
@mkborregaard
Copy link
Contributor Author

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants