-
Notifications
You must be signed in to change notification settings - Fork 29
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create rule S6970: The Scikit-learn \"fit\" method should be called before methods yielding results #3869
base: master
Are you sure you want to change the base?
Conversation
…efore methods yielding results
96c24ae
to
249eea5
Compare
rules/S6970/python/rule.adoc
Outdated
|
||
== Why is this an issue? | ||
|
||
When using the Scikit-learn library it is crucial to train the model before |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe replace "model" by "estimator" or "transformer", or both
rules/S6970/python/rule.adoc
Outdated
attempting to get results. Failing to do so can lead to incorrect results or runtime errors. | ||
The training is done with the help of the `fit` method and retrieving results can be done for example with the `predict` method. | ||
|
||
If the `predict` method is called without a prior call to the `fit` method, a `NotFittedError` will be thrown. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe specify it's an exception ?
rules/S6970/python/rule.adoc
Outdated
parameters = {'kernel':('linear', 'rbf'), 'C':[1, 10]} | ||
svc = svm.SVC() | ||
clf = GridSearchCV(svc, parameters) | ||
|
||
results = clf.cv_results_ # raises an AttributeError |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could have a more minimalistic exemple like :
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
iris = load_iris()
knn = KNeighborsClassifier(1)
knn.n_samples_fit_ # Raises an AttributeError
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes and it makes me notice that I did not put the list of attributes that would raise an error.
rules/S6970/python/rule.adoc
Outdated
results = clf.cv_results_ # raises an AttributeError | ||
---- | ||
|
||
In the example above failing to train the model on the iris dataset with the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add ,
after "above"
rules/S6970/python/rule.adoc
Outdated
|
||
== How to fix it | ||
|
||
To fix the issue train the model by using the `fit` method. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add ","
@@ -75,16 +75,16 @@ kmeans.predict(X) # Compliant | |||
== Resources | |||
=== Documentation | |||
|
|||
* Scikit-learn Documentation - https://scikit-learn.org/stable/glossary.html#term-fit[term fit reference] | |||
* Scikit-learn Documentation - https://scikit-learn.org/stable/glossary.html#term-fit[Glossary fit reference] | |||
* Scikit-learn Documentation - https://scikit-learn.org/stable/modules/generated/sklearn.exceptions.NotFittedError.html#sklearn.exceptions.NotFittedError[NotFittedError reference] | |||
|
|||
ifdef::env-github,rspecator-view[] | |||
|
|||
Implementation details: | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we check if the object is deserialized through pickle/joblib/... ? Or are we going to lose the type anyways ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we would be able to get the proper type back if it is deserialized that's true. I don't think we will be able to raise the error correctly in that case.
iris = datasets.load_iris() | ||
X = iris.data | ||
|
||
kmeans = KMeans(n_clusters=3, random_state=42) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should have a correct random_state
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In what sense this is not correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, it's my bad. I got confused with the numpy.random.Generator
rules/S6970/python/metadata.json
Outdated
"sqKey": "S6970", | ||
"scope": "All", | ||
"defaultQualityProfiles": ["Sonar way"], | ||
"quickfix": "unknown", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: maybe set the quick fix to infeasible ?
Quality Gate passed for 'rspec-frontend'Issues Measures |
Quality Gate passed for 'rspec-tools'Issues Measures |
You can preview this rule here (updated a few minutes after each push).
Review
A dedicated reviewer checked the rule description successfully for: