
Python - Different handling of similar looking data in LGBMClassifier #621

Closed
@vivekk0903

Description

Related to the question: https://stackoverflow.com/questions/44558435/scikit-learn-predicting-a-single-observation.

The following two calls are handled differently by the Python LGBMClassifier:

clf.predict_proba(X[1])             # Throws an error about data not being 2-d.
clf.predict_proba(list(X[1]))      # Takes the data as single row, and produces results.

Environment info

Linux-3.16.0-77-generic-x86_64-with-Ubuntu-14.04-trusty
('Python', '2.7.6 (default, Oct 26 2016, 20:30:19) \n[GCC 4.8.4]')
('NumPy', '1.13.0')
('SciPy', '0.19.0')
('Scikit-Learn', '0.18.1')

Reproducible example

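from sklearn.datasets import load_iris
from lightgbm import LGBMClassifier

# Load the iris data: X has shape (150, 4), y holds the class labels.
data = load_iris()
X = data.data
y = data.target

clf = LGBMClassifier()
clf.fit(X, y)

# Case 1: X[1] is a 1-D numpy.ndarray of shape (4,)
clf.predict_proba(X[1])

# Case 2: the same values as a plain Python list
clf.predict_proba(list(X[1]))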

Explanation

Case 1 raises an error:

raise ValueError('Input numpy.ndarray must be 2 dimensional')

Case 2 looks almost the same (a list instead of a NumPy array), but it is interpreted as a single row of data whose elements are the features of that row, and the output is correct.
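A possible workaround for Case 1, assuming the check only concerns the dimensionality of the array, is to reshape the 1-D row into a 2-D array with a single row before predicting. The helper `as_single_row` below is hypothetical (it is not part of LightGBM) and only illustrates the reshape with NumPy; the sample values are placeholders for a four-feature row such as X[1].

```python
import numpy as np


def as_single_row(sample):
    """Return `sample` as a 2-D array of shape (1, n_features).

    Hypothetical helper for illustration; not part of LightGBM.
    Inputs that are already 2-D are returned unchanged.
    """
    arr = np.asarray(sample)
    if arr.ndim == 1:
        # Treat a 1-D vector as one observation with n features.
        arr = arr.reshape(1, -1)
    return arr


# Placeholder four-feature vector, standing in for X[1].
row = np.array([4.9, 3.0, 1.4, 0.2])
row_2d = as_single_row(row)
print(row_2d.shape)  # (1, 4)
```

With this, `clf.predict_proba(as_single_row(X[1]))` (or simply `clf.predict_proba(X[1].reshape(1, -1))`) should give both cases the same 2-D input, though I have not checked this against every LightGBM version.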

I wanted to know whether this is the intended behaviour or just a case missed by the prediction code. Please feel free to close this if it is not suitable.
