[SPARK-20498][PYSPARK][ML] Expose getMaxDepth for ensemble tree model in PySpark #18120


Closed
wants to merge 2 commits into from

Conversation

facaiy
Contributor

@facaiy facaiy commented May 26, 2017

What changes were proposed in this pull request?

Add a getMaxDepth method for the ensemble tree models:

  • RandomForestClassifierModel
  • RandomForestRegressionModel
  • GBTClassificationModel
  • GBTRegressionModel

How was this patch tested?

  • Passes all existing unit tests.
  • Adds a new doctest.

@facaiy
Contributor Author

facaiy commented May 26, 2017

@keypointt Hi, could you help check whether this PR is consistent with your #17207? Thanks.

@AmplabJenkins

Can one of the admins verify this patch?

@keypointt
Contributor

Hi Facai, you exposed the API getMaxDepth() in TreeEnsembleModel, but the rest of the changes are in comments and are not being run.

I'd suggest you put the tests in a test module and run them to verify.

@facaiy
Contributor Author

facaiy commented May 27, 2017

Hi, @keypointt. This is a feature of Python: a doctest serves as both documentation and a unit test.
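To illustrate the point (this is a toy stand-in, not the actual PySpark class), here is a minimal self-contained example of a doctest doubling as documentation and a unit test:

```python
import doctest

class ToyTreeModel:
    """A toy stand-in for an ensemble tree model (not the real PySpark class).

    The examples below are rendered in the docs AND executed as tests:

    >>> model = ToyTreeModel(maxDepth=3)
    >>> model.getMaxDepth()
    3
    """
    def __init__(self, maxDepth=5):
        self._maxDepth = maxDepth

    def getMaxDepth(self):
        # Return the stored maxDepth param value.
        return self._maxDepth

if __name__ == "__main__":
    # Running the module executes the docstring examples as tests,
    # much like Spark's own doctest runner does for pyspark.ml modules.
    results = doctest.testmod()
    print(results.failed)  # 0 when all doctest examples pass
```

Spark's Python modules run their doctests as part of the test suite, which is why a new doctest counts as test coverage here.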

@keypointt
Contributor

keypointt commented May 27, 2017 via email

@sethah
Contributor

sethah commented May 30, 2017

cc @BryanCutler.

Bryan did some work on #17849. It seems even with that patch, we still need to add methods like these, hoping Bryan can confirm. If we're going to add some param accessors to the models, best to do them all at once yes? E.g. we can also add getMaxBins and others.
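The "do them all at once" idea could be sketched as generating one getter per param name in a single pass, instead of hand-writing getMaxDepth, getMaxBins, and so on individually. The following is a hypothetical illustration, not Spark's actual mechanism; `add_param_getters`, `ToyModel`, and `_paramMap` are made-up names:

```python
# Hypothetical sketch: attach a get<Name>() accessor for each param name,
# so getMaxDepth, getMaxBins, ... need not be written by hand one by one.

def add_param_getters(cls, param_names):
    """Attach a get<Name>() accessor for each param name to cls."""
    for name in param_names:
        getter_name = "get" + name[0].upper() + name[1:]

        def make_getter(param=name):
            # Default-argument trick binds the current name into the closure.
            def getter(self):
                # Look the value up in the instance's param map.
                return self._paramMap[param]
            return getter

        setattr(cls, getter_name, make_getter())
    return cls

class ToyModel:
    def __init__(self, **params):
        self._paramMap = params

add_param_getters(ToyModel, ["maxDepth", "maxBins"])

m = ToyModel(maxDepth=4, maxBins=32)
print(m.getMaxDepth(), m.getMaxBins())  # 4 32
```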

@BryanCutler
Member

Thanks @facaiy for the PR. This might be enough to simply retrieve the value from the Java model, but I think the Python model also needs to "own" the param. For example, if we have a DecisionTreeRegressor called dt and a DecisionTreeRegressionModel called model then

In [8]: dt.hasParam("maxDepth")
Out[8]: True

In [9]: model.hasParam("maxDepth")
Out[9]: False

This is because the Python object does not have an instance of the param; it's only getting a value from the Java model. Additionally, many of the methods you would expect to work from class Params would raise an error like

In [4]: dt.explainParam("maxDepth")
Out[4]: 'maxDepth: Maximum depth of the tree. (>= 0) E.g., depth 0 means 1 leaf node; depth 1 means 1 internal node + 2 leaf nodes. (default: 5, current: 2)'

In [5]: model.explainParam("maxDepth")
...
AttributeError: 'DecisionTreeRegressionModel' object has no attribute 'maxDepth'

As @sethah pointed out #17849 has the fix so that the Python models would have an instance of each param, so that should go in first. Then, the accessor could be written like this:

def getMaxDepth(self):
    return self.getOrDefault(self.maxDepth)

I'm not sure what the best approach is for adding these accessors: all at once, or one by one as needed, as with maxDepth?

cc @holdenk @jkbradley for your input
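The param-ownership pattern described above can be sketched with simplified stand-in classes. These only mimic the shape of `pyspark.ml.param` (the `Param`/`HasMaxDepth` classes and `_paramMap`/`_defaultParamMap` internals here are illustrative, not the real API): the model owns a Param instance, and `getOrDefault` prefers an explicitly set value over the default.

```python
# Simplified sketch of the Params/getOrDefault pattern described above.
# These classes only mimic pyspark.ml.param; they are not the real API.

class Param:
    def __init__(self, name, doc):
        self.name = name
        self.doc = doc

class HasMaxDepth:
    # The model class *owns* the param, so hasParam("maxDepth") is True.
    maxDepth = Param("maxDepth", "Maximum depth of the tree. (>= 0)")

    def __init__(self):
        self._paramMap = {}                          # explicitly set values
        self._defaultParamMap = {self.maxDepth: 5}   # defaults

    def hasParam(self, name):
        return isinstance(getattr(type(self), name, None), Param)

    def getOrDefault(self, param):
        # Prefer an explicitly set value, fall back to the default.
        if param in self._paramMap:
            return self._paramMap[param]
        return self._defaultParamMap[param]

    def getMaxDepth(self):
        return self.getOrDefault(self.maxDepth)

model = HasMaxDepth()
print(model.hasParam("maxDepth"), model.getMaxDepth())  # True 5
model._paramMap[HasMaxDepth.maxDepth] = 2
print(model.getMaxDepth())  # 2
```

With param instances copied onto the model (as in #17849), the accessor reduces to the one-line `getOrDefault` call shown above.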

@facaiy
Contributor Author

facaiy commented May 31, 2017

Thanks, @BryanCutler.
It seems that #17849 copies Params from the Estimator to the Model automatically, which is pretty useful. However, the getter methods are still missing and need to be added manually one by one, as in this PR.

In my opinion, it would be better if the model could inherit both the params and their getter methods.

Anyway, it's fine whether or not this PR gets merged, as the change is trivial.

@facaiy facaiy closed this Aug 26, 2017
@facaiy facaiy deleted the ENH/pyspark_rf_add_get_max_depth branch August 26, 2017 09:43
5 participants