[ML] Support multinomial logistic regression in data frame analysis #982

Closed
@tveasey

Description

This issue tracks introducing multinomial logistic regression. The main complication is that the loss function has multiple parameters and the forest needs to predict all of them. This breaks the assumption that we have a single prediction, gradient and curvature. The work therefore breaks down into some preliminary tasks, which are independent of actually implementing the new loss function, followed by the loss function itself and its integration.

  • Extend CLoss and CArgMinLoss to support multi-valued predictions, gradients and a Hessian matrix
  • Rework caching to store multi-valued predictions, gradients and the Hessian. This affects how we index values in the data frame, the memory calculation, etc.
  • Rework the code to compute aggregate derivatives for candidate splits
  • Rework the code to choose the best candidate split
  • Extend CBoostedTreeNode to support multi-valued leaves
  • Extend the inference model definition to support multi-valued leaves
  • Extend feature importance to support multi-valued leaves
  • Write the new loss function
  • Wire into the classification API
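
To make the "multi-valued prediction, gradient and Hessian" requirement concrete, here is a minimal sketch of the per-row derivatives of a multinomial logistic (softmax cross-entropy) loss, and of summing them over a set of rows as would be needed for a candidate split. It assumes one raw prediction (logit) per class. All function names are hypothetical illustrations and are not the CLoss or CArgMinLoss interfaces; it is written in plain Python only for brevity.

```python
import math


def softmax(logits):
    """Return class probabilities for one row of K raw predictions."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def loss(logits, label):
    """Cross-entropy loss for one row whose true class index is `label`."""
    return -math.log(softmax(logits)[label])


def gradient(logits, label):
    """K-vector of first derivatives w.r.t. the logits: p_i - [i == label]."""
    p = softmax(logits)
    return [p[i] - (1.0 if i == label else 0.0) for i in range(len(p))]


def hessian(logits):
    """K x K matrix of second derivatives w.r.t. the logits: diag(p) - p p^T."""
    p = softmax(logits)
    k = len(p)
    return [[(p[i] if i == j else 0.0) - p[i] * p[j] for j in range(k)]
            for i in range(k)]


def aggregate(rows):
    """Sum gradients and Hessians over (logits, label) rows (hypothetical helper)."""
    k = len(rows[0][0])
    g_sum = [0.0] * k
    h_sum = [[0.0] * k for _ in range(k)]
    for logits, label in rows:
        g = gradient(logits, label)
        h = hessian(logits)
        for i in range(k):
            g_sum[i] += g[i]
            for j in range(k):
                h_sum[i][j] += h[i][j]
    return g_sum, h_sum


if __name__ == "__main__":
    # Three rows, three classes, all logits zero (uniform probabilities).
    rows = [([0.0, 0.0, 0.0], 0), ([0.0, 0.0, 0.0], 1), ([0.0, 0.0, 0.0], 2)]
    g_sum, h_sum = aggregate(rows)
    print(g_sum)
    print(h_sum)
```

Compared with the single-parameter case, each row now contributes a K-vector and a K x K matrix rather than two scalars, which is why the caching, memory calculation and split-scoring tasks above are affected. Note also that each row of this Hessian sums to zero, so the matrix is singular; a Newton-style leaf update based on it would need some form of regularisation or constraint.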
