
Discussion around tabular data related taxonomy #137

Closed
@merveenoyan

Description

I don't know if this is the right place to discuss it, but I have an objection about the tabular tasks. I recently opened a PR to rename structured data classification to tabular classification, see here. If we are going to invest in this, I'd rather either not change the name of this pipeline for now, or find a name that covers regression as well (see below).

My main concern is that when I looked at structured-data-classification, I thought regression couldn't be done with it. The difference between the two is one of the first things you learn in ML 101. It's too fundamental to get wrong, IMO, yet it can be fixed with a small change.

Based on the outputs, the taxonomy should look like this:

  • Classification: the output is a categorical variable (typically dtype "object" in pandas).
  • Regression: the output is a numerical variable.
  • The two above can be handled by the same widget, which could output both as strings. This is one concern: there's a strict distinction between the two problems, so I felt that if we want to use the same widget, we should come up with a name that covers both, or else have two widgets that are simply named differently.
  • Clustering: the output is a set of clusters. This is different and would require something else if we ever invest in it.
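To make the output-based taxonomy concrete, here is a minimal sketch of how a task type could be guessed from the target column alone. The helper `infer_task_type` is hypothetical (it is not part of any Hugging Face API), and it assumes the target is a plain Python sequence, with `None` meaning "no target", which is the clustering case. Note that integer targets are treated as regression here, even though in practice integers are often class labels; that ambiguity is exactly why the task name should carry the distinction.

```python
from numbers import Number
from typing import Optional, Sequence


def infer_task_type(y: Optional[Sequence]) -> str:
    """Hypothetical helper: guess the tabular task type from the target column."""
    if y is None:
        # No target column: nothing to predict, rows are grouped instead
        return "clustering"
    if all(isinstance(v, (str, bool)) for v in y):
        # Categorical labels (dtype "object" or bool in pandas)
        return "classification"
    if all(isinstance(v, Number) and not isinstance(v, bool) for v in y):
        # Numerical target (ints are ambiguous: they may also be class ids)
        return "regression"
    raise ValueError("mixed or unsupported target types")


print(infer_task_type(["cat", "dog", "cat"]))  # classification
print(infer_task_type([1.5, 2.0, 3.25]))      # regression
print(infer_task_type(None))                  # clustering
```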

These are the three main task types. I wanted to open this discussion to everyone before moving on.
So I see three ways forward:

  • Have two different tasks, one that outputs a str and the other a float or int. This is too much work.
  • Have two tasks with different names that refer to the same object/widget, for better visibility.
  • Come up with a name that covers both (e.g. tabular-classification, as suggested by @adrinjalali).
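A rough sketch of the second option: two task names mapping to one shared widget that renders every prediction as a string. The names `TASK_TO_WIDGET`, `render_for_widget`, `"tabular-widget"` and the `"tabular-regression"` task are all hypothetical and only illustrate the idea; they are not an existing Hub or widget API.

```python
from typing import Dict, List, Sequence, Tuple, Union

# Hypothetical registry: two task names, one shared widget
TASK_TO_WIDGET: Dict[str, str] = {
    "tabular-classification": "tabular-widget",
    "tabular-regression": "tabular-widget",  # hypothetical task name
}

Prediction = Union[str, int, float]


def render_for_widget(
    task: str, predictions: Sequence[Prediction]
) -> Tuple[str, List[str]]:
    """Look up the widget for a task and stringify its predictions."""
    widget = TASK_TO_WIDGET[task]  # raises KeyError for unknown tasks
    # Class labels and regression values both become strings,
    # so a single widget can display either kind of output.
    return widget, [str(p) for p in predictions]


print(render_for_widget("tabular-classification", ["setosa", "virginica"]))
print(render_for_widget("tabular-regression", [1.5, 2]))
```

Because the strings lose the classification/regression distinction, in this design the task name is the only thing telling users which kind of problem a model solves, which is the visibility concern raised above.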

Pinging @lhoestq @osanseviero @julien-c
