terminology

Manlio Morini edited this page Jul 25, 2018 · 5 revisions

The following terms will come up repeatedly in other documents about Vita.

Instance

The thing about which you want to make a prediction. For example, the instance might be an image that you want to classify as either hotdog or not hotdog (Silicon Valley: Season 4 Episode 4).

Label

The answer to a prediction task: either the answer produced by the machine learning system or the right answer supplied in the training data.

For example, the label for an image might be hotdog.

Consider that:

  • symbolic regression tasks have a numeric label, which can be accessed via the label_as function;

  • although classification tasks might encode classes with different, problem-specific data types, Vita always uses its own scheme, encoding each class as an integer (class_t). This allows simpler, uniform manipulation.

    The encoded (integer) label can be accessed via the label(example) function and the original label via the dataframe::class_name method.
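The idea of encoding problem-specific class names as integers can be sketched as follows. This is only an illustration under stated assumptions: class_id and class_encoder are hypothetical names invented for this sketch and are not Vita's class_t, label or dataframe::class_name.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for an integer class identifier (NOT Vita's class_t).
using class_id = std::size_t;

// Hypothetical helper (not part of Vita): maps problem-specific class names
// to consecutive integers and back.
class class_encoder
{
public:
  // Returns the integer code for `name`, assigning a new code the first
  // time the name is seen.
  class_id encode(const std::string &name)
  {
    const auto it = codes_.find(name);
    if (it != codes_.end())
      return it->second;

    const class_id id = names_.size();
    codes_.emplace(name, id);
    names_.push_back(name);
    return id;
  }

  // Returns the original name for a previously assigned code.
  const std::string &name(class_id id) const
  {
    assert(id < names_.size());
    return names_[id];
  }

  // Number of distinct classes seen so far.
  std::size_t classes() const { return names_.size(); }

private:
  std::map<std::string, class_id> codes_;
  std::vector<std::string> names_;
};
```

With this sketch, encoding "hotdog" and then "not hotdog" yields the codes 0 and 1, and name(1) recovers the original string. Whatever the original label type, the rest of the system only has to deal with integers.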

Feature

A property of an instance used in a prediction task. For example, a car might have a feature Mileage.

NOTE: in machine learning the term feature has several meanings depending on the context. It can refer to a data type (e.g. Mileage) or to a data type together with its value. Many people use the words attribute and feature interchangeably.

Example

An instance (with its features) and a label.

In Vita the class representing an example is dataframe::example (see dataframe.h and dataframe.cc).

Model

A statistical representation of a prediction task. You train a model on examples, then use the model to make predictions.
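The train-then-predict cycle can be illustrated with a deliberately simple model. The sketch below fits a straight line by ordinary least squares. It is not how Vita builds models, and the names example, linear_model and train are hypothetical, chosen only for this illustration.

```cpp
#include <cassert>
#include <vector>

// Hypothetical example type: one numeric feature plus a numeric label.
struct example
{
  double feature;
  double label;
};

// Hypothetical model: a straight line, label = slope * feature + intercept.
struct linear_model
{
  double slope = 0.0;
  double intercept = 0.0;

  // Uses the trained model to make a prediction for a new instance.
  double predict(double feature) const { return slope * feature + intercept; }
};

// Trains the model on a set of examples with ordinary least squares.
linear_model train(const std::vector<example> &examples)
{
  assert(!examples.empty());

  const double n = static_cast<double>(examples.size());

  double mean_x = 0.0, mean_y = 0.0;
  for (const auto &e : examples)
  {
    mean_x += e.feature;
    mean_y += e.label;
  }
  mean_x /= n;
  mean_y /= n;

  double sxy = 0.0, sxx = 0.0;
  for (const auto &e : examples)
  {
    sxy += (e.feature - mean_x) * (e.label - mean_y);
    sxx += (e.feature - mean_x) * (e.feature - mean_x);
  }

  linear_model m;
  if (sxx > 0.0)  // if all features are equal, fall back to a constant model
    m.slope = sxy / sxx;
  m.intercept = mean_y - m.slope * mean_x;
  return m;
}
```

Training on the examples (1, 3), (2, 5) and (3, 7) gives a slope of 2 and an intercept of 1, so the model predicts a label of 9 for an unseen instance whose feature is 4.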

Metric

A number that you care about. May or may not be directly optimized.

Objective

A metric that your algorithm is trying to optimize.
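The difference between a metric and an objective can be made concrete with two functions computed on the same predictions. In this sketch, mean squared error plays the role of the objective (the quantity a learning algorithm would minimize), while the hit rate is a metric that is monitored but not directly optimized. Both function names and the choice of measures are assumptions made for illustration and are not taken from Vita.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Objective in this sketch: mean squared error between predictions and
// expected labels (lower is better).
double mean_squared_error(const std::vector<double> &predicted,
                          const std::vector<double> &expected)
{
  assert(!predicted.empty() && predicted.size() == expected.size());

  double sum = 0.0;
  for (std::size_t i = 0; i < predicted.size(); ++i)
  {
    const double d = predicted[i] - expected[i];
    sum += d * d;
  }
  return sum / static_cast<double>(predicted.size());
}

// Metric in this sketch: fraction of predictions that fall within
// `tolerance` of the expected label (higher is better).
double hit_rate(const std::vector<double> &predicted,
                const std::vector<double> &expected, double tolerance)
{
  assert(!predicted.empty() && predicted.size() == expected.size());

  std::size_t hits = 0;
  for (std::size_t i = 0; i < predicted.size(); ++i)
  {
    const double d = predicted[i] - expected[i];
    if (d <= tolerance && d >= -tolerance)
      ++hits;
  }
  return static_cast<double>(hits) / static_cast<double>(predicted.size());
}
```

For the predictions {1, 2, 5} against the expected labels {1, 2, 3}, the mean squared error is 4/3 and the hit rate with a tolerance of 0.5 is 2/3. The two numbers describe the same predictions but can move differently, which is why it matters which one the algorithm actually optimizes.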

Pipeline

The infrastructure surrounding a machine learning algorithm. Includes gathering the data from the front end, putting it into training data files, training one or more models, and exporting the models to production.