Description
This is the first time I've looked into machine learning but I have a use case I'd like to test with it.
To get started, I've created a simple example based on the house pricing scenario, which closely matches my use case, but the results I'm getting are not at all close to what I expected. The data I'm providing is exactly linear: Price depends only on the SqftLiving input, with Price = SqftLiving * 100. SqftLot is held constant (8000) for both training and prediction, so it should be a non-factor.
I'm just trying to predict the Price when SqftLiving is 1500, which under the linear relationship in the provided data should be about $150,000.
However, the results I get vary wildly, from negative to positive tens of millions, every time I run the program, which is unexpected. Could someone look at this simple example and let me know what, if anything, I'm doing that is causing these poor results?
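// Namespaces assumed for the ML.NET preview that provides LearningPipeline; adjust for your version.
using System;
using System.IO;
using Microsoft.ML;
using Microsoft.ML.Runtime.Api;
using Microsoft.ML.Trainers;
using Microsoft.ML.Transforms;

class Program
{
    static void Main(string[] args)
    {
        var filePath = "C://Temp/kc_house_data.csv";

        // Training data, one row per house: Price,SqftLiving,SqftLot
        File.WriteAllText(filePath, @"100000,1000,8000
200000,2000,8000
400000,4000,8000");

        var pipeline = new LearningPipeline
        {
            new TextLoader<HousePriceData>(filePath, separator: ","),
            new ColumnConcatenator("Features", "SqftLiving", "SqftLot"),
            new StochasticDualCoordinateAscentRegressor()
        };

        var model = pipeline.Train<HousePriceData, HousePricePrediction>();

        // Expected: about 150000, since Price = SqftLiving * 100 in the training data.
        var prediction = model.Predict(new HousePriceData { SqftLiving = 1500, SqftLot = 8000 });
        Console.WriteLine(prediction.Price);
        Console.ReadLine();
    }
}

public class HousePriceData
{
    // Column 0 is the value to predict, so it is mapped to "Label".
    [Column(ordinal: "0", name: "Label")]
    public float Price;

    [Column(ordinal: "1")]
    public float SqftLiving;

    [Column(ordinal: "2")]
    public float SqftLot;
}

public class HousePricePrediction
{
    // The regressor writes its output to the "Score" column.
    [ColumnName("Score")]
    public float Price;
}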
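
For reference, the roughly $150,000 figure I expect can be checked without ML.NET at all. The following is a minimal sketch, assuming a plain closed-form ordinary least-squares fit on SqftLiving alone (SqftLot is constant, so it is left out). The names ExpectedPriceCheck and FitLine are hypothetical helpers written for this illustration and are not part of ML.NET; this is not how the SDCA trainer works internally, only a baseline for what a linear fit on this data should return.

```csharp
using System;
using System.Linq;

public static class ExpectedPriceCheck
{
    // Hypothetical helper: closed-form simple linear regression (ordinary least squares)
    // on a single feature. Returns the line y = Slope * x + Intercept.
    public static (double Slope, double Intercept) FitLine(double[] x, double[] y)
    {
        double meanX = x.Average();
        double meanY = y.Average();

        double covXY = 0.0;
        double varX = 0.0;
        for (int i = 0; i < x.Length; i++)
        {
            covXY += (x[i] - meanX) * (y[i] - meanY);
            varX += (x[i] - meanX) * (x[i] - meanX);
        }

        double slope = covXY / varX;
        return (slope, meanY - slope * meanX);
    }

    public static void Main()
    {
        // The same three rows as the training CSV, without the constant SqftLot column.
        double[] sqftLiving = { 1000, 2000, 4000 };
        double[] price = { 100000, 200000, 400000 };

        var (slope, intercept) = FitLine(sqftLiving, price);

        // Should come out at approximately 150000, up to floating-point rounding.
        Console.WriteLine(slope * 1500 + intercept);
    }
}
```

Because the three rows lie exactly on Price = SqftLiving * 100, the fitted slope should be about 100 and the intercept about 0, which is the baseline I am comparing the ML.NET prediction against.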