Commit

mbok#2 Don't fail aggregation in case of algorithmic conditions not satisfied by the data => just serve the empty aggregation
mbok committed Jul 16, 2017
1 parent fe54c29 commit 2faad6d
Showing 22 changed files with 286 additions and 179 deletions.
110 changes: 94 additions & 16 deletions README.adoc
@@ -33,11 +33,9 @@ regarding the estimated model with respect to a set of given input values for th
`value`:: The predicted value for the response variable computed using the estimated linear hypothesis
function ``h(x)`` with `x` given by `C` input values for the explanatory variables
`x = [x~1~, x~2~,...,x~C~]`.
`coefficients`:: Estimated slope coefficients
image:http://latex.codecogs.com/gif.latex?\theta_1,%20\theta_2,%20\theta_3,.%20.%20.,%20\theta_C%20[]
`coefficients`:: Estimated coefficients
image:http://latex.codecogs.com/gif.latex?\theta_0,%20\theta_1,%20\theta_2,%20\theta_3,.%20.%20.,%20\theta_C%20[]
of the linear linear hypothesis function ``h(x)``.
`intercept`:: Estimated intercept coefficient image:http://latex.codecogs.com/gif.latex?\theta_0%20[]
of the linear hypothesis function ``h(x)``.

Assuming the data consists of documents representing sold house prices with features
like number of bedrooms, bathrooms and size etc. we can let predict or validate
@@ -80,11 +78,11 @@ And the following may be the response with the estimated price of around $ 581,4
"my_house_price": {
"value": 581458.3087492324,
"coefficients": [
227990.63952712028,
248.92285661317254,
-68297.7720278421,
64406.52205356777
],
"intercept": 227990.63952712028
]
}
}
}
@@ -99,11 +97,9 @@ The `linreg_stats` aggregation computes statistics for the estimated linear regr
`rss`:: Residual sum of squares as a measure of the discrepancy between the data and the estimated model.
The lower the `rss` number, the smaller the error of the prediction, and the better the model.
`mse`:: Mean squared error or rather `rss` divided by the number of documents consumed for model estimation.
`coefficients`:: Slope coefficients
image:http://latex.codecogs.com/gif.latex?\theta_1,%20\theta_2,%20\theta_3,.%20.%20.,%20\theta_C%20[]
`coefficients`:: Estimated coefficients
image:http://latex.codecogs.com/gif.latex?\theta_0,%20\theta_1,%20\theta_2,%20\theta_3,.%20.%20.,%20\theta_C%20[]
of the linear linear hypothesis function ``h(x)``.
`intercept`:: Intercept coefficient image:http://latex.codecogs.com/gif.latex?\theta_0%20[]
of the linear hypothesis function ``h(x)``.

Assuming the data consists of documents representing house prices we can compute statistics for
the estimated best fitting linear hypothesis function which predicts house prices based on number of
@@ -135,11 +131,11 @@ and the last for the response variable. The above request returns the following
"rss": 49523788338938.734,
"mse": 63410740510.80504,
"coefficients": [
47553.18737564783,
-100544.0725894584,
45981.15827544966,
309.6013051477475
],
"intercept": 47553.18737564783
]
}
}
}
@@ -180,7 +176,8 @@ Do not forget to restart the node after installing.
[frame="all"]
|===
| Plugin version | Elasticsearch version | Release date
| https://github.com/scaleborn/elasticsearch-linear-regression/releases/download/5.3.0.1/elasticsearch-linear-regression-5.3.0.1.zip[5.3.0.1] | 5.3.0 | Jun 1, 2017
| https://github.com/scaleborn/elasticsearch-linear-regression/releases/download/5.3.0.1/elasticsearch-linear-regression-5.3.0.2.zip[5.3.0.2] | 5.3.0 | Jul 16, 2017
| https://github.com/scaleborn/elasticsearch-linear-regression/releases/download/5.3.0.1/elasticsearch-linear-regression-5.3.0.1.zip[5.3.0.1] | 5.3.0 | Jun 30, 2017
|===

## Examples
@@ -198,7 +195,7 @@ https://github.com/scaleborn/elasticsearch-linear-regression/tree/master/example
./bin/logstash -f house-prices-import.conf
....

The indexed data will have this form:
The indexed documents will have this form:
[source,js]
--------------------------------------------------
{
@@ -250,16 +247,97 @@ $ 650,000 to pay for the desired house in "Morro Bay".
"dream_house_price": {
"value": 649918.0709489314,
"coefficients": [
228318.6161854365,
249.02340193904183,
-68314.4830871133,
64248.05007337558
],
"intercept": 228318.6161854365
]
}
}
}
--------------------------------------------------

By using sub aggregations we are able to find out the estimated prices per location:
[source,js]
--------------------------------------------------
/houses/_search?size=0
{
"aggs": {
"locations": {
"terms": {
"field": "location.keyword",
"size": 15
},
"aggs": {
"dream_house_price": {
"linreg_predict": {
"fields": ["size", "bedrooms", "bathrooms", "price"],
"inputs": [2000, 3, 2]
}
}
}
}
}
}
--------------------------------------------------

The response uncovers that "Arroyo Grande" would be
the most expensive region for our dream house:

[source,js]
--------------------------------------------------
{
"aggregations": {
"locations": {
"buckets": [
{
"key": "Santa Maria-Orcutt",
"doc_count": 265,
"dream_house_price": {
"value": 256251.9105297585,
"coefficients": [
26437.192829649313,
81.19071633227178,
6825.9128627023265,
23477.773223729317
]
}
},
{
"key": "Paso Robles",
"doc_count": 85,
"dream_house_price": {
"value": 365620.0386191703,
"coefficients": [
42958.257094706176,
151.7000907380368,
6486.477078139843,
-98.91559301451247
]
}
},
...
{
"key": " Arroyo Grande",
"doc_count": 12,
"dream_house_price": {
"value": 1140196.791331573,
"coefficients": [
728566.7474390095,
1956.6474540196602,
-706891.620925945,
-690495.0006844609
]
}
}
...
]
}
}
}
--------------------------------------------------


## License
Copyright 2017 Scaleborn UG (haftungsbeschränkt).

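With this commit the intercept θ0 moves into position 0 of the `coefficients` array in both the `linreg_predict` and `linreg_stats` responses, so a client can recompute a prediction as h(x) = θ0 + θ1·x1 + ... + θC·xC. The following is a minimal sketch assuming exactly that layout; `LinRegPrediction` and `predict` are hypothetical names, not part of the plugin. Feeding it the inputs `[2000, 3, 2]` from the per-location request together with the `dream_house_price` or "Santa Maria-Orcutt" coefficients from the README reproduces the reported values (about 649918.07 and 256251.91).

```java
/**
 * Hypothetical helper, not part of the plugin: evaluates the linear hypothesis
 * h(x) = theta_0 + theta_1 * x_1 + ... + theta_C * x_C.
 */
public final class LinRegPrediction {

    private LinRegPrediction() {
    }

    /**
     * @param coefficients [theta_0, theta_1, ..., theta_C] with the intercept first
     *                     (layout assumed from the README change in this commit)
     * @param inputs       [x_1, ..., x_C] in the order of the explanatory fields
     */
    public static double predict(final double[] coefficients, final double[] inputs) {
        if (coefficients.length != inputs.length + 1) {
            throw new IllegalArgumentException(
                "Expected " + (inputs.length + 1) + " coefficients, got " + coefficients.length);
        }
        double value = coefficients[0]; // intercept theta_0
        for (int i = 0; i < inputs.length; i++) {
            value += coefficients[i + 1] * inputs[i]; // slope theta_(i+1) times feature x_(i+1)
        }
        return value;
    }

    public static void main(final String[] args) {
        final double[] inputs = {2000, 3, 2}; // size, bedrooms, bathrooms
        final double[] dreamHouse = {
            228318.6161854365, 249.02340193904183, -68314.4830871133, 64248.05007337558};
        final double[] santaMaria = {
            26437.192829649313, 81.19071633227178, 6825.9128627023265, 23477.773223729317};
        System.out.println(predict(dreamHouse, inputs)); // approx. 649918.07
        System.out.println(predict(santaMaria, inputs)); // approx. 256251.91
    }
}
```

The inputs must follow the order of `fields` without the trailing response field, here `size`, `bedrooms`, `bathrooms`.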
2 changes: 1 addition & 1 deletion gradle.properties
@@ -4,4 +4,4 @@ wagon-ssh-external.version=2.10
commons-math3.version=3.6.1
group=org.scaleborn.elasticsearch.plugin
name=elasticsearch-linear-regression
version=5.3.0.1
version=5.3.0.2
@@ -25,7 +25,7 @@
import org.elasticsearch.common.logging.Loggers;
import org.elasticsearch.search.aggregations.pipeline.PipelineAggregator;
import org.scaleborn.elasticsearch.linreg.aggregation.support.BaseInternalAggregation;
import org.scaleborn.linereg.evaluation.SlopeCoefficients;
import org.scaleborn.linereg.estimation.SlopeCoefficients;

/**
* Created by mbok on 11.04.17.
@@ -22,7 +22,7 @@
import org.elasticsearch.common.xcontent.XContentBuilder;
import org.elasticsearch.search.aggregations.InternalAggregation.CommonFields;
import org.scaleborn.elasticsearch.linreg.aggregation.support.ModelResults;
import org.scaleborn.linereg.evaluation.SlopeCoefficients;
import org.scaleborn.linereg.estimation.SlopeCoefficients;

/**
* Created by mbok on 11.04.17.
@@ -27,7 +27,7 @@
import org.scaleborn.linereg.calculation.statistics.Statistics;
import org.scaleborn.linereg.calculation.statistics.StatsCalculator;
import org.scaleborn.linereg.calculation.statistics.StatsModel;
import org.scaleborn.linereg.evaluation.SlopeCoefficients;
import org.scaleborn.linereg.estimation.SlopeCoefficients;

/**
* Created by mbok on 21.03.17.
@@ -23,7 +23,7 @@
import org.scaleborn.elasticsearch.linreg.aggregation.support.ModelResults;
import org.scaleborn.linereg.calculation.statistics.Statistics;
import org.scaleborn.linereg.calculation.statistics.Statistics.DefaultStatistics;
import org.scaleborn.linereg.evaluation.SlopeCoefficients;
import org.scaleborn.linereg.estimation.SlopeCoefficients;

/**
* Created by mbok on 07.04.17.
@@ -30,11 +30,12 @@
import org.elasticsearch.search.aggregations.InternalAggregation;
import org.elasticsearch.search.aggregations.pipeline.PipelineAggregator;
import org.scaleborn.linereg.calculation.intercept.InterceptCalculator;
import org.scaleborn.linereg.evaluation.DerivationEquation;
import org.scaleborn.linereg.evaluation.DerivationEquationBuilder;
import org.scaleborn.linereg.evaluation.DerivationEquationSolver;
import org.scaleborn.linereg.evaluation.SlopeCoefficients;
import org.scaleborn.linereg.evaluation.commons.CommonsMathSolver;
import org.scaleborn.linereg.estimation.DerivationEquation;
import org.scaleborn.linereg.estimation.DerivationEquationBuilder;
import org.scaleborn.linereg.estimation.DerivationEquationSolver;
import org.scaleborn.linereg.estimation.DerivationEquationSolver.EstimationException;
import org.scaleborn.linereg.estimation.SlopeCoefficients;
import org.scaleborn.linereg.estimation.commons.CommonsMathSolver;

/**
* Created by mbok on 07.04.17.
@@ -142,9 +143,7 @@ public InternalAggregation doReduce(final List<InternalAggregation> aggregations

// return empty result if all samples are null
if (aggs.isEmpty()) {
return buildInternalAggregation(this.name, this.featuresCount, null, null,
pipelineAggregators(),
getMetaData());
return buildEmptyInternalAggregation();
}

final S composedSampling = buildSampling(this.featuresCount);
@@ -154,14 +153,34 @@ public InternalAggregation doReduce(final List<InternalAggregation> aggregations
composedSampling.merge((S) ((BaseInternalAggregation) aggs.get(i)).sampling);
}

final M evaluatedResults = evaluateResults(composedSampling);
if (composedSampling.getCount() <= composedSampling.getFeaturesCount()) {
LOGGER.debug(
"Insufficient amount of training data for model estimation, at least {} are required, given {}",
composedSampling.getFeaturesCount() + 1, composedSampling.getCount());
return buildEmptyInternalAggregation();
}

M evaluatedResults = null;
try {
evaluatedResults = evaluateResults(composedSampling);
} catch (final EstimationException e) {
LOGGER.debug(
"Failed to estimate model", e);
return buildEmptyInternalAggregation();
}

LOGGER.debug("Evaluated results: {}", evaluatedResults);
return buildInternalAggregation(this.name, this.featuresCount, composedSampling,
evaluatedResults,
pipelineAggregators(), getMetaData());
}

private InternalAggregation buildEmptyInternalAggregation() {
return buildInternalAggregation(this.name, this.featuresCount, null, null,
pipelineAggregators(),
getMetaData());
}

protected abstract A buildInternalAggregation(final String name, final int featuresCount,
final S linRegSampling,
final M results,
@@ -171,12 +190,12 @@ protected abstract M buildResults(S composedSampling, SlopeCoefficients slopeCoe
double intercept);


private M evaluateResults(final S composedSampling) {
// Linear regression evaluation
private M evaluateResults(final S composedSampling) throws EstimationException {
// Linear regression estimation
final DerivationEquation derivationEquation = derivationEquationBuilder
.buildDerivationEquation(composedSampling);
final SlopeCoefficients slopeCoefficients = derivationEquationSolver
.solveCoefficients(derivationEquation);
.estimateCoefficients(derivationEquation);
final M buildResults = buildResults(composedSampling, slopeCoefficients,
interceptCalculator.calculate(slopeCoefficients, composedSampling, composedSampling));
return buildResults;
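The `doReduce` change above returns an empty aggregation in two situations: when the merged sampling holds no more documents than features (`count <= featuresCount`, since C slopes plus the intercept need at least C + 1 documents), and when estimation throws `EstimationException`. The self-contained sketch below reproduces that control flow without Elasticsearch. It is a stand-in built on assumptions, not the plugin's code: `SafeLinRegEstimator` and its nested `EstimationException` are hypothetical, and instead of the plugin's `DerivationEquationBuilder`/`CommonsMathSolver` pipeline with a separate `InterceptCalculator`, it solves the normal equations (X^T X)θ = X^T y with an explicit intercept column using Gauss-Jordan elimination with partial pivoting and a fixed singularity threshold.

```java
import java.util.Optional;

/** Hypothetical sketch: serve an empty result instead of failing when the data cannot determine the model. */
public final class SafeLinRegEstimator {

    /** Stand-in for the plugin's DerivationEquationSolver.EstimationException. */
    static final class EstimationException extends Exception {
        EstimationException(final String message) {
            super(message);
        }
    }

    private SafeLinRegEstimator() {
    }

    /** Returns [theta_0, theta_1, ..., theta_C], or Optional.empty() if no model can be estimated. */
    public static Optional<double[]> estimate(final double[][] features, final double[] response) {
        final int count = features.length;
        final int featuresCount = count == 0 ? 0 : features[0].length;
        // Same guard as the commit: C slopes + 1 intercept need at least C + 1 samples.
        if (count <= featuresCount) {
            return Optional.empty();
        }
        try {
            return Optional.of(solveNormalEquations(features, response));
        } catch (final EstimationException e) {
            return Optional.empty(); // e.g. perfectly collinear features => singular system
        }
    }

    private static double[] solveNormalEquations(final double[][] x, final double[] y)
            throws EstimationException {
        final int p = x[0].length + 1; // +1 for the intercept column of ones
        final double[][] a = new double[p][p + 1]; // augmented matrix [X^T X | X^T y]
        for (int r = 0; r < x.length; r++) {
            final double[] row = new double[p];
            row[0] = 1.0;
            System.arraycopy(x[r], 0, row, 1, p - 1);
            for (int i = 0; i < p; i++) {
                for (int j = 0; j < p; j++) {
                    a[i][j] += row[i] * row[j];
                }
                a[i][p] += row[i] * y[r];
            }
        }
        // Gauss-Jordan elimination with partial pivoting.
        for (int col = 0; col < p; col++) {
            int pivot = col;
            for (int r = col + 1; r < p; r++) {
                if (Math.abs(a[r][col]) > Math.abs(a[pivot][col])) {
                    pivot = r;
                }
            }
            if (Math.abs(a[pivot][col]) < 1e-12) { // assumed absolute threshold
                throw new EstimationException("Singular normal equations");
            }
            final double[] tmp = a[col];
            a[col] = a[pivot];
            a[pivot] = tmp;
            for (int r = 0; r < p; r++) {
                if (r != col) {
                    final double factor = a[r][col] / a[col][col];
                    for (int c = col; c <= p; c++) {
                        a[r][c] -= factor * a[col][c];
                    }
                }
            }
        }
        final double[] theta = new double[p];
        for (int i = 0; i < p; i++) {
            theta[i] = a[i][p] / a[i][i]; // only the diagonal remains after elimination
        }
        return theta;
    }
}
```

Returning `Optional.empty()` plays the role of `buildEmptyInternalAggregation()`: a bucket with too few or perfectly collinear documents yields no model instead of failing the aggregation. A production solver would more likely use a relative tolerance than the fixed `1e-12` assumed here.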
@@ -17,7 +17,7 @@
package org.scaleborn.elasticsearch.linreg.aggregation.support;

import java.io.IOException;
import org.scaleborn.linereg.evaluation.SlopeCoefficientsSampling.SlopeCoefficientsSamplingProxy;
import org.scaleborn.linereg.estimation.SlopeCoefficientsSampling.SlopeCoefficientsSamplingProxy;
import org.scaleborn.linereg.sampling.Sampling.InterceptSampling;
import org.scaleborn.linereg.sampling.io.StateInputStream;
import org.scaleborn.linereg.sampling.io.StateOutputStream;
@@ -17,69 +17,52 @@
package org.scaleborn.elasticsearch.linreg.aggregation.support;

import java.io.IOException;
import java.util.Arrays;
import org.elasticsearch.common.io.stream.StreamInput;
import org.elasticsearch.common.io.stream.StreamOutput;
import org.elasticsearch.common.io.stream.Writeable;
import org.elasticsearch.common.xcontent.ToXContent;
import org.elasticsearch.common.xcontent.XContentBuilder;
import org.scaleborn.linereg.evaluation.SlopeCoefficients;
import org.scaleborn.linereg.evaluation.SlopeCoefficients.DefaultSlopeCoefficients;
import org.scaleborn.linereg.estimation.SlopeCoefficients;

/**
* Created by mbok on 07.04.17.
*/
public class ModelResults implements Writeable, ToXContent {

private SlopeCoefficients slopeCoefficients;

private double intercept;
private final double[] coefficients;

public ModelResults(final SlopeCoefficients slopeCoefficients, final double intercept) {
this.slopeCoefficients = slopeCoefficients;
this.intercept = intercept;
final int slopeLen = slopeCoefficients.getCoefficients().length;
this.coefficients = new double[slopeLen + 1];
System.arraycopy(slopeCoefficients.getCoefficients(), 0, this.coefficients, 1, slopeLen);
this.coefficients[0] = intercept;
}

public ModelResults(final StreamInput in) throws IOException {
this.slopeCoefficients = new DefaultSlopeCoefficients(in.readDoubleArray());
this.intercept = in.readDouble();
this.coefficients = in.readDoubleArray();
}

@Override
public void writeTo(final StreamOutput out) throws IOException {
out.writeDoubleArray(this.slopeCoefficients.getCoefficients());
out.writeDouble(this.intercept);
}

public SlopeCoefficients getSlopeCoefficients() {
return this.slopeCoefficients;
}

public void setSlopeCoefficients(final SlopeCoefficients slopeCoefficients) {
this.slopeCoefficients = slopeCoefficients;
}

public double getIntercept() {
return this.intercept;
out.writeDoubleArray(this.coefficients);
}

public void setIntercept(final double intercept) {
this.intercept = intercept;
public double[] getCoefficients() {
return this.coefficients;
}


@Override
public String toString() {
return "ModelResults{" +
"slopeCoefficients=" + this.slopeCoefficients +
", intercept=" + this.intercept +
"coefficients=" + Arrays.toString(this.coefficients) +
'}';
}

@Override
public XContentBuilder toXContent(final XContentBuilder builder, final Params params)
throws IOException {
builder.array("coefficients", this.getSlopeCoefficients().getCoefficients());
builder.field("intercept", this.getIntercept());
builder.array("coefficients", this.coefficients);
return builder;
}

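The reworked `ModelResults` keeps a single `double[]` and drops the separate `intercept` field, so both the stream representation and the `coefficients` JSON array now carry [θ0, θ1, ..., θC]. The standalone sketch below mirrors the constructor's `System.arraycopy` and shows how a client written against the old response shape could split the merged array back into intercept and slopes. `CoefficientLayout` and its methods are hypothetical, plain Java with no Elasticsearch types. Merging the slopes and intercept removed from the README example yields exactly the new `coefficients` array shown there.

```java
import java.util.Arrays;

/** Hypothetical helper illustrating the merged coefficient layout [intercept, slope_1, ..., slope_C]. */
public final class CoefficientLayout {

    private CoefficientLayout() {
    }

    /** Mirrors the new ModelResults constructor: intercept at index 0, slopes shifted by one. */
    public static double[] merge(final double[] slopes, final double intercept) {
        final double[] coefficients = new double[slopes.length + 1];
        System.arraycopy(slopes, 0, coefficients, 1, slopes.length);
        coefficients[0] = intercept;
        return coefficients;
    }

    /** theta_0, i.e. what the removed "intercept" field used to carry. */
    public static double intercept(final double[] coefficients) {
        return coefficients[0];
    }

    /** theta_1..theta_C, i.e. what the old slope-only "coefficients" array used to carry. */
    public static double[] slopes(final double[] coefficients) {
        return Arrays.copyOfRange(coefficients, 1, coefficients.length);
    }

    public static void main(final String[] args) {
        // Old README response: separate slopes and intercept.
        final double[] oldSlopes = {248.92285661317254, -68297.7720278421, 64406.52205356777};
        final double[] merged = merge(oldSlopes, 227990.63952712028);
        System.out.println(Arrays.toString(merged));
    }
}
```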
@@ -16,7 +16,7 @@

package org.scaleborn.linereg.calculation.intercept;

import org.scaleborn.linereg.evaluation.SlopeCoefficients;
import org.scaleborn.linereg.estimation.SlopeCoefficients;
import org.scaleborn.linereg.sampling.Sampling.InterceptSampling;
import org.scaleborn.linereg.sampling.Sampling.SamplingContext;
