vchahun edited this page Sep 19, 2012 · 6 revisions

This page documents the Python interface to creg.

Installation

Using a virtualenv:

$ git clone https://github.com/redpony/creg.git && cd creg/
$ virtualenv env/
$ . env/bin/activate
$ python setup.py install
$ python -c "import creg; print(creg.BIAS)"
***BIAS***

API

  • Define your data as a RealvaluedDataset (linear regression), a CategoricalDataset (logistic regression) or an OrdinalDataset (ordinal regression) by passing the constructor an iterator of (feature vector, response value) pairs, where each feature vector is a dict mapping feature names to values
  • Create one of the available models (LinearRegression, LogisticRegression or OrdinalRegression), passing any optional configuration parameters as arguments
  • Call fit on the model with the training dataset as its argument
  • Retrieve the learned feature weights from the weights attribute of the model
  • Call predict or evaluate on the model with an evaluation dataset
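
The first step above asks for an iterator of (feature dict, response value) pairs. As a minimal sketch of what such an iterator can look like in plain Python, assume the raw data is a list of dict rows. The helper name to_instances, the column names and the sample rows are hypothetical and only illustrate the expected shape; nothing here calls creg itself.

```python
def to_instances(rows, feature_names, response_name):
    """Yield (feature dict, response) pairs from a sequence of dict rows.

    Hypothetical helper for illustration; it is not part of creg.
    """
    for row in rows:
        # Keep only the requested columns as named, real-valued features
        features = dict((name, float(row[name])) for name in feature_names)
        yield features, row[response_name]

# Made-up rows, for illustration only
rows = [
    {'height': 1.5, 'width': 2.0, 'price': 10.0},
    {'height': 0.5, 'width': 1.0, 'price': 4.0},
]
instances = list(to_instances(rows, ['height', 'width'], 'price'))
```

A generator like this could presumably be handed directly to a dataset constructor such as creg.RealvaluedDataset, in the same way the synthetic generators in the examples on this page are.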

Linear regression example

import creg
import random

sigma = 0.5       # standard deviation of the Gaussian noise
beta = (1, 2, 3)  # true coefficients: intercept, x, x^2

def synthetic(N):
    """ Generate noisy data (polynomial + gaussian noise) """
    for i in range(N):
        x = i/float(N)
        y = beta[0] + beta[1] * x + beta[2] * x**2 + random.normalvariate(0, sigma)
        # Each instance is a (feature dict, response value) pair
        yield {'x': x, 'x^2': x**2}, y

train_data = creg.RealvaluedDataset(synthetic(1000))
print(train_data) # <Dataset: 1000 instances, 3 features>

model = creg.LinearRegression()
model.fit(train_data)
print(model.weights) # <Weights: 3 values, 3 non-zero>

# Weights are indexed by feature name; the intercept is stored under creg.BIAS
est_beta = tuple(model.weights[fn] for fn in (creg.BIAS, 'x', 'x^2'))
print('     Real: y = %.3f + %.3f * x + %.3f * x^2' % beta)
print('Estimated: y = %.3f + %.3f * x + %.3f * x^2' % est_beta)

# Evaluate on fresh data using the mean absolute error (MAE)
test_data = creg.RealvaluedDataset(synthetic(100))
predictions = model.predict(test_data)
truth = (y for x, y in test_data)
errors = sum(abs(pred-real) for (pred, real) in zip(predictions, truth))
print('MAE: %.3f' % (errors/float(len(test_data))))
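
To make the role of the learned weights concrete, here is a sketch of how a linear model's prediction can be recomputed by hand from a mapping of feature names to weights. It uses a plain dict in place of model.weights and hard-codes the bias key as the string printed for creg.BIAS in the installation check. The function name predict_linear and the weight values are illustrative assumptions, and this page does not confirm that creg's own predict computes exactly this.

```python
BIAS = '***BIAS***'  # the string printed for creg.BIAS in the installation check

def predict_linear(weights, features):
    """Return bias + sum of weight * value over the instance's features.

    Hypothetical helper; `weights` is a plain dict standing in for model.weights.
    """
    y = weights.get(BIAS, 0.0)
    for name, value in features.items():
        # Features without a learned weight contribute nothing
        y += weights.get(name, 0.0) * value
    return y

# Illustrative weights equal to the true coefficients of the example above
weights = {BIAS: 1.0, 'x': 2.0, 'x^2': 3.0}
x = 0.5
prediction = predict_linear(weights, {'x': x, 'x^2': x**2})
```

With these weights and x = 0.5 the result is 1 + 2 * 0.5 + 3 * 0.25 = 2.75, which is the noise-free value of the polynomial at that point.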

Logistic regression example

import creg
import random

randn = random.normalvariate

sigma = 1  # standard deviation of each Gaussian component

def synthetic(N):
    """ Generate two gaussian components """
    for _ in range(N):
        # The class label z selects which component the point is drawn from
        z = random.randint(0, 1)
        center = ((-1, -1) if z == 0 else (1, 1))
        a, b = randn(center[0], sigma), randn(center[1], sigma)
        yield {'a': a, 'b': b}, z

train_data = creg.CategoricalDataset(synthetic(1000))
print(train_data) # <Dataset: 1000 instances, 3 features>

model = creg.LogisticRegression()
model.fit(train_data)
print(model.weights)

# Evaluate on fresh data by counting misclassified instances
test_data = creg.CategoricalDataset(synthetic(100))
predictions = model.predict(test_data)
truth = (y for x, y in test_data)
errors = sum(1 if pred != real else 0 for (pred, real) in zip(predictions, truth))
print('Accuracy: %.3f' % (1-errors/float(len(test_data))))
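
Accuracy alone hides which classes get confused with which. As a small sketch of a complementary check, the following counts (true label, predicted label) pairs for two label sequences. The helper names and the label lists are made up for illustration; the code is plain Python and assumes only that predictions and true labels can be iterated in the same order, as in the example above.

```python
from collections import Counter

def confusion_counts(predictions, truth):
    """Count how often each (true label, predicted label) pair occurs."""
    return Counter(zip(truth, predictions))

def accuracy(predictions, truth):
    """Fraction of instances whose predicted label equals the true label."""
    pairs = list(zip(predictions, truth))
    correct = sum(1 for pred, real in pairs if pred == real)
    return correct / float(len(pairs))

# Made-up labels, for illustration only
predictions = [0, 1, 1, 0, 1]
truth = [0, 1, 0, 0, 1]
counts = confusion_counts(predictions, truth)
acc = accuracy(predictions, truth)
```

Here counts[(0, 1)] is the number of instances of class 0 that were predicted as class 1, so off-diagonal entries show the direction of each mistake.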