Python module
vchahun edited this page Sep 19, 2012 · 6 revisions
This page documents the Python interface to creg.
Using a virtualenv:
$ git clone https://github.com/redpony/creg.git && cd creg/
$ virtualenv env/
$ . env/bin/activate
$ python setup.py install
$ python -c "import creg; print creg.BIAS"
***BIAS***
- Define your data using a `RealvaluedDataset` (linear regression), a `CategoricalDataset` (logistic regression) or an `OrdinalDataset` (ordinal regression) by passing to the constructor an iterator of (dict feature vector, response value) pairs
- Create one of the available models: `LinearRegression`, `LogisticRegression` or `OrdinalRegression` by passing optional configuration parameters as arguments
- `fit` the model with the training data as an argument
- Retrieve the learned feature weights using the `weights` field of the model
- `predict` or `evaluate` your models with an evaluation dataset
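The first step above asks for an iterator of (feature dict, response) pairs. As a minimal sketch of that shape, the generator below builds such pairs from plain row dicts. It does not call creg, and the `to_instances` helper, the `rows` data and the `'y'` column name are all hypothetical, introduced only for illustration.

```python
def to_instances(rows, response_key):
    """Yield (feature dict, response) pairs from a list of row dicts.

    Hypothetical helper: it only illustrates the pair format described
    above and does not depend on creg.
    """
    for row in rows:
        # Every column except the response becomes a named feature
        features = dict((k, v) for k, v in row.items() if k != response_key)
        yield features, row[response_key]

# Made-up rows, for illustration only
rows = [
    {'x': 0.0, 'x^2': 0.0, 'y': 1.1},
    {'x': 0.5, 'x^2': 0.25, 'y': 2.9},
]
instances = list(to_instances(rows, 'y'))
for features, response in instances:
    print('%r -> %r' % (sorted(features.items()), response))
```

An iterator such as `to_instances(rows, 'y')` has the form that the dataset constructors are described as accepting, so it could presumably be passed to `creg.RealvaluedDataset` in place of a hand-written generator.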
Linear regression example:
import creg
import random
sigma = 0.5
beta = (1, 2, 3)
def synthetic(N):
    """Generate noisy data (polynomial + Gaussian noise)"""
    for i in range(N):
        x = i/float(N)
        y = beta[0] + beta[1] * x + beta[2] * x**2 + random.normalvariate(0, sigma)
        yield {'x': x, 'x^2': x**2}, y
train_data = creg.RealvaluedDataset(synthetic(1000))
print train_data # <Dataset: 1000 instances, 3 features>
model = creg.LinearRegression()
model.fit(train_data)
print model.weights # <Weights: 3 values, 3 non-zero>
est_beta = tuple(model.weights[fn] for fn in (creg.BIAS, 'x', 'x^2'))
print '     Real: y = %.3f + %.3f * x + %.3f * x^2' % beta
print 'Estimated: y = %.3f + %.3f * x + %.3f * x^2' % est_beta
test_data = creg.RealvaluedDataset(synthetic(100))
predictions = model.predict(test_data)
truth = (y for x, y in test_data)
errors = sum(abs(pred-real) for (pred, real) in zip(predictions, truth))
print 'MAE: %.3f' % (errors/float(len(test_data)))
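When reading the MAE printed by the script above, it can help to know the floor set by the noise alone. If the model recovered beta exactly, the residuals would be the Gaussian noise itself, whose expected absolute value is sigma * sqrt(2/pi). The sketch below computes that value and checks it by simulation. It does not use creg, and the function names and the seed are assumptions made for this illustration.

```python
import math
import random

def noise_floor_mae(sigma):
    """Expected |e| for e ~ N(0, sigma^2), i.e. sigma * sqrt(2 / pi)."""
    return sigma * math.sqrt(2.0 / math.pi)

def simulated_mae(sigma, n, seed=0):
    """Monte Carlo estimate of the same quantity (arbitrary fixed seed)."""
    rng = random.Random(seed)
    return sum(abs(rng.normalvariate(0, sigma)) for _ in range(n)) / float(n)

sigma = 0.5  # same noise level as in the example above
print('Analytic floor:  %.3f' % noise_floor_mae(sigma))
print('Simulated floor: %.3f' % simulated_mae(sigma, 100000))
```

With only 100 test instances the measured MAE will fluctuate around this value, so a result slightly below or above it is not surprising. A much larger MAE would suggest that the fit is poor.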
Logistic regression example:
import creg
import random
randn = random.normalvariate
sigma = 1
def synthetic(N):
    """Generate two Gaussian components"""
    for _ in range(N):
        z = random.randint(0, 1)
        center = ((-1, -1) if z == 0 else (1, 1))
        a, b = randn(center[0], sigma), randn(center[1], sigma)
        yield {'a': a, 'b': b}, z
train_data = creg.CategoricalDataset(synthetic(1000))
print train_data # <Dataset: 1000 instances, 3 features>
model = creg.LogisticRegression()
model.fit(train_data)
print model.weights
test_data = creg.CategoricalDataset(synthetic(100))
predictions = model.predict(test_data)
truth = (y for x, y in test_data)
errors = sum(1 if pred != real else 0 for (pred, real) in zip(predictions, truth))
print 'Accuracy: %.3f' % (1-errors/float(len(test_data)))
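For this synthetic problem the best achievable accuracy can be worked out by hand, which gives a reference point for the number printed above. The two classes are equally likely and share the same isotropic variance, so the Bayes-optimal rule predicts class 1 when a + b > 0. Given z = 1, a + b is distributed as N(2, 2 * sigma^2), so the rule is correct with probability Phi(sqrt(2) / sigma), where Phi is the standard normal CDF. The sketch below evaluates this and checks it by simulation. It does not use creg, and the function names and the seed are assumptions made for this illustration.

```python
import math
import random

def bayes_accuracy(sigma):
    """Accuracy of 'predict 1 if a + b > 0' for centers (-1, -1) and (1, 1)."""
    # Phi(sqrt(2) / sigma), using Phi(t) = 0.5 * (1 + erf(t / sqrt(2)))
    return 0.5 * (1.0 + math.erf(1.0 / sigma))

def simulated_accuracy(sigma, n, seed=0):
    """Apply the same rule to freshly sampled points (arbitrary fixed seed)."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(n):
        z = rng.randint(0, 1)
        center = -1 if z == 0 else 1
        a = rng.normalvariate(center, sigma)
        b = rng.normalvariate(center, sigma)
        predicted = 1 if a + b > 0 else 0
        correct += int(predicted == z)
    return correct / float(n)

sigma = 1  # same spread as in the example above
print('Analytic:  %.3f' % bayes_accuracy(sigma))
print('Simulated: %.3f' % simulated_accuracy(sigma, 50000))
```

A well-fitted logistic regression should land near this value on average. On a test set of 100 instances, individual runs can differ from it by a few percentage points.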