Bayes integration test of Memory and Redis backends with real data #92

Merged 5 commits on Jan 6, 2017
1 change: 1 addition & 0 deletions .gitignore
@@ -1,2 +1,3 @@
 Gemfile.lock
 pkg
+*.rdb
2 changes: 1 addition & 1 deletion test/bayes/bayesian_common_tests.rb
@@ -2,7 +2,7 @@

 module BayesianCommonTests
   def test_good_training
-    assert_equal ['love'], @classifier.train_interesting('love')
+    assert_equal ['love'], @classifier.train_interesting('love')
   end

   def test_training_with_utf8
54 changes: 54 additions & 0 deletions test/bayes/bayesian_integration_test.rb
@@ -0,0 +1,54 @@
+# encoding: utf-8
+
+require File.dirname(__FILE__) + '/../test_helper'
+
+class BayesianIntegrationTest < Minitest::Test
+  def setup
+    begin
+      @memory_classifier = ClassifierReborn::Bayes.new 'Ham', 'Spam'
+      @redis_backend = ClassifierReborn::BayesRedisBackend.new
+      @redis_classifier = ClassifierReborn::Bayes.new 'Ham', 'Spam', backend: @redis_backend
+    rescue Redis::CannotConnectError => e
+      skip(e)
+    end
+    sms_spam_collection = File.expand_path(File.dirname(__FILE__) + '/../data/corpus/SMSSpamCollection.tsv')
+    @training_set = File.read(sms_spam_collection).force_encoding("utf-8").split("\n")
+    @testing_set = @training_set.pop(1000)
+  end
+
+  def teardown
+    @redis_backend.instance_variable_get(:@redis).flushdb
+  end
+
+  def test_equality_of_backends
+    train_model @memory_classifier
+    train_model @redis_classifier
+    assert_equal classification_scores(@memory_classifier).hash, classification_scores(@redis_classifier).hash
+    untrain_model @memory_classifier, 2000
+    untrain_model @redis_classifier, 2000
+    assert_equal classification_scores(@memory_classifier).hash, classification_scores(@redis_classifier).hash
+  end
+
+  def train_model(classifier)
+    @training_set.each do |line|
+      parts = line.strip.split("\t")
+      classifier.train(parts.first, parts.last)
+    end
+  end
+
+  def untrain_model(classifier, limit=Float::INFINITY)
+    @training_set.each_with_index do |line, i|
+      break if i >= limit
+      parts = line.strip.split("\t")
+      classifier.untrain(parts.first, parts.last)
+    end
+  end
+
+  def classification_scores(classifier)
+    @testing_set.collect do |line|
+      parts = line.strip.split("\t")
+      result, score = classifier.classify_with_score(parts.last)
+      "#{result}:#{score}"
+    end
+  end
+end
13 changes: 8 additions & 5 deletions test/bayes/bayesian_redis_test.rb
@@ -8,25 +8,28 @@ class BayesianRedisTest < Minitest::Test

   def setup
     begin
-      @classifier = ClassifierReborn::Bayes.new 'Interesting', 'Uninteresting', backend: ClassifierReborn::BayesRedisBackend.new
+      @redis_backend = ClassifierReborn::BayesRedisBackend.new
+      @alternate_redis_backend = ClassifierReborn::BayesRedisBackend.new(db: 1)
+      @classifier = ClassifierReborn::Bayes.new 'Interesting', 'Uninteresting', backend: @redis_backend
     rescue Redis::CannotConnectError => e
       skip(e)
     end
   end

   def teardown
-    @classifier.instance_variable_get(:@backend).instance_variable_get(:@redis).flushall
+    @redis_backend.instance_variable_get(:@redis).flushdb
+    @alternate_redis_backend.instance_variable_get(:@redis).flushdb
   end

   def another_classifier
-    ClassifierReborn::Bayes.new %w(Interesting Uninteresting), backend: ClassifierReborn::BayesRedisBackend.new(db: 1)
+    ClassifierReborn::Bayes.new %w(Interesting Uninteresting), backend: @alternate_redis_backend
   end

   def auto_categorize_classifier
-    ClassifierReborn::Bayes.new 'Interesting', 'Uninteresting', auto_categorize: true, backend: ClassifierReborn::BayesRedisBackend.new(db: 1)
+    ClassifierReborn::Bayes.new 'Interesting', 'Uninteresting', auto_categorize: true, backend: @alternate_redis_backend
   end

   def threshold_classifier(category)
-    ClassifierReborn::Bayes.new category, backend: ClassifierReborn::BayesRedisBackend.new(db: 1)
+    ClassifierReborn::Bayes.new category, backend: @alternate_redis_backend
   end
 end
5 changes: 5 additions & 0 deletions test/data/corpus/README.md
@@ -0,0 +1,5 @@
+# Sample datasets for training and testing
+
+## SMSSpamCollection.tsv
+
+The [SMS Spam Collection v.1](http://dcomp.sor.ufscar.br/talmeida/smsspamcollection/) is a public set of labeled SMS messages collected for mobile phone spam research. It consists of a single collection of 5,574 real, non-encoded English messages, each tagged as either `ham` (legitimate) or `spam`.
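
For readers who want to try the corpus layout outside the test suite, here is a minimal sketch of how rows in this format could be parsed and split into training and held-out sets, loosely mirroring what the integration test's `setup` does with `pop(1000)`. It assumes each line is `label<TAB>message`. The sample rows and the helper names `parse_corpus` and `split_corpus` are hypothetical and not part of this PR.

```ruby
# Hypothetical sample rows in the assumed "label<TAB>message" layout.
SAMPLE = [
  "ham\tSee you at lunch?",
  "spam\tWIN a free prize now",
  "ham\tCall me later",
  "spam\tClaim your reward today"
].join("\n")

# Split raw TSV text into [label, message] pairs.
# The limit of 2 keeps any further tabs inside the message text.
def parse_corpus(text)
  text.split("\n").map do |line|
    label, message = line.strip.split("\t", 2)
    [label, message]
  end
end

# Hold out the last `count` rows for testing and keep the rest for training.
# Works on a copy so the caller's array is left untouched.
def split_corpus(rows, count)
  training = rows.dup
  testing = training.pop(count)
  [training, testing]
end

rows = parse_corpus(SAMPLE)
training, testing = split_corpus(rows, 1)
```

Each `[label, message]` pair from `training` could then be passed to a classifier's `train` call, and each message from `testing` to `classify_with_score`, as the integration test above does.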