Ability to add custom stopwords at classifier initialization (#129)
* Ability to add custom stopwords at classifier initialization
* Downcased custom test stopwords
* Documented and improved custom stopwords handling
* Added test cases for custom stopwords and empty trainings, #125 and #130
* Added documentation for auto-categorization and custom stopwords
When you ask a Bayesian classifier to classify text against a set of trained categories, it does so by generating a score (a Float) for each possible category.
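To make the per-category scoring concrete, here is a minimal, self-contained sketch of a naive Bayes scorer. It is not classifier-reborn's implementation: the `ToyBayes` class, its `train`, `scores` and `classify` methods, the add-one smoothing, and the equal-prior simplification are all assumptions chosen for illustration.

```ruby
# Toy naive Bayes scorer (illustrative only; not classifier-reborn's code).
class ToyBayes
  def initialize(*categories)
    # category => { word => count }
    @counts = categories.to_h { |c| [c, Hash.new(0)] }
  end

  # Adds the words of `text` to the word counts of `category`.
  def train(category, text)
    tokenize(text).each { |w| @counts.fetch(category)[w] += 1 }
  end

  # Returns { category => Float }: a sum of log-likelihoods with add-one
  # smoothing. Category priors are ignored (assumed equal) to keep it short.
  def scores(text)
    vocab = [@counts.values.flat_map(&:keys).uniq.size, 1].max
    words = tokenize(text)
    @counts.to_h do |category, word_counts|
      total = word_counts.values.sum
      score = words.sum(0.0) do |w|
        Math.log((word_counts[w] + 1.0) / (total + vocab))
      end
      [category, score]
    end
  end

  # The category with the highest score wins.
  def classify(text)
    scores(text).max_by { |_, s| s }.first
  end

  private

  def tokenize(text)
    text.downcase.scan(/[a-z']+/)
  end
end

toy = ToyBayes.new('Interesting', 'Uninteresting')
toy.train('Interesting', 'here are some good words')
toy.train('Uninteresting', 'here are some bad words')
toy_scores = toy.scores('good words')
```

Each score is a negative Float (a log-probability), so the category whose score is closest to zero is the best match.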
lib/classifier-reborn/bayes.rb: 24 additions & 1 deletion
@@ -2,6 +2,8 @@
 # Copyright:: Copyright (c) 2005 Lucas Carlson
 # License::   LGPL
 
+require 'set'
+
 require_relative 'category_namer'
 require_relative 'backends/bayes_memory_backend'
 require_relative 'backends/bayes_redis_backend'
@@ -16,10 +18,11 @@ class Bayes
     #
     # Options available are:
     #   language:         'en'   Used to select language specific stop words
-    #   auto_categorize:  false  When true, enables ability to dynamically declare a category
+    #   auto_categorize:  false  When true, enables ability to dynamically declare a category; the default is true if no initial categories are provided
     #   enable_threshold: false  When true, enables a threshold requirement for classifition
     #   threshold:        0.0    Default threshold, only used when enabled
     #   enable_stemmer:   true   When false, disables word stemming
+    #   stopwords:        nil    Accepts path to a text file or an array of words, when supplied, overwrites the default stopwords; assign empty string or array to disable stopwords
     #   backend:          BayesMemoryBackend.new Alternatively, BayesRedisBackend.new for persistent storage
     def initialize(*args)
       initial_categories = []
@@ -51,6 +54,10 @@ def initialize(*args)
       initial_categories.each do |c|
         add_category(c)
       end
+
+      if options.key?(:stopwords)
+        custom_stopwords options[:stopwords]
+      end
     end
 
     # Provides a general training method for all categories specified in Bayes#new
@@ -236,5 +243,21 @@ def add_category(category)
     end
 
     alias_method :append_category, :add_category
+
+    private
+
+    # Overwrites the default stopwords for current language with supplied list of stopwords or file
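
The `stopwords` option documented in the diff accepts a path to a text file or an array of words, and an empty string or array disables stopwords. The sketch below shows one way such a value could be normalized into a `Set`. It is not the commit's code: `normalize_stopwords` and `remove_stopwords` are hypothetical helpers, and the downcasing and the `ArgumentError` for a missing file are assumptions.

```ruby
require 'set'

# Hypothetical helper: turns a `stopwords` option value into a Set of words.
# Accepts an Array of words or a path to a text file of whitespace-separated
# words. An empty String or Array yields an empty Set, which effectively
# disables stopword filtering.
def normalize_stopwords(source)
  words =
    case source
    when Array then source
    when String
      if source.empty?
        []
      elsif File.file?(source)
        File.read(source).split
      else
        # Assumption: fail loudly rather than silently keep the defaults.
        raise ArgumentError, "stopwords file not found: #{source}"
      end
    else
      raise ArgumentError, 'stopwords must be an Array or a file path'
    end
  # Assumption: stopwords are compared case-insensitively.
  Set.new(words.map { |w| w.to_s.downcase })
end

# Hypothetical filter showing where the Set would be consulted.
def remove_stopwords(tokens, stopwords)
  tokens.reject { |t| stopwords.include?(t.downcase) }
end

custom = normalize_stopwords(%w[The An])
filtered = remove_stopwords(%w[the quick fox], custom)
disabled = normalize_stopwords('')
```

A `Set` gives constant-time membership checks, which is consistent with the `require 'set'` added at the top of the file.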