From the introductory blog post we know that the Naive Bayes Classifier is based on the bag-of-words model.
With the bag-of-words model we check which words of the text document appear in a positive-words list or a negative-words list. If a word appears in the positive-words list, the total score of the text is increased by 1; if it appears in the negative-words list, the score is decreased by 1. If the total score is positive at the end, the text is classified as positive, and if it is negative, the text is classified as negative. Simple enough!
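As a quick illustration, a minimal sketch of this word-list scoring could look as follows; the word lists and the example sentence are made up purely for demonstration:

positive_words = {'good', 'great', 'excellent', 'love'}
negative_words = {'bad', 'poor', 'terrible', 'hate'}

def word_list_classify(document_words):
    # add 1 for every positive word, subtract 1 for every negative word
    score = 0
    for word in document_words:
        if word in positive_words:
            score += 1
        elif word in negative_words:
            score -= 1
    return 'positive' if score > 0 else 'negative'

print(word_list_classify(['this', 'movie', 'was', 'great']))  # prints: positive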
With the Naive Bayes model, we do not take only a small set of positive and negative words into account, but all of the words the NB Classifier was trained with, i.e. all of the words present in the training set. If a word has not appeared in the training set, we have no data available and apply Laplacian smoothing (use 1 instead of the conditional probability of the word). The probability that a document belongs to a class \(C\) is given by the class probability \(P(C)\) multiplied by the product of the conditional probabilities of each word for that class.
\[P = P(C) \cdot \prod_{i=1}^{n} P(d_i|C) = P(C) \cdot \prod_{i=1}^{n} \frac{count(d_i, C)}{\sum_{j} count(d_j, C)} = P(C) \cdot \prod_{i=1}^{n} \frac{count(d_i, C)}{V_C}\]
Here \(count(d_i, C)\) is the number of occurrences of word \(d_i\) in class \(C\), \(V_C\) is the total number of words in class \(C\) and \(n\) is the number of words in the document we are currently classifying. \(V_C\) does not change (unless the training set is expanded), so it can be placed outside of the product:
\[P = \frac{P(C)}{V_C^n} \cdot \prod_{i=1}^{n} count(d_i, C)\]
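To make the formula concrete, here is a small sketch that evaluates it for a toy document; the word counts and class probabilities below are made-up numbers, purely for illustration:

# made-up word counts per class and made-up class probabilities
counts = {
    'pos': {'good': 30, 'movie': 50, 'boring': 5},
    'neg': {'good': 10, 'movie': 45, 'boring': 40},
}
class_probabilities = {'pos': 0.5, 'neg': 0.5}
document = ['good', 'movie']

for label in counts:
    V_C = sum(counts[label].values())   # total number of words in class C
    P = class_probabilities[label]      # start with the class probability P(C)
    for word in document:
        # count(d_i, C) / V_C; an unseen word would contribute a factor of 1
        # (the Laplacian smoothing mentioned above)
        P *= counts[label].get(word, V_C) / float(V_C)
    print(label, P)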
Implementing Naive Bayes Text Classification
With this information it is easy to implement a Naive Bayes Text Classifier. (Naive Bayes can also be used to classify non-text / numerical datasets; for an explanation see this notebook.)
We have a NaiveBayesText class, which accepts the input values for X and Y as parameters of its train() method. Here X is a list of lists, where each lower-level list contains all the words in one document, and Y is a list containing the label/class of each document.
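For example, with two made-up documents the input could look like this:

# two short example documents, already split into words, with their labels
X = [['this', 'movie', 'was', 'great'],
     ['what', 'a', 'terrible', 'film']]
Y = ['positive', 'negative']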
from collections import Counter, defaultdict
import numpy as np

class NaiveBaseClass:
    def calculate_relative_occurences(self, list1):
        # count how often each element occurs and divide by the total number of elements
        no_examples = len(list1)
        ro_dict = dict(Counter(list1))
        for key in ro_dict.keys():
            ro_dict[key] = ro_dict[key] / float(no_examples)
        return ro_dict

    def get_max_value_key(self, d1):
        # return the key belonging to the highest value in the dictionary
        values = list(d1.values())
        keys = list(d1.keys())
        max_value_index = values.index(max(values))
        max_key = keys[max_value_index]
        return max_key

    def initialize_nb_dict(self):
        self.nb_dict = {}
        for label in self.labels:
            self.nb_dict[label] = defaultdict(list)
class NaiveBayesText(NaiveBaseClass):
    """
    When the goal is classifying text, it is better to give the input X in the
    form of a list of lists containing words:
    X = [ ['this', 'is', 'a', ...], (...) ]
    Y still is a 1D array / list containing the labels of each entry.
    """
    def initialize_nb_dict(self):
        self.nb_dict = {}
        for label in self.labels:
            self.nb_dict[label] = []

    def train(self, X, Y):
        self.class_probabilities = self.calculate_relative_occurences(Y)
        self.labels = np.unique(Y)
        self.no_examples = len(Y)
        self.initialize_nb_dict()
        # step 1: collect all words occurring in the documents of each class
        for ii in range(0, len(Y)):
            label = Y[ii]
            self.nb_dict[label] += X[ii]
        # step 2: transform the list with all occurrences into a dict with relative occurrences
        for label in self.labels:
            self.nb_dict[label] = self.calculate_relative_occurences(self.nb_dict[label])
As we can see, the training of the Naive Bayes Classifier is done by iterating through all of the documents in the training set. From these documents, a hash table (a dictionary in Python) with the relative occurrence of each word per class is constructed.
This is done in two steps: 1. construct a huge list of all occurring words per class:
for ii in range(0, len(Y)):
    label = Y[ii]
    self.nb_dict[label] += X[ii]
2. calculate the relative occurrence of each word in this huge list, with the calculate_relative_occurences() method. This method simply uses Python's Counter class (from the collections module) to count how often each word occurs and then divides this count by the total number of words. The result is saved in the dictionary nb_dict.
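As a small illustration (with made-up words) of what calculate_relative_occurences produces:

from collections import Counter

words = ['good', 'movie', 'good', 'plot']
counts = Counter(words)                                  # {'good': 2, 'movie': 1, 'plot': 1}
relative = {w: c / float(len(words)) for w, c in counts.items()}
print(relative)                                          # {'good': 0.5, 'movie': 0.25, 'plot': 0.25}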
As we can see, it is easy to train the Naive Bayes Classifier: we simply calculate the relative occurrence of each word per class and save the result in the nb_dict dictionary.
This dictionary can be updated, saved to file and loaded back from file; it contains the full result of training the Naive Bayes Classifier.
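A minimal way to do this could be with Python's pickle module; the filename and the variable name classifier (a trained NaiveBayesText instance) are just placeholders for this sketch:

import pickle

# save the trained dictionaries to file
# (assumes 'classifier' is a trained NaiveBayesText instance)
with open('nb_model.pkl', 'wb') as f:
    pickle.dump({'nb_dict': classifier.nb_dict,
                 'class_probabilities': classifier.class_probabilities,
                 'labels': classifier.labels}, f)

# load them back later
with open('nb_model.pkl', 'rb') as f:
    model = pickle.load(f)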
Classifying new documents is also done quite easily by calculating the class probability for each class and then selecting the class with the highest probability.
    def classify_single_elem(self, X_elem):
        Y_dict = {}
        for label in self.labels:
            # start with the class probability P(C)
            class_probability = self.class_probabilities[label]
            nb_dict_features = self.nb_dict[label]
            for word in X_elem:
                if word in nb_dict_features:
                    # multiply with the relative occurrence of the word in this class
                    relative_word_occurence = nb_dict_features[word]
                    class_probability *= relative_word_occurence
                else:
                    # the word does not occur in this class in the training set:
                    # use 1 instead of the conditional probability (the Laplacian
                    # smoothing described above), so the word has no influence
                    class_probability *= 1
            Y_dict[label] = class_probability
        return self.get_max_value_key(Y_dict)

    def classify(self, X):
        self.predicted_Y_values = []
        n = len(X)
        for ii in range(0, n):
            X_elem = X[ii]
            prediction = self.classify_single_elem(X_elem)
            self.predicted_Y_values.append(prediction)
        return self.predicted_Y_values
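Putting it all together, training and classifying could, for example, look like this; the documents and labels are toy data made up for this sketch:

# toy training data: tokenized documents and their labels
X_train = [['good', 'good', 'movie'],
           ['good', 'movie', 'bad'],
           ['bad', 'bad', 'movie'],
           ['bad', 'movie', 'good']]
Y_train = ['positive', 'positive', 'negative', 'negative']

nb = NaiveBayesText()
nb.train(X_train, Y_train)

X_test = [['good', 'movie'], ['bad', 'movie']]
# should predict 'positive' for the first document and 'negative' for the second
print(nb.classify(X_test))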
Next blog:
In the next blog we will look at the results of this naively implemented algorithm for the Naive Bayes Classifier and see how it performs under various conditions; we will see the influence of varying training set sizes and whether the use of n-gram features will improve the accuracy of the classifier.