Sentiment Analysis with the Naive Bayes Classifier

Posted in Machine Learning, Sentiment Analytics

From the introductory blog post we know that the Naive Bayes Classifier is based on the bag-of-words model.

With the bag-of-words model we check whether each word of the text document appears in a positive-words list or a negative-words list. If a word appears in the positive-words list, the total score of the text is increased by 1; if it appears in the negative-words list, the score is decreased by 1. If the total score is positive at the end, the text is classified as positive, and if it is negative, the text is classified as negative. Simple enough!
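The scoring rule above can be sketched in a few lines of Python. The word lists and the function name `bag_of_words_sentiment` are made up for illustration, and returning "neutral" for a score of zero is an assumption of this sketch, since the rule does not say what happens on a tie.

```python
# Tiny, hypothetical word lists; a real list would be far longer.
POSITIVE_WORDS = {"good", "great", "excellent", "love"}
NEGATIVE_WORDS = {"bad", "awful", "terrible", "hate"}


def bag_of_words_sentiment(words):
    """Score a tokenized document: +1 per positive word, -1 per negative word."""
    score = 0
    for word in words:
        if word in POSITIVE_WORDS:
            score += 1
        elif word in NEGATIVE_WORDS:
            score -= 1
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    # Assumption of this sketch: a score of zero is reported as neutral.
    return "neutral"
```

Words that appear in neither list simply leave the score unchanged.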

With the Naive Bayes model, we do not take only a small set of positive and negative words into account, but all words the NB Classifier was trained with, i.e. all words present in the training set. If a word has not appeared in the training set, we have no data available and apply a simplified form of Laplacian smoothing (use 1 instead of the conditional probability of the word).
The probability that a document belongs to a class C is given by the class probability P(C) multiplied by the product of the conditional probabilities of each word for that class.

P = P(C) \cdot \prod_{i=1}^{n} P(d_i|C) = P(C) \cdot \prod_{i=1}^{n} \frac{count(d_i, C)}{\sum_{w} count(w, C)} = P(C) \cdot \prod_{i=1}^{n} \frac{count(d_i, C)}{V_C}


Here count(d_i, C) is the number of occurrences of word d_i in class C, V_C is the total number of words in class C, and n is the number of words in the document we are currently classifying.
V_C is the same for every word in the document (it only changes when the training set is expanded), so it can be placed outside of the product:

P = \frac{P(C)}{V_C^n} \cdot \prod_{i=1}^{n} count(d_i, C)
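As a quick numerical check, the two forms of the formula can be compared on a toy example. The function names and the counts below are invented for this sketch. A word that never occurs in the class has a count of 0 here, which makes the whole product 0.

```python
import math


def class_score_termwise(document, class_counts, class_prior):
    """P(C) times the product of count(d_i, C) / V_C over the document."""
    v_c = sum(class_counts.values())  # total number of words in class C
    score = class_prior
    for word in document:
        score *= class_counts.get(word, 0) / v_c
    return score


def class_score_factored(document, class_counts, class_prior):
    """Same quantity with V_C pulled out: P(C) / V_C^n times the product of counts."""
    v_c = sum(class_counts.values())
    n = len(document)
    product_of_counts = math.prod(class_counts.get(word, 0) for word in document)
    return class_prior / v_c ** n * product_of_counts


# Hypothetical counts for one class: "good" seen 3 times, "movie" once.
toy_counts = {"good": 3, "movie": 1}
toy_document = ["good", "movie"]
termwise = class_score_termwise(toy_document, toy_counts, class_prior=0.5)
factored = class_score_factored(toy_document, toy_counts, class_prior=0.5)
```

Both calls evaluate 0.5 · (3/4) · (1/4), so they agree up to floating-point rounding.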


Implementing Naive Bayes Text Classification

With this information it is easy to implement a Naive Bayes Text Classifier. (Naive Bayes can also be used to classify non-text / numerical datasets; for an explanation see this notebook.)

We have a NaiveBayesText class, whose “train()” method accepts the input values X and Y as parameters. Here X is a list of lists, where each inner list contains all the words of one document. Y is a list containing the label/class of each document.
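To make the expected shape of X and Y concrete, here is a small sketch that builds them from raw strings. The example sentences, the labels "pos" and "neg", and the lowercase-and-split `tokenize` helper are all assumptions for illustration; any tokenizer that yields a list of words per document would do.

```python
# Hypothetical labelled documents: (text, label) pairs.
raw_documents = [
    ("I loved this movie", "pos"),
    ("What a terrible plot", "neg"),
    ("Great acting and a great story", "pos"),
]


def tokenize(text):
    """Very naive tokenizer: lowercase the text and split on whitespace."""
    return text.lower().split()


# X: one list of words per document. Y: one label per document, in the same order.
X = [tokenize(text) for text, _ in raw_documents]
Y = [label for _, label in raw_documents]
```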


The Naive Bayes Classifier is trained by iterating through all of the documents in the training set. From all of the documents, a hash table (a dictionary in Python) with the relative occurrence of each word per class is constructed.

This is done in two steps:
1. construct a huge list of all occurring words per class:

2. calculate the relative occurrence of each word in this huge list, with the “calculate_relative_occurences” method. This method simply uses Python’s Counter class (from the collections module) to count how often each word occurs and then divides this number by the total number of words.
The result is saved in the dictionary nb_dict.
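The two steps can be sketched as follows. This is a minimal stand-alone version written as plain functions rather than as methods of the NaiveBayesText class; only the names `calculate_relative_occurences` and `nb_dict` come from the text, everything else is an assumption of this sketch.

```python
from collections import Counter


def calculate_relative_occurences(words):
    """Step 2: map each word to (its count) / (total number of words)."""
    counts = Counter(words)
    total = len(words)
    return {word: count / total for word, count in counts.items()}


def train(X, Y):
    """Build nb_dict: class label -> {word: relative occurrence in that class}."""
    # Step 1: one long list of all words per class.
    words_per_class = {}
    for document, label in zip(X, Y):
        words_per_class.setdefault(label, []).extend(document)
    # Step 2: turn each list into relative occurrences.
    return {
        label: calculate_relative_occurences(words)
        for label, words in words_per_class.items()
    }


# Hypothetical toy training set.
X_train = [["good", "movie"], ["bad", "movie"], ["good", "good"]]
Y_train = ["pos", "neg", "pos"]
nb_dict = train(X_train, Y_train)
```

For this toy set, the "pos" class contains four words ("good" three times and "movie" once), so "good" gets 0.75 and "movie" gets 0.25.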


Training the Naive Bayes Classifier is easy: we simply calculate the relative occurrence of each word per class, and save the result in the “nb_dict” dictionary.

This dictionary can be updated, saved to file, and loaded back from file. It contains the results of Naive Bayes Classifier training.
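One possible way to save and reload such a dictionary is JSON, sketched below. The helper names `save_nb_dict` and `load_nb_dict` and the file name are hypothetical, and the sketch assumes class labels and words are strings, since JSON object keys must be strings.

```python
import json
import os
import tempfile


def save_nb_dict(nb_dict, path):
    """Write the trained dictionary to a JSON file."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(nb_dict, handle)


def load_nb_dict(path):
    """Read a trained dictionary back from a JSON file."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


# Round trip with a hypothetical trained dictionary.
example_nb_dict = {
    "pos": {"good": 0.75, "movie": 0.25},
    "neg": {"bad": 0.5, "movie": 0.5},
}
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "nb_dict.json")
    save_nb_dict(example_nb_dict, path)
    restored_nb_dict = load_nb_dict(path)
```

Python's pickle module would also work and keeps non-string keys, at the cost of a format that is not human-readable.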


Classifying new documents is also straightforward: calculate the class probability for each class, then select the class with the highest probability.
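A minimal sketch of this classification step is shown below. It departs from a literal product of probabilities in two ways, both of which are assumptions of this sketch: it sums logarithms instead of multiplying, to avoid underflow on long documents, and it uses a small floor value `unseen_prob` for words that were seen in training but not in the class being scored. Words that appear in no class at all are skipped, which is equivalent to multiplying by 1. The function name `classify` and the `class_priors` argument are hypothetical.

```python
import math


def classify(nb_dict, class_priors, document, unseen_prob=1e-6):
    """Return (best_label, log_scores) for a tokenized document."""
    # Every word seen anywhere in the training set.
    vocabulary = set()
    for word_probs in nb_dict.values():
        vocabulary.update(word_probs)

    log_scores = {}
    for label, word_probs in nb_dict.items():
        log_score = math.log(class_priors[label])
        for word in document:
            if word not in vocabulary:
                # Never seen in training: contributes a factor of 1 (log 1 = 0).
                continue
            # Assumption: floor value for words missing from this class only.
            log_score += math.log(word_probs.get(word, unseen_prob))
        log_scores[label] = log_score

    best_label = max(log_scores, key=log_scores.get)
    return best_label, log_scores


# Hypothetical trained dictionary and class priors.
toy_nb_dict = {
    "pos": {"good": 0.5, "movie": 0.5},
    "neg": {"bad": 0.5, "movie": 0.5},
}
toy_priors = {"pos": 0.5, "neg": 0.5}
label_a, _ = classify(toy_nb_dict, toy_priors, ["good", "movie", "xyz"])
label_b, _ = classify(toy_nb_dict, toy_priors, ["bad", "movie"])
```

Because the logarithm is monotonic, the class with the highest log score is also the class with the highest probability.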


Next blog:

In the next blog we will look at the results of this naively implemented algorithm for the Naive Bayes Classifier and see how it performs under various conditions; we will see the influence of varying training set sizes and whether the use of n-gram features will improve the accuracy of the classifier.


11 thoughts on “Sentiment Analysis with the Naive Bayes Classifier”

  1. Thank you for the post. There is one part of your code that I don’t understand.

    When we run the test, if a word is not in our bag of words (training set), you mentioned that we will use Laplacian smoothing (use 1 instead of the conditional probability of the word). But in the code,
    class_probability *= 0
    it multiplies by 0, not 1. Why do you do this?

    1. Hi Jason,
      In earlier versions of the code I was indeed using Laplacian smoothing, but I changed it later on.
      Let me change it back when I have some time.
      Thanks for the heads up.
