Sentiment Analysis with the Naive Bayes Classifier

From the introductionary blog we know that the Naive Bayes Classifier is based on the bag-of-words model.

With the bag-of-words model we check which word of the text-document appears in a positive-words-list or a negative-words-list. If the word appears in a positive-words-list the total score of the text is updated with +1 and vice versa. If at the end the total score is positive, the text is classified as positive and if it is negative, the text is classified as negative. Simple enough!

With the Naive Bayes model, we do not take only a small set of positive and negative words into account, but all words the NB Classifier was trained with, i.e. all words presents in the training set. If a word has not appeared in the training set, we have no data available and apply Laplacian smoothing (use 1 instead of the conditional probability of the word).
The probability a document belongs to a class $C$ is given by the class probability $P(C)$ multiplied by the products of the conditional probabilities of each word for that class.

$P = P(C) \cdot \prod_i P(d_i|C) = P(C) \cdot \prod_i^n \frac{count(d_i, C)}{\sum_i count(d_i, C)} = P(C) \cdot \prod_i \frac{count(d_i, C)}{V_C}$

Here $count(d_i,C)$ is the number of occurences of word $d_i$ in class $C$ , $V_C$ is the total number of words in class $C$ and $n$ is the number of words in the document we are currently classifying.
$V_C$ does not change (unless the training set is expanded), so it can be placed outside of the product:

$P = \frac{P(C)}{V_C^n} \cdot \prod_i^n count(d_i, C)$

Implementing Naive Bayes Text Classification

With this information it is easy to implement a Naive Bayes Text Classifier. (Naive Bayes can also be used to classify non-text / numerical datasets, for an explanation see this notebook).

We have a NaiveBayesText class, which accepts the input values for X and Y as parameters for the “train()” method. Here X is a list of lists, where each lower level list contains all the words in the document. Y is a list containing the label/class of each document.

class NaiveBaseClass:
    def calculate_relative_occurences(self, list1):
        no_examples = len(list1)
        ro_dict = dict(Counter(list1))
        for key in ro_dict.keys():
            ro_dict[key] = ro_dict[key] / float(no_examples)
        return ro_dict

    def get_max_value_key(self, d1):
        values = d1.values()
        keys = d1.keys()
        max_value_index = values.index(max(values))
        max_key = keys[max_value_index]
        return max_key
        
    def initialize_nb_dict(self):
        self.nb_dict = {}
        for label in self.labels:
            self.nb_dict[label] = defaultdict(list)

class NaiveBayesText(NaiveBaseClass):
    """"
    When the goal is classifying text, it is better to give the input X in the form of a list of lists containing words.
    X = [
    ['this', 'is', 'a',...],
    (...)
    ]
    Y still is a 1D array / list containing the labels of each entry

    def initialize_nb_dict(self):
        self.nb_dict = {}
        for label in self.labels:
            self.nb_dict[label] = []
            
    def train(self, X, Y):
        self.class_probabilities = self.calculate_relative_occurences(Y)
        self.labels = np.unique(Y)
        self.no_examples = len(Y)
        self.initialize_nb_dict()
        for ii in range(0,len(Y)):
            label = Y[ii]
            self.nb_dict[label] += X[ii]
        #transform the list with all occurences to a dict with relative occurences
        for label in self.labels:
            self.nb_dict[label] = self.calculate_relative_occurences(self.nb_dict[label])

class NaiveBaseClass:

def calculate_relative_occurences(self, list1):

no_examples = len(list1)

ro_dict = dict(Counter(list1))

for key in ro_dict.keys():

ro_dict[key] = ro_dict[key] / float(no_examples)

return ro_dict

def get_max_value_key(self, d1):

values = d1.values()

keys = d1.keys()

max_value_index = values.index(max(values))

max_key = keys[max_value_index]

return max_key

def initialize_nb_dict(self):

self.nb_dict = {}

for label in self.labels:

self.nb_dict[label] = defaultdict(list)

class NaiveBayesText(NaiveBaseClass):

""""

When the goal is classifying text, it is better to give the input X in the form of a list of lists containing words.

X = [

['this', 'is', 'a',...],

(...)

]

Y still is a 1D array / list containing the labels of each entry

def initialize_nb_dict(self):

self.nb_dict = {}

for label in self.labels:

self.nb_dict[label] = []

def train(self, X, Y):

self.class_probabilities = self.calculate_relative_occurences(Y)

self.labels = np.unique(Y)

self.no_examples = len(Y)

self.initialize_nb_dict()

for ii in range(0,len(Y)):

label = Y[ii]

self.nb_dict[label] += X[ii]

#transform the list with all occurences to a dict with relative occurences

for label in self.labels:

self.nb_dict[label] = self.calculate_relative_occurences(self.nb_dict[label])

As we can see, the training of the Naive Bayes Classifier is done by iterating through all of the documents in the training set. From all of the documents, a Hash table (dictionary in python language) with the relative occurence of each word per class is constructed.

This is done in two steps:

1. construct a huge list of all occuring words per class:

        for ii in range(0,len(Y)):
            label = Y[ii]
            self.nb_dict[label] += X[ii]

for ii in range(0,len(Y)):

label = Y[ii]

self.nb_dict[label] += X[ii]

2. calculate the relative occurence of each word in this huge list, with the “calculate_relative_occurences” method. This method simply uses Python’s Counter module to count how much each word occurs and then divides this number with the total number of words.
The result is saved in the dictionary nb_dict.

As we can see, it is easy to train the Naive Bayes Classifier. We simply calculate the relative occurence of each word per class, and save the result in the “nb_dict” dictionary.

This dictionary can be updated, saved to file, and loaded back from file. It contains the results of Naive Bayes Classifier training.

Classifying new documents is also done quite easily by calculating the class probability for each class and then selecting the class with the highest probability.

    def classify_single_elem(self, X_elem):
        Y_dict = {}
        for label in self.labels:
            class_probability = self.class_probabilities[label]
            nb_dict_features = self.nb_dict[label]
            for word in X_elem:
                if word in nb_dict_features.keys():
                    relative_word_occurence = nb_dict_features[word]
                    class_probability *= relative_word_occurence
                else:
                    class_probability *= 0
            Y_dict[label] = class_probability
        return self.get_max_value_key(Y_dict)

    def classify(self, X):
        self.predicted_Y_values = []
        n = len(X)
        for ii in range(0,n):
            X_elem = X[ii]
            prediction = self.classify_single_elem(X_elem)
            self.predicted_Y_values.append(prediction)  
        return self.predicted_Y_values

def classify_single_elem(self, X_elem):

Y_dict = {}

for label in self.labels:

class_probability = self.class_probabilities[label]

nb_dict_features = self.nb_dict[label]

for word in X_elem:

if word in nb_dict_features.keys():

relative_word_occurence = nb_dict_features[word]

class_probability *= relative_word_occurence

else:

class_probability *= 0

Y_dict[label] = class_probability

return self.get_max_value_key(Y_dict)

def classify(self, X):

self.predicted_Y_values = []

n = len(X)

for ii in range(0,n):

X_elem = X[ii]

prediction = self.classify_single_elem(X_elem)

self.predicted_Y_values.append(prediction)

return self.predicted_Y_values

Next blog:

In the next blog we will look at the results of this naively implemented algorithm for the Naive Bayes Classifier and see how it performs under various conditions; we will see the influence of varying training set sizes and whether the use of n-gram features will improve the accuracy of the classifier.

13 gedachten over “Sentiment Analysis with the Naive Bayes Classifier”

Pingback: Regression, Logistic Regression and Maximum Entropy – Ahmet Taspinar
Pingback: Regression, Logistic Regression and Maximum Entropy part 2 (code + examples) – Ahmet Taspinar
Pingback: Logistic Regression and Maximum Entropy explained with examples and code - CloudJocks
counterstrike schreef:

mei 30, 2016 om 4:57 pm

Great looking site. Think you did a great deal of your ownyour very own coding

Beantwoorden
Chupacabra schreef:

juni 23, 2016 om 10:57 am

Could you please teach me how to do the prediciton for a new article instead of doing evalution by using the test data with corresponding categoires?

Beantwoorden
Williamdida schreef:

juni 24, 2016 om 7:27 am

Really Cool. Bednarczyk

Beantwoorden
Pingback: Regression, Logistic Regression and Maximum Entropy – WebProfIT Consulting
Pingback: The Perceptron Algorithm explained with Python code – Cloud Data Architect
Pingback: The Perceptron – Ahmet Taspinar
Jason schreef:

februari 14, 2017 om 6:20 pm

Thank you for the posting. But, in the part of your code, I don’t understand one part.

When we try to do the test, if the word is not in our bag of words (training set), you mentioned we will use the Laplacian smoothing (use 1 instead of the conditional probability of the word). But in the code,
else:
class_probability *= 0
it multiply by 0 not 1. Why you do this?

Beantwoorden
1. admin schreef:
  
  februari 24, 2017 om 10:44 am
  
  Hi Jason,
  In earlier parts of the code I was using laplacian smoothing indeed, but changed it later on.
  Let me change it back when I have some time.
  Thanks for the heads up.
  
  Beantwoorden
Pingback: Classification with Scikit-Learn – Ahmet Taspinar
Pingback: DataSciencePython – Pythonを使用した一般的なデータ解析と機械学習のタスク – GitHubじゃ！Pythonじゃ！

ML Fundamentals

Sentiment Analysis with the Naive Bayes Classifier

Implementing Naive Bayes Text Classification

Next blog:

13 gedachten over “Sentiment Analysis with the Naive Bayes Classifier”

Geef een antwoord Reactie annuleren

Implementing Naive Bayes Text Classification

Next blog:

Delen:

13 gedachten over “Sentiment Analysis with the Naive Bayes Classifier”

Geef een antwoord Reactie annuleren