Consumer Complaint Classification Using Python

The task of consumer complaint classification falls under Natural Language Processing (NLP) and Multiclass Classification. To approach this problem, we need a dataset that contains real consumer complaints. We found a suitable dataset that includes:

The type of complaint reported by the consumer
The specific issue highlighted in the complaint
The complete description of the consumer’s grievance

This dataset can be leveraged to build a Machine Learning model capable of classifying consumer complaints in real time. In the next section, we’ll walk through how to classify consumer complaints using Machine Learning with Python.

Table of Contents

Consumer Complaint Classification using Python

Let’s begin the consumer complaint classification task by importing the required Python libraries and loading the dataset:

from google.colab import drive
drive.mount('/content/drive')

import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.linear_model import SGDClassifier
import nltk
import re
from nltk.corpus import stopwords
import string

data=pd.read_csv("/content/drive/MyDrive/consumercomplaints.csv")
print(data.head())

The dataset includes an unnecessary ‘Unnamed’ column, which we’ll remove before moving forward

data=data.drop("Unnamed: 0", axis=1)
print(data.head())

Next, let’s check the dataset to see if it contains any null values:

print(data.isnull().sum())

The dataset has a significant number of null values, so we’ll drop all rows containing null entries before proceeding:

data=data.dropna()
print(data.isnull().sum())

The product column in the dataset holds the labels, which represent the type of complaints reported by consumers. Let’s examine all the labels along with their frequencies:

print(data["Product"].value_counts())

Training the Consumer Complaint Classification Model

The consumer complaint narrative column contains the full text descriptions of complaints submitted by consumers. Before feeding this data into a Machine Learning model, we’ll clean and preprocess the text to ensure it’s ready for training.

nltk.download('stopwords')
stemmer=nltk.SnowballStemmer("english")
stopword=set(stopwords.words("english"))
def clean(text):
  text=str(text).lower()
  text=re.sub('\[.*?\]', '', text)
  text=re.sub('https?://\S+|www\.\S+', '', text)
  text=re.sub('<.*?>+', '', text)
  text=re.sub('[%s]' % re.escape(string.punctuation), '', text)
  text=re.sub('\n', '', text)
  text=re.sub('\w*\d\w*', '', text)
  text=[word for word in text.split(' ') if word not in stopword]
  text=" ".join(text)
  text=[stemmer.stem(word) for word in text.split(' ')]
  text=" ".join(text)
  return text
data["Consumer complaint narrative"] = data["Consumer complaint narrative"].apply(clean)

Next, we’ll split the dataset into training and testing sets:

data = data[["Consumer complaint narrative", "Product"]]
x = np.array(data["Consumer complaint narrative"])
y = np.array(data["Product"])

cv = CountVectorizer()
X = cv.fit_transform(x)
X_train, X_test, y_train, y_test = train_test_split(X, y, 
                                                    test_size=0.33, 
                                                    random_state=42)

Now, we’ll train our Machine Learning model using the Stochastic Gradient Descent (SGD) classifier:

sgdmodel = SGDClassifier()
sgdmodel.fit(X_train,y_train)

Next, we’ll use our trained model to generate predictions:

user = input("Enter a Text: ")
data = cv.transform([user]).toarray()
output = sgdmodel.predict(data)
print(output)

Enter a Text: On 20/08/2025, I called BOA Credit Card Customer Service at 10123657894. I clearly stated that I did not want to continue paying the ${120.00} annual membership fee and wanted to cancel my credit card account.  The customer service representative persuaded me to keep the account open by offering a promotion: if I paid the ${120.00} membership fee and spent ${1500.00} within 3 months, I would receive [77,000 reward points / cash back / miles]. Trusting this offer, I paid the membership fee on 17/08/2025 and met the spending requirement well within the 3-month period.  On 27/08/2025, I contacted customer service again regarding the promised reward. However, I was informed that I would only receive [much fewer points / different rewards] than what was promised. This directly contradicts what I was told when I agreed to keep the account.  I believe this is a clear case of misrepresentation and deceptive business practice. The company failed to honor its commitment, and I feel cheated by the misleading information provided by their customer service.

user = input("Enter a Text: ")
data = cv.transform([user]).toarray()
output = sgdmodel.predict(data)
print(output)

Enter a Text: It has been well over 30 days since I filed my dispute, yet no corrections have been made to my credit report. There are still false, misleading, and inaccurate items listed, even though I submitted clear documentation to prove the errors. This is a direct violation of my rights under the law, and the evidence I provided is undeniable. I expect immediate action and compliance with federal regulations.

Conclusion

By following the steps outlined here, from data cleaning, preprocessing, to model training and evaluation, you can build a robust system to classify consumer complaints effectively. Such models not only streamline the handling of large volumes of feedback but also help organizations identify key issues faster, ultimately improving customer service and operational efficiency.

The same NLP pipeline is highly adaptable and can be applied to many other real-world problems. For example, similar techniques are used in detecting YouTube spam comments, where models are trained to separate genuine conversations from unwanted or malicious spam. This highlights the versatility of text classification methods and their potential to enhance user experience across industries when combined with continuous monitoring and regular model updates.

Consumer Complaint Classification using Python

Training the Consumer Complaint Classification Model

Conclusion

Leave a Comment Cancel reply