Credit Card Clustering Using Python

Credit card clustering, also known as credit card segmentation, is the process of grouping credit card holders based on their buying habits, credit limits, and other financial factors. This type of clustering analysis helps businesses identify potential customers and design effective marketing strategies.

In this article, we will walk through the task of credit card clustering using Machine Learning in Python. For this, we will use a credit card dataset containing the buying history and key financial features of credit card holders, which provides enough information for a meaningful clustering analysis.

Credit Card Clustering Using Python

We’ll begin our credit card clustering analysis by importing the required Python libraries and loading the dataset:

import pandas as pd
import numpy as np

# Load the dataset (adjust the path to wherever the CSV is stored locally)
data = pd.read_csv("/Users/rahul_anand/Downloads/Credit card data/CC GENERAL.csv")
print(data.tail())

Before proceeding, let’s check whether the dataset contains any null values:

data.isnull().sum()

The dataset contains some null values in the minimum payments column. We will drop those rows and proceed with the analysis:

data=data.dropna()
data.isnull().sum()
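
Dropping rows discards data, so it is worth checking how many rows are lost. The snippet below is a minimal sketch on a made-up toy DataFrame (the values and the MINIMUM_PAYMENTS column name are assumptions for illustration, not taken from the real dataset):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the real dataset
toy = pd.DataFrame({
    "BALANCE": [40.9, 3202.5, 2495.1, 1666.7],
    "MINIMUM_PAYMENTS": [139.5, np.nan, 627.3, np.nan],
})

rows_before = len(toy)
toy_clean = toy.dropna()  # drops every row with at least one null value
rows_dropped = rows_before - len(toy_clean)

print(f"Dropped {rows_dropped} of {rows_before} rows")
```

If the share of dropped rows is small, removing them is usually a reasonable shortcut; otherwise, filling the gaps (for example with the column median) may be worth considering.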

For the task of credit card segmentation, three key features from the dataset stand out as highly valuable:

BALANCE: The remaining balance in the accounts of credit card customers.
PURCHASES: The total amount of purchases made by customers.
CREDIT_LIMIT: The maximum limit available on the credit card.

These features capture essential insights into customers’ buying behavior, account balances, and credit capacity, making them sufficient for creating meaningful clusters.

Let’s move forward by using these features to build the clusters:

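from sklearn.preprocessing import MinMaxScaler
from sklearn.cluster import KMeans

# Select the three features used for clustering
clustering_data = data[["BALANCE", "PURCHASES", "CREDIT_LIMIT"]]

# Scale each feature to the [0, 1] range so that no single feature dominates the distances
scaled_data = MinMaxScaler().fit_transform(clustering_data)

# Group the customers into 5 clusters (random_state is an arbitrary seed for reproducibility)
kmeans = KMeans(n_clusters=5, random_state=42)
clusters = kmeans.fit_predict(scaled_data)
data["CREDIT_CARD_SEGMENTS"] = clusters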
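
K-Means groups points by Euclidean distance, so a feature measured in thousands (such as CREDIT_LIMIT) can outweigh one measured in hundreds. As a minimal sketch of what MinMaxScaler does, the snippet below rescales a few made-up rows (the numbers are assumptions for illustration only):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up values for (BALANCE, PURCHASES, CREDIT_LIMIT)
toy_features = np.array([
    [100.0, 50.0, 1000.0],
    [2500.0, 300.0, 7000.0],
    [900.0, 1200.0, 12000.0],
])

# fit_transform learns each column's min and max, then maps the column to [0, 1]
scaled = MinMaxScaler().fit_transform(toy_features)

print(scaled.min(axis=0))  # smallest scaled value in each column
print(scaled.max(axis=0))  # largest scaled value in each column
```

After scaling, every column spans the same range, so each feature contributes comparably to the distances K-Means computes.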

A new column named “CREDIT_CARD_SEGMENTS” has been added to the dataset, which contains labels representing groups of credit card customers. These clusters are numbered from 0 to 4. For better clarity, we will rename these clusters with more descriptive labels:

data["CREDIT_CARD_SEGMENTS"]=data["CREDIT_CARD_SEGMENTS"].map({0:"Cluster 1",
                                                              1:"Cluster 2",
                                                              2:"Cluster 3",
                                                              3:"Cluster 4",
                                                              4:"Cluster 5"})
print(data["CREDIT_CARD_SEGMENTS"].head(10))
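
The number of clusters (5) was fixed in advance above. One common way to sanity-check such a choice is the elbow method: fit K-Means for several values of k and look for the point where the inertia (within-cluster sum of squared distances) stops dropping sharply. The following is only a sketch on synthetic data generated with make_blobs, not on the credit card dataset, and the range of k values is an arbitrary assumption:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic 3-feature data with 4 well-separated groups (illustration only)
X, _ = make_blobs(n_samples=300, centers=4, n_features=3, random_state=0)

inertias = {}
for k in range(2, 9):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_  # within-cluster sum of squared distances

for k, inertia in inertias.items():
    print(k, round(inertia, 1))
```

To apply the same idea here, X would be replaced by the scaled BALANCE, PURCHASES, and CREDIT_LIMIT features.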

Now, let’s visualize the credit card clusters obtained from our clustering analysis:

import plotly.graph_objects as go
PLOT=go.Figure()
for i in list(data["CREDIT_CARD_SEGMENTS"].unique()):
    PLOT.add_trace(go.Scatter3d(x=data[data["CREDIT_CARD_SEGMENTS"]==i]['BALANCE'],
                               y=data[data["CREDIT_CARD_SEGMENTS"]==i]['PURCHASES'],
                               z=data[data["CREDIT_CARD_SEGMENTS"]==i]['CREDIT_LIMIT'],
                               mode='markers', marker_size=6, marker_line_width=1,
                               name=str(i)))
PLOT.update_traces(hovertemplate='BALANCE: %{x} <br>PURCHASES: %{y} <br>CREDIT_LIMIT: %{z}')

# Axis titles use title=dict(text=..., font=...) because titlefont_* is deprecated in recent Plotly versions
PLOT.update_layout(width=800, height=800, autosize=True, showlegend=True,
                  scene=dict(xaxis=dict(title=dict(text='BALANCE', font=dict(color='black'))),
                            yaxis=dict(title=dict(text='PURCHASES', font=dict(color='black'))),
                            zaxis=dict(title=dict(text='CREDIT_LIMIT', font=dict(color='black')))),
                  font=dict(family="Gilroy", color='black', size=12))

import plotly.io as pio
pio.renderers.default = "notebook" 

PLOT.show()

Cluster Interpretations

The colors and cluster numbers below refer to the plot from one run of the analysis; K-Means can assign different numbers to the same groups on another run.

Cluster 1 (Blue – bottom left, small group)

Low balance, low purchases, low credit limit
Likely inactive or low-usage customers.
Business insight: These may be dormant accounts, customers who rarely use their cards, or people with poor credit histories. Banks may want to encourage usage with cashback offers or minimum-spend rewards.

Cluster 2 (Purple – large dense group)

Moderate balance, moderate purchases, moderate-to-high credit limit
Represents the average customer segment.
Business insight: These are stable, mid-range spenders who form the core customer base. They’re consistent but may not bring in very high profits individually. Retention strategies and loyalty rewards would work well here.

Cluster 3 (Orange – spread out, higher along BALANCE and PURCHASES)

High balance, high purchases, moderate-to-high credit limit
These are premium customers who spend more and keep higher balances.
Business insight: They are very profitable, prime candidates for exclusive offers, premium cards, and personalized financial services. Retaining them is critical.

Cluster 4 (Red – small, close to Cluster 2)

Low balance, low purchases, low credit limit (similar to Cluster 1, with a slightly different profile)
Likely new customers or financially constrained users.
Business insight: They could be upsold into higher usage through introductory offers or low-risk credit extensions.

Cluster 5 (Green – moderately spread out)

Moderate balance, high purchases, high credit limit
Represents active, high-spending customers with strong credit profiles.
Business insight: They are profitable, reliable customers who may respond well to upgrades, investment products, or travel/rewards cards.

Overall Insights: The bank’s strategy could focus on:

Retaining Cluster 3 & Cluster 5 (high-value customers).
Growing usage in Cluster 2 (average customers).
Activating Cluster 1 & Cluster 4 (low-usage or dormant customers).
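
Interpretations like the ones above can be checked numerically by computing the average of each feature per cluster, along with the cluster sizes. Here is a minimal sketch using a made-up toy DataFrame with the same column names (the values are assumptions for illustration, not results from the dataset):

```python
import pandas as pd

# Hypothetical labelled data standing in for the clustered dataset
toy = pd.DataFrame({
    "BALANCE": [100.0, 200.0, 5000.0, 6000.0],
    "PURCHASES": [50.0, 150.0, 3000.0, 5000.0],
    "CREDIT_LIMIT": [1000.0, 1000.0, 12000.0, 14000.0],
    "CREDIT_CARD_SEGMENTS": ["Cluster 1", "Cluster 1", "Cluster 3", "Cluster 3"],
})

features = ["BALANCE", "PURCHASES", "CREDIT_LIMIT"]

# Average feature values per cluster
profile = toy.groupby("CREDIT_CARD_SEGMENTS")[features].mean()

# Number of customers in each cluster
sizes = toy["CREDIT_CARD_SEGMENTS"].value_counts()

print(profile)
print(sizes)
```

Running the same groupby on the real data DataFrame gives a table that can be read alongside the 3D plot when describing each segment.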

Conclusion

In this article, we segmented credit card customers with K-Means clustering in Python, using three key financial features: balance, purchases, and credit limit. Clustering on these features separates customers into distinct groups, which lets a business tailor its marketing strategies and services to each group and direct resources toward the segments where they matter most. The project shows how data-driven segmentation can support decision-making in the financial sector.
