Password Strength Checker

A password strength checker is a tool that evaluates how strong a password is. Some modern password strength meters use machine learning to predict how secure a password is. In this article, we’ll walk through the process of building a password strength checker with machine learning in Python.

Password Strength Checker Using Python

A password strength checker evaluates the mix of digits, letters, and special characters used in a password. Here it is built by training a machine learning model on a labeled dataset containing various password combinations. From the data, the model learns which character patterns tend to appear in weak, medium, or strong passwords.
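To make “the mix of digits, letters, and special characters” concrete, here is a minimal sketch that counts how many characters of a password fall into each broad class. The function name character_mix and the four classes are assumptions chosen for illustration; they are not part of the model built in this article, which learns from the raw characters instead.

```python
def character_mix(password):
    """Count how many characters fall into each broad class (illustrative helper)."""
    counts = {"lower": 0, "upper": 0, "digit": 0, "special": 0}
    for ch in password:
        if ch.islower():
            counts["lower"] += 1
        elif ch.isupper():
            counts["upper"] += 1
        elif ch.isdigit():
            counts["digit"] += 1
        else:
            # Anything that is not a letter or digit is treated as "special"
            counts["special"] += 1
    return counts


print(character_mix("Passw0rd!"))
# {'lower': 6, 'upper': 1, 'digit': 1, 'special': 1}
```

A rule-based checker would stop at counts like these; the machine learning approach lets the model decide which character patterns matter.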

To build an application that checks password strength, we need a labeled dataset of passwords containing different combinations of letters, digits, and symbols. We use a dataset from Kaggle to train a machine learning model that predicts password strength; you can download it from here.

Let’s get started by loading the necessary Python libraries and importing the dataset that we’ll use to build our password strength checker:

# Optional, Google Colab only: mount Google Drive.
# Mounting is not needed to read the CSV from the URL below.
from google.colab import drive
drive.mount('/content/drive')

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split

# Google Drive file ID of the dataset CSV
file_id = "13wPr5IZyoFBp2uJYVZCFmego4eJd8wse"
url = f"https://drive.google.com/uc?id={file_id}"

# Load the dataset from the download URL and preview the first rows
data = pd.read_csv(url)
print(data.head())

The dataset contains two columns: Password and Strength. In the Strength column:

0 → Weak password
1 → Medium password
2 → Strong password

Before proceeding, we’ll drop rows with missing values and replace the numeric values (0, 1, 2) with their corresponding labels: Weak, Medium, and Strong.

data=data.dropna()
data["Strength"]=data["Strength"].map({0:"Weak",
                                       1:"Medium",
                                       2:"Strong"})
print(data.sample(5))
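Before training, it can be worth checking how many passwords fall into each class, since a heavily imbalanced dataset makes a single accuracy number harder to interpret. The sketch below uses a small hypothetical DataFrame (the rows are made up for illustration and are not taken from the Kaggle dataset); on the real data, the same value_counts() call would be applied to data["Strength"].

```python
import pandas as pd

# Hypothetical toy rows standing in for the real dataset
toy = pd.DataFrame({
    "Password": ["abc123", "qwerty", "Summer2020", "hello1", "x9$Lm!27Qz#p"],
    "Strength": [0, 0, 1, 0, 2],
})

# Same mapping as in the article: numeric codes to readable labels
toy["Strength"] = toy["Strength"].map({0: "Weak", 1: "Medium", 2: "Strong"})

# Number of passwords per class
counts = toy["Strength"].value_counts()
print(counts)
```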

Password Strength Prediction Model

Now, let’s train a machine learning model to predict password strength. Before building the model, we need to tokenize the passwords so that the model can learn from the combinations of digits, letters, and symbols. Here, each password is split into its individual characters, and the characters are converted to numeric features with TF-IDF. After that, we’ll split the dataset into training and test sets to prepare it for model training.

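def word(password):
  # Tokenize a password into its individual characters
  return list(password)

x = np.array(data["Password"])
y = np.array(data["Strength"])

# Character-level TF-IDF. lowercase=False keeps upper- and lowercase letters
# distinct (by default the vectorizer lowercases its input); token_pattern=None
# avoids the warning that the pattern is unused with a custom tokenizer.
tdif = TfidfVectorizer(tokenizer=word, lowercase=False, token_pattern=None)
x = tdif.fit_transform(x)

# Hold out 5% of the data for testing
xtrain, xtest, ytrain, ytest = train_test_split(x, y, random_state=42, test_size=0.05)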
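To see what character-level tokenization produces, here is a small sketch that fits a TF-IDF vectorizer on two made-up passwords and prints the learned vocabulary. The helper name split_chars and the sample strings are assumptions for illustration. Note that TfidfVectorizer lowercases its input by default, so the sketch passes lowercase=False to keep "B" and "b" as separate features.

```python
from sklearn.feature_extraction.text import TfidfVectorizer


def split_chars(password):
    """Treat every character of the password as one token."""
    return list(password)


vectorizer = TfidfVectorizer(
    tokenizer=split_chars,
    token_pattern=None,   # unused when a custom tokenizer is supplied
    lowercase=False,      # keep case information
)

# Two made-up passwords, purely for illustration
matrix = vectorizer.fit_transform(["abc", "aB1!"])

# Each distinct character becomes one feature (column)
vocab = sorted(vectorizer.vocabulary_)
print(vocab)          # ['!', '1', 'B', 'a', 'b', 'c']
print(matrix.shape)   # (2, 6)
```

Each row of the resulting matrix is one password, and each column holds the TF-IDF weight of one character.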

Now, let’s train a classification model that can predict the strength of a password:

# Train a random forest on the TF-IDF features
model = RandomForestClassifier()
model.fit(xtrain, ytrain)

# Mean accuracy on the held-out test set
print(model.score(xtest, ytest))
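A single accuracy score can hide weak performance on a rare class, so a per-class view is often useful. The sketch below shows how scikit-learn’s confusion_matrix and classification_report could be used for that. The label lists are hypothetical stand-ins; with the trained model, you would pass ytest and model.predict(xtest) instead.

```python
from sklearn.metrics import classification_report, confusion_matrix

# Hypothetical true and predicted labels, for illustration only
y_true = ["Weak", "Medium", "Medium", "Strong", "Medium", "Weak"]
y_pred = ["Weak", "Medium", "Medium", "Medium", "Medium", "Medium"]
labels = ["Weak", "Medium", "Strong"]

# Rows are true classes, columns are predicted classes (in the order of `labels`)
cm = confusion_matrix(y_true, y_pred, labels=labels)
print(cm)

# Precision, recall, and F1 per class; zero_division=0 avoids warnings
# for classes that were never predicted
print(classification_report(y_true, y_pred, labels=labels, zero_division=0))
```

In this toy example, four of six predictions are correct, yet no Strong password is recognized, which the overall accuracy alone would not reveal.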

Now that our model is trained, we can test it by passing in a sample password and checking how the model classifies its strength. Let’s see how to do this step by step:

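import getpass

# Read the password without echoing it to the screen
user = getpass.getpass("Enter Password: ")

# Vectorize with the fitted TF-IDF vectorizer, then predict the strength label
features = tdif.transform([user])
output = model.predict(features)
print(output)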
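If the checker is to be reused, the vectorizer and the classifier can be bundled so that a single call takes a raw password and returns a label. The following is a minimal sketch of that idea using scikit-learn’s make_pipeline. The tiny training set, the helper names split_chars and predict_strength, and the n_estimators and random_state settings are all assumptions for illustration; a model trained on seven made-up passwords is not a meaningful strength checker.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline


def split_chars(password):
    """Treat every character of the password as one token."""
    return list(password)


# Made-up training examples, purely to keep the sketch self-contained
passwords = ["123456", "password", "qwerty1", "Summer2020",
             "blue42sky", "G7#kL9!pQz2@", "vX3$eR8&mN1*"]
labels = ["Weak", "Weak", "Weak", "Medium", "Medium", "Strong", "Strong"]

# Vectorizer and classifier chained into one estimator
checker = make_pipeline(
    TfidfVectorizer(tokenizer=split_chars, token_pattern=None, lowercase=False),
    RandomForestClassifier(n_estimators=50, random_state=42),
)
checker.fit(passwords, labels)


def predict_strength(password):
    """Return the predicted label for a single raw password string."""
    return checker.predict([password])[0]


print(predict_strength("hello123"))
```

With a pipeline, the same character tokenization is applied at training and prediction time, so the two steps cannot drift apart.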

Conclusion

In conclusion, we successfully built a password strength checker using Python and machine learning. By preparing a labeled dataset, tokenizing the passwords, and training a classification model, we created a system capable of predicting whether a password is weak, medium, or strong. This project not only demonstrates the practical use of machine learning in cybersecurity but also opens the door to enhancements such as experimenting with advanced models, adding more password features, or integrating the checker into real-world applications.



