
KrisameReimu/Password-Strength-Classification_ex

EIE4121 Password Strength Classification Project

Project Overview

This is a mini-project for the EIE4121 Machine Learning for Cyber-security course at The Hong Kong Polytechnic University. The goal is to develop a machine learning model to classify password strength on a scale from 0 to 4, where 0 represents very weak passwords and 4 represents very strong passwords.

Team Members

  • 21106181D Chen Chen
  • JamesHsu-porcupine

Repository Structure

├── data/                      # Password datasets
│   └── password_Set1.csv      # Main dataset file
├── docs/                      # Project documentation and reports
│   ├── MiniProject.doc        # Project document (Word format)
│   └── Miniproject.pdf        # Project document (PDF format)
├── model/MachineLearning/     # Traditional ML models
│   ├── KNN                    # K-Nearest Neighbors model
│   └── RF                     # Random Forest model
├── notebooks/DeepLearning/    # Deep learning implementation notebooks
│   ├── GP19project_EIE4121_DEEPLEA...  # Deep learning implementation
│   └── GP19project_EIE4121_EDA_Che...  # EDA notebook
├── .gitattributes             # Git attributes file
└── README.md                  # This file

Project Description

This project focuses on developing and comparing different machine learning approaches for password strength classification. We implement both traditional machine learning algorithms (KNN, Random Forest) and deep learning models to classify passwords into five strength categories.

Dataset

The dataset (password_Set1.csv) contains password samples with the following columns:

  • password: The password string
  • strength: Password strength level (0-4)
    • 0: Very Weak
    • 1: Weak
    • 2: Average
    • 3: Strong
    • 4: Very Strong
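
A minimal sketch for loading the dataset and checking the class balance. It assumes password_Set1.csv has a header row with columns named exactly `password` and `strength` (the names come from the list above; the header layout and UTF-8 encoding are assumptions). It uses only the standard library `csv` module, whereas the notebooks may well load the file with pandas. Rows with an empty password or a label outside 0-4 are skipped.

```python
import csv
import io
from collections import Counter

# Strength levels as documented in this README.
STRENGTH_LABELS = {0: "Very Weak", 1: "Weak", 2: "Average", 3: "Strong", 4: "Very Strong"}


def load_passwords(source):
    """Return a list of (password, strength) tuples from a CSV file object.

    Assumes a header row with columns named `password` and `strength`.
    Rows with an empty password or a non-integer / out-of-range label are skipped.
    """
    rows = []
    for record in csv.DictReader(source):
        password = record.get("password")
        raw_label = record.get("strength")
        if not password or raw_label is None:
            continue
        try:
            label = int(raw_label)
        except ValueError:
            continue
        if label in STRENGTH_LABELS:
            rows.append((password, label))
    return rows


def class_distribution(rows):
    """Count samples per strength level, keyed by its human-readable name."""
    counts = Counter(label for _, label in rows)
    return {STRENGTH_LABELS[k]: counts[k] for k in sorted(counts)}


if __name__ == "__main__":
    # Small in-memory stand-in for data/password_Set1.csv.
    sample = io.StringIO("password,strength\nabc123,0\nTr0ub4dor&3,3\n,2\nqwerty,9\n")
    rows = load_passwords(sample)
    print(rows)                      # [('abc123', 0), ('Tr0ub4dor&3', 3)]
    print(class_distribution(rows))  # {'Very Weak': 1, 'Strong': 1}
```

For the real file, open it with `open("data/password_Set1.csv", newline="", encoding="utf-8")` and pass the file object to `load_passwords`. Looking at `class_distribution` first shows how the five classes are spread, which is what the class weighting in the deep learning model compensates for.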

Methodology

Our approach involves:

  1. Exploratory Data Analysis (EDA) to understand password characteristics
  2. Feature Engineering to extract meaningful features from passwords:
    • Length, character diversity, entropy
    • Character type counts and ratios
    • Pattern detection (sequential and repeated characters)
  3. Model Implementation:
    • Traditional ML: K-Nearest Neighbors, Random Forest
    • Deep Learning: Hybrid CNN-LSTM model with character embeddings
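
The feature groups in step 2 could be computed along these lines. This is a hedged sketch, not the project's actual feature code: the exact definitions are assumptions, namely Shannon entropy over the password's own character frequencies, ASCII-only character classes, and "sequential" meaning consecutive code points in either direction (`abc`, `123`, `cba`).

```python
import math
import string
from collections import Counter


def shannon_entropy(password):
    """Shannon entropy (bits per character) of the password's character frequencies."""
    n = len(password)
    if n == 0:
        return 0.0
    return sum(-(c / n) * math.log2(c / n) for c in Counter(password).values())


def longest_run(password, step):
    """Longest run where each character's code point is the previous one plus `step`.

    step=0 finds repeats ("aaa"), step=1 ascending runs ("abc", "123"),
    step=-1 descending runs ("cba", "321").
    """
    if not password:
        return 0
    best = current = 1
    for prev, cur in zip(password, password[1:]):
        current = current + 1 if ord(cur) - ord(prev) == step else 1
        best = max(best, current)
    return best


def extract_features(password):
    """Hand-crafted features for one password (a hypothetical feature set)."""
    n = len(password)
    lower = sum(ch in string.ascii_lowercase for ch in password)
    upper = sum(ch in string.ascii_uppercase for ch in password)
    digits = sum(ch in string.digits for ch in password)
    other = n - lower - upper - digits  # punctuation, spaces, non-ASCII
    denom = max(n, 1)  # avoid dividing by zero for an empty string
    return {
        "length": n,
        "unique_chars": len(set(password)),
        "entropy": shannon_entropy(password),
        "lower": lower,
        "upper": upper,
        "digits": digits,
        "other": other,
        "lower_ratio": lower / denom,
        "upper_ratio": upper / denom,
        "digit_ratio": digits / denom,
        "other_ratio": other / denom,
        "max_repeat_run": longest_run(password, 0),
        "max_sequential_run": max(longest_run(password, 1), longest_run(password, -1)),
    }


if __name__ == "__main__":
    print(extract_features("Pass123!"))
```

The resulting dicts can be stacked into a numeric matrix (for example with `pandas.DataFrame(list_of_dicts)`) and fed to the KNN and Random Forest models. KNN is distance-based, so these features would normally be standardized first; Random Forest is insensitive to feature scaling.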

Deep Learning Model

Our deep learning approach combines:

  • Character-level embeddings that learn a dense vector representation for each character
  • CNN layers to detect local patterns
  • LSTM layers to understand sequential patterns
  • Numerical features to incorporate password characteristics
  • Class weighting to handle imbalanced data
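
Two preprocessing steps such a model needs can be sketched in plain Python: mapping characters to integer ids padded to a fixed length for the embedding layer, and computing per-class weights. The id scheme (0 for padding, 1 for unseen characters) and the `max_len` value are hypothetical choices. The weighting formula n_samples / (n_classes * count_c) is the heuristic scikit-learn uses for `class_weight="balanced"`; the notebook may weight classes differently.

```python
from collections import Counter

PAD_ID = 0  # reserved for right-padding
OOV_ID = 1  # reserved for characters not seen when the vocabulary was built


def build_vocab(passwords):
    """Assign ids 2, 3, ... to the sorted set of characters in the training passwords."""
    chars = sorted({ch for pw in passwords for ch in pw})
    return {ch: i + 2 for i, ch in enumerate(chars)}


def encode(password, vocab, max_len):
    """Map a password to exactly `max_len` ids: truncate long ones, right-pad short ones."""
    ids = [vocab.get(ch, OOV_ID) for ch in password[:max_len]]
    return ids + [PAD_ID] * (max_len - len(ids))


def balanced_class_weights(labels):
    """weight_c = n_samples / (n_classes * count_c); rarer classes get larger weights."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * k) for c, k in sorted(counts.items())}


if __name__ == "__main__":
    train = ["password", "P@ssw0rd!", "qwerty"]
    vocab = build_vocab(train)
    print(encode("pass", vocab, max_len=12))
    print(balanced_class_weights([0, 0, 0, 1, 2]))
```

In Keras, sequences encoded this way could feed `tf.keras.layers.Embedding(input_dim=len(vocab) + 2, ...)` (the `+ 2` covers the two reserved ids), and the weights dict can be passed as `model.fit(..., class_weight=weights)`.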

Results

Each model is evaluated using:

  • Accuracy
  • Precision, Recall, F1-score
  • Confusion matrix
  • Per-class performance
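
All of these metrics can be derived from the confusion matrix. The sketch below computes them by hand so the definitions are explicit; in practice `sklearn.metrics.confusion_matrix` and `sklearn.metrics.classification_report` report the same per-class quantities. Labels are assumed to be integers 0-4, and a metric whose denominator is zero is reported as 0.0.

```python
def confusion_matrix(y_true, y_pred, n_classes=5):
    """matrix[i][j] counts samples whose true class is i and predicted class is j."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def evaluate(y_true, y_pred, n_classes=5):
    """Return (accuracy, macro_f1, per_class) computed from the confusion matrix.

    Precision/recall/F1 fall back to 0.0 when their denominator is zero.
    macro_f1 is the unweighted mean over all n_classes labels.
    """
    m = confusion_matrix(y_true, y_pred, n_classes)
    per_class = {}
    for c in range(n_classes):
        tp = m[c][c]
        predicted = sum(row[c] for row in m)  # column sum: predicted as c
        actual = sum(m[c])                    # row sum: truly c (support)
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class[c] = {"precision": precision, "recall": recall, "f1": f1, "support": actual}
    total = sum(sum(row) for row in m)
    accuracy = sum(m[c][c] for c in range(n_classes)) / total if total else 0.0
    macro_f1 = sum(s["f1"] for s in per_class.values()) / n_classes
    return accuracy, macro_f1, per_class


if __name__ == "__main__":
    y_true = [0, 0, 1, 1, 2]
    y_pred = [0, 1, 1, 1, 2]
    print(confusion_matrix(y_true, y_pred, n_classes=3))  # [[1, 1, 0], [0, 2, 0], [0, 0, 1]]
    accuracy, macro_f1, per_class = evaluate(y_true, y_pred, n_classes=3)
    print(accuracy)  # 0.8
```

With imbalanced classes, macro F1 and the per-class rows are more informative than accuracy alone, since a model can score high accuracy by favouring the majority classes.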

Usage

To use the notebooks:

  1. Clone the repository
  2. Install required dependencies:
    pip install pandas numpy tensorflow scikit-learn matplotlib seaborn
    
  3. Run the notebooks in the following order:
    • EDA notebook
    • Deep learning implementation

Future Work

  • Implement ensemble methods
  • Explore additional feature engineering techniques
  • Optimize hyperparameters
  • Develop a user-friendly password strength checker

