vranisch/spam-mail-filtering

📧 Spam Mail Filtering using Machine Learning & NLP

This project aims to classify email messages as Spam or Ham (Not Spam) using Natural Language Processing and machine learning models in Python.

We experimented with two different approaches:

1️⃣ LSTM Model built from scratch using PyTorch
2️⃣ Pretrained RoBERTa model from Hugging Face


📊 Results

Below are the accuracy results from each model:

| Model | Accuracy | Notes |
|---|---|---|
| LSTM (PyTorch) | ~88% | Trained on cleaned spam/ham dataset |
| RoBERTa (HuggingFace) | ~66% | Stronger performance on unseen emails |
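
For reference, accuracy here is taken to mean the fraction of emails whose predicted label matches the true label. The snippet below is a minimal sketch of that calculation on made-up labels. The `accuracy` helper is hypothetical and is not taken from the evaluation scripts in this repository.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty list")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Made-up labels for illustration only (not project results).
y_true = ["spam", "ham", "ham", "spam", "ham"]
y_pred = ["spam", "ham", "spam", "spam", "ham"]
score = accuracy(y_true, y_pred)  # 4 of 5 predictions match
print(f"accuracy = {score:.2f}")
```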

🖼️ Model Performance Visuals

LSTM Performance

[Figure: LSTM Results]

RoBERTa Performance

[Figure: RoBERTa Results]


🗂 Dataset

Spam/Ham labeled dataset included in repository:
📄 spam_ham_dataset.csv
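
A quick way to inspect the dataset is to read it with Python's standard `csv` module and count the labels. The sketch below assumes the file has `text` and `label` columns; those names are an assumption, so check the real header first. The `load_dataset` helper is hypothetical, and the sample rows are made up so the snippet runs without the file.

```python
import csv
import io
from collections import Counter


def load_dataset(handle, text_col="text", label_col="label"):
    """Read (text, label) pairs from an open CSV file handle.

    The column names are assumptions; adjust them to the real header.
    """
    reader = csv.DictReader(handle)
    return [(row[text_col], row[label_col]) for row in reader]


# Tiny in-memory stand-in for spam_ham_dataset.csv (made-up rows).
sample = io.StringIO(
    "label,text\n"
    "ham,See you at the meeting tomorrow\n"
    "spam,Claim your free prize now\n"
    "ham,Lunch at noon?\n"
)
rows = load_dataset(sample)
counts = Counter(label for _, label in rows)
print(counts)
```

To use the real file, open it with `open("spam_ham_dataset.csv", newline="", encoding="utf-8")` and pass the handle to `load_dataset`.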


🔧 Project Files

| File | Description |
|---|---|
| mail_classification_sample.py | Classify a single email |
| evaluate_LSTM.py | Evaluate LSTM model performance |
| evaluate_model_with_RoBERTa.py | Evaluate RoBERTa classifier |
| requirements.txt | Dependencies |

🧠 Models Used

🔹 LSTM Text Classifier

  • Tokenization, embedding, LSTM layers
  • Binary classification (spam vs ham)
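
The pipeline above (token ids, embedding, LSTM, binary output) can be sketched in PyTorch roughly as follows. This is not the model trained in this repository: the class name `SpamLSTM`, the layer sizes, and the single-layer, unidirectional setup are all illustrative assumptions.

```python
import torch
import torch.nn as nn


class SpamLSTM(nn.Module):
    """Hypothetical LSTM spam classifier; all sizes are assumptions."""

    def __init__(self, vocab_size, embed_dim=64, hidden_dim=128, pad_idx=0):
        super().__init__()
        # Maps each token id to a dense vector; pad_idx stays a zero vector.
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=pad_idx)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        # One output logit: positive values lean towards "spam".
        self.fc = nn.Linear(hidden_dim, 1)

    def forward(self, token_ids):
        embedded = self.embedding(token_ids)    # (batch, seq_len, embed_dim)
        _, (hidden, _) = self.lstm(embedded)    # hidden: (1, batch, hidden_dim)
        return self.fc(hidden[-1]).squeeze(1)   # (batch,) raw logits


model = SpamLSTM(vocab_size=1000)
batch = torch.randint(1, 1000, (4, 20))  # 4 fake emails, 20 token ids each
logits = model(batch)
probs = torch.sigmoid(logits)            # probability of the "spam" class
print(probs.shape)
```

Returning raw logits pairs naturally with `nn.BCEWithLogitsLoss` during training, which applies the sigmoid internally.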

🔹 RoBERTa-based Transformer

Using pretrained model from Hugging Face:

➡️ https://huggingface.co/roshana1s/spam-message-classifier

This model is fine-tuned for spam text detection.
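
One way to call this model is through the `pipeline` helper from the `transformers` library. The sketch below is an assumption about usage, not a copy of `evaluate_model_with_RoBERTa.py`; the helper names `load_classifier` and `classify_emails` are hypothetical. The label strings the model returns are not documented here, so check the model card before mapping them to spam/ham.

```python
def load_classifier(model_name="roshana1s/spam-message-classifier"):
    """Build a text-classification pipeline (downloads the model on first use)."""
    # Imported lazily so the rest of the module works without transformers.
    from transformers import pipeline

    return pipeline("text-classification", model=model_name)


def classify_emails(texts, classifier):
    """Return a (label, score) pair for each email text.

    truncation=True cuts inputs that exceed the model's maximum length.
    """
    results = classifier(list(texts), truncation=True)
    return [(r["label"], float(r["score"])) for r in results]
```

Typical usage would be `classify_emails(["Win a free prize now!"], load_classifier())`, which returns one `(label, score)` pair per input text.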


📚 References / Sources

A large part of this project is inspired by:

Thanks to the authors for their great tutorials and models.


🚀 How to Run

conda activate mail_verific
pip install -r requirements.txt

python mail_classification_sample.py
