📧 Spam Mail Filtering using Machine Learning & NLP

This project aims to classify email messages as Spam or Ham (Not Spam) using Natural Language Processing and machine learning models in Python.

We experimented with two different approaches:

1️⃣ LSTM Model built from scratch using PyTorch
2️⃣ Pretrained RoBERTa model from Hugging Face

📊 Results

Below are the accuracy results from each model:

| Model | Accuracy | Notes |
|---|---|---|
| LSTM (PyTorch) | ~88% | Trained on the cleaned spam/ham dataset |
| RoBERTa (HuggingFace) | ~66% | Lower accuracy in this evaluation, but noted as stronger on unseen emails |

🖼️ Model Performance Visuals

LSTM Performance

RoBERTa Performance

🗂 Dataset

A labeled spam/ham dataset is included in the repository:
📄 spam_ham_dataset.csv
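The sketch below shows one way to load the dataset and turn it into fixed-length token-id sequences for the LSTM, using only the standard library. It is a minimal sketch, not the repository's own preprocessing, and rests on these assumptions:

- The CSV has a `text` column with the email body and a `label` column holding `spam` or `ham`. Check the actual header.
- The helper names (`parse_rows`, `load_dataset`, `clean_text`, `build_vocab`, `encode`) are hypothetical.
- The vocabulary size and sequence-length limits are illustrative.

```python
import csv
import re
from collections import Counter

# ASSUMPTION: column names in spam_ham_dataset.csv; change them if the
# real header differs.
TEXT_COL = "text"
LABEL_COL = "label"
PAD_ID, UNK_ID = 0, 1  # reserved ids for padding and out-of-vocabulary tokens


def parse_rows(lines, text_col=TEXT_COL, label_col=LABEL_COL):
    """Read CSV lines and return (texts, labels), with 1 = spam and 0 = ham."""
    texts, labels = [], []
    for row in csv.DictReader(lines):
        texts.append(row[text_col])
        labels.append(1 if row[label_col].strip().lower() == "spam" else 0)
    return texts, labels


def load_dataset(path, text_col=TEXT_COL, label_col=LABEL_COL):
    """Open the CSV file at `path` and parse it with parse_rows."""
    with open(path, newline="", encoding="utf-8") as f:
        return parse_rows(f, text_col, label_col)


def clean_text(text):
    """Lowercase and replace everything except ASCII letters and whitespace with spaces."""
    return re.sub(r"[^a-z\s]", " ", text.lower())


def build_vocab(texts, max_size=10000):
    """Assign ids to the most frequent tokens; ids 0 and 1 are reserved."""
    counts = Counter(tok for t in texts for tok in clean_text(t).split())
    vocab = {"<pad>": PAD_ID, "<unk>": UNK_ID}
    for tok, _ in counts.most_common(max_size - len(vocab)):
        vocab[tok] = len(vocab)
    return vocab


def encode(text, vocab, max_len=100):
    """Map text to exactly max_len token ids: truncate long texts, right-pad short ones."""
    ids = [vocab.get(tok, UNK_ID) for tok in clean_text(text).split()][:max_len]
    return ids + [PAD_ID] * (max_len - len(ids))
```

A typical sequence is:

1. `texts, labels = load_dataset("spam_ham_dataset.csv")`
2. `vocab = build_vocab(texts)`
3. `X = [encode(t, vocab) for t in texts]`

Reserving id 0 for padding fits the `padding_idx` option of PyTorch's `nn.Embedding`.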

🔧 Project Files

| File | Description |
|---|---|
| `mail_classification_sample.py` | Classify a single email |
| `evaluate_LSTM.py` | Evaluate LSTM model performance |
| `evaluate_model_with_RoBERTa.py` | Evaluate RoBERTa classifier |
| `requirements.txt` | Dependencies |
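The accuracy figures in the Results table come from comparing predicted labels with true labels. Below is a minimal evaluation helper that also returns confusion-matrix counts. It is hypothetical and not taken from `evaluate_LSTM.py` or `evaluate_model_with_RoBERTa.py`, and it assumes labels are encoded as 1 = spam and 0 = ham.

```python
def evaluate(y_true, y_pred):
    """Return accuracy and confusion-matrix counts for binary labels (1 = spam, 0 = ham)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)  # spam caught
    tn = sum(t == 0 and p == 0 for t, p in pairs)  # ham passed through
    fp = sum(t == 0 and p == 1 for t, p in pairs)  # ham wrongly flagged as spam
    fn = sum(t == 1 and p == 0 for t, p in pairs)  # spam that slipped through
    return {"accuracy": (tp + tn) / len(pairs), "tp": tp, "tn": tn, "fp": fp, "fn": fn}
```

For example, `evaluate([1, 0, 1, 1], [1, 0, 0, 1])` gives an accuracy of 0.75 with one false negative, meaning one spam email was classified as ham. The separate counts matter when one class dominates. A model that labels everything as ham can still reach a high accuracy in that case.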

🧠 Models Used

🔹 LSTM Text Classifier

Tokenization, embedding, LSTM layers
Binary classification (spam vs ham)
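The layer stack listed above (tokenized input, embedding, LSTM, binary output) can be sketched in PyTorch as follows. This is an illustrative sketch, not the code behind `evaluate_LSTM.py`. These details are assumptions:

- the class and function names;
- the layer sizes;
- classifying from the final hidden state with a single logit;
- training with `binary_cross_entropy_with_logits`.

The model expects a `(batch, seq_len)` tensor of token ids, with 0 used for padding.

```python
import torch
import torch.nn as nn


class LSTMSpamClassifier(nn.Module):
    """Embedding -> LSTM -> linear layer producing one spam logit per email.

    Hypothetical sketch: layer sizes are illustrative defaults, not the
    values used in this repository.
    """

    def __init__(self, vocab_size, embed_dim=64, hidden_dim=64, num_layers=1, pad_idx=0):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=pad_idx)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers=num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_dim, 1)

    def forward(self, token_ids):
        embedded = self.embedding(token_ids)    # (batch, seq_len, embed_dim)
        _, (hidden, _) = self.lstm(embedded)    # hidden: (num_layers, batch, hidden_dim)
        return self.fc(hidden[-1]).squeeze(-1)  # (batch,) raw logits from the top layer


def train_step(model, optimizer, token_ids, labels):
    """Run one optimisation step with binary cross-entropy on the logits."""
    model.train()
    optimizer.zero_grad()
    logits = model(token_ids)
    loss = nn.functional.binary_cross_entropy_with_logits(logits, labels.float())
    loss.backward()
    optimizer.step()
    return loss.item()


@torch.no_grad()
def predict(model, token_ids, threshold=0.5):
    """Return 1 (spam) or 0 (ham) for each row of token_ids."""
    model.eval()
    probs = torch.sigmoid(model(token_ids))
    return (probs >= threshold).long()
```

Taking the last hidden state of right-padded sequences lets padding tokens pass through the LSTM. When sequence lengths in a batch vary a lot, `torch.nn.utils.rnn.pack_padded_sequence` is the usual way to skip them.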

🔹 RoBERTa-based Transformer

Uses a pretrained model from Hugging Face:

➡️ https://huggingface.co/roshana1s/spam-message-classifier

This model is fine-tuned for spam text detection.
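The pretrained model can be called through the `transformers` `pipeline` API. The sketch below is an assumption about usage, not the code in `evaluate_model_with_RoBERTa.py`. Two points in it are unconfirmed:

- `load_roberta_classifier` and `classify_emails` are hypothetical helpers.
- The label string the model uses for spam is not given here. It is passed in as `spam_label` and should be checked against the model card.

```python
def load_roberta_classifier(model_id="roshana1s/spam-message-classifier"):
    """Build a Hugging Face text-classification pipeline for the pretrained model.

    transformers is imported here rather than at module level so that
    classify_emails can be used (and tested) without it installed.
    """
    from transformers import pipeline

    return pipeline("text-classification", model=model_id)


def classify_emails(texts, classifier, spam_label):
    """Return one (label, score, is_spam) tuple per email.

    `classifier` is any callable that, like a transformers pipeline, maps a list
    of strings to a list of {"label": str, "score": float} dicts, one per input.
    ASSUMPTION: `spam_label` must be the label name the model uses for spam;
    check the model card or model.config.id2label.
    """
    results = classifier(list(texts))
    return [(r["label"], r["score"], r["label"] == spam_label) for r in results]
```

To use the real model:

- Pass `load_roberta_classifier()` as `classifier`. The first call downloads the weights from the Hugging Face Hub.
- Set `spam_label` to the label name the model's configuration lists for spam.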

📚 References / Sources

A large part of this project is inspired by:

Detecting spam emails using TensorFlow (GeeksForGeeks):
https://www.geeksforgeeks.org/nlp/detecting-spam-emails-using-tensorflow-in-python/
RoBERTa Spam Classifier (HuggingFace):
https://huggingface.co/roshana1s/spam-message-classifier

Thanks to the authors for their great tutorials and models.

🚀 How to Run

conda create -n mail_verific python   # first time only: creates the environment
conda activate mail_verific
pip install -r requirements.txt

python mail_classification_sample.py

vranisch/spam-mail-filtering

Folders and files

Latest commit

History

Repository files navigation

📧 Spam Mail Filtering using Machine Learning & NLP

📊 Results

🖼️ Model Performance Visuals

LSTM Performance

RoBERTa Performance

🗂 Dataset

🔧 Project Files

🧠 Models Used

🔹 LSTM Text Classifier

🔹 RoBERTa-based Transformer

📚 References / Sources

🚀 How to Run

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages