This project demonstrates data preprocessing and feature engineering on the Titanic dataset. The goal is to clean and transform the raw data so that it is suitable for training machine learning models.
The Titanic dataset contains information about the passengers aboard the Titanic, including whether each passenger survived. It can be downloaded from Kaggle.
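The loading step can be sketched with pandas as below. This assumes the Kaggle training file is saved locally as `train.csv` with the standard Kaggle column names; the `load_titanic` helper and `EXPECTED_COLUMNS` list are hypothetical names for illustration, and the two toy rows are invented so the snippet runs without the download.

```python
import io

import pandas as pd

# Assumed: the column names of the Kaggle Titanic training file (train.csv).
EXPECTED_COLUMNS = [
    "PassengerId", "Survived", "Pclass", "Name", "Sex", "Age",
    "SibSp", "Parch", "Ticket", "Fare", "Cabin", "Embarked",
]


def load_titanic(path_or_buffer):
    """Hypothetical helper: read the CSV and fail early if columns are missing."""
    df = pd.read_csv(path_or_buffer)
    missing = set(EXPECTED_COLUMNS) - set(df.columns)
    if missing:
        raise ValueError(f"Missing columns: {sorted(missing)}")
    return df


# Toy rows in the same layout as train.csv (invented values, not real records).
sample_csv = io.StringIO(
    "PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked\n"
    '1,0,3,"Doe, Mr. John",male,22,1,0,A/1,7.25,,S\n'
    '2,1,1,"Roe, Mrs. Jane",female,,1,0,PC 2,71.28,C85,C\n'
)
df = load_titanic(sample_csv)
print(df.shape)         # rows, columns
print(df.isna().sum())  # number of missing values per column
```

With the real file, the call would be `load_titanic("train.csv")`; `df.isna().sum()` is a quick first look at which columns need missing-value handling.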
The project covers the following steps:
- Import the necessary libraries
- Load the dataset
- Explore the data
- Perform data preprocessing and feature engineering:
  - Handle missing values
  - Create new features
  - Encode categorical variables
- Visualize the results
- Perform a simple unit test
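The preprocessing and feature-engineering steps above can be sketched as a single pandas function. This is a minimal sketch, not the project's confirmed code: the `preprocess` name, median/mode imputation, the `HasCabin`, `FamilySize`, and `IsAlone` features, and the choice of label-mapping `Sex` while one-hot encoding `Embarked` are all assumptions, and the toy DataFrame is invented.

```python
import pandas as pd


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical sketch of the three preprocessing steps on Titanic-style data."""
    out = df.copy()

    # 1. Handle missing values: median age, most frequent port of embarkation.
    out["Age"] = out["Age"].fillna(out["Age"].median())
    out["Embarked"] = out["Embarked"].fillna(out["Embarked"].mode()[0])
    # Assumed choice: replace the sparse Cabin column with a has-cabin flag.
    out["HasCabin"] = out["Cabin"].notna().astype(int)

    # 2. Create new features.
    out["FamilySize"] = out["SibSp"] + out["Parch"] + 1  # +1 counts the passenger
    out["IsAlone"] = (out["FamilySize"] == 1).astype(int)

    # 3. Encode categorical variables.
    out["Sex"] = out["Sex"].map({"male": 0, "female": 1})
    out = pd.get_dummies(out, columns=["Embarked"], prefix="Embarked", dtype=int)

    # Drop free-text columns that are not used as model inputs here.
    return out.drop(columns=["Name", "Ticket", "Cabin"])


# Toy data with the Kaggle column names (invented values).
df = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4],
    "Survived": [0, 1, 1, 0],
    "Pclass": [3, 1, 3, 3],
    "Name": ["A", "B", "C", "D"],
    "Sex": ["male", "female", "female", "male"],
    "Age": [22.0, None, 26.0, 35.0],
    "SibSp": [1, 1, 0, 0],
    "Parch": [0, 0, 0, 0],
    "Ticket": ["t1", "t2", "t3", "t4"],
    "Fare": [7.25, 71.28, 7.93, 8.05],
    "Cabin": [None, "C85", None, None],
    "Embarked": ["S", "C", None, "S"],
})
result = preprocess(df)
print(result.head())
```

Computing the median and mode on the data being transformed is the simplest option; when a separate test set is involved, those statistics would normally be computed on the training data only and reused.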
The project requires the following Python libraries:
- Pandas
- Matplotlib
- Seaborn
The project includes visualizations, built with Matplotlib and Seaborn, that show the distribution of features and their relationship with the target variable, Survived.
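One common way to relate a categorical feature to Survived is a bar chart of the survival rate per category. The sketch below uses plain Matplotlib; the `plot_survival_rate` helper and the toy data are hypothetical. In Seaborn, `sns.barplot(data=df, x="Sex", y="Survived")` draws a similar chart, with the bar height showing the mean of Survived.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend for scripts; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd


def plot_survival_rate(df: pd.DataFrame, feature: str):
    """Hypothetical helper: bar chart of mean Survived (survival rate) per category."""
    rates = df.groupby(feature)["Survived"].mean()
    fig, ax = plt.subplots(figsize=(5, 3))
    ax.bar(rates.index.astype(str), rates.values)
    ax.set_xlabel(feature)
    ax.set_ylabel("Survival rate")
    ax.set_ylim(0, 1)
    ax.set_title(f"Survival rate by {feature}")
    fig.tight_layout()
    return fig, rates


# Toy rows (invented values).
df = pd.DataFrame({
    "Sex": ["male", "female", "female", "male", "male"],
    "Survived": [0, 1, 1, 0, 1],
})
fig, rates = plot_survival_rate(df, "Sex")
print(rates)
```

Because Survived is coded 0/1, its mean within each group is exactly the survival rate. Use `fig.savefig(...)` to write the chart to disk, or `plt.show()` with an interactive backend.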
A simple unit test is included to ensure the correctness of the data preprocessing and feature engineering steps. The test checks that the resulting DataFrame has the expected columns and the correct number of columns.
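A test of this kind can be written with Python's built-in `unittest` module. In the sketch below, `preprocess` is a hypothetical, deliberately small stand-in for the project's own preprocessing function, and the expected column list is an assumption that matches only this stand-in.

```python
import unittest

import pandas as pd


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical minimal stand-in for the project's preprocessing step."""
    out = df.copy()
    out["Age"] = out["Age"].fillna(out["Age"].median())
    out["FamilySize"] = out["SibSp"] + out["Parch"] + 1
    out["Sex"] = out["Sex"].map({"male": 0, "female": 1})
    return out.drop(columns=["Name"])


class TestPreprocess(unittest.TestCase):
    def setUp(self):
        # Small invented input with one missing Age value.
        self.raw = pd.DataFrame({
            "Name": ["A", "B"],
            "Sex": ["male", "female"],
            "Age": [30.0, None],
            "SibSp": [1, 0],
            "Parch": [0, 2],
        })

    def test_expected_columns(self):
        result = preprocess(self.raw)
        expected = ["Sex", "Age", "SibSp", "Parch", "FamilySize"]
        # Checks both the column names and the number of columns.
        self.assertEqual(list(result.columns), expected)
        self.assertEqual(result.shape[1], len(expected))

    def test_no_missing_values(self):
        self.assertFalse(preprocess(self.raw).isna().any().any())


if __name__ == "__main__":
    # exit=False keeps the interpreter running, so this also works in a notebook.
    unittest.main(argv=["ignored"], exit=False)
```

Comparing the full column list, rather than only the count, catches renamed or reordered columns that a count check alone would miss.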