nadinejackson1/titanic-data-preprocessing-feature-engineering

100DaysofML-Day11

Titanic Data Preprocessing and Feature Engineering

This project walks through data preprocessing and feature engineering on the Titanic dataset. The goal is to clean and transform the raw passenger data so that it is suitable as input to machine learning models.

Dataset

The dataset used in this project is the Titanic dataset, which contains information about the passengers aboard the Titanic. You can download it from Kaggle.
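
As a rough illustration of the loading and exploration steps, the sketch below reads a CSV with Pandas and summarizes column types and missing values. The helper name `load_and_summarize` is hypothetical, and the inline rows are made-up placeholder data so the snippet runs without the download. With the real data you would pass the path to the CSV file you downloaded from Kaggle instead.

```python
import io

import pandas as pd


def load_and_summarize(source):
    """Read a CSV and report each column's dtype and missing-value count.

    `source` can be a file path or any file-like object accepted by
    pandas.read_csv. This helper is a hypothetical illustration, not
    code taken from the project.
    """
    df = pd.read_csv(source)
    summary = pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isna().sum(),
    })
    return df, summary


# Made-up placeholder rows standing in for the downloaded file.
demo_csv = io.StringIO(
    "PassengerId,Survived,Age,Embarked\n"
    "1,0,22,S\n"
    "2,1,,C\n"
    "3,1,26,\n"
)
df, summary = load_and_summarize(demo_csv)
print(df.head())
print(summary)
```

A per-column missing-value count like this is one simple way to decide which columns need imputation before any features are built.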

Steps

  1. Import the necessary libraries
  2. Load the dataset
  3. Explore the data
  4. Perform data preprocessing and feature engineering:
    • Handle missing values
    • Create new features
    • Encode categorical variables
  5. Visualize the results
  6. Perform a simple unit test
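
The three preprocessing sub-steps in step 4 could look roughly like the sketch below. It is a minimal example under stated assumptions, not necessarily what the project's notebook does: median imputation for `Age`, mode imputation for `Embarked`, the `FamilySize` and `IsAlone` features, and the chosen encodings are all illustrative choices. The column names follow the Kaggle Titanic schema, and the sample rows are made up.

```python
import pandas as pd


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical preprocessing pipeline for Titanic-style data."""
    out = df.copy()

    # Handle missing values (assumed strategy: median for Age, mode for Embarked).
    out["Age"] = out["Age"].fillna(out["Age"].median())
    out["Embarked"] = out["Embarked"].fillna(out["Embarked"].mode()[0])

    # Create new features (assumed examples: family size and a travelling-alone flag).
    out["FamilySize"] = out["SibSp"] + out["Parch"] + 1
    out["IsAlone"] = (out["FamilySize"] == 1).astype(int)

    # Encode categorical variables: binary map for Sex, one-hot for Embarked.
    out["Sex"] = out["Sex"].map({"male": 0, "female": 1})
    out = pd.get_dummies(out, columns=["Embarked"], prefix="Embarked", dtype=int)
    return out


# Made-up sample rows using the Kaggle column names.
sample = pd.DataFrame({
    "Survived": [0, 1, 1, 0],
    "Pclass": [3, 1, 3, 2],
    "Sex": ["male", "female", "female", "male"],
    "Age": [22.0, 38.0, None, 35.0],
    "SibSp": [1, 1, 0, 0],
    "Parch": [0, 0, 0, 0],
    "Fare": [7.25, 71.28, 7.92, 13.0],
    "Embarked": ["S", "C", None, "S"],
})
processed = preprocess(sample)
print(processed)
```

Working on a copy keeps the raw DataFrame intact, which makes it easier to compare before and after when exploring the data.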

Dependencies

  • Pandas
  • Matplotlib
  • Seaborn

Visualization

The project includes visualizations built with Matplotlib and Seaborn that show the distribution of features and their relationships with the target variable (Survived).

Unit Test

A simple unit test is included as a sanity check on the data preprocessing and feature engineering steps. It checks that the resulting DataFrame has the expected columns and the correct number of columns.
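
A column check of this kind could be written with the standard `unittest` module, as in the sketch below. The `preprocess` function here is a small hypothetical stand-in so the example is self-contained, and the expected column list is an assumption tied to that stand-in. The project's real test would assert against its own output columns.

```python
import unittest

import pandas as pd


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for the project's preprocessing step."""
    out = df.copy()
    out["FamilySize"] = out["SibSp"] + out["Parch"] + 1
    out["Sex"] = out["Sex"].map({"male": 0, "female": 1})
    return out


class TestPreprocess(unittest.TestCase):
    def test_expected_columns(self):
        # Made-up input rows using the Kaggle column names.
        raw = pd.DataFrame({
            "Survived": [0, 1],
            "Sex": ["male", "female"],
            "SibSp": [1, 0],
            "Parch": [0, 2],
        })
        result = preprocess(raw)

        expected = ["Survived", "Sex", "SibSp", "Parch", "FamilySize"]
        # Same column names in the same order, and the same column count.
        self.assertListEqual(list(result.columns), expected)
        self.assertEqual(result.shape[1], len(expected))


# Run the test case explicitly so this also works inside a notebook cell.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestPreprocess)
outcome = unittest.TextTestRunner(verbosity=0).run(suite)
```

A test like this catches accidentally dropped or renamed columns, but it does not verify the values inside them.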