
I have a pandas DataFrame df. I use one of its columns to create a color-code column for matplotlib like this:

df['color-code'] = np.where(df['Community School?']=='Yes', 'blue', 'red')

I also create a separate DataFrame, with rows containing null values removed, to use for plotting:

sc_income = df[~df['Economic Need Index'].isnull() & ~df['School Income Estimate'].isnull()]

Then I plot it using

#make plot bigger
plt.rcParams['figure.figsize'] = (40,20)

#plot Economic Need Index vs School Income Estimate
plt.scatter(sc_income['Economic Need Index'], sc_income['School Income Estimate'], c=sc_income['color-code'])
plt.xlabel('Economic Need')
plt.ylabel('School Income $')

plt.title('Economic Need vs. School Income')
plt.legend()
plt.show()

The plot is produced, but the legend I need should specify that blue means community school and red means not a community school.

1 Answer

You are trying to colour points by group, and there are many ways to do that. Using matplotlib, plot each group with its own scatter call and label:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# generate data
n_obs = 100
df = pd.DataFrame({'Community School?': np.random.choice(['Yes', 'No'], size=n_obs),
                   'Economic Need Index': np.random.uniform(size=n_obs),
                   'School Income Estimate': np.random.normal(loc=n_obs, size=n_obs)})

# your data pre-processing steps
df['color-code'] = np.where(df['Community School?']=='Yes', 'blue', 'red')
sc_income = df[~df['Economic Need Index'].isnull() & ~df['School Income Estimate'].isnull()]

# plot Economic Need Index vs School Income Estimate by group
groups = sc_income.groupby('Community School?')

fig, ax = plt.subplots(1, figsize=(40,20))

for label, group in groups:
    ax.scatter(group['Economic Need Index'], group['School Income Estimate'], 
               c=group['color-code'], label=label)

ax.set(xlabel='Economic Need', ylabel='School Income $', 
       title='Economic Need vs. School Income')
ax.legend(title='Community School?')
plt.show()
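
If you would rather keep a single scatter call, another option is to build the legend by hand from proxy artists. The following is only a minimal sketch: the helper name scatter_with_proxy_legend, the color_map argument, and the synthetic demo data are assumptions made for illustration, not part of the question's code.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D


def scatter_with_proxy_legend(data, color_map):
    """Draw all points in one scatter call and build the legend by hand.

    Hypothetical helper: `color_map` maps each 'Community School?' value
    to a colour name.
    """
    fig, ax = plt.subplots(figsize=(8, 5))
    # Translate each category value into its colour
    colors = data['Community School?'].map(color_map)
    ax.scatter(data['Economic Need Index'], data['School Income Estimate'], c=colors)
    # Proxy artists: one marker-only handle per category, used only by the legend
    handles = [Line2D([], [], marker='o', linestyle='', color=color, label=label)
               for label, color in color_map.items()]
    ax.legend(handles=handles, title='Community School?')
    ax.set(xlabel='Economic Need', ylabel='School Income $',
           title='Economic Need vs. School Income')
    return fig, ax


# Synthetic stand-in for sc_income (values are made up for illustration)
rng = np.random.default_rng(0)
n_obs = 100
demo = pd.DataFrame({'Community School?': rng.choice(['Yes', 'No'], size=n_obs),
                     'Economic Need Index': rng.uniform(size=n_obs),
                     'School Income Estimate': rng.normal(loc=n_obs, size=n_obs)})
fig, ax = scatter_with_proxy_legend(demo, {'Yes': 'blue', 'No': 'red'})
```

Call plt.show() afterwards to display the figure. The legend entries come from the keys of color_map, so they stay in the order you define them rather than the order groupby returns.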

Or using seaborn and pairplot for example:

import seaborn as sns

# `height` replaces the `size` argument, which was renamed in seaborn 0.9
g = sns.pairplot(x_vars=['Economic Need Index'], y_vars=['School Income Estimate'], data=sc_income,
                 hue="Community School?", height=5)
g.set(xlabel='Economic Need', ylabel='School Income $', title='Economic Need vs. School Income')
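
Note that seaborn picks its own default colours for hue. If the blue/red coding from the question matters, one possibility is to pass an explicit palette. This is a sketch under assumptions: it uses sns.scatterplot on a single Axes instead of pairplot, and the palette dictionary and synthetic demo data are illustrative only.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Synthetic stand-in for sc_income (values are made up for illustration)
rng = np.random.default_rng(0)
n_obs = 100
demo = pd.DataFrame({'Community School?': rng.choice(['Yes', 'No'], size=n_obs),
                     'Economic Need Index': rng.uniform(size=n_obs),
                     'School Income Estimate': rng.normal(loc=n_obs, size=n_obs)})

# Assumed mapping from category value to colour, mirroring the question's coding
palette = {'Yes': 'blue', 'No': 'red'}

fig, ax = plt.subplots(figsize=(8, 5))
# hue splits the points by category; palette fixes the colour of each category
sns.scatterplot(data=demo, x='Economic Need Index', y='School Income Estimate',
                hue='Community School?', palette=palette, ax=ax)
ax.set(xlabel='Economic Need', ylabel='School Income $',
       title='Economic Need vs. School Income')
```

Call plt.show() afterwards to display the figure. seaborn adds the legend itself here, so no separate ax.legend() call is needed.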