I am currently working through the book Hands-On Machine Learning and am trying to replicate a visualization that plots the latitude and longitude coordinates of the California housing data as a scatter plot. I took the plot code from the book, shown below (matplotlib method), and would like to replicate the same visualization using plotnine. Could someone help me with the translation?

matplotlib method

# DATA INGEST -------------------------------------------------------------
import io

import pandas as pd
import requests

# Import the file from GitHub
url = "https://raw.githubusercontent.com/ageron/handson-ml2/master/datasets/housing/housing.csv" # Make sure the url is the raw version of the file on GitHub
download = requests.get(url).content

# Reading the downloaded content and turning it into a pandas dataframe
housing = pd.read_csv(io.StringIO(download.decode('utf-8')))

# Then plot
import matplotlib.pyplot as plt

# The size is now related to population divided by 100
# the colour is related to the median house value
housing.plot(kind="scatter", x="longitude", y="latitude", alpha=0.4, 
              s=housing["population"]/100, label="population", figsize=(10,7),
              c="median_house_value", cmap=plt.get_cmap("jet"), colorbar=True)
plt.legend()
plt.show()

plotnine method

from plotnine import ggplot, geom_point, aes, stat_smooth, scale_color_cmap

# Let's try the same thing in ggplot
(ggplot(housing, aes('longitude', 'latitude', size = "population", color = "median_house_value"))
 + geom_point(alpha = 0.1)
 + scale_color_cmap(name="jet"))
 

1 Answer

If your question is about the colour mapping, you were close: scale_color_cmap needs cmap_name='jet' instead of name='jet'.

If it is a broader styling question, the code below comes close to what you had with matplotlib.

matplotlib method (image omitted)

plotnine method (image omitted)

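import numpy as np
from plotnine import (
    ggplot, aes, geom_point, annotate, labs, guides, theme,
    theme_matplotlib, element_text, element_blank,
    scale_y_continuous, scale_color_cmap, scale_size_continuous,
)

# `housing` is the DataFrame loaded in the question
p = (ggplot(housing, aes(x='longitude', y='latitude', size='population', color='median_house_value'))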
  + theme_matplotlib()
  + geom_point(alpha=0.4)
  + annotate('text', x=-114.6, y=42, label='population', size=8)
  + annotate('point', x=-115.65, y=42, size=5, color='#6495ED', fill='#6495ED', alpha=0.8)
  + labs(x=None, color='Median house value')
  + scale_y_continuous(breaks=np.arange(34,44,2))
  + scale_color_cmap(cmap_name='jet')
  + scale_size_continuous(range=(0.05, 6))
  + guides(size=False)
  + theme(
        text = element_text(family='DejaVu Sans', size=8),
        axis_text_x = element_blank(),
        axis_ticks_minor=element_blank(),
        legend_key_height = 34,
        legend_key_width = 9,        
  )
 )
p

I am not sure how far the colour bar's formatting can be modified in plotnine. If others have additional ideas, I would be most interested; I think the matplotlib colour bar looks nicer.
