Complete the tasks below. Please turn in a single Jupyter notebook named 8_first_last.ipynb (substitute your first and last name). Please run Kernel > Restart & Run All on your notebook before turning in.
For this assignment, we will use a file containing daily precipitation data in La Jolla from February 2009 to November 2018, which were downloaded from NOAA using "standard" (imperial) units (inches for precipitation, feet for elevation). Download the file from GitHub.
- Import the CSV file as a Pandas DataFrame with default header and index.
- Change the 'DATE' column to timestamps using `pd.to_datetime()`. Alternatively, import the CSV file with `parse_dates` set to the columns you want to parse as datetime.
- What was the maximum daily precipitation (in inches) during this time period and when was it?
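The import, date conversion, and maximum lookup described above could be sketched as follows. The CSV text here is a made-up stand-in (the station ID and all values are hypothetical), so the printed result says nothing about the real La Jolla file.

```python
import io

import pandas as pd

# Hypothetical stand-in for the NOAA file; the real columns are wider
# and the real values will differ.
csv_text = """STATION,DATE,PRCP
STATION_X,2009-02-01,0.00
STATION_X,2009-02-02,0.45
STATION_X,2009-02-03,1.20
STATION_X,2009-02-04,0.10
"""

# Option 1: read with the default header and index, then convert the column.
df = pd.read_csv(io.StringIO(csv_text))
df['DATE'] = pd.to_datetime(df['DATE'])

# Option 2: let read_csv parse the column while reading.
df2 = pd.read_csv(io.StringIO(csv_text), parse_dates=['DATE'])

# Maximum daily precipitation and the date on which it occurred.
max_prcp = df['PRCP'].max()
max_date = df.loc[df['PRCP'].idxmax(), 'DATE']
print(max_prcp, max_date.date())
```

With the real file, replace `io.StringIO(csv_text)` with the path to the downloaded CSV.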
- We don't need the columns 'SNOW' and 'SNOW_ATTRIBUTES' because there was no recorded snow in the dataset. Delete those columns "in place".
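A minimal sketch of an "in place" deletion, assuming a DataFrame named `df`; the rows are invented and only the column names come from the assignment.

```python
import pandas as pd

# Hypothetical rows; only the column names follow the assignment.
df = pd.DataFrame({
    'DATE': pd.to_datetime(['2009-02-01', '2009-02-02']),
    'PRCP': [0.0, 0.45],
    'SNOW': [None, None],
    'SNOW_ATTRIBUTES': [None, None],
})

# inplace=True modifies df itself and returns None,
# so do not assign the result back to df.
df.drop(columns=['SNOW', 'SNOW_ATTRIBUTES'], inplace=True)
print(df.columns.tolist())
```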
- Find out about the sampling stations. Notice that the column values are similar between rows except 'DATE' and 'PRCP'. Explore these other columns using three different commands (we may not have covered all of them yet, but they are easy to use and you can always google them):
  - Use the `value_counts()` method for each of these columns: 'STATION', 'NAME', 'LATITUDE', 'LONGITUDE', and 'ELEVATION'. There should only be one category for each calculation because all the data come from the same station. To see what output looks like for a more diverse series, use the `value_counts()` method on 'PRCP' and 'PRCP_ATTRIBUTES'.
  - Make a DataFrame with just the columns 'STATION', 'NAME', 'LATITUDE', 'LONGITUDE', and 'ELEVATION' and use the method `drop_duplicates()` to see all the unique combinations of values in those five columns.
  - Create a groupby object using `groupby('STATION')`, then use the `count()` method on that groupby object to count the number of values for each station.
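The three commands above can be tried on a toy table first. This is only a sketch: the station ID, name, coordinates, and elevation below are placeholders, not the real station metadata.

```python
import pandas as pd

station_cols = ['STATION', 'NAME', 'LATITUDE', 'LONGITUDE', 'ELEVATION']

# Hypothetical single-station data; every value is a placeholder.
df = pd.DataFrame({
    'STATION': ['STATION_X'] * 4,
    'NAME': ['PLACEHOLDER NAME'] * 4,
    'LATITUDE': [32.8] * 4,
    'LONGITUDE': [-117.2] * 4,
    'ELEVATION': [100.0] * 4,
    'PRCP': [0.0, 0.0, 0.45, 1.2],
})

# 1. value_counts(): one entry per distinct value, with its frequency.
station_counts = df['STATION'].value_counts()
prcp_counts = df['PRCP'].value_counts()

# 2. drop_duplicates(): unique combinations of the five station columns.
unique_stations = df[station_cols].drop_duplicates()

# 3. groupby().count(): number of non-null values per column for each station.
counts_by_station = df.groupby('STATION').count()

print(station_counts)
print(unique_stations)
print(counts_by_station)
```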
- Create columns for 'YEAR', 'MONTH', 'DAY', and 'DAY_OF_YEAR'. Hint: use attributes such as the `.year` attribute, or use a regular expression, with a list comprehension.
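One way to follow the hint, assuming 'DATE' already holds timestamps. The sketch uses the `.dt` accessor for whole columns and shows the list-comprehension form for comparison; the three dates are arbitrary examples.

```python
import pandas as pd

# Arbitrary example dates; 'DATE' must already be a datetime column.
df = pd.DataFrame({
    'DATE': pd.to_datetime(['2009-02-01', '2012-12-31', '2018-11-15']),
    'PRCP': [0.0, 0.45, 1.2],
})

# Vectorized attribute access through the .dt accessor.
df['YEAR'] = df['DATE'].dt.year
df['MONTH'] = df['DATE'].dt.month
df['DAY'] = df['DATE'].dt.day
df['DAY_OF_YEAR'] = df['DATE'].dt.dayofyear

# Equivalent list comprehension using the .year attribute of each Timestamp.
years_alt = [ts.year for ts in df['DATE']]

print(df)
```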
- Use the Matplotlib function `plot()` to plot precipitation versus date as a single line, using a single color from ColorBrewer or xkcd (via Seaborn). Save your plot as a PDF file.
- Plot precipitation versus day of year for each year separately, with each year as a different colored line, the x-axis going from the beginning to the end of the year, and a legend indicating the year. Hint: use the 'DAY_OF_YEAR' and 'YEAR' columns you created in B3. Color using a qualitative color palette from Seaborn or ColorBrewer (via Seaborn). Save your plot as a PDF file.
- Create a set of subplots with a grid of 2 columns and enough rows to have one subplot per year (e.g., to show 10 years, your set of subplots would be 5 by 2). Plot precipitation versus day of year with each year as a separate subplot. Save your figure as a PDF file.
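The subplot grid can be sized from the number of years. The sketch below assumes Matplotlib and NumPy are installed and uses randomly generated data for three hypothetical years, so the figure shape is illustrative only. The `Agg` backend line and the in-memory buffer are choices made for this example; in a notebook you would normally drop the backend line and pass a filename to `savefig()`.

```python
import io
import math

import matplotlib
matplotlib.use('Agg')  # non-interactive backend; usually unnecessary in Jupyter
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic daily data for three hypothetical years.
dates = pd.date_range('2009-01-01', '2011-12-31', freq='D')
rng = np.random.default_rng(0)
df = pd.DataFrame({'DATE': dates, 'PRCP': rng.exponential(0.05, len(dates))})
df['YEAR'] = df['DATE'].dt.year
df['DAY_OF_YEAR'] = df['DATE'].dt.dayofyear

years = sorted(df['YEAR'].unique())
ncols = 2
nrows = math.ceil(len(years) / ncols)  # enough rows for one panel per year

fig, axes = plt.subplots(nrows, ncols, figsize=(10, 3 * nrows),
                         sharey=True, squeeze=False)
flat_axes = axes.ravel()

for ax, year in zip(flat_axes, years):
    one_year = df[df['YEAR'] == year]
    ax.plot(one_year['DAY_OF_YEAR'], one_year['PRCP'])
    ax.set_title(str(year))
    ax.set_xlim(1, 366)  # beginning to end of the year

# Hide any unused panel when the number of years is odd.
for ax in flat_axes[len(years):]:
    ax.set_visible(False)

fig.tight_layout()
pdf_buffer = io.BytesIO()  # stand-in for a PDF filename
fig.savefig(pdf_buffer, format='pdf')
```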
- Plot a histogram of precipitation values using the Matplotlib function `hist()`.
- Plot a histogram with kernel density and rugplot with the Seaborn function `distplot()`. Play around with the settings to make a histogram that represents the data well.
- Use groupby to group the data by year or by month. Which year was the rainiest? Which month was the rainiest?
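For the groupby question, one reading of "rainiest" is the largest total precipitation; that interpretation is an assumption of this sketch, and the five rows are invented.

```python
import pandas as pd

# Invented daily records with the 'YEAR' and 'MONTH' columns already derived.
df = pd.DataFrame({
    'DATE': pd.to_datetime(['2016-01-05', '2016-12-20',
                            '2017-01-10', '2017-01-11', '2017-07-01']),
    'PRCP': [0.5, 0.2, 1.0, 0.8, 0.0],
})
df['YEAR'] = df['DATE'].dt.year
df['MONTH'] = df['DATE'].dt.month

# Total precipitation per group.
total_by_year = df.groupby('YEAR')['PRCP'].sum()
total_by_month = df.groupby('MONTH')['PRCP'].sum()

# idxmax() returns the group label (year or month) with the largest total.
rainiest_year = total_by_year.idxmax()
rainiest_month = total_by_month.idxmax()
print(rainiest_year, rainiest_month)
```

Swapping `sum()` for `mean()` would rank groups by average daily precipitation instead.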
- Make boxplots by year and by month using the Seaborn function `boxplot()`. Hint: If you make a boxplot of your DataFrame without grouping, the boxplots will be centered on zero, because there are so many days with zero precipitation. Instead, use groupby to group the data by year and month (use a list containing these columns), set `as_index=False`, average over those groups, and save this as a new DataFrame; then use this for your boxplots.
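The grouping step in the hint could look like this, again on invented rows. The name `monthly_mean` is a hypothetical choice for the new DataFrame.

```python
import pandas as pd

# Invented daily records with 'YEAR' and 'MONTH' already derived.
df = pd.DataFrame({
    'YEAR':  [2016, 2016, 2017, 2017, 2017],
    'MONTH': [1, 12, 1, 1, 7],
    'PRCP':  [0.5, 0.2, 1.0, 0.8, 0.0],
})

# as_index=False keeps 'YEAR' and 'MONTH' as ordinary columns instead of
# moving them into the index, which is convenient for plotting.
monthly_mean = df.groupby(['YEAR', 'MONTH'], as_index=False)['PRCP'].mean()
print(monthly_mean)

# The result has one row per (year, month) pair and can be passed on, e.g.
# sns.boxplot(data=monthly_mean, x='YEAR', y='PRCP'), assuming Seaborn is
# imported as sns.
```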
- Use `pivot_table()` to produce a new DataFrame, where rows=years and columns=months, containing the mean precipitation of each month.
- Draw a heatmap of years x months where each square is a month colored by mean precipitation. Adjust the colormap to highlight months with heavy precipitation. Hint: Seaborn's `heatmap()` function makes this very easy.
- Stack the monthly precipitation table using `stack()`. View the values for 2017. Which month had the most precipitation in 2017? Describe the distribution statistics for all months using `describe()`. How many months are included in this dataset? What was the median monthly value (daily average) in this time period?
- Use `unstack()` to stack the precipitation pivot table by month and then year. View the values for December. Which year had the wettest December?
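The pivot, stack, and unstack steps above can be rehearsed on a tiny table before running them on the real data. Everything below is a sketch with invented values. Note that calling `unstack()` on a DataFrame whose index has a single level returns a Series indexed by (column label, row label), which is what gives the month-then-year ordering.

```python
import pandas as pd

# Invented daily records; real values will differ.
df = pd.DataFrame({
    'YEAR':  [2016, 2016, 2016, 2017, 2017, 2017, 2017],
    'MONTH': [1, 7, 12, 1, 1, 7, 12],
    'PRCP':  [0.10, 0.00, 0.30, 0.50, 0.30, 0.02, 0.05],
})

# Rows are years, columns are months, cells are mean daily precipitation.
pivot = df.pivot_table(values='PRCP', index='YEAR', columns='MONTH',
                       aggfunc='mean')
# sns.heatmap(pivot, cmap='Blues') would draw this table, assuming Seaborn
# is imported as sns; the colormap name is just one possible choice.

# stack(): Series indexed by (YEAR, MONTH).
stacked = pivot.stack()
values_2017 = stacked.loc[2017]
wettest_month_2017 = values_2017.idxmax()
summary = stacked.describe()  # count, mean, std, min, quartiles, max

# unstack(): Series indexed by (MONTH, YEAR).
by_month = pivot.unstack()
december = by_month.loc[12]
wettest_december = december.idxmax()

print(pivot)
print(summary)
print(wettest_month_2017, wettest_december)
```

In `summary`, the `count` entry is the number of year-month cells and the `50%` entry is the median.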