Complete the tasks below. Please turn in a single Jupyter notebook named 6_first_last.ipynb (substitute your first and last name). Please run Kernel > Restart & Run All on your notebook before turning in.
For this assignment, we will use Pandas to examine metadata from the Earth Microbiome Project.
First, download the metadata file for a 2,000-sample subset of the >27,000 samples in the Release 1 16S rRNA dataset.
curl -O "ftp://ftp.microbio.me/emp/release1/mapping_files/emp_qiime_mapping_subset_2k.tsv"
- Import the tab-separated values file emp_qiime_mapping_subset_2k.tsv as a DataFrame called df with default data types, with the first row as column labels (columns) and the first column as row labels (indexes).
- The indexes should be the sample IDs. How many samples are in this DataFrame? How many metadata columns?
- What are the minimum and maximum pH values in the dataset?
- What are the average and standard deviation temperature values in the dataset?
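The loading and summary steps above can be sketched on a tiny stand-in table. This is only an illustration, not the assignment solution: the sample IDs, the values, and the column name `ph` are made up, and the real file's column names should be checked with `df.columns`.

```python
import io

import pandas as pd

# Toy tab-separated text standing in for the real mapping file.
# Sample IDs, values, and the "ph" column name are hypothetical.
toy_tsv = (
    "#SampleID\tph\ttemperature_deg_c\n"
    "s1\t6.5\t20.0\n"
    "s2\t7.0\t\n"
    "s3\t8.1\t30.0\n"
)

# sep="\t" reads tab-separated values; index_col=0 uses the first column as row labels.
toy = pd.read_csv(io.StringIO(toy_tsv), sep="\t", index_col=0)

# shape is (number of rows, number of columns); the index column is not counted.
n_samples, n_columns = toy.shape

# Summary statistics skip NaN values by default.
ph_min, ph_max = toy["ph"].min(), toy["ph"].max()
temp_mean = toy["temperature_deg_c"].mean()
temp_std = toy["temperature_deg_c"].std()  # sample standard deviation (ddof=1)

print(n_samples, n_columns, ph_min, ph_max, temp_mean, temp_std)
```

With the real file, the same `read_csv` call would take the file path in place of the `io.StringIO` object.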
- Make a new Series called temp with the temperature column as its own Series object. Remove NaN values (np.nan) from this Series. How many values are left?
- Make a new DataFrame called df_seqs from columns sequences_split_libraries through observations_deblur_150bp (column positions 17-23) of the existing DataFrame. What is the mean value of column observations_deblur_90bp?
- Save df_seqs as a CSV file.
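As a rough sketch of the techniques these tasks rely on (dropping NaN values, slicing a run of columns, and writing CSV), the example below uses a made-up four-column table. The column names `a` through `d` are hypothetical, and the CSV is written to an in-memory buffer so that the sketch leaves no file behind.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical toy table; not the EMP metadata.
toy = pd.DataFrame(
    {"a": [1, 2, 3], "b": [4.0, np.nan, 6.0], "c": [7, 8, 9], "d": [10, 11, 12]},
    index=["s1", "s2", "s3"],
)

# Selecting one column gives a Series; dropna() returns a copy without NaN values.
b = toy["b"].dropna()

# Label-based slices include both endpoints ...
by_label = toy.loc[:, "b":"c"]
# ... while position-based slices exclude the stop position.
by_pos = toy.iloc[:, 1:3]

# to_csv accepts a file path or any file-like object.
buffer = io.StringIO()
by_label.to_csv(buffer)
csv_text = buffer.getvalue()

print(len(b))
print(csv_text)
```

Passing a file name such as `"out.csv"` (a hypothetical name) to `to_csv` instead of the buffer would write the table to disk.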
- Store the first 5 rows of df_seqs as a new DataFrame called df_seqs_head. Store the last 5 rows of df_seqs as a new DataFrame called df_seqs_tail.
- Concatenate df_seqs_head and df_seqs_tail using the concat() function.
- Append df_seqs_tail to df_seqs_head using the append() function. (Note: DataFrame.append() was deprecated in pandas 1.4 and removed in pandas 2.0, so this step only runs on older pandas versions; on newer versions, concat() gives the same result.)
- Make a new DataFrame called df_phys with the pH, temperature, and salinity columns (hint: you will need to know the exact column names; these are some of the last few columns). Make another new DataFrame called df_empo with the column empo_3 (note: this will actually be a Series because it has only one column, but you can treat it like a DataFrame).
- Merge df_phys with df_seqs using the merge() function on the indexes of both DataFrames to make a new DataFrame called df_merged.
- Join df_merged with df_empo using the join() function and store the result as df_merged.
- Use a list comprehension to add a new column to df_merged called temperature_deg_f which takes the values in temperature_deg_c and converts them to degrees Fahrenheit.
- Create a function that changes a single numerical value from Celsius to Fahrenheit. Apply this function to the values in temperature_deg_c using apply() to create a new column called temperature_deg_f_2.
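Both approaches use the standard conversion F = C × 9/5 + 32. A minimal sketch on a made-up three-row table follows; the function name `c_to_f` and the output column names are hypothetical, chosen only for this illustration.

```python
import pandas as pd


def c_to_f(celsius):
    """Convert a single Celsius value to Fahrenheit."""
    return celsius * 9 / 5 + 32


# Hypothetical toy temperatures in degrees Celsius.
toy = pd.DataFrame({"temperature_deg_c": [0.0, 100.0, -40.0]})

# List comprehension: build a plain list, then assign it as a new column.
toy["f_listcomp"] = [c * 9 / 5 + 32 for c in toy["temperature_deg_c"]]

# Series.apply calls the function once per value and returns a new Series.
toy["f_apply"] = toy["temperature_deg_c"].apply(c_to_f)

print(toy)
```

NaN inputs pass through both versions as NaN, since arithmetic on NaN yields NaN.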
- Sort the rows in df_merged by sequences_split_libraries values from high to low and store the result as df_merged. (Hint: you can use inplace=True.)
- Sort the columns in df_merged by column name from A to Z and store the result as df_merged. (Hint: you can use inplace=True.)
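The two kinds of sorting can be sketched on a made-up table as follows. The column names are hypothetical. One detail worth knowing about the hint: with `inplace=True` the method modifies the DataFrame and returns `None`, so its return value should not be assigned back to the variable.

```python
import pandas as pd

# Hypothetical toy table with columns deliberately out of alphabetical order.
toy = pd.DataFrame({"b": [2, 9, 5], "a": ["x", "y", "z"]}, index=["s1", "s2", "s3"])

# Sort rows by the values in one column, from high to low.
by_value = toy.sort_values("b", ascending=False)

# Sort columns by their labels; axis=1 means "along the columns".
by_name = by_value.sort_index(axis=1)

# With inplace=True the DataFrame itself is changed and the call returns None.
toy_copy = toy.copy()
result = toy_copy.sort_values("b", ascending=False, inplace=True)

print(by_name)
print(result)
```

Either style works; reassigning the returned DataFrame (as with `by_value`) and using `inplace=True` lead to the same ordering.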