0

I got my data set from here. When I first read in the data set and run head() to double-check it, the output shows the first 7 variables, but then I get:

Variables not shown: Status 11/18 (chr), Location 11/18 (chr), Age 11/18 (dbl), age grp (chr), Gender (chr),
  Ethnic (chr), Prev Relig Aff (chr), Adult/Minor (chr), Resident US Pre-Guyana (chr), Occup US Pre Guyana
  (chr), Govt Income (chr), JT Residence (chr), Occup in JT ~77 (chr), Occup JT ~Aug 78 (chr)

Here is my code

library(readxl)
require(mosaic)
Jonestown = read_excel("C:/Users/Deborah/Desktop/School/STA 418/Homework/jonestown.xls", sheet = 1, col_names = TRUE, skip=0)
head(Jonestown)

Next, I need to create a data set named minors that includes only
(a) people identified as a minor
(b) who were born in the United States and
(c) has only the variables Birth State, Guyana Entry, Status 11/18, Age 11/18, Gender and Ethnic.

You should end up with 293 observations and 6 variables. This is what I have so far:

minor = Jonestown$`Adult/Minor`=="Minor" & Jonestown$`Birth Country`=="USA"
Minors = Jonestown[minor,]

I am not sure where to go next. Can someone help me?

2 Answers

1

Use this package instead. Works for me:

install.packages("xlsx")  # one-time install; xlsx needs Java (via rJava)
library(xlsx)
Jonestown = read.xlsx("C:/Users/Deborah/Desktop/School/STA 418/Homework/jonestown.xls", 1)  # 1 = first sheet

0

First, the output in your first question ("Variables not shown") is perfectly normal and expected when dealing with read_excel. More to the point, you may notice (with experimentation) that read_excel returns an object of class tbl_df, something from the Hadleyverse of packages that intends to provide a more intuitive/graceful presentation and handling of data.frames. When printing, it tries very hard to limit the output to a "peek" at the contents of a data.frame, based on the width (number of characters) of your current window. (Of course, it only does this if you have previously done library(dplyr).)
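If you do want to see every column, here are a few ways to do it. This is a minimal sketch on a made-up stand-in (`toy` is a hypothetical name and the values are invented; only the column names come from the question), and it assumes a reasonably recent dplyr, where tibble() and glimpse() are available after library(dplyr).

```r
library(dplyr)

# Hypothetical stand-in for the spreadsheet: invented values,
# column names copied from the question.
toy <- tibble(
  `Birth State` = c("CA", "MI"),
  `Adult/Minor` = c("Minor", "Adult"),
  `Age 11/18`   = c(12, 40)
)

print(toy, width = Inf)    # print all columns, ignoring the console width
glimpse(toy)               # one line per column: name, type, first values
names(toy)                 # just the column names
as.data.frame(head(toy))   # fall back to plain data.frame printing
```

With the real data you would apply the same calls to the object returned by read_excel.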

Second, for filtering, since you're already using one component of the Hadleyverse, I'll suggest a second (though it is not strictly necessary here):

library(dplyr)
Jonestown %>%
    filter(`Adult/Minor` == 'Minor', `Birth Country` == 'USA') %>%
    select(`Birth State`, `Guyana Entry`)
## Source: local data frame [293 x 2]
##    Birth State Guyana Entry
##          (chr)       (time)
## 1           CA         <NA>
## 2           MI   1977-09-23
## 3           MS   1977-08-28
## 4           CA         <NA>
## 5           CA   1977-07-23
## 6           CA         <NA>
## ...

This is just a start; you should be able to bring in the other variables you want. I strongly recommend a dplyr tutorial, such as the vignette that comes with it.
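For reference, a sketch of the whole pipeline with all six variables might look like the following. It runs on a small made-up tibble (`toy`, with placeholder values), not on the real spreadsheet, so the row count here has nothing to do with the 293 you are expected to get. Only the column names are taken from the question.

```r
library(dplyr)

# Made-up rows with placeholder values; only the column names
# mirror the ones quoted in the question.
toy <- tibble(
  `Birth State`   = c("CA", "MI", "TX", NA),
  `Birth Country` = c("USA", "USA", "Guyana", "USA"),
  `Guyana Entry`  = as.Date(c("1977-07-23", "1977-09-23", NA, "1977-08-28")),
  `Status 11/18`  = c("s1", "s2", "s1", "s2"),
  `Age 11/18`     = c(12, 45, 9, 15),
  Gender          = c("F", "M", "F", "M"),
  Ethnic          = c("e1", "e2", "e1", "e2"),
  `Adult/Minor`   = c("Minor", "Adult", "Minor", "Minor")
)

minors <- toy %>%
  # keep rows where both conditions are TRUE (rows evaluating to NA are dropped)
  filter(`Adult/Minor` == "Minor", `Birth Country` == "USA") %>%
  # keep only the six requested columns, in this order
  select(`Birth State`, `Guyana Entry`, `Status 11/18`,
         `Age 11/18`, Gender, Ethnic)

dim(minors)   # number of rows, then number of columns
```

One difference from the base R attempt in the question: filter() silently drops rows where a condition evaluates to NA, whereas indexing with a logical vector that contains NA returns all-NA rows for those positions, so the two approaches can give different row counts if either column has missing values.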

NB: because the column names are not strictly legal (too harsh a word?) for R (see ?data.frame and ?make.names), you need to use the backtick instead of single or double quotes.
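To make the naming point concrete, here is a small base R sketch. The data frame `toy` and its values are made up for illustration; make.names() shows what R would turn these names into if it were asked to make them syntactically valid.

```r
nms <- c("Birth State", "Adult/Minor", "Age 11/18", "Gender")
make.names(nms)   # spaces and slashes are replaced by dots

# check.names = FALSE keeps the original (non-syntactic) names
toy <- data.frame(`Adult/Minor` = c("Minor", "Adult"),
                  `Age 11/18`   = c(12, 40),
                  check.names = FALSE)

toy$`Adult/Minor`       # with $, a non-syntactic name needs backticks
toy[["Adult/Minor"]]    # with [[ ]] or [ ], an ordinary string works

# Base R version of "filter, then select", using strings throughout
toy[toy[["Adult/Minor"]] == "Minor", "Age 11/18", drop = FALSE]
```

So backticks matter wherever a column is written as a bare name (after $, or inside filter()); where a function accepts the name as a character string, ordinary quotes are fine.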
