
I have a data frame with a character column of date-times.

When I use as.Date, most of my strings are parsed correctly, except for a few instances. The example below will hopefully show you what is going on.

# my attempt to parse the string to Date -- uses the stringr package
prods.all$Date2 <- as.Date(str_sub(prods.all$Date, 1, 
                str_locate(prods.all$Date, " ")[1]-1), 
                "%m/%d/%Y")

# grab two rows to highlight my issue
temp <- prods.all[c(1925:1926), c(1,8)]
temp
#                    Date      Date2
# 1925  10/9/2009 0:00:00 2009-10-09
# 1926 10/15/2009 0:00:00 0200-10-15

As you can see, the year of some of the dates is wrong. The pattern seems to occur when the day has two digits.

Any help you can provide will be greatly appreciated.

    The reason you are getting the invalid 0200 date is that the day has a different number of characters (two digits for 15-Oct, one digit for 9-Oct), and your substring code does not account for that: str_locate(prods.all$Date, " ")[1] is the position of the space in the first row only, and that single position is reused for every row. At any rate, you can probably use as.Date or strptime directly with the format argument, without processing the characters further. Commented Nov 30, 2010 at 4:21
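A minimal reproduction of that cause, using the two sample strings from the question and assuming stringr is installed. str_locate() returns a matrix with one row per string, so `[1]` keeps only the first string's space position; taking the whole "start" column gives each string its own cut point. The variable names are illustrative only.

```r
library(stringr)

x <- c("10/9/2009 0:00:00", "10/15/2009 0:00:00")

# str_locate() returns a matrix with one row per string (columns "start" and "end")
pos <- str_locate(x, " ")
pos[1]          # 10: only the first string's space position
pos[, "start"]  # 10 11: one position per string

# Reusing pos[1] for every row truncates the longer string to "10/15/200" (year 200)
bad <- as.Date(str_sub(x, 1, pos[1] - 1), "%m/%d/%Y")

# Using each row's own position keeps the full four-digit year
good <- as.Date(str_sub(x, 1, pos[, "start"] - 1), "%m/%d/%Y")

# Or skip the substring step entirely: characters after the format are ignored
direct <- as.Date(x, format = "%m/%d/%Y")
```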

4 Answers


The easiest way is to use lubridate:

library(lubridate)
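prods.all$Date2 <- mdy_hms(prods.all$Date)

Because the strings include a time, mdy_hms() is the matching parser; it returns objects of class POSIXct and will work with either factors or character vectors. For strings that contain only a date, mdy() returns a Date.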
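A minimal sketch on the two sample strings from the question, assuming lubridate is installed. Because the strings include a time of day, it uses mdy_hms(); as.Date() then drops the time if only the calendar date is wanted. The small data frame stands in for prods.all, and the DateTime column name is hypothetical.

```r
library(lubridate)

# Stand-in for prods.all, built from the sample strings in the question
prods.all <- data.frame(Date = c("10/9/2009 0:00:00", "10/15/2009 0:00:00"),
                        stringsAsFactors = FALSE)

# mdy_hms() parses month/day/year hour:minute:second and returns POSIXct (UTC by default)
prods.all$DateTime <- mdy_hms(prods.all$Date)

# as.Date() keeps only the calendar date
prods.all$Date2 <- as.Date(prods.all$DateTime)

prods.all
```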


3 Comments

I'll also mention the existence of functions like ymd(), ymd_hms(), mdy_hms(), etc. in that library, which handle date and time fields together. Awesome library, by the way. Hats off to you...
lubridate is an awesome package. I'm still using it in 2018 and can't get enough of it. There is a lubridate cheat sheet at github.com/rstudio/cheatsheets/raw/master/lubridate.pdf
@hadley When I am King, you shall be knighted.

You may be overcomplicating things. Is there any reason you need the stringr package? You can use as.Date and its format argument to specify the input format of your string.

df <- data.frame(Date = c("10/9/2009 0:00:00", "10/15/2009 0:00:00"))
as.Date(df$Date, format = "%m/%d/%Y %H:%M:%S")
# [1] "2009-10-09" "2009-10-15"

Note the Details section of ?as.Date:

Character strings are processed as far as necessary for the format specified: any trailing characters are ignored

Thus, this also works:

as.Date(df$Date, format = "%m/%d/%Y")
# [1] "2009-10-09" "2009-10-15"

All the conversion specifications that can be used to specify the input format are found in the Details section in ?strptime. Make sure that the order of the conversion specification as well as any separators correspond exactly with the format of your input string.
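To illustrate why the order and separators must match, here is a small sketch using the question's sample strings with one correct format and two deliberately wrong ones; the variable names are illustrative only.

```r
x <- c("10/9/2009 0:00:00", "10/15/2009 0:00:00")

# Order and separators match the input: both parse, the trailing time is ignored
ok <- as.Date(x, format = "%m/%d/%Y")

# Wrong separator ("-" instead of "/"): no string matches, so every result is NA
wrong_sep <- as.Date(x, format = "%m-%d-%Y")

# Wrong order (day first): "10/9/2009" becomes 10 September,
# and "10/15/2009" would need a month 15, so it becomes NA
wrong_order <- as.Date(x, format = "%d/%m/%Y")
```

Note that a mismatch gives NA rather than an error, so it is worth checking for NAs after a conversion.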


More generally, if you also need the time component, use as.POSIXct or strptime:

as.POSIXct(df$Date, format = "%m/%d/%Y %H:%M:%S")
strptime(df$Date, "%m/%d/%Y %H:%M:%S")

I'm guessing at what your actual data might look like from the partial results you give.
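A short sketch contrasting the two classes on the sample strings. It assumes tz = "UTC" so the result does not depend on the local time zone (the calls above use the local zone by default), and the When column is a hypothetical name. The format is passed by name to as.POSIXct() because its second positional argument is tz.

```r
x <- c("10/9/2009 0:00:00", "10/15/2009 0:00:00")
fmt <- "%m/%d/%Y %H:%M:%S"

# as.POSIXct(): numeric seconds since 1970, compact and data-frame friendly
ct <- as.POSIXct(x, format = fmt, tz = "UTC")

# strptime(): POSIXlt, a list-based representation of the same instants
lt <- strptime(x, fmt, tz = "UTC")

class(ct)  # "POSIXct" "POSIXt"
class(lt)  # "POSIXlt" "POSIXt"

# Convert POSIXlt to POSIXct before storing it as a data frame column
df <- data.frame(Date = x, stringsAsFactors = FALSE)
df$When <- as.POSIXct(lt)
```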

5 Comments

I would caution against strptime because it returns a POSIXlt object, which tends to give new users fits because they don't realize it's a list. If you need the time, use as.POSIXct but beware if your "dates" are really factors...
True, but since R 2.11.0 "length(<POSIXlt>) now returns the length of the corresponding abstract timedate-vector rather than always 9 (the length of the underlying list structure). (Wish of PR#14073 and PR#10507.)" so I wondered if that was worth complicating things with. You can just as.POSIXct(strptime(x)) anyway.
I didn't realize that. Thanks for the pointer. Though I wonder if it could still be confusing if you have a POSIXlt column in a data.frame...
I realized afterwards that it's not completely helpful: in a data.frame you will still get into trouble, though I think it's possible to put lists and arrays etc. in data.frames as columns. But I think it's better to understand the difference between lt and ct and use them carefully.
This seems misleading to me since the Date class that as.Date returns does not actually handle time. The answer implies that it does.

If you don't know the format, you could use anytime::anydate, which tries to match common formats:

library(anytime)

date <- c("01/01/2000 0:00:00", "Jan 1, 2000 0:00:00", "2000-Jan-01 0:00:00")

anydate(date)
# [1] "2000-01-01" "2000-01-01" "2000-01-01"
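For comparison, the same idea can be sketched in base R by trying a list of candidate formats in order and filling in only the elements that are still NA. This is not how anytime itself is implemented; the helper parse_any_date() and the format list are hypothetical, and the locale is set to "C" because %b month names are locale-dependent.

```r
# Try each candidate format in turn; later formats only fill elements still NA
parse_any_date <- function(x, formats) {
  out <- as.Date(x, format = formats[1])
  for (f in formats[-1]) {
    todo <- is.na(out)
    out[todo] <- as.Date(x[todo], format = f)
  }
  out
}

# %b matches month abbreviations of the current locale; "C" gives "Jan", "Feb", ...
# (note: this changes the session's LC_TIME setting)
invisible(Sys.setlocale("LC_TIME", "C"))

x <- c("01/01/2000 0:00:00", "Jan 1, 2000 0:00:00", "2000-Jan-01 0:00:00")
parse_any_date(x, c("%m/%d/%Y", "%b %d, %Y", "%Y-%b-%d"))
```

Because the first matching format wins, an ambiguous string such as "01/02/2000" is read month-first here; the order of the format list encodes that choice.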


Using lubridate: if your dates look like '04/24/2017 05:35:00', first change the slashes to dashes:

library(lubridate)
prods.all$Date2 <- gsub("/", "-", prods.all$Date)

then parse the result with the matching order:

prods.all$Date2 <- parse_date_time(prods.all$Date2, orders = "mdy hms")

1 Comment

There's no need to change slashes to dashes, parse_date_time will parse it either way
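A minimal check of that comment, assuming lubridate is installed. The first string is the example above and the second comes from the question; the orders string uses upper-case H, M and S for hours, minutes and seconds, and the variable names are illustrative only.

```r
library(lubridate)

x <- c("04/24/2017 05:35:00", "10/15/2009 0:00:00")

# Parse the slash-separated strings directly, without a gsub() step
direct <- parse_date_time(x, orders = "mdy HMS")

# Replacing "/" with "-" first gives the same instants
dashed <- parse_date_time(gsub("/", "-", x), orders = "mdy HMS")
```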
