I have been trying to clean a dataset named logbook.csv, which records fuel usage by users around the world. The first step is to clean a column named "date_fueled", which holds the date on which each user purchased fuel. Most values are dates in a format such as "Apr 12 2020", but the column also contains non-date values that include commas, e.g. "Cooling System, Heating System, Lights, Spark Plugs". I have tried to clean this column using several libraries (lubridate, parsedate, dplyr and readr), but I keep getting either errors or all of my dates turned into NA. I restarted RStudio to start over and noticed that I get a warning message right after importing the dataset.

The warning message is as follows:

> library(readr)
> logbook <- read_csv("C:/Users/theet/Downloads/logbook.csv")
Rows: 1174870 Columns: 9                                                                                                
── Column specification ─────────────────────────────────────────────────────────────
Delimiter: ","
chr (5): date_fueled, date_captured, cost_per_gallon, total_spent, user_url
dbl (3): gallons, mpg, miles
num (1): odometer

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Warning message:
One or more parsing issues, call `problems()` on your data frame for details, e.g.:
  dat <- vroom(...)
  problems(dat) 
> View(logbook)

After reading the above, I ran "problems(logbook)" and received the following output:

problems(logbook)
# A tibble: 398 × 5
     row   col expected actual     file                                
   <int> <int> <chr>    <chr>      <chr>                               
 1  5409     4 a double 8,583.478  C:/Users/theet/Downloads/logbook.csv
 2  5790     8 a double 1,182.5    C:/Users/theet/Downloads/logbook.csv
 3  9681     8 a double 1,888.2    C:/Users/theet/Downloads/logbook.csv
 4 12023     4 a double 10,738.000 C:/Users/theet/Downloads/logbook.csv
 5 12140     7 a double 1,049.2    C:/Users/theet/Downloads/logbook.csv
 6 12140     8 a double 2,713.3    C:/Users/theet/Downloads/logbook.csv
 7 13609     8 a double 132,388.0  C:/Users/theet/Downloads/logbook.csv
 8 16234     4 a double 2,817.502  C:/Users/theet/Downloads/logbook.csv
 9 20879     4 a double 16,378.667 C:/Users/theet/Downloads/logbook.csv
10 26262     8 a double 49,725.2   C:/Users/theet/Downloads/logbook.csv
# ℹ 388 more rows
# ℹ Use `print(n = ...)` to see more rows
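
The `actual` values shown above each contain a comma as a thousands separator (for example "8,583.478"), which readr's double parser does not accept. A minimal sketch of one way to read such columns, assuming the affected columns are the three that readr guessed as `dbl` (gallons, mpg, miles): declare them with `col_number()`, which ignores grouping marks. The inline CSV text and the `sample_log` name are hypothetical stand-ins for the real file, and only two of the three columns are included to keep the example short.

```r
library(readr)

# Hypothetical two-row stand-in for logbook.csv; only the column names
# come from the column specification printed by read_csv() above.
csv_text <- 'date_fueled,gallons,miles
Apr 12 2020,12.5,310.2
May 3 2020,"8,583.478","1,182.5"'

sample_log <- read_csv(
  I(csv_text),                       # I() marks the string as literal data
  col_types = cols(
    date_fueled = col_character(),   # keep dates as text for now
    gallons     = col_number(),      # col_number() drops the "," grouping mark
    miles       = col_number()
  )
)

print(sample_log)
```

For the real file, the same `col_types` argument could be passed to the existing `read_csv()` call, with `mpg = col_number()` added; whether every one of the 398 reported problems is of this kind is not something this sketch checks.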

The link to my dataset is: https://drive.google.com/file/d/18TbpdmNS7hsBtUU-wkItEK9IBEfy9Hqr/view?usp=drive_link

Here is the code I wrote using the lubridate library:

library(parsedate)
library(lubridate)
library(dplyr)
library(readr)

logbook2 <- read_csv("C:/Users/theet/Downloads/logbook.csv")

# Convert date_fueled to actual date objects
logbook2  <- logbook2 %>% 
  mutate(date_fueled = as.Date(date_fueled, format = "%b %d %Y")

# Replace NA values in date_fueled with NA

logbook2 <- logbook2 %>% 
  mutate(date_fueled = ifelse(is.na(date_fueled), NA, date_fueled))

head(logbook2)

The above code gave me this error:

Error: unexpected symbol in:
"#Replace NA values in date_fueled with NA
logbook2"
> 

Please help me fix this error, and let me know if there are any additional mistakes in my code.

  • This is confusing because you are replacing NA with NA. Is it possible that you mean replace "NA" with NA? That is, does the variable currently hold a string rather than a true NA? Commented Mar 27, 2024 at 12:34
  • I meant the non-date values in the date_fueled column. Commented Mar 27, 2024 at 14:26
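
A minimal sketch of one possible fix, not a definitive answer. The "unexpected symbol" error is consistent with the first `mutate()` call in the question missing its final closing parenthesis, so R treats the next `logbook2` as part of the same unfinished call. The second `mutate()` step can probably be dropped: `as.Date()` with an explicit `format` already returns NA for strings that do not match, and `ifelse()` would strip the Date class and leave plain numbers. The three sample values below are a hypothetical stand-in for the real column, and `%b` assumes an English locale for month abbreviations.

```r
library(dplyr)

# Hypothetical sample standing in for the real date_fueled column
logbook2 <- tibble(
  date_fueled = c(
    "Apr 12 2020",
    "Cooling System, Heating System, Lights, Spark Plugs",
    "Jan 5 2019"
  )
)

logbook2 <- logbook2 %>%
  # Both parentheses are closed here: one for as.Date(), one for mutate().
  # Strings that do not match "%b %d %Y" become NA automatically.
  mutate(date_fueled = as.Date(date_fueled, format = "%b %d %Y"))

print(logbook2)

# Count how many values could not be parsed as dates
sum(is.na(logbook2$date_fueled))
```

If all dates still come back as NA on the real data, the month names may not match the session's locale, or the column may contain extra whitespace; both are guesses that would need checking against the file.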
