CSV file data manipulation in R

I have been trying to clean a dataset named logbook.csv, which records the fuel usage of users worldwide. The first step is to clean a column named "date_fueled", which holds the date on which each user purchased fuel. Most values are dates in a format such as "Apr 12 2020", but the column also contains non-date values that include commas, e.g. "Cooling System, Heating System, Lights, Spark Plugs". I have tried to clean this data with several libraries (lubridate, parsedate, dplyr and readr), but I either get errors or all my dates are turned into NA values. After restarting RStudio and starting over, I noticed that I get a warning message as soon as I import the dataset.

The warning message is as follows:

> library(readr)
> logbook <- read_csv("C:/Users/theet/Downloads/logbook.csv")
Rows: 1174870 Columns: 9                                                                                                
── Column specification ─────────────────────────────────────────────────────────────
Delimiter: ","
chr (5): date_fueled, date_captured, cost_per_gallon, total_spent, user_url
dbl (3): gallons, mpg, miles
num (1): odometer

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Warning message:
One or more parsing issues, call `problems()` on your data frame for details, e.g.:
  dat <- vroom(...)
  problems(dat) 
> View(logbook)

After reading the above, I ran problems() on my data frame and received the following feedback:

problems(logbook)
# A tibble: 398 × 5
     row   col expected actual     file                                
   <int> <int> <chr>    <chr>      <chr>                               
 1  5409     4 a double 8,583.478  C:/Users/theet/Downloads/logbook.csv
 2  5790     8 a double 1,182.5    C:/Users/theet/Downloads/logbook.csv
 3  9681     8 a double 1,888.2    C:/Users/theet/Downloads/logbook.csv
 4 12023     4 a double 10,738.000 C:/Users/theet/Downloads/logbook.csv
 5 12140     7 a double 1,049.2    C:/Users/theet/Downloads/logbook.csv
 6 12140     8 a double 2,713.3    C:/Users/theet/Downloads/logbook.csv
 7 13609     8 a double 132,388.0  C:/Users/theet/Downloads/logbook.csv
 8 16234     4 a double 2,817.502  C:/Users/theet/Downloads/logbook.csv
 9 20879     4 a double 16,378.667 C:/Users/theet/Downloads/logbook.csv
10 26262     8 a double 49,725.2   C:/Users/theet/Downloads/logbook.csv
# ℹ 388 more rows
# ℹ Use `print(n = ...)` to see more rows
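
The values flagged by problems() all use a comma as a thousands separator (for example 8,583.478), which the default double parser rejects. One way to read such columns, sketched here under assumptions, is readr's col_number(), which ignores grouping marks. The inline CSV text and the choice of gallons and miles as the affected columns are invented for illustration, since the real file's column order is not shown above.

```r
library(readr)

# Toy stand-in for logbook.csv; these rows are invented for illustration.
csv_text <- I('date_fueled,gallons,miles
"Apr 12 2020","8,583.478","1,182.5"
"May 3 2021",12.5,310.2
')

# col_number() drops grouping marks such as "," before converting to numeric.
logbook <- read_csv(
  csv_text,
  col_types = cols(
    date_fueled = col_character(),
    gallons     = col_number(),
    miles       = col_number()
  )
)

problems(logbook)   # inspect any remaining parsing issues
logbook$gallons
```

Declaring the types up front also silences the column specification message, because readr no longer has to guess.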

The link to my dataset is: https://drive.google.com/file/d/18TbpdmNS7hsBtUU-wkItEK9IBEfy9Hqr/view?usp=drive_link

Here is the code I wrote using the lubridate library:

library(parsedate)
library(lubridate)
library(dplyr)
library(readr)

logbook2 <- read_csv("C:/Users/theet/Downloads/logbook.csv")

# Convert date_fueled to actual date objects
logbook2  <- logbook2 %>% 
  mutate(date_fueled = as.Date(date_fueled, format = "%b %d %Y")

# Replace NA values in date_fueled with NA

logbook2 <- logbook2 %>% 
  mutate(date_fueled = ifelse(is.na(date_fueled), NA, date_fueled))

head(logbook2)

The above code gave me this error:

Error: unexpected symbol in:
"#Replace NA values in date_fueled with NA
logbook2"
>

Please help me fix this error, and let me know if there are any other mistakes in my code.
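
The "unexpected symbol" error comes from the first mutate() call, which is missing its final closing parenthesis, so R keeps reading into the next statement. The second mutate() is also unnecessary: values that do not match the format already become NA, and ifelse() strips the Date class, leaving plain numbers. A minimal sketch of the corrected step, using an invented three-row stand-in for the real column and assuming an English locale for the %b month abbreviations:

```r
library(dplyr)

# Toy stand-in for the date_fueled column; values are invented for illustration.
logbook2 <- tibble(
  date_fueled = c("Apr 12 2020",
                  "Cooling System, Heating System, Lights, Spark Plugs",
                  "Jan 3 2019")
)

# Both mutate() and as.Date() are closed here; strings that do not
# match "%b %d %Y" are returned as NA rather than raising an error.
logbook2 <- logbook2 %>%
  mutate(date_fueled = as.Date(date_fueled, format = "%b %d %Y"))

logbook2$date_fueled
class(logbook2$date_fueled)
```

If the session runs in a non-English locale, the month abbreviations may fail to match and every value would come back as NA, which could explain the all-NA results described earlier.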

edited Mar 28, 2024 at 0:13 by Phil

asked Mar 27, 2024 at 7:01 by tshepo

Comments:

This is confusing because you are replacing NA with NA. Is it possible that you mean to replace "NA" with NA? That is, does the variable currently hold the string "NA" rather than a true NA?
– Elin, Mar 27, 2024 at 12:34

I meant the non-date values in the date_fueled column.
– tshepo, Mar 27, 2024 at 14:26
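
If the goal is to deal with the non-date values specifically, one option is to parse into a separate column so the original text stays available for inspection. This is only a sketch: the sample rows and the names date_parsed, non_dates and clean are hypothetical, and an English locale is assumed for %b.

```r
library(dplyr)

# Invented sample rows standing in for the real date_fueled column.
logbook2 <- tibble(
  date_fueled = c("Apr 12 2020",
                  "Cooling System, Heating System, Lights, Spark Plugs",
                  "Jan 3 2019")
)

# Parse into a new column (hypothetical name) and keep the raw text.
logbook2 <- logbook2 %>%
  mutate(date_parsed = as.Date(date_fueled, format = "%b %d %Y"))

# Rows whose original text was present but did not parse as a date.
non_dates <- logbook2 %>%
  filter(is.na(date_parsed) & !is.na(date_fueled))

# Rows with a valid date, ready for further analysis.
clean <- logbook2 %>%
  filter(!is.na(date_parsed))

non_dates$date_fueled
nrow(clean)
```

Looking at non_dates before dropping anything shows whether the failures are genuine non-date entries or dates written in a different format.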
