pandas read_csv parse dates

Question

I have written this date parsing function:

def date_parser(string):
   try:
       date = pd.datetime.strptime(string, "%d/%m/%Y")
   except:
       date = pd.NaT
   return date

and I call it in pd.read_csv like this:

df = pd.read_csv(os.path.join(path, file),
                 sep=";",
                 encoding="latin-1",
                 keep_default_na=False,
                 na_values=na_values,
                 index_col=False,
                 usecols=keep,
                 dtype=dtype,
                 date_parser=date_parser,
                 parse_dates=dates)

The problem is that in one of my date columns, I end up with mixed data types:

df[data].apply(type).value_counts()

class 'datetime.datetime'
class 'pandas._libs.tslibs.timestamps.Timestamp'
class 'pandas._libs.tslibs.nattype.NaTType'

I should only have the last two, right?

jezrael · Accepted Answer · 2019-12-05 11:24:43Z

I suggest replacing the body of your function with to_datetime and errors='coerce', which returns NaT for any value that does not match the format %d/%m/%Y:

def date_parser(string):
   return pd.to_datetime(string, format="%d/%m/%Y", errors='coerce')
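
To check the effect, here is a minimal sketch that applies the same to_datetime call to a column after reading it as plain strings. The sample CSV text and the column name "date" are made up for illustration; they are not from the question.

```python
import io

import pandas as pd

# Hypothetical sample data: two valid dates, one invalid string, one empty value.
csv_text = "id;date\n1;05/12/2019\n2;not a date\n3;\n4;31/01/2020\n"

# Read the date column as plain strings first, so nothing is parsed implicitly.
df = pd.read_csv(
    io.StringIO(csv_text),
    sep=";",
    dtype={"date": str},
    keep_default_na=False,
)

# Convert in one vectorized call; values that do not match become NaT.
df["date"] = pd.to_datetime(df["date"], format="%d/%m/%Y", errors="coerce")

# Collect the Python type names of the individual values in the column.
type_names = set(df["date"].apply(type).map(lambda t: t.__name__))
print(df["date"].dtype)
print(type_names)
```

Because the whole column is converted at once, it ends up with a datetime64 dtype, so individual values come back as Timestamp or NaT rather than datetime.datetime. Note also that in pandas 2.0 and later the date_parser argument of read_csv is deprecated in favour of date_format, so converting after reading, as sketched here, may be the more portable option.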
