
I need to run pd.read_sql with no date parsing.

In the documentation for pd.read_sql, the parse_dates parameter is described as accepting a "Dict of {column_name: arg dict}, where the arg dict corresponds to the keyword arguments of pandas.to_datetime(). Especially useful with databases without native Datetime support, such as SQLite."

According to the to_datetime documentation, the default is errors='raise', so I expected that changing it to errors='ignore' or errors='coerce' would fix this issue.
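A minimal sketch of the dict form of parse_dates on an in-memory SQLite database. The table `t`, the column `d` and the sample values are hypothetical. Because SQLite stores these dates as text, the driver hands pandas plain strings, and the dict is forwarded to pandas.to_datetime() as keyword arguments, so {"errors": "coerce"} turns the invalid value into NaT instead of raising:

```python
import sqlite3

import pandas as pd

# Hypothetical in-memory table; SQLite keeps the dates as plain TEXT,
# so pandas (not the driver) is responsible for converting them.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (id INTEGER, d TEXT)")
con.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, "2020-01-15"), (2, "2020-13-45")],  # second value is an invalid date
)

# The inner dict is passed on to pandas.to_datetime() as keyword arguments,
# so errors="coerce" replaces unparseable dates with NaT.
df = pd.read_sql(
    "SELECT id, d FROM t ORDER BY id",
    con,
    parse_dates={"d": {"errors": "coerce"}},
)
print(df)
```

This only helps when the raw values reach pandas as strings; it cannot intercept a conversion that the database driver performs itself.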

I tried implementing it like this:

pd.read_sql(query, con, parse_dates={'col_name': {'errors': 'ignore'}}, chunksize=10**5)

This runs without errors but still parses dates.

The code is not very relevant for this issue. It's basically just:

df = pandas.read_sql(sql, con, index_col=None, coerce_float=True, params=None, parse_dates=None, columns=None, chunksize=10**5)

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_sql.html

I need to turn off date parsing to prevent this error:

  File "expense.py", line 20, in <module>
    for df in gen:
  File "C:\Users\rfrigo\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\io\sql.py", line 1453, in _query_iterator
    data = cursor.fetchmany(chunksize)
ValueError: year -6371 is out of range
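The last frame of this traceback is the driver call cursor.fetchmany(chunksize), and the message has the same form CPython's own datetime constructor uses for a year outside datetime.MINYEAR..datetime.MAXYEAR (1..9999). That suggests (an inference, not something the question confirms) that the database driver builds Python datetime objects while fetching rows, before pandas applies parse_dates at all. A small check of that limit; `try_make_date` is a hypothetical helper:

```python
import datetime


def try_make_date(year):
    """Hypothetical helper: build a date, or return the ValueError text."""
    try:
        return datetime.date(year, 1, 1), None
    except ValueError as exc:
        # Python's datetime only covers years MINYEAR..MAXYEAR (1..9999),
        # so a year like -6371 cannot be represented at all.
        return None, str(exc)


value, error = try_make_date(-6371)
print(datetime.MINYEAR, datetime.MAXYEAR, error)
```

If the driver is doing this conversion, no parse_dates setting can reach the failing value.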

1 Answer

Your problem is when you specify the chunksize. Look at this example:

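import pandas as pd

if __name__ == '__main__':
    # `connection` is an open database connection (not defined in this snippet)
    empty_query = 'select * from some_table where id = 8456314523;'
    # With chunksize set, read_sql returns an iterator of DataFrames rather
    # than a DataFrame, so the .empty check below raises AttributeError.
    df = pd.read_sql(empty_query, connection, chunksize=10**5)
    print("df : {}".format(df if not df.empty else "df is empty"))
    print('END')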

When I don't specify chunksize=10**5, df is just an empty DataFrame, but when I do specify chunksize, it raises:

AttributeError: 'generator' object has no attribute 'empty' 
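A sketch of the difference on a hypothetical in-memory SQLite table named `some_table`. Without chunksize, read_sql returns a DataFrame that has .empty; with chunksize it returns an iterator, so emptiness has to be checked by consuming the chunks:

```python
import sqlite3

import pandas as pd

# Hypothetical empty table, so the query below matches no rows.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE some_table (id INTEGER, name TEXT)")
query = "SELECT * FROM some_table WHERE id = 8456314523"

# Without chunksize: a single DataFrame, so .empty works.
df = pd.read_sql(query, con)
print("df is empty" if df.empty else df)

# With chunksize: an iterator of DataFrames. Count rows across the chunks
# instead of calling .empty on the iterator itself.
chunks = pd.read_sql(query, con, chunksize=10**5)
total_rows = sum(len(chunk) for chunk in chunks)
print("no rows returned" if total_rows == 0 else f"{total_rows} rows")
```

Summing chunk lengths avoids relying on whether a particular pandas version yields one empty chunk or none for an empty result.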

Maybe first try running a smaller query, for example with LIMIT 1, and if that succeeds, run your query with chunksize.
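A sketch of that probe-then-stream approach, again on a hypothetical in-memory SQLite table. The LIMIT 1 syntax is an assumption about the database (SQL Server, for example, uses SELECT TOP 1 instead):

```python
import sqlite3

import pandas as pd

# Hypothetical table with a few rows.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE some_table (id INTEGER, name TEXT)")
con.executemany(
    "INSERT INTO some_table VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")]
)

base_query = "SELECT * FROM some_table"

# Step 1: a cheap probe that fetches at most one row.
probe = pd.read_sql(base_query + " LIMIT 1", con)

# Step 2: the probe succeeded, so stream the full result in chunks.
total = 0
for chunk in pd.read_sql(base_query, con, chunksize=2):
    total += len(chunk)
print(len(probe), total)
```

A one-row probe only exercises the first row returned, so it will not reach a bad value further down the table.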


4 Comments

Thanks for the help! The query runs great for about 1.9 million rows, and then when it hits this one row, which has an error in a datetime column (the year is -6371), it raises the error. Any iteration that doesn't include this row runs perfectly. Am I not understanding the problem?
It seems that you figured that out. Can you please mark my answer as accepted? @R.F.
No, I haven't figured it out. I was trying to say that it's not the generator. It runs great with or without the generator for any iteration that doesn't include the one cell in the one row with a date that has -6071 as the year. So this really did not help me at all. I was just thanking you for trying to help. xP
The error is "ValueError: year -6371 is out of range"
