How is the Pandas parse_dates argument supposed to work when retrieving data from a MySQL database?

The documentation of Pandas 0.23 gives this information:

parse_dates : list or dict, default: None

List of column names to parse as dates.

Dict of {column_name: format string} where format string is strftime compatible in case of parsing string times, or is one of (D, s, ns, ms, us) in case of parsing integer timestamps.

Dict of {column_name: arg dict}, where the arg dict corresponds to the keyword arguments of pandas.to_datetime(). Especially useful with databases without native Datetime support, such as SQLite.
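To make the three forms concrete, here is a minimal sketch that runs against an in-memory SQLite table instead of MySQL. The table layout, the created_epoch column, and the sample values are assumptions made up for illustration; only the parse_dates forms themselves come from the documentation quoted above.

```python
import sqlite3

import pandas as pd

# In-memory SQLite stand-in (assumption: not the real Sakila schema)
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE actor (actor_id INTEGER, last_update TEXT, created_epoch INTEGER)"
)
conn.executemany(
    "INSERT INTO actor VALUES (?, ?, ?)",
    [
        (1, "2018-05-17 11:08:03", 1526555283),
        (2, "2006-02-15 04:34:33", 1139978073),
    ],
)
conn.commit()

query = "SELECT actor_id, last_update, created_epoch FROM actor"

# Form 1: list of column names, format is inferred
df_list = pd.read_sql_query(query, conn, parse_dates=["last_update"])

# Form 2: dict of {column: strftime format} or {column: unit} for integer timestamps
df_fmt = pd.read_sql_query(
    query,
    conn,
    parse_dates={"last_update": "%Y-%m-%d %H:%M:%S", "created_epoch": "s"},
)

# Form 3: dict of {column: keyword arguments forwarded to pandas.to_datetime()}
df_kwargs = pd.read_sql_query(
    query,
    conn,
    parse_dates={"last_update": {"format": "%Y-%m-%d %H:%M:%S", "errors": "coerce"}},
)

print(df_list.dtypes)
print(df_fmt.dtypes)
print(df_kwargs.dtypes)
```

Note that the format string has to match how the values are stored; the strings in this sketch contain dashes, so the format does too.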

As an example, I would like to retrieve data from the actor table of the MySQL Sakila sample database:

create table actor
(
    actor_id smallint(5) unsigned auto_increment
        primary key,
    first_name varchar(45) not null,
    last_name varchar(45) not null,
    last_update timestamp not null on update CURRENT_TIMESTAMP,
    constraint idx_unique_id_name
        unique (actor_id, last_name)
)

Here is some sample data:

INSERT INTO sakila.actor (actor_id, first_name, last_name, last_update) VALUES (1, 'PENELOPE', 'None', '2018-05-17 11:08:03');
INSERT INTO sakila.actor (actor_id, first_name, last_name, last_update) VALUES (2, 'NICK', 'WAHLBERG', '2006-02-15 04:34:33');
INSERT INTO sakila.actor (actor_id, first_name, last_name, last_update) VALUES (3, 'ED', 'CHASE', '2006-02-15 04:34:33');
INSERT INTO sakila.actor (actor_id, first_name, last_name, last_update) VALUES (4, 'JENNIFER', 'DAVIS', '2006-02-15 04:34:33');
INSERT INTO sakila.actor (actor_id, first_name, last_name, last_update) VALUES (5, 'JOHNNY', 'LOLLOBRIGIDA', '2018-05-17 11:14:15');
INSERT INTO sakila.actor (actor_id, first_name, last_name, last_update) VALUES (6, 'BETTE', 'Echt', '2018-05-17 11:13:57');
INSERT INTO sakila.actor (actor_id, first_name, last_name, last_update) VALUES (7, 'GRACE', 'MOSTEL', '2006-02-15 04:34:33');
INSERT INTO sakila.actor (actor_id, first_name, last_name, last_update) VALUES (8, 'MATTHEW', 'JOHANSSON', '2006-02-15 04:34:33');
INSERT INTO sakila.actor (actor_id, first_name, last_name, last_update) VALUES (9, 'JOE', 'SWANK', '2006-02-15 04:34:33');
INSERT INTO sakila.actor (actor_id, first_name, last_name, last_update) VALUES (10, 'CHRISTIAN', 'GABLE', '2006-02-15 04:34:33');

I connect through SQLAlchemy with the MySQL Connector/Python driver:

    # Build the SQLAlchemy URL for the MySQL Connector/Python driver
    db_connection_url = (
        'mysql+mysqlconnector://{user}:{password}@{host}:{port}/{db_name}'
        .format(**mysql_config_dict)
    )

    # Pass SSL options only when a CA certificate is actually configured;
    # the key always exists in the dict, so test its value instead
    if mysql_config_dict.get('ssl_ca'):
        ssl_args = {'ssl_ca': mysql_config_dict['ssl_ca']}
    else:
        ssl_args = {}

The engine is created from these parameters (the SSL entries are read from environment variables):

mysql_config_dict = {
    'user': 'root',
    'password': '',
    'host':  '127.0.0.1',
    'port': '3306',
    'db_name':  'sakila',
    'ssl_cert': os.getenv('SSL_CERT'),
    'ssl_key': os.getenv('SSL_KEY'),
    'ssl_ca': os.getenv('SSL_CA')
}

The Python snippet to retrieve the result set:

df = pd.read_sql_query('SELECT a.actor_id, a.last_name, a.last_update FROM sakila.actor a',parse_dates={'last_update':'%Y%m%d %H:%M:%S'},con=mysql_conn)

I obtain a KeyError:

Traceback (most recent call last):
  File "~/Development/python-virtual-env/lib/python3.5/site-packages/pandas/core/indexes/base.py", line 2442, in get_loc
    return self._engine.get_loc(key)
  File "pandas/_libs/index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc (pandas/_libs/index.c:5280)
  File "pandas/_libs/index.pyx", line 154, in pandas._libs.index.IndexEngine.get_loc (pandas/_libs/index.c:5126)
  File "pandas/_libs/hashtable_class_helper.pxi", line 1210, in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas/_libs/hashtable.c:20523)
  File "pandas/_libs/hashtable_class_helper.pxi", line 1218, in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas/_libs/hashtable.c:20477)
KeyError: 'last_update'

When I use

df = pd.read_sql_query('SELECT a.actor_id, a.last_name, a.last_update FROM sakila.actor a',parse_dates=True,con=mysql_conn)

this works, but the DataFrame view in IntelliJ shows the name of the 'last_update' column prefixed as a bytes literal, b'last_update', which is strange.

What is the correct usage when I want to treat multiple columns as date columns? Thanks!

1 Answer

I pass the field names in a list to parse_dates when I invoke pd.read_sql with:

df = pd.read_sql(query,
                 connection,
                 parse_dates=['Date_of_creation',
                              'Date_of_termination']
                 )

You mentioned doing it with a dictionary for custom formatting:

fmt = '%Y%m%d %H:%M:%S'

df = pd.read_sql(query,
                 connection,
                 parse_dates={'Date_of_creation': fmt,
                              'Date_of_termination': fmt}
                 )
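
The b'last_update' column name reported in the question suggests that the driver returned the column names as bytes, which would explain why looking up the str key 'last_update' raises a KeyError. If that is the cause, one possible workaround is to read the data without parse_dates, decode the column names, and convert the date columns afterwards. The sketch below is an assumption-laden illustration: decode_column_names is a hypothetical helper, and the raw DataFrame is a hand-built stand-in for the MySQL result set.

```python
import pandas as pd


def decode_column_names(df, encoding="utf-8"):
    """Hypothetical helper: return a copy of df with bytes column names decoded to str."""
    return df.rename(
        columns=lambda name: name.decode(encoding) if isinstance(name, bytes) else name
    )


# Stand-in for a result set whose driver returned bytes column names (assumption)
raw = pd.DataFrame(
    {
        b"actor_id": [1, 2],
        b"last_update": ["2018-05-17 11:08:03", "2006-02-15 04:34:33"],
    }
)

df = decode_column_names(raw)

# Convert the date column explicitly once the names are plain strings
df["last_update"] = pd.to_datetime(df["last_update"], format="%Y-%m-%d %H:%M:%S")

print(df.dtypes)
```

With the real database, raw would come from pd.read_sql_query called without parse_dates.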