Skip to content

Discussion: Pushing batches of data to online store: Should conn.commit() happen in the for loop or after? #4036

@job-almekinders

Description

@job-almekinders

This piece of code in the online_write_batch function in the Postgres online store pushes data in batches to the online store.

I was wondering whether it makes more sense to put the conn.commit() inside the for loop, or after the for loop. I would love to hear the different trade-offs between the two options!

Copy of the code snippet here:

batch_size = 5000
for i in range(0, len(insert_values), batch_size):
    cur_batch = insert_values[i : i + batch_size]
    execute_values(
        cur,
        sql.SQL(
            """
            INSERT INTO {}
            (entity_key, feature_name, value, event_ts, created_ts)
            VALUES %s
            ON CONFLICT (entity_key, feature_name) DO
            UPDATE SET
                value = EXCLUDED.value,
                event_ts = EXCLUDED.event_ts,
                created_ts = EXCLUDED.created_ts;
            """,
        ).format(sql.Identifier(_table_id(project, table))),
        cur_batch,
        page_size=batch_size,
    )
    conn.commit() # << This is the point of interest. Should this be placed in the loop or after?  

Metadata

Metadata

Assignees

No one assigned

    Labels

    kind/questionFurther information is requested

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions