`Client.insert_rows_json()`: add option to disable best-effort deduplication #720

Currently, the `Client.insert_rows_json()` method for streaming inserts always attaches an `insertId` unique identifier to each row provided.
These identifiers can be user-provided; if the user doesn't provide any, the library fills them in automatically using UUID4.

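Here's the relevant code:
```python
for index, row in enumerate(json_rows):
    info = {"json": row}
    if row_ids is not None:
        # Caller supplied explicit IDs: use the one matching this row.
        info["insertId"] = row_ids[index]
    else:
        # No IDs supplied: a random UUID4 is always generated.
        info["insertId"] = str(uuid.uuid4())
    rows_info.append(info)
```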

However, insert IDs are entirely optional, and there are actually valid use cases _not_ to use them. From the [BigQuery documentation](https://cloud.google.com/bigquery/streaming-data-into-bigquery#disabling_best_effort_de-duplication):

> You can disable best effort de-duplication by not populating the insertId field for each row inserted. When you do not populate insertId, you get higher streaming ingest quotas in certain regions. This is the recommended way to get higher streaming ingest quota limits.

The BigQuery Python client library provides no way of omitting the `insertId`s. It would be nice to have a parameter for that.
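As a rough illustration of what such a parameter could look like, here is a minimal standalone sketch of the row-building loop. The helper name `build_rows_info` and the `skip_insert_ids` flag are hypothetical, made up for this example; they are not part of the library, and the actual API shape is up to the maintainers.

```python
import uuid
from typing import Any, Dict, List, Optional, Sequence


def build_rows_info(
    json_rows: Sequence[Dict[str, Any]],
    row_ids: Optional[Sequence[str]] = None,
    skip_insert_ids: bool = False,  # hypothetical flag, not in the library
) -> List[Dict[str, Any]]:
    """Build the per-row entries for a streaming insert request.

    Standalone sketch of the loop quoted above, with one extra branch
    that leaves ``insertId`` out entirely when ``skip_insert_ids`` is set.
    """
    if skip_insert_ids and row_ids is not None:
        raise ValueError("row_ids and skip_insert_ids are mutually exclusive")
    if row_ids is not None and len(row_ids) != len(json_rows):
        raise ValueError("row_ids must have the same length as json_rows")

    rows_info: List[Dict[str, Any]] = []
    for index, row in enumerate(json_rows):
        info: Dict[str, Any] = {"json": row}
        if skip_insert_ids:
            # Omit insertId so best-effort de-duplication is disabled.
            pass
        elif row_ids is not None:
            info["insertId"] = row_ids[index]
        else:
            info["insertId"] = str(uuid.uuid4())
        rows_info.append(info)
    return rows_info
```

With `skip_insert_ids=True` each entry contains only the `json` key; the default behavior (user-provided IDs, or UUID4 fallback) stays unchanged, so existing callers would not be affected.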
