BigQuery External Table Bad Rows Issue

I have a BigQuery table that's configured to be an External Table looking at Cloud Storage. The Source URI is: gs://[bucketname]/test_file*

I did not specify maxBadRecords in my table creation request, and its default value is 0. I also did not specify ignoreUnknownValues, and its default is False. So I'd expect any row in my source JSON files with extra or missing fields to count as a "bad row" and produce an error. The schema for the table is:

[
  {
    "name": "second_column",
    "mode": "NULLABLE",
    "type": "INTEGER",
    "description": null,
    "fields": []
  },
  {
    "name": "first_column",
    "mode": "NULLABLE",
    "type": "STRING",
    "description": null,
    "fields": []
  }
]
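
The creation request itself isn't shown above. As a rough sketch, a `tables.insert` request body for a table like this might look as follows. The project, dataset, and table IDs are hypothetical placeholders, and the helper function is made up for illustration; the field names follow the BigQuery REST `externalDataConfiguration` resource. Leaving `maxBadRecords` and `ignoreUnknownValues` out of the body is what lets the defaults apply.

```python
import json

# Schema as listed in the question.
SCHEMA_FIELDS = [
    {"name": "second_column", "mode": "NULLABLE", "type": "INTEGER"},
    {"name": "first_column", "mode": "NULLABLE", "type": "STRING"},
]


def build_external_table_body(project_id, dataset_id, table_id, source_uri,
                              max_bad_records=None, ignore_unknown_values=None):
    """Build a tables.insert request body for a JSON external table.

    Options left as None are omitted from the body, so the service
    defaults are used for them.
    """
    external_config = {
        "sourceFormat": "NEWLINE_DELIMITED_JSON",
        "sourceUris": [source_uri],
        "schema": {"fields": SCHEMA_FIELDS},
    }
    if max_bad_records is not None:
        external_config["maxBadRecords"] = max_bad_records
    if ignore_unknown_values is not None:
        external_config["ignoreUnknownValues"] = ignore_unknown_values
    return {
        "tableReference": {
            "projectId": project_id,
            "datasetId": dataset_id,
            "tableId": table_id,
        },
        "externalDataConfiguration": external_config,
    }


# Hypothetical IDs; the bucket name is elided in the question and stays elided here.
body = build_external_table_body(
    "my-project", "my_dataset", "my_external_table",
    "gs://[bucketname]/test_file*",
)
print(json.dumps(body, indent=2))
```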

I have 3 files:

test_file_1_1_1_1_1_1.json:

{"first_column": "a good value goes here", "second_column": 9}

test_file_2_2_2_2_2_2.json:

{"first_column": "a good value goes here", "second_column": 9, "extraneous_column":  "uh oh"}

test_file_3_3_3_3_3_3.json:

{"first_column": "a good value goes here"}

So file 1 matches the schema, file 2 has an extra column, and file 3 is missing a column.
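
To make the difference between the three files concrete, here is a small local comparison of each record's keys against the schema's column names. It is only an illustration of the mismatch and makes no claim about how BigQuery itself validates rows; the function name is made up for this sketch.

```python
import json

# Column names from the table schema in the question.
SCHEMA_COLUMNS = {"first_column", "second_column"}

# The three one-line files from the question.
FILES = {
    "test_file_1_1_1_1_1_1.json":
        '{"first_column": "a good value goes here", "second_column": 9}',
    "test_file_2_2_2_2_2_2.json":
        '{"first_column": "a good value goes here", "second_column": 9, '
        '"extraneous_column":  "uh oh"}',
    "test_file_3_3_3_3_3_3.json":
        '{"first_column": "a good value goes here"}',
}


def compare_to_schema(json_line, schema_columns=SCHEMA_COLUMNS):
    """Return the keys a record has beyond the schema and the columns it lacks."""
    keys = set(json.loads(json_line))
    return {
        "extra": sorted(keys - schema_columns),
        "missing": sorted(schema_columns - keys),
    }


for name, line in FILES.items():
    print(name, compare_to_schema(line))
```

File 1 reports nothing extra or missing, file 2 reports `extraneous_column` as extra, and file 3 reports `second_column` as missing.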

Querying the table when only file 1 is in the bucket returns the expected data from the file. When I add file 2 to the bucket, the query still succeeds:

[screenshot of the query result]

which is not what I'd expect. When I also add the third file, the query fails with an error:

[screenshot of the error]

Is this expected behavior? I'm confused, as I'd expect any number of bad rows to result in such an error.

asked Jun 12, 2024 at 22:58

Jeffrey Van Laethem


Seems like the error is due to the extra column "extraneous_column". Whenever you load files that contain columns not represented in the table schema, it is recommended to set the "Ignore unknown values" and "Number of bad records allowed" options according to your requirements. You can find more information from this link.

– kiran mathew, commented Jun 13, 2024 at 8:04

@kiranmathew but it succeeds when that file and the "good" file are in the bucket. It only starts failing when the third file is also added.

– Jeffrey Van Laethem, commented Jun 13, 2024 at 14:08

Also, I have confirmed that maxBadRecords and ignoreUnknownValues are indeed not set, via a get_table request from the Python client.

– Jeffrey Van Laethem, commented Jun 13, 2024 at 14:56

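
A check like the one described in the comment above could be sketched roughly as follows. The exact code used is not shown in the thread, so this is an assumption: it reads `external_data_configuration`, `max_bad_records`, and `ignore_unknown_values` from the table object returned by the Python client's `get_table`. The helper name and the table ID in the commented usage are hypothetical, and a stand-in object is used so the sketch runs without credentials.

```python
from types import SimpleNamespace


def external_options(table):
    """Read the bad-row options from a table's external configuration.

    `table` is expected to look like the object returned by
    google.cloud.bigquery.Client.get_table(); unset options come back as None.
    """
    config = table.external_data_configuration
    if config is None:
        raise ValueError("table has no external data configuration")
    return {
        "max_bad_records": config.max_bad_records,
        "ignore_unknown_values": config.ignore_unknown_values,
    }


# With google-cloud-bigquery installed and credentials configured
# (the table ID below is a hypothetical placeholder):
#   from google.cloud import bigquery
#   table = bigquery.Client().get_table("my-project.my_dataset.my_external_table")
#   print(external_options(table))

# Stand-in for a table whose options were never set.
fake_table = SimpleNamespace(
    external_data_configuration=SimpleNamespace(
        max_bad_records=None,
        ignore_unknown_values=None,
    )
)
print(external_options(fake_table))
```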
Hi @Jeffrey Van Laethem, it appears that this issue needs further investigation. If you have a support plan, please create a new GCP support case. Otherwise, you can open a new issue on the issue tracker describing the problem.

– kiran mathew, commented Jun 17, 2024 at 12:17
