I have a BigQuery table that's configured as an external table over Cloud Storage. The source URI is:
gs://[bucketname]/test_file*
I did not specify maxBadRecords in my table creation request, so it takes the default value of 0. I also did not specify ignoreUnknownValues, which defaults to False. So any extra or missing fields in my source JSON files should produce a "bad row" and cause an error (see the sketch after the schema below). The schema for the table is:
[
{
"name": "second_column",
"mode": "NULLABLE",
"type": "INTEGER",
"description": null,
"fields": []
},
{
"name": "first_column",
"mode": "NULLABLE",
"type": "STRING",
"description": null,
"fields": []
}
]
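For reference, here is a minimal sketch of the table-creation request with both options set explicitly, assuming the google-cloud-bigquery Python client (the project, dataset, and table names are placeholders):

from google.cloud import bigquery

client = bigquery.Client()

# External data configuration pointing at the wildcard URI above.
external_config = bigquery.ExternalConfig("NEWLINE_DELIMITED_JSON")
external_config.source_uris = ["gs://[bucketname]/test_file*"]
external_config.schema = [
    bigquery.SchemaField("second_column", "INTEGER", mode="NULLABLE"),
    bigquery.SchemaField("first_column", "STRING", mode="NULLABLE"),
]
# The defaults made explicit: no bad records tolerated, unknown values rejected.
external_config.max_bad_records = 0
external_config.ignore_unknown_values = False

table = bigquery.Table("my-project.my_dataset.test_table")  # placeholder name
table.external_data_configuration = external_config
client.create_table(table)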
I have 3 files:
test_file_1_1_1_1_1_1.json:
{"first_column": "a good value goes here", "second_column": 9}
test_file_2_2_2_2_2_2.json:
{"first_column": "a good value goes here", "second_column": 9, "extraneous_column": "uh oh"}
test_file_3_3_3_3_3_3.json:
{"first_column": "a good value goes here"}
So file 1 matches the schema, file 2 has an extra column, and file 3 is missing a column.
Querying the table when only file 1 is in the bucket returns the expected data from the file. When I add file 2 to the bucket, I get this result:

[query result screenshot]

which is not what I'd expect. When I also add the third file, I get:

[query result screenshot]
Is this expected behavior? I'm confused, as I'd expect any number of bad rows to result in an error.


You can set the Ignore unknown values and Number of bad records allowed flags based on your requirement. You can find more information from this link.
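For example, here is a minimal sketch of loosening both flags on an existing external table, again assuming the google-cloud-bigquery Python client and a placeholder table name:

from google.cloud import bigquery

client = bigquery.Client()
table = client.get_table("my-project.my_dataset.test_table")  # placeholder name

config = table.external_data_configuration
config.ignore_unknown_values = True  # drop fields not in the schema, e.g. "extraneous_column"
config.max_bad_records = 10          # tolerate up to 10 unparseable rows before failing
table.external_data_configuration = config

client.update_table(table, ["external_data_configuration"])

With ignore_unknown_values set, the extra field in file 2 is dropped rather than counted as a bad record. Note that a field that is merely absent (file 3) is read as NULL either way, since the columns are NULLABLE.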