
I have a BigQuery table with about 200,000,000 rows. I also have an external table that holds the updates to apply: each row contains the unique id of the row to be updated and the new string value for that row. The external table is about 1M rows.

When we run an update query to update all the rows in the static table with the values in the external table on matching IDs, we get the following error:

"Resources exceeded during query execution: The query could not be executed in the allotted memory. Peak usage: 110% of limit."

Query:

UPDATE `target_table` AS target
SET target.string_to_update = source.string_to_update
FROM `external_table` AS source
WHERE target.id = source.id;

This is a simple update query and should be distributed, so I'm guessing the join against the external table is causing the issue. What can I do to get this update to complete as expected?

  • Well, the error is clear. You could probably update in batches by adding a condition like AND target.id BETWEEN 10000 AND 999999, and so on. Commented Dec 19, 2024 at 20:33
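As a rough sketch of that suggestion (the BETWEEN bounds are placeholders, not values from the question; pick ranges that cover your actual ids), each batch would be the same update restricted to one id range:

-- One id-range batch of the suggested approach; repeat with the next range.
UPDATE `target_table` AS target
SET target.string_to_update = source.string_to_update
FROM `external_table` AS source
WHERE target.id = source.id
  AND target.id BETWEEN 10000 AND 999999;  -- placeholder bounds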

2 Answers


BigQuery queries do have a maximum response size. One possible workaround is to optimize your query and break the data into smaller pieces so that you avoid hitting that memory limit, or you can use the allowLargeResults flag and specify a destination table to store the queried results.
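Note that allowLargeResults is a job configuration option for legacy SQL rather than something written in the query itself. In standard SQL, the closest analogue to "specify a destination table" is simply to materialise the result into a table; a minimal sketch, with a made-up table name:

-- Write a large query result into an explicit destination table (standard SQL).
-- mydataset.staged_results is a placeholder name, not from the answer.
CREATE OR REPLACE TABLE mydataset.staged_results AS
SELECT id, string_to_update
FROM `external_table`;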

For some context, in legacy SQL writing large results is subject to some limitations:

  • You’re required to specify a destination table.

  • You cannot specify a top-level ORDER BY, TOP or LIMIT clause. Doing so negates the benefit of using allowLargeResults, because the query output can no longer be computed in parallel.

  • Window functions can return large query results only if used in conjunction with a PARTITION BY clause.

Also, feel free to check this relevant case.



Here is a script that splits the 200M rows into a customisable number of batches and loops over them.

DECLARE batch_num INT64 DEFAULT 0;
DECLARE total_rows INT64 DEFAULT 0;
DECLARE num_of_batches INT64 DEFAULT 0;
DECLARE batch_size INT64 DEFAULT 0;
DECLARE rownum_start INT64 DEFAULT 1;

SET num_of_batches = 10;  -- this is customisable
SET total_rows = (SELECT COUNT(*) FROM `target_table`);
SET batch_size = CAST(CEIL(total_rows / num_of_batches) AS INT64);

-- Assign each row in the target table a row number; a temp table is used
-- because a WITH clause cannot be shared across the statements inside the loop
CREATE TEMP TABLE tmp AS
SELECT
    ROW_NUMBER() OVER (ORDER BY id) AS rownum,
    id
FROM
    `target_table`;

LOOP
    -- Increment the batch number
    SET batch_num = batch_num + 1;

    IF batch_num > num_of_batches THEN
        LEAVE;
    END IF;

    -- Update the target table for records whose row numbers fall in the current batch
    UPDATE `target_table` AS target
    SET target.string_to_update = source.string_to_update
    FROM `external_table` AS source
    INNER JOIN tmp
        ON tmp.id = source.id
    WHERE
        target.id = tmp.id
        AND tmp.rownum BETWEEN rownum_start AND rownum_start + batch_size - 1;

    -- Set the starting row number for the next batch
    SET rownum_start = rownum_start + batch_size;

END LOOP;

