
Simplified scenario:

I want to update one table, A (10 million rows), with a value from another table, B (20 million rows). The two tables are linked by id.

It took more than 7 hours to update the whole thing in one go. (I don't know exactly how long, as I stopped the script in the middle.)
So my idea is to update table A in batches using OFFSET and LIMIT clauses. So far, no luck.

Wrapped up in a procedure, the code looks like below:

DECLARE 
    offset_number integer := 0;
    batch_size integer := 1000; 

BEGIN
    LOOP 
        UPDATE TableA temp1
        SET TableA_column_value_to_be_updated = (
            SELECT 
                tableB_column_value
            FROM 
                TableB temp2
            WHERE 
                temp2.id = temp1.id 
                AND some_other_conditions_in_TableB    
            ) 
        WHERE    
             some_other_conditions_in_Table
        OFFSET offset_number 
        LIMIT batch_size ; 
            
        COMMIT; 
    
        offset_number := offset_number + batch_size;
        EXIT WHEN NOT FOUND; 
    END LOOP;
END;

The engine reports this exception:

org.jkiss.dbeaver.model.sql.DBSQLException:
SQL Error [42601]: ERROR: syntax error at or near "OFFSET"

I have no idea what causes it. Notably, the statement seems to work without OFFSET and LIMIT.

Any ideas why this happens? Should I use a different loop statement?

  • Please show the complete procedure. The header is an essential part to get the full picture. And your version of Postgres. Commented Jan 2 at 6:13
  • The "why" bit is fairly simple to answer: the UPDATE statement does not take LIMIT or OFFSET clauses! Commented Jan 2 at 6:24
  • Didn't you ask this exact question just the other day? Commented Jan 2 at 8:23
  • Please clarify your specific problem or provide additional details to highlight exactly what you need. As it's currently written, it's hard to tell exactly what you're asking. Commented Jan 2 at 11:18
  • @DaleK, this is the only question I have asked, so you may be confusing me with someone else. I did search related questions before posting this one; there were no direct answers to my query. Commented Jan 2 at 22:38

2 Answers


LIMIT and OFFSET are not part of the syntax of an SQL UPDATE statement; you need a SELECT for that.
Also, OFFSET scales poorly for "paginating" a big table. Remember the upper bound from the last iteration instead.

Something like this could work:

CREATE OR REPLACE PROCEDURE upd_in_batches(_batch_size int = 1000)
  LANGUAGE plpgsql AS
$proc$
DECLARE
   _id_bound int = 0;  -- or whatever?
BEGIN
   LOOP
      WITH sel AS (
         SELECT a.id  -- id = PK!
         FROM   tablea a
         WHERE  a.id > _id_bound
      -- AND    <some other conditions in Table A>
         ORDER  BY a.id
         LIMIT  _batch_size
         FOR    UPDATE
         )
      , upd AS (
         UPDATE tablea a
         SET    target_col = b.b_source_col
         FROM   sel s
         JOIN   tableb b USING (id)
         WHERE  a.id = s.id
         AND    a.target_col IS DISTINCT FROM b.b_source_col
         )
      SELECT max(id)  -- always returns a row
      FROM   sel
      INTO   _id_bound;

      IF _id_bound IS NULL THEN
          EXIT;  -- no more rows found; we're done, exit loop
      ELSE
         COMMIT;
      END IF;
   END LOOP;
END
$proc$;

Apply your LIMIT in a SELECT statement instead. To avoid race conditions with concurrent writes, throw in a locking clause (FOR UPDATE); you may or may not need that.
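A call with a custom batch size could then look like this (5000 is just an example value for the _batch_size parameter):

CALL upd_in_batches(5000);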

You might be able to UPDATE directly and just increment the lower and upper bound for the filter on id instead (roughly as sketched below), which is cheaper. Depends on the details of your setup and requirements. Each approach has its caveats.
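A minimal sketch of that variant, assuming id is an integer primary key without huge gaps (gaps only make some batches smaller or empty). The procedure name, the _step parameter and the bound variable are made up for illustration, and the table/column placeholders are reused from above:

CREATE OR REPLACE PROCEDURE upd_by_id_range(_step int = 10000)
  LANGUAGE plpgsql AS
$proc$
DECLARE
   _lower  int := 0;
   _max_id int;
BEGIN
   SELECT max(id) INTO _max_id FROM tablea;  -- top of the key space

   WHILE _lower <= _max_id LOOP
      UPDATE tablea a
      SET    target_col = b.b_source_col
      FROM   tableb b
      WHERE  a.id = b.id
      AND    a.id >  _lower
      AND    a.id <= _lower + _step
   -- AND    <some other conditions>
      AND    a.target_col IS DISTINCT FROM b.b_source_col;  -- skip no-op updates

      COMMIT;                    -- keep transactions short, release locks
      _lower := _lower + _step;  -- advance the id window
   END LOOP;
END
$proc$;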



1 Comment

Thank you @Erwin, it's a clever approach. I will try that.

The issue here is using OFFSET and LIMIT within the UPDATE statement. Most SQL dialects (PostgreSQL included) do not support OFFSET and LIMIT in an UPDATE query; they belong in SELECT statements.

Here’s an alternative solution:

DECLARE 
    offset_number INTEGER := 0;
    batch_size INTEGER := 1000; 

BEGIN
    LOOP 
        -- Update rows in batches using a subquery to limit the rows processed
        WITH cte AS (
            SELECT temp1.id
            FROM TableA temp1
            WHERE some_other_conditions_in_Table
            ORDER BY temp1.id
            OFFSET offset_number
            LIMIT batch_size
        )
        UPDATE TableA temp1
        SET TableA_column_value_to_be_updated = (
            SELECT tableB_column_value
            FROM TableB temp2
            WHERE temp2.id = temp1.id AND some_other_conditions_in_TableB
        )
        WHERE temp1.id IN (SELECT id FROM cte);

        -- Commit after each batch
        COMMIT;

        -- Exit loop if no more rows are returned
        EXIT WHEN NOT FOUND;

        -- Increment the offset for the next batch
        offset_number := offset_number + batch_size;
    END LOOP;
END;
