Commit 8114224
For inplace update, send nontransactional invalidations.
The inplace update survives ROLLBACK. The inval didn't, so another backend's
DDL could then update the row without incorporating the inplace update. In
the test this fixes, a mix of CREATE INDEX and ALTER TABLE resulted in a
table with an index, yet relhasindex=f. That is a source of index corruption.

Back-patch to v14 - v17. This is a back-patch of commits:

- 243e9b4 (main change, on master, before v18 branched)
- 0bada39 (defect fix, on master, before v18 branched)
- bae8ca8 (cosmetics from post-commit review, on REL_18_STABLE)

It reverses commit c1099dd, my revert of the original back-patch of 243e9b4.
This back-patch omits the non-comment heap_decode() changes. I find those
changes removed harmless code that was last necessary in v13. See discussion
thread for details. The back branches aren't the place to remove such code.
Like the original back-patch, this doesn't change WAL, because these branches
use end-of-recovery SIResetAll(). All branches change the ABI of extern
function PrepareToInvalidateCacheTuple(). No PGXN extension calls that, and
there's no apparent use case in extensions. Expect ".abi-compliance-history"
edits to follow.

Reviewed-by: Paul A Jungwirth <pj@illuminatedcomputing.com>
Reviewed-by: Surya Poondla <s_poondla@apple.com>
Reviewed-by: Ilyasov Ian <ianilyasov@outlook.com>
Reviewed-by: Nitin Motiani <nitinmotiani@google.com> (in earlier versions)
Reviewed-by: Andres Freund <andres@anarazel.de> (in earlier versions)
Discussion: https://postgr.es/m/20240523000548.58.nmisch@google.com
Backpatch-through: 14-17
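For orientation, a minimal C sketch of the hazard, under simplifying
assumptions: apply_changes_in_place() below is a hypothetical stand-in for
the buffer modification and WAL logging, while CacheInvalidateHeapTuple()
and AtInplace_Inval() are the real functions this commit is concerned with.

	/* Before: the only inval was transactional, so ROLLBACK discarded it. */
	apply_changes_in_place(relation, tuple);			/* persists after ROLLBACK */
	CacheInvalidateHeapTuple(relation, tuple, NULL);	/* gone after ROLLBACK */

	/*
	 * Another backend could then rewrite the row from a stale cache copy,
	 * e.g. writing back relhasindex=f over a concurrent CREATE INDEX's
	 * relhasindex=t.  After: the inval is also sent immediately, before the
	 * tuple lock is released, so it survives ROLLBACK.
	 */
	apply_changes_in_place(relation, tuple);
	AtInplace_Inval();									/* nontransactional send */
	CacheInvalidateHeapTuple(relation, tuple, NULL);	/* still queued for decoding */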
1 parent: 8989919

File tree: 13 files changed (+334, -114 lines)


src/backend/access/heap/README.tuplock

Lines changed: 32 additions & 0 deletions
@@ -201,3 +201,35 @@ wider than four bytes, and current readers don't need consistency across
 fields. Hence, they get by with just fetching each field once. XXX such a
 caller may also read a value that has not reached WAL; see
 systable_inplace_update_finish().
+
+During logical decoding, caches reflect an inplace update no later than the
+next XLOG_XACT_INVALIDATIONS. That record witnesses the end of a command.
+Tuples of its cmin are then visible to decoding, as are inplace updates of any
+lower LSN. Inplace updates of a higher LSN may also be visible, even if those
+updates would have been invisible to a non-historic snapshot matching
+decoding's historic snapshot. (In other words, decoding may see inplace
+updates that were not visible to a similar snapshot taken during original
+transaction processing.) That's a consequence of inplace update violating
+MVCC: there are no snapshot-specific versions of inplace-updated values. This
+all makes it hard to reason about inplace-updated column reads during logical
+decoding, but the behavior does suffice for relhasindex. A relhasindex=t in
+CREATE INDEX becomes visible no later than the new pg_index row. While it may
+be visible earlier, that's harmless. Finding zero indexes despite
+relhasindex=t is normal in more cases than this, e.g. after DROP INDEX.
+Example of a case that meaningfully reacts to the inplace inval:
+
+	CREATE TABLE cat (c int) WITH (user_catalog_table = true);
+	CREATE TABLE normal (d int);
+	...
+	CREATE INDEX ON cat (c)\; INSERT INTO normal VALUES (1);
+
+If the output plugin reads "cat" during decoding of the INSERT, it's fair to
+want that read to see relhasindex=t and use the new index.
+
+An alternative would be to have decoding of XLOG_HEAP_INPLACE immediately
+execute its invals. That would behave more like invals during original
+transaction processing. It would remove the decoding-specific delay in e.g. a
+decoding plugin witnessing a relfrozenxid change. However, a good use case
+for that is unlikely, since the plugin would still witness relfrozenxid
+changes prematurely. Hence, inplace update takes the trivial approach of
+delegating to XLOG_XACT_INVALIDATIONS.
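As an illustration of the plugin read described above (not part of the
commit), here is a hedged C sketch of an output plugin change callback that
consults "cat"; my_change_cb and cat_oid are invented names, cat_oid is
assumed to have been resolved earlier, and error handling is omitted:

	#include "postgres.h"
	#include "access/table.h"
	#include "replication/logical.h"
	#include "utils/rel.h"

	/* assumed resolved earlier, e.g. via RelnameGetRelid("cat") */
	extern Oid	cat_oid;

	static void
	my_change_cb(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
				 Relation relation, ReorderBufferChange *change)
	{
		/* Open the user catalog table under decoding's historic snapshot. */
		Relation	cat = table_open(cat_oid, AccessShareLock);

		/*
		 * Once decoding passes the XLOG_XACT_INVALIDATIONS that follows the
		 * CREATE INDEX, this sees relhasindex=t and may use the new index.
		 */
		if (RelationGetForm(cat)->relhasindex)
		{
			/* e.g. choose an index scan over "cat" */
		}
		table_close(cat, AccessShareLock);
	}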

src/backend/access/heap/heapam.c

Lines changed: 39 additions & 7 deletions
@@ -6307,6 +6307,19 @@ heap_inplace_lock(Relation relation,
 
 	Assert(BufferIsValid(buffer));
 
+	/*
+	 * Register shared cache invals if necessary. Other sessions may finish
+	 * inplace updates of this tuple between this step and LockTuple(). Since
+	 * inplace updates don't change cache keys, that's harmless.
+	 *
+	 * While it's tempting to register invals only after confirming we can
+	 * return true, the following obstacle precludes reordering steps that
+	 * way. Registering invals might reach a CatalogCacheInitializeCache()
+	 * that locks "buffer". That would hang indefinitely if running after our
+	 * own LockBuffer(). Hence, we must register invals before LockBuffer().
+	 */
+	CacheInvalidateHeapTupleInplace(relation, oldtup_ptr);
+
 	LockTuple(relation, &oldtup.t_self, InplaceUpdateTupleLock);
 	LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
 
@@ -6402,6 +6415,7 @@
 	if (!ret)
 	{
 		UnlockTuple(relation, &oldtup.t_self, InplaceUpdateTupleLock);
+		ForgetInplace_Inval();
 		InvalidateCatalogSnapshot();
 	}
 	return ret;
@@ -6430,6 +6444,16 @@ heap_inplace_update_and_unlock(Relation relation,
 	if (oldlen != newlen || htup->t_hoff != tuple->t_data->t_hoff)
 		elog(ERROR, "wrong tuple length");
 
+	/*
+	 * Unlink relcache init files as needed. If unlinking, acquire
+	 * RelCacheInitLock until after associated invalidations. By doing this
+	 * in advance, if we checkpoint and then crash between inplace
+	 * XLogInsert() and inval, we don't rely on StartupXLOG() ->
+	 * RelationCacheInitFileRemove(). That uses elevel==LOG, so replay would
+	 * neglect to PANIC on EIO.
+	 */
+	PreInplace_Inval();
+
 	/* NO EREPORT(ERROR) from here till changes are logged */
 	START_CRIT_SECTION();
 
@@ -6473,17 +6497,24 @@ heap_inplace_update_and_unlock(Relation relation,
 		PageSetLSN(BufferGetPage(buffer), recptr);
 	}
 
+	LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+
+	/*
+	 * Send invalidations to shared queue. SearchSysCacheLocked1() assumes we
+	 * do this before UnlockTuple().
+	 */
+	AtInplace_Inval();
+
 	END_CRIT_SECTION();
+	UnlockTuple(relation, &tuple->t_self, InplaceUpdateTupleLock);
 
-	heap_inplace_unlock(relation, oldtup, buffer);
+	AcceptInvalidationMessages();	/* local processing of just-sent inval */
 
 	/*
-	 * Send out shared cache inval if necessary. Note that because we only
-	 * pass the new version of the tuple, this mustn't be used for any
-	 * operations that could change catcache lookup keys. But we aren't
-	 * bothering with index updates either, so that's true a fortiori.
-	 *
-	 * XXX ROLLBACK discards the invalidation. See test inplace-inval.spec.
+	 * Queue a transactional inval, for logical decoding and for third-party
+	 * code that might have been relying on it since long before inplace
+	 * update adopted immediate invalidation. See README.tuplock section
+	 * "Reading inplace-updated columns" for logical decoding details.
 	 */
 	if (!IsBootstrapProcessingMode())
 		CacheInvalidateHeapTuple(relation, tuple, NULL);
@@ -6498,6 +6529,7 @@ heap_inplace_unlock(Relation relation,
 {
 	LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
 	UnlockTuple(relation, &oldtup->t_self, InplaceUpdateTupleLock);
+	ForgetInplace_Inval();
 }
 
 /*
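Taken together, the hunks above leave heap_inplace_update_and_unlock() with
the following shape (a condensed paraphrase of the code in this diff, not
compilable on its own):

	PreInplace_Inval();		/* unlink relcache init files before WAL */
	START_CRIT_SECTION();
	/* ... copy in new contents, MarkBufferDirty(), XLogInsert(), PageSetLSN() ... */
	LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
	AtInplace_Inval();		/* shared-queue send, before UnlockTuple() for
							 * SearchSysCacheLocked1() */
	END_CRIT_SECTION();
	UnlockTuple(relation, &tuple->t_self, InplaceUpdateTupleLock);
	AcceptInvalidationMessages();	/* process our own inval locally */
	CacheInvalidateHeapTuple(relation, tuple, NULL);	/* transactional, for
														 * logical decoding */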

src/backend/access/transam/xact.c

Lines changed: 18 additions & 8 deletions
@@ -1276,14 +1276,24 @@ RecordTransactionCommit(void)
 
 		/*
 		 * Transactions without an assigned xid can contain invalidation
-		 * messages (e.g. explicit relcache invalidations or catcache
-		 * invalidations for inplace updates); standbys need to process those.
-		 * We can't emit a commit record without an xid, and we don't want to
-		 * force assigning an xid, because that'd be problematic for e.g.
-		 * vacuum. Hence we emit a bespoke record for the invalidations. We
-		 * don't want to use that in case a commit record is emitted, so they
-		 * happen synchronously with commits (besides not wanting to emit more
-		 * WAL records).
+		 * messages. While inplace updates do this, this is not known to be
+		 * necessary; see comment at inplace CacheInvalidateHeapTuple().
+		 * Extensions might still rely on this capability, and standbys may
+		 * need to process those invals. We can't emit a commit record
+		 * without an xid, and we don't want to force assigning an xid,
+		 * because that'd be problematic for e.g. vacuum. Hence we emit a
+		 * bespoke record for the invalidations. We don't want to use that in
+		 * case a commit record is emitted, so they happen synchronously with
+		 * commits (besides not wanting to emit more WAL records).
+		 *
+		 * XXX Every known use of this capability is a defect. Since an XID
+		 * isn't controlling visibility of the change that prompted invals,
+		 * other sessions need the inval even if this transaction aborts.
+		 *
+		 * ON COMMIT DELETE ROWS does a nontransactional index_build(), which
+		 * queues a relcache inval, including in transactions without an xid
+		 * that had read the (empty) table. Standbys don't need any ON COMMIT
+		 * DELETE ROWS invals, but we've not done the work to withhold them.
 		 */
 		if (nmsgs != 0)
 		{
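For context, the code this comment guards emits the bespoke record via
LogStandbyInvalidations(); roughly (a paraphrase of the surrounding
function, not part of the diff):

	if (nmsgs != 0)
	{
		LogStandbyInvalidations(nmsgs, invalMessages,
								RelcacheInitFileInval);
		wrote_xlog = true;
	}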

src/backend/catalog/index.c

Lines changed: 9 additions & 2 deletions
@@ -2911,12 +2911,19 @@ index_update_stats(Relation rel,
 	if (dirty)
 	{
 		systable_inplace_update_finish(state, tuple);
-		/* the above sends a cache inval message */
+		/* the above sends transactional and immediate cache inval messages */
 	}
 	else
 	{
 		systable_inplace_update_cancel(state);
-		/* no need to change tuple, but force relcache inval anyway */
+
+		/*
+		 * While we didn't change relhasindex, CREATE INDEX needs a
+		 * transactional inval for when the new index's catalog rows become
+		 * visible. Other CREATE INDEX and REINDEX code happens to also queue
+		 * this inval, but keep this in case rare callers rely on this part of
+		 * our API contract.
+		 */
 		CacheInvalidateRelcacheByTuple(tuple);
 	}
 
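To situate these invals in the caller API, a hedged sketch of the pattern
index_update_stats() follows (simplified; scan key setup and the tuple
modifications are elided):

	void	   *state;
	HeapTuple	tuple;

	systable_inplace_update_begin(pg_class, ClassOidIndexId, true, NULL,
								  1, key, &tuple, &state);
	if (!HeapTupleIsValid(tuple))
		elog(ERROR, "could not find tuple for relation %u", relid);

	/* ... adjust relhasindex/relpages/reltuples in the tuple copy ... */

	if (dirty)
		systable_inplace_update_finish(state, tuple);	/* inplace write plus
														 * both invals */
	else
		systable_inplace_update_cancel(state);	/* release locks; caller still
												 * queues the relcache inval */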

src/backend/replication/logical/decode.c

Lines changed: 2 additions & 9 deletions
@@ -559,20 +559,13 @@ DecodeHeapOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 			/*
 			 * Inplace updates are only ever performed on catalog tuples and
 			 * can, per definition, not change tuple visibility. Since we
-			 * don't decode catalog tuples, we're not interested in the
+			 * also don't decode catalog tuples, we're not interested in the
 			 * record's contents.
-			 *
-			 * In-place updates can be used either by XID-bearing transactions
-			 * (e.g. in CREATE INDEX CONCURRENTLY) or by XID-less
-			 * transactions (e.g. VACUUM). In the former case, the commit
-			 * record will include cache invalidations, so we mark the
-			 * transaction as catalog modifying here. Currently that's
-			 * redundant because the commit will do that as well, but once we
-			 * support decoding in-progress relations, this will be important.
 			 */
 			if (!TransactionIdIsValid(xid))
 				break;
 
+			/* PostgreSQL 13 was the last to need these actions. */
 			SnapBuildProcessChange(builder, xid, buf->origptr);
 			ReorderBufferXidSetCatalogChanges(ctx->reorder, xid, buf->origptr);
 			break;

src/backend/utils/cache/catcache.c

Lines changed: 4 additions & 3 deletions
@@ -2190,7 +2190,8 @@ void
 PrepareToInvalidateCacheTuple(Relation relation,
 							  HeapTuple tuple,
 							  HeapTuple newtuple,
-							  void (*function) (int, uint32, Oid))
+							  void (*function) (int, uint32, Oid, void *),
+							  void *context)
 {
 	slist_iter	iter;
 	Oid			reloid;
@@ -2231,7 +2232,7 @@ PrepareToInvalidateCacheTuple(Relation relation,
 		hashvalue = CatalogCacheComputeTupleHashValue(ccp, ccp->cc_nkeys, tuple);
 		dbid = ccp->cc_relisshared ? (Oid) 0 : MyDatabaseId;
 
-		(*function) (ccp->id, hashvalue, dbid);
+		(*function) (ccp->id, hashvalue, dbid, context);
 
 		if (newtuple)
 		{
@@ -2240,7 +2241,7 @@ PrepareToInvalidateCacheTuple(Relation relation,
 			newhashvalue = CatalogCacheComputeTupleHashValue(ccp, ccp->cc_nkeys, newtuple);
 
 			if (newhashvalue != hashvalue)
-				(*function) (ccp->id, newhashvalue, dbid);
+				(*function) (ccp->id, newhashvalue, dbid, context);
 		}
 	}
 }
