Skip to content

Commit 7cb05dd

Browse files
committed
Escalate ERRORs during async notify processing to FATAL
Previously, if async notify processing encountered an error, we would report the error to the client and advance our read position past the offending entry to prevent trying to process it over and over again. Trying to continue after an error has a few problems however: - We have no way of telling the client that a notification was lost. They get an ERROR, but that doesn't tell you much. As such, it's not clear if keeping the connection alive after losing a notification is a good thing. Depending on the application logic, missing a notification could cause the application to get stuck waiting, for example. - If the connection is idle, PqCommReadingMsg is set and any ERROR is turned into FATAL anyway. - We bailed out of the notification processing loop on first error without processing any subsequent notifications. The subsequent notifications would not be processed until another notify interrupt arrives. For example, if there were two notifications pending, and processing the first one caused an ERROR, the second notification would not be processed until someone sent a new NOTIFY. This commit changes the behavior so that any ERROR while processing async notifications is turned into FATAL, causing the client connection to be terminated. That makes the behavior more consistent as that's what happened in idle state already, and terminating the connection is a clear signal to the application that it might've missed some notifications. The reason to do this now is that the next commits will change the notification processing code in a way that would make it harder to skip over just the offending notification entry on error. Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com> Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de> Reviewed-by: Arseniy Mukhin <arseniy.mukhin.dev@gmail.com> Discussion: https://www.postgresql.org/message-id/fedbd908-4571-4bbe-b48e-63bfdcc38f64@iki.fi Backpatch-through: 14
1 parent 4ef048f commit 7cb05dd

File tree

1 file changed

+21
-13
lines changed

1 file changed

+21
-13
lines changed

src/backend/commands/async.c

Lines changed: 21 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -461,7 +461,7 @@ static double asyncQueueUsage(void);
461461
static void asyncQueueFillWarning(void);
462462
static void SignalBackends(void);
463463
static void asyncQueueReadAllNotifications(void);
464-
static bool asyncQueueProcessPageEntries(volatile QueuePosition *current,
464+
static bool asyncQueueProcessPageEntries(QueuePosition *current,
465465
QueuePosition stop,
466466
char *page_buffer,
467467
Snapshot snapshot);
@@ -1913,7 +1913,7 @@ ProcessNotifyInterrupt(bool flush)
19131913
static void
19141914
asyncQueueReadAllNotifications(void)
19151915
{
1916-
volatile QueuePosition pos;
1916+
QueuePosition pos;
19171917
QueuePosition head;
19181918
Snapshot snapshot;
19191919

@@ -1983,16 +1983,25 @@ asyncQueueReadAllNotifications(void)
19831983
* It is possible that we fail while trying to send a message to our
19841984
* frontend (for example, because of encoding conversion failure). If
19851985
* that happens it is critical that we not try to send the same message
1986-
* over and over again. Therefore, we place a PG_TRY block here that will
1987-
* forcibly advance our queue position before we lose control to an error.
1988-
* (We could alternatively retake NotifyQueueLock and move the position
1989-
* before handling each individual message, but that seems like too much
1990-
* lock traffic.)
1986+
* over and over again. Therefore, we set ExitOnAnyError to upgrade any
1987+
* ERRORs to FATAL, causing the client connection to be closed on error.
1988+
*
1989+
* We used to only skip over the offending message and try to soldier on,
1990+
* but it was somewhat questionable to lose a notification and give the
1991+
* client an ERROR instead. A client application is not be prepared for
1992+
* that and can't tell that a notification was missed. It was also not
1993+
* very useful in practice because notifications are often processed while
1994+
* a connection is idle and reading a message from the client, and in that
1995+
* state, any error is upgraded to FATAL anyway. Closing the connection
1996+
* is a clear signal to the application that it might have missed
1997+
* notifications.
19911998
*/
1992-
PG_TRY();
19931999
{
2000+
bool save_ExitOnAnyError = ExitOnAnyError;
19942001
bool reachedStop;
19952002

2003+
ExitOnAnyError = true;
2004+
19962005
do
19972006
{
19982007
int curpage = QUEUE_POS_PAGE(pos);
@@ -2045,15 +2054,14 @@ asyncQueueReadAllNotifications(void)
20452054
page_buffer.buf,
20462055
snapshot);
20472056
} while (!reachedStop);
2048-
}
2049-
PG_FINALLY();
2050-
{
2057+
20512058
/* Update shared state */
20522059
LWLockAcquire(NotifyQueueLock, LW_SHARED);
20532060
QUEUE_BACKEND_POS(MyBackendId) = pos;
20542061
LWLockRelease(NotifyQueueLock);
2062+
2063+
ExitOnAnyError = save_ExitOnAnyError;
20552064
}
2056-
PG_END_TRY();
20572065

20582066
/* Done with snapshot */
20592067
UnregisterSnapshot(snapshot);
@@ -2076,7 +2084,7 @@ asyncQueueReadAllNotifications(void)
20762084
* The QueuePosition *current is advanced past all processed messages.
20772085
*/
20782086
static bool
2079-
asyncQueueProcessPageEntries(volatile QueuePosition *current,
2087+
asyncQueueProcessPageEntries(QueuePosition *current,
20802088
QueuePosition stop,
20812089
char *page_buffer,
20822090
Snapshot snapshot)

0 commit comments

Comments
 (0)