- asyncpg version: 0.20.1
- PostgreSQL version: PostgreSQL 10.6 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-9), 64-bit
- Do you use a PostgreSQL SaaS? If so, which? Can you reproduce
the issue with a local PostgreSQL install?: unknown - this depends on being somewhere critically inside the iterator for conn.cursor(...)
- Python version: 3.7.7 (default, Mar 24 2020, 03:07:37) \n[GCC 9.2.0]
- Platform: python:3.7-alpine
- Do you use pgbouncer?: No.
- Did you install asyncpg with pip?: yes
- If you built asyncpg locally, which version of Cython did you use?: N/A
- Can the issue be reproduced under both asyncio and
uvloop?: uvloop is not used in this installation.
I made use of a asyncpg.pool.Pool created via create_pool with a command_timeout of None.
Midway through heavy load, the RDS instance crashed and rebooted, losing all connections.
However, the connections actively in use jammed and as of this time have been jammed for well over an hour. Using awaitwhat and a shell, I was able to determine that all the coroutines were jammed on the following:
async def _execute_iter(self, conn):
async for record in conn.cursor(self.query, *self.bindings.values):
yield record
async def execute(self):
conn = self.conn()
if conn is None:
raise ValueError("Connection has been garbage collected!")
cursor = self._execute_iter(conn)
if self.iterable:
return cursor
items = []
async for record in cursor: # <---- jammed in here indefinitely!
items.append(record)
return tuple(items)
Upon grabbing a stalled coroutine with my backdoor shell, I discovered a surprising behavior!
>>> tasks[0].get_stack()[0].f_locals['conn']
<PoolConnectionProxy <asyncpg.connection.Connection object at 0x7fe02b594d70> 0x7fe02a56df10>
>>> tasks[0].get_stack()[0].f_locals['conn'].is_closed()
False
>>> tasks[0].get_stack()[0].f_locals['conn']
<PoolConnectionProxy <asyncpg.connection.Connection object at 0x7fe02b594d70> 0x7fe02a56df10>
>>> tasks[0].get_stack()[0].f_locals['conn'].is_in_transaction()
True
>>> tasks[0].get_stack()[0].f_locals['conn']._con.get_server_pid()
22062
>>> await tasks[0].get_stack()[0].f_locals['conn']._con.fetch('SELECT 1')
Traceback (most recent call last):
File "/usr/local/lib/python3.7/asyncio/selector_events.py", line 814, in _read_ready__data_received
data = self._sock.recv(self.max_size)
ConnectionResetError: [Errno 104] Connection reset by peer
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/aioconsole/execute.py", line 95, in aexec
result, new_local = yield from coro
File "<aexec>", line 2, in __corofn
File "/usr/local/lib/python3.7/site-packages/asyncpg/connection.py", line 420, in fetch
return await self._execute(query, args, 0, timeout)
File "/usr/local/lib/python3.7/site-packages/asyncpg/connection.py", line 1403, in _execute
query, args, limit, timeout, return_status=return_status)
File "/usr/local/lib/python3.7/site-packages/asyncpg/connection.py", line 1411, in __execute
return await self._do_execute(query, executor, timeout)
File "/usr/local/lib/python3.7/site-packages/asyncpg/connection.py", line 1423, in _do_execute
stmt = await self._get_statement(query, None)
File "/usr/local/lib/python3.7/site-packages/asyncpg/connection.py", line 328, in _get_statement
statement = await self._protocol.prepare(stmt_name, query, timeout)
File "asyncpg/protocol/protocol.pyx", line 163, in prepare
asyncpg.exceptions.ConnectionDoesNotExistError: connection was closed in the middle of operation
>>> tasks[0].get_stack()[0].f_locals['conn'].is_closed()
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/aioconsole/execute.py", line 95, in aexec
result, new_local = yield from coro
File "<aexec>", line 2, in __corofn
File "/usr/local/lib/python3.7/site-packages/asyncpg/pool.py", line 56, in call_con_method
meth_name))
asyncpg.exceptions._base.InterfaceError: cannot call Connection.is_closed(): connection has been released back to the pool
>>>
I submit to you that in the code for iterating through a cursor, something does not properly handle the case of a sudden connection reset. Notice how when I actually attempt to use the stalled connection which is blocked in an async for record in cursor generator, it suddenly realizes the connection is dead!
If I write b”x” in the socket itself, that also is enough to wake up asyncio to being disconnected!
I will experiment with a command_timeout to hopefully add in a worst-case timer for blocked queries, however this
the issue with a local PostgreSQL install?: unknown - this depends on being somewhere critically inside the iterator for
conn.cursor(...)uvloop?: uvloop is not used in this installation.
I made use of a
asyncpg.pool.Poolcreated viacreate_poolwith a command_timeout of None.Midway through heavy load, the RDS instance crashed and rebooted, losing all connections.
However, the connections actively in use jammed and as of this time have been jammed for well over an hour. Using
awaitwhatand a shell, I was able to determine that all the coroutines were jammed on the following:Upon grabbing a stalled coroutine with my backdoor shell, I discovered a surprising behavior!
I submit to you that in the code for iterating through a cursor, something does not properly handle the case of a sudden connection reset. Notice how when I actually attempt to use the stalled connection which is blocked in an
async for record in cursorgenerator, it suddenly realizes the connection is dead!If I write b”x” in the socket itself, that also is enough to wake up asyncio to being disconnected!
I will experiment with a
command_timeoutto hopefully add in a worst-case timer for blocked queries, however this