Upon postgres segfault, connections in `async for record in conn.cursor` fail to detect loss of connectivity and hang indefinitely

* **asyncpg version**: 0.20.1
* **PostgreSQL version**:  PostgreSQL 10.6 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-9), 64-bit
* **Do you use a PostgreSQL SaaS?  If so, which?  Can you reproduce
  the issue with a local PostgreSQL install?**: unknown - this depends on being somewhere critically inside the iterator for `conn.cursor(...)`
* **Python version**: 3.7.7 (default, Mar 24 2020, 03:07:37) \n[GCC 9.2.0]
* **Platform**: python:3.7-alpine
* **Do you use pgbouncer?**: No.
* **Did you install asyncpg with pip?**: yes
* **If you built asyncpg locally, which version of Cython did you use?**: N/A
* **Can the issue be reproduced under both asyncio and
  [uvloop](https://github.com/magicstack/uvloop)?**: uvloop is not used in this installation.

I made use of a `asyncpg.pool.Pool` created via `create_pool` with a command_timeout of None.

Midway through heavy load, the RDS instance crashed and rebooted, losing all connections.

However, the connections actively in use jammed and as of this time have been jammed for well over an hour. Using `awaitwhat` and a shell, I was able to determine that all the coroutines were jammed on the following:

```
    async def _execute_iter(self, conn):
        async for record in conn.cursor(self.query, *self.bindings.values):
            yield record

    async def execute(self):
        conn = self.conn()
        if conn is None:
            raise ValueError("Connection has been garbage collected!")

        cursor = self._execute_iter(conn)
        if self.iterable:
            return cursor
        items = []
        async for record in cursor:  # <---- jammed in here indefinitely!
            items.append(record)
        return tuple(items)
```

Upon grabbing a stalled coroutine with my backdoor shell, I discovered a surprising behavior!

```
>>> tasks[0].get_stack()[0].f_locals['conn']
<PoolConnectionProxy <asyncpg.connection.Connection object at 0x7fe02b594d70> 0x7fe02a56df10>
>>> tasks[0].get_stack()[0].f_locals['conn'].is_closed()
False
>>> tasks[0].get_stack()[0].f_locals['conn']
<PoolConnectionProxy <asyncpg.connection.Connection object at 0x7fe02b594d70> 0x7fe02a56df10>
>>> tasks[0].get_stack()[0].f_locals['conn'].is_in_transaction()
True
>>> tasks[0].get_stack()[0].f_locals['conn']._con.get_server_pid()
22062
>>> await tasks[0].get_stack()[0].f_locals['conn']._con.fetch('SELECT 1')
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/asyncio/selector_events.py", line 814, in _read_ready__data_received
    data = self._sock.recv(self.max_size)
ConnectionResetError: [Errno 104] Connection reset by peer

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/aioconsole/execute.py", line 95, in aexec
    result, new_local = yield from coro
  File "<aexec>", line 2, in __corofn
  File "/usr/local/lib/python3.7/site-packages/asyncpg/connection.py", line 420, in fetch
    return await self._execute(query, args, 0, timeout)
  File "/usr/local/lib/python3.7/site-packages/asyncpg/connection.py", line 1403, in _execute
    query, args, limit, timeout, return_status=return_status)
  File "/usr/local/lib/python3.7/site-packages/asyncpg/connection.py", line 1411, in __execute
    return await self._do_execute(query, executor, timeout)
  File "/usr/local/lib/python3.7/site-packages/asyncpg/connection.py", line 1423, in _do_execute
    stmt = await self._get_statement(query, None)
  File "/usr/local/lib/python3.7/site-packages/asyncpg/connection.py", line 328, in _get_statement
    statement = await self._protocol.prepare(stmt_name, query, timeout)
  File "asyncpg/protocol/protocol.pyx", line 163, in prepare
asyncpg.exceptions.ConnectionDoesNotExistError: connection was closed in the middle of operation
>>> tasks[0].get_stack()[0].f_locals['conn'].is_closed()
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/aioconsole/execute.py", line 95, in aexec
    result, new_local = yield from coro
  File "<aexec>", line 2, in __corofn
  File "/usr/local/lib/python3.7/site-packages/asyncpg/pool.py", line 56, in call_con_method
    meth_name))
asyncpg.exceptions._base.InterfaceError: cannot call Connection.is_closed(): connection has been released back to the pool
>>> 
```

I submit to you that in the code for iterating through a cursor, something does not properly handle the case of a sudden connection reset. Notice how when I actually attempt to use the stalled connection which is blocked in an `async for record in cursor` generator, it suddenly realizes the connection is dead!

If I write b”x” in the socket itself, that also is enough to wake up asyncio to being disconnected!

I will experiment with a `command_timeout` to hopefully add in a worst-case timer for blocked queries, however this 

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Upon postgres segfault, connections in `async for record in conn.cursor` fail to detect loss of connectivity and hang indefinitely #549

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Upon postgres segfault, connections in async for record in conn.cursor fail to detect loss of connectivity and hang indefinitely #549

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions

Upon postgres segfault, connections in `async for record in conn.cursor` fail to detect loss of connectivity and hang indefinitely #549