-
Notifications
You must be signed in to change notification settings - Fork 853
Fix #1065 by having lock for async Socket Mode client reconnection #1067
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
@@ Coverage Diff @@
## main #1067 +/- ##
==========================================
+ Coverage 84.14% 84.15% +0.01%
==========================================
Files 99 99
Lines 9239 9255 +16
==========================================
+ Hits 7774 7789 +15
- Misses 1465 1466 +1
Continue to review full report at Codecov.
|
seratch
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comments for reviewers
| channel=req.payload["event"]["channel"], | ||
| timestamp=req.payload["event"]["ts"], | ||
| ) | ||
| if req.payload["event"]["type"] == "message": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed the issue where this example may fail if the app subscribes to other events like reaction_added
| else: | ||
| await asyncio.sleep(consecutive_error_count) | ||
|
|
||
| async def is_connected(self) -> bool: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For easier access to the state and better alignment with the sync SocketModeClient, I've added this flag.
| await self.connect() | ||
| async def connect_to_new_endpoint(self, force: bool = False): | ||
| try: | ||
| await self.connect_operation_lock.acquire() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The lock acquisition and release inside this method is the main change in this PR. With this way, other concurrent method calls are synchronized. The client can safely check sef.is_connected() inside and decide if the client still needs to replace the WSS URL and the underlying WebSocket session.
| try: | ||
| if message.get("type") == "disconnect": | ||
| await self.connect_to_new_endpoint() | ||
| await self.connect_to_new_endpoint(force=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The sync client does the same. When we receive a disconnect type message from the Socket Mode server-side, we should immediately reconnect even if the current session is still active.
| auto_reconnect_enabled: bool | ||
| default_auto_reconnect_enabled: bool | ||
| closed: bool | ||
| connect_operation_lock: Lock |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are the only changes that are necessary in other client implementations. I'll have a migration guide section in the next version's release notes.
|
@stevengill Thanks for your review! |
|
Just in case, I have been running my example app with this change for a few days but I haven't detected any issues so far. Let me merge this change now. |
Summary
This pull request fixes #1065 by improving the internals of
AsyncSocketModeClient.As described in the bug report, the reconnection occasionally may fail to properly replace the active WebSocket connection in the
connect_to_new_endpointmethod.If both the connection monitoring code and other code try to reconnect at the same time, a race condition issue can arise. In this case, the internal state
self.wss_uriandself.current_sessioncan be invalid (=stale). The invalid state is automatically repaired after a certain amount of time (say, less than 1 minute with the default setting) but the app can be unresponsive to any incoming requests until the recovery.It's not so easy to reproduce the situation in unit tests but you can easily reproduce the situation by running the following script:
To resolve this, we can do the same with the sync version of
SocketModeClient. It always acquires a lock when replacing wss_uri and current_session to make the operation safe. The synchronization affects only the reconnection. The overall performance of the WeSocket message handlers is never affected.If I remember correctly, there was no strong reason why I didn't have the lock in the async Socket Mode clients. I'm thinking that I just missed adding it during the initial development #883
Category (place an
xin each of the[ ])/docs-src(Documents, have you run./docs.sh?)/docs-src-v2(Documents, have you run./docs-v2.sh?)/tutorial(PythOnBoardingBot tutorial)tests/integration_tests(Automated tests for this library)Requirements (place an
xin each[ ])python3 -m venv .venv && source .venv/bin/activate && ./scripts/run_validation.shafter making the changes.