Skip to content

Fix stuck call to Dial when calling Stop on the Initiator#654

Merged
ackleymi merged 1 commit into
quickfixgo:mainfrom
abronan:mitigate-stuck-dial-on-initiator-stop
Aug 9, 2024
Merged

Fix stuck call to Dial when calling Stop on the Initiator#654
ackleymi merged 1 commit into
quickfixgo:mainfrom
abronan:mitigate-stuck-dial-on-initiator-stop

Conversation

@abronan
Copy link
Copy Markdown
Contributor

@abronan abronan commented Jul 15, 2024

This commit fixes an issue when calling Start() and then Stop() on the initiator while the connection is likely to fail and timeout. Sending a SIGTERM and calling Stop() will block since Dial will attempt to connect until it times out and returns on the waitForReconnectInterval call.

We mitigate this problem by using a proxy.ContextDialer and allowing to pass a context with cancellation method to the dialer.DialContext method on handleConnection.

We need to start a routine listening for the stopChan in order to call cancel() explicitly and thus exit the DialContext method.

Note: there are scenarios where cancel() will be called twice, this choice was made in order to avoid a larger refactor of the reconnect logic, but since the call to cancel() is idempotent, this doesn't lead to any adverse effect.

fixes #653

@abronan
Copy link
Copy Markdown
Contributor Author

abronan commented Jul 15, 2024

I have tested this locally using a program sending a SIGTERM and calling initiator.Stop() on a connection that fails, but would love to hear about an approach to test this properly within the repository. There didn't seem to be existing tests for initiator.go (unless I'm mistaken). As of now, it doesn't break any of the tests in dialer_test.go from what I could see.

This commit fixes an issue when calling Start() and then
Stop() on the initiator while the connection is likely
to fail and timeout. Calling initiator.Stop() will block since
Dial will attempt to connect until it times out and returns
on the 'waitForReconnectInterval' call.

We mitigate this problem by using a proxy.ContextDialer and
allowing to pass a context with cancellation method to the
dialer.DialContext method on 'handleConnection'.

We need to start a routine listening for the stopChan in
order to call cancel() explicitly and thus exit the DialContext
method.

Note: there are scenarios where cancel() will be called twice,
this choice was made in order to avoid a larger refactor of the
reconnect logic, but since the call to cancel() is idempotent,
this doesn't lead to any adverse effect.

fixes quickfixgo#653

Signed-off-by: Alexandre Beslic <ab21c8b9f7@abronan.com>
@abronan abronan force-pushed the mitigate-stuck-dial-on-initiator-stop branch from 5121af8 to 3939268 Compare July 18, 2024 19:07
@ackleymi ackleymi merged commit e92fa68 into quickfixgo:main Aug 9, 2024
@abronan abronan deleted the mitigate-stuck-dial-on-initiator-stop branch April 3, 2025 16:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Calling Stop() on Initiator blocks on Dial method

2 participants