Skip to content

test: fix flaky https tests on windows#62451

Open
jasnell wants to merge 1 commit intonodejs:mainfrom
jasnell:jasnell/hopefully-fix-flaky-windows-tests
Open

test: fix flaky https tests on windows#62451
jasnell wants to merge 1 commit intonodejs:mainfrom
jasnell:jasnell/hopefully-fix-flaky-windows-tests

Conversation

@jasnell
Copy link
Copy Markdown
Member

@jasnell jasnell commented Mar 27, 2026

The test-https-server-options-incoming-message and
test-https-server-options-server-response tests can
be flaky on Windows.

Opencode/Opus is helping figure out why.

If this PR makes the tests non-flaky it's just a bandaid. There appears to be a race condition on cleanup on windows where shutting down connections races with shutting down the server. We'd still need to identify specifically where that race is occurring and where to fix it specifically. Right now, however, my goal is to unblock CI flakes.

The flakiness does seem generalized to any test-http(s)-* and not just one... which makes me strongly suspect the race is quite low in the stack, possibly even in libuv but that's yet to be determined precisely.

@nodejs-github-bot nodejs-github-bot added needs-ci PRs that need a full CI run. test Issues and PRs related to the tests. labels Mar 27, 2026
@jasnell jasnell requested a review from mcollina March 27, 2026 05:15
@nodejs-github-bot
Copy link
Copy Markdown
Collaborator

@codecov
Copy link
Copy Markdown

codecov bot commented Mar 27, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 89.71%. Comparing base (e78ccd8) to head (374cc83).
⚠️ Report is 5 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main   #62451   +/-   ##
=======================================
  Coverage   89.70%   89.71%           
=======================================
  Files         691      691           
  Lines      213935   213963   +28     
  Branches    41050    41047    -3     
=======================================
+ Hits       191907   191950   +43     
- Misses      14095    14106   +11     
+ Partials     7933     7907   -26     
Files with missing lines Coverage Δ
lib/net.js 94.70% <100.00%> (+0.04%) ⬆️

... and 62 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

assert.strictEqual(ports[ports.length - 1], port);
makeRequest(url, agent, common.mustCall((port) => {
assert.strictEqual(ports[ports.length - 1], port);
server.closeAllConnections();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why isn't agent.destroy() below sufficient?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's part of what I'm investigating. I'm running a stress test of this change on windows to determine if that deals with the flakiness. If it does, a follow up will be to figure out why the destroy isn't sufficient.

What's apparent is that there is a race condition on cleanup. Where and exactly why still needs to be determined along with a long term fix.

}, common.mustCall((res) => {
assert.strictEqual(res.statusCode, 200);
res.on('end', () => {
server.closeAllConnections();
Copy link
Copy Markdown
Member

@lpinca lpinca Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is preventing the socket from being closed normally? Is the keep-alive timeout longer than the test timeout? and if it is the case, can keep-alive be disable? I don't think it is needed for this test.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking through this... disabling keep-alive might address the issue for the various test-http(s)-* tests but there are tcp/tls tests also failing with the same crash that follow the same pattern, suggesting that the issue is a bit more fundamental than just using keep-alive.

@nodejs-github-bot

This comment was marked as outdated.

@jasnell
Copy link
Copy Markdown
Member Author

jasnell commented Mar 27, 2026

Hmm... the CI stress test job seems to not actually be working correctly on most of the Windows configurations.

@nodejs-github-bot
Copy link
Copy Markdown
Collaborator

nodejs-github-bot commented Mar 27, 2026

@jasnell

This comment was marked as outdated.

@jasnell
Copy link
Copy Markdown
Member Author

jasnell commented Mar 27, 2026

@joyeecheung or @addaleax ... curious if either of you have thoughts on the root cause here. The flakes are just test-http(s)-* and test-tls-* or test-net-* tests in general on windows. It's not a single or particular set of sets failing each time, they all appear to be following the same pattern, and all fail with the same error code. It's definitely a race condition somewhere, I'm thinking it's most likely in libuv but it's not entirely clear. The closeAllConnections approach in this PR appears to resolve the flakiness in these particular tests but it's just a bandaid.

@lpinca
Copy link
Copy Markdown
Member

lpinca commented Mar 27, 2026

I don't have a Windows machine at hand at the moment but can you reproduce locally or is it only in CI? Did you try with autoSelectFamily disabled?

@jasnell
Copy link
Copy Markdown
Member Author

jasnell commented Mar 27, 2026

I haven't even booted my windows machine in over two years. Once I get it back up and updated I'll see if I can reproduce locally and, if so, I'll try to get a usable dump from from it.

@jasnell jasnell force-pushed the jasnell/hopefully-fix-flaky-windows-tests branch from 8297908 to 89cd5ab Compare March 28, 2026 15:14
This is an attempt to fix the flaky failures on http(s) and
tcp tests on Windows.
@jasnell jasnell force-pushed the jasnell/hopefully-fix-flaky-windows-tests branch from 89cd5ab to 374cc83 Compare March 28, 2026 16:31
@nodejs-github-bot
Copy link
Copy Markdown
Collaborator

@nodejs-github-bot
Copy link
Copy Markdown
Collaborator

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

needs-ci PRs that need a full CI run. test Issues and PRs related to the tests.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants