Skip to content

[fix][test] Split and fix flaky SASL authentication first-stage auth cache tests#25949

Open
oneby-wang wants to merge 3 commits into
apache:masterfrom
oneby-wang:fix-sasl-only-auth-first-stage-flaky
Open

[fix][test] Split and fix flaky SASL authentication first-stage auth cache tests#25949
oneby-wang wants to merge 3 commits into
apache:masterfrom
oneby-wang:fix-sasl-only-auth-first-stage-flaky

Conversation

@oneby-wang

Copy link
Copy Markdown
Contributor

Fixes #24337

Motivation

SaslAuthenticateTest.testSaslOnlyAuthFirstStage currently verifies two timing-sensitive behaviors in one test: that first-stage SASL authentication leaves inflight contexts in the cache, and that residual contexts are later expired. The exact cache-size assertion can become flaky when entries expire before the assertion is reached.

Modifications

Added two focused tests for the separate behaviors:

  • testSaslOnlyAuthFirstStageKeepsInflightContextsBeforeExpiry verifies that first-stage SASL authentication keeps inflight contexts when the expiry timeout is long enough.
  • testSaslOnlyAuthFirstStageExpiresResidualContexts verifies that residual first-stage contexts expire and are removed after cleanup.

Verifying this change

  • Make sure that the change passes the CI checks.

Does this pull request potentially affect one of the following parts:

If the box was checked, please highlight the changes

  • Dependencies (add or upgrade a dependency)
  • The public API
  • The schema
  • The default values of configurations
  • The threading model
  • The binary protocol
  • The REST endpoints
  • The admin CLI options
  • The metrics
  • Anything that affects deployment

Comment on lines +359 to +360
cache.cleanUp();
assertEquals(cache.asMap().size(), 0);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this make sense? The cache is flushed in test code and it's asserted to be empty.

@oneby-wang oneby-wang Jun 8, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. I think it would be clearer to verify the behavior that the test actually cares about: the expired authentication contexts are no longer retrievable from the cache.

Instead of asserting the internal cache size after cleanUp(), we can directly check:

Awaitility.await().atMost(5, TimeUnit.SECONDS).pollDelay(100, TimeUnit.MILLISECONDS).untilAsserted(() -> {
    for (int i = 0; i < 10; i++) {
        AuthenticationState authenticationState = cache.getIfPresent(((long) i));
        log.info("authenticationState: " + authenticationState);
        assertNull(cache.getIfPresent(((long) i)));
    }
    // assertEquals(cache.asMap().size(), 0);
});

This makes the test focus on the observable behavior rather than the cache's internal state, and avoids relying on asMap().size() for validation.

Another interesting thing is: if I uncomment the assertEquals(cache.asMap().size(), 0) assertion, the test may still fail even though all getIfPresent(...) checks return null.

2026-06-08T16:35:00,952 - INFO  - [awaitility-thread:SaslAuthenticateTest] - authenticationState: null {}
2026-06-08T16:35:00,952 - INFO  - [awaitility-thread:SaslAuthenticateTest] - authenticationState: null {}
2026-06-08T16:35:01,057 - INFO  - [awaitility-thread:SaslAuthenticateTest] - authenticationState: null {}
2026-06-08T16:35:01,057 - INFO  - [awaitility-thread:SaslAuthenticateTest] - authenticationState: null {}
2026-06-08T16:35:01,057 - INFO  - [awaitility-thread:SaslAuthenticateTest] - authenticationState: null {}
2026-06-08T16:35:01,057 - INFO  - [awaitility-thread:SaslAuthenticateTest] - authenticationState: null {}
2026-06-08T16:35:01,057 - INFO  - [awaitility-thread:SaslAuthenticateTest] - authenticationState: null {}
2026-06-08T16:35:01,057 - INFO  - [awaitility-thread:SaslAuthenticateTest] - authenticationState: null {}
2026-06-08T16:35:01,057 - INFO  - [awaitility-thread:SaslAuthenticateTest] - authenticationState: null {}
2026-06-08T16:35:01,057 - INFO  - [awaitility-thread:SaslAuthenticateTest] - authenticationState: null {}
2026-06-08T16:35:01,057 - INFO  - [awaitility-thread:SaslAuthenticateTest] - authenticationState: null {}
2026-06-08T16:35:01,057 - INFO  - [awaitility-thread:SaslAuthenticateTest] - authenticationState: null {}
2026-06-08T16:35:01,165 - WARN  - [pulsar-tgt-refresh-thread:TGTRefreshThread] - TGT renewal thread has been interrupted and will exit {}

Assertion condition expected [0] but found [10] within 10 seconds.
org.awaitility.core.ConditionTimeoutException: Assertion condition expected [0] but found [10] within 10 seconds.
	at org.awaitility.core.ConditionAwaiter.await(ConditionAwaiter.java:167)
	at org.awaitility.core.AssertionCondition.await(AssertionCondition.java:119)

Also, do you think a similar change would make sense in #25948 as well? It seems that asserting getIfPresent(...) == null could make the test intent clearer there too.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed.

@oneby-wang

Copy link
Copy Markdown
Contributor Author

@lhotari About optimization for PR #25948. Would you recommend addressing it in this PR, or opening a separate PR for it?

@oneby-wang

Copy link
Copy Markdown
Contributor Author

From Caffeine's design, expiration visibility and physical removal are not the same thing. An expired entry can behave as absent (getIfPresent() returns null) while still remaining in the internal data structures until routine maintenance removes it. As a result, the internal map size may temporarily include expired entries that are no longer accessible through the cache API.

This is also discussed by the Caffeine maintainer here:
ben-manes/caffeine#524

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Flaky-test: SaslAuthenticateTest.testSaslOnlyAuthFirstStage

2 participants