[Bug]: LoadAuth.get_tok deletes valid tokens on transient backend errors, logging users out on every Redis hiccup #69073

@co-cy

What happened?

Description

salt.auth.LoadAuth.get_tok does not look at what kind of error happened when the eauth token backend raises while reading a token. Two very different situations are collapsed into the same branch:

def get_tok(self, tok):
    try:
        tdata = self.tokens["{}.get_token".format(self.opts["eauth_tokens"])](
            self.opts, tok
        )
    except salt.exceptions.SaltDeserializationError:
        log.error(...)
        self.rm_token(tok)
        return {}
    if not tdata:
        return {}
    ...

  1. The stored blob is corrupt (SaltDeserializationError): removing the token is correct, because the token is unusable forever.
  2. The backend itself is having a problem right now (OSError / IOError: Redis connection drop, NFS hang, full disk, file briefly held open): the token is fine, the read just couldn't complete.

In case (2), the current code path lets the OSError propagate up the stack, so each caller has to invent its own recovery policy. Where a caller recovers by calling rm_token, or where a wrapping except Exception deletes on any failure, a single failed read is treated as "this token is dead, drop it from the store", and one brief backend hiccup logs every authenticated user out.

For sites with a network-backed token store (Redis, NFS-shared localfs) this turns an ordinary backend restart into a fleet-wide forced re-authentication.
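The failure mode can be modelled without Salt at all. The sketch below is only an illustration: FlakyStore and naive_lookup are hypothetical names, not Salt APIs, and the store is a plain in-memory dict that fails its first read.

```python
class FlakyStore:
    """Hypothetical in-memory token store whose first reads fail."""

    def __init__(self, tokens, failures=1):
        self.tokens = dict(tokens)
        self.failures = failures  # number of reads that will raise

    def get_token(self, tok):
        if self.failures > 0:
            self.failures -= 1
            raise OSError("connection reset")
        return self.tokens.get(tok, {})

    def rm_token(self, tok):
        self.tokens.pop(tok, None)


def naive_lookup(store, tok):
    """Caller-side recovery that deletes the token on any failure."""
    try:
        return store.get_token(tok)
    except Exception:
        store.rm_token(tok)  # destroys a token that was never invalid
        return {}


store = FlakyStore({"abc": {"name": "alice"}}, failures=1)
first = naive_lookup(store, "abc")   # backend hiccup: {} and the token is deleted
second = naive_lookup(store, "abc")  # backend healthy again, but the token is gone
print(first, second)
```

Both lookups come back empty: the first because the read failed, the second because the recovery path had already removed a perfectly valid token.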

Setup

  • on-prem machine
  • container
  • classic packaging
  • onedir packaging

Any deployment whose master uses a network-backed eauth token store (Redis, NFS-shared localfs, anything where get_token can raise OSError).

Steps to Reproduce the behavior

import salt.auth

opts = {
    "extension_modules": "",
    "optimization_order": [0, 1, 2],
    "token_expire": 60,
    "keep_acl_in_token": False,
    "eauth_tokens": "localfs",
    "token_dir": "/tmp/tokens",
    "token_expire_user_override": False,
    "external_auth": {"auto": {"foo": []}},
}

auth = salt.auth.LoadAuth(opts)

# Simulate a transient backend error mid-read.
def bad_read(opts, tok):
    raise OSError("redis connection reset")

auth.tokens["localfs.get_token"] = bad_read

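# On the current code path the OSError propagates out of get_tok
# instead of being turned into a "not authenticated" result.
auth.get_tok("any-real-token-id")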

The real-world trigger is normal: the Redis container restarts, the NFS export is unreachable for a few seconds, the disk fills up.

Expected behavior

get_tok returns {} (request is not-authenticated for this attempt) and the token is kept in the store. The next request retries against the backend, succeeds once it recovers, and the user stays logged in.

Additional context

The accompanying PR splits the except into two cases:

  • SaltDeserializationError — corrupt blob, remove the token (existing behaviour).
  • OSError (covers IOError) — transient backend error, return {}, do not remove the token.
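A minimal sketch of the two branches, written as a standalone function so it runs without Salt installed. It is not the PR's code: get_tok_sketch, DeserializationError, and the get_token / rm_token callables are stand-ins chosen for illustration, and the expiry handling that follows in the real get_tok is omitted.

```python
import logging

log = logging.getLogger(__name__)


class DeserializationError(Exception):
    """Stand-in for salt.exceptions.SaltDeserializationError."""


def get_tok_sketch(get_token, rm_token, tok):
    """Look up a token, deleting it only when the stored blob is corrupt.

    get_token and rm_token are hypothetical callables standing in for the
    configured eauth token backend.
    """
    try:
        tdata = get_token(tok)
    except DeserializationError:
        # Corrupt blob: the token can never be read again, so remove it.
        log.warning("Token %r is corrupt, removing it", tok)
        rm_token(tok)
        return {}
    except OSError as exc:
        # Transient backend error (IOError is an alias of OSError in Python 3):
        # fail this attempt but leave the token in the store.
        log.warning("Token backend unavailable while reading %r: %s", tok, exc)
        return {}
    return tdata or {}


removed = []  # records every token id passed to rm_token


def backend_down(tok):
    raise OSError("redis connection reset")


def corrupt(tok):
    raise DeserializationError("bad blob")


transient = get_tok_sketch(backend_down, removed.append, "tok-a")
after_transient = list(removed)  # snapshot: nothing removed so far
broken = get_tok_sketch(corrupt, removed.append, "tok-b")
healthy = get_tok_sketch(lambda tok: {"name": "alice"}, removed.append, "tok-a")
```

In this model only "tok-b" ends up in removed. The transient failure returns {} for that one attempt and the same token resolves normally once the backend answers again.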

Three behavioural unit tests guard each branch and the existing expired-token path. The IOError test fails on the previous implementation, which is the regression this change exists to prevent.
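For orientation, tests of this shape could look roughly like the following. This is a sketch, not the PR's test module: lookup and CorruptBlob are hypothetical stand-ins for the patched get_tok and SaltDeserializationError, and the expired-token test is left out.

```python
import unittest
from unittest import mock


class CorruptBlob(Exception):
    """Stand-in for salt.exceptions.SaltDeserializationError."""


def lookup(get_token, rm_token, tok):
    """Compact stand-in for the patched get_tok logic."""
    try:
        return get_token(tok) or {}
    except CorruptBlob:
        rm_token(tok)
        return {}
    except OSError:
        return {}


class GetTokBranchTests(unittest.TestCase):
    def test_corrupt_blob_removes_token(self):
        rm_token = mock.Mock()
        get_token = mock.Mock(side_effect=CorruptBlob("bad blob"))
        self.assertEqual(lookup(get_token, rm_token, "tok"), {})
        rm_token.assert_called_once_with("tok")

    def test_ioerror_keeps_token(self):
        rm_token = mock.Mock()
        get_token = mock.Mock(side_effect=IOError("backend unavailable"))
        self.assertEqual(lookup(get_token, rm_token, "tok"), {})
        rm_token.assert_not_called()


# Run the suite programmatically so the snippet works outside a test runner.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(GetTokBranchTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

The second test is the one that matters for the regression: without an except OSError branch in lookup, the IOError would escape and the test would error out.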

Type of salt install

Official deb

Major version

3006.x, 3007.x

What supported OS are you seeing the problem on? Can select multiple. (If bug appears on an unsupported OS, please open a GitHub Discussion instead)

debian-11, debian-12

salt --versions-report output

salt --versions-report
Salt Version:
          Salt: 3007.13

Python Version:
        Python: 3.10.19 (main, Feb  5 2026, 07:05:38) [GCC 11.2.0]

Dependency Versions:
          cffi: 2.0.0
      cherrypy: unknown
  cryptography: 42.0.5
      dateutil: 2.8.2
     docker-py: Not Installed
         gitdb: Not Installed
     gitpython: Not Installed
        Jinja2: 3.1.6
       libgit2: 1.9.1
  looseversion: 1.3.0
      M2Crypto: Not Installed
          Mako: Not Installed
       msgpack: 1.0.7
  msgpack-pure: Not Installed
  mysql-python: Not Installed
     packaging: 24.0
     pycparser: 2.21
      pycrypto: Not Installed
  pycryptodome: 3.19.1
        pygit2: 1.18.2
  python-gnupg: 0.5.2
        PyYAML: 6.0.1
         PyZMQ: 25.1.2
        relenv: 0.22.3
         smmap: Not Installed
       timelib: 0.3.0
       Tornado: 6.5.4
           ZMQ: 4.3.4

Salt Extensions:
 saltext.vault: 1.5.0

Salt Package Information:
  Package Type: onedir

System Versions:
          dist: debian 12.13 bookworm
        locale: utf-8
       machine: x86_64
       release: 6.12.73+deb12-amd64
        system: Linux
       version: Debian GNU/Linux 12.13 bookworm

Metadata

Labels: bug, needs-triage
