What happened?
Description
salt.auth.LoadAuth.get_tok does not look at what kind of error happened when the eauth token backend raises while reading a token. Two very different situations are collapsed into the same branch:
def get_tok(self, tok):
    try:
        tdata = self.tokens["{}.get_token".format(self.opts["eauth_tokens"])](
            self.opts, tok
        )
    except salt.exceptions.SaltDeserializationError:
        log.error(...)
        self.rm_token(tok)
        return {}
    if not tdata:
        return {}
    ...
1. The stored blob is corrupt (SaltDeserializationError): removing the token is correct, because the token is unusable forever.
2. The backend itself is having a problem right now (OSError / IOError, for example a Redis connection drop, an NFS hang, a full disk, or a file briefly held open): the token is fine, but the read could not complete.
In case (2) on the current code path, the OSError propagates up the stack and the caller is forced to invent a recovery policy. Where callers recover by calling rm_token (or where a wrapping except Exception deletes on any failure), a single brief backend hiccup logs every authenticated user out, because one failed read is treated as "this token is dead, drop it from the store".
For sites with a network-backed token store (Redis, NFS-shared localfs) this turns an ordinary backend restart into a fleet-wide forced re-authentication.
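To make the failure mode concrete, here is a minimal sketch that runs without Salt. Everything in it is a hypothetical stand-in: TokenStore models a token backend as an in-memory dict, and naive_lookup models a caller that deletes the token on any failed read. It is an illustration of the pattern described above, not code taken from Salt.

```python
class TokenStore:
    """Hypothetical in-memory stand-in for an eauth token backend."""

    def __init__(self):
        self.tokens = {}
        self.available = True  # set to False to simulate an outage

    def get_token(self, tok):
        if not self.available:
            # Same class of error a dropped connection or hung mount raises.
            raise OSError("backend unavailable")
        return self.tokens.get(tok, {})

    def rm_token(self, tok):
        # Assumption: the delete itself succeeds even though the read failed.
        self.tokens.pop(tok, None)


def naive_lookup(store, tok):
    """Hypothetical caller that treats every failed read as a dead token."""
    try:
        return store.get_token(tok)
    except Exception:
        store.rm_token(tok)
        return {}


store = TokenStore()
store.tokens["abc"] = {"name": "alice", "eauth": "auto"}

store.available = False  # brief outage
result_during_outage = naive_lookup(store, "abc")

store.available = True  # backend recovers
result_after_recovery = naive_lookup(store, "abc")

print(result_during_outage)   # {}
print(result_after_recovery)  # {} because the valid token was deleted
```

The second lookup fails even though the backend is healthy again, which is the forced re-authentication described above.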
Setup
Any deployment whose master uses a network-backed eauth token store (Redis, NFS-shared localfs, anything where get_token can raise OSError).
Steps to Reproduce the behavior
import salt.auth

opts = {
    "extension_modules": "",
    "optimization_order": [0, 1, 2],
    "token_expire": 60,
    "keep_acl_in_token": False,
    "eauth_tokens": "localfs",
    "token_dir": "/tmp/tokens",
    "token_expire_user_override": False,
    "external_auth": {"auto": {"foo": []}},
}
auth = salt.auth.LoadAuth(opts)

# Simulate a transient backend error mid-read.
def bad_read(opts, tok):
    raise OSError("redis connection reset")

auth.tokens["localfs.get_token"] = bad_read
auth.get_tok("any-real-token-id")
The real-world triggers are routine: the Redis container restarts, the NFS export is unreachable for a few seconds, or the disk fills up.
Expected behavior
get_tok returns {} (request is not-authenticated for this attempt) and the token is kept in the store. The next request retries against the backend, succeeds once it recovers, and the user stays logged in.
Additional context
The accompanying PR splits the except into two cases:
- SaltDeserializationError: corrupt blob, remove the token (existing behaviour).
- OSError (covers IOError): transient backend error, return {}, do not remove the token.
Three behavioural unit tests cover the two branches and the existing expired-token path. The IOError test fails on the previous implementation, which is the regression this change exists to prevent.
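The split described above could look roughly like the following. This is a sketch under stated assumptions, not the PR's actual diff: DeserializationError stands in for salt.exceptions.SaltDeserializationError, and get_tok_sketch, read_token and remove_token are hypothetical names chosen so the example runs without Salt installed.

```python
import logging

log = logging.getLogger(__name__)


class DeserializationError(Exception):
    """Stand-in for salt.exceptions.SaltDeserializationError."""


def get_tok_sketch(read_token, remove_token, tok):
    """Return token data, or {} if the token cannot be used for this request."""
    try:
        tdata = read_token(tok)
    except DeserializationError:
        # Corrupt blob: the token can never be read again, so remove it.
        log.error("Token data could not be deserialized; removing token")
        remove_token(tok)
        return {}
    except OSError as exc:
        # Transient backend error (IOError is an alias of OSError in Python 3).
        # Fail this request but keep the token so a later read can succeed.
        log.warning("Token backend read failed, keeping token: %s", exc)
        return {}
    return tdata or {}


removed = []  # records which tokens the sketch asked the backend to delete


def corrupt_read(tok):
    raise DeserializationError("unreadable token blob")


def flaky_read(tok):
    raise OSError("redis connection reset")


corrupt_result = get_tok_sketch(corrupt_read, removed.append, "tok-corrupt")
transient_result = get_tok_sketch(flaky_read, removed.append, "tok-transient")

print(removed)  # ['tok-corrupt']
```

Both failing reads return {}, but only the corrupt token is passed to remove_token; the token hit by the transient error stays in the store for the next attempt.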
Type of salt install
Official deb
Major version
3006.x, 3007.x
What supported OS are you seeing the problem on? Can select multiple. (If bug appears on an unsupported OS, please open a GitHub Discussion instead)
debian-11, debian-12
salt --versions-report output
salt --versions-report
Salt Version:
  Salt: 3007.13
Python Version:
  Python: 3.10.19 (main, Feb 5 2026, 07:05:38) [GCC 11.2.0]
Dependency Versions:
  cffi: 2.0.0
  cherrypy: unknown
  cryptography: 42.0.5
  dateutil: 2.8.2
  docker-py: Not Installed
  gitdb: Not Installed
  gitpython: Not Installed
  Jinja2: 3.1.6
  libgit2: 1.9.1
  looseversion: 1.3.0
  M2Crypto: Not Installed
  Mako: Not Installed
  msgpack: 1.0.7
  msgpack-pure: Not Installed
  mysql-python: Not Installed
  packaging: 24.0
  pycparser: 2.21
  pycrypto: Not Installed
  pycryptodome: 3.19.1
  pygit2: 1.18.2
  python-gnupg: 0.5.2
  PyYAML: 6.0.1
  PyZMQ: 25.1.2
  relenv: 0.22.3
  smmap: Not Installed
  timelib: 0.3.0
  Tornado: 6.5.4
  ZMQ: 4.3.4
Salt Extensions:
  saltext.vault: 1.5.0
Salt Package Information:
  Package Type: onedir
System Versions:
  dist: debian 12.13 bookworm
  locale: utf-8
  machine: x86_64
  release: 6.12.73+deb12-amd64
  system: Linux
  version: Debian GNU/Linux 12.13 bookworm