Things my team is working on: MediaWiki-Platform-Team
Side projects I am working on (or planning to, eventually): User-Tgr
You can find more info about me on my user page.
User Details
- User Since
- Sep 19 2014, 4:55 PM (597 w, 3 d)
- Availability
- Available
- IRC Nick
- tgr
- LDAP User
- Gergő Tisza
- MediaWiki User
- Tgr (WMF) [ Global Accounts ]
Today
I guess this is not really true since the sub is taken from the access token, so you could e.g. rotate access tokens to get around the rate limit.
...except for owner-only access tokens which do not use a prefix. But owner-only tokens are largely ignored so there shouldn't be anything needing to be combined in that case?
The prefix of the sub claim reflects the default CentralIdLookup of the given wiki, it's not affected by what kind of session handler generated it. So for SUL wikis it's always going to be mw:CentralAuth::.
Yesterday
Can be set in X-Analytics directly in the backend response ?
Sun, Mar 1
3 instances so far so impact is small.
Caused by RefreshTokenGrant::respondToAccessTokenRequest(). Upstream added strict_types=1 and a type conversion; our forked code skips the type conversion part. So we either need to fix the fork, or (preferably) finish T261462: Migrate OAuth extension back from wikimedia/oauth2-server fork to upstream.
Sat, Feb 28
Presumably T261462: Migrate OAuth extension back from wikimedia/oauth2-server fork to upstream related - maybe the refresh token was created before rEOAUd78206b971bb: Update league/oauth2-server and dependancies and used after it?
Fri, Feb 27
One way to handle this is T418475: Session providers have no way to invalidate a session from provideSessionInfo() - instead of returning null from provideSessionInfo() which can result in all kinds of things depending on which other session handlers will get involved, just indicate the session is invalid, so cookies (including the JWT) get cleared.
The log entries didn't fully stop around 18:40 but did drop by a magnitude (maybe more; logs before that were capped by the throttling Logstash applies to events with the same message and channel). The rest is presumably due to some bots (at a glance, maybe just one bot) not respecting cookie expiries.
Bot password JWTs were disabled today at 14:40 UTC. So most likely the error will stop happening at 18:40 (JWT cookie expiry is 4 hours). Given that and that there's no obvious impact, it's probably best to wait it out. Rolling back the entire set of bot password JWT patches would be error-prone, and trying to write a fix at Friday evening, without really understanding the exact path these errors take, seems like a bad idea.
Looking up the affected users, they are all bots. So either this is related to the use of bot passwords, or maybe to the login API more generally. Maybe these are bots which log in with a bot password, but then for some reason lose the bot password cookie without losing the JWT cookie, and then during login get an anonymous session which doesn't work due to the inconsistent JWT cookie? There are about 100/s events, that's way too high for logins though. And some of the API requests clearly aren't logins (most are POST so no way to tell).
So apparently this happens for anonymous sessions (in CookieSessionProvider) when there is a JWT cookie. AFAICS the code was essentially the same before rMWa8dd114a4668: Session: Emit JWT cookie in ImmutableSessionProviderWithCookie, but maybe there wasn't an easy way to end up in this situation, and bot passwords somehow changed that?
from /srv/mediawiki/php-1.46.0-wmf.17/includes/Session/SessionManager.php(407)
#0 /srv/mediawiki/php-1.46.0-wmf.17/includes/Session/JwtSessionCookieHelper.php(158): MediaWiki\Session\SessionManager->validateJwtSubject(array, MediaWiki\User\User)
#1 /srv/mediawiki/php-1.46.0-wmf.17/includes/Session/CookieSessionProvider.php(189): MediaWiki\Session\JwtSessionCookieHelper->verifyJwtCookie(MediaWiki\Request\WebRequest, MediaWiki\Session\SessionInfo, array, array)
#2 /srv/mediawiki/php-1.46.0-wmf.17/extensions/CentralAuth/includes/session/CentralAuthSessionProvider.php(125): MediaWiki\Session\CookieSessionProvider->provideSessionInfo(MediaWiki\Request\WebRequest)
#3 /srv/mediawiki/php-1.46.0-wmf.17/extensions/CentralAuth/includes/session/CentralAuthSessionProvider.php(218): CentralAuthSessionProvider->returnParentSessionInfo(MediaWiki\Request\WebRequest)
#4 /srv/mediawiki/php-1.46.0-wmf.17/includes/Session/SessionManager.php(569): CentralAuthSessionProvider->provideSessionInfo(MediaWiki\Request\WebRequest)
#5 /srv/mediawiki/php-1.46.0-wmf.17/includes/Session/SessionManager.php(137): MediaWiki\Session\SessionManager->getSessionInfoForRequest(MediaWiki\Request\WebRequest)
#6 /srv/mediawiki/php-1.46.0-wmf.17/includes/Request/WebRequest.php(861): MediaWiki\Session\SessionManager->getSessionForRequest(MediaWiki\Request\WebRequest)
#7 /srv/mediawiki/php-1.46.0-wmf.17/includes/Setup.php(504): MediaWiki\Request\WebRequest->getSession()
#8 /srv/mediawiki/php-1.46.0-wmf.17/includes/WebStart.php(73): require_once(string)
#9 /srv/mediawiki/php-1.46.0-wmf.17/api.php(23): require(string)
#10 /srv/mediawiki/w/api.php(3): require(string)
#11 {main}The logs are not down. I tested and bot password JWTs are disabled, as expected. That would imply that the real issue is rMWa8dd114a4668: Session: Emit JWT cookie in ImmutableSessionProviderWithCookie causing some problem in CookieSessionProvider / CentralAuthSessionProvider.
It's a weekend so let's just disable for now.
Probably it's from this line. The central IDs I see in the logs are fine, though, so not sure what's going on.
This might ave broken bot passwords; there are five million JWT validation failed: JWT error: wrong subject log entries since yesterday. No error reports though, so not sure what's going on. (But we will have to fix it for the log volume, if nothing else.)
Thu, Feb 26
As a nice side effect, this tanked the "Session store lookups with no user information" metric, ie. MultiBackendSessionStore now knows bot password sessions are always authenticated.
Deployed to production.
In theory fixed in production. I don't think I can test votewiki login so please verify.
These are the possible authentication types (once we finish the pending bot pw / OAuth work):
| session type | JWT cookie | Authorization header |
|---|---|---|
| normal web-based | yes | no |
| bot password | yes | no |
| OAuth 1 | yes | yes, but not a JWT |
| OAuth 2 owner-only | yes | yes, but no rlc field |
| OAuth 2 (normal) | no | yes |
I'd say PSI, these don't have much to do with MWP's scope, even if wmfGetPrivilegedGroups() does a CentralAuth call or two. (There is a bunch of authentication-related hooks in the configuration, those should be also moved, and those should be owned by MWP.)
We should probably introduce an explicit invalid value for UserInfo, to tell SessionManager to unpersist.
Do you want to differentiate between owner-only and normal access tokens? (For OAuth 2; I don't think it's possible for OAuth 1.) If so, maybe it makes sense to put the logic into Envoy to keep the logic related to JWT internals in one place. (AIUI the edge will validate the JWT signature but not do much else.)
Wed, Feb 25
This upstream issue sounds similar (although it is talking about replies, not top-level comments).
if the Resolved checkbox is unchecked just before hitting the send button, this selection is ignored and the comment remain marked as resolved. If the comment is modified after the checkbox is used, this issue does not occur.
I think if we go this way, we'd want them mutually exclusive because client-credentials apps are conceptually like owner-only apps, so human review could be relaxed. Also we might want a more dedicated registration UI. If someone needs both client credentials and authorization code flow, it's not too much hassle to register two separate consumers.
Tue, Feb 24
JWT cookies / access tokens are regenerated every four hours (except owner-only access tokens but those don't get rlc anyway) so every token should have the right rlc by now.
tgr@deploy2002:~$ foreachwikiindblist sul CentralAuth:UpdateAutomaticGlobalGroupMembership --local-group=bot | tee T415588.log
Mon, Feb 23
tgr@deploy2002:~$ foreachwikiindblist sul CentralAuth:UpdateAutomaticGlobalGroupMembership --local-group=checkuser --local-group=suppress | tee T416541.log
You could always just configure a global-temporary-account-viewer-2 group on beta, re-run the script, and see what happens. That's guaranteed to be empty at start.
Or the output stream gets misconfigured - at a glance, SwiftFileBackend will first make an authentication request refreshAuthentication() and then a request for the actual image, so maybe the body output is from the first one.
OAuthRateLimiter sets higher rate limits for select applications, and it's based on consumer ID, not user ID, so it cannot be reproduced in the normal session cookie. (It can be reproduced in the OAuth session cookie, but with cookies you can never be sure which one you get.) That said, there are like five such applications and they are all internal or use OAuth 1, so I'm probably overcomplicating things here and this is an academic concern only.
Sun, Feb 22
The more ideal version would probably treat the request as untrusted if the sub field of the cookie JWT and the access token differs, and prefer the rlc from the access token if they have the same subject. (Owner-only access tokens won't have an rlc field. For non-owner-only access tokens, in the future the rlc might be more permissive compared to a cookie, due to T409305: API tokens: use rate limit classes instead of rate limit overrides in the API gateway.)
The simplest version would be to just ignore the header when the cookie is present. That is not ideal, but can be made work if we ensure on the MediaWiki side that requests with a mismatching bearer token and cookie get rejected. The reason that's not ideal is that the cookie might not really be under the control of the client, in case of JS-based clients (which are not used much, and until recently, have been implemented weirdly (T323867: Clarify use of non-confidential OAuth 2.0 clients) but they do exist), and the cookie name is shared with normal cookie-based sessions, so an OAuth client might accidentally end up sending a cookie that breaks it. (We could use a different cookie name, but I imagine that would complicate things in Envoy in a different direction.)
Fri, Feb 20
Not for this incident specifically, but we are planning to update tests in the next few weeks (T415281: [EPIC] OAuth extension critical workflows (for automated tests enhancement)).
Thu, Feb 19
In any case it's only doing a filesort because you are grouping it on another table + sorting the results (which might be necessary if you need to list people in the order of their last contributions, but for a count it's irrelevant).
SELECT
actor_user AS `user_id`,
actor_name AS `user_name`,
MIN(rev_actor) AS `actor_id`,
MIN(user_real_name) AS `user_real_name`,
MAX(rev_timestamp) AS `timestamp`
FROM
`revision`
JOIN `actor` ON ((rev_actor = actor_id))
LEFT JOIN `user` ON ((actor_user = user_id))
WHERE
rev_page = 55943877
AND (rev_actor != 31)
AND ((rev_deleted & 4) = 0)
GROUP BY
rev_actorWe should audit the full list of differences between owner-only and normal code paths, it would give us a better idea of what features need to be recreated in whatever we use for the new recommended single-user OAuth mechanism.
Should they use a normal OAuth 2 app with the Client Credentials grant?
Probably you are not setting a sensible user agent and so the requests get throttled aggressively. See https://foundation.wikimedia.org/wiki/Policy:Wikimedia_Foundation_User-Agent_Policy
Would have been nice to get a Phan warning for this. The Builder is marked @immutable, and Phan understands that, but it seems that because internally the class does actually set the variables when creating a copy, the error is raised in the library code, not the calling code, and so we don't see it.
Wed, Feb 18
We use a fork of 9.3 now. @LucasWerkmeister can you check whether this is fixed?
FWIW the patch doesn't 100% restore the pre-upgrade behavior. Back then the user identifier was set to the empty string, now it's null, so the resulting JWT is slightly different. Probably doesn't make any difference.
Yes, thanks, fixed.
Interesting, I get
Script 'CentralAuth:FixStuckGlobalRename' not found (tried path '/vagrant/mediawiki/extensions/CentralAuth/maintenance/FixStuckGlobalRename.php' and class 'MediaWiki\Extension\CentralAuth\Maintenance\FixStuckGlobalRename').
(No way to filter for password resets in the Android app, unfortunately. gerrit 1240375 will fix that.)
Successful resets, failed resets. Seem stable.
How can it not find EmptyIterator?
Might be a train blocker, depending on what kind of clients use the client credentials endpoint (unfortunately we don't log the user agent for exceptions, and in any case the group 1 data wouldn't be that informative).
Should probably do a quick fix rather than waiting for the client credentials issue to be fully worked out.
We are working on this in T417278: Choosing client credentials grant for OAuth 2 results in an anonymous access token. Sorry, I was sure nothing uses this in production.
Tue, Feb 17
This blocks T412214: Ensure a good experience for apps which want to use OAuth credentials for a long time if we want to solve it with client credentials (which we probably do).
So apparently when access tokens are issued via the client credentials flow (that is, making an API call to /oauth2/access_token with the client ID and secret), the oauth2-server library issues an access token with no user ID (sub is the empty string), and does not create a ConsumerAcceptance record. This logic is to make sure that such access tokens are accepted.
Turns out this is harder than it seemed.
A number of MWException subclasses do presentation layer work, UserNotLoggedIn is one of them. Not great but doesn't have much to do with cross-namespace dependencies IMO. Distributing exception classes across namespaces, so that exceptions related to a specific component are within that component (like we do, somewhat, for hooks) would be a reasonable arrangement, but having presentation logic in exception classes would be still bad.
I'm sketpical about treating present-day namespaces as meaningful component boundaries. Neither "all special pages" nor "all exceptions" is a reasonable component where not depending on other components would be a useful goal, IMO.
