
Joe (Giuseppe Lavagetto)
Spy

User Details

User Since
Oct 3 2014, 5:57 AM (578 w, 5 d)
Availability
Available
LDAP User
Giuseppe Lavagetto
MediaWiki User
GLavagetto (WMF)

Recent Activity

Today

Joe triaged T409253: Continuous breakages of apt-staging as Unbreak Now! priority.

Triaging to UBN! as this is blocking activity for at least two hypotheses.

Wed, Nov 5, 5:43 AM · Infrastructure-Foundations, GitLab, collaboration-services
Joe created T409253: Continuous breakages of apt-staging.
Wed, Nov 5, 5:43 AM · Infrastructure-Foundations, GitLab, collaboration-services
Joe created P84813 Sigh.
Wed, Nov 5, 5:34 AM

Mon, Nov 3

Joe renamed T409024: Collect known client fingerprints for common libraries and browsers from Collect known client fingerprints for common libraries to Collect known client fingerprints for common libraries and browsers.
Mon, Nov 3, 6:41 AM · Traffic, Hiddenparma, SRE
Joe triaged T409024: Collect known client fingerprints for common libraries and browsers as Medium priority.
Mon, Nov 3, 6:15 AM · Traffic, Hiddenparma, SRE
Joe created T409024: Collect known client fingerprints for common libraries and browsers.
Mon, Nov 3, 6:15 AM · Traffic, Hiddenparma, SRE
Joe closed T398161: FY 25/26 WE 5.4.3: CDN (text) filtering rationalization as Resolved.
Mon, Nov 3, 5:51 AM · SRE

Thu, Oct 30

Joe added a member for WMF-NDA: SCherukuwada.
Thu, Oct 30, 7:10 AM

Wed, Oct 29

Joe closed T408714: Tomcat Stacktrace Disclosure – idp-test.wikimedia.org as Declined.

Anyone wanting to know what that Tomcat is running can just use GitHub.

Wed, Oct 29, 7:52 PM · CAS-SSO, SecTeam-Processed, Security, Infrastructure-Foundations, Vuln-Infoleak
Joe closed T404826: Integrate code from the private repository into the CDN as Resolved.
Wed, Oct 29, 5:03 AM · Hiddenparma, Traffic, SRE
Joe closed T404826: Integrate code from the private repository into the CDN, a subtask of T400270: Browser behaviour detection at the edge, as Resolved.
Wed, Oct 29, 5:03 AM · Patch-For-Review, Hiddenparma, Traffic, SRE

Fri, Oct 24

Joe added a comment to T404826: Integrate code from the private repository into the CDN.
Fri, Oct 24, 5:14 AM · Hiddenparma, Traffic, SRE

Thu, Oct 23

Joe triaged T408062: FY 25/26 WE 5.4.7 Standardize thumbnail sizes as High priority.
Thu, Oct 23, 5:55 AM · MediaViewer, Data-Persistence, Thumbor, SRE-swift-storage, Traffic
Joe created T408062: FY 25/26 WE 5.4.7 Standardize thumbnail sizes.
Thu, Oct 23, 5:55 AM · MediaViewer, Data-Persistence, Thumbor, SRE-swift-storage, Traffic
Joe triaged T408060: Distinguish request classes based on user-agent declaration as High priority.
Thu, Oct 23, 5:46 AM · Traffic, Hiddenparma
Joe added a parent task for T408060: Distinguish request classes based on user-agent declaration: T408061: FY 25/26 WE 5.4.6 Classify the top 30 spiders by traffic as known bots.
Thu, Oct 23, 5:45 AM · Traffic, Hiddenparma
Joe added a subtask for T408061: FY 25/26 WE 5.4.6 Classify the top 30 spiders by traffic as known bots: T408060: Distinguish request classes based on user-agent declaration.
Thu, Oct 23, 5:45 AM · bot-traffic-requests, Traffic
Joe removed a parent task for T406545: FY 25/26 WE 5.4.5: Enforce global rate-limits: T408061: FY 25/26 WE 5.4.6 Classify the top 30 spiders by traffic as known bots.
Thu, Oct 23, 5:45 AM · Traffic, Hiddenparma, SRE
Joe removed a subtask for T408061: FY 25/26 WE 5.4.6 Classify the top 30 spiders by traffic as known bots: T406545: FY 25/26 WE 5.4.5: Enforce global rate-limits.
Thu, Oct 23, 5:45 AM · bot-traffic-requests, Traffic
Joe added a parent task for T406545: FY 25/26 WE 5.4.5: Enforce global rate-limits: T408061: FY 25/26 WE 5.4.6 Classify the top 30 spiders by traffic as known bots.
Thu, Oct 23, 5:45 AM · Traffic, Hiddenparma, SRE
Joe added a subtask for T408061: FY 25/26 WE 5.4.6 Classify the top 30 spiders by traffic as known bots: T406545: FY 25/26 WE 5.4.5: Enforce global rate-limits.
Thu, Oct 23, 5:45 AM · bot-traffic-requests, Traffic
Joe created T408061: FY 25/26 WE 5.4.6 Classify the top 30 spiders by traffic as known bots.
Thu, Oct 23, 5:45 AM · bot-traffic-requests, Traffic
Joe created T408060: Distinguish request classes based on user-agent declaration.
Thu, Oct 23, 5:37 AM · Traffic, Hiddenparma

Wed, Oct 22

Joe created P84210 (An Untitled Masterwork).
Wed, Oct 22, 5:09 AM

Tue, Oct 21

Joe added a comment to T406555: Allow rate-limiting by other properties in requestctl.

The generated VCL code for wmfuniq-based rate-limiting looks like this:

Tue, Oct 21, 8:51 AM · Hiddenparma
Joe added a comment to T407706: Global block exception for AddDesc app.

For instance, I see some of the IP ranges from GCP are part of a global on-wiki block, so I don't think you'll be able to create edits from that range.

Tue, Oct 21, 8:42 AM · Traffic
Joe added a comment to T407706: Global block exception for AddDesc app.

Hi, I'm not sure this task is tagged with the right tags.

Tue, Oct 21, 7:58 AM · Traffic
Joe closed T398161: FY 25/26 WE 5.4.3: CDN (text) filtering rationalization as Resolved.
Tue, Oct 21, 6:50 AM · SRE

Fri, Oct 17

Joe triaged T406555: Allow rate-limiting by other properties in requestctl as High priority.
Fri, Oct 17, 2:22 PM · Hiddenparma
Joe added a comment to T407513: Key packages missing from trixie-wikimedia.

For python3-conftool and derived packages, you can simply patch the .gitlab-ci.yml file to also generate debs for trixie, and then get them from apt-staging, assuming apt-staging is set up for trixie.

Fri, Oct 17, 7:50 AM · Patch-For-Review, Infrastructure-Foundations, SRE-swift-storage, SRE

Thu, Oct 16

Joe closed T407092: Exclude logged in users from requestctl general filters, create separate scope for it. as Resolved.
Thu, Oct 16, 8:50 AM · Hiddenparma, SRE
Joe closed T407092: Exclude logged in users from requestctl general filters, create separate scope for it., a subtask of T406545: FY 25/26 WE 5.4.5: Enforce global rate-limits, as Resolved.
Thu, Oct 16, 8:50 AM · Traffic, Hiddenparma, SRE
Joe added a comment to T399688: varnish wikimedia_trust ACL isn't used anymore.

I think you have misunderstood what this task was about: it's about specifically removing that ACL from varnish, not about removing the concept, which is central to how we're doing traffic filtering.

Thu, Oct 16, 6:07 AM · Patch-For-Review, Traffic

Mon, Oct 13

Joe created T407092: Exclude logged in users from requestctl general filters, create separate scope for it..
Mon, Oct 13, 6:22 AM · Hiddenparma, SRE
Joe claimed T406555: Allow rate-limiting by other properties in requestctl.
Mon, Oct 13, 6:14 AM · Hiddenparma
Joe added a comment to T389932: Improve the user experience adding new nodes to puppet.

One issue with using just the FQDN is that it breaks tools which rely on matching other hostnames; for instance, the dcl tool reuses the site.pp definitions to match hosts of the form sretest1002.eqiad.default.svc.k8s.lan. I'm not aware of other tooling that depends on the regexes. On the other hand, if we moved the logic into an ENC script, we could add some facility to return the correct role for dcl's hostnames as well.

Maybe I'm missing something, but one of the reasons I wanted to go with that format was also to make it easier for the puppet-compiler and dcl to determine which hosts belong to a node without parsing the regexes in site.pp. There are many ways to get a match without the TLD, starting with .startswith() in Python.

Sorry, I didn't realize that was one of the aims. I was focused on the regex complexity, which I thought was the primary concern of this task. In the case of dcl, the tool stands up a puppetserver, so site.pp is parsed by puppet, the same as in production. For the puppet-compiler, I wasn't aware that the current Python parsing was breaking; is it?

Mon, Oct 13, 4:09 AM · User-Joe, Puppet, Infrastructure-Foundations
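The prefix match mentioned above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how puppet-compiler or dcl actually read site.pp: the node table mapping short hostnames to roles is hypothetical, and the point is only that one definition can match both a production-style name and a dcl-style name such as sretest1002.eqiad.default.svc.k8s.lan.

```python
from typing import Dict, Optional

# Hypothetical node table keyed by short hostname (not the real site.pp).
NODE_ROLES: Dict[str, str] = {
    "sretest1002": "insetup",
    "sretest1003": "sretest",
}


def role_for_host(hostname: str) -> Optional[str]:
    """Return the role whose node name prefixes `hostname`, ignoring the domain."""
    for name, role in NODE_ROLES.items():
        # Require a dot after the name so that sretest10021 does not match sretest1002.
        if hostname == name or hostname.startswith(name + "."):
            return role
    return None
```

Requiring the trailing dot is what keeps a bare prefix test from matching longer hostnames; splitting on the first dot and doing a dictionary lookup would be an equivalent alternative.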
Joe added a comment to T389932: Improve the user experience adding new nodes to puppet.

In proposing possible solutions, I would love to understand a bit more why our site.pp uses complex regexes. From looking through the git log, it appears that the main use case is moving servers in and out of their insetup role. What are the other reasons people use complex regexes?

Mon, Oct 13, 4:05 AM · User-Joe, Puppet, Infrastructure-Foundations

Tue, Oct 7

Joe created T406555: Allow rate-limiting by other properties in requestctl.
Tue, Oct 7, 9:16 AM · Hiddenparma
Joe added a project to T406545: FY 25/26 WE 5.4.5: Enforce global rate-limits: Hiddenparma.
Tue, Oct 7, 9:08 AM · Traffic, Hiddenparma, SRE
Joe closed T239856: Fold services recommendations into Standards for services RfC as Resolved.

Marking as resolved as this task is obsolete at this point IMO.

Tue, Oct 7, 9:06 AM · serviceops, MediaWiki-Engineering
Joe updated the task description for T406545: FY 25/26 WE 5.4.5: Enforce global rate-limits.
Tue, Oct 7, 6:13 AM · Traffic, Hiddenparma, SRE
Joe created T406545: FY 25/26 WE 5.4.5: Enforce global rate-limits.
Tue, Oct 7, 6:13 AM · Traffic, Hiddenparma, SRE

Sep 23 2025

Joe added a comment to T404291: Allow proxy server to accept another valid http header instead of 'HOST'.

I've already replied in T394982#11201112, but I find it improbable that SRE will implement such a behavior to accommodate the change in the node.js fetch() API. The HTTP Host header is pretty important across the infrastructure, and rewriting other HTTP headers into it might make debugging and reasoning harder than needed.

Sep 23 2025, 9:25 AM · Language and Product Localization, SRE, CXServer, envoy

Sep 22 2025

Joe added a comment to T402959: Find a solution for SPARQL federation that is blocked by stricter user agent policy enforcement.

Hi @Lydia_Pintscher, SRE can make an exception here. It seems warranted given the status quo in the broader SPARQL ecosystem.

But I want us to get the scope of the exception right:

  • Is federation traffic always against the /sparql endpoint?
  • Does federation traffic always set an Accept header like Accept: application/sparql-results[...]?

Thanks!

Sep 22 2025, 6:08 AM · wmde-wikidata-tech, Wikidata-Query-Service, Wikidata
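If both answers turn out to be yes, the exception could be scoped as a simple predicate on the path and the Accept header. The Python sketch below is hypothetical: the path and media-type prefix come from the questions above, and nothing here reflects how the CDN actually expresses such rules.

```python
from typing import Mapping

# Assumed values, taken from the two questions above.
FEDERATION_PATH = "/sparql"
SPARQL_RESULTS_PREFIX = "application/sparql-results"


def looks_like_sparql_federation(path: str, headers: Mapping[str, str]) -> bool:
    """Return True if a request falls inside the narrowly scoped exception."""
    # Ignore the query string; federated queries typically pass ?query=...
    if path.split("?", 1)[0] != FEDERATION_PATH:
        return False
    # HTTP header names are case-insensitive.
    accept = next((v for k, v in headers.items() if k.lower() == "accept"), "")
    # Accept may list several media types; any sparql-results type qualifies.
    return any(
        part.strip().lower().startswith(SPARQL_RESULTS_PREFIX)
        for part in accept.split(",")
    )
```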

Sep 18 2025

Joe added a comment to T404826: Integrate code from the private repository into the CDN.

Coming to @SLyngshede-WMF's concerns, I think some of them are valid, like having disjoint configuration living alongside the actual content of a file, including the puppet breakage.

Sep 18 2025, 10:40 AM · Hiddenparma, Traffic, SRE

Sep 17 2025

Joe created T404826: Integrate code from the private repository into the CDN.
Sep 17 2025, 9:43 AM · Hiddenparma, Traffic, SRE
Joe closed T401786: Allow inline patterns in HIDDENPARMA expressions, a subtask of T398161: FY 25/26 WE 5.4.3: CDN (text) filtering rationalization, as Resolved.
Sep 17 2025, 9:25 AM · SRE
Joe closed T401786: Allow inline patterns in HIDDENPARMA expressions as Resolved.
Sep 17 2025, 9:25 AM · Hiddenparma
Joe closed T400119: Block traffic from user-agents not honoring our policy as Resolved.

I will tentatively close this task for now.

Sep 17 2025, 6:35 AM · User-notice-archive, Patch-For-Review, Traffic, SRE
Joe closed T400119: Block traffic from user-agents not honoring our policy, a subtask of T398161: FY 25/26 WE 5.4.3: CDN (text) filtering rationalization, as Resolved.
Sep 17 2025, 6:35 AM · SRE

Sep 16 2025

Joe added a comment to T400119: Block traffic from user-agents not honoring our policy.

Yeah getting the swagger spec via curl https://api.wikimedia.org/core/v1/wikipedia/en/search/page?q=earth&limit=10 also no longer works I guess.

Sep 16 2025, 10:06 AM · User-notice-archive, Patch-For-Review, Traffic, SRE

Aug 22 2025

Joe added a comment to T402546: Unable to log into LinguaLibre due to user-agent / rate limit.

Thanks for reporting, I will try to check and fix it today.

Aug 22 2025, 10:52 AM · Lingua-Libre-Legacy
Joe created T402612: Add pageview information to turnilo's webrequest_sampled_live (is_pageview is always "-").
Aug 22 2025, 6:18 AM · Data-Engineering, SRE, Traffic

Aug 21 2025

Joe closed T396621: Requestctl should use x-provenance header, a subtask of T392217: FY 24/25 WE 4.3.12 systematically populate requestctl database, as Resolved.
Aug 21 2025, 6:04 AM · Traffic
Joe closed T396621: Requestctl should use x-provenance header as Resolved.
Aug 21 2025, 6:04 AM · Patch-For-Review, Traffic, Hiddenparma
Joe closed T399058: Ability for the code to apply different sets of rules depending on request, a subtask of T398161: FY 25/26 WE 5.4.3: CDN (text) filtering rationalization, as Resolved.
Aug 21 2025, 6:03 AM · SRE
Joe closed T399058: Ability for the code to apply different sets of rules depending on request as Resolved.
Aug 21 2025, 6:03 AM · User-Joe, Hiddenparma

Aug 19 2025

Joe added a comment to T400023: Deploy sitemaps API for Commons.

Sorry, for reasons I don't understand, my first request for that page got an x-cache-status: pass; I see now that it's cacheable at the edge.

Aug 19 2025, 3:28 PM · MW-1.45-notes (1.45.0-wmf.12; 2025-07-29), Commons, Community-Tech (Sea Lion Squad), SEO
Joe added a comment to T400023: Deploy sitemaps API for Commons.

A couple of things we might want to consider:

  • Is there any reason not to cache the sitemap API URLs at the edge for some time? Even a few hours would be beneficial, I think, if we ever publicize this more.
  • Is there a reason to include User_talk: pages and similar in the sitemap? These pages are rarely cached and in some cases relatively expensive to render; do we care about them being indexed by search engines?
Aug 19 2025, 3:22 PM · MW-1.45-notes (1.45.0-wmf.12; 2025-07-29), Commons, Community-Tech (Sea Lion Squad), SEO
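For the first point, edge caching is usually controlled through Cache-Control, where s-maxage applies only to shared caches and max-age to browsers. A minimal sketch, assuming the edge honours s-maxage as shared caches generally do under RFC 9111; the three-hour and five-minute TTLs are hypothetical defaults, not values proposed in the task.

```python
from datetime import timedelta


def sitemap_cache_control(
    edge_ttl: timedelta = timedelta(hours=3),
    client_ttl: timedelta = timedelta(minutes=5),
) -> str:
    """Build a Cache-Control value letting shared caches keep the response longer than browsers."""
    edge = int(edge_ttl.total_seconds())
    client = int(client_ttl.total_seconds())
    # s-maxage overrides max-age for shared caches such as a CDN.
    return f"public, max-age={client}, s-maxage={edge}"
```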

Aug 18 2025

Joe added a comment to T402142: Intermittent access issues to English Wikipedia on desktop/laptop.

To clarify:

  • If your error is coming from our filtering at the edge, you will get an error page containing "Error" in the <h1> tags
  • Service Unavailable only happens when the negative response is coming from the backend.
Aug 18 2025, 1:34 PM · SecTeam-Processed, Traffic, SRE
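The two rules above amount to a rough heuristic for telling where an error page came from. A minimal Python sketch, assuming the whole response body is available as a string and that a backend 503 page contains the phrase "Service Unavailable" somewhere in it (the comment does not say where); the function names are hypothetical.

```python
from html.parser import HTMLParser


class _H1Text(HTMLParser):
    """Collect the text inside every <h1> element."""

    def __init__(self) -> None:
        super().__init__()
        self._inside = 0
        self.headings = []  # one string per <h1> seen

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self._inside += 1
            self.headings.append("")

    def handle_endtag(self, tag):
        if tag == "h1" and self._inside:
            self._inside -= 1

    def handle_data(self, data):
        if self._inside:
            self.headings[-1] += data


def error_origin(body: str) -> str:
    """Guess whether an error page came from edge filtering or from the backend."""
    if "Service Unavailable" in body:
        return "backend"
    parser = _H1Text()
    parser.feed(body)
    parser.close()
    if any("Error" in heading for heading in parser.headings):
        return "edge"
    return "unknown"
```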
Joe updated subscribers of T402142: Intermittent access issues to English Wikipedia on desktop/laptop.

I should also add - if this has happened in the last week, that might be connected to T400119

@Joe This seems to be related to T400697 (private task). I don't believe the work on T400119 is connected to this issue; just an FYI.

Aug 18 2025, 1:32 PM · SecTeam-Processed, Traffic, SRE
Joe added a comment to T402142: Intermittent access issues to English Wikipedia on desktop/laptop.

I just noticed a pattern and wanted to surface it. For context: in the VRT (the volunteer helpdesk) we normally only see one such “Wikipedia inaccessible” email every couple of months, and almost always when there’s a confirmed outage. Over the past 1–2 weeks, however, we’ve received several separate reports, which is unusual enough that I thought it worth bringing up. Especially given the Reddit thread linked above.

I can’t share ISP/IP details directly here because of the VRT NDA, but I did try to collect as much information as possible from the reports (OS, browser versions, screenshots, etc.). In two of the tickets, users provided their IP, and in one of them the entire user-agent (quoted above). What I can say without breaching confidentiality is that one was from an ISP in India and one from an ISP in the US. Those two seem to be using browsers based on Chromium 126. There were no other error messages.

As for the user who solved their own issue, I can't confirm whether it was related, since it is now fixed.

Aug 18 2025, 6:57 AM · SecTeam-Processed, Traffic, SRE
Joe added a comment to T402142: Intermittent access issues to English Wikipedia on desktop/laptop.

I should also add - if this has happened in the last week, that might be connected to T400119 - if users have any browser extension that modifies their user-agent, for example removing it, that would cause failures now.

Aug 18 2025, 5:31 AM · SecTeam-Processed, Traffic, SRE
Joe added a comment to T402142: Intermittent access issues to English Wikipedia on desktop/laptop.

This task seems to coalesce different issues; having said that, we've had quite a few abusers lately that used old versions of Chrome as their user agent, so we had to block traffic at times. Most of the rules should be disabled soon, and we are trying to improve our detection of actual browsers vs forged ones.

Aug 18 2025, 5:26 AM · SecTeam-Processed, Traffic, SRE

Aug 15 2025

Joe added a comment to T400119: Block traffic from user-agents not honoring our policy.

Will GitLab CI be excluded from this policy?

I know you added the ignore edit to this, but as this thread is at least partly documentation at this point folks should be aware that the default gitlab-ci runners run in Digital Ocean's public cloud and are likely to be subject to public cloud restrictions. Adding tags: [wmcs] to your job specification will pin your jobs to a different pool of runners which are hosted in a Cloud VPS project.

Aug 15 2025, 2:20 PM · User-notice-archive, Patch-For-Review, Traffic, SRE

Aug 14 2025

Joe created bot-traffic-requests.
Aug 14 2025, 1:48 PM

Aug 13 2025

Joe triaged T401786: Allow inline patterns in HIDDENPARMA expressions as High priority.
Aug 13 2025, 8:34 AM · Hiddenparma
Joe created T401786: Allow inline patterns in HIDDENPARMA expressions.
Aug 13 2025, 8:34 AM · Hiddenparma
Joe added a comment to T400119: Block traffic from user-agents not honoring our policy.

Please be clearer about the UA policy enforced here. I always set the Api-User-Agent header in my code with my Wikipedia username inside, but my bot still stopped working. My code always sets the Api-User-Agent header rather than the User-Agent header, because the same code may run in a browser or outside a browser. The most probable reason my code stopped working is that the Api-User-Agent header is null and void just because the User-Agent header is missing, right?

Aug 13 2025, 8:17 AM · User-notice-archive, Patch-For-Review, Traffic, SRE
Joe added a comment to T400119: Block traffic from user-agents not honoring our policy.

@Midleading you are always supposed to have a user-agent. Api-User-Agent is just for situations where you are unable to MODIFY that agent to provide additional identifying information for your tool (browsers). It is not an alternative to having no user-agent, only an alternative for a non-modifiable user-agent (and is described as such in the policy, in my opinion, though suggestions for improvement are always welcome on the talk page, I guess).

Aug 13 2025, 8:14 AM · User-notice-archive, Patch-For-Review, Traffic, SRE
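For code that runs outside a browser, the fix is simply to send a descriptive User-Agent. A minimal sketch with Python's standard urllib, roughly following the client/version (contact) library/version shape the User-Agent policy suggests; the tool name, version and contact URL are placeholders, and the example only builds the request without sending it.

```python
import sys
import urllib.request

# Placeholders: replace with your tool's real name, version and contact details.
TOOL_NAME = "ExampleTool"
TOOL_VERSION = "0.1"
CONTACT = "https://example.org/contact"


def build_request(url: str) -> urllib.request.Request:
    """Build (but do not send) a request carrying a descriptive User-Agent."""
    library = "python-urllib/%d.%d" % sys.version_info[:2]
    user_agent = f"{TOOL_NAME}/{TOOL_VERSION} ({CONTACT}) {library}"
    # Set the real User-Agent header, not Api-User-Agent: outside a browser
    # nothing prevents the client from setting it directly.
    return urllib.request.Request(url, headers={"User-Agent": user_agent})
```

urllib normalizes header names with str.capitalize(), so the header is read back as "User-agent".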

Aug 6 2025

Joe updated the task description for T400119: Block traffic from user-agents not honoring our policy.
Aug 6 2025, 9:13 AM · User-notice-archive, Patch-For-Review, Traffic, SRE
Joe added a comment to T401109: Error in revscoring-editquality-damaging - itwiki-damaging-predictor-default.

We have 5M successful responses in the same period of time.

Sorry I hadn't realized the number of requests was this large, it reframes the problem a bit:

Aug 6 2025, 7:33 AM · Machine-Learning-Team
Joe triaged T396621: Requestctl should use x-provenance header as High priority.
Aug 6 2025, 6:02 AM · Patch-For-Review, Traffic, Hiddenparma
Joe reopened T396621: Requestctl should use x-provenance header, a subtask of T392217: FY 24/25 WE 4.3.12 systematically populate requestctl database, as Open.
Aug 6 2025, 5:17 AM · Traffic
Joe reopened T396621: Requestctl should use x-provenance header as "Open".

You missed changing the use of the abuse ACLs and removing the loading of netmaps for X-Public-Cloud and similar stuff as well.

Aug 6 2025, 5:17 AM · Patch-For-Review, Traffic, Hiddenparma

Aug 5 2025

Joe added a comment to T400881: Make InstantCommons and other uses of ForeignApiRepo use WMF policy-compliant user agents.

MediaWiki-Platform-Team will pick up the core part of this. Note that the soonest a change to the InstantCommons code could make a difference is after the next MediaWiki release (so in about 3 months). Many sites will only upgrade when the next LTS version is released (in about 15 months).

Maybe we can recommend some code that you can drop into your wiki configuration today to affect user agents.

Aug 5 2025, 8:54 AM · MW-1.43-notes, MW-1.44-notes, MW-1.39-notes, MW-1.45-notes (1.45.0-wmf.15; 2025-08-19), MediaWiki-Platform-Team, MediaWiki-extensions-QuickInstantCommons, MediaWiki-File-management, Traffic, SRE

Aug 4 2025

Joe reopened T399058: Ability for the code to apply different sets of rules depending on request, a subtask of T398161: FY 25/26 WE 5.4.3: CDN (text) filtering rationalization, as Open.
Aug 4 2025, 4:06 PM · SRE
Joe reopened T399058: Ability for the code to apply different sets of rules depending on request as "Open".

I actually resolved the task by mistake: there's still some work to do on the varnish/haproxy side.

Aug 4 2025, 4:06 PM · User-Joe, Hiddenparma
Joe added a comment to T401109: Error in revscoring-editquality-damaging - itwiki-damaging-predictor-default.

I fear this is a well-known issue we've already encountered. You can see here https://grafana.wikimedia.org/d/zsdYRV7Vk/istio-sidecar?orgId=1&from=now-7d&to=now&timezone=utc&var-cluster=aWotKxQMz&var-namespace=revscoring-editquality-damaging&var-backend=$__all&var-response_code=503&var-quantile=0.5&var-quantile=0.95&var-quantile=0.99&viewPanel=panel-16 that most of the 503s have failure code UC, which is envoy/istio for "upstream connection termination". In our experience, this can happen when no limit is configured for the lifetime of an upstream HTTP connection.

Aug 4 2025, 3:41 PM · Machine-Learning-Team
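To check that breakdown against raw data rather than a dashboard, one could tally Envoy response flags on 503 responses. The sketch assumes (status, flags) pairs already extracted from an access log that records %RESPONSE_FLAGS%; the record format and the function name are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple


def flags_for_503s(records: Iterable[Tuple[int, str]]) -> Counter:
    """Count Envoy response flags seen on 503 responses.

    Each record is a hypothetical (status_code, response_flags) pair;
    "-" is kept as its own bucket, meaning no flag was set.
    """
    counts = Counter()
    for status, flags in records:
        if status != 503:
            continue
        # Several flags may be reported together, comma-separated.
        for flag in flags.split(","):
            counts[flag.strip()] += 1
    return counts
```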
Joe added a comment to T401109: Error in revscoring-editquality-damaging - itwiki-damaging-predictor-default.

I've taken a look from the side of the api.log mediawiki generates, and I was a bit surprised to find 16 identical calls over the span of an hour for the same revid that I found failing in logstash.

Aug 4 2025, 3:23 PM · Machine-Learning-Team
Joe added a comment to T399057: Introduce allowlists into the CDN (text) filtering.

I thought about this a bit, and I think we need to distinguish between the grading system (which should express trust levels) and request features (like bearing a valid session cookie, being identified as a bot...).

Aug 4 2025, 7:57 AM · Traffic, Hiddenparma
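A minimal Python sketch of that separation, with hypothetical names throughout: features are plain observed facts about a request, and a separate policy function maps them onto a trust level. The specific features and the mapping are illustrative only and do not describe the actual CDN filtering rules.

```python
from dataclasses import dataclass
from enum import IntEnum


class TrustLevel(IntEnum):
    """Hypothetical grading scale: higher means more trusted."""
    UNTRUSTED = 0
    DEFAULT = 1
    TRUSTED = 2


@dataclass(frozen=True)
class RequestFeatures:
    """Observable facts about a request, kept separate from any grade."""
    has_valid_session: bool = False
    is_known_bot: bool = False
    from_public_cloud: bool = False


def grade(features: RequestFeatures) -> TrustLevel:
    """Hypothetical policy mapping request features onto a trust level."""
    if features.has_valid_session:
        return TrustLevel.TRUSTED
    if features.from_public_cloud and not features.is_known_bot:
        return TrustLevel.UNTRUSTED
    return TrustLevel.DEFAULT
```

Keeping the two apart means the policy can change without touching feature detection, and vice versa.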

Aug 1 2025

Joe updated the task description for T400881: Make InstantCommons and other uses of ForeignApiRepo use WMF policy-compliant user agents.
Aug 1 2025, 4:12 PM · MW-1.43-notes, MW-1.44-notes, MW-1.39-notes, MW-1.45-notes (1.45.0-wmf.15; 2025-08-19), MediaWiki-Platform-Team, MediaWiki-extensions-QuickInstantCommons, MediaWiki-File-management, Traffic, SRE
Joe added a comment to T400881: Make InstantCommons and other uses of ForeignApiRepo use WMF policy-compliant user agents.

Are you suggesting including the wiki user who caused the request to happen (as opposed to the server admin)? That feels like a privacy violation, and I fail to see how it would be helpful in abuse fighting. Keep in mind the triggering event could just be a page view (whenever the page falls out of the appropriate cache) and may be an anonymous user.

I was looking at this from the perspective of wiki operators. If we have the username in the UA, that allows us to rate-limit requests based on the UA string and only block abusive behaviour of single users, instead of needing to punish an entire wiki. I'm not sure how that's a privacy violation, unless you assume that the username used on wiki X is somehow private identifying information, which I'm not convinced of.

I think the personal information might be the combined information that user X was viewing article Y/Commons images Y, rather than necessarily a username on its own (although, to be fair, possibly also just a username on its own). IANAL, but I believe that either of these may be covered by the GDPR's broad definition of "personal data", and would potentially cause data-protection headaches for wiki operators were we to implement a system that sent any sort of user data back to Wikimedia servers. On the face of the issue, I think I agree with @Bawolff here.

Aug 1 2025, 4:11 PM · MW-1.43-notes, MW-1.44-notes, MW-1.39-notes, MW-1.45-notes (1.45.0-wmf.15; 2025-08-19), MediaWiki-Platform-Team, MediaWiki-extensions-QuickInstantCommons, MediaWiki-File-management, Traffic, SRE
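Rate-limiting per UA string, as proposed above, could look like a token bucket keyed by the full User-Agent, so that one abusive user's string exhausts only its own bucket. This is a generic technique swapped in for illustration, not the CDN's actual limiter; the rate, burst and UA strings are hypothetical, and the clock is injectable for testing.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class _Bucket:
    tokens: float
    updated: float


class PerUARateLimiter:
    """Token-bucket limiter keyed by the full User-Agent string.

    rate: tokens refilled per second; burst: bucket capacity.
    """

    def __init__(self, rate: float, burst: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.burst = burst
        self.clock = clock
        self._buckets: Dict[str, _Bucket] = {}

    def allow(self, user_agent: str) -> bool:
        now = self.clock()
        bucket = self._buckets.get(user_agent)
        if bucket is None:
            # New UA strings start with a full bucket.
            bucket = self._buckets[user_agent] = _Bucket(self.burst, now)
        # Refill in proportion to elapsed time, capped at the burst size.
        bucket.tokens = min(self.burst, bucket.tokens + (now - bucket.updated) * self.rate)
        bucket.updated = now
        if bucket.tokens >= 1:
            bucket.tokens -= 1
            return True
        return False
```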
Joe added a comment to T400119: Block traffic from user-agents not honoring our policy.

To give a bit of context, over the last day we saw:

  • 62 million valid requests with no user-agent
  • 24.5 million valid requests with user agent okhttp/*
  • 17 million valid requests with user-agent python-{requests,urllib}*
Aug 1 2025, 9:15 AM · User-notice-archive, Patch-For-Review, Traffic, SRE
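Counts like the ones above can be produced by bucketing User-Agent strings with glob patterns. A minimal Python sketch; the bucket names and patterns mirror the three categories listed, and the input is assumed to be an iterable of raw User-Agent values, with None or an empty string meaning no header was sent.

```python
from collections import Counter
from fnmatch import fnmatchcase
from typing import Iterable, Optional

# Glob patterns mirroring the categories listed above (matched case-insensitively).
BUCKETS = [
    ("okhttp", ["okhttp/*"]),
    ("python-requests/urllib", ["python-requests*", "python-urllib*"]),
]


def bucket_for(user_agent: Optional[str]) -> str:
    """Assign a single User-Agent value to one of the buckets."""
    if not user_agent or not user_agent.strip():
        return "no user-agent"
    ua = user_agent.lower()
    for name, patterns in BUCKETS:
        if any(fnmatchcase(ua, pattern) for pattern in patterns):
            return name
    return "other"


def tally(user_agents: Iterable[Optional[str]]) -> Counter:
    """Count how many requests fall into each bucket."""
    return Counter(bucket_for(ua) for ua in user_agents)
```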
Joe added a comment to T400119: Block traffic from user-agents not honoring our policy.

Case in point, I can't find any request with that UA in the logs for the past few days. Indeed it's not in the list of "medium to large offenders" we're interested in blocking. Even if you are right and it can happen, it's not remotely on our radar as something to block.

It would be passed via Api-User-Agent: User-Agent cannot be overridden by JavaScript in the browser, but Api-User-Agent is a policy-compliant fallback header to be set for that. Not sure if those are collated in the data you are looking at.

https://github.com/wikimedia/mediawiki/blob/master/resources/src/mediawiki.api/index.js (lines 76 and 301)

Aug 1 2025, 9:12 AM · User-notice-archive, Patch-For-Review, Traffic, SRE
Joe added a comment to T400540: traffic from Discord and Slack unfurler service is blocked by phabricator.wikimedia.org.

Please note, this solution is temporary: bots working from clouds will break repeatedly if they're not properly identified with us.

Aug 1 2025, 6:00 AM · collaboration-services, Traffic, Phabricator, SRE
Joe added a comment to T400881: Make InstantCommons and other uses of ForeignApiRepo use WMF policy-compliant user agents.

[Anyway, I adjusted the QuickInstantCommons part. That's the only part of this bug I plan to work on, so it should stay open for the stuff in MW core.]

Aug 1 2025, 4:12 AM · MW-1.43-notes, MW-1.44-notes, MW-1.39-notes, MW-1.45-notes (1.45.0-wmf.15; 2025-08-19), MediaWiki-Platform-Team, MediaWiki-extensions-QuickInstantCommons, MediaWiki-File-management, Traffic, SRE
Joe added a comment to T400881: Make InstantCommons and other uses of ForeignApiRepo use WMF policy-compliant user agents.

Are you suggesting including the wiki user who caused the request to happen (as opposed to the server admin)? That feels like a privacy violation, and I fail to see how it would be helpful in abuse fighting. Keep in mind the triggering event could just be a page view (whenever the page falls out of the appropriate cache) and may be an anonymous user.

Aug 1 2025, 4:10 AM · MW-1.43-notes, MW-1.44-notes, MW-1.39-notes, MW-1.45-notes (1.45.0-wmf.15; 2025-08-19), MediaWiki-Platform-Team, MediaWiki-extensions-QuickInstantCommons, MediaWiki-File-management, Traffic, SRE
Joe added a comment to T400119: Block traffic from user-agents not honoring our policy.

Where do UAs like MediaWiki-JS/1.45.0-wmf.12, the default used by a plain new mw.Api() in an on-wiki script, stand with this?

(If they're going to be blocked, there are (from quick insource searches) at least a few thousand scripts that need fixing.)

Aug 1 2025, 4:06 AM · User-notice-archive, Patch-For-Review, Traffic, SRE

Jul 31 2025

Joe placed T400881: Make InstantCommons and other uses of ForeignApiRepo use WMF policy-compliant user agents up for grabs.
Jul 31 2025, 11:26 AM · MW-1.43-notes, MW-1.44-notes, MW-1.39-notes, MW-1.45-notes (1.45.0-wmf.15; 2025-08-19), MediaWiki-Platform-Team, MediaWiki-extensions-QuickInstantCommons, MediaWiki-File-management, Traffic, SRE
Joe created T400881: Make InstantCommons and other uses of ForeignApiRepo use WMF policy-compliant user agents.
Jul 31 2025, 11:25 AM · MW-1.43-notes, MW-1.44-notes, MW-1.39-notes, MW-1.45-notes (1.45.0-wmf.15; 2025-08-19), MediaWiki-Platform-Team, MediaWiki-extensions-QuickInstantCommons, MediaWiki-File-management, Traffic, SRE
Joe updated the task description for T400119: Block traffic from user-agents not honoring our policy.
Jul 31 2025, 11:21 AM · User-notice-archive, Patch-For-Review, Traffic, SRE
Joe added a comment to T400119: Block traffic from user-agents not honoring our policy.

@Joe I wasn't addressing AWB used as a bot, but as an interactive Windows app. Still, the rest of your comment seems applicable. The contact information would be the user's Wikipedia name (not the AWB authors'). We could prepend :p:ll:User: (p = project, ll = language code) if that would be better. I may need to take advice on bots' ids because I've never used that feature.

Jul 31 2025, 5:28 AM · User-notice-archive, Patch-For-Review, Traffic, SRE

Jul 30 2025

Joe added a comment to T400119: Block traffic from user-agents not honoring our policy.

AutoWikiBrowser uses the MediaWiki API and User-Agent is WikiFunctions/n.n.n.n (Microsoft Windows NT n.n.n.n; .NET CLR 4.0.n.n). I don't know if that is distinctive enough. I guess a quick fix could be to add the logged-in user, although I'm not completely sure if any API request happens with nobody logged in yet.

Is AWB OK for now, or do we need that fix by Sep 1?

Jul 30 2025, 4:44 PM · User-notice-archive, Patch-For-Review, Traffic, SRE
Joe added a comment to T400119: Block traffic from user-agents not honoring our policy.

external mw-related: requests with user-agent strings set by MediaWiki (like ForeignApiRepo) or by other mw-related software like WDQS Updater

Does this include things like ForeignFileRepos/InstantCommons?

Jul 30 2025, 4:37 PM · User-notice-archive, Patch-For-Review, Traffic, SRE
Joe added a comment to T400119: Block traffic from user-agents not honoring our policy.

Would it be possible to include a link to this phab ticket and/or the policy page in the HTTP error response?

Jul 30 2025, 4:30 PM · User-notice-archive, Patch-For-Review, Traffic, SRE
Joe removed a project from T400119: Block traffic from user-agents not honoring our policy: Hiddenparma.
Jul 30 2025, 1:31 PM · User-notice-archive, Patch-For-Review, Traffic, SRE
Joe updated the task description for T400119: Block traffic from user-agents not honoring our policy.
Jul 30 2025, 1:31 PM · User-notice-archive, Patch-For-Review, Traffic, SRE
Joe updated the task description for T400119: Block traffic from user-agents not honoring our policy.
Jul 30 2025, 1:29 PM · User-notice-archive, Patch-For-Review, Traffic, SRE
Joe updated the task description for T400119: Block traffic from user-agents not honoring our policy.
Jul 30 2025, 1:05 PM · User-notice-archive, Patch-For-Review, Traffic, SRE