Send Api-User-Agent header from MediaWiki client-side code
Open, Low, Public

Description

In the User-Agent Policy, we encourage clients to set the Api-User-Agent header when making requests from a browser, where the User-Agent header cannot be set.

At the moment, we are not using this information, we don't even know if people send it, and our own client code doesn't send it.

Sending this header ourselves will allow us to distinguish requests from our own code from requests coming from third parties and user scripts.

Ideally, we'd be able to set this header to a value that doesn't only include the MediaWiki version, but also the component/extension and ideally even the gadget or script that is making the call.
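
To make the idea concrete, here is a minimal sketch of how such a value could be assembled. The function name buildApiUserAgent, its field names, the example values, and the "most specific token first" layout are all assumptions for illustration; nothing in MediaWiki defines this format.

```javascript
// Hypothetical helper: compose an Api-User-Agent value from the pieces named
// above (script or gadget, component or extension, MediaWiki version).
// The layout loosely follows the usual "product/version" token style.
function buildApiUserAgent({ mediawikiVersion, component, script }) {
  const parts = [];
  if (script) {
    parts.push(script);
  }
  if (component) {
    parts.push(component);
  }
  parts.push(`MediaWiki/${mediawikiVersion}`);
  return parts.join(' ');
}

console.log(buildApiUserAgent({
  mediawikiVersion: '1.44.0', // example value only
  component: 'ext.example', // made-up component name
  script: 'MyGadget/1.0.1'
}));
// prints "MyGadget/1.0.1 ext.example MediaWiki/1.44.0"
```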

Event Timeline

@daniel is this meant to be a discussion prompt or is there a team that should actually own moving this forward? Asking mostly because I don't think this should be a DST/Codex responsibility.

I tagged DST/Codex for awareness. It would have to be done by whoever owns the API client code; I suppose that would be the Web team.

As to urgency: this would be useful for the Interfaces team, because we'd get more meaningful signals. It's not super urgent, especially since we need T373871: Log Api-User-Agent header in Turnilo first.

Hi @daniel, the API client code lacks an owner right now ( https://www.mediawiki.org/wiki/Developers/Maintainers#MediaWiki_core ).
The Web team doesn't currently own API client code, only skin code in MediaWiki core, so we'd need to work this out.

@daniel - I assume you are referring to mw.Api client code, but there are other libraries that hit the API too (for example Vector's search uses the native fetch function - https://gerrit.wikimedia.org/g/mediawiki/skins/Vector/+/5f944947e470629172ac17149d456f607c3b87b8/resources/skins.vector.search/fetch.js#32) so applying an API user agent header across the entire MediaWiki product will likely require multiple patches in multiple places.
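
For callers that use fetch directly, one possible shape for such a patch is a thin wrapper that adds the header. This is only a sketch: withApiUserAgent is a hypothetical name, the user agent string is an example, and the wrapper assumes that init.headers, when present, is a plain object rather than a Headers instance.

```javascript
// Hypothetical wrapper: returns a fetch-like function that adds the
// Api-User-Agent header to every request. Headers passed by the caller are
// spread last, so a caller-supplied value with the same key takes precedence.
function withApiUserAgent(fetchImpl, userAgent) {
  return function (url, init = {}) {
    const headers = { 'Api-User-Agent': userAgent, ...(init.headers || {}) };
    return fetchImpl(url, { ...init, headers });
  };
}

// Usage in a browser (the user agent string is only an example):
// const apiFetch = withApiUserAgent(fetch, 'skins.vector.search');
// apiFetch('/w/api.php?action=query&meta=siteinfo&format=json');
```

Each non-mw.Api call site would still need to be switched to the wrapper, so this does not remove the need for multiple patches.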

Yea, I was afraid you'd say that :)

Do you have an idea how to find the relevant callers? And why they are not going through mw.Api?

Do you have an idea how to find the relevant callers?

You'd want to do an audit of the different APIs. I assume some clients might be using $.ajax or even XMLHttpRequest directly, for example. Are gadgets in scope for this header? If so, you'd need to consider other APIs.
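
A first pass at such an audit could be a plain text search over the source. The sketch below is an assumption about how one might start, not an existing tool: the pattern list and the name countRequestCallers are made up here, and a regex search both over-reports (comments, strings) and under-reports (aliased calls).

```javascript
// Rough patterns for the request mechanisms mentioned above. A text search
// like this only narrows down where to look; it is not a precise analysis.
const REQUEST_PATTERNS = {
  'mw.Api': /\bnew\s+mw\.(?:Foreign)?Api\b/g,
  '$.ajax': /(?:\$|jQuery)\.(?:ajax|get|post|getJSON)\s*\(/g,
  XMLHttpRequest: /\bnew\s+XMLHttpRequest\b/g,
  fetch: /\bfetch\s*\(/g
};

// Count how often each mechanism appears in one source file's text.
function countRequestCallers(source) {
  const counts = {};
  for (const [name, pattern] of Object.entries(REQUEST_PATTERNS)) {
    counts[name] = (source.match(pattern) || []).length;
  }
  return counts;
}
```

Running this over each file of a repository and summing the counts would give a rough map of which call sites bypass mw.Api.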

Sending this header ourselves will allow us to distinguish requests from our own code from requests coming from third parties and user scripts.

And why they are not going through mw.Api?

It's kinda like asking "why are you not using X library to make API requests?". There's no requirement to use mw.Api and usually where it is used, it is because the plus sides outweigh the downsides.

As someone who uses fetch, one reason I sometimes don't use mw.Api is that there is no npm library and often I'm writing code that I want to run inside and outside MediaWiki.

In this case it might be to make the request abortable. I'm not sure off the top of my head whether mw.Api supports that, and the fact that I don't know without consulting the documentation should say something :-) There's nothing wrong with using fetch IMO, and it sidesteps having to learn the bespoke mw.Api if you don't use it regularly (I often use fetch over mw.Api these days for most of my use cases; note that mw.Api is not available on npm, for example).

I think we'll be seeing fetch more over time - it's an API which is universally understood by new JavaScript developers. Generally newer developers haven't needed to support older browsers and typically gravitate towards newer tech. There are developers now who have never used jQuery for example!

Sending this header ourselves will allow us to distinguish requests from our own code from requests coming from third parties and user scripts.

What is the intention here? Could you provide a little background on the problem you are trying to solve, rather than what you are trying to achieve? I suspect there may be other solutions or we might be able to reframe the problem in a way which doesn't require finding every API request and updating it to send the header (or at least limits the scope!).

What's the use case? Where do we want to see it used, and why? I suggest re-titling this task to describe a specific problem. Right now it's not a proposed solution since there's no area of use specified. What would qualify as "use"?

Note that @aaron is using this in the ApiFeatureUsage extension in patch https://gerrit.wikimedia.org/r/c/mediawiki/extensions/ApiFeatureUsage/+/1058726.

I imagine that SRE also sometimes use it already, when throttling external traffic in relation to https://foundation.wikimedia.org/wiki/Policy:User-Agent_policy. As with regular UA strings, it isn't used much by default. But when we reach for it, it is available in raw requests and used in the same circumstances. E.g. at the traffic level in HAProxy and VCL, both headers are available to requestctl filters, and in ad hoc varnishlog queries.

I have used it in the past when analysing traffic in varnishlog. Codesearch shows several first-party clients also set it.

It's not currently copied from Api-User-Agent to User-Agent, and not stored in Hadoop, Logstash, and indeed numerous other places.

Change #1112339 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):

[mediawiki/core@master] Allow setting Api-User-Agent in mw.Api

https://gerrit.wikimedia.org/r/1112339

Change #225224 abandoned by Krinkle:

[mediawiki/core@master] mediawiki.api: Make it easy to set the Api-user-agent header

Reason:

Superseded by https://gerrit.wikimedia.org/r/c/mediawiki/core/+/1112339

https://gerrit.wikimedia.org/r/225224

@Diskdance points out on wikitech-l that performance-conscious gadget authors might hesitate to use the Api-User-Agent header as it adds an extra preflight request. I wonder if there is any reason against just allowing to set the user agent via a query parameter? It would make caching impossible, but then the action API is already mostly uncacheable (see T97096, T155314, T122867 for example).

In any case, I think that's one more reason to provide an abstraction for setting the user agent when using mw.Api, so we can change the HTTP level mechanism used for it if necessary.

@Diskdance points out on wikitech-l that performance-conscious gadget authors might hesitate to use the Api-User-Agent header as it adds an extra preflight request. I wonder if there is any reason against just allowing to set the user agent via a query parameter? It would make caching impossible, but then the action API is already mostly uncacheable (see T97096, T155314, T122867 for example).

The caching aspect would be quite bad once we start using the REST API for fetching page HTML at scale...

Also, I suspect it would be more fiddly to expose the API user agent in Turnilo and other tooling. We'd probably want to normalize at the edge: convert the query param to a header before forwarding to the next layer.
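
As a rough illustration of that normalization step, here is a sketch written in JavaScript for readability (the real edge layer would be HAProxy or VCL). The parameter name "apiuseragent" and the function name are invented for this example, and header names are assumed to be lower-cased already.

```javascript
// Hypothetical query parameter name; no such parameter exists today.
const UA_PARAM = 'apiuseragent';

// Move the user agent from the query string into a header and strip the
// parameter, so that downstream layers and caches see a uniform request.
function normalizeUserAgentParam(requestUrl, headers) {
  const url = new URL(requestUrl);
  const fromQuery = url.searchParams.get(UA_PARAM);
  const normalizedHeaders = { ...headers };
  if (fromQuery !== null) {
    // Note: editing searchParams re-serializes the whole query string.
    url.searchParams.delete(UA_PARAM);
    // An explicit header wins over the query parameter.
    if (!('api-user-agent' in normalizedHeaders)) {
      normalizedHeaders['api-user-agent'] = fromQuery;
    }
  }
  return { url: url.toString(), headers: normalizedHeaders };
}
```

Stripping the parameter before the cache lookup would also mean the user agent no longer fragments the cache key.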

In any case, I think that's one more reason to provide an abstraction for setting the user agent when using mw.Api, so we can change the HTTP level mechanism used for it if necessary.

Yes, true. Though it might be nice to give people a way to pick query param vs header, if we allow both. That way, they could optimize depending on their use case / access pattern.
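
A sketch of what such an abstraction could look like, assuming both mechanisms were allowed. The function name buildApiRequest, the transport option, and the query parameter name "apiuseragent" are all hypothetical.

```javascript
// Hypothetical abstraction: callers state a user agent once, and the
// transport ("header" or "query") decides how it travels over HTTP.
function buildApiRequest(endpoint, params, { userAgent, transport = 'header' }) {
  const url = new URL(endpoint);
  for (const [key, value] of Object.entries(params)) {
    url.searchParams.set(key, value);
  }
  const headers = {};
  if (transport === 'query') {
    // Avoids a custom header (and thus a preflight) at the cost of caching.
    url.searchParams.set('apiuseragent', userAgent);
  } else if (transport === 'header') {
    headers['Api-User-Agent'] = userAgent;
  } else {
    throw new Error(`Unknown transport: ${transport}`);
  }
  return { url: url.toString(), headers };
}
```

Because callers only pass userAgent and transport, the HTTP-level mechanism could later change without touching gadget code.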

... we could encode the user agent info in an Accept header, e.g. Accept: application/json;user-agent=my-gadget. That's a terrible hack, and would be bad for Turnilo as well, but it would work around the CORS issue.
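
For illustration only, this is roughly what the receiving side of that hack would have to do. The function name is made up, and nothing like this exists in MediaWiki.

```javascript
// Sketch: pull a "user-agent" parameter out of an Accept header value such as
// "application/json;user-agent=my-gadget". Quoted parameter values are not
// handled, so a user agent containing "," or ";" would break this parser.
function userAgentFromAccept(acceptValue) {
  for (const mediaRange of acceptValue.split(',')) {
    const [, ...params] = mediaRange.split(';');
    for (const param of params) {
      const [name, ...rest] = param.split('=');
      if (name.trim().toLowerCase() === 'user-agent') {
        return rest.join('=').trim();
      }
    }
  }
  return null;
}

console.log(userAgentFromAccept('application/json;user-agent=my-gadget'));
// prints "my-gadget"
```

The unhandled quoting cases hint at why this would be awkward to surface in tooling.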

Could we automatically set the gadget name in mw.Api by giving each gadget its own copy of mw? I'm not very familiar with the details of JS execution - could we just override getScript in GadgetResourceLoaderModule and wrap the entire thing in a scope that sets a local value for mw?

nshahquinn-wmf subscribed.

(This is a request to collect new data rather than a report of a data bug.)

Change #1112339 merged by jenkins-bot:

[mediawiki/core@master] mediawiki.api: Allow setting Api-User-Agent via mw.Api constructor

https://gerrit.wikimedia.org/r/1112339

Suggested Tech News text:

When using the mw.Api JavaScript library, it is now possible to identify the tool using it with the userAgent parameter: var api = new mw.Api( { userAgent: 'MyGadget/1.0.1' } ); If you maintain a gadget or user script, please set a user agent; it helps with library and server maintenance and with differentiating between legitimate and illegitimate traffic.

... we could encode the user agent info in an Accept header, e.g. Accept: application/json;user-agent=my-gadget. That's a terrible hack, and would be bad for Turnilo as well, but it would work around the CORS issue.

Yeah that sounds terrible. There are five headers that don't trigger preflight: Accept, Accept-Language, Content-Language, Content-Type, Range. None of them can support a user agent in a non-horrible way.
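
A small sketch of the rule being discussed: given a set of request headers, list the ones that would force a preflight on a cross-origin request. It checks header names only, using the five names listed above; the Fetch specification also restricts the values of those headers, which this sketch ignores. The function name is made up.

```javascript
// The five CORS-safelisted request header names listed above.
const SAFELISTED_HEADERS = new Set([
  'accept',
  'accept-language',
  'content-language',
  'content-type',
  'range'
]);

// Return the header names that are not safelisted (name check only).
function headersRequiringPreflight(headers) {
  return Object.keys(headers).filter(
    (name) => !SAFELISTED_HEADERS.has(name.toLowerCase())
  );
}

console.log(headersRequiringPreflight({
  Accept: 'application/json',
  'Api-User-Agent': 'MyGadget/1.0.1'
}));
// prints [ 'Api-User-Agent' ]
```

Preflights only apply to cross-origin requests, so the extra round trip matters for calls to another wiki rather than for same-origin calls.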

Could we automatically set the gadget name in mw.Api by giving each gadget its own copy of mw? I'm not very familiar with the details of JS execution - could we just override getScript in GadgetResourceLoaderModule and wrap the entire thing in a scope that sets a local value for mw?

It's theoretically possible, we do the same for require(). We could override mw in the local scope with a Proxy object that replaces the mw.Api constructor. Seems like a level of complexity that's way disproportionate with the benefits. (For one thing, seems hard to expose to extensions - I think we would have to add some hook mechanism to ResourceLoader::addImplementScript() and to mw.loader.impl().)
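
To show what the Proxy approach might look like in isolation, here is a sketch that runs outside MediaWiki. mwStub stands in for the real mw object, createScopedMw is a hypothetical name, and the ResourceLoader wiring that would hand the scoped object to each gadget is not shown.

```javascript
// Stand-in for the real mw object so the sketch runs outside MediaWiki.
const mwStub = {
  Api: class Api {
    constructor(options = {}) {
      this.options = options;
    }
  },
  config: { wgSiteName: 'Example' }
};

// Hypothetical: build a per-gadget view of mw whose Api constructor fills in
// a default userAgent. Every other property is passed through unchanged.
function createScopedMw(mw, gadgetName) {
  const ScopedApi = new Proxy(mw.Api, {
    construct(target, args, newTarget) {
      const [options = {}, ...rest] = args;
      // Options given by the gadget itself override the default.
      const merged = { userAgent: gadgetName, ...options };
      return Reflect.construct(target, [merged, ...rest], newTarget);
    }
  });
  return new Proxy(mw, {
    get(target, prop, receiver) {
      return prop === 'Api' ? ScopedApi : Reflect.get(target, prop, receiver);
    }
  });
}

const scoped = createScopedMw(mwStub, 'MyGadget');
console.log(new scoped.Api().options.userAgent);
// prints "MyGadget"
```

Code that reaches the global mw by another route (for example window.mw) would bypass the scoped copy, which is part of why the complexity may not pay off.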

In the User-Agent Policy, we encourage clients to set the Api-User-Agent header when making requests from a browser, where the User-Agent header cannot be set.

When does this happen? I thought every time a browser accesses something, it sends a user agent.

The browser sends its own user-agent string, but we are interested in what Gadget makes the call. So an additional header (Api-User-Agent) needs to be set, since the browser's user agent header cannot be overwritten.

Hi!

Does this work with mw.ForeignApi? In the only signature that I'm aware of, mw.ForeignApi accepts a string which is the foreign domain. I can't find any mention of userAgent in the source code for mw.ForeignApi.