Block external traffic to RESTBase /page/data-parsoid endpoint and investigate internal usage
Closed, Resolved, Public

Description

Background Information

The /page/data-parsoid endpoint is now ready for deprecation. After some investigation we discovered that this endpoint hasn't been used since VisualEditor migrated to calling Parsoid directly in MediaWiki (T320529).

What

As we did for MCS in T328036: MCS decommission (2023), we should block the requests and then remove the code.

Open questions

Can we successfully turn this endpoint off? Are there any internal requests to this endpoint?

Why it is safe to block requests

It's safe to block the requests for the following reasons:

  • Looking at the request data, no relevant clients consume the endpoint, and the majority of the requests are attempts to exploit it.

Date to block the requests

The endpoint is being marked as deprecated, and an announcement has been sent stating that requests will be blocked after June 7th 2025. It's up to the Traffic team to decide what would be best after that date.

Acceptance Criteria

  • Block requests to the endpoint
  • Remove code from restbase

Event Timeline

@akosiaris this is ready for traffic blocking. Please let us know if that makes sense or if you need any other information.

Mentioned in SAL (#wikimedia-operations) [2025-06-11T13:03:27Z] <akosiaris> T393557 block requests to /api/rest_v1/page/data-parsoid

{{Done}}. The produced VCL is:

// FILTER T393557-page-data-parsoid
// API endpoint is removed
// This filter is generated from data in etcd. To disable it, run the following command:
// sudo requestctl disable 'cache-text/T393557-page-data-parsoid'
if ((req.method == "GET" && req.url ~ "^/api/rest_v1/page/data-parsoid")) {
    set req.http.X-Requestctl = req.http.X-Requestctl + ",T393557-page-data-parsoid";
    return (synth(403, "API endpoint is removed. See https://phabricator.wikimedia.org/T393557"));
}

This just returns 403 for any GET request to a URL starting with /api/rest_v1/page/data-parsoid. I've just tested it using

curl -X 'GET' \
  'https://en.wikipedia.org/api/rest_v1/page/data-parsoid/foo/123/1234' \
  -H 'accept: application/json; charset=utf-8; profile="https://www.mediawiki.org/wiki/Specs/data-parsoid/2.1.0"'

And got back the error message depicted above.
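For anyone who wants to reason about which requests the filter condition catches, here is a minimal Python sketch that mirrors only the `if` condition of the VCL above. It is not how Varnish or requestctl evaluates anything; the names `is_blocked` and `respond` are made up for illustration, and it assumes nothing earlier in the VCL rewrites the method or URL before this filter runs.

```python
import re

# Same pattern as in the VCL condition: anchored at the start of the URL only,
# so it is a prefix match rather than a full-path match.
BLOCKED_PREFIX = re.compile(r"^/api/rest_v1/page/data-parsoid")

# Message copied from the synth() call in the VCL above.
BLOCK_MESSAGE = "API endpoint is removed. See https://phabricator.wikimedia.org/T393557"


def is_blocked(method: str, url: str) -> bool:
    """Hypothetical helper: True if the filter condition would match."""
    return method == "GET" and BLOCKED_PREFIX.search(url) is not None


def respond(method: str, url: str):
    """Hypothetical helper: (status, message) if blocked, else None."""
    if is_blocked(method, url):
        return 403, BLOCK_MESSAGE
    return None  # request carries on to normal processing


if __name__ == "__main__":
    for method, url in [
        ("GET", "/api/rest_v1/page/data-parsoid/foo/123/1234"),
        ("GET", "/api/rest_v1/page/html/foo"),
        ("POST", "/api/rest_v1/page/data-parsoid/foo/123/1234"),
    ]:
        print(method, url, "->", respond(method, url))
```

Two things this makes visible: the condition as written only checks GET, and because the pattern is only anchored at the start, any path that merely begins with that prefix matches as well. How other methods are handled elsewhere is outside the scope of this sketch.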

The one caveat is that what applied in T328036#10722151 applies here as well: this method does not take Toolforge and Cloud VPS into account. An announcement to them would probably be useful to avoid similar issues.

Note: I actually do not know how these are generated, so it's plausible that it's expected as long as the endpoint still exists even if it's returning 403 - but https://en.wikipedia.org/api/rest_v1/#/Page%20content is still documenting the existence of the endpoint.

They require a deploy of the RESTBase software (https://gerrit.wikimedia.org/g/mediawiki/services/restbase) with the relevant stanza removed, e.g. https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/services/restbase/+/refs/heads/master/v1/content.yaml#424 in this case. I don't think we've coordinated the two legs of the migration up to now, that is, blocking the API endpoint and ceasing to advertise it. Given RESTBase being what it is, and the resources available to work on it, that's not surprising, to me at least. That being said, yes, you are right, ideally we should have.

There is a larger chunk of work involved in deprecating and replacing API specs as a whole in T396804, but in the short term we can probably remove the retired specs on restbase relatively easily.

This works for us. I'm being bold and closing this ticket as completed and we can track the OpenAPI spec discussions in the mentioned task.

MSantos claimed this task.