Skip to content

Conversation

@d00rman
Copy link
Contributor

@d00rman d00rman commented Oct 13, 2019

This PR enables RESTBase to communicate with both Parsoid/JS and Parsoid/PHP and use separate storage for each of them. It introduces three sys modules: parsoid-js, parsoid-php and parsoid, which handle the JS variant, PHP variant and the main Parsoid module acting as the proxy, respectively.

The common Parsoid business logic has been extracted into lib/parsoid.js, while sys/parsoid-js and sys/parsoid-php only deal with variant-specific details: issuing requests to Parsoid and determining the buckets to use when storing/retrieving content. This code extraction and reshuffling was done for the following reasons:

  • to have a clean separation between Parsoid/JS and Parsoid/PHP;
  • to allow for their interchangeability, so that in case of problems it is easy to revert to having only one of them active;
  • to have a clean code situation once the transition has been completed; and
  • to avoid introducing any new permanent config parameters that are not used during the transition (i.e. users can choose to use one or the other without needing to adjust the config).

The proxy allows directing requests to either variant. It loads both variants' modules internally and uses their operations to complete requests. It is designed in such a way so as to allow an easy transition between fully using JS to fully using PHP with no config changes. When first introduced, its defaults emulate the JS-only scenario. Once the switch is fully achieved, then simply changing sys/parsoid.js for sys/parsoid-php.js in projects/sys/default.wmf.yaml with no config change results in having a fully-functional Parsoid/PHP module. The proxy can. thus, function properly with only one of variant modules loaded and configured.

In order to support the transition period, the proxy has three modes of operation: single, mirror and split. In single mode, only one variant is used, defined by the default_variant configuration value, defaulting to js. This allows us to start using the proxy with no config changes. In the final stages of the transition (before we remove the proxy), it can be changed to php to only use the PHP variant. The mirror mode is used to asynchronously mirror traffic to the PHP variant. Requests are issued to both variants, but only the JS one is returned. The amount of traffic to be mirrored can be tuned with the percentage configuration parameter. The important caveat here is that only requests for /page/{format} end points are mirrored - we cannot do so reliably for transforms since they rely on stashed content, which is likely not to be available for the PHP variant. Furthermore, when the proxy is configured in mirror mode, dependency update events are emitted only for the JS variant, so as to avoid duplicates. Finally, the split mode is used to split the traffic between the two variants based on the request domain. If one of the patterns given in the pattern configuration parameter is matched, then the variant not defined in default_variant is used, otherwise the default one is used. This mode supports the second stage of the transition, where JS will be authoritative for the majority of domains, while we will be slowly moving projects one by one (or group by group) over to using Parsoid/PHP. Because of this gradual switch, there is also a built-in fall-back for transform end points in this mode: if a request for the non-default variant 404s, then we retry the request using the default variant in order to cover the clients that are in the middle of an editing session during the switch.

Apart from these modes, the proxy also supports clients directly telling it which variant to use. If the incoming request has the PARSOID_VARIANT cookie or the X-Parsoid-Variant header set, then the request is sent directly to that variant regardless of the proxy's mode. When deciding where to send the request, the proxy gives precedence to the header in case both are set.

Bug: T230791

@d00rman
Copy link
Contributor Author

d00rman commented Oct 13, 2019

Pro tip: looking at the diff of the whole PR is a bit confusing because of the code move, but skipping the first commit and looking at the diff of the other commits paints the real picture. The first commit only moves the ParsoidService class to lib/parsoid.js without any changes to it.

@d00rman d00rman force-pushed the parsoid/php branch 4 times, most recently from 81f6d4e to a30948d Compare October 13, 2019 12:17
@d00rman
Copy link
Contributor Author

d00rman commented Oct 13, 2019

Tests are failing due to T235375

@d00rman
Copy link
Contributor Author

d00rman commented Oct 15, 2019

Tests are failing due to T235375

This has now morphed into T235478

Copy link
Contributor

@Pchelolo Pchelolo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh gosh holy cow this is not a nice change to review :)

@d00rman
Copy link
Contributor Author

d00rman commented Oct 15, 2019

Oh gosh holy cow this is not a nice change to review :)

Hehe, yeah, sorry about that @Pchelolo . But! You can simply look at this diff which only holds the actual code changes... Kind of :P

@d00rman
Copy link
Contributor Author

d00rman commented Oct 15, 2019

Oh gosh holy cow this is not a nice change to review :)

Hehe, yeah, sorry about that @Pchelolo . But! You can simply look at this diff which only holds the actual code changes... Kind of :P

Oops, sorry @Pchelolo wrong link. This diff is authoritative :)

.then(() => {
const dependencyUpdate = _dependenciesUpdate(hyper, req, newContent);
let dependencyUpdate = P.resolve();
if (!this.options.skip_updates) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unrelated I believe, but huge kudos for finding this :)

@d00rman
Copy link
Contributor Author

d00rman commented Oct 15, 2019

@Pchelolo added the empty methods to lib/parsoid.js including _getParsoidReq(). I used a generic Error, though, because AssertionError is not available in Node 6.

@d00rman
Copy link
Contributor Author

d00rman commented Oct 15, 2019

So the new diff to look at is here :)

@Pchelolo
Copy link
Contributor

LGTM. We need to wait till the varnish issue is resolved.

We can bike shed about the proxy complexity, but since it's temporary, I'm happy with whatever you're happy with.

@d00rman
Copy link
Contributor Author

d00rman commented Oct 15, 2019

LGTM. We need to wait till the varnish issue is resolved.

Agreed.

We can bike shed about the proxy complexity, but since it's temporary, I'm happy with whatever you're happy with.

Yup, I concur here too. But I went for a more complex approach on purpose so as to make all of the transitions easier with less moving pieces during the actual process of transition.

Marko Obrovac added 4 commits October 16, 2019 05:49
This will allow us to reuse it in both Parsoid/JS and Parsoid/PHP sys
modules.

Bug: T230791
Using the Parsoid class, we create two new modules for the JS and PHP
variant, respectively. Each module declares a new class that inherits
from Parsoid, but sets up their methods for making requests to Parsoid
as well as methods for obtaining the bucket URIs to use.

Bug: T230791
The proxy allows directing requests to either variant. It loads both
variants' modules internally and uses their operations to complete
requests. It is designed in such a way so as to allow an easy transition
between fully using JS to fully using PHP with no config changes. When
first introduced, its defaults emulate the JS-only scenario. Once the
switch is fully achieved, then simply changing `sys/parsoid.js` for
`sys/parsoid-php.js` in `projects/sys/default.wmf.yaml` with no config
change results in having a fully-functional Parsoid/PHP module. The
proxy can. thus, function properly with only one of variant modules
loaded and configured.

In order to support the transition period, the proxy has three modes of
operation: single, mirror and split. In single mode, only one variant is
used, defined by the `default_variant` configuration value, defaulting
to `js`. This allows us to start using the proxy with no config changes.
In the final stages of the transition (before we remove the proxy), it
can be changed to `php` to only use the PHP variant. The mirror mode is
used to asynchronously mirror traffic to the PHP variant. Requests are
issued to both variants, but only the JS one is returned. The amount of
traffic to be mirrored can be tuned with the `percentage` configuration
parameter. The imporant caveat here is that only requests for
`/page/{format}` end points are mirrored - we cannot do so reliably for
transforms since they rely on stashed content, which is likely not to be
available for the PHP variant. Furthermore, when the proxy is configured
in mirror mode, dependency update events are emitted only for the JS
variant, so as to avoid duplicates. Finally, the split mode is used to
split the traffic between the two variants based on the request domain.
If one of the patterns given in the `pattern` configuration parameter is
matched, then the variant not defined in `default_variant` is used,
otherwise the default one is used. This mode supports the second stage
of the transition, where JS will be authoritative for the majority of
domains, while we will be slowly moving projects one by one (or group by
group) over to using Parsoid/PHP.

Apart from these modes, the proxy also supports clients directly telling
it which variant to use. If the incoming request has the
`PARSOID_VARIANT` cookie or the `X-Parsoid-Variant` header set, then the
request is sent directly to that variant regardless of the proxy's mode.
When deciding where to send the request, the proxy gives precedence to
the header in case both are set.

Bug: T230791
Also add unit tests for the proxy's parameter handling

Bug: T230791
Marko Obrovac added 3 commits October 16, 2019 05:49
When we are in split mode, we need to account for the transition period
of each domain, during which time some transform requests will actually
need to be handled by Parsoid/JS since the clients retrieved the JS
variant of the HTML. Thefore, for transform end points provide a
fallback in case of a 404 which happens if RESTBase can't retrieve the
stashed content.

Bug: T230791
If a client is switching between wikitext and HTML and in one of the two
transforms they fail to provide the revision in the URI, trust that the
ETag is correct and try to retrieve the stashed content using the
information in the ETag. As a fall-back, though, try with the
client-supplied revision as well.

This is a port of wikimedia#1208 to the proxy branch.

Bug: T235465
@d00rman
Copy link
Contributor Author

d00rman commented Oct 16, 2019

Ported over the fix for #1208 as well.

Marko Obrovac added 4 commits October 22, 2019 21:30
This is a temporary measure until Parsoid/PHP has the capability to do
language conversion at which point it will start emitting them.

Bug: T236382
Bug: T221174
Bug: T230791
@d00rman d00rman merged commit 9e09c5e into wikimedia:master Oct 28, 2019
@d00rman d00rman deleted the parsoid/php branch October 28, 2019 06:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

2 participants