-
Notifications
You must be signed in to change notification settings - Fork 81
Parsoid: Add the variant proxy #1207
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Pro tip: looking at the diff of the whole PR is a bit confusing because of the code move, but skipping the first commit and looking at the diff of the other commits paints the real picture. The first commit only moves the ParsoidService class to lib/parsoid.js without any changes to it. |
81f6d4e to
a30948d
Compare
|
Tests are failing due to T235375 |
Pchelolo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh gosh holy cow this is not a nice change to review :)
| .then(() => { | ||
| const dependencyUpdate = _dependenciesUpdate(hyper, req, newContent); | ||
| let dependencyUpdate = P.resolve(); | ||
| if (!this.options.skip_updates) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is unrelated I believe, but huge kudos for finding this :)
|
@Pchelolo added the empty methods to |
|
So the new diff to look at is here :) |
|
LGTM. We need to wait till the varnish issue is resolved. We can bike shed about the proxy complexity, but since it's temporary, I'm happy with whatever you're happy with. |
Agreed.
Yup, I concur here too. But I went for a more complex approach on purpose so as to make all of the transitions easier with less moving pieces during the actual process of transition. |
This will allow us to reuse it in both Parsoid/JS and Parsoid/PHP sys modules. Bug: T230791
Using the Parsoid class, we create two new modules for the JS and PHP variant, respectively. Each module declares a new class that inherits from Parsoid, but sets up their methods for making requests to Parsoid as well as methods for obtaining the bucket URIs to use. Bug: T230791
The proxy allows directing requests to either variant. It loads both
variants' modules internally and uses their operations to complete
requests. It is designed in such a way so as to allow an easy transition
between fully using JS to fully using PHP with no config changes. When
first introduced, its defaults emulate the JS-only scenario. Once the
switch is fully achieved, then simply changing `sys/parsoid.js` for
`sys/parsoid-php.js` in `projects/sys/default.wmf.yaml` with no config
change results in having a fully-functional Parsoid/PHP module. The
proxy can. thus, function properly with only one of variant modules
loaded and configured.
In order to support the transition period, the proxy has three modes of
operation: single, mirror and split. In single mode, only one variant is
used, defined by the `default_variant` configuration value, defaulting
to `js`. This allows us to start using the proxy with no config changes.
In the final stages of the transition (before we remove the proxy), it
can be changed to `php` to only use the PHP variant. The mirror mode is
used to asynchronously mirror traffic to the PHP variant. Requests are
issued to both variants, but only the JS one is returned. The amount of
traffic to be mirrored can be tuned with the `percentage` configuration
parameter. The imporant caveat here is that only requests for
`/page/{format}` end points are mirrored - we cannot do so reliably for
transforms since they rely on stashed content, which is likely not to be
available for the PHP variant. Furthermore, when the proxy is configured
in mirror mode, dependency update events are emitted only for the JS
variant, so as to avoid duplicates. Finally, the split mode is used to
split the traffic between the two variants based on the request domain.
If one of the patterns given in the `pattern` configuration parameter is
matched, then the variant not defined in `default_variant` is used,
otherwise the default one is used. This mode supports the second stage
of the transition, where JS will be authoritative for the majority of
domains, while we will be slowly moving projects one by one (or group by
group) over to using Parsoid/PHP.
Apart from these modes, the proxy also supports clients directly telling
it which variant to use. If the incoming request has the
`PARSOID_VARIANT` cookie or the `X-Parsoid-Variant` header set, then the
request is sent directly to that variant regardless of the proxy's mode.
When deciding where to send the request, the proxy gives precedence to
the header in case both are set.
Bug: T230791
Also add unit tests for the proxy's parameter handling Bug: T230791
When we are in split mode, we need to account for the transition period of each domain, during which time some transform requests will actually need to be handled by Parsoid/JS since the clients retrieved the JS variant of the HTML. Thefore, for transform end points provide a fallback in case of a 404 which happens if RESTBase can't retrieve the stashed content. Bug: T230791
Bug: T230791
If a client is switching between wikitext and HTML and in one of the two transforms they fail to provide the revision in the URI, trust that the ETag is correct and try to retrieve the stashed content using the information in the ETag. As a fall-back, though, try with the client-supplied revision as well. This is a port of wikimedia#1208 to the proxy branch. Bug: T235465
|
Ported over the fix for #1208 as well. |
This is a temporary measure until Parsoid/PHP has the capability to do language conversion at which point it will start emitting them. Bug: T236382 Bug: T221174 Bug: T230791
This PR enables RESTBase to communicate with both Parsoid/JS and Parsoid/PHP and use separate storage for each of them. It introduces three
sysmodules:parsoid-js,parsoid-phpandparsoid, which handle the JS variant, PHP variant and the main Parsoid module acting as the proxy, respectively.The common Parsoid business logic has been extracted into
lib/parsoid.js, whilesys/parsoid-jsandsys/parsoid-phponly deal with variant-specific details: issuing requests to Parsoid and determining the buckets to use when storing/retrieving content. This code extraction and reshuffling was done for the following reasons:The proxy allows directing requests to either variant. It loads both variants' modules internally and uses their operations to complete requests. It is designed in such a way so as to allow an easy transition between fully using JS to fully using PHP with no config changes. When first introduced, its defaults emulate the JS-only scenario. Once the switch is fully achieved, then simply changing
sys/parsoid.jsforsys/parsoid-php.jsinprojects/sys/default.wmf.yamlwith no config change results in having a fully-functional Parsoid/PHP module. The proxy can. thus, function properly with only one of variant modules loaded and configured.In order to support the transition period, the proxy has three modes of operation: single, mirror and split. In single mode, only one variant is used, defined by the
default_variantconfiguration value, defaulting tojs. This allows us to start using the proxy with no config changes. In the final stages of the transition (before we remove the proxy), it can be changed tophpto only use the PHP variant. The mirror mode is used to asynchronously mirror traffic to the PHP variant. Requests are issued to both variants, but only the JS one is returned. The amount of traffic to be mirrored can be tuned with thepercentageconfiguration parameter. The important caveat here is that only requests for/page/{format}end points are mirrored - we cannot do so reliably for transforms since they rely on stashed content, which is likely not to be available for the PHP variant. Furthermore, when the proxy is configured in mirror mode, dependency update events are emitted only for the JS variant, so as to avoid duplicates. Finally, the split mode is used to split the traffic between the two variants based on the request domain. If one of the patterns given in thepatternconfiguration parameter is matched, then the variant not defined indefault_variantis used, otherwise the default one is used. This mode supports the second stage of the transition, where JS will be authoritative for the majority of domains, while we will be slowly moving projects one by one (or group by group) over to using Parsoid/PHP. Because of this gradual switch, there is also a built-in fall-back for transform end points in this mode: if a request for the non-default variant 404s, then we retry the request using the default variant in order to cover the clients that are in the middle of an editing session during the switch.Apart from these modes, the proxy also supports clients directly telling it which variant to use. If the incoming request has the
PARSOID_VARIANTcookie or theX-Parsoid-Variantheader set, then the request is sent directly to that variant regardless of the proxy's mode. When deciding where to send the request, the proxy gives precedence to the header in case both are set.Bug: T230791