-
-
Notifications
You must be signed in to change notification settings - Fork 95
Add JavaScript option #2483
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Add JavaScript option #2483
Conversation
|
This draft is currently made as a proof of concept with a disregard for any other renderer it might be breaking. |
|
I created some example ZIM files including no JavaScript, all JavaScript or only JavaScript trusted not to request external content. Please take a look at them and let me know how you feel about them.
|
|
I tried PoC ZIMs. For some reasons, the All javascripts version is obviously causing problems in the web console ... but nothing really visible in the articles. The "trusted javascript" works very well. I'm not sufficiently export to assess the list of trusted modules, or the one of "addModules" version, but this is more a detail we can refine over the years anyway. The only things I can say is that I confirm that the "trusted javascript with additional modules" is required/enough to solve #1491 and #1911 articles. Only "trusted javascript" is not enough. I've tested with JS disabled in Chrome and I confirm the ZIM degrades nicely (i.e. it looks like we get the same content than currently without this PR merged). All points above makes me consider that we might want to change default scraper behavior (compared to current PR behavior), so that scraper creates "near-perfect" ZIM for "newbies", without needing additional flags. Especially important for people willing to ZIM wikipedia which is somehow our primary audience:
@Markus-Rost Thanks a lot anyway, very impressive change. Glad you made it work! I did not had any look at the code since you specified it is only a PoC. Let me know when it is worth it. |
Oh, I'm getting that as well. Not sure what happened there. I tested it locally using 100.tsv which worked fine and created the 10.tsv version to fit the GitHub upload limit, but didn't actually test that one again.
Yes, those tables need the
Do you mean having all JS included by default? That doesn't seem like a good idea as some JS not expecting a request to fail can easily break stuff on the page.
I've already added
I've looked through basically all the modules I could find loaded on Wikipedia articles as well as a few other wikis and those are all the modules which won't try to load any non-scraped content. Sadly all three MathJax extensions either load external JS or from an extension path we can't easily rewrite or guess ahead of time. @benoit74 let me know about any specific scripts to add you have in mind and I'll take a look at them. |
|
OK, then what I meant is having And what I meant by having more modules was having everything needed for wikipedia (having the ancestry table in mind). But if this comes from the |
Yeah, the |
@benoit74 Oh please take a look at the code. I assume that I've broken all renderers besides ActionParse, though I'm also not familiar at all with those renderers and have no idea how badly broken they are. I'm certainly gonna need some help with getting those issues fixed. |
|
Apologies for the delay. I've now tested these archives in Kiwix JS Browser Extension, with the same results as @benoit74 reported.
I don't see any differences in formatting, suggesting to me that at least for the tested pages JS_none is as good as other formats, and would be faster to load on older devices due to fewer resources needing to be extracted. However, it would be necessary to remove references to scripts not in the archive to avoid the multiple console errors from attempts to load these. However, JS_trusted seems like a good compromise to me, so long as calls to untrusted resources are removed from the landing page. Regarding the core issue of calling external scripts or resources #2310, it seems none of the test pages I noticed had that problem in any of these test archives, but possibly I didn't test deeply enough. |
Currently the scraper has indeed more or less solved #2310 by removing all JS, so that we de-facto do not load anymore external scripts. But this is not really a long term solution. This is why we kept #2310 open saying this was more an interim fix and the medium term solution is this PR, adding back "just enough" scripts. |
The calling of external scripts seen in #2310 is mostly just the attempted loading of JS modules. By rewriting startup.js to load those modules from Of course JS_all can still have scripts attempting to load external resources, just that en.wikipedia is a bad example for that because they have very little JS doing that in the first place. Getting JS_all on es.wikipedia for example will load external scripts through the
I have the suspicion that my filtering of JS modules stopped the already existing landing page JS from being included 😅 Ah, I'll fix JS_none including startup.
You should notice small things like the navboxes at the bottom of articles being collapsed by default now or the video/audio player popup on "Michael Jackson", but yes there aren't a lot of differences in formatting as en.wikipedia does not rely as heavily on JS as other wikis might. |
res/templates/javaScript.html
Outdated
| @@ -0,0 +1,7 @@ | |||
| <script> | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inline scripts is taboo in ZIMs. Experienced showed it could cause problem on some readers around security constraints. It seems to be more or less resolved, but given the price to pay is not so high, we should transform this into JS scripts of their own.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The content of the script is different for every article, so that will be one script file for each article in the zim.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't realized that ... even if you already explained it in the past ... sorry.
This is indeed going to a be a bit "painful" (more a technical constraint anyway than a real usability problem).
@kelson42 @Jaifroid I would appreciate your experience on that point.
Which alternative is best:
- have few lines of inline JS in all articles (and only 1 HTML file per article)
- or have 1 HTML and 1 JS file per article but no inline JS at all (more or less doubling the number of entries in the ZIM)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kelson42 ping
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We've discussed about this topic with @kelson42 and he confirmed it feels way better to avoid inline JS at all prices, especially for very popular ZIMs like the ones made by mwoffliner.
He however came up with a new idea which could avoid to have 1 HTML + 1 JS file per article. It lookt like it could be possible to include variables we currently add in the JS (RLCONF, RLSTATE, RLPAGEMODULES) inside the HTML instead (with some "hidden" DOM properties, e.g <meta name="RLCONF" content="..." />, but I'm open to other ideas) and load these variables inside the appropriate JS object at page load. With this "workaround" we will need the same JS code on all articles, hence only one JS file per ZIM and no more inline JS.
@Markus-Rost WDYT, do you think this would be feasible?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree! I was more referring to any CSP that you might want to add to Wikipedia articles in the ZIM (for readers that don't enforce one). Indeed, a CSP can't in fact be loosened by a second CSP, it can only be made stricter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that @Markus-Rost point is correct: we probably shouldn't mind about using inline JS in mwoffliner ZIMs because we have a nice no-JS fallback present in the ZIM anyway. When the browser context does not allows to load inline JS, then JS will anyway not load at all and we are back to the no-JS fallback (which is not a dirty fallback at all since this is what is currently used on all production ZIMs).
I just tested test ZIMs from #2310 (comment) on Kiwix JS browser extension in all modes (restricted, with/without service worker, ...) and it looked fine. When in restricted mode, JS did not loaded but display was still fine. Of course there was some errors in the console, but I don't think we should really care. @Jaifroid do you confirm?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If inline JS is now allowed, please open a dedicated issue in openzim/overview to manage this, because this is everything but clear why now it should be OK, whereas it has been forbidden during years. We can not change the rules, just because it is convenient to get a PR merged - as it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@benoit74 On my side, I had to develop a few years ago workarounds for lack of inline JS support in some contexts, notably Chromium extension (even in ServiceWorker mode). In such contexts the user is effectively redirected to a PWA from the extension (where there is no such restriction). Restricted mode is now effectively a no-JS mode that is kept for compatibility with old systems and as a security/screening measure when opening a new untrusted ZIM for the first time.
This should be an Org decision (@kelson42 ). There is absolutely no need to hold things up due to Kiwix JS or the PWA any more. There are independent arguments for avoiding inline JS in ZIMs where we know that JS is not needed, and for trying to keep our baseline, bread-and-butter Wikipedia ZIMs as simple as possible and compatible with even the oldest frameworks. With a bit of work, any inline script snippet can be converted into an attached script loaded at runtime (allowed by all frameworks of course), especially when we know the contents of inline JS script tags in a given context. Nevertheless, it seems reasonable to allow ZIM creators to take that decision themselves, if they need inline JS for their Mediawiki ZIMs, but for us to act conservatively vis-à-vis official Kiwix Wikipedia ZIMs.
As you say, Wikipedia ZIMs provide a good NOSCRIPT default in any case for now.. It should be possible to provide the long-awaited ZIM JS API #87 independently of whether or not inline JS is allowed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If inline JS is now allowed, please open a dedicated issue in
openzim/overviewto manage this, because this is everything but clear why now it should be OK, whereas it has been forbidden during years. We can not change the rules, just because it is convenient to get a PR merged - as it.
@kelson42 As far as I can tell, the reason for inline scripts to be disallowed is that some readers are/were blocking inline scripts from executing, resulting in (partially) unusable ZIMs. At least that was the case in all issues regarding inline scripts I could find (#324, #1578, #2096, kiwix/kiwix-js#355, kiwix/kiwix-js#789, kiwix/kiwix-js#865, openzim/overview#52), please let me know if there are additional reasons against inline scripts that I missed.
So avoiding inline scripts was to ensure anything needing those inline scripts would work properly in all ZIM readers. Now in this specific case here, avoiding inline scripts won't actually make the feature work in those readers as they are also blocking the eval based feature check in the startup JS and thus will always fall back to a noscript behavior. This means we have nothing to gain here from avoiding inline scripts.
See also openzim/overview#52 for a similar situation, through with a lot less graceful noscript behavior.
299a154 to
efdfefe
Compare
|
@kelson42 we need your decision on this one as well, it sat idle for way too long. I again see no reason to be worried about the inline scripts, because as stated by @Markus-Rost the ZIM will anyway fallback to no-JS even if we remove these inline scripts (because startup.js also requires I tried the wikipedia_en_10_JS_trusted_2025-08.zip ZIM in kiwix-desktop (it was your main concern) and it works like a charm, including all JS-based functions (like show/hide at the bottom of "Michael Jackson" page). Taking measures to remove these inline scripts without providing any added value to the end users looks like a waste of our precious resources and a risk to introduce bugs. The no-JS fallback is working exactly as-of today (without this PR merged), i.e. the ZIM will still be high-quality enough. I understand you would prefer to provide same ZIM behavior on all readers, but at some point I feel like it is important to recognize that the effort is not worth it. I've also opened openzim/overview#62 to discuss the general policy for openZIM |
|
I also tried the wikipedia_en_10_JS_trusted_2025-08.zip ZIM in kiwix-js and kiwix-pwa in Restricted mode and I confirm it fallbacks very nice to no-JS mode (i.e. the ZIM looks just like a currently prod ZIM not supporting JS at all). |
That one has a bunch of violations in ServiceWorkerLocal mode in Chrome, with the CSP violations looking pretty rough:
But more worryingly, in ServiceWorker mode, we get a bunch of attempts to access an external PHP server (wikipedia.org/w/load.php). At the very least those calls should be removed ISTM! -
|
I'm only seeing the two CSP violations from the two inline scripts existing. These are expected (and can be reduced to just one).
This is worrying, as only the startup module does those requests and has been modified in this PR to make JS work properly in the first place. The only way I can think of for you to get these errors in the PR ZIM again is if your service worker is somehow using a startup.js file cached from a different Wikipedia ZIM, which seems like a major security issue. @Jaifroid |
@Markus-Rost Just to confirm that I only do these tests in the Browser Extension, which takes a purist approach and only uses the resources supplied in the ZIM except (currently) for an added dark CSS if the user selects that (to be changed soon to native dark). (Even in the PWA, I never supply JS, only CSS for some other transforms.) |
|
I'm having less time to work on this now, but I would like to avoid this PR becoming stale. @benoit74 can you review the PR further (besides the inline script existing), so that I have some pointers to work on whenever I'm available? |
benoit74
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about DO_PROPAGATION, ALL_READY_FUNCTION and LOAD_PHP? Looks like these were "useful" things at some point, but indeed never called since years since the module name was not correct so startup.js was never fixed.
The failing unit test is about inline scripts, it will be easily resolved once discussion about this has settled.
I've added a comment about what looks like the root cause of all failing e2e tests.
And this PR obviously misses a significant share of unit and e2e tests to ensure we will not break too easily this very important feature in the future.
I'm unsure what
No idea what |
|
OK then this PR should completely remove |
56ce8cd to
e951b80
Compare
e951b80 to
36c6b9a
Compare





Fix #2310
--javaScriptoption with values "none", "trusted" or "all" (default being "trusted").--addModulesoption for additional ResourceLoader modules