jberkel
User

User Details

User Since
Mar 31 2015, 8:12 PM (559 w, 2 d)
Availability
Available
LDAP User
Unknown
MediaWiki User
Jberkel [ Global Accounts ]

Recent Activity

Sep 24 2025

jberkel added a comment to T305407: Stale data / missing pages in HTML ("enterprise") .

The situation has improved somewhat, but there are still caching issues: if a page A transcludes template B, and B is updated (but not A), then the HTML dumps will still show the old content of B in A until A itself is updated.
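The staleness described above can be illustrated with a small sketch. It assumes hypothetical inputs (per-page render times, a map of which templates each page transcludes, and template modification times); none of these names come from the Enterprise pipeline, whose actual caching logic is not described here.

```python
from datetime import datetime
from typing import Dict, Iterable, List


def stale_pages(
    rendered_at: Dict[str, datetime],
    transclusions: Dict[str, Iterable[str]],
    template_modified: Dict[str, datetime],
) -> List[str]:
    """Return pages whose stored HTML predates a change to a transcluded template.

    All three arguments are hypothetical inputs used for illustration only.
    """
    stale = []
    for page, rendered in rendered_at.items():
        for template in transclusions.get(page, ()):
            modified = template_modified.get(template)
            # The page itself was not edited, but a template it uses was.
            if modified is not None and modified > rendered:
                stale.append(page)
                break
    return sorted(stale)
```

A page flagged this way would need to be re-rendered even though its own wikitext has not changed.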

Sep 24 2025, 3:27 PM · Wikimedia Enterprise, Dumps-Generation

Sep 14 2025

jberkel added a comment to T303652: Include more namespaces in Wiktionary HTML dumps.

@Fenakhay: I think we'll have to solve this ourselves for now. The reconstruction namespace is a lot smaller, so it should be doable to produce dumps for it.

Sep 14 2025, 8:41 PM · Dumps-Generation, Wikimedia Enterprise

May 29 2025

jberkel added a comment to T393198: Snapshot API: namespace filtering does not work.

confirmed! thanks

May 29 2025, 8:59 PM · Wikimedia Enterprise

May 25 2025

jberkel added a comment to T393203: Snapshot API: unable to download chunks.

Also, it looks like there's another bug where the SDK doesn't handle the request limit situation properly and keeps on retrying the requests in quick succession before seemingly getting rate-limited for the retries:
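For comparison, a minimal sketch of how a client could back off after a 429 instead of retrying immediately. This is not the SDK's code: `fetch` is a hypothetical callable returning a status code and an optional retry-after hint in seconds, and the delays are illustrative defaults.

```python
import time
from typing import Callable, Optional, Tuple


def fetch_with_backoff(
    fetch: Callable[[], Tuple[int, Optional[float]]],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call fetch() until it returns a status other than 429 or attempts run out.

    fetch is a hypothetical callable returning (status_code, retry_after_seconds).
    Returns the last status code seen.
    """
    status = 429
    for attempt in range(max_attempts):
        status, retry_after = fetch()
        if status != 429:
            return status
        if attempt == max_attempts - 1:
            break  # no point sleeping after the final attempt
        # Prefer the server's hint; otherwise double the wait each time.
        delay = retry_after if retry_after is not None else base_delay * (2 ** attempt)
        sleep(delay)
    return status
```

Spacing the retries out this way avoids spending the request quota on attempts that are certain to be rejected.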

May 25 2025, 12:02 AM · Wikimedia Enterprise

May 24 2025

jberkel added a comment to T393203: Snapshot API: unable to download chunks.

@creynolds Thanks for investigating. With those 1500 requests I was only able to download 35 of the 72 chunks of the Wiktionary dump. I suspect this is because the SDK makes several requests for each chunk. In my case I had the transfer size set to 5 MB (the default is 25 MB), and each chunk is ~200 MB, which makes (72 * 200) / 5 = 2880 requests. With the default transfer size of 25 MB this would be just ~576 requests, still within the free tier. Given that the transfer size is configurable, it's strange to account for usage by the number of requests sent. Maybe it would make more sense to cap by the amount of data transferred instead.
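The arithmetic above can be checked with a short sketch. It assumes every chunk is exactly 200 MB and that the SDK issues one request per transfer-size piece of a chunk; `chunk_requests` is a hypothetical helper, not part of the SDK.

```python
import math


def chunk_requests(num_chunks: int, chunk_size_mb: float, transfer_size_mb: float) -> int:
    """Total requests when each chunk is fetched in transfer_size_mb pieces."""
    per_chunk = math.ceil(chunk_size_mb / transfer_size_mb)
    return num_chunks * per_chunk


print(chunk_requests(72, 200, 5))   # 2880
print(chunk_requests(72, 200, 25))  # 576
```

Under these assumptions the same 72 chunks cost five times as many requests at 5 MB as at 25 MB, although the amount of data transferred is identical.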

May 24 2025, 11:54 PM · Wikimedia Enterprise
jberkel reopened T393203: Snapshot API: unable to download chunks as "Open".

How many free chunk requests are there? I'm now getting 429 responses on chunk downloads, and the API dashboard confusingly says: "0 / 0 Chunk requests left" (from a free account outside WMCS).

May 24 2025, 3:03 PM · Wikimedia Enterprise
jberkel closed T393203: Snapshot API: unable to download chunks as Resolved.
May 24 2025, 8:36 AM · Wikimedia Enterprise
jberkel added a comment to T393203: Snapshot API: unable to download chunks.

Confirmed, works now, thanks.

May 24 2025, 8:35 AM · Wikimedia Enterprise

May 2 2025

jberkel added a comment to T393203: Snapshot API: unable to download chunks.

Will these requests work when sent from WMCS?

May 2 2025, 7:35 PM · Wikimedia Enterprise
jberkel added a comment to T389542: NEW/CHANGE FEATURE REQUEST: Documentation for v1 Enterprise endpoint deprecation .

Hey @jberkel

Yes, accessing these through the Enterprise APIs does require a separate login for now and is a little more involved than it was on the WMF dumps site. Hopefully the SDKs available on GitHub make it easier to get started.

May 2 2025, 5:31 PM · Data-Engineering (Q3 2025 January 1st - March 31th), Test Kitchen, Data-Platform
jberkel created T393203: Snapshot API: unable to download chunks.
May 2 2025, 5:27 PM · Wikimedia Enterprise
jberkel updated the task description for T393198: Snapshot API: namespace filtering does not work.
May 2 2025, 4:36 PM · Wikimedia Enterprise
jberkel created T393198: Snapshot API: namespace filtering does not work.
May 2 2025, 4:35 PM · Wikimedia Enterprise

Apr 22 2025

jberkel added a comment to T389542: NEW/CHANGE FEATURE REQUEST: Documentation for v1 Enterprise endpoint deprecation .

I was pointed to this ticket on T390839. I'd like to raise some concerns regarding the reduced accessibility and visibility of HTML dumps following the removal of the mirroring.

Apr 22 2025, 12:02 PM · Data-Engineering (Q3 2025 January 1st - March 31th), Test Kitchen, Data-Platform

Apr 21 2025

jberkel added a comment to T390839: The 20250401 dumps haven't started on time because the mediawikiwiki dump from 20250320 is looping.

The HTML dumps have been removed by the Enterprise team (see https://dumps.wikimedia.org/other/enterprise_html/ for alternatives).

Apr 21 2025, 6:56 PM · MW-1.45-notes (1.45.0-wmf.9; 2025-07-08), Data-Platform-SRE (2025.06.13 - 2025.07.04), Data-Engineering (Q4 2025 April 1st - June 30th), Dumps-Generation
jberkel added a comment to T390839: The 20250401 dumps haven't started on time because the mediawikiwiki dump from 20250320 is looping.

This also affects the HTML dumps, right? There are none in https://dumps.wikimedia.org/other/enterprise_html/runs/20250401/ or https://dumps.wikimedia.org/other/enterprise_html/runs/20250420/.

Apr 21 2025, 11:44 AM · MW-1.45-notes (1.45.0-wmf.9; 2025-07-08), Data-Platform-SRE (2025.06.13 - 2025.07.04), Data-Engineering (Q4 2025 April 1st - June 30th), Dumps-Generation

Jul 9 2024

jberkel added a comment to T345176: {Investigation} Different file sizes for dumps.

Done some testing with the latest (20240701) dumps (allowing for some tolerance around the moment of dump generation):

Jul 9 2024, 12:02 AM · Wikimedia Enterprise (sprint 53), Dumps-Generation

Jun 4 2024

jberkel updated the task description for T345176: {Investigation} Different file sizes for dumps.
Jun 4 2024, 2:46 PM · Wikimedia Enterprise (sprint 53), Dumps-Generation
jberkel added a comment to T345176: {Investigation} Different file sizes for dumps.

That's good news. I've done some tests, and it's looking much better now. The XML dumps haven't been released yet (due to T365501), so there's no baseline to do more detailed testing.

Jun 4 2024, 2:43 PM · Wikimedia Enterprise (sprint 53), Dumps-Generation

May 26 2024

jberkel added a comment to T345176: {Investigation} Different file sizes for dumps.

Latest HTML enwikt dump (20240520) vs XML dump:

May 26 2024, 10:55 PM · Wikimedia Enterprise (sprint 53), Dumps-Generation

May 23 2024

jberkel added a comment to T345176: {Investigation} Different file sizes for dumps.

It's probably just the new content, with the baseline still being incomplete. I'll check with the XML dumps.

May 23 2024, 2:29 PM · Wikimedia Enterprise (sprint 53), Dumps-Generation
jberkel updated the task description for T345176: {Investigation} Different file sizes for dumps.
May 23 2024, 10:20 AM · Wikimedia Enterprise (sprint 53), Dumps-Generation

Apr 18 2024

jberkel added a comment to T362894: Data quality: HTML dumps contain unexplainably outdated revisions of some pages.

The HTML dumps are pretty much useless until T351712 is fixed.

Apr 18 2024, 4:22 PM · Wikimedia Enterprise, Dumps-Generation, WMDE-References-FocusArea

Mar 25 2024

jberkel updated the task description for T345176: {Investigation} Different file sizes for dumps.
Mar 25 2024, 11:02 AM · Wikimedia Enterprise (sprint 53), Dumps-Generation
jberkel added a comment to T345176: {Investigation} Different file sizes for dumps.

Can anyone clarify though? It seems that the new sub-tasks are now stuck again.

Mar 25 2024, 11:00 AM · Wikimedia Enterprise (sprint 53), Dumps-Generation

Mar 18 2024

jberkel added a comment to T345176: {Investigation} Different file sizes for dumps.

It probably means the investigation has been "resolved". The main task is now T351712 + subtasks.

Mar 18 2024, 2:13 PM · Wikimedia Enterprise (sprint 53), Dumps-Generation

Mar 1 2024

jberkel updated the task description for T345176: {Investigation} Different file sizes for dumps.
Mar 1 2024, 1:47 PM · Wikimedia Enterprise (sprint 53), Dumps-Generation

Feb 21 2024

jberkel updated the task description for T345176: {Investigation} Different file sizes for dumps.
Feb 21 2024, 8:52 PM · Wikimedia Enterprise (sprint 53), Dumps-Generation

Feb 19 2024

jberkel added a comment to T349899: 'digero' tool uses an unreasonable amount of disk space.

I'll add a command to automatically clear the tmp storage; that should help.

Feb 19 2024, 12:17 PM · Tools
jberkel added a comment to T349899: 'digero' tool uses an unreasonable amount of disk space.

I've deleted tmp and other unused stuff, and it's now down to 16 GB. Is that acceptable?

Feb 19 2024, 12:14 PM · Tools

Feb 5 2024

jberkel added a comment to T351712: Q3- Q4: Snapshots service is failing to decode some Kafka messages .

Could you explain a bit more what this means, please?

Feb 5 2024, 1:09 PM · Wikimedia Enterprise, Epic

Feb 2 2024

jberkel updated the task description for T345176: {Investigation} Different file sizes for dumps.
Feb 2 2024, 11:50 AM · Wikimedia Enterprise (sprint 53), Dumps-Generation

Jan 26 2024

jberkel added a comment to T345176: {Investigation} Different file sizes for dumps.

Latest enwikt dump is now at 9.6 GB, still some way to go to the 13 GB of the 20230701 dump (also incomplete, but still useful as a baseline).

Jan 26 2024, 12:10 PM · Wikimedia Enterprise (sprint 53), Dumps-Generation
jberkel updated the task description for T345176: {Investigation} Different file sizes for dumps.
Jan 26 2024, 12:08 PM · Wikimedia Enterprise (sprint 53), Dumps-Generation

Jan 9 2024

jberkel added a comment to T345176: {Investigation} Different file sizes for dumps.

OK. I think it might be worth putting a disclaimer somewhere, perhaps on https://dumps.wikimedia.org/other/enterprise_html/, to warn users that the dumps are incomplete.

Jan 9 2024, 7:14 PM · Wikimedia Enterprise (sprint 53), Dumps-Generation
jberkel added a comment to T345176: {Investigation} Different file sizes for dumps.

@REsquito-WMF thanks! So this means the next dumps will have more data, but will still be incomplete until this other bug is fixed?

Jan 9 2024, 6:48 PM · Wikimedia Enterprise (sprint 53), Dumps-Generation

Dec 31 2023

jberkel updated the task description for T345176: {Investigation} Different file sizes for dumps.
Dec 31 2023, 10:41 AM · Wikimedia Enterprise (sprint 53), Dumps-Generation

Dec 11 2023

jberkel added a comment to T345176: {Investigation} Different file sizes for dumps.

@REsquito-WMF not sure if the changes were already in place, but the current enwiktionary NS0 dump is still at 3.5 GB (compared to 13 GB on 20230701).

Dec 11 2023, 8:15 PM · Wikimedia Enterprise (sprint 53), Dumps-Generation
jberkel updated the task description for T345176: {Investigation} Different file sizes for dumps.
Dec 11 2023, 8:13 PM · Wikimedia Enterprise (sprint 53), Dumps-Generation

Nov 6 2023

jberkel added a comment to T345176: {Investigation} Different file sizes for dumps.

Is there anything going to be done about this? The enterprise dumps have been in full failure mode for a few months now and are absolutely unusable. I really don't know how an obvious total failure of service can stay in triage hell for such a long time. I understand WMF resources are limited, but then at least let volunteers help out with this. My question about the code generating the dumps above is still unanswered. The transparency/communication on this whole issue has been miserable.

Nov 6 2023, 9:28 AM · Wikimedia Enterprise (sprint 53), Dumps-Generation
jberkel updated the task description for T345176: {Investigation} Different file sizes for dumps.
Nov 6 2023, 9:09 AM · Wikimedia Enterprise (sprint 53), Dumps-Generation

Oct 27 2023

jberkel added a comment to T349899: 'digero' tool uses an unreasonable amount of disk space.

We don't really need to keep all the old dumps around; I've started deleting all dump files from before 2023. Different files are needed for different purposes: for the stats, and for the "wanted entries" on Wiktionary. After generating the dumps, all the data "lives" on Wiktionary, except for the raw data, which is hosted on ~tools.digero/www and shouldn't be deleted. Right now it uses about 1.3G.

Oct 27 2023, 2:49 PM · Tools

Oct 24 2023

jberkel added a comment to T165935: "Lua error: not enough memory" on certain en.wiktionary pages.

@tstarling Thanks for unblocking this! 🙌

Oct 24 2023, 4:58 PM · Performance Issue, Scribunto, All-and-every-Wiktionary

Oct 20 2023

jberkel updated the task description for T345176: {Investigation} Different file sizes for dumps.
Oct 20 2023, 3:00 PM · Wikimedia Enterprise (sprint 53), Dumps-Generation

Oct 5 2023

jberkel added a comment to T345176: {Investigation} Different file sizes for dumps.

Some random guessing: perhaps the error handling code is borked, and it just finishes the dump and closes the file (without erroring the process)? But why then would so many repositories hit errors at the same time? All the 7-20 dumps seem to be affected, maybe some site-wide network/server problems which weren't handled properly?

Oct 5 2023, 1:06 PM · Wikimedia Enterprise (sprint 53), Dumps-Generation

Oct 4 2023

jberkel added a comment to T305407: Stale data / missing pages in HTML ("enterprise") .

so why not simply base the HTML dumps off them?

This does not work. You cannot convert wikitext to HTML in offline mode. You need access to a running MediaWiki instance to render macros, templates, etc. So rendering HTML is by definition a dynamic process and something which has not been available otherwise.

Oct 4 2023, 8:29 AM · Wikimedia Enterprise, Dumps-Generation
jberkel added a comment to T305407: Stale data / missing pages in HTML ("enterprise") .

I've suggested this previously: the XML dumps have been around for a very long time, and compared to the HTML version, they are very reliable, so why not simply base the HTML dumps off them? Then you could easily compare the counts of the two files, it would also help users who consume both types of data.

When you say "base off them", are you suggesting that the HTML dumps be produced by iterating over the XML dumps and then fetching the HTML content for each row? This would be a cumbersome approach since it introduces an unnecessary extra dependency. The HTML content of the new dumps is not directly derived from the XML dumps, so I don't see much advantage to this approach. I agree that it would be nice to snapshot the wiki content before dumping but this isn't feasible given the way that HTML rendering requires random access to all articles and templates.

Oct 4 2023, 8:25 AM · Wikimedia Enterprise, Dumps-Generation
jberkel added a comment to T305407: Stale data / missing pages in HTML ("enterprise") .

I've suggested this previously: the XML dumps have been around for a very long time, and compared to the HTML version, they are very reliable, so why not simply base the HTML dumps off them? Then you could easily compare the counts of the two files, it would also help users who consume both types of data.

Oct 4 2023, 8:11 AM · Wikimedia Enterprise, Dumps-Generation
jberkel updated the task description for T345176: {Investigation} Different file sizes for dumps.
Oct 4 2023, 8:04 AM · Wikimedia Enterprise (sprint 53), Dumps-Generation

Sep 20 2023

jberkel added a comment to T345176: {Investigation} Different file sizes for dumps.

Any idea why this would primarily affect non-Wikipedia instances? Is the code which generates these dumps available somewhere?

Sep 20 2023, 4:04 PM · Wikimedia Enterprise (sprint 53), Dumps-Generation
jberkel updated the task description for T345176: {Investigation} Different file sizes for dumps.
Sep 20 2023, 3:07 PM · Wikimedia Enterprise (sprint 53), Dumps-Generation

Sep 15 2023

jberkel added a comment to T345176: {Investigation} Different file sizes for dumps.

Weirdly, there seems to be less variation in file sizes for Wikipedia dumps:

Sep 15 2023, 8:48 PM · Wikimedia Enterprise (sprint 53), Dumps-Generation

Aug 24 2023

jberkel added a comment to T305407: Stale data / missing pages in HTML ("enterprise") .

file sizes from the most recent enwikt HTML dumps (NS0):

Aug 24 2023, 11:13 AM · Wikimedia Enterprise, Dumps-Generation

Jul 24 2023

jberkel reopened T305407: Stale data / missing pages in HTML ("enterprise") as "Open".

Hasn't been fixed yet, data is still missing.

Jul 24 2023, 5:54 PM · Wikimedia Enterprise, Dumps-Generation

Jul 21 2023

jberkel added a comment to T305407: Stale data / missing pages in HTML ("enterprise") .

Ok, I hope this can be rolled out quickly, it can't get much worse than the current state

Jul 21 2023, 3:03 PM · Wikimedia Enterprise, Dumps-Generation
jberkel reopened T305407: Stale data / missing pages in HTML ("enterprise") as "Open".

I just checked the latest dumps (2023-07-20), and it's now worse: there are around 2.5 million pages missing from the HTML dump (using the XML dump as a baseline).
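A minimal sketch of the baseline comparison mentioned above, assuming the page titles have already been extracted from both dumps into plain iterables of strings. The extraction step and the function name are hypothetical and not shown here.

```python
from typing import Iterable, List


def missing_from_html(xml_titles: Iterable[str], html_titles: Iterable[str]) -> List[str]:
    """Return titles present in the XML baseline but absent from the HTML dump."""
    html = set(html_titles)
    return sorted(title for title in set(xml_titles) if title not in html)
```

The length of the returned list gives the number of missing pages relative to the XML dump.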

Jul 21 2023, 2:40 PM · Wikimedia Enterprise, Dumps-Generation

Jul 12 2023

jberkel added a comment to T305407: Stale data / missing pages in HTML ("enterprise") .

Why was this already marked as resolved? New dumps haven't even been published yet, so it's impossible to verify.

Jul 12 2023, 8:09 PM · Wikimedia Enterprise, Dumps-Generation

Jun 12 2023

jberkel closed T338770: Concurrent gradle jobs on toolforge as Invalid.

Closing this, maybe it'll be useful for future reference. I haven't added documentation to wikitech, not sure where it should go.

Jun 12 2023, 10:46 AM
jberkel added a comment to T338770: Concurrent gradle jobs on toolforge.

I'll see if I can prebuild the binaries and then just launch the commands without gradle to avoid this issue (so the locks are only held during building, not execution).

Jun 12 2023, 9:37 AM
jberkel added a comment to T338770: Concurrent gradle jobs on toolforge.

Have we tried declaring a different gradle home for each job?

https://github.com/gradle/gradle/issues/8750#issuecomment-605016788

Jun 12 2023, 9:24 AM
jberkel created T338770: Concurrent gradle jobs on toolforge.
Jun 12 2023, 7:30 AM

Jun 10 2023

jberkel added a comment to T305407: Stale data / missing pages in HTML ("enterprise") .

There are ~150 entries missing from the HTML dump (compared to 2200 earlier):

Jun 10 2023, 11:53 PM · Wikimedia Enterprise, Dumps-Generation
jberkel added a comment to T305407: Stale data / missing pages in HTML ("enterprise") .

It looks like the situation has improved with the latest dump (20230601, enwikt):

Jun 10 2023, 11:24 PM · Wikimedia Enterprise, Dumps-Generation

Jun 9 2023

jberkel added a comment to T335761: Missing Enterprise Dumps in 2023-04-20, 2023-05-01 and 2023-05-20 runs.

looks like the files have finally been synced to toolforge!

Jun 9 2023, 9:45 AM · Wikimedia Enterprise, Dumps-Generation

Jun 7 2023

jberkel added a comment to T335761: Missing Enterprise Dumps in 2023-04-20, 2023-05-01 and 2023-05-20 runs.

still in progress?

Yes please :-) The rsync is still in progress!

Jun 7 2023, 8:36 AM · Wikimedia Enterprise, Dumps-Generation

Jun 5 2023

jberkel added a comment to T335761: Missing Enterprise Dumps in 2023-04-20, 2023-05-01 and 2023-05-20 runs.

The rsync, which copies the files over to the NFS share accessible to toolforge, is still in progress.

Jun 5 2023, 12:42 PM · Wikimedia Enterprise, Dumps-Generation

Jun 2 2023

jberkel added a comment to T335761: Missing Enterprise Dumps in 2023-04-20, 2023-05-01 and 2023-05-20 runs.

Looks like the data was copied successfully this time! I've downloaded the enwiktionary-NS0 dump and the checksum matches.

Jun 2 2023, 7:10 AM · Wikimedia Enterprise, Dumps-Generation

May 29 2023

jberkel added a comment to T335761: Missing Enterprise Dumps in 2023-04-20, 2023-05-01 and 2023-05-20 runs.

Looking into potential fixes and trying to figure out the best way to handle this.

May 29 2023, 3:42 PM · Wikimedia Enterprise, Dumps-Generation
jberkel added a comment to T335761: Missing Enterprise Dumps in 2023-04-20, 2023-05-01 and 2023-05-20 runs.

It might be the case that we are just serving the checksum of the previous dump.
Meaning: we are grabbing the checksum before the upload has finished.
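One way to detect this kind of mismatch on the consumer side is to hash the downloaded file and compare it with the published value. The sketch below is only illustrative: the hash algorithm used for the dumps is not stated here, so it is left as a parameter, and both function names are hypothetical.

```python
import hashlib


def file_digest(path: str, algorithm: str = "sha256", block_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size blocks so large dumps need not fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def checksum_matches(path: str, expected_hex: str, algorithm: str = "sha256") -> bool:
    """Compare the computed digest with the published one (case-insensitive)."""
    return file_digest(path, algorithm) == expected_hex.strip().lower()
```

If the published checksum was taken before the upload finished, this comparison fails even though the downloaded file itself may be intact.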

May 29 2023, 12:43 PM · Wikimedia Enterprise, Dumps-Generation
jberkel added a comment to T335761: Missing Enterprise Dumps in 2023-04-20, 2023-05-01 and 2023-05-20 runs.

@ArielGlenn if the API side isn't fixed by the June run, would it be possible to ignore the checksums and copy the files regardless? We've been dump-less for 2 months now…

May 29 2023, 9:43 AM · Wikimedia Enterprise, Dumps-Generation

May 25 2023

jberkel added a comment to T335761: Missing Enterprise Dumps in 2023-04-20, 2023-05-01 and 2023-05-20 runs.

@Protsack.stephan Where are the checksums calculated? Can you re-index the metadata of the dump files on the API side so that they match the actual file content? It looks like they might get calculated before the file is fully processed, or they are calculated from a different version of the file (as you indicated in your comment)?

May 25 2023, 9:36 AM · Wikimedia Enterprise, Dumps-Generation
jberkel added a comment to T335761: Missing Enterprise Dumps in 2023-04-20, 2023-05-01 and 2023-05-20 runs.

@ArielGlenn Is the downloaded data usable, that is, can you decompress the files without error? If the files are OK, maybe it's a problem with the checksum generation: if the checksums are off only for some files, it could be related to the file size. Perhaps some sort of overflow where the hashes are calculated?

May 25 2023, 6:18 AM · Wikimedia Enterprise, Dumps-Generation

May 22 2023

jberkel updated the task description for T335761: Missing Enterprise Dumps in 2023-04-20, 2023-05-01 and 2023-05-20 runs.
May 22 2023, 7:07 AM · Wikimedia Enterprise, Dumps-Generation
jberkel renamed T335761: Missing Enterprise Dumps in 2023-04-20, 2023-05-01 and 2023-05-20 runs from Missing Enterprise Dumps in 2023-04-20 and 2023-05-01 runs to Missing Enterprise Dumps in 2023-04-20, 2023-05-01 and 2023-05-20 runs.
May 22 2023, 7:06 AM · Wikimedia Enterprise, Dumps-Generation

May 17 2023

jberkel added a comment to T335761: Missing Enterprise Dumps in 2023-04-20, 2023-05-01 and 2023-05-20 runs.

Another question: where are the enterprise dumps stored on toolforge now? They seem to have stopped updating in October last year.

$ ls /public/dumps/public/other/enterprise_html/runs/
20220720  20220801  20220820  20220901  20220920  20221001

The rsync job meant to update the dumps after files have been downloaded on the primary host has not been running since last year. It was recently fixed and we expect the data on clouddumps1002 to be updated on the next run.

May 17 2023, 2:46 PM · Wikimedia Enterprise, Dumps-Generation

May 16 2023

jberkel added a comment to T335761: Missing Enterprise Dumps in 2023-04-20, 2023-05-01 and 2023-05-20 runs.

Another question: where are the enterprise dumps stored on toolforge now? They seem to have stopped updating in October last year.

May 16 2023, 6:45 PM · Wikimedia Enterprise, Dumps-Generation
jberkel added a comment to T320343: Include "make" in all images.

Thanks for moving this one forward!

May 16 2023, 10:09 AM · Toolforge (Software install/update)

May 11 2023

jberkel added a comment to T331765: Outdated page / corrupt data in enwiki-NS0-20230220-ENTERPRISE-HTML.json.tar.gz.

Perhaps the same underlying issue as T305407.

May 11 2023, 3:26 PM · Wikimedia Enterprise Volunteer Request, Wikimedia Enterprise

May 8 2023

jberkel added a comment to T335761: Missing Enterprise Dumps in 2023-04-20, 2023-05-01 and 2023-05-20 runs.

The files haven't materialized, guess something is still amiss…

May 8 2023, 7:56 AM · Wikimedia Enterprise, Dumps-Generation

May 4 2023

jberkel added a comment to T335761: Missing Enterprise Dumps in 2023-04-20, 2023-05-01 and 2023-05-20 runs.

Yes that's what I meant, thanks 🤞

May 4 2023, 12:53 PM · Wikimedia Enterprise, Dumps-Generation
jberkel added a comment to T335761: Missing Enterprise Dumps in 2023-04-20, 2023-05-01 and 2023-05-20 runs.

Ok, so the files have been generated, but not copied? Can they be recovered?

May 4 2023, 6:15 AM · Wikimedia Enterprise, Dumps-Generation

May 2 2023

jberkel added a comment to T335761: Missing Enterprise Dumps in 2023-04-20, 2023-05-01 and 2023-05-20 runs.

Thanks! Is there any way to check the HTML dump progress/state "from the outside"? The XML dumps have a status page and the machine-readable dumpstatus.json.

May 2 2023, 1:15 PM · Wikimedia Enterprise, Dumps-Generation
jberkel created T335761: Missing Enterprise Dumps in 2023-04-20, 2023-05-01 and 2023-05-20 runs.
May 2 2023, 11:07 AM · Wikimedia Enterprise, Dumps-Generation

Apr 11 2023

jberkel added a comment to T303652: Include more namespaces in Wiktionary HTML dumps.

Related to T318371

Apr 11 2023, 6:35 PM · Dumps-Generation, Wikimedia Enterprise

Mar 24 2023

jberkel added a comment to T305407: Stale data / missing pages in HTML ("enterprise") .

Ok, let me know once you have dumps available with the new infra and I'll re-generate them.

Mar 24 2023, 12:35 PM · Wikimedia Enterprise, Dumps-Generation
jberkel added a comment to T303652: Include more namespaces in Wiktionary HTML dumps.

On the English Wiktionary we now use HTML dumps to generate our stats. Some of our content is not in the mainspace and therefore not reflected in the statistics. There are also problems generating information related to proto-languages, these live in the Reconstruction: namespace.

Mar 24 2023, 12:45 AM · Dumps-Generation, Wikimedia Enterprise
jberkel added a comment to T305407: Stale data / missing pages in HTML ("enterprise") .

Thanks, are you referring to the deprecation of restbase/MCS? On the English Wiktionary, we're relying more and more on these dumps for statistics and maintenance tasks, and many editors have noticed problems with data derived from these dumps.

Mar 24 2023, 12:25 AM · Wikimedia Enterprise, Dumps-Generation

Mar 22 2023

jberkel added a comment to T305407: Stale data / missing pages in HTML ("enterprise") .

Another cache-failure ticket, though probably unrelated: T226931

Mar 22 2023, 9:58 PM · Wikimedia Enterprise, Dumps-Generation

Mar 16 2023

jberkel added a comment to T331906: Add Lua function to read out previous section heading.

Looks like T122934 is relevant and would help with this. Unfortunately, there's been no movement on that task recently.

Mar 16 2023, 12:10 PM · Scribunto, All-and-every-Wiktionary

Dec 5 2022

jberkel added a comment to T320343: Include "make" in all images.

It works when adding -t latest.

Dec 5 2022, 11:29 AM · Toolforge (Software install/update)

Dec 4 2022

jberkel added a comment to T320343: Include "make" in all images.

I've been looking at submitting a patch for this myself, but while building the docker images from https://gerrit.wikimedia.org/g/operations/docker-images/toollabs-images
I get the following error:

Dec 4 2022, 1:41 PM · Toolforge (Software install/update)

Nov 9 2022

jberkel updated the task description for T322725: Allow selection of the page title in 2017 Wikitext Editor on Vector 2022.
Nov 9 2022, 4:55 PM · Verified, MW-1.40-notes (1.40.0-wmf.12; 2022-11-28), Editing-team (Kanban Board), VisualEditor
jberkel added a comment to T322725: Allow selection of the page title in 2017 Wikitext Editor on Vector 2022.

I have disabled all gadgets and beta features (except "Visual Editing" and "New wikitext mode"), still the same result.
I've also tried it with Safari (see screenshot).

Nov 9 2022, 4:47 PM · Verified, MW-1.40-notes (1.40.0-wmf.12; 2022-11-28), Editing-team (Kanban Board), VisualEditor
jberkel updated the task description for T322725: Allow selection of the page title in 2017 Wikitext Editor on Vector 2022.
Nov 9 2022, 9:20 AM · Verified, MW-1.40-notes (1.40.0-wmf.12; 2022-11-28), Editing-team (Kanban Board), VisualEditor
jberkel created T322725: Allow selection of the page title in 2017 Wikitext Editor on Vector 2022.
Nov 9 2022, 9:20 AM · Verified, MW-1.40-notes (1.40.0-wmf.12; 2022-11-28), Editing-team (Kanban Board), VisualEditor

Oct 10 2022

jberkel added a comment to T315276: Latest English Wikipedia Wikimedia Enterprise HTML dumps do not seem to be updated.

The stats now have a correct timestamp, but there's still missing data. Can you please fix this? With this unpredictable mix of old and new data they're useless for most purposes right now, might as well not generate them at all.

Oct 10 2022, 11:33 AM · Dumps-Generation, Wikimedia Enterprise Engineering, Wikimedia Enterprise

Oct 9 2022

jberkel updated the task description for T320343: Include "make" in all images.
Oct 9 2022, 10:16 AM · Toolforge (Software install/update)
jberkel created T320343: Include "make" in all images.
Oct 9 2022, 10:16 AM · Toolforge (Software install/update)
jberkel closed T319269: Missing Enterprise Dumps from 2022-10-01 run as Resolved.
Oct 9 2022, 10:04 AM · Dumps-Generation

Oct 7 2022

jberkel added a comment to T319269: Missing Enterprise Dumps from 2022-10-01 run.

Hmm, dumps are still not available…

Oct 7 2022, 10:14 AM · Dumps-Generation