
Run data migration script for file migration
Open, Medium, Public

Description

It must be done after the schema is on write-both, and after T183490: MCR schema migration stage 4: Migrate External Store URLs (wmf production) is done in that section.

Progress:

  • beta cluster
  • s1
  • s2
  • s3
  • s4
  • s5
  • s6
  • s7
  • s8: Not needed (Wikidata doesn't have local uploads enabled; the tables are empty)

Reminder: Make sure renamed files are taken care of.

Event Timeline

Ladsgroup triaged this task as Medium priority. Jan 30 2025, 1:12 PM
Ladsgroup moved this task from Triage to In progress on the DBA board.

On commonswiki in the beta cluster, it fails with this:

InvalidArgumentException from line 284 of /srv/mediawiki-staging/php-master/includes/user/UserFactory.php: Cannot create a user with no name, no ID, and no actor ID
#0 /srv/mediawiki-staging/php-master/includes/filerepo/file/LocalFile.php(637): MediaWiki\User\UserFactory->newFromAnyId(NULL, NULL, 0)
#1 /srv/mediawiki-staging/php-master/includes/filerepo/file/LocalFile.php(247): LocalFile->loadFromRow(Object(stdClass))
#2 [internal function]: LocalFile::newFromRow(Object(stdClass), Object(LocalRepo))
#3 /srv/mediawiki-staging/php-master/includes/filerepo/LocalRepo.php(125): call_user_func(Array, Object(stdClass), Object(LocalRepo))
#4 /srv/mediawiki-staging/php-master/maintenance/migrateFileTables.php(154): LocalRepo->newFileFromRow(Object(stdClass))
#5 /srv/mediawiki-staging/php-master/maintenance/migrateFileTables.php(124): MigrateFileTables->handleFile(Object(stdClass))
#6 /srv/mediawiki-staging/php-master/maintenance/includes/MaintenanceRunner.php(695): MigrateFileTables->execute()
#7 /srv/mediawiki-staging/php-master/maintenance/run.php(51): MediaWiki\Maintenance\MaintenanceRunner->run()
#8 /srv/mediawiki-staging/multiversion/MWScript.php(156): require_once('/srv/mediawiki-...')
#9 {main}
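
The newFromAnyId(NULL, NULL, 0) call in the trace suggests image rows whose uploader fields are empty. A minimal sketch of a query to spot such rows, assuming the cause is a missing or zero img_actor (my guess from the trace, not something confirmed here):

-- hypothetical check for image rows with no usable actor reference,
-- which would make UserFactory::newFromAnyId() throw in loadFromRow()
select img_name, img_actor, img_timestamp
from image
where img_actor is null or img_actor = 0
limit 20;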

I put a warning on https://www.mediawiki.org/wiki/Manual:File_table linking to this ticket. Please remove when done.

Thanks! Will do.

Running s4 now:

K8S_CLUSTER=eqiad KUBECONFIG=/etc/kubernetes/mw-script-eqiad.config kubectl logs -f job/mw-script.eqiad.g7lu3tv6 mediawiki-g7lu3tv6-app

Running enwiki now:

K8S_CLUSTER=eqiad KUBECONFIG=/etc/kubernetes/mw-script-eqiad.config kubectl logs -f job/mw-script.eqiad.4rnrneyx mediawiki-4rnrneyx-app

S4 seems to be done:

mysql:research@s4-analytics-replica.eqiad.wmnet [commonswiki]> select count(*) from file;
+-----------+
| count(*)  |
+-----------+
| 118515397 |
+-----------+
1 row in set (49.192 sec)

mysql:research@s4-analytics-replica.eqiad.wmnet [commonswiki]> select count(*) from image;
+-----------+
| count(*)  |
+-----------+
| 118427533 |
+-----------+
1 row in set (6 min 9.027 sec)

The difference in the numbers is expected: the file table also keeps deleted files (the ones deleted after write-both was turned on). Note how much faster the query is on the new schema.
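
For an apples-to-apples comparison, the deleted rows could be excluded on the file side; a sketch, assuming the file_deleted flag mentioned further down marks them:

-- hypothetical comparison counting only non-deleted rows in the new table;
-- this should land close to the image count
select count(*) from file where file_deleted = 0;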

I'm doing a re-run to take care of file renames that happened during the run:

K8S_CLUSTER=eqiad KUBECONFIG=/etc/kubernetes/mw-script-eqiad.config kubectl logs -f job/mw-script.eqiad.wxg18ige mediawiki-wxg18ige-app

This is basically done; the only thing left is to make sure no data has been lost.
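
A sketch of the kind of consistency check meant here, assuming the new table stores the title in file_name and flags deleted rows with file_deleted (column names are my assumption, not quoted from the script):

-- titles in image that never made it into file (should be empty after migration)
select img_name
from image
left join file on file_name = img_name
where file_name is null;

-- non-deleted rows in file with no matching image row (candidates for stale data)
select file_name
from file
left join image on img_name = file_name
where file_deleted = 0 and img_name is null;

On commonswiki these joins would be far too heavy to run in one go, so in practice they would be chunked by name range, but they show the idea.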

So, can I start using the new file table?

I finished the data migration a while ago, but I haven't gotten around to running a consistency check to make sure everything was migrated correctly. There might be some missing files in the new schema, but for most purposes you should be fine using it, unless your use case is critical and the data has to be correct with a high degree of certainty.

Does that answer your question?

I think I should wait until the data consistency is confirmed.

Mentioned in SAL (#wikimedia-operations) [2025-06-18T17:52:07Z] <ladsgroup@cumin1002> dbctl commit (dc=all): 'Depool db2155 for queries (T385167)', diff saved to https://phabricator.wikimedia.org/P78379 and previous config saved to /var/cache/conftool/dbconfig/20250618-175206-ladsgroup.json

Mentioned in SAL (#wikimedia-operations) [2025-06-18T17:54:11Z] <ladsgroup@cumin1002> DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Running queries (T385167)

I ran the check, and there was only one file that was in the image table but not in the file table:

ladsgroup@deploy1003:~$ mwscript-k8s --follow  -- migrateFileTables --wiki=commonswiki --start "Reception_of_Grand_Duke_Alexander_Nikolayevich_by_Prince_Metternich_in_the_Vienna_Hofburg_in_1829,_by_Ferdinand_Georg_Waldmüller.jpa" --end "Reception_of_Grand_Duke_Alexander_Nikolayevich_by_Prince_Metternich_in_the_Vienna_Hofburg_in_1829,_by_Ferdinand_Georg_Waldmüller.jpz"
⏳ Starting migrateFileTables on Kubernetes as job mw-script.eqiad.70jbm9wd ...
⏳ Waiting for the container to start...
🚀 Job is running.
📜 Streaming logs:
Processing next 1 row(s) starting with Reception_of_Grand_Duke_Alexander_Nikolayevich_by_Prince_Metternich_in_the_Vienna_Hofburg_in_1829,_by_Ferdinand_Georg_Waldmüller.jpg.
Migrated File:Reception_of_Grand_Duke_Alexander_Nikolayevich_by_Prince_Metternich_in_the_Vienna_Hofburg_in_1829,_by_Ferdinand_Georg_Waldmüller.jpg. Inserted 2 rows.

Finished migration for 1 files. 2 rows have been inserted into filerevision table.
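
A quick way to confirm the re-run took effect would be to look the title up in the new table; a sketch, assuming the same file_name/file_deleted columns as in the checks above:

-- hypothetical verification that the re-migrated title now exists and is not flagged deleted
select file_name, file_deleted
from file
where file_name = 'Reception_of_Grand_Duke_Alexander_Nikolayevich_by_Prince_Metternich_in_the_Vienna_Hofburg_in_1829,_by_Ferdinand_Georg_Waldmüller.jpg';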

On the other hand, there were 168 files that were in the file table with file_deleted set to zero but didn't exist in the image table.

Here are a few of them:

Charles_Dejour,_manager_-_photographie_de_presse_-_Agence_Meurisse_-_btv1b90533153.jpg
Académie_des_Sciences_Morales_et_Politiques_-_réception_de_Son_Eminence_le_Cardinal_Mercier_-_photographie_de_presse_-_Agence_Meurisse_-_btv1b90332868.jpg
Ils_ont_déjà_occupé_la_villa_voisine,_texte_de_Stanislas_Ignacy_Witkiewicz_-_photographies_-_Jean-Marc_Martin_du_Theil_-_btv1b106124537_(050_of_248).jpg

I'll check those.

Other wikis were fine in both directions. Now I need to figure out what to do with the 168 files that are in the file table but not in the image table. They might have been caused by T389586: Wikimedia\Rdbms\DBQueryError: Error 1062: Duplicate entry moving a file; Function: LocalFileMoveBatch::doDBUpdates, but I can't say for sure. I'll debug that later.
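
One way the T389586 theory could be checked (a sketch, not something already done here): look the stale titles up in the move log, since a failed or partial move would leave the old name behind in file while image carries the new one. The title below is one of the examples above; namespace 6 is the File namespace.

-- hypothetical check: was this stale title the source of a page move?
select log_timestamp, log_title, log_params
from logging
where log_type = 'move'
  and log_namespace = 6
  and log_title = 'Charles_Dejour,_manager_-_photographie_de_presse_-_Agence_Meurisse_-_btv1b90533153.jpg';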