media-backups (Component)
Active, Public

Details

Description

Tag for tickets related to WMF processes for backing up and recovering multimedia files from wikis (including Wikimedia Commons), whose files are stored in production on SRE-swift-storage.

media-backups is one of the main components for handling WMF infrastructure backups and recoveries (Data-Persistence-Backup), the others being bacula and database-backups.

The project already produces working backups and is able to recover single files, but it is still under heavy development as of 2022.

Recent Activity

Thu, Nov 27

gerritbot added a comment to T410020: Evaluate garage as a replacement for an S3-compatible replacement for minio.

Change #1211693 merged by Jcrespo:

[operations/puppet@production] Revert^2 "garage: Add a first role and profile"

https://gerrit.wikimedia.org/r/1211693

Thu, Nov 27, 10:03 AM · Patch-For-Review, Data-Persistence, media-backups, Data-Persistence-Backup, SRE
gerritbot added a comment to T410020: Evaluate garage as a replacement for an S3-compatible replacement for minio.

Change #1211693 had a related patch set uploaded (by Jcrespo; author: Jcrespo):

[operations/puppet@production] Revert^2 "garage: Add a first role and profile"

https://gerrit.wikimedia.org/r/1211693

Thu, Nov 27, 9:56 AM · Patch-For-Review, Data-Persistence, media-backups, Data-Persistence-Backup, SRE
gerritbot added a project to T410020: Evaluate garage as a replacement for an S3-compatible replacement for minio: Patch-For-Review.
Thu, Nov 27, 9:39 AM · Patch-For-Review, Data-Persistence, media-backups, Data-Persistence-Backup, SRE
gerritbot added a comment to T410020: Evaluate garage as a replacement for an S3-compatible replacement for minio.

Change #1211693 had a related patch set uploaded (by Jcrespo; author: Jcrespo):

[operations/puppet@production] Revert^2 "garage: Add a first role and profile"

https://gerrit.wikimedia.org/r/1211693

Thu, Nov 27, 9:38 AM · Patch-For-Review, Data-Persistence, media-backups, Data-Persistence-Backup, SRE

Wed, Nov 26

Maintenance_bot removed a project from T410020: Evaluate garage as a replacement for an S3-compatible replacement for minio: Patch-For-Review.
Wed, Nov 26, 3:31 PM · Patch-For-Review, Data-Persistence, media-backups, Data-Persistence-Backup, SRE
gerritbot added a comment to T410020: Evaluate garage as a replacement for an S3-compatible replacement for minio.

Change #1207887 merged by Jcrespo:

[operations/puppet@production] garage: Add a first role and profile

https://gerrit.wikimedia.org/r/1207887

Wed, Nov 26, 2:48 PM · Patch-For-Review, Data-Persistence, media-backups, Data-Persistence-Backup, SRE

Tue, Nov 25

gerritbot added a comment to T410020: Evaluate garage as a replacement for an S3-compatible replacement for minio.

Change #1211160 merged by Jcrespo:

[labs/private@master] garage: Add sample private tokens for non production hosts

https://gerrit.wikimedia.org/r/1211160

Tue, Nov 25, 4:20 PM · Patch-For-Review, Data-Persistence, media-backups, Data-Persistence-Backup, SRE
gerritbot added a comment to T410020: Evaluate garage as a replacement for an S3-compatible replacement for minio.

Change #1211160 had a related patch set uploaded (by Jcrespo; author: Jcrespo):

[labs/private@master] garage: Add sample private tokens for non production hosts

https://gerrit.wikimedia.org/r/1211160

Tue, Nov 25, 4:05 PM · Patch-For-Review, Data-Persistence, media-backups, Data-Persistence-Backup, SRE
jcrespo triaged T410020: Evaluate garage as a replacement for an S3-compatible replacement for minio as High priority.
Tue, Nov 25, 4:05 PM · Patch-For-Review, Data-Persistence, media-backups, Data-Persistence-Backup, SRE

Fri, Nov 21

Maintenance_bot moved T405942: eqiad row C/D Data Persistence host migrations from In progress to Done on the DBA board.
Fri, Nov 21, 3:29 PM · media-backups, DBA, Data-Persistence, SRE, DC-Ops, ops-eqiad
Jclark-ctr closed T405942: eqiad row C/D Data Persistence host migrations as Resolved.

All hosts listed on this task have been migrated.

Fri, Nov 21, 2:32 PM · media-backups, DBA, Data-Persistence, SRE, DC-Ops, ops-eqiad
Ladsgroup added a comment to T405942: eqiad row C/D Data Persistence host migrations.

https://grafana.wikimedia.org/d/35WSHOjVk/application-servers-red-k8s?orgId=1&from=2025-11-21T13:37:29.200Z&to=2025-11-21T14:28:00.361Z&timezone=utc&var-site=$__all&var-deployment=mw-web&var-method=GET&var-code=200&var-handler=php&var-service=mediawiki&refresh=1m&viewPanel=panel-9

Fri, Nov 21, 2:28 PM · media-backups, DBA, Data-Persistence, SRE, DC-Ops, ops-eqiad
Stashbot added a comment to T405942: eqiad row C/D Data Persistence host migrations.

Mentioned in SAL (#wikimedia-operations) [2025-11-21T14:25:01Z] <ladsgroup@cumin1003> dbctl commit (dc=all): 'Repool pc8 (T405942)', diff saved to https://phabricator.wikimedia.org/P85440 and previous config saved to /var/cache/conftool/dbconfig/20251121-142500-ladsgroup.json

Fri, Nov 21, 2:25 PM · media-backups, DBA, Data-Persistence, SRE, DC-Ops, ops-eqiad
Stashbot added a comment to T405942: eqiad row C/D Data Persistence host migrations.

Mentioned in SAL (#wikimedia-operations) [2025-11-21T14:21:00Z] <ladsgroup@cumin1003> dbctl commit (dc=all): 'Depool pc8 (T405942)', diff saved to https://phabricator.wikimedia.org/P85439 and previous config saved to /var/cache/conftool/dbconfig/20251121-142059-ladsgroup.json

Fri, Nov 21, 2:21 PM · media-backups, DBA, Data-Persistence, SRE, DC-Ops, ops-eqiad
Stashbot added a comment to T405942: eqiad row C/D Data Persistence host migrations.

Mentioned in SAL (#wikimedia-operations) [2025-11-21T14:17:47Z] <ladsgroup@cumin1003> dbctl commit (dc=all): 'Repool pc7 (T405942)', diff saved to https://phabricator.wikimedia.org/P85438 and previous config saved to /var/cache/conftool/dbconfig/20251121-141747-ladsgroup.json

Fri, Nov 21, 2:17 PM · media-backups, DBA, Data-Persistence, SRE, DC-Ops, ops-eqiad
Ladsgroup added a comment to T405942: eqiad row C/D Data Persistence host migrations.

Since the depool time was quite short, the latency immediately recovered so we are moving forward to pc7 and pc8 too.

Fri, Nov 21, 2:16 PM · media-backups, DBA, Data-Persistence, SRE, DC-Ops, ops-eqiad
Stashbot added a comment to T405942: eqiad row C/D Data Persistence host migrations.

Mentioned in SAL (#wikimedia-operations) [2025-11-21T14:13:45Z] <ladsgroup@cumin1003> dbctl commit (dc=all): 'Depool pc7 (T405942)', diff saved to https://phabricator.wikimedia.org/P85437 and previous config saved to /var/cache/conftool/dbconfig/20251121-141345-ladsgroup.json

Fri, Nov 21, 2:13 PM · media-backups, DBA, Data-Persistence, SRE, DC-Ops, ops-eqiad
Stashbot added a comment to T405942: eqiad row C/D Data Persistence host migrations.

Mentioned in SAL (#wikimedia-operations) [2025-11-21T14:09:04Z] <ladsgroup@cumin1003> dbctl commit (dc=all): 'Repool pc6 (T405942)', diff saved to https://phabricator.wikimedia.org/P85436 and previous config saved to /var/cache/conftool/dbconfig/20251121-140903-ladsgroup.json

Fri, Nov 21, 2:09 PM · media-backups, DBA, Data-Persistence, SRE, DC-Ops, ops-eqiad
Jclark-ctr added a comment to T405942: eqiad row C/D Data Persistence host migrations.

pc1016 has been moved with @Ladsgroup. Thanks for your help this morning.

Fri, Nov 21, 2:08 PM · media-backups, DBA, Data-Persistence, SRE, DC-Ops, ops-eqiad
Stashbot added a comment to T405942: eqiad row C/D Data Persistence host migrations.

Mentioned in SAL (#wikimedia-operations) [2025-11-21T14:03:28Z] <ladsgroup@cumin1003> dbctl commit (dc=all): 'Depool pc6 (T405942)', diff saved to https://phabricator.wikimedia.org/P85435 and previous config saved to /var/cache/conftool/dbconfig/20251121-140327-ladsgroup.json

Fri, Nov 21, 2:03 PM · media-backups, DBA, Data-Persistence, SRE, DC-Ops, ops-eqiad
gerritbot added a comment to T410020: Evaluate garage as a replacement for an S3-compatible replacement for minio.

Change #1206199 merged by Jcrespo:

[operations/puppet@production] garage: Productionize garage

https://gerrit.wikimedia.org/r/1206199

Fri, Nov 21, 9:27 AM · Patch-For-Review, Data-Persistence, media-backups, Data-Persistence-Backup, SRE

Thu, Nov 20

Marostegui added a comment to T405942: eqiad row C/D Data Persistence host migrations.

@RobH I'm out and not near a keyboard but you have to replace pc1016 with pc6

Thu, Nov 20, 6:10 PM · media-backups, DBA, Data-Persistence, SRE, DC-Ops, ops-eqiad
RobH added a comment to T405942: eqiad row C/D Data Persistence host migrations.

@Ladsgroup had other things going on and wasn't able to do this today but did link me to the directions on how to depool: https://wikitech.wikimedia.org/wiki/MariaDB/Troubleshooting

Thu, Nov 20, 6:05 PM · media-backups, DBA, Data-Persistence, SRE, DC-Ops, ops-eqiad
gerritbot added a comment to T410020: Evaluate garage as a replacement for an S3-compatible replacement for minio.

Change #1207887 had a related patch set uploaded (by Jcrespo; author: Jcrespo):

[operations/puppet@production] garage: Add a first role and profile

https://gerrit.wikimedia.org/r/1207887

Thu, Nov 20, 3:33 PM · Patch-For-Review, Data-Persistence, media-backups, Data-Persistence-Backup, SRE
Marostegui added a comment to T405942: eqiad row C/D Data Persistence host migrations.

@RobH as @Ladsgroup mentions, pc* hosts can only be done one at a time. I am out half of today and Friday as on-call compensation. If @Ladsgroup is around, you could coordinate with him to get those three moved this week (yes, pc1015 was a typo).
Pending hosts could be done:

Thu, Nov 20, 6:10 AM · media-backups, DBA, Data-Persistence, SRE, DC-Ops, ops-eqiad
Ladsgroup added a comment to T405942: eqiad row C/D Data Persistence host migrations.

If you want to, I'll be around Thursday and Friday of this week and I can depool them for you. I can also do the 10G switch too (but that comes later I think?). When he is around, he'll be the responsible person but I can do it to free him a bit from work. Note that PC hosts must be depooled serially (so depool a section, move them, etc., repool, and then depool the next one). Just ping me in IRC tomorrow or the day after and we can get it done.

Thu, Nov 20, 12:42 AM · media-backups, DBA, Data-Persistence, SRE, DC-Ops, ops-eqiad
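
The serial procedure described in the comment above (depool one section, move its host, repool, then continue with the next section) can be sketched as a small loop. This is only an illustration under stated assumptions: `migrate_serially` and the `depool`, `move`, and `repool` callbacks are hypothetical names, not real dbctl or cookbook calls, and the section names are taken from the SAL entries in this feed.

```python
from typing import Callable, Iterable, List


def migrate_serially(
    sections: Iterable[str],
    depool: Callable[[str], None],
    move: Callable[[str], None],
    repool: Callable[[str], None],
) -> None:
    """Handle one section at a time: depool, move, repool, then the next.

    The callbacks are hypothetical placeholders for whatever actually
    performs each step. If any step raises, the loop stops, so a later
    section is never depooled while an earlier one is still out of rotation.
    """
    for section in sections:
        depool(section)
        move(section)
        repool(section)


# Dry run that only records the order of operations.
log: List[str] = []
migrate_serially(
    ["pc6", "pc7", "pc8"],
    depool=lambda s: log.append(f"Depool {s}"),
    move=lambda s: log.append(f"Move {s}"),
    repool=lambda s: log.append(f"Repool {s}"),
)
print("\n".join(log))
```

The dry run records a depool, a move, and a repool for each section before touching the next one, which is the same ordering as the Depool/Repool pairs logged for pc6, pc7, and pc8 on Nov 21.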

Wed, Nov 19

RobH added a comment to T405942: eqiad row C/D Data Persistence host migrations.

Migration Update:
Only 3 Data-Persistence hosts remain for migration: pc101[678].

Wed, Nov 19, 10:32 PM · media-backups, DBA, Data-Persistence, SRE, DC-Ops, ops-eqiad
Bugreporter added a comment to T400567: File not found: /v1/AUTH_mw/wikipedia-commons-local-public ... for 3 files.

As there are likely many more of these cases is there a possibility to scan over all files on Commons to find all files affected?

In a way that's been done while taking backups; the issue is that some files have been deleted on purpose (think illegal stuff), others were never uploaded (e.g. imports from wikivoyage), and others may be lost. A larger audit would be needed to check which are in the third category.

Wed, Nov 19, 5:48 PM · media-backups, SRE-swift-storage, Commons
Marostegui added a comment to T405942: eqiad row C/D Data Persistence host migrations.

Repooled pc4 as Rob confirmed pc1014 has been moved.

Wed, Nov 19, 5:17 PM · media-backups, DBA, Data-Persistence, SRE, DC-Ops, ops-eqiad
Stashbot added a comment to T405942: eqiad row C/D Data Persistence host migrations.

Mentioned in SAL (#wikimedia-operations) [2025-11-19T17:16:23Z] <marostegui@cumin1003> dbctl commit (dc=all): 'Repool pc4 T405942', diff saved to https://phabricator.wikimedia.org/P85395 and previous config saved to /var/cache/conftool/dbconfig/20251119-171622-marostegui.json

Wed, Nov 19, 5:16 PM · media-backups, DBA, Data-Persistence, SRE, DC-Ops, ops-eqiad
Marostegui added a comment to T405942: eqiad row C/D Data Persistence host migrations.

Please ping me before moving pc1014 so I can depool the pc4 cluster from rotation.

Wed, Nov 19, 5:03 PM · media-backups, DBA, Data-Persistence, SRE, DC-Ops, ops-eqiad
RobH added a comment to T405942: eqiad row C/D Data Persistence host migrations.

Please ping me before moving pc1014 so I can depool the pc4 cluster from rotation.

Wed, Nov 19, 3:53 PM · media-backups, DBA, Data-Persistence, SRE, DC-Ops, ops-eqiad
jcrespo added a comment to T405942: eqiad row C/D Data Persistence host migrations.
  • moss-be1002 - no directions provided on moving this, please advise
Wed, Nov 19, 3:48 PM · media-backups, DBA, Data-Persistence, SRE, DC-Ops, ops-eqiad
Ladsgroup added a comment to T405942: eqiad row C/D Data Persistence host migrations.

Please ping me before moving pc1014 so I can depool the pc4 cluster from rotation.

Wed, Nov 19, 3:47 PM · media-backups, DBA, Data-Persistence, SRE, DC-Ops, ops-eqiad
RobH added a comment to T405942: eqiad row C/D Data Persistence host migrations.
  • backup1006, backup1007, ms-backup1002 moved yesterday.
  • db1189 was moved yesterday by accident, sorry about that!
  • The only data persistence hosts left to move are:
    • moss-be1002 - no directions provided on moving this, please advise
    • pc1014 - scheduled to move today
    • pc1016 - not yet scheduled for migration
    • pc1017 - not yet scheduled for migration
    • pc1018
Wed, Nov 19, 3:45 PM · media-backups, DBA, Data-Persistence, SRE, DC-Ops, ops-eqiad
jcrespo added a comment to T405942: eqiad row C/D Data Persistence host migrations.

Based on the spreadsheet, no more interruptions are expected on

Wed, Nov 19, 8:24 AM · media-backups, DBA, Data-Persistence, SRE, DC-Ops, ops-eqiad
Marostegui added a comment to T405942: eqiad row C/D Data Persistence host migrations.

@Jclark-ctr I think we scheduled db1189 for today but it was done yesterday? The spreadsheet marks it as done and also I can see:

[Tue Nov 18 17:39:15 2025] tg3 0000:04:00.0 eno1: Link is down
[Tue Nov 18 17:39:21 2025] tg3 0000:04:00.0 eno1: Link is up at 1000 Mbps, full duplex
Wed, Nov 19, 8:22 AM · media-backups, DBA, Data-Persistence, SRE, DC-Ops, ops-eqiad
Marostegui added a comment to T405942: eqiad row C/D Data Persistence host migrations.

@Jclark-ctr
db1189
pc1014

Wed, Nov 19, 8:15 AM · media-backups, DBA, Data-Persistence, SRE, DC-Ops, ops-eqiad
Marostegui added a comment to T405942: eqiad row C/D Data Persistence host migrations.

@Jclark-ctr
db1189
pc1014

Wed, Nov 19, 7:07 AM · media-backups, DBA, Data-Persistence, SRE, DC-Ops, ops-eqiad
Marostegui closed T410283: Switchover s3 master (db1189 -> db1223), a subtask of T405942: eqiad row C/D Data Persistence host migrations, as Resolved.
Wed, Nov 19, 6:49 AM · media-backups, DBA, Data-Persistence, SRE, DC-Ops, ops-eqiad
Stashbot added a comment to T405942: eqiad row C/D Data Persistence host migrations.

Mentioned in SAL (#wikimedia-operations) [2025-11-19T06:26:34Z] <marostegui@cumin1003> dbctl commit (dc=all): 'Repool ms3 T405942', diff saved to https://phabricator.wikimedia.org/P85373 and previous config saved to /var/cache/conftool/dbconfig/20251119-062634-marostegui.json

Wed, Nov 19, 6:26 AM · media-backups, DBA, Data-Persistence, SRE, DC-Ops, ops-eqiad
Stashbot added a comment to T405942: eqiad row C/D Data Persistence host migrations.

Mentioned in SAL (#wikimedia-operations) [2025-11-19T06:25:10Z] <marostegui@cumin1003> dbctl commit (dc=all): 'Repool ms3 T405942', diff saved to https://phabricator.wikimedia.org/P85372 and previous config saved to /var/cache/conftool/dbconfig/20251119-062509-marostegui.json

Wed, Nov 19, 6:25 AM · media-backups, DBA, Data-Persistence, SRE, DC-Ops, ops-eqiad

Tue, Nov 18

MatthewVernon added a comment to T405942: eqiad row C/D Data Persistence host migrations.

@RobH / @Jclark-ctr as I noted above, moss-be1002 can be done whenever, I'd just like to be told when you're going to do it, please.

Tue, Nov 18, 4:53 PM · media-backups, DBA, Data-Persistence, SRE, DC-Ops, ops-eqiad

jcrespo added a comment to T405942: eqiad row C/D Data Persistence host migrations.

Media backups processing on eqiad is stopped and the following hosts have been downtimed for 24 hours from now:

Nov 18 2025, 10:27 AM · media-backups, DBA, Data-Persistence, SRE, DC-Ops, ops-eqiad
ops-monitoring-bot added a comment to T405942: eqiad row C/D Data Persistence host migrations.

Icinga downtime and Alertmanager silence (ID=8cc3d2b8-5b7e-411e-aa85-6a4983c97ec1) set by jynus@cumin1003 for 1 day, 0:00:00 on 4 host(s) and their services with reason: Network maintenance

backup[1006-1007].eqiad.wmnet,ms-backup[1001-1002].eqiad.wmnet
Nov 18 2025, 10:25 AM · media-backups, DBA, Data-Persistence, SRE, DC-Ops, ops-eqiad
Marostegui added a comment to T405942: eqiad row C/D Data Persistence host migrations.

@Jclark-ctr the following hosts are ready for you to proceed. No special cookbooks or downtime are required:
db1153
db1167
db1121
es1033
pc1013
db1181
db1184
pc1013

Nov 18 2025, 6:34 AM · media-backups, DBA, Data-Persistence, SRE, DC-Ops, ops-eqiad
Stashbot added a comment to T405942: eqiad row C/D Data Persistence host migrations.

Mentioned in SAL (#wikimedia-operations) [2025-11-18T06:30:48Z] <marostegui@cumin1003> dbctl commit (dc=all): 'Depool ms3 T405942', diff saved to https://phabricator.wikimedia.org/P85356 and previous config saved to /var/cache/conftool/dbconfig/20251118-063048-marostegui.json

Nov 18 2025, 6:30 AM · media-backups, DBA, Data-Persistence, SRE, DC-Ops, ops-eqiad
Stashbot added a comment to T405942: eqiad row C/D Data Persistence host migrations.

Mentioned in SAL (#wikimedia-operations) [2025-11-18T06:30:10Z] <marostegui@cumin1003> dbctl commit (dc=all): 'Depool pc1 T405942', diff saved to https://phabricator.wikimedia.org/P85355 and previous config saved to /var/cache/conftool/dbconfig/20251118-063010-marostegui.json

Nov 18 2025, 6:30 AM · media-backups, DBA, Data-Persistence, SRE, DC-Ops, ops-eqiad
Marostegui closed T410282: Switchover s1 master (db1184 -> db1163), a subtask of T405942: eqiad row C/D Data Persistence host migrations, as Resolved.
Nov 18 2025, 6:28 AM · media-backups, DBA, Data-Persistence, SRE, DC-Ops, ops-eqiad

Nov 17 2025

Jclark-ctr added a comment to T405942: eqiad row C/D Data Persistence host migrations.

@jcrespo That works for me. Thanks!

Nov 17 2025, 7:31 PM · media-backups, DBA, Data-Persistence, SRE, DC-Ops, ops-eqiad