SRE (Group)
Active · Public

Recent Activity

Today

JMeybohm updated the task description for T418925: Q3:rack/setup/install wikikube-worker23[57-74].
Wed, Mar 4, 9:55 AM · ServiceOps-Upgrades-Hardware, ServiceOps new, SRE, ops-eqiad, DC-Ops
gerritbot added a project to T417035: Create a cookbook to execute Kafka rolling upgrades: Patch-For-Review.
Wed, Mar 4, 9:51 AM · Patch-For-Review, Infrastructure-Foundations, SRE
gerritbot added a comment to T417035: Create a cookbook to execute Kafka rolling upgrades.

Change #1247942 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/cookbooks@master] WIP: add sre.kafka.change-confluent-distro-version

https://gerrit.wikimedia.org/r/1247942

Wed, Mar 4, 9:51 AM · Patch-For-Review, Infrastructure-Foundations, SRE
ops-monitoring-bot added a comment to T415002: Unusually high disk errors on the an-worker nodes since upgrading the disks.

Host an-worker1200.eqiad.wmnet rebooted by btullis@cumin1003 with reason: Rebooting to pick up new server profile

Wed, Mar 4, 9:51 AM · Data-Platform-SRE (2026-02-13 - 2026-03-06), SRE, DC-Ops, ops-eqiad
gerritbot added a comment to T418903: Q3:rack/setup/install ganeti105[56].

Change #1247908 merged by Muehlenhoff:

[operations/puppet@production] Add ganeti1055/1056/1057/1058 to site.pp

https://gerrit.wikimedia.org/r/1247908

Wed, Mar 4, 9:47 AM · Patch-For-Review, Infrastructure-Foundations, SRE, ops-eqiad, DC-Ops
MatthewVernon added a comment to T418772: Eqiad: lsw1-d7-eqiad BGP maintenance.

Is this maintenance happening at 15:00 UTC today?

Wed, Mar 4, 9:36 AM · Prod-Kubernetes, ServiceOps new, netops, Infrastructure-Foundations, SRE
MatthewVernon updated the task description for T418772: Eqiad: lsw1-d7-eqiad BGP maintenance.
Wed, Mar 4, 9:34 AM · Prod-Kubernetes, ServiceOps new, netops, Infrastructure-Foundations, SRE
Maintenance_bot removed a project from T418911: Q4:rack/setup/install 4 new db hosts in codfw: Patch-For-Review.
Wed, Mar 4, 9:31 AM · Data-Persistence, SRE, ops-codfw, DC-Ops
gerritbot added a project to T418902: Q3:rack/setup/install apus-be200[56]: Patch-For-Review.
Wed, Mar 4, 9:30 AM · Patch-For-Review, SRE-swift-storage, SRE, Data-Persistence, ops-codfw, DC-Ops
gerritbot added a comment to T418902: Q3:rack/setup/install apus-be200[56].

Change #1247937 had a related patch set uploaded (by MVernon; author: MVernon):

[operations/puppet@production] preseed: all apus-be nodes are using boss cards

https://gerrit.wikimedia.org/r/1247937

Wed, Mar 4, 9:29 AM · Patch-For-Review, SRE-swift-storage, SRE, Data-Persistence, ops-codfw, DC-Ops
gerritbot added a project to T418901: Q3:rack/setup/install apus-be100[56]: Patch-For-Review.
Wed, Mar 4, 9:29 AM · Patch-For-Review, SRE-swift-storage, SRE, ops-eqiad, Data-Persistence, DC-Ops
gerritbot added a comment to T418901: Q3:rack/setup/install apus-be100[56].

Change #1247937 had a related patch set uploaded (by MVernon; author: MVernon):

[operations/puppet@production] preseed: all apus-be nodes are using boss cards

https://gerrit.wikimedia.org/r/1247937

Wed, Mar 4, 9:29 AM · Patch-For-Review, SRE-swift-storage, SRE, ops-eqiad, Data-Persistence, DC-Ops
Marostegui updated the task description for T418911: Q4:rack/setup/install 4 new db hosts in codfw.
Wed, Mar 4, 9:23 AM · Data-Persistence, SRE, ops-codfw, DC-Ops
Marostegui placed T418911: Q4:rack/setup/install 4 new db hosts in codfw up for grabs.

Patches ready

Wed, Mar 4, 9:22 AM · Data-Persistence, SRE, ops-codfw, DC-Ops
gerritbot added a comment to T418911: Q4:rack/setup/install 4 new db hosts in codfw.

Change #1247934 merged by Marostegui:

[operations/puppet@production] site.pp: Add db225[0-3]

https://gerrit.wikimedia.org/r/1247934

Wed, Mar 4, 9:21 AM · Data-Persistence, SRE, ops-codfw, DC-Ops
gerritbot added a comment to T418911: Q4:rack/setup/install 4 new db hosts in codfw.

Change #1247934 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] site.pp: Add db225[0-3]

https://gerrit.wikimedia.org/r/1247934

Wed, Mar 4, 9:20 AM · Data-Persistence, SRE, ops-codfw, DC-Ops
cmooney added a comment to T411054: Nokia SR-Linux DHCP Relay Bug.

@ayounsi thanks for following up on this. I've done some testing to see if there may be a better way to force a tunnel teardown/re-establishment today.

Wed, Mar 4, 9:20 AM · netops, Infrastructure-Foundations, SRE
MatthewVernon closed T413089: FY2526 Q3:rack/setup/install ms-be109[67] as Resolved.

Yes, they look good now, thank you!

Wed, Mar 4, 9:19 AM · Patch-For-Review, SRE, SRE-swift-storage, ops-eqiad, DC-Ops
gerritbot added a project to T413089: FY2526 Q3:rack/setup/install ms-be109[67]: Patch-For-Review.
Wed, Mar 4, 9:17 AM · Patch-For-Review, SRE, SRE-swift-storage, ops-eqiad, DC-Ops
gerritbot added a comment to T413089: FY2526 Q3:rack/setup/install ms-be109[67].

Change #1247932 had a related patch set uploaded (by MVernon; author: MVernon):

[operations/puppet@production] swift: add 2 new storage nodes ms-be109{6,7}

https://gerrit.wikimedia.org/r/1247932

Wed, Mar 4, 9:16 AM · Patch-For-Review, SRE, SRE-swift-storage, ops-eqiad, DC-Ops
gerritbot added a comment to T418911: Q4:rack/setup/install 4 new db hosts in codfw.

Change #1247928 merged by Marostegui:

[operations/puppet@production] installserver: Install db225[0-3]

https://gerrit.wikimedia.org/r/1247928

Wed, Mar 4, 9:11 AM · Data-Persistence, SRE, ops-codfw, DC-Ops
gerritbot added a comment to T418911: Q4:rack/setup/install 4 new db hosts in codfw.

Change #1247928 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] installserver: Install db225[0-3]

https://gerrit.wikimedia.org/r/1247928

Wed, Mar 4, 9:08 AM · Data-Persistence, SRE, ops-codfw, DC-Ops
Stashbot added a comment to T411054: Nokia SR-Linux DHCP Relay Bug.

Mentioned in SAL (#wikimedia-operations) [2026-03-04T08:49:16Z] <topranks> disabling IBGP session between ssw1-d1-eqiad and ssw1-d8-eqiad to remove backup paths try #2 T411054

Wed, Mar 4, 8:49 AM · netops, Infrastructure-Foundations, SRE
Diskdance updated the task description for T205378: Support Encrypted Client Hello (ECH) on Wikimedia servers.
Wed, Mar 4, 8:47 AM · Traffic, Traffic-Icebox, Upstream, HTTPS, SRE
Diskdance renamed T205378: Support Encrypted Client Hello (ECH) on Wikimedia servers from Support ECH on Wikimedia servers to Support Encrypted Client Hello (ECH) on Wikimedia servers.
Wed, Mar 4, 8:46 AM · Traffic, Traffic-Icebox, Upstream, HTTPS, SRE
Jelto closed T418415: Requesting access to analytics-privatedata-users for Dani Totten as Resolved.

Thank you for the key. You should have access now. I also created a kerberos principal because Jupyter + Hive/Spark was mentioned.

Wed, Mar 4, 8:45 AM · Patch-For-Review, SRE, SRE-Access-Requests
Diskdance added a comment to T205378: Support Encrypted Client Hello (ECH) on Wikimedia servers.

FYI, the ECH standard has been stabilized as RFC9848: https://www.rfc-editor.org/info/rfc9848.

Wed, Mar 4, 8:40 AM · Traffic, Traffic-Icebox, Upstream, HTTPS, SRE
jcrespo updated the task description for T418772: Eqiad: lsw1-d7-eqiad BGP maintenance.
Wed, Mar 4, 8:38 AM · Prod-Kubernetes, ServiceOps new, netops, Infrastructure-Foundations, SRE
cmooney created T418978: cr2-magru <-> asw1-b3-magru link down March 2026.
Wed, Mar 4, 8:37 AM · ops-magru, netops, Infrastructure-Foundations, SRE
ops-monitoring-bot added a comment to T418772: Eqiad: lsw1-d7-eqiad BGP maintenance.

Icinga downtime and Alertmanager silence (ID=cd8c8777-0916-4a5b-b6f5-55f2535990f4) set by jynus@cumin1003 for 1 day, 0:00:00 on 2 host(s) and their services with reason: network maintenance

backup1007.eqiad.wmnet,dbprov1004.eqiad.wmnet
Wed, Mar 4, 8:36 AM · Prod-Kubernetes, ServiceOps new, netops, Infrastructure-Foundations, SRE
gerritbot added a comment to T418911: Q4:rack/setup/install 4 new db hosts in codfw.

Change #1247919 merged by Marostegui:

[operations/puppet@production] mariadb: db225[0-3].yaml

https://gerrit.wikimedia.org/r/1247919

Wed, Mar 4, 8:36 AM · Data-Persistence, SRE, ops-codfw, DC-Ops
gerritbot added a project to T418911: Q4:rack/setup/install 4 new db hosts in codfw: Patch-For-Review.
Wed, Mar 4, 8:35 AM · Data-Persistence, SRE, ops-codfw, DC-Ops
gerritbot added a comment to T418911: Q4:rack/setup/install 4 new db hosts in codfw.

Change #1247919 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] mariadb: db225[0-3].yaml

https://gerrit.wikimedia.org/r/1247919

Wed, Mar 4, 8:35 AM · Data-Persistence, SRE, ops-codfw, DC-Ops
Marostegui updated the task description for T418911: Q4:rack/setup/install 4 new db hosts in codfw.
Wed, Mar 4, 8:32 AM · Data-Persistence, SRE, ops-codfw, DC-Ops
Maintenance_bot removed a project from T418201: wikimedia-l was signed up for a developer account: Patch-For-Review.
Wed, Mar 4, 8:31 AM · SRE, Bitu, Infrastructure-Foundations
Maintenance_bot removed a project from T418908: Q4:rack/setup/install pc102[1-4]: Patch-For-Review.
Wed, Mar 4, 8:30 AM · SRE, Data-Persistence, ops-eqiad, DC-Ops
jcrespo updated the task description for T418772: Eqiad: lsw1-d7-eqiad BGP maintenance.
Wed, Mar 4, 8:26 AM · Prod-Kubernetes, ServiceOps new, netops, Infrastructure-Foundations, SRE
jcrespo added a comment to T418772: Eqiad: lsw1-d7-eqiad BGP maintenance.

@Papaul for backup1007 and dbprov1004: while they are production hosts with important content, a small network interruption will not cause any issue. Just give us a heads up if the window gets larger. Let me downtime them for a day and update the ticket.

Wed, Mar 4, 8:25 AM · Prod-Kubernetes, ServiceOps new, netops, Infrastructure-Foundations, SRE
Marostegui updated the task description for T418908: Q4:rack/setup/install pc102[1-4].
Wed, Mar 4, 8:21 AM · SRE, Data-Persistence, ops-eqiad, DC-Ops
Marostegui placed T418908: Q4:rack/setup/install pc102[1-4] up for grabs.

Patches are ready

Wed, Mar 4, 8:20 AM · SRE, Data-Persistence, ops-eqiad, DC-Ops
gerritbot added a comment to T418908: Q4:rack/setup/install pc102[1-4].

Change #1247911 merged by Marostegui:

[operations/puppet@production] site.pp: Add pc102[1-4]

https://gerrit.wikimedia.org/r/1247911

Wed, Mar 4, 8:19 AM · SRE, Data-Persistence, ops-eqiad, DC-Ops
gerritbot added a comment to T418908: Q4:rack/setup/install pc102[1-4].

Change #1247911 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] site.pp: Add pc102[1-4]

https://gerrit.wikimedia.org/r/1247911

Wed, Mar 4, 8:17 AM · SRE, Data-Persistence, ops-eqiad, DC-Ops
gerritbot added a comment to T418201: wikimedia-l was signed up for a developer account.

Change #1247584 merged by Slyngshede:

[operations/puppet@production] P:idm disallow signups from select domains

https://gerrit.wikimedia.org/r/1247584

Wed, Mar 4, 8:10 AM · SRE, Bitu, Infrastructure-Foundations
gerritbot added a project to T418903: Q3:rack/setup/install ganeti105[56]: Patch-For-Review.
Wed, Mar 4, 8:07 AM · Patch-For-Review, Infrastructure-Foundations, SRE, ops-eqiad, DC-Ops
gerritbot added a comment to T418903: Q3:rack/setup/install ganeti105[56].

Change #1247908 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] Add ganeti1055/1056/1057/1058 to site.pp

https://gerrit.wikimedia.org/r/1247908

Wed, Mar 4, 8:06 AM · Patch-For-Review, Infrastructure-Foundations, SRE, ops-eqiad, DC-Ops
MoritzMuehlenhoff added a comment to T418903: Q3:rack/setup/install ganeti105[56].

Also, I changed the names: ganeti1053/1054 were already added last year in https://phabricator.wikimedia.org/T401691.

Wed, Mar 4, 8:05 AM · Patch-For-Review, Infrastructure-Foundations, SRE, ops-eqiad, DC-Ops
MoritzMuehlenhoff renamed T418903: Q3:rack/setup/install ganeti105[56] from Q3:rack/setup/install ganeti105[34] to Q3:rack/setup/install ganeti105[56].
Wed, Mar 4, 8:04 AM · Patch-For-Review, Infrastructure-Foundations, SRE, ops-eqiad, DC-Ops
gerritbot added a comment to T418908: Q4:rack/setup/install pc102[1-4].

Change #1247907 merged by Marostegui:

[operations/puppet@production] installserver: Install pc102[1-4]

https://gerrit.wikimedia.org/r/1247907

Wed, Mar 4, 8:02 AM · SRE, Data-Persistence, ops-eqiad, DC-Ops
MoritzMuehlenhoff updated subscribers of T418903: Q3:rack/setup/install ganeti105[56].

@RobH Why did you create a racking task for only two servers? Are these shipped out in batches, so that we only get two initially? The order is for four servers. I'll go ahead and create the site.pp/preseed config for all four of them already anyway.

Wed, Mar 4, 8:00 AM · Patch-For-Review, Infrastructure-Foundations, SRE, ops-eqiad, DC-Ops
gerritbot added a project to T418908: Q4:rack/setup/install pc102[1-4]: Patch-For-Review.
Wed, Mar 4, 7:59 AM · SRE, Data-Persistence, ops-eqiad, DC-Ops