User Details
- User Since: Nov 13 2017, 5:57 PM (422 w, 2 d)
- Roles: Disabled
- LDAP User: Imarlier
- MediaWiki User: IMarlier (WMF)
Jan 10 2019
It is a Google group -- I've invited @kchapman and made her a manager of the group.
Dec 14 2018
On a very random note, I wanted to say that I enjoyed this:
Dec 11 2018
Had an extended conversation with @Eevans about this on IRC today. His values are fine with me -- as long as he and Core Platform feel that session storage is fast enough, I'm happy to take the increased latency in exchange for a move toward multi-master.
@Krinkle Opened CRs for coal and navtiming. arc-lamp is missing most of the scaffolding for a Python package (tox.ini, setup.py, etc.), so rather than adding that as part of this ticket, I'd suggest that the tox.ini file be created with a pinned version whenever that scaffolding is added.
Verified that tox.ini pinning works by changing the pinned version and then running tox locally.
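For context, pinning here just means listing exact versions in the tox.ini deps section. A minimal sketch of what that looks like, with purely illustrative package names and versions rather than the real dependency set for these repos:

```
[tox]
envlist = py3

[testenv]
# Illustrative pins only -- not the actual dependency list.
deps =
    requests==2.20.1
    pytest==4.0.2
commands = pytest
```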
Dec 10 2018
@aaron to provide feedback, will assign back once he has.
@aaron is going to see what else can be done to reduce spam, will then assign back to @ArielGlenn
Given that we mix and match pip to manage Python deps (which is good!) and Puppet to install them (which is bad!), I'd suggest using a VERY light hand with this.
Dec 5 2018
@phuedx Any chance that someone had an opportunity to look at this?
Had a conversation with an Apple Web Tech Evangelist; they are aware of this and it's assigned, but no release date is known.
@CCicalese_WMF @Legoktm Is this ready to review? We are a bit unclear on current status.
Same as T205369, which has been resolved.
@mwjames we believe that this is fixed, but we're waiting on confirmation of that. Could you please let us know if this is addressed for you?
Dec 4 2018
@jcrespo Why would we need to deploy MediaWiki in order to repoint when the master is switched? Wouldn't the proxy be responsible for that?
Dec 3 2018
@Smalyshev Guessing this should go back to you for followup?
I've been running this in a tmux session on a few of the wdqs servers (208.80.154.224 is the edge for the text cache cluster):

```
while :; do
  DSTAMP=$(date)
  CW=$(sudo netstat -anet | grep 208.80.154.224 | grep -c CLOSE_WAIT)
  echo "${DSTAMP}: ${CW}"
  sleep 1
done >> ~/close_waits.txt
```
@Smalyshev Yes, it would be slower, but it would also be diagnostic -- if persistent connections are disabled and the errors stop, we can be pretty confident that something about the way they're configured is what's causing this issue.
@Smalyshev Another thought: why not just disable pooling, and have the client close each connection after each request?
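For illustration only -- the updater's HTTP client would need the equivalent change on the Java side, but the difference in Python terms is roughly this (the URL is a placeholder, not the real endpoint):

```python
import requests

URL = "https://example.org/sparql"  # placeholder endpoint, not the actual service

# Pooled: a Session keeps idle keep-alive connections open between requests,
# which is where lingering CLOSE_WAIT sockets can accumulate if the far end closes.
session = requests.Session()
for _ in range(3):
    session.get(URL)

# Unpooled: ask for the connection to be torn down after each response,
# so nothing sits idle between requests.
for _ in range(3):
    requests.get(URL, headers={"Connection": "close"})
```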
@BBlack @ema A couple of questions for you about nginx:
- Do we have nginx configured to handle a specific number of requests on a given worker process/thread, and then shut that down?
- Is it possible for nginx to be restarted (interrupting existing persistent connections) due to config updates or the like, and if so, is there a record of times when that has happened?
Nov 30 2018
Hey, that looks a lot better! Nice work, @daniel !
Banner went live - happens every year
Nov 29 2018
@jcrespo I should have kept my answer simpler: I think it's fine to go ahead and do this. I would suggest changing one key (e.g., '10.64.0.12' becomes 'pc1') and deploying that. Wait 24-48 hours, until the hit rate has basically recovered, then change the second key. Wait another 24-48 hours, and change the last one.
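To spell out the reasoning behind staging the renames: my understanding is that entries are sharded across the pc hosts by hashing against these labels, so each rename sends a chunk of lookups to a different shard, where they miss until re-cached. A toy simulation of that effect, assuming rendezvous-style hashing over the labels (a simplification, not MediaWiki's actual code, and only the first label below is a real one from this ticket):

```python
import hashlib

def pick_server(key, labels):
    # Rendezvous-style hashing: the label with the highest hash(key, label) wins.
    return max(labels, key=lambda label: hashlib.md5(f"{key}/{label}".encode()).hexdigest())

keys = [f"page:{i}" for i in range(20_000)]

before    = ["10.64.0.12", "10.64.32.1", "10.64.48.1"]  # old labels (only the first is real)
one_step  = ["pc1", "10.64.32.1", "10.64.48.1"]         # renaming a single key, as suggested
all_steps = ["pc1", "pc2", "pc3"]                        # renaming everything at once

def fraction_remapped(old, new):
    # Keys whose shard assignment changes will miss until they are re-cached,
    # which is what drags the hit rate down after each rename.
    return sum(pick_server(k, old) != pick_server(k, new) for k in keys) / len(keys)

print(f"one label renamed:  {fraction_remapped(before, one_step):.0%} of keys remap")
print(f"all labels renamed: {fraction_remapped(before, all_steps):.0%} of keys remap")
```

The exact fractions depend on the real hashing, but doing one rename per deploy and letting the hit rate recover in between means the parsers never have to repopulate everything at once.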
