Page MenuHomePhabricator

Echo notification expiration should be time-based, not hardcoded 2000
Open, Needs TriagePublic

Description

Echo notification expiration should be time-based, not a hardcoded limit of 2000 notifications.

This would be better because:

  • The hardcoded limit is vulnerable to data loss caused by notification spam. Even if the spammy events are later hidden (e.g. by revision-deleting relevant revisions), older notifications are still lost. (T227853)
  • The hardcoded limit reveals how many events were hidden that way. For example, on enwiki I currently have 1,993 notifications, which let me know that there are 7 notifications I am no longer allowed to view (due to deleted revisions, deleted pages, or something else). This is a very minor, but probably unwanted information leak.
  • It would let us remove compatibility code for very old notifications. Currently we have to be able to display every notification ever sent, which requires maintaining backwards-compatibility if the data format is ever changed (which means we never change the format, and just leave mistakes in forever, e.g. this), compatibility with corrupted data caused by bugs (e.g. T323236, T367638, T286192, T223741), or running maintenance scripts to convert them (to my knowledge this never happened, but would be necessary for e.g. T325703). For example, on plwiki I have notifications going all the way back to 2013, when the extension was deployed.

Echo code should be able to cope with more than 2000 notifs, it uses key-based pagination and has good indexes.

The only downside would be that notifs of sentimental value would all be eventually lost. If we make this change, we should announce it first.

Details

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald Transcript

In theory, this would open us up to a DoS attack, where someone could trigger thousands of notifications in a minute, flooding the storage. However, creating eg. a revision takes way more rows/space than a single notification, so this may not be a concern? Regardless, it is something that needs to be considered.

Echo's database is much healthier now and if users are asking for a higher value, I think we can bump it to 5K but there should be a limit somewhere both timestamp and number. Once the limit on timstamp is set, we can bump the number to a higher value but I don't think we can fully remove it

In theory, this would open us up to a DoS attack, where someone could trigger thousands of notifications in a minute, flooding the storage. However, creating eg. a revision takes way more rows/space than a single notification, so this may not be a concern? Regardless, it is something that needs to be considered.

There should be some rate limits on this?

matmarex updated the task description. (Show Details)

There aren't any rate limits on notifications themselves, but there are rate limits on the actions that cause a notification to be sent, like editing pages.

There aren't any rate limits on notifications themselves, but there are rate limits on the actions that cause a notification to be sent, like editing pages.

Yes. That was my point. Thanks for clarifying.

Notes:

  1. Proper versioning means being prepared to do "complex migrations" as in the example the description mentions. We should deal with our technical debt neither "just leaving mistakes in forever" nor deleting data to hide the past.
  2. The original reason for imposing a limit has never been discussed with the community. The change https://www.mediawiki.org/w/index.php?title=Topic:S2gux2za46y9zqsv&topic_showPostId=v3aoduag8wpvgroy#flow-post-v3aoduag8wpvgroy was put live some 10 years ago to address a "production bottleneck". No engagement has occurred to discuss what the "production bottleneck" is or was. However it is quite possible that a simple change to a SQL query somewhere would have obviated the whole enterprise. We should look into this before replacing one deletion system with another, however superior. For example it may be possible that simply having an "archive" bit set for everything over 12 months would resolve the production bottleneck. It may be that the bottleneck is in code that has since been deleted, or that another change has rendered in no longer a bottleneck.

FWIW, I look at notification timestamps and their distribution over the years. In total we had 232M notifications recorded and this is their cumulative percentage:

20100%
20110%
20120%
20132%
20147%
201515%
201621%
201727%
201835%
201944%
202054%
202164%
202273%
202383%
202495%
2025100%

Of course we can start with some ridiculous number (anything older than ten years) and then slowly reduce the years to something reasonable. Seems like 5 years is a good compromise of being long enough but also can clean up a decent chunk (around half!) of notifications. cc @matmarex

My opinion is that we should have both number limit and time limit but if we clean up old notifications (e.g. older than five years) then we can bump the 2,000 limit to something bigger.

@KStoller-WMF said she is okay with the following proposed change: Delete notifications older than 5 years (which will cut the size of the table to half)‌ and in return, increase the maximum number allowed for each user from 2,000 to 10,000.

From my perspective, I would be comfortable with this change, and it seems logically sound. Notifications are generally designed to be time bound rather than permanent records. On most platforms, users understand that notifications are not retained indefinitely and are instead intended to surface timely information.

If we can meaningfully increase the maximum notification limit in exchange for removing very old notifications, that could be a reasonable trade off. It balances system constraints with user expectations and aligns the feature more closely with how notifications function elsewhere.

That being said, I would love to hear from someone from Movement Communications about how we should loop in the community. @Quiddity - could we simply announce this ahead of time in Tech News? Or is some other communication needed here?

Change #1239793 had a related patch set uploaded (by Ladsgroup; author: Ladsgroup):

[mediawiki/extensions/Echo@master] Delete old notifications of users

https://gerrit.wikimedia.org/r/1239793

Thanks!‌ I‌ made a quick patch that would drop the old rows in case a new notification is being inserted. It doesn't automatically clean up everything but it should clean up a lot. Before merge and deploy, definitely a tech news entry sounds good to me. I‌ leave it to Nick on how to proceed and/or whether a consultation is needed beforehand.

I would suggest writing a Tech News entry that gives a clear explanation of what is changing and why, plus gives a 2 week deadline for feedback in case there's something we've overlooked. E.g. Perhaps something like this? (Please expand and rewrite/improve however you can!):

  • Notifications are currently limited to 2,000 historic entries per user, and extend back to 2013 when the feature was released. This is going to be changed in two weeks, to only store Notifications from the last 5 years, but up to 10,000 of them. This will help with long-term code maintenance and help to prevent more recent notifications from disappearing too soon. [plus any other important reasons to highlight?]. [1]

I cannot immediately think of any major concerns, but the edge-case that springs to mind is: For editors who are active long-term on multiple wikis, they might have Notifications on a sister project that they're intending to come back to, which are currently easily accessible. - E.g. If I go to Wiktionary, which I only edit occasionally, there's a 7 year old notification about a task-idea that I keep intending to get back to, and I don't have it written down because I know it's there at the top of my local Notifications... [This task is now prompting me to make a proper note-to-self about it.]

Thanks for pushing this forward @Ladsgroup and @KStoller-WMF.

I think we should have at least two notifications in Tech News, given that we're about to permanently delete data:

  • One notification well ahead of time, so that if some problems are raised in response, we could still change the plan
  • Second notification just before the data starts being deleted

Also, I wish we could document a way to export your notifications before they're gone. I think the action=query&meta=notifications API would be an acceptable way to do this, were it not for the fact that it is limited to 50 notifications – could we temporarily increase this limit to 2000? (perhaps with a rate limit to avoid issues) [I can write the patch for this]

@Quiddity I think it's a good point that unread notifications could be exempt from the time limit, and only subject to the total count limit.

Eh, on second thought… handling unread notifications differently kind of defeats the point of the change. So, I wish we could do that, but I don't think it's practical to do. Anyway, that makes it all the more important to notify earlier :)

To finalise things, this is the final copy I am adding to Tech News:

"Notifications are currently limited to 2,000 historic entries per user, and extend back to 2013 when the feature was released. This is going to be changed in two weeks, to only store Notifications from the last 5 years, but up to 10,000 of them. This will help with long-term code maintenance and help to prevent more recent notifications from disappearing too soon."

Please ping me with corrections if any. In the meantime I will add the entry to the draft.

I would suggest writing a Tech News entry that gives a clear explanation of what is changing and why, plus gives a 2 week deadline for feedback in case there's something we've overlooked. E.g. Perhaps something like this? (Please expand and rewrite/improve however you can!):

  • Notifications are currently limited to 2,000 historic entries per user, and extend back to 2013 when the feature was released. This is going to be changed in two weeks, to only store Notifications from the last 5 years, but up to 10,000 of them. This will help with long-term code maintenance and help to prevent more recent notifications from disappearing too soon. [plus any other important reasons to highlight?]. [1]

I cannot immediately think of any major concerns, but the edge-case that springs to mind is: For editors who are active long-term on multiple wikis, they might have Notifications on a sister project that they're intending to come back to, which are currently easily accessible. - E.g. If I go to Wiktionary, which I only edit occasionally, there's a 7 year old notification about a task-idea that I keep intending to get back to, and I don't have it written down because I know it's there at the top of my local Notifications... [This task is now prompting me to make a proper note-to-self about it.]

Instead of "long-term code maintenance" I'd say "long-term infrastructure health" otherwise looks great!

@Ladsgroup I have made the change however now, the announcement will need to to be more specific about when the users can expect the change. Also, this is a good link for this entry right https://www.mediawiki.org/wiki/Notifications? Please check the draft below to see if I have captured things well:

"Notifications are currently limited to 2,000 historic entries per user, and extend back to 2013 when the feature was released. This is going to be changed in the week of 2 March, to only store Notifications from the last 5 years, but up to 10,000 of them. This will help with long-term infrastructure health and help to prevent more recent notifications from disappearing too soon."

Instead of "long-term code maintenance" I'd say "long-term infrastructure health" otherwise looks great!

"[...] This is going to be changed in the week of 2 March [...]"

Can I just check -- if I'm reading it correctly, @matmarex's comment in T383948#11625527 referred to giving at least two notices in Tech/News (the first one being well in advance) before anything changes. Is this not currently the plan re comms? /genq

Also, I wish we could document a way to export your notifications before they're gone. I think the action=query&meta=notifications API would be an acceptable way to do this, were it not for the fact that it is limited to 50 notifications – could we temporarily increase this limit to 2000? (perhaps with a rate limit to avoid issues) [I can write the patch for this]

I think it would be more useful to have a JavaScript snippet, possibly hosted as a Toolforge tool, which handles the continuation to fetch all notifications instead of temporarily raising the limit.