
CirrusSearch should allow filtering on page creation and last edit timestamps
Closed, Resolved · Public · 5 Estimated Story Points

Description

CirrusSearch indexes the creation and last edit timestamps, so it could be useful to introduce keywords that filter search results based on these fields (see T392283#11101044 for a possible use case in a product feature).

The exact nature of the filter is yet to be defined, but it could resemble something like this:
lasteditdate:<now-1d: match pages last edited more than 24 hours ago
lasteditdate:>now-1d: match pages edited in the last 24 hours
lasteditdate:<2024-01-01: match pages last edited before 2024

A similar keyword like creationdate should be added as well.

Things to consider:

  • time format
    • granularity and no comparison: should we support lasteditdate:2024 and possibly transform it to a range like: 2024-01-01T00:00:00 <= date < 2025-01-01T00:00:00?
    • timezone: should we always use UTC, or should we allow users to use their own time offset (found in their user preferences)? If so, how?
  • period format: should we support all of https://www.php.net/manual/en/dateinterval.format.php ?
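The granularity question above can be made concrete with a small sketch. It assumes only the YYYY, YYYY-MM and YYYY-MM-DD forms and UTC boundaries; `expand_period` and `PARTIAL_DATE` are hypothetical names used for illustration and are not taken from CirrusSearch.

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical helper for illustration; not CirrusSearch code.
PARTIAL_DATE = re.compile(r"(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?")

def expand_period(text):
    """Expand YYYY, YYYY-MM or YYYY-MM-DD into a half-open UTC range [start, end)."""
    match = PARTIAL_DATE.fullmatch(text)
    if match is None:
        raise ValueError(f"unsupported date: {text!r}")
    year, month, day = (int(g) if g else None for g in match.groups())
    # datetime() rejects out-of-range months and days with ValueError.
    start = datetime(year, 1 if month is None else month,
                     1 if day is None else day, tzinfo=timezone.utc)
    if day is not None:
        end = start + timedelta(days=1)
    elif month is not None:
        # Roll over to January of the next year after December.
        end = start.replace(year=year + month // 12, month=month % 12 + 1)
    else:
        end = start.replace(year=year + 1)
    return start, end

start, end = expand_period("2024")
print(start.isoformat(), "<= date <", end.isoformat())
```

A half-open range avoids having to name the last representable instant of the period.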

AC:

  • CirrusSearch is able to filter pages based on their last edit timestamp and creation date
  • The new syntax is documented

Event Timeline

Gehel triaged this task as High priority. Sep 11 2025, 8:08 AM
Gehel moved this task from needs triage to Feature Requests on the Discovery-Search board.

gonna be bold and say that IMO this is probably also worth a User-notice if/when it's deployed (in case these filters would be of interest to any community members), in addition to the other new CirrusSearch filters that have been developed recently :)

Looking over the date field docs and testing a few things, it looks like we can fairly easily support the syntax requested above. To me the biggest questions are around localization. As stated in the ticket, there is the question of local time vs. UTC. There is also the question of date formats: is "05/04/25" in April or May? Do we perhaps only accept YYYY, YYYY-MM, and YYYY-MM-DD?

I suppose my proposal would be to keep it simple:

  • Supported date formats: YYYY, YYYY-MM, YYYY-MM-DD
  • No supported time format, only works on units of year/month/day
  • Days are defined by UTC, not local time.
  • Support >, >=, <, <= prefixes
  • Date math only supported attached to "now"
  • Date math is equivalent to specifying a specific day. Only subtraction from now is supported.
  • Accepted format: (<|<=|>|>=)?(YYYY(-MM(-DD)?)?|now(-\d+[ymd])?)
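As a rough check of the accepted format above, the pattern can be tried out directly. This sketch assumes that YYYY, MM and DD stand for four-, two- and two-digit numbers; `DATE_FILTER` and `is_valid` are hypothetical names, and the pattern only checks the shape of the value, not whether the date exists.

```python
import re

# The accepted format from the proposal, with YYYY/MM/DD spelled out as digits
# (an assumption about what the placeholders mean).
DATE_FILTER = re.compile(r"(<|<=|>|>=)?(\d{4}(-\d{2}(-\d{2})?)?|now(-\d+[ymd])?)")

def is_valid(value):
    """Return True if the whole value matches the proposed syntax."""
    return DATE_FILTER.fullmatch(value) is not None

for candidate in [">=2024-05", "now-1y", "<now-30d", "05/04/25", "now+1d"]:
    print(candidate, is_valid(candidate))
```

Using `fullmatch` rather than `match` matters here: otherwise a value such as `2024-05-01T12:00` would be accepted on the strength of its prefix.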

Example Queries:

Query       Description
>=2024      pages edited in 2024 or later
>2024       pages edited in 2025 or later
2024        pages edited in 2024
>2024-05    pages last edited after May 2024
>now-1y     pages edited within the last ~365 days
<now-1y     pages not edited within the last ~365 days
now         pages edited today
now-1d      pages edited yesterday
>=now-1d    pages edited yesterday or today
  • Days are defined by UTC, not local time.

One worry is that this might be potentially confusing for projects that have $wgLocaltimezone set to something other than UTC — as then, an article's history page might say that it was created/last edited on one day, but (IIUC) CirrusSearch filters would treat it as being created/last edited on a different day.
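To illustrate the concern, here is a minimal sketch of how one edit can fall on different calendar days depending on the timezone used. The two-hour fixed offset and the timestamp are made up for the example; a real $wgLocaltimezone would normally be a named zone with daylight saving rules.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical wiki timezone: a fixed offset two hours ahead of UTC.
wiki_tz = timezone(timedelta(hours=2))

# An illustrative edit saved at 23:30 UTC on 2025-09-10.
edit = datetime(2025, 9, 10, 23, 30, tzinfo=timezone.utc)

utc_day = edit.date()                       # day as a UTC-based filter would see it
wiki_day = edit.astimezone(wiki_tz).date()  # day as shown in the wiki's timezone

print("UTC day: ", utc_day)
print("Wiki day:", wiki_day)
```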

  • Accepted format: (<|<=|>|>=)?(YYYY(-MM(-DD)?)?|now(-\d+[ymd])?)

We, Growth, can make that work, but I wonder if we could add h for hours too? So that we can say <now-24h (or maybe <now-36h if we want to have more slack). Though, if that is too complicated, we can also use <=now-2d for our offset to be sure that we exclude pages edited within the last 24 hours.

I can fit hours in here if needed, but I do wonder whether it will feel a bit awkward to round time consistently. What I mean is that >2024 and <2024 round their comparisons to the nearest year, and similarly for months or days. This feels natural (to me, at least) when working with those units. With hours we would have lasteditdate:>now-2h; do we also round that to hours? It feels more natural to me for such short timespans to be rounded to minutes, but that would lack consistency and make the system harder to explain. Not sure what the right approach is, but switching between them isn't too hard.

Hmm, I was hoping to avoid timezones because they open up a huge can of complexity. I can look into supporting $wgLocaltimezone; it's certainly possible. I worry that what we are willing to offer might not be enough, though. As far as I can tell, history pages are localized to the logged-in user's timezone, but we would really prefer to keep per-user preferences out of the search queries. The general goal is that the same query issued by two users should return nearly the same result set. If we localize to $wgLocaltimezone it might be enough to cover many of the editors, but it will still vary from the history page for many users.

Change #1187911 had a related patch set uploaded (by Ebernhardson; author: Ebernhardson):

[mediawiki/extensions/CirrusSearch@master] WIP: Implement syntax for range queries on dates

https://gerrit.wikimedia.org/r/1187911

[…]
As far as I can tell, history pages are localized to the logged-in user's timezone, but we would really prefer to keep per-user preferences out of the search queries. The general goal is that the same query issued by two users should return nearly the same result set.

I believe that this is indeed the case; but FWIW, I think I share your desire to keep user-preferences out of this (for the reason you've mentioned). I was more thinking that - if a wiki itself is deliberately configured to use a certain time zone - it might potentially be better for the search filters on that wiki to reflect that configured timezone.
However, I will happily defer to the Search Platform team about what you think is best here (re. UTC vs. $wgLocaltimezone) :)

If we localize to $wgLocaltimezone it might be enough to cover many of the editors, but it will still vary from the history page for many users.

Indeed. I guess personally, I would probably say that this might just have to be an expected result of setting one's timezone-preference to something other than the wiki's default; given that the alternative would mean that different users can receive different results for the same search query.

Either way, whichever decision on this is made, hopefully any potential confusion can be (at least somewhat) mitigated by documenting the timezone-related behaviour of the new filters (and how/why it [doesn't] react to changes in a user's preferences).

@dcausse offered the idea of varying the precision of date math via the keyword: now would operate on hourly precision, and today would operate on daily precision. That seems reasonably easy to explain and should do what we want. So now-4d would be 96 hours ago, but today-4d would be 4 days ago, rounded to the beginning of the day.
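A minimal sketch of the now/today distinction, assuming that "hourly precision" means truncating to the start of the hour and "daily precision" means truncating to the start of the day. `resolve` is a hypothetical helper that only handles day offsets, and the sample instant is arbitrary.

```python
from datetime import datetime, timedelta, timezone

def resolve(keyword, days, current):
    """Return the reference instant for `now-<days>d` or `today-<days>d`."""
    if keyword == "now":
        # Assumed meaning of hourly precision: truncate to the start of the hour.
        base = current.replace(minute=0, second=0, microsecond=0)
    elif keyword == "today":
        # Daily precision: truncate to the start of the day.
        base = current.replace(hour=0, minute=0, second=0, microsecond=0)
    else:
        raise ValueError(f"unknown keyword: {keyword!r}")
    return base - timedelta(days=days)

current = datetime(2025, 9, 15, 14, 37, tzinfo=timezone.utc)
print("now-4d   ->", resolve("now", 4, current).isoformat())
print("today-4d ->", resolve("today", 4, current).isoformat())
```

With this reading, both keywords subtract the same 96 hours; they differ only in how the starting point is rounded.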

Nice, that feels very intuitive to me!

Proposed Documentation, under the Filters heading:

creationdate and lasteditdate

You can filter search results by date using two filters: lasteditdate: for when pages were last edited, and creationdate: for when pages were first created (the date of their first revision). This allows you to find recently updated content, pages that have not been updated in some time, or pages created within specific time periods. All date queries use the local timezone of the wiki you're searching; your personal timezone settings don't affect the results.

You can add comparison operators before any date to constrain your search:

Operator  Meaning       Example              Description
>         after         creationdate:>2024   Pages created after 2024
>=        on or after   lasteditdate:>=2024  Pages last edited in 2024 or later
<         before        creationdate:<2025   Pages created before 2025
<=        on or before  lasteditdate:<=2020  Pages last edited in 2020 or earlier

Without an operator, the search looks for exact matches within that time period.
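One way to read the operators is as comparisons against the whole period that a date stands for, which is what the descriptions above suggest (for example, >2024 meaning 2025 or later). The sketch below encodes that reading; `matches` is a hypothetical helper, and the behaviour is inferred from the examples rather than from the merged patch.

```python
from datetime import datetime, timezone

def matches(op, start, end, ts):
    """Check timestamp `ts` against the period [start, end) under operator `op`."""
    if op == ">":
        return ts >= end       # after the whole period
    if op == ">=":
        return ts >= start     # within the period or later
    if op == "<":
        return ts < start      # before the whole period
    if op == "<=":
        return ts < end        # within the period or earlier
    return start <= ts < end   # no operator: inside the period

utc = timezone.utc
year_2024 = (datetime(2024, 1, 1, tzinfo=utc), datetime(2025, 1, 1, tzinfo=utc))
edited = datetime(2024, 6, 15, tzinfo=utc)

for op in ["", ">", ">=", "<", "<="]:
    print(repr(op), matches(op, *year_2024, edited))
```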

The precision of your search depends on how the date is provided. When providing an absolute date it depends on how specific your date is:

Precision  Example                  Description
year       creationdate:2025        Pages created anywhere in 2025
month      lasteditdate:2025-09     Pages last edited in September 2025
day        creationdate:2025-09-01  Pages created on September 1, 2025

Alternatively you can provide a relative date using either now or today. The precision of the search will be determined by the keyword used. The keyword can be suffixed by a minus sign (-) followed by a number and a suffix. The suffixes y (year), m (month), d (day), and h (hour) are available for use:

Keyword  Precision  Example                  Description
now      hour       lasteditdate:now         Pages last edited in the current hour
now      hour       creationdate:now-1h      Pages created in the previous hour
now      hour       lasteditdate:>=now-1d    Pages last edited in the previous 24 hours
today    day        creationdate:today       Pages created today
today    day        lasteditdate:today-1d    Pages last edited yesterday
today    day        creationdate:>today-1y   Pages created anytime in the last year

While there's no single "range" operator, you can create date ranges by combining two date filters in the same query.

Example                                 Description
creationdate:>=2010 creationdate:<2020  Pages created in the 2010s (from January 1, 2010 up to, but not including, January 1, 2020).
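For readers curious how two combined filters could translate into a backend query, here is a sketch of an Elasticsearch-style bool filter with two range clauses. The field name `create_timestamp` and the helper `range_clause` are assumptions for illustration; this is not necessarily how the merged patch builds its queries.

```python
import json

def range_clause(field, **bounds):
    """Build a range clause; bounds use the standard gt/gte/lt/lte keys."""
    return {"range": {field: bounds}}

# creationdate:>=2010 creationdate:<2020, expressed as two filters that must both match.
# `create_timestamp` is an assumed field name.
query = {
    "bool": {
        "filter": [
            range_clause("create_timestamp", gte="2010-01-01T00:00:00Z"),
            range_clause("create_timestamp", lt="2020-01-01T00:00:00Z"),
        ]
    }
}

print(json.dumps(query, indent=2))
```

Because both clauses sit in the same filter list, a page must satisfy both, which gives the effect of a date range.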

Change #1187911 merged by jenkins-bot:

[mediawiki/extensions/CirrusSearch@master] Implement syntax for range queries on dates

https://gerrit.wikimedia.org/r/1187911

Tested the keywords in prod, looks to be working as expected. Updated Help:CirrusSearch on mw.org with the proposed documentation from above.

How should this be worded for Tech News?

Something like this should be reasonable, along with the link to the documentation

New date filters, creationdate: and lasteditdate:, are now available in the search engine. This allows users to filter search results by a page's first or last revision date. The filters support comparison operators (e.g., >2024) and relative dates (e.g., today-1d), making it easier to find recently updated content or pages within specific age ranges.