
Search operator intitle does not work if it includes a - minus sign on Commons
Closed, Invalid | Public | BUG REPORT

Description

Steps to replicate the issue (include links if applicable):

  • Search for -intitle:"LL-Q" on Wikimedia Commons (link)
  • Click on Audio and sort by recency

What happens?:
Many files with "LL-Q" in the title still appear in the results, so the search operator does not work. It works only as -intitle:"LL", but that also excludes other audio files with LL in the title.

See the help page on searching WMC.

What should have happened instead?:
The filter should work, and the files with LL-Q in the title should be excluded from the results.

Software version (on Special:Version page; skip for WMF-hosted wikis like Wikipedia):

Other information (browser name/version, screenshots, etc.):

I found this because I went to the page that shows new audio files, which is cluttered with audio files of brief pronunciations, usually titled "LL-Q…", and there is no way to filter the results except by excluding bot uploads. The search above is a way to view recent audio uploads with filters (here, filtering out some of the pronunciation files in order to view music and other audio files, e.g. to spot copyvios or to explore which kinds of audio files are being uploaded to WMC).

Edit: here is the query that filters out these LL-Q files using regex (it currently seems to load fast enough, please edit this link if it can be improved)

Event Timeline

I don't see how this is relevant only to the Commons community folks (as they were tagged); this seems to be about the CirrusSearch codebase and behavior instead, so I'd recommend setting that project tag instead.

Maybe because it searches words.
Use regexp
-intitle:/LL\-Q/

@Aklapper Right, I should have tested whether this issue also exists on other Wikimedia sites. I'll change the tags accordingly but haven't tested it elsewhere.

@Wargo Thanks, this does work as a workaround (link). So I guess this really is low importance. Please be aware that even techy editors don't know this and even if they try something it would be e.g. -intitle:"LL\-Q" rather than -intitle:/LL\-Q/. Maybe a note should be added to the help page linked above.

Izno closed this task as Invalid. Edited Jul 28 2024, 6:13 PM
Izno subscribed.

Yes, in general insource and intitle ignore punctuation. This is a known quantity and is unlikely to change (I'm pretty sure due to performance reasons). en wiki documents it for insource but the same applies for intitle. If you want to find punctuation you do have to resort to a regex search, as suggested by Wargo.
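To make the difference concrete, here is a toy Python model of the two kinds of matching. It is only a sketch: the tokenizer below is an assumption made for illustration and is not CirrusSearch's real analyzer, and the file titles are made up. Under this simplified model, the quoted phrase "LL-Q" becomes the words "ll" and "q", which never line up with the word "q150" in a title, while the regex sees the raw characters, hyphen included.

```python
import re

def words(text):
    # Toy tokenizer (assumption): lowercase, then split on anything
    # that is not an ASCII letter or digit, so punctuation disappears.
    return [w for w in re.split(r"[^0-9a-z]+", text.lower()) if w]

def phrase_in_title(phrase, title):
    """True if the phrase's words occur as consecutive whole words in the title."""
    p, t = words(phrase), words(title)
    return any(t[i:i + len(p)] == p for i in range(len(t) - len(p) + 1))

def regex_in_title(pattern, title):
    """True if the regex matches anywhere in the raw title string."""
    return re.search(pattern, title) is not None

# Hypothetical titles, for illustration only.
titles = ["LL-Q150 (fra)-example.wav", "LL Cool track.ogg", "Piano sonata.ogg"]

for title in titles:
    print(
        title,
        phrase_in_title("LL-Q", title),   # word-based phrase match
        phrase_in_title("LL", title),     # word-based single word
        regex_in_title(r"LL\-Q", title),  # character-level regex
    )
```

In this model the phrase "LL-Q" matches none of the titles, "LL" matches the first two, and the regex matches only the first, which mirrors the behavior described in the report.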

@Izno It only documents that "non-alphanumeric characters are ignored", not this workaround. The second row, on regular expressions, does not make clear that they can be used for such characters, or how.

Furthermore, I think "a note should be added to the help page linked above", and it would be better to at least keep this task open until that is done. I'll go ahead and add something, but it may not be as good as if you or Wargo added it (maybe you can check and edit).

Lastly, it seems that the search is much slower when using regex. So I think it would be best if either regex search were sped up when used for these workarounds or the workaround were made redundant. But I guess that would be lowest importance, as it's rarely needed and has a workaround.

Yes, regex search alone is slower. That is documented in multiple places on both the English WP and MediaWiki wiki pages documenting search. The workaround for that is also, I believe, documented in one or both (to wit, include an insource term without regex and then also the insource regex search, at least in the general case).
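As a rough sketch of that pattern, the Python snippet below pairs a plain insource term with an insource regex and turns the result into a Commons search URL. The terms "foo bar" and foo\-bar are placeholders, the helper name commons_search_url is made up for this example, and the URL shape (index.php with title=Special:Search and a search parameter) is assumed to be the usual one for Special:Search.

```python
from urllib.parse import urlencode

def commons_search_url(query):
    # Hypothetical helper: builds a Special:Search URL for a query string.
    params = {"title": "Special:Search", "search": query}
    return "https://commons.wikimedia.org/w/index.php?" + urlencode(params)

# Placeholder terms: the plain term narrows the candidate pages first,
# so the slower regex only has to run over that smaller set.
plain_then_regex = r'insource:"foo bar" insource:/foo\-bar/'

# The exclusion workaround from this task, which uses the regex alone.
exclude_ll_q = r"-intitle:/LL\-Q/"

print(commons_search_url(plain_then_regex))
print(commons_search_url(exclude_ll_q))
```

Note that the exclusion query from this task has no obvious plain term to pair with, which may be why it stays comparatively slow.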

(That we have at least 3 pages, all in English, on Commons, MediaWiki wiki and English WP...)

In general, tasks that only concern onwiki documentation don't get worked on in Phab. Making an improving change onwiki is good enough for this issue to me, and someone watching the page(s) can follow up if necessary.

I think that if the term is in quotes, it would be possible to make this work without being much slower by simply escaping those punctuation characters. Wouldn't that be possible?
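For illustration only, here is what such a rewrite could look like if done by hand on the client side in Python. The function name quoted_to_regex is hypothetical, nothing like it is known to exist in CirrusSearch, and the escaping rule (backslash before every character that is not a letter, digit, underscore or whitespace) is an assumption. The resulting query is still a regex search, so it does not address the speed concern by itself.

```python
import re

def quoted_to_regex(term):
    # Hypothetical helper: rewrite intitle:"..." or insource:"..."
    # (optionally negated with a leading "-") into the regex form.
    m = re.fullmatch(r'(-?)(intitle|insource):"([^"]*)"', term)
    if m is None:
        return term  # leave anything else untouched
    neg, op, phrase = m.groups()
    # Put a backslash in front of every punctuation character,
    # including "/" so it cannot end the regex early.
    escaped = re.sub(r"([^\w\s])", r"\\\1", phrase)
    return f"{neg}{op}:/{escaped}/"

print(quoted_to_regex('-intitle:"LL-Q"'))
```

Unlike the quoted form, the regex form matches the exact characters as written, so the two are not interchangeable in general.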