
Deepcategory search does not show any results on Commons instead of results up to the configured limits
Closed, Resolved | Public | 1 Estimated Story Points | BUG REPORT

Description

Steps to replicate the issue (include links if applicable):

  • Search for deepcategory:"Audio files of music" -deepcategory:"Audio files of music by genre" (link) in the Wikimedia Commons search

What happens?:
No search results are shown. (No error message is shown either, so the user is left clueless as to why, but that is a separate issue: T376439.)

What should have happened instead?:
Deepcategory searches should not fail but show the results up to the extent possible and display in the error message which categories have been trimmed off.

  • For example, instead of displaying no files, it would display many (probably most) files along with an info message in the box such as "Deep category query returned too many categories so MIDI files of melody settings by Peter Gerloff and Chill-out music from Free Music Archive have been excluded". This is especially useful for category trees in which a single very deeply nested branch is the part that does not get considered in the search results.
  • Maybe I could add some better examples if that helps, but you can probably find very large or deeply nested categories in Videos in English, for example. The user can then use the results knowing that files in the named categories have not been included, and may later run a separate deepcategory search on a trimmed-off category. The full result set may have 563 files while only 480 are shown, but that is usually far better than showing no results at all. It also probably often already shows what the user wanted to see, since a deeply nested category is too far from the specified category to still contain results that are very relevant to it. (Note that one can go through the results and exclude categories using -deepcategory or -incategory, so showing too many results isn't a problem, although showing the source of categorization per file would be very useful.)

Software version (on Special:Version page; skip for WMF-hosted wikis like Wikipedia):

Other information (browser name/version, screenshots, etc.):
By the way, some info about which categories in the scan are the top ones by number of directly contained files would also be very useful. This could serve all sorts of purposes, for example excluding a category that is not of interest but adds many files to the results, such as "Videos from studies uploaded with Open Access Media Importer" (about 10k files) in "Videos of science". It would be great if the current limits could be increased at some point, but what is needed far more is seeing some results per scan instead of none at all.

It's unclear whether self-categorizations also cause the deepcategory search to fail (if so it could be prevented by comparing the cat title to the titles in the array of already scanned categories or any duplicate items). That may be a separate issue.
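The duplicate check suggested above, combined with the request for partial results, could be sketched roughly as follows. This is only an illustration in Python and not how CirrusSearch implements deepcategory; the `subcats` mapping, the `max_categories` budget, and the function name `collect_categories` are all hypothetical.

```python
from collections import deque


def collect_categories(root, subcats, max_categories):
    """Breadth-first walk of a category tree with a category budget.

    `subcats` (hypothetical) maps a category title to a list of its
    subcategory titles. Returns (included, trimmed): the categories
    gathered within the budget, and the categories that were discovered
    but left out because the budget ran out.
    """
    seen = {root}          # titles already queued; guards against cycles
    queue = deque([root])
    included = []
    while queue and len(included) < max_categories:
        cat = queue.popleft()
        included.append(cat)
        for child in subcats.get(cat, []):
            if child not in seen:   # skips self-categorization and duplicates
                seen.add(child)
                queue.append(child)
    trimmed = list(queue)  # discovered, but over budget
    return included, trimmed
```

With a tree in which "A" lists itself as a subcategory, the walk still terminates, and a budget that is too small yields the included categories plus the list that a warning message could name. Note that `trimmed` only contains categories that were actually discovered, not their unvisited descendants.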

Event Timeline

Please do not add team project tags without their consent - thanks.

Okay didn't know it was a team project tag.

@dcausse wrote: "when negating the keyword with -deepcategory:"Large Tree" the partial results could possibly be matching a file actually part of the Large Tree category tree". This is a good point; however, I think in most uses of deepcategory, categories are not excluded that way. Moreover, when files of a category are excluded that way, the category branch is usually small. In any case: simply show a special warning if a category exclusion did not work. Maybe even add the useful hint that the user could look for a smaller subcategory of it to exclude instead.

showing the list of categories that are not included might not be practical because the list could be relatively big to fit into an error message.

I suggest that if more than, say, one category is affected, the list is shown in a box that is collapsed by default and that the user can expand. Alternatively, it could show only the info that some subcategories of category X (which the user specified in the search query) are too deep, or even only the info that some category was too deep without specifying which. However, I think the best solution would be to let the user see the trimmed-off categories even if there are many (one can glance over them and then collapse the box). Note that showing partial results instead of none is often useful (I would estimate in something like 80% of cases), so even though there are cases where the search results are no longer useful, the many cases where they are make this well worth having. Also, if you search Google or DuckDuckGo with a strange query that finds only relatively few items, would you like it to display some results or none? I think some, because these may be all or exactly what the user was looking for, and the user may change the query. In the worst case, one could prompt the user with something like "The results were incomplete because XYZ, do you want to see the results anyway?".

Is the example supposed to be deepcategory:"Videos English" rather than deepcategory:"Videos in English"? The latter seems to work fine. We don't show an error message when a user searches for a category that doesn't exist, which seems reasonable.

@TJones I'll remove that example. Thought I had it fixed here but it must have been somewhere else.

Gehel triaged this task as Medium priority. (Nov 25 2024, 4:24 PM)
Gehel moved this task from needs triage to Current work on the Discovery-Search board.
Gehel edited projects, added Discovery-Search (Current work); removed Discovery-Search.
Gehel set the point value for this task to 1. (Nov 25 2024, 4:46 PM)

It looks like it will probably be straightforward to enable partial results, and we've moved this to Current Work.

While discussing it, we were wondering if being able to modify a query that gets partial results to be more limited (and thus giving it a deterministic set of results) would be useful? In particular, whether limiting the depth of the deepcat query would be useful. If, as a searcher, that's too into the weeds, we can leave it at showing partial results for now.
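A depth limit of the kind discussed here could look roughly like the sketch below. It is an illustration in Python under stated assumptions, not the CirrusSearch implementation; `subcats`, `max_depth`, and `categories_within_depth` are hypothetical names. Because the cut-off depends only on the distance from the starting category, the same query always yields the same set of categories.

```python
def categories_within_depth(root, subcats, max_depth):
    """Return the set of categories at most `max_depth` levels below `root`.

    `subcats` (hypothetical) maps a category title to a list of its
    subcategory titles. Depth 0 is the root category itself.
    """
    seen = {root}      # also prevents loops from revisiting categories
    level = [root]
    for _ in range(max_depth):
        next_level = []
        for cat in level:
            for child in subcats.get(cat, []):
                if child not in seen:
                    seen.add(child)
                    next_level.append(child)
        if not next_level:   # nothing deeper to explore
            break
        level = next_level
    return seen
```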

Change #1097454 had a related patch set uploaded (by Ebernhardson; author: Ebernhardson):

[mediawiki/extensions/CirrusSearch@master] deepcat: Don't short-circuit over fetches

https://gerrit.wikimedia.org/r/1097454

@TJones Sounds great. Yes, ways to narrow the results would be very useful. It would be best if it showed partial results and, at the top, some ways to narrow them down:

  1. A slider to change depth (I think FastCCI has one such but I can't check since it doesn't load; one could also add a search-operator like depth:x in addition)
  2. Ways to filter out various types of subcategories such as "xyz in fiction" and "xyz in art" categories (these are always named in the same standardized way and are in the same category, and one could exclude them, for example, by appending -deepcategory:"the relevant subcat found in this branch"). There would be filters to select from, and people could add more of these filters (3. is also useful for new filters).
  3. A way to filter out categories depending on the results – there could be some info at the top with the subcategories sorted by how many files they contain like:
1. CDC videos in English (3503 files) x
2. Videos in English (2700 files) x
3. Wikimedia videos in English (980 files) x
[show 50 more; 214 categories in total]

where one can exclude one or more categories by clicking on the x and then refreshing the search. This way one can exclude various large or unrelated categories, many of which may also be covered by the filters in item 2 (often because of particular types of subcategories like the "in art" subcats). For example, a deepcategory search of the category Maps of the world does not contain only instances of world maps, and due to miscategorizations the category Microscopic images also contains various files and subcategories that aren't microscopic images.
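The ranked list and the click-to-exclude behaviour described above could be sketched as below. This is a hypothetical illustration in Python; the `file_counts` mapping and both function names are assumptions, and only the `-deepcategory:"..."` query syntax is taken from this task. The counts are the example figures from the list above.

```python
def rank_categories(file_counts, top_n):
    """Return the `top_n` (title, count) pairs with the most files."""
    ranked = sorted(file_counts.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_n]


def exclude_categories(query, titles):
    """Append one -deepcategory:"..." term to `query` per excluded title.

    Titles are inserted verbatim; escaping of quotes inside a title is
    not handled in this sketch.
    """
    terms = [f'-deepcategory:"{title}"' for title in titles]
    return " ".join([query, *terms])


# Example figures from the list in this comment.
counts = {
    "Wikimedia videos in English": 980,
    "CDC videos in English": 3503,
    "Videos in English": 2700,
}
top = rank_categories(counts, 2)
narrowed = exclude_categories('deepcategory:"Videos in English"', [top[0][0]])
```

Here `narrowed` is the original query with the largest category excluded, which is what clicking the x next to the first entry and refreshing the search would amount to.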

Deepcategory search improvements enable other things, like the DeepcatSearch and FastCCI wall of images for a category; it's not just for searching but could also be used for different view modes etc.

Edit: detailed this now at https://meta.wikimedia.org/wiki/Community_Wishlist/Wishes/In_Commons_category_deepcategory_view_mode_(wall_of_images),_allow_easily_filtering_offtopic_subcats

Change #1097454 merged by jenkins-bot:

[mediawiki/extensions/CirrusSearch@master] deepcat: Don't short-circuit over fetches

https://gerrit.wikimedia.org/r/1097454

This is now deployed to production; the example query in the ticket now gives results along with the appropriate warning.

Great, thanks for solving this! Has something changed, or is this not applied in some cases? I was looking for example photos of factory interiors for a new photo challenge proposal and tried to use deepcategory:"Manufacturing by product", but it does not show any results in MediaSearch. Why is that? The special search does show results with the note "A warning has occurred while searching: Deep category query returned too many categories. Only a subset of categories has been applied.", but MediaSearch doesn't. Maybe a separate bug should be filed about it; iirc it did show results in MediaSearch for other categories of that type.

Indeed something wrong is happening there. Checking the query generated by the provided link, it is limited to only Manufacturing by product and does not include the rest of the deepcategory filter. This looks to perhaps be a wider issue with MediaSearch, best to open a dedicated ticket about deepcat+mediasearch compatibility.

I've opened T391876 to track the problem of Special:MediaSearch

What happened now? Didn't this work just two days or so ago – see T391876

However, now deepcategory:"Manufacturing by product" shows no results and:

Deep category search timed out. Most likely the category has too many subcategories

instead of showing results up to some limit with the error:

A warning has occurred while searching: Deep category query returned too many categories. Only a subset of categories has been applied.

as is the case with the example in the issue description. By the way, Gehel, re "Adding more UI and more ways to navigate categories as part of Search": it seems less like a UI issue and more about a way of loading further results for those categories where deepcat fails to show all results, so it is mostly a backend thing. I'd create a new issue for that, but in any case I think this preceding issue first needs to be reopened, since it now again (or still) does not show any results.

Please in general file new tickets for new problems instead of reopening old ones. See previous comment:

This looks to perhaps be a wider issue with MediaSearch, best to open a dedicated ticket about deepcat+mediasearch compatibility.

Well, the reason I reopened this issue is that it seems to be the same problem (again or still). It's not about MediaSearch; it also doesn't work (anymore, iirc) with Special:Search. However, results are now showing again, so it seems it was a temporary problem that is fixed now.