Target suffix globs are optionally matched #18892
Description
Type: feature request
CONTRIBUTING.md discourages suffix globs on targets and cites the Google rulesets as examples of where they are appropriate.
I was a bit surprised to find that Google.tld_Subdomains.xml only includes
<target host="accounts.google.com.*" />
and that this suffix glob is expected to match the test case http://accounts.google.com/, whose host has no suffix at all after accounts.google.com.
The same happens in Google.xml, with the target www.google.com.* and test cases such as http://www.google.com/about.
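To make the two readings concrete, here is a minimal Python sketch of a suffix-glob matcher. The function name `suffix_glob_matches` is hypothetical, and the assumption that `.*` stands for exactly one trailing label (e.g. `au`) is mine, not taken from any existing implementation. The `optional` flag switches between the mandatory reading and the optional reading that the Google tests depend on.

```python
def suffix_glob_matches(target: str, host: str, optional: bool = False) -> bool:
    """Hypothetical matcher for a target ending in a ".*" suffix glob.

    Assumption: ".*" covers exactly one extra trailing label. With
    optional=True the bare host (no suffix at all) is also accepted.
    """
    if not target.endswith(".*"):
        # No suffix glob: exact comparison only (left wildcards not handled).
        return target == host
    base = target[:-2]
    if host.startswith(base + "."):
        rest = host[len(base) + 1:]
        # The glob must cover one non-empty label, not several.
        return rest != "" and "." not in rest
    # Only the optional reading lets the bare base host match.
    return optional and host == base
```

With `optional=False`, `accounts.google.com.*` matches `accounts.google.com.au` but not `accounts.google.com`; only `optional=True` accepts the bare host, and the same flag also lets `hsbc.*` match a bare `hsbc`.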
This optional matching of suffix globs is only needed by the tests for those two Google rulesets. However, a few other rulesets also use a .* target suffix (it seems only HSBC.xml and Ticketmaster.xml), so they may be relying on it without any tests ensuring that it works.
However, this behaviour is messy to implement, and IMO it should be replaced with explicit targets without the .* wherever the bare host is needed. First and foremost, this would make it obvious that extra test cases are needed for those extra targets. Also, I am pretty sure that any library implementing .* target suffixes will assume the suffix is mandatory, so these very important targets will not work there. The libraries I have looked at, which live in other repos, do not run the .xml test suite and often have quite small tests, so I would wager this is a bug in most implementations.
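A check along these lines could flag the tests that only pass under optional matching, so that an explicit target (with its own tests) can be added instead. This is a hypothetical helper, not part of the existing test suite; it assumes the usual `<target host=...>` and `<test url=...>` elements of a ruleset file and simply reports each test URL whose host equals a `.*` target with the suffix stripped.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit


def optional_suffix_tests(ruleset_xml: str) -> list[tuple[str, str]]:
    """Hypothetical check: return (target, test_url) pairs where the test
    host only matches a ".*" target if the suffix is treated as optional,
    i.e. the host equals the target with ".*" removed."""
    root = ET.fromstring(ruleset_xml)
    suffix_targets = [
        t.get("host") for t in root.iter("target")
        if t.get("host", "").endswith(".*")
    ]
    found = []
    for test in root.iter("test"):
        url = test.get("url", "")
        host = urlsplit(url).hostname  # None if the URL has no host
        for target in suffix_targets:
            if host == target[:-2]:
                found.append((target, url))
    return found
```

Run against a ruleset shaped like Google.tld_Subdomains.xml, it would report the `accounts.google.com.*` / `http://accounts.google.com/` pair, signalling that an explicit `accounts.google.com` target is missing.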
In addition, it means extra effort is needed to implement this sanely and safely. For example, hsbc.* in HSBC.xml surely should not also match hsbc (possibly a hostname on the local domain?), and hsbc.co.* should not match hsbc.co. That domain doesn't exist, but it could be registered, like google.co, and could legitimately be owned by someone other than the owner of hsbc.co.uk and friends, depending on the policies of the .co domain, owned by the U.S. for-profit Neustar. Even if those cases are not highly probable or problematic, optional matching still requires extra lookups to resolve the appropriate ruleset to use.
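The extra lookups can be sketched as the list of target keys a resolver would have to try for a given host. This is a hypothetical Python function, assuming targets are stored in an exact-match table and that a mandatory `.*` replaces only the last label; left-wildcard keys are deliberately left out to keep the focus on suffix globs.

```python
def candidate_targets(host: str, optional_suffix: bool) -> list[str]:
    """Hypothetical list of target keys to look up for `host`.

    Left-wildcard keys (e.g. "*.google.com") are omitted on purpose.
    """
    keys = [host]
    labels = host.split(".")
    if len(labels) > 1:
        # Mandatory ".*": the glob stands in for the last label.
        keys.append(".".join(labels[:-1] + ["*"]))
    if optional_suffix:
        # Optional ".*": the bare host may also be covered by "<host>.*",
        # which costs one more lookup for every host.
        keys.append(host + ".*")
    return keys
```

For `hsbc.co`, the optional reading adds `hsbc.co.*` to the keys, which is exactly the HSBC.xml target that should not apply there; for a bare `hsbc`, it adds `hsbc.*`.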