feat: cache IANA TLDs for faster lookups#390
feat: cache IANA TLDs for faster lookups#390yozachar merged 5 commits intopython-validators:masterfrom
Conversation
|
Hi, validators already uses an environment variable to decide whether or not to throw errors, so I thought (again that) we could do something like this instead: class _TLDList:
"""Cache IANA TLDs."""
cache = set[str]()
def _load_tld_to_memory(tld_file_path: Path):
"""Load IANA TLDs to memory."""
if not _TLDList.cache:
with tld_file_path.open() as tld_f:
_ = next(tld_f) # ignore the first line
_TLDList.cache = set(line.strip() for line in tld_f)
return _TLDList.cache
def _iana_tld():
"""Provide IANA TLDs."""
# # source: https://data.iana.org/TLD/tlds-alpha-by-domain.txt
tld_file_path = Path(__file__).parent.joinpath("_tld.txt")
if environ.get("LOAD_TLD_TO_MEMORY", "False") == "True":
return _load_tld_to_memory(tld_file_path)
with tld_file_path.open() as tld_f:
_ = next(tld_f) # ignore the first line
for line in tld_f:
yield line.strip()Environment |
|
Some questions:
|
|
|
BTW, I wonder if it's worth it to hardcode |
|
Pushed a commit with changes, including my hard-coded list to cover the common TLDs before trying the file. For the documentation, I think it should be covered/linked the domain, hostname, and URL pages, regardless of any other place you think is important. Someone who's just reading about |
yozachar
left a comment
There was a problem hiding this comment.
Feel, free to counter suggest/correct/modify/improve.
|
Should I squash all the commits, or do you want to merge it yourself? |
|
I'll squash and merge. |
|
Thanks for the PR! |
|
Thank you for the help and accepting the feature! |
Follow-up to the discussion in #362.
I wasn't sure what was meant by using
dataclass, as this isn't a data-first class. I used a regular class, instead.One thing that's obviously missing is tests. I'm not familiar with pytest, and don't know how to re-run the existing domain tests after running the new "load" function.
Here are some basic timing results: