Hi!
The code in class Bio.Entrez.DataHandler, which also powers the API functions Bio.Entrez.read and Bio.Entrez.parse, parses arbitrary XML content in such a way that any DTD and XSD URLs it contains are requested via an HTTP GET request through the stdlib function urllib.request.urlopen, unless the basename of the URL (its last path component) matches a local file in the filesystem-based DTD cache (or XSD cache, respectively):
biopython/Bio/Entrez/Parser.py
Lines 581 to 584 in d07dde9

    handle = self.open_xsd_file(os.path.basename(schema))
    # if there is no local xsd file grab the url and parse the file
    if not handle:
        handle = urlopen(schema)

biopython/Bio/Entrez/Parser.py
Lines 1126 to 1131 in d07dde9

    handle = self.open_dtd_file(filename)
    if not handle:
        # DTD is not available as a local file. Try accessing it through
        # the internet instead.
        try:
            handle = urlopen(url)

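
To illustrate why the cache lookup offers no protection: the key is only the last path component, so scheme, host and port play no part in it. The following is a minimal sketch, not Biopython code; `cache_key` is a hypothetical helper that mirrors the `os.path.basename(schema)` call quoted above, and both URLs are made up for illustration.

```python
import os


def cache_key(url):
    # Hypothetical helper: same idea as os.path.basename(schema) above.
    # Only the last path component of the URL survives.
    return os.path.basename(url)


# Made-up URLs: one public-looking host, one internal address.
public_url = "https://dtd.example.org/schemas/record.dtd"
internal_url = "http://10.0.0.5:8080/internal/record.dtd"

# Both map to the same key, "record.dtd", regardless of host or scheme.
print(cache_key(public_url), cache_key(internal_url))

# A basename that is unlikely to exist in the local cache ("status") would
# fall through to the urlopen branch in the quoted code.
print(cache_key("http://10.0.0.5:8080/admin/status"))
```

So an attacker only needs a URL whose basename is not present locally for the request to go out.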
That urlopen call allows an attacker to craft XML that, when parsed, makes the parser issue arbitrary(!) HTTP GET requests, which can be used to e.g. access internal network resources or cause denial of service. In other words, the code is vulnerable to server-side request forgery (SSRF) via a form of "doctype XXE" (XML external entity attack).
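
One possible direction for a guard, as a rough sketch only: check the URL against an allowlist before anything is handed to urlopen. This is not Biopython's code or a proposed patch; `is_allowed_schema_url`, `ALLOWED_HOSTS` and the host name `dtd.example.org` are assumptions made up for illustration, and a real fix would also have to consider redirects.

```python
from urllib.parse import urlsplit

# Assumption: the hosts a deployment is willing to fetch DTD/XSD files from.
ALLOWED_HOSTS = frozenset({"dtd.example.org"})


def is_allowed_schema_url(url, allowed_hosts=ALLOWED_HOSTS):
    """Return True only for https URLs whose host is on the allowlist."""
    try:
        parts = urlsplit(url)
    except ValueError:
        # Malformed URL (e.g. a broken IPv6 literal): refuse it.
        return False
    if parts.scheme != "https":
        return False
    # .hostname is lowercased and excludes port and userinfo, so
    # "https://dtd.example.org@evil.example/x.dtd" resolves to "evil.example".
    return parts.hostname in allowed_hosts


for candidate in (
    "https://dtd.example.org/schemas/record.dtd",
    "http://dtd.example.org/schemas/record.dtd",
    "http://169.254.169.254/latest/meta-data/",
    "https://dtd.example.org@evil.example/record.dtd",
):
    print(candidate, is_allowed_schema_url(candidate))
```

Requiring https in the same check would also cover the plain-HTTP case, though the allowlist itself is a deployment decision.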
Demo: biopython_doctype_xxe_demo.py
The very same code in Biopython is also vulnerable to MITM (due to not enforcing TLS) and cache poisoning (..), but that is secondary and likely goes away for free once the SSRF issue is plugged properly.
The official Python docs (>=3.15) now warn about problems like these at https://docs.python.org/3.15/library/pyexpat.html#xml.parsers.expat.xmlparser.ExternalEntityRefHandler .
I hope that you will find a complete fix for this problem, so that users are safe from attacks through Biopython's XML parser.
Thanks and best, Sebastian