Extract all internal and external links from a URL in Python.
Links-Extractor fetches one or more web pages and lists the internal and external hyperlinks found on each page. A link is treated as internal when its host matches the host of the page being scanned, and external otherwise. Empty anchors and javascript:, mailto:, and tel: links are ignored.
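The classification rule above can be sketched in a few lines. This is an illustrative sketch only, not the code in extractor.py: the names classify_links and IGNORED_SCHEMES are hypothetical, it uses the standard library's urllib.parse instead of the project's dependencies, and it assumes that "empty anchor" means a missing or blank href and that hosts are compared as exact, case-insensitive strings (so www.example.com and example.com would count as different hosts).

```python
from urllib.parse import urljoin, urlparse

# Hypothetical constant: link prefixes that the description says are ignored.
IGNORED_SCHEMES = ("javascript:", "mailto:", "tel:")


def classify_links(page_url, hrefs):
    """Split raw href values into (internal, external) lists of absolute URLs.

    Assumption: a link is internal when its host equals the host of
    page_url exactly (case-insensitive); anything else is external.
    """
    page_host = urlparse(page_url).netloc.lower()
    internal, external = [], []
    for href in hrefs:
        href = (href or "").strip()
        # Skip missing/blank hrefs and javascript:, mailto:, tel: links.
        if not href or href.lower().startswith(IGNORED_SCHEMES):
            continue
        # Resolve relative links against the page being scanned.
        absolute = urljoin(page_url, href)
        host = urlparse(absolute).netloc.lower()
        if host == page_host:
            internal.append(absolute)
        else:
            external.append(absolute)
    return internal, external


if __name__ == "__main__":
    sample = ["/about", "page.html", "https://www.python.org/", "mailto:a@b.c", ""]
    inside, outside = classify_links("https://example.com/docs/", sample)
    print(inside)
    print(outside)
```

Resolving each href against the page URL before comparing hosts means relative links such as /about are classified as internal without special handling.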
pip install links-extractor-cli
This installs the links-extractor command. You can also run the script directly from a clone (python3 extractor.py ...).
- Python 3
- Dependencies: requests, beautifulsoup4, lxml
Install them with:
pip install -r requirements.txt
Pass one or more URLs as arguments:
links-extractor https://example.com
python3 extractor.py https://example.com
python3 extractor.py https://example.com https://www.python.org
Redirect the output to a file:
python3 extractor.py https://example.com > out.txt
For each URL, the script prints the number of internal links and the links themselves, followed by the number of external links and those links.
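To make the shape of that report concrete, here is a rough sketch of a formatter that puts a count before each list. The function name format_report, the label text, and the indentation are all assumptions made for illustration; the script's actual wording and layout may differ.

```python
def format_report(url, internal, external):
    """Build a per-URL report: internal count and links, then external.

    Hypothetical layout; the labels below are not taken from extractor.py.
    """
    lines = [f"URL: {url}"]
    lines.append(f"Internal links ({len(internal)}):")
    lines.extend(f"  {link}" for link in internal)
    lines.append(f"External links ({len(external)}):")
    lines.extend(f"  {link}" for link in external)
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_report(
        "https://example.com",
        ["https://example.com/about"],
        ["https://www.python.org/"],
    ))
```

Because the report goes to standard output, redirecting it to a file as shown earlier captures the results for every URL passed on the command line.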
A full write-up is available at http://com.puter.tips/2016/12/extract-all-internal-and-external-links.html
You may also find the companion project useful: https://github.com/com-puter-tips/SEO-Analysis
If you use this software, please cite it using the metadata in CITATION.cff.
Distributed under the GNU General Public License v3.0. See LICENSE.