Discover the repository software by crawling the repository homepage.
The code in the repository uses Python 3+.
Create a CSV containing a list of repositories to be checked. It should look like this (without the header):
| Homepage URL | Country code | CORE ID | OPEN DOAR ID | ROAR ID |
|---|---|---|---|---|
| http://aura.abdn.ac.uk | GB | 1 | 1767 | |
| http://repository.abertay.ac.uk | GB | 2 | 1589 | |
| http://gtcni.openrepository.com | GB | 3 | 549 | |
| http://eprints.aktors.org | GB | 4 | 1 | 31 |
| http://repository.alt.ac.uk | GB | 5 | 961 |
There's an example input file you can use to test the app.
Afterwards, run:
pip install --upgrade pip && pip install -r requirements.txtmkdir results && mkdir repo_pagespython reposwdiscovery.py /path/to/your/input_file.csv 0- Using the example input provided in this repository:
python reposwdiscovery.py example_input.csv 0
- Using the example input provided in this repository:
cat results/results-Thread-* >> results-$(date +"%d%m%y").csvcat results/errors-Thread-* >> errors-$(date +"%d%m%y").csv
This will produce two CSV files with results, one containing successfully checked repositories and the other containing unsuccessfully processed repositories.