Commit 2149488

#38 added readme for crawler
1 parent e7a2614 commit 2149488

3 files changed

Lines changed: 27 additions & 2 deletions

DEPLOY_GITLAB_DOCKER.md

Lines changed: 1 addition & 1 deletion
@@ -41,7 +41,7 @@ docker pull sonatype/nexus3:3.79.0
 docker run -d -p 8081:8081 --name lasso-nexus sonatype/nexus3
 
 # configure
-docker exec -it nexus bash
+docker exec -it lasso-nexus bash
 cat sonatype-work/nexus3/admin.password
 # configure in repos in http://localhost:8081/ (lasso-deploy, lasso-web)
 # see https://softwareobservatorium.github.io/web/docs/infrastructure/nexus

arena/src/main/java/de/uni_mannheim/swt/lasso/arena/search/SolrInstance.java

Lines changed: 1 addition & 1 deletion
@@ -73,7 +73,7 @@ public static SolrInstance mavenCentral2017() {
     }
 
     public static SolrInstance mavenCentral2023() {
-        return new SolrInstance("mavencentral2023", "", "", "http://lassohp10.informatik.uni-mannheim.de:8983/solr/mavencentral2023/");
+        return new SolrInstance("mavencentral2023", "", "", "https://odisse.informatik.uni-mannheim.de/solr/mavencentral2023/");
     }
 
     public static SolrInstance secorpora2022() {
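
Since this hunk only swaps the Solr base URL, one quick way to sanity-check the new endpoint might be a zero-row match-all query against the core. The sketch below only builds that URL; the helper name `solr_count_url` is hypothetical, and it assumes the core exposes Solr's standard `select` handler and is reachable from where you run it.

```shell
#!/bin/sh
# Sketch only: helper name is hypothetical, not part of this commit.
set -eu

# Build a match-all, zero-row query URL for a Solr core base URL.
# The base URL is expected to end in "/" as in SolrInstance.java.
solr_count_url() {
  printf '%sselect?q=*:*&rows=0\n' "$1"
}

# Usage (needs network access, assumes the standard select handler is enabled):
#   curl -fsS "$(solr_count_url 'https://odisse.informatik.uni-mannheim.de/solr/mavencentral2023/')"
```

If the core answers, the response should report the number of matching documents without returning any rows.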

crawler/README.md

Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
+# Changes 08.05.25
+
+Issues with automatic download of index (too large to unzip programmatically?)
+
+## Manual Download of Index
+
+see https://maven.apache.org/repository/central-index.html
+
+see https://repo.maven.apache.org/maven2/.index/
+
+download the Central index: nexus-maven-repository-index.gz
+download Maven Indexer CLI and unpack the index to raw Lucene index directory:
+
+```bash
+java -jar indexer-cli-5.1.1.jar --unpack nexus-maven-repository-index.gz --destination central-lucene-index --type full
+```
+
+Results in same directory as created by crawler component
+
+Run crawler
+
+```bash
+wget "http://swtweb.informatik.uni-mannheim.de/nexus/repository/maven-snapshots/de/uni-mannheim/swt/lasso/crawler/1.0.0-SNAPSHOT/crawler-1.0.0-20250508.085922-88.jar" # identify latest snapshot ..
+java -Xms60G -Xmx60G -Dindexer.work.path=lasso_crawler -Dbatch.maven.repo.url=https://repo1.maven.org/maven2/ -Dlasso.indexer.worker.threads=8 -Dbatch.maven.index.update=false -Dbatch.maven.latest.head=1 -jar crawler-1.0.0-20250508.085922-88.jar
+```
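
The README's manual steps, including the "identify latest snapshot" note, could be wrapped in two small helpers. This is a minimal sketch, not part of the commit: the function names `unpack_cmd` and `latest_snapshot_jar` are hypothetical, the jar and directory names are copied from the README, and the snapshot lookup assumes the repository serves a standard `maven-metadata.xml` with `<timestamp>` and `<buildNumber>` elements next to the snapshot jars.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical; values are copied from crawler/README.md.
set -eu

# Override via environment if your jar or target directory differ.
INDEXER_JAR="${INDEXER_JAR:-indexer-cli-5.1.1.jar}"
DEST_DIR="${DEST_DIR:-central-lucene-index}"

# Print the Maven Indexer CLI unpack command for the given index file.
unpack_cmd() {
  printf 'java -jar %s --unpack %s --destination %s --type full\n' \
    "$INDEXER_JAR" "$1" "$DEST_DIR"
}

# Read a snapshot maven-metadata.xml on stdin and print the timestamped jar name.
# Assumes the standard <timestamp> and <buildNumber> elements are present.
latest_snapshot_jar() {
  artifact="$1"
  base_version="$2"
  meta="$(cat)"
  ts="$(printf '%s\n' "$meta" | sed -n 's:.*<timestamp>\([^<]*\)</timestamp>.*:\1:p' | head -n 1)"
  build="$(printf '%s\n' "$meta" | sed -n 's:.*<buildNumber>\([^<]*\)</buildNumber>.*:\1:p' | head -n 1)"
  printf '%s-%s-%s-%s.jar\n' "$artifact" "$base_version" "$ts" "$build"
}
```

Assuming the metadata file is available under the same `1.0.0-SNAPSHOT/` directory as the jar in the `wget` line, something like `curl -fsS "<that directory>/maven-metadata.xml" | latest_snapshot_jar crawler 1.0.0` would print the jar name to download, and `unpack_cmd nexus-maven-repository-index.gz` prints the unpack command shown in the README so it can be reviewed before running.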
