
Setup: SolrCloud on Kubernetes (Bitnami chart). Current version is 8.11 (also planning to move to 9).

I've tried various methods to get a large file (120 MB) loaded into KeepWordFilterFactory.

Main problem: ZooKeeper timing out. I then tried embedding the file in the image and loading it from there:

<filter class="solr.KeepWordFilterFactory" words="${keepwords.file.path}" ignoreCase="true"/>

The problem here is that SolrCloud prepends the configset path, so the file is looked up as /configs/coreName//opt/bitnami/solr/server/solr/custom_resources/keepwords.txt

That is, this prefix gets added: /configs/coreName/

I also tried sending it as ZooKeeper config, but I understand ZooKeeper is not designed to distribute such large files (increasing -Djute.maxbuffer is not enough).

Also checked managed resources, but these seem to only exist for stopwords and synonyms.

What would be the right way of loading such a file in config? (Note that I will probably need to move away from the KeepWordFilterFactory approach, but for now I would like to keep using it with the existing config, since it worked nicely.)

The exact error is:

org.apache.solr.common.SolrException: Error CREATEing SolrCore 'corename_shard1_replica_n1': Unable to create core [corename_shard1_replica_n1] Caused by: Invalid path string "/configs/corename//opt/bitnami/solr/server/solr/custom_resources/keepwords.txt" caused by empty node name specified @21

1 Answer


In SolrCloud you can't realistically load a 120 MB file through ZooKeeper, even with a larger -Djute.maxbuffer: ZooKeeper is meant for small configuration nodes, and its default limit is about 1 MB per node. Absolute paths fail because Solr treats them as ZK configset resources, which is where the /configs/coreName/ prefix comes from, unless you explicitly allow external paths.

The way to fix this is to put the file on a filesystem accessible to all Solr pods (e.g. via a Kubernetes PersistentVolume or by embedding it in the image) at a stable location such as /solr-extra/keepwords.txt, then start Solr with -Dsolr.allowPaths=/solr-extra -Dkeepwords.file.path=/solr-extra/keepwords.txt. In the Bitnami chart this can be passed through extraEnvVars or solrOpts.

In your schema you can then reference the file either with ${keepwords.file.path} or directly as an absolute path (words="/solr-extra/keepwords.txt"), and Solr will load it from disk rather than from ZooKeeper. This avoids the path mangling you saw (/configs/coreName/...). For a keepwords list of this size it is the approach I would rely on in SolrCloud; ZooKeeper and managed resources are unsuitable for files that large.
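As a rough sketch of the chart side, the values fragment below mounts a volume at /solr-extra and passes the two system properties. Several things here are assumptions rather than confirmed chart behaviour: that your chart version exposes extraEnvVars, extraVolumes and extraVolumeMounts, that the container's start script forwards SOLR_OPTS to the JVM, and that a PVC with the hypothetical name keepwords-pvc already holds keepwords.txt. Check the chart's values.yaml before relying on it.

```yaml
# values.yaml fragment (sketch only; verify key names against your chart version)
extraEnvVars:
  # Assumption: SOLR_OPTS is passed through to the Solr JVM by the image.
  - name: SOLR_OPTS
    value: "-Dsolr.allowPaths=/solr-extra -Dkeepwords.file.path=/solr-extra/keepwords.txt"

extraVolumes:
  - name: keepwords
    persistentVolumeClaim:
      claimName: keepwords-pvc   # hypothetical PVC that contains keepwords.txt
      readOnly: true

extraVolumeMounts:
  - name: keepwords
    mountPath: /solr-extra       # must match solr.allowPaths and keepwords.file.path
    readOnly: true
```

Since every Solr pod mounts the same claim, the underlying volume would need an access mode that allows multiple readers (ReadOnlyMany or ReadWriteMany). If you embed the file in the image instead, the two volume sections can be dropped and only the SOLR_OPTS entry remains.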
