Load external file in Solr Cloud

Question

Setup: SolrCloud on Kubernetes (Bitnami chart). Current version is 8.11 (I'm also looking to move to 9).

I've tried various methods to get a large file (120 MB) loaded into KeepWordFilterFactory.

The main problem was ZooKeeper timing out. I then tried embedding the file in the image and loading it from there:

<filter class="solr.KeepWordFilterFactory" words="${keepwords.file.path}" ignoreCase="true"/>

The problem here is that SolrCloud prepends /configs/coreName/ to the path, producing /configs/coreName//opt/bitnami/solr/server/solr/custom_resources/keepwords.txt

I also tried sending it as ZooKeeper config, but I understand ZooKeeper is not designed to distribute such large files (increasing -Djute.maxbuffer is not enough).

I also checked managed resources, but these seem to exist only for stopwords and synonyms.

What would be the right way of loading such a file in the config? (Note that I probably need to move away from the KeepWordFilterFactory approach, but for now I would like to keep using it, since it worked nicely with the existing config.)

The exact error is:

org.apache.solr.common.SolrException: Error CREATEing SolrCore 'corename_shard1_replica_n1': Unable to create core [corename_shard1_replica_n1] Caused by: Invalid path string "/configs/corename//opt/bitnami/solr/server/solr/custom_resources/keepwords.txt" caused by empty node name specified @21

Harpreet · Accepted Answer · 2025-09-25 12:03:35Z

In SolrCloud you can't load a 120 MB file into ZooKeeper (even with -Djute.maxbuffer), and absolute paths fail because Solr treats them as ZooKeeper configset resources unless you explicitly allow external paths.

To fix this, mount the file on a filesystem accessible to all Solr pods (e.g. via a Kubernetes PersistentVolume or by embedding it in the image) at a stable location such as /solr-extra/keepwords.txt, then start Solr with -Dsolr.allowPaths=/solr-extra -Dkeepwords.file.path=/solr-extra/keepwords.txt (in the Bitnami chart this can be passed through extraEnvVars or solrOpts).

In your schema you can then reference the file either with ${keepwords.file.path} or directly as an absolute path (words="/solr-extra/keepwords.txt"), and Solr will load it from disk rather than from ZooKeeper. This avoids the path mangling you saw (/configs/coreName/...) and is the only reliable way to use a large keepwords list in SolrCloud; ZooKeeper and managed resources are unsuitable for files of that size.
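
As a rough sketch of how those two flags might be handed to Solr, the snippet below appends them to the SOLR_OPTS environment variable, which the stock bin/solr script reads at startup. It assumes the container's start script honors SOLR_OPTS in the same way; the /solr-extra mount point is the example location from above, and the KEEPWORDS_DIR and KEEPWORDS_FILE variable names are made up for illustration.

```shell
# Example mount point used above; change it to wherever the file is actually mounted.
KEEPWORDS_DIR=/solr-extra
KEEPWORDS_FILE="$KEEPWORDS_DIR/keepwords.txt"

# Append to any options that are already set instead of overwriting them.
# solr.allowPaths permits paths outside the Solr home; keepwords.file.path
# fills the ${keepwords.file.path} placeholder in the schema.
SOLR_OPTS="${SOLR_OPTS:-} -Dsolr.allowPaths=$KEEPWORDS_DIR -Dkeepwords.file.path=$KEEPWORDS_FILE"
export SOLR_OPTS
```

On Kubernetes the same string would typically be set as an environment variable on the Solr pods rather than exported by hand, and every pod needs the file at the same path.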
