How to bulk load ~1M JSON files into Redis?

Question

Here is my current code in bash (based on https://redis.io/docs/latest/develop/use/patterns/bulk-loading):

head=$(redis-cli -h $redis_server get .git-head)
if [[ ! $head ]]; then
    redis-cli -h $redis_server flushdb
    for fileOrFolder in $(ls -1); do
        time {
            find $fileOrFolder -type f |
                LC_ALL=C xargs -n1 bash -c 'echo -e *3\\r\\n\$3\\r\\nSET\\r\\n\$${#0}\\r\\n$0\\r\\n\$$(stat -c%s $0)\\r && cat $0 && echo -e \\r' |
                redis-cli -h $redis_server --pipe
        }
    done
    redis-cli -h $redis_server set .git-head $(git rev-parse HEAD)
fi

It works, but there is a problem: the values are stored as the string type, not JSON. At least, that is what Redis Insight tells me.

My question is: how can I modify the code so that the stored values are of the JSON type while keeping the bulk load performance? Or should I modify it at all? Maybe the string type is good enough and would not limit our query abilities.

EDIT 1

To emphasize: the files are all JSON files already. Storing them as JSON would mean storing them exactly as they are, while indicating to Redis that the values are valid JSON.

dongocanh96 · Accepted Answer · 2024-09-09 03:10:07Z


If you don't need to perform complex queries on the data itself, then using strings is fine.
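To make that trade-off concrete, here is a rough sketch of reading a single field under each storage choice. The function names, the host argument, and the example field are hypothetical; the first function assumes `jq` is installed on the client, and the second assumes the server has JSON support (the RedisJSON module or a build that bundles it).

```shell
#!/usr/bin/env bash
# Sketch only: two ways to read a single field from a stored document.
# Function names and the example path are hypothetical, not from the question.

# String type: Redis returns the whole document; filtering happens client-side.
# Requires jq on the client. Example filter: '.name'
field_from_string() {
    local host=$1 key=$2 jq_filter=$3
    redis-cli -h "$host" GET "$key" | jq "$jq_filter"
}

# JSON type: the server evaluates a JSONPath and returns only the match.
# Requires JSON support on the server. Example path: '$.name'
field_from_json() {
    local host=$1 key=$2 json_path=$3
    redis-cli -h "$host" JSON.GET "$key" "$json_path"
}
```

With strings, every read transfers the full value and the client does the filtering; with the JSON type, path evaluation moves to the server.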

If you still want to store the values as the JSON type, use the JSON.SET command instead of SET.

You can modify your code as follows:

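time {
        # redis_server must be exported so the bash started by xargs can see it.
        # jq -Rs reads the raw file and emits it as one escaped JSON string.
        find "$fileOrFolder" -type f |
            LC_ALL=C xargs -n1 bash -c 'file_content=$(jq -Rs . < "$0"); \
            file_size=$(stat -c%s "$0"); \
            redis-cli -h "$redis_server" json.set "$0" . "{\"file\":\"$0\",\"size\":\"$file_size\",\"content\":$file_content}"'
    }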
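The loop above starts one `redis-cli` process per file, so it no longer uses pipe mode. As a minimal sketch of an alternative, the function below emits `JSON.SET` commands in the Redis protocol so they can be fed to `redis-cli --pipe`, the same way the question does for `SET`. It assumes the server has JSON support, that each file is valid JSON, and that file paths are used as keys as in the question; `emit_json_set` is a hypothetical helper name, and the `$` root path assumes RedisJSON 2.0 or later (older versions used `.`).

```shell
#!/usr/bin/env bash
# Sketch: emit one RESP-encoded "JSON.SET <key> $ <file contents>" command.
# emit_json_set is a hypothetical helper name, not part of redis-cli.
emit_json_set() {
    local key=$1 file=$2
    local key_len val_len
    # RESP bulk strings are prefixed with their length in bytes.
    key_len=$(printf '%s' "$key" | wc -c)
    val_len=$(wc -c < "$file")
    # *4 = four arguments: JSON.SET, key, path ($ = document root), value
    printf '*4\r\n$8\r\nJSON.SET\r\n$%d\r\n%s\r\n$1\r\n$\r\n$%d\r\n' \
        "$key_len" "$key" "$val_len"
    cat "$file"          # raw bytes, exactly as stored on disk
    printf '\r\n'
}

# Usage (mirrors the loop in the question; redis_server is the question's variable):
#   find "$fileOrFolder" -type f |
#       while IFS= read -r f; do emit_json_set "$f" "$f"; done |
#       redis-cli -h "$redis_server" --pipe
```

This keeps a single pipelined connection, so the per-command round trips are avoided. The server still has to parse every document into its internal representation, so the load should not be expected to match plain `SET` throughput.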

edited Sep 9, 2024 at 3:10

answered Sep 9, 2024 at 3:04

dongocanh96


5 Comments

OneCricketeer Over a year ago

What if $file_content has quotes or other unescaped characters?

mark Over a year ago

The files are already valid JSON. But you are eliminating the bulk load feature completely here. Let me emphasize in the question that I still want to have the bulk load performance.

dongocanh96 Over a year ago

@mark I forgot about that. I think using jq is enough to handle quotes and other unescaped characters.

Lior Kogan Over a year ago

Note that there is no "bulk load" for the JSON data structure. Redis will need to parse your input and construct an internal tree representation. JSON documents are not stored internally as strings. If you don't plan to use JSON.XXX commands or to index and search JSON documents with FT.XXX commands, you are better off using strings.

mark Over a year ago

So it does not make sense to pipe JSON.SET commands the same way as SET commands, because redis-cli --pipe would not be as efficient in that case? I thought redis-cli --pipe would spare the 1M individual round trips.
