0

I'm trying to calculate a hash from a string for best-effort ordering and partioning purposes in Athena. There is no String to hashCode() similar in Athena, so as a best effort, I try to get the 2nd character and calculate its codepoint and get the modulus. (As I said, best effort, maybe a nice effort)

Consider the query:

SELECT
    doc_id,
    substring(doc_id, 2, 1),
    typeof(split(substring(doc_id, 2, 1)))
FROM events LIMIT 100

The 3rd row returns a varchar but the codepoint function expects a varchar(1) and casting it does not work as cast(substring(doc_id, 2, 1) as varchar(1)).

FUNCTION_NOT_FOUND: line 6:5: Unexpected parameters (varchar) for function codepoint. Expected: codepoint(varchar(1))

How can I accomplish this task without modifiying the data source? I'm open to ideas.

1 Answer 1

3

You can compute a hash code with the xxhash64 function. It takes a varbinary as input, so first cast the string to that type. Since the function also returns a 64-bit varbinary value, you can convert it to a bigint via the from_big_endian_64 function

WITH t(x) AS (VALUES 'hello')
SELECT from_big_endian_64(xxhash64(cast(x AS varbinary)))
FROM t

output:

        _col0
---------------------
 2794345569481354659
(1 row)
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.