
I have a list of roughly 11,000 strings that I need to partially match against string IDs in a BigQuery table.

T1 is the original BigQuery table and T2 is a table with a single column holding the 11,000 strings.

At the minute I have the following query:

SELECT T1.ID, T2.ID
FROM T1
JOIN T2
  ON T1.ID LIKE '%' || T2.ID || '%'

I have tested this on a subset of the data and it works as expected, producing:

T1.ID      T2.ID
15688463   56884
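
For reference, the match can be reproduced end-to-end with inline sample data (a minimal sketch using the values above, with CTEs standing in for the real tables):

-- Self-contained reproduction of the LIKE join on the sample values.
WITH T1 AS (SELECT '15688463' AS ID),
     T2 AS (SELECT '56884' AS ID)
SELECT T1.ID AS t1_id, T2.ID AS t2_id
FROM T1
JOIN T2
  ON T1.ID LIKE '%' || T2.ID || '%'
-- Returns one row: 15688463, 56884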

But I am aware that when I scale this to the full population it will be a very expensive query.

Is there a way that I can optimise this query, or another method for getting the result (other than splitting the 11,000 strings into subsets)?
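
One possible rewrite, assuming the 11,000 strings all share a fixed length (they are 5 characters in the example above, though the question doesn't state this holds generally): expand each T1.ID into its 5-character substrings once, then use an equality join, which BigQuery can execute as a hash join instead of evaluating LIKE across every pair. A sketch:

-- Hedged sketch: only valid if every T2.ID is exactly 5 characters long.
-- UNNEST(GENERATE_ARRAY(...)) enumerates each starting position of a
-- 5-character window inside T1.ID; DISTINCT removes duplicate matches
-- when the same substring occurs at more than one position.
SELECT DISTINCT t1.ID AS t1_id, t2.ID AS t2_id
FROM T1 AS t1,
     UNNEST(GENERATE_ARRAY(1, LENGTH(t1.ID) - 4)) AS pos
JOIN T2 AS t2
  ON SUBSTR(t1.ID, pos, 5) = t2.ID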

Thanks in advance

  • If you want all the partial matches you have no choice but to test all the pairs. Commented Jun 26, 2024 at 15:42
  • For large tables, take a look at the Search Index feature (see the sketch after this list). Commented Jun 26, 2024 at 16:22
  • My experience is that BigQuery is superb for massive queries; 11,000 records doesn't feel like a lot, and some BigQuery tables have billions of rows. Do you really mean to look for the T2.ID contained "within" the T1.ID, as opposed to a direct equality match? Commented Jun 29, 2024 at 17:34
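
A hedged sketch of the Search Index suggestion from the comment above (dataset and index names are placeholders). One caveat worth noting: SEARCH() matches whole tokens produced by the text analyzer, so with the default analyzer it will not find '56884' inside the single token '15688463'; it mainly helps when the T2 strings occur as delimited words inside T1.ID.

-- Placeholder names; illustrates the Search Index feature, not a drop-in fix.
CREATE SEARCH INDEX id_search_idx
ON my_dataset.T1 (ID);

SELECT t1.ID AS t1_id, t2.ID AS t2_id
FROM my_dataset.T1 AS t1
JOIN my_dataset.T2 AS t2
  ON SEARCH(t1.ID, t2.ID)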

