
I have looked all over for an answer to this and tried everything, but nothing seems to work. I'm trying to reference a variable assignment within a spark.sql query in Python. I'm running Python 3 and Spark 2.3.1.

bkt = 1

prime = spark.sql(s"SELECT ((year(fdr_date))*100)+month(fdr_date) as fdr_year, count(*) as counts\
                FROM pwrcrv_tmp\
                where EXTR_CURR_NUM_CYC_DLQ=$bkt\
                and EXTR_ACCOUNT_TYPE in('PS','PT','PD','PC','HV','PA')\
                group by ((year(fdr_date))*100)+month(fdr_date)\
                order by ((year(fdr_date))*100)+month(fdr_date)")

prime.show(50)

The error:

prime = spark.sql(s"SELECT ((year(fdr_date))*100)+month(fdr_date) as fdr_year, count(*) as counts                FROM pwrcrv_tmp         where EXTR_CURR_NUM_CYC_DLQ=$bkt                and EXTR_ACCOUNT_TYPE in('PS','PT','PD','PC','HV','PA')                group by ((year(fdr_date))*100)+month(fdr_date)                order by ((year(fdr_date))*100)+month(fdr_date)")
                                                                                                                                                                                                                                                                                                                                                                                         ^
SyntaxError: invalid syntax
  • bkt = 1 prime = spark.sql("SELECT ((year(fdr_date))*100)+month(fdr_date) as fdr_year, count(*) as counts\ FROM pwrcrv_tmp\ where EXTR_CURR_NUM_CYC_DLQ="%bkt%"\ and EXTR_ACCOUNT_TYPE in('PS','PT','PD','PC','HV','PA')\ group by ((year(fdr_date))*100)+month(fdr_date)\ order by ((year(fdr_date))*100)+month(fdr_date)") prime.show(50) Commented Nov 1, 2019 at 14:34
  • Is this a question? Not sure why you've posted more code in a comment, as well. Please read How to Ask a Question too. Commented Nov 1, 2019 at 14:38
  • First of all s"..." is a syntax error - what is that supposed to mean? Secondly, trying to format a string with $bkt is not valid python syntax. Look up String formatting in python Commented Nov 1, 2019 at 14:44
  • The title of my post is my question. I got the s"..." from this answer, which was marked correct on Stack Overflow: stackoverflow.com/questions/37284216/… Commented Nov 1, 2019 at 14:57
  • @email83 I don't know what language that is, but the answer you're looking for is this one: stackoverflow.com/a/37284354/5858851 Commented Nov 1, 2019 at 15:49
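For context, the s"..." form flagged in the comments is Scala string interpolation (the linked answer is Scala code), so $bkt is never substituted in Python. The closest Python 3.6+ equivalent is an f-string; a minimal sketch, assuming the spark session and pwrcrv_tmp temp view from the question:

bkt = 1

# Scala interpolates with s"... $bkt"; Python 3.6+ uses an f-string with {bkt} instead.
prime = spark.sql(f"SELECT count(*) AS counts FROM pwrcrv_tmp WHERE EXTR_CURR_NUM_CYC_DLQ = {bkt}")
prime.show()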

3 Answers


I found the correct syntax buried in this Databricks post:

https://forums.databricks.com/questions/115/how-do-i-pass-parameters-to-my-sql-statements.html

You add a lower case f in front of the query and wrap braces around the name of the variable in the query.

bkt = 1

prime = spark.sql(f"SELECT ((year(fdr_date))*100)+month(fdr_date) as fdr_year, count(*) as counts\
            FROM pwrcrv_tmp\
            where EXTR_CURR_NUM_CYC_DLQ={bkt}\
            and EXTR_ACCOUNT_TYPE in('PS','PT','PD','PC','HV','PA')\
            group by ((year(fdr_date))*100)+month(fdr_date)\
            order by ((year(fdr_date))*100)+month(fdr_date)")


prime.show(50)

1 Comment

Greetings from 2023! I just wanted to add that this same method also works with multiline query strings expressed with triple quotes like: q = spark.sql(f"""SELECT * FROM {bkt}""")

This should work:

p_filename = 'some value'
z = 'some value'

# Build the INSERT statement with str.format(), then run it.
query = "INSERT INTO default.loginfordetails (filename, logdesc) VALUES ('{}', '{}')".format(p_filename, z)

spark.sql(query)
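Either way, it is easier to debug if the SQL text is built as a plain string and printed before being passed to spark.sql. A minimal sketch, assuming the same spark session and the pwrcrv_tmp view from the question:

bkt = 1

# Build the statement first so the rendered SQL can be inspected before it runs.
query = "SELECT count(*) AS counts FROM pwrcrv_tmp WHERE EXTR_CURR_NUM_CYC_DLQ = {}".format(bkt)
print(query)  # SELECT count(*) AS counts FROM pwrcrv_tmp WHERE EXTR_CURR_NUM_CYC_DLQ = 1

counts = spark.sql(query)
counts.show()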


Since your query spans multiple lines, it is better style to use triple-quoted strings (""" """) and drop the trailing backslashes altogether; backslash continuations cause plenty of parsing problems in Python and PySpark, for example when converting a notebook to a script with nbconvert. Then put the variable in braces {} inside the query (with an f-string) or use .format(bkt):

option {} (f-string):

bkt = 1

prime = spark.sql(f"""SELECT ((year(fdr_date))*100)+month(fdr_date) as fdr_year, count(*) as counts
                FROM pwrcrv_tmp
                where EXTR_CURR_NUM_CYC_DLQ={bkt}
                and EXTR_ACCOUNT_TYPE in('PS','PT','PD','PC','HV','PA')
                group by ((year(fdr_date))*100)+month(fdr_date)
                order by ((year(fdr_date))*100)+month(fdr_date)""")

prime.show(50)

option .format():

bkt = 1

prime = spark.sql("""SELECT ((year(fdr_date))*100)+month(fdr_date) as fdr_year, count(*) as counts
                FROM pwrcrv_tmp
                where EXTR_CURR_NUM_CYC_DLQ={}
                and EXTR_ACCOUNT_TYPE in('PS','PT','PD','PC','HV','PA')
                group by ((year(fdr_date))*100)+month(fdr_date)
                order by ((year(fdr_date))*100)+month(fdr_date)""".format(bkt))

prime.show(50)

