
I have looked all over for an answer to this and tried everything, but nothing seems to work. I'm trying to reference a variable assignment within a spark.sql query in Python. I'm running Python 3 and Spark 2.3.1.

bkt = 1

prime = spark.sql(s"SELECT ((year(fdr_date))*100)+month(fdr_date) as fdr_year, count(*) as counts\
                FROM pwrcrv_tmp\
                where EXTR_CURR_NUM_CYC_DLQ=$bkt\
                and EXTR_ACCOUNT_TYPE in('PS','PT','PD','PC','HV','PA')\
                group by ((year(fdr_date))*100)+month(fdr_date)\
                order by ((year(fdr_date))*100)+month(fdr_date)")

prime.show(50)

The error:

prime = spark.sql(s"SELECT ((year(fdr_date))*100)+month(fdr_date) as fdr_year, count(*) as counts                FROM pwrcrv_tmp         where EXTR_CURR_NUM_CYC_DLQ=$bkt                and EXTR_ACCOUNT_TYPE in('PS','PT','PD','PC','HV','PA')                group by ((year(fdr_date))*100)+month(fdr_date)                order by ((year(fdr_date))*100)+month(fdr_date)")
                                                                                                                                                                                                                                                                                                                                                                                         ^
SyntaxError: invalid syntax
  • bkt = 1 prime = spark.sql("SELECT ((year(fdr_date))*100)+month(fdr_date) as fdr_year, count(*) as counts\ FROM pwrcrv_tmp\ where EXTR_CURR_NUM_CYC_DLQ="%bkt%"\ and EXTR_ACCOUNT_TYPE in('PS','PT','PD','PC','HV','PA')\ group by ((year(fdr_date))*100)+month(fdr_date)\ order by ((year(fdr_date))*100)+month(fdr_date)") prime.show(50) Commented Nov 1, 2019 at 14:34
  • Is this a question? Not sure why you've posted more code in a comment, as well. Please read How to Ask a Question too. Commented Nov 1, 2019 at 14:38
  • First of all s"..." is a syntax error - what is that supposed to mean? Secondly, trying to format a string with $bkt is not valid python syntax. Look up String formatting in python Commented Nov 1, 2019 at 14:44
  • The title of my post is my question. I got the s"..." from this answer, which was marked correct on Stack Overflow: stackoverflow.com/questions/37284216/… Commented Nov 1, 2019 at 14:57
  • @email83 I don't know what language that is, but the answer you're looking for is this one: stackoverflow.com/a/37284354/5858851 Commented Nov 1, 2019 at 15:49
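For context, the s"..." form flagged in the comments is Scala string interpolation (the linked answer is Scala code), so $bkt is never substituted in Python. The closest Python 3.6+ equivalent is an f-string; a minimal sketch, assuming the spark session and pwrcrv_tmp temp view from the question:

bkt = 1

# Scala interpolates with s"... $bkt"; Python 3.6+ uses an f-string with {bkt} instead.
prime = spark.sql(f"SELECT count(*) AS counts FROM pwrcrv_tmp WHERE EXTR_CURR_NUM_CYC_DLQ = {bkt}")
prime.show()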

3 Answers


I found the correct syntax buried in this Databricks post:

https://forums.databricks.com/questions/115/how-do-i-pass-parameters-to-my-sql-statements.html

You add a lower case f in front of the query and wrap braces around the name of the variable in the query.

bkt = 1

prime = spark.sql(f"SELECT ((year(fdr_date))*100)+month(fdr_date) as fdr_year, count(*) as counts\
            FROM pwrcrv_tmp\
            where EXTR_CURR_NUM_CYC_DLQ={bkt}\
            and EXTR_ACCOUNT_TYPE in('PS','PT','PD','PC','HV','PA')\
            group by ((year(fdr_date))*100)+month(fdr_date)\
            order by ((year(fdr_date))*100)+month(fdr_date)")


prime.show(50)

1 Comment

Greetings from 2023! I just wanted to add that this same method also works with multiline query strings expressed with triple quotes like: q = spark.sql(f"""SELECT * FROM {bkt}""")

This should work:

p_filename = 'some value'
z = 'some value'

# Build the INSERT statement with str.format(), then run it.
query = "INSERT INTO default.loginfordetails (filename, logdesc) VALUES ('{}', '{}')".format(p_filename, z)

spark.sql(query)
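Either way, it is easier to debug if the SQL text is built as a plain string and printed before being passed to spark.sql. A minimal sketch, assuming the same spark session and the pwrcrv_tmp view from the question:

bkt = 1

# Build the statement first so the rendered SQL can be inspected before it runs.
query = "SELECT count(*) AS counts FROM pwrcrv_tmp WHERE EXTR_CURR_NUM_CYC_DLQ = {}".format(bkt)
print(query)  # SELECT count(*) AS counts FROM pwrcrv_tmp WHERE EXTR_CURR_NUM_CYC_DLQ = 1

counts = spark.sql(query)
counts.show()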


Since your query spans multiple lines, it is better style to use triple-quoted strings (""" """) and drop the trailing backslashes altogether; backslash continuations cause plenty of parsing problems in Python and PySpark, for example when converting a notebook to a script with nbconvert. Then put the variable in braces {} inside the query (with an f-string) or use .format(bkt):

option {} (f-string):

bkt = 1

prime = spark.sql(f"""SELECT ((year(fdr_date))*100)+month(fdr_date) as fdr_year, count(*) as counts
                FROM pwrcrv_tmp
                where EXTR_CURR_NUM_CYC_DLQ={bkt}
                and EXTR_ACCOUNT_TYPE in('PS','PT','PD','PC','HV','PA')
                group by ((year(fdr_date))*100)+month(fdr_date)
                order by ((year(fdr_date))*100)+month(fdr_date)""")

prime.show(50)

option .format():

bkt = 1

prime = spark.sql("""SELECT ((year(fdr_date))*100)+month(fdr_date) as fdr_year, count(*) as counts
                FROM pwrcrv_tmp
                where EXTR_CURR_NUM_CYC_DLQ={}
                and EXTR_ACCOUNT_TYPE in('PS','PT','PD','PC','HV','PA')
                group by ((year(fdr_date))*100)+month(fdr_date)
                order by ((year(fdr_date))*100)+month(fdr_date)""".format(bkt))

prime.show(50)

