
I have a dataframe with a lot of entries similar to the table on the left below. I want to query it using SQL to get a result like the table on the right below, so that I can plot a stacked bar chart where each bar represents a state and the severity counts S03 and S04 stack on top of each other.

+--+-----+--------+
|ID|State|Severity|
+--+-----+--------+
|01| NY  | 3      |        +-----+---+---+
|02| CA  | 4      |        |State|S03|S04|
|03| NY  | 4      |    =>  +-----+---+---+
|04| CA  | 3      |        | CA  | 1 | 3 |
|05| CA  | 4      |        | NY  | 1 | 1 |
|06| CA  | 4      |

I tried the following SQL query, but it gives the same value for every row in S03, and likewise for S04.

city_accidents = spark.sql("\
    SELECT State, \
    (SELECT COUNT(ID) AS Count FROM us_accidents WHERE Severity = 3 ) AS S03, \
    (SELECT COUNT(ID) AS Count FROM us_accidents WHERE Severity = 4 ) AS S04 \
    FROM accidents \
    GROUP BY State \
    ORDER BY State DESC LIMIT 10")
city_accidents.show()
+-----+---+---+
|State|S03|S04|
+-----+---+---+
| NY  | 1 | 3 |
| CA  | 1 | 3 |

That is probably because I haven't added any filter to the inner SELECT statements to restrict them to the current state. Is there a way to reference the outer query's columns inside those inner SELECT statements? What I mean is something like changing the inner SELECT statements to (SELECT COUNT(ID) AS Count FROM us_accidents WHERE Severity = 3 AND State = this.State) AS S03..

2 Answers


You can try the approach below:

city_accidents = spark.sql("\
    SELECT State, \
    COUNT(case when Severity = 3 then ID end) AS S03, \
    COUNT(case when Severity = 4 then ID end) AS S04 \
    FROM accidents \
    GROUP BY State \
    ORDER BY State DESC LIMIT 10")
city_accidents.show()
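
Since the question mentions plotting a stacked bar chart, here is a minimal sketch of one way to do that from the aggregated result, assuming pandas and matplotlib are available alongside PySpark (they are not shown in the original post):

# Minimal sketch: pull the small aggregated result into pandas and plot it.
# Assumes pandas and matplotlib are installed; column names match the query above.
import matplotlib.pyplot as plt

pdf = city_accidents.toPandas().set_index("State")   # one row per state
pdf[["S03", "S04"]].plot(kind="bar", stacked=True)   # S03 and S04 stack within each bar
plt.ylabel("Accident count")
plt.show()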

2 Comments

Thank you for this too. I wonder which one to choose as the right answer as both of them do what I want :)
@Blogger: As the OP you can accept whatever answer you like. However, in the case of equivalent answers, people often choose to accept the first one. Juergen answered a minute and a half before this. (That said, you might prefer this one because it has the full spark.sql statement, or for some other reason.)
SELECT State,
       sum(case when Severity = 3 then 1 else 0 end) AS S03,
       sum(case when Severity = 4 then 1 else 0 end) AS S04
FROM accidents 
GROUP BY State 
ORDER BY State DESC 
LIMIT 10
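
For completeness, roughly the same pivot can be expressed with the DataFrame API instead of SQL. This is only a sketch, assuming the data is also available as a DataFrame, here called accidents_df (a hypothetical name, not from the original post):

# Sketch: severity pivot with the DataFrame API.
# accidents_df is a hypothetical DataFrame holding the same data as the accidents table.
result = (accidents_df
          .groupBy("State")
          .pivot("Severity", [3, 4])        # one column per listed severity value
          .count()
          .withColumnRenamed("3", "S03")
          .withColumnRenamed("4", "S04")
          .orderBy("State", ascending=False)
          .limit(10))
result.show()

Note that pivot(...).count() returns null (not 0) for a state with no rows for a given severity, unlike the sum(case ... else 0 end) form above.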
