I have a dataframe with a lot of entries similar to the table on the left below. I want to query it with SQL to get a result like the table on the right below, so that I can plot a stacked bar chart where each bar represents a state and the severity counts S03 and S04 stack on top of each other.
+--+-----+--------+
|ID|State|Severity|
+--+-----+--------+
|01| NY  |   3    |
|02| CA  |   4    |      +-----+---+---+
|03| NY  |   4    |      |State|S03|S04|
|04| CA  |   3    |  =>  +-----+---+---+
|05| CA  |   4    |      | CA  | 1 | 3 |
|06| CA  |   4    |      | NY  | 1 | 1 |
+--+-----+--------+      +-----+---+---+
I tried the following SQL query, but it returns the same S03 value for every state, and likewise for S04.
city_accidents = spark.sql("""
    SELECT State,
           (SELECT COUNT(ID) FROM us_accidents WHERE Severity = 3) AS S03,
           (SELECT COUNT(ID) FROM us_accidents WHERE Severity = 4) AS S04
    FROM accidents
    GROUP BY State
    ORDER BY State DESC
    LIMIT 10
""")
city_accidents.show()
+-----+---+---+
|State|S03|S04|
+-----+---+---+
| NY  | 1 | 3 |
| CA  | 1 | 3 |
+-----+---+---+
That is probably because I haven't put any filter on State in the inner SELECT statements, so they count over the whole table. Is there a way to reference the outer query's columns inside those inner SELECTs? What I mean is something like changing them to `(SELECT COUNT(ID) FROM us_accidents WHERE Severity = 3 AND State = this.State) AS S03`.
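For reference, here is a minimal runnable sketch of the shape of result I'm after, using conditional aggregation (`SUM(CASE WHEN ...)`) instead of per-column subqueries. I'm using an in-memory SQLite table as a stand-in for the Spark session; the table name and sample rows mirror the example above, and I believe the same SELECT would work unchanged inside `spark.sql(...)`, though I haven't verified that part.

```python
import sqlite3

# Stand-in for the Spark table, populated with the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE us_accidents (ID TEXT, State TEXT, Severity INTEGER)")
conn.executemany(
    "INSERT INTO us_accidents VALUES (?, ?, ?)",
    [("01", "NY", 3), ("02", "CA", 4), ("03", "NY", 4),
     ("04", "CA", 3), ("05", "CA", 4), ("06", "CA", 4)],
)

# Conditional aggregation: one scan of the table, one row per state,
# with each severity counted into its own column.
rows = conn.execute("""
    SELECT State,
           SUM(CASE WHEN Severity = 3 THEN 1 ELSE 0 END) AS S03,
           SUM(CASE WHEN Severity = 4 THEN 1 ELSE 0 END) AS S04
    FROM us_accidents
    GROUP BY State
    ORDER BY State
""").fetchall()
print(rows)  # [('CA', 1, 3), ('NY', 1, 1)]
```

This avoids the correlated-subquery question entirely: the CASE expression does the per-state filtering that the inner SELECTs were missing.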