I have a dataset with multiple rows of data for each subject. This dataset is very long, and each person has dozens of rows of each session they are in. I want to collapse this dataset into a more manageable format. The current dataset looks something like this:
| name | session | personal vote | party vote | Voted with party? |
|---|---|---|---|---|
| John | 114 | yea | nay | no |
| John | 114 | yea | yea | yes |
| John | 114 | nay | nay | yes |
| John | 115 | yea | yea | yes |
| John | 116 | yea | nay | no |
| John | 116 | yea | yea | yes |
| Tim | 114 | nay | yea | no |
| Tim | 115 | nay | yea | no |
| Tim | 115 | nay | nay | yes |
| Tim | 116 | yea | nay | no |
I want to use this data to create a new dataframe where each person gets one row for each session. In this new dataset I want to create two new variables. The first, votecount, is just a raw count of the number of votes that person had in that session. The second new variable, party loyalty, should indicate what proportion of the person's votes were aligned with the how the party voted. It would look something like this:
| name | session | votecount | party loyalty |
|---|---|---|---|
| John | 114 | 3 | .66 |
| John | 115 | 1 | 1 |
| John | 116 | 2 | .5 |
| Tim | 114 | 1 | 0 |
| Tim | 115 | 2 | .5 |
| Tim | 116 | 1 | 0 |
Any help would be greatly appreciated!