I want to use Python to read an Excel file and transform it into a different structure (see the example).
- To the left of the red-marked area there are about 15 more columns.
- The red-marked area continues for 5 more years to the right.
My current approach is as follows:
- Read the black area (it always has a fixed number of columns, so I know where it starts and ends)
- Read the remaining red area
- Merge the data
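For the merge step, one option is to join on the worksheet row position, because both areas are read from the same rows. This minimal sketch assumes the red area has already been reshaped to one row per value and carries a `row` column recording which data row each value came from. `black`, `red_long`, `row` and the `ID` column are hypothetical placeholders, and only the S/A values are shown for brevity.

```python
import pandas as pd

# Hypothetical results of the two reads. In the real file the black block has
# ~15 fixed columns ("ID" is a made-up placeholder) and the red block has been
# reshaped to one row per value (only the S/A values are shown here).
black = pd.DataFrame({'ID': [101, 102]})  # one row per worksheet data row
red_long = pd.DataFrame({
    'row': [0, 0, 1, 1],  # position of the worksheet data row each value came from
    'Country': ['Australia', 'Australia', 'New Zealand', 'New Zealand'],
    'Year': ['2024', '2025', '2024', '2025'],
    'Value': [-1.0, -3.0, -2.0, -4.0],
})

# Both blocks come from the same worksheet rows, so the row position is a
# usable join key even if the black block has no unique business key.
# join(on='row') matches red_long['row'] against black's index (a left join).
merged = red_long.join(black, on='row').drop(columns='row')
```

A left join keeps every reshaped value and its original order; `how='inner'` would instead drop values whose row has no counterpart in the black area.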
Python code that generates the red-marked area above as a DataFrame:
```python
import pandas as pd

df = pd.DataFrame({'col1': {0: 'Year', 1: 'Option', 2: 'Category', 3: 'Type', 4: 'Country', 5: 'Australia', 6: 'New Zealand'},
                   'col2': {0: '2024', 1: 'S', 2: 'FTE', 3: 'A', 4: '', 5: '-1,0', 6: '-2,0'},
                   'col3': {0: '', 1: '', 2: 'Budget', 3: 'B', 4: 'EUR', 5: '-100,5', 6: '-200,5'},
                   'col4': {0: '', 1: '', 2: '', 3: 'C', 4: 'EUR', 5: '-1000', 6: '-2000'},
                   'col5': {0: '', 1: 'T', 2: 'FTE', 3: 'A', 4: '', 5: '1,0', 6: '2,0'},
                   'col6': {0: '', 1: '', 2: 'Budget', 3: 'B', 4: 'EUR', 5: '100,5', 6: '200,5'},
                   'col7': {0: '', 1: '', 2: '', 3: 'C', 4: 'EUR', 5: '1000', 6: '2000'},
                   'col8': {0: '2025', 1: 'S', 2: 'FTE', 3: 'A', 4: '', 5: '-3,0', 6: '-4,0'},
                   'col9': {0: '', 1: '', 2: 'Budget', 3: 'B', 4: 'EUR', 5: '-300,5', 6: '-400,5'},
                   'col10': {0: '', 1: '', 2: '', 3: 'C', 4: 'EUR', 5: '3000', 6: '-4000'},
                   'col11': {0: '', 1: 'T', 2: 'FTE', 3: 'A', 4: '', 5: '3,0', 6: '4,0'},
                   'col12': {0: '', 1: '', 2: 'Budget', 3: 'B', 4: 'EUR', 5: '300,5', 6: '400,5'},
                   'col13': {0: '', 1: '', 2: '', 3: 'C', 4: 'EUR', 5: '3000', 6: '4000'},
                   })
```
I am struggling to read the data and set the MultiIndex columns of the DataFrame, because the Country column does not fit into the hierarchy.
Because I have to use `df = pd.read_excel(..., usecols='T:Z', header=None, ...)`, I read the data and the header separately and then add the headers with `df.columns = pd.MultiIndex.from_arrays(...)`.
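A minimal sketch of the header step, using the 2024 half of the example frame: forward-fill the blank (merged) header cells, build the column MultiIndex from the Year, Option and Type rows only, and move Country into the row index so it no longer has to fit the column hierarchy. It assumes the header occupies the first five rows (`N_HEADER_ROWS` is a hypothetical constant) and that values are strings with a decimal comma, as in the example. If `read_excel` already returns numeric cells, the `to_numeric` line should be dropped, since `.str` only works on string columns.

```python
import pandas as pd

# The 2024 half of the example frame: header cells are stored as data rows.
raw = pd.DataFrame({
    'col1': ['Year', 'Option', 'Category', 'Type', 'Country', 'Australia', 'New Zealand'],
    'col2': ['2024', 'S', 'FTE', 'A', '', '-1,0', '-2,0'],
    'col3': ['', '', 'Budget', 'B', 'EUR', '-100,5', '-200,5'],
    'col4': ['', '', '', 'C', 'EUR', '-1000', '-2000'],
    'col5': ['', 'T', 'FTE', 'A', '', '1,0', '2,0'],
    'col6': ['', '', 'Budget', 'B', 'EUR', '100,5', '200,5'],
    'col7': ['', '', '', 'C', 'EUR', '1000', '2000'],
})

N_HEADER_ROWS = 5  # hypothetical: Year, Option, Category, Type, Country/currency rows

# Header cells of the value columns (everything right of the Country column).
# Merged Excel cells arrive as blanks, so carry each label to the right.
header = raw.iloc[:N_HEADER_ROWS, 1:].replace('', pd.NA).ffill(axis=1)

# Data rows: Country becomes the row index instead of part of the column hierarchy.
data = raw.iloc[N_HEADER_ROWS:].set_index('col1').rename_axis('Country')

# The example stores numbers as strings with a decimal comma; convert them to numbers.
data = data.apply(lambda s: pd.to_numeric(s.str.replace(',', '.', regex=False)))

# Build the column MultiIndex from the Year, Option and Type rows only.
data.columns = pd.MultiIndex.from_arrays(
    [header.iloc[0], header.iloc[1], header.iloc[3]],
    names=['Year', 'Option', 'Type'],
)
```

Naming the levels here matters later: reshaping functions such as `melt` and `stack` can then refer to `Year`, `Option` and `Type` by name.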
The result (for 2024) looks like this:
| | | 2024 | | | | | |
|---|---|---|---|---|---|---|---|
| | | S | | | T | | |
| | | A | B | C | A | B | C |
| | Country | | | | | | |
| 0 | Australia | -1,0 | -100,5 | -1000 | 1,0 | 100,5 | 1000 |
| 1 | New Zealand | -2,0 | -200,5 | -2000 | 2,0 | 200,5 | 2000 |
This is where I am stuck: I tried using `.stack` and `.melt` to reach the target structure, but could not get either to work.
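The target structure is only shown in the linked example, so this sketch assumes a long (tidy) layout with one row per Country, Year, Option and Type. Once Country is the row index and the columns form a named three-level MultiIndex, `DataFrame.melt` needs no `id_vars`: each column level becomes its own column, named after the level, and `ignore_index=False` keeps Country attached to every value. The frame `wide` is built directly from the example values for 2024 and 2025.

```python
import pandas as pd

# Frame in the shape shown above (Country as row index, Year/Option/Type as
# named column levels), built from the example values for 2024 and 2025.
columns = pd.MultiIndex.from_product(
    [['2024', '2025'], ['S', 'T'], ['A', 'B', 'C']],
    names=['Year', 'Option', 'Type'],
)
wide = pd.DataFrame(
    [[-1.0, -100.5, -1000, 1.0, 100.5, 1000, -3.0, -300.5, 3000, 3.0, 300.5, 3000],
     [-2.0, -200.5, -2000, 2.0, 200.5, 2000, -4.0, -400.5, -4000, 4.0, 400.5, 4000]],
    index=pd.Index(['Australia', 'New Zealand'], name='Country'),
    columns=columns,
)

# melt turns every column level into its own column (named after the level);
# ignore_index=False keeps the Country index, which reset_index turns into a column.
long = wide.melt(value_name='Value', ignore_index=False).reset_index()
```

If the target instead keeps Option and Type as columns and only moves the years into rows, `wide.stack(level='Year')` is a possible alternative.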