0

I currently trying to clean up a large dataset using Excel VBA. The dataset structure looks like this.

enter image description here

However, I would like to make it look like this instead, whereby if the cells in columns A:D all contain the same value, transpose the cells in column E. (And remove the duplicated cells from A:D)

enter image description here

Here is the code I did

Dim ws As Worksheet: Set ws = Sheets("test")
lastrow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row

Dim j As Integer
j = 6

For i = 2 To lastrow

    If (Range("A" & i).Value = Range("A" & i + 1).Value) And (Range("B" & i).Value = Range("B" & i + 1).Value) And (Range("C" & i).Value = Range("C" & i + 1).Value) Then
        Cells(i, j).Value = Cells(i + 1, 5).Value
        j = j + 1
    End If
    
    'Reset J back to 6 if columns A to D does not match previous
    If (Range("A" & i).Value <> Range("A" & i + 1).Value) Or (Range("B" & i).Value <> Range("B" & i + 1).Value) Or (Range("C" & i).Value <> Range("C" & i + 1).Value) Then
        j = 6
    End If
    
Next i

How can this be done?

3
  • Is the data always sorted like all ABCD come in one block and then DEFG starts or an they be mixed? Commented Aug 12, 2021 at 14:49
  • It is always sorted in one block. Commented Aug 12, 2021 at 14:54
  • However, it can also be ABCD followed by XYCD (whereby only cells in columns A:B have different values) but I only want to transpose when cells in all 4 columns (from A:D) contain the same values. Commented Aug 12, 2021 at 15:09

3 Answers 3

4

This ended up more complex than I'd thought but seems to work OK

Sub Compact()

    Const KEY_COLS As Long = 4
    Dim ws As Worksheet, i As Long, k As String, nextEmpty As Long
    Dim dict As Object
    Set dict = CreateObject("scripting.dictionary")
    
    Set ws = Sheets("test")
    For i = 1 To ws.Cells(ws.Rows.Count, "A").End(xlUp).Row
        'create a row "key" from first KEY_COLS cells
        k = Join(Application.Transpose(Application.Transpose( _
                  ws.Cells(i, 1).Resize(1, KEY_COLS))), "~~")
        
        If Not dict.exists(k) Then
            'move this row up?
            If nextEmpty > 0 Then
                ws.Cells(i, 1).Resize(1, KEY_COLS + 1).Cut ws.Cells(nextEmpty, 1)
                dict.Add k, nextEmpty 'new key - store row#
                nextEmpty = 0
            Else
                dict.Add k, i 'new key - store row#
            End If
        Else
            'seen this key before - move value to that row and clear
            ws.Cells(dict(k), Columns.Count).End(xlToLeft).Offset(0, 1).Value = _
                ws.Cells(i, KEY_COLS + 1).Value
            ws.Rows(i).ClearContents
            If nextEmpty = 0 Then nextEmpty = i 'available row
        End If
    Next i
End Sub

Edit: this is a bit cleaner I think. It's split into separate "read" and "write" parts.

Sub Compact2()

    Const KEY_COLS As Long = 4
    Const SEP As String = "~~"
    Dim ws As Worksheet, i As Long, k, col As Long, v
    Dim dict As Object
    Set dict = CreateObject("scripting.dictionary")
    
    Set ws = Sheets("test")
    'collect all the unique combinations and associated values 
    For i = 1 To ws.Cells(ws.Rows.Count, "A").End(xlUp).Row
        'create a row "key" from first KEY_COLS cells
        k = Join(Application.Transpose(Application.Transpose( _
                  ws.Cells(i, 1).Resize(1, KEY_COLS))), SEP)
        
        If Not dict.exists(k) Then dict.Add k, New Collection
        dict(k).Add ws.Cells(i, KEY_COLS + 1).Value
        ws.Rows(i).ClearContents 'clear row
    Next i
    
    're-populate the sheet from the dictionary
    i = 1
    For Each k In dict
        ws.Cells(i, 1).Resize(1, KEY_COLS).Value = Split(k, SEP)
        col = KEY_COLS + 1
        For Each v In dict(k)
            ws.Cells(i, col) = v
            col = col + 1
        Next v
        i = i + 1
    Next k
End Sub
Sign up to request clarification or add additional context in comments.

4 Comments

Cool answer as always (I think I upvote more competing answer from you than anyone else). If you have time, can you take a look at my answer, and see if you can get my solution down to a dynamic formula in column m? Irks me I can't figure it out. I may post a question on it.
@pgSystemTester - I can take a look later but that formula stuff is mostly beyond me... I only just got the version with that capability so I'm a bit behind.
No worries. I'll post it as a question and someone could point out what I'm doing wrong. BigBen or Peh have a history of answering my questions in like 55 seconds and making me feel like I should have known better.
1

You can do this pretty easily using Power Query

  • Group by the first four columns
  • Aggregate the 5th column into a delimiter (semicolon) separated text string.
  • Split the delimited string into new columns

For the example, I added some rows where the four columns didn't match

To use Power Query

  • Select some cell in your Data Table
  • Data => Get&Transform => from Table/Range
  • When the PQ Editor opens: Home => Advanced Editor
  • Make note of the Table Name in Line 2
  • Paste the M Code below in place of what you see
  • Change the Table name in line 2 back to what was generated originally.
  • Read the comments and explore the Applied Steps to understand the algorithm

M Code

let
    Source = Excel.CurrentWorkbook(){[Name="Table17"]}[Content],

//set type for all columns as Text
    #"Changed Type" = Table.TransformColumnTypes(Source,List.Transform(Table.ColumnNames(Source), each {_, Text.Type})),

//group by first four columns, then aggregate the 5th column semicolon separated
    #"Grouped Rows" = Table.Group(#"Changed Type", {"Column1", "Column2", "Column3", "Column4"}, {
        {"ColE", each Text.Combine([Column5],";"), Text.Type}
    }),

//split the aggregated text into new columns
//may need to edit this step depending on maximum number in the group
    #"Split Column by Delimiter" = Table.SplitColumn(#"Grouped Rows", "ColE", 
        Splitter.SplitTextByDelimiter(";", QuoteStyle.Csv), {"ColE.1", "ColE.2", "ColE.3"}),
    #"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter",{
        {"ColE.1", Int64.Type}, {"ColE.2", Int64.Type}, {"ColE.3", Int64.Type}})
in
    #"Changed Type1"

enter image description here

Comments

1

Over Two Years Later...

This will work leveraging the newer function and will by dynamic, no vba required.

=LET(alldata,FILTER(A:E,A:A<>""),    zSplitter,"?", zColumnCount,4,
allNames,HSTACK(CHOOSECOLS(alldata,SEQUENCE(1,zColumnCount))),
zNames,UNIQUE(allNames),
keysFiltered, BYROW(zNames,LAMBDA(eRow,TEXTJOIN(zSplitter,TRUE,eRow))),
keysFull,BYROW(allNames,LAMBDA(eRow,TEXTJOIN(zSplitter,TRUE,eRow))),
splitNums,BYROW(keysFiltered,LAMBDA(eRow,TEXTJOIN(zSplitter,TRUE,FILTER(CHOOSECOLS(alldata,zColumnCount+1),keysFull=eRow)))),
zElementCount,BYROW(splitNums,LAMBDA(eRow, LEN(eRow)-LEN(SUBSTITUTE(eRow,zSplitter,""))+1)), zMaxCol,MAX(zElementCount),
zSideArray,MAKEARRAY(ROWS(zNames),zMaxCol,LAMBDA(r,c,LET(tempRng,INDEX(splitNums,r,1),tempSplitRng,TEXTSPLIT(tempRng,zSplitter),IF(c>CHOOSEROWS(zElementCount,r),"",CHOOSECOLS(tempSplitRng,c))))),
HSTACK(zNames,zSideArray))

Original Answer

Agree with the Tim Williams this is tricky. I got sort of close to a solution without using VBA in this worksheet (requires spill range enabled). I didn't get a dynamic formula to spill down for the numeric values, but you could make a macro to drag it or something.

See this spreadsheet.

You would need the below formula in cell i1

=UNIQUE(FILTER(A:D,NOT(ISBLANK((A:A)))))

The following formula would be in M1, and dragged down to match the respective columns to the immediate left. You could setup a macro that actually did this for you on a change event. There's probably a way to make this dynamic with an array formula, but I couldn't assemble it in time I tinkered with it.

=TRANSPOSE(FILTER(E:E,(NOT(ISBLANK(E:E))*(A:A&B:B&C:C&D:D=I1&J1&K1&L1))))

Again if you don't have excel spill range capabilities, this won't work. To view with spill range, checkout the excel file via a web browser so it looks like the below image. The gray cells contain the respective formulas.

Sample of browser result

1 Comment

I posted a question regarding my own answer on this here: stackoverflow.com/questions/68761456/…

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.