-3

I have a VBA-based tool that basically for each row in the Excel takes a defined word document and performs number of find-replace operations. This has been working well so far, but I am starting to feel the limitations of this approach.

  1. The Word Find-Replace action works only with strings up to 256 characters, so I have to perform complex string splitting shenanigans if I want to insert longer strings.

  2. The performance leaves a lot to be desired and doesn't scale super well.

Is there a way to do this more efficiently?

I am aware that the core problem is working with the Word document and Word operations in the first place; when I do the same with a TXT input/output, I can work directly with the string in the memory, which is way faster, however in Word, I need to worry about stuff like preserving formatting, so just can't read the whole document to a variable. I understand that the Word document is basically just a zipped XML file but I have no idea how to get to the underlying XML or markup in a reasonable manner.

Edit: OK, so sharing the code even though it is not going to be very revealing:

This is part of the main function

For row = 2 To rowLast

    'feed dictionary of dynamic variables
    set dictionary = Get_Dictionary   

    With appWD
        .Documents.Open Filename:=sPathTemplate
        .Activate
        .ScreenUpdating = False
    End With
                
    Call Merge(appWD, Opn, Cls, dictionary)
    With appWD.ActiveDocument    
         .Save
         .Close
    End With
 Next row

The called procedure is this.

Sub Merge(appWD As Object, sOpn As String, sCls As String, dictionary as object)
    Do
        With appWD.Selection.Find            
            For Each Key In dictionary
                .Text = Opn & Key & Cls
                .Execute ReplaceWith:=dictionary(Key), Replace:=2
                .Wrap = 1 'wdFindContinue
            Next Key
        
        'search for closing symbol to confirm whether an additional run is necessary (.Execute returns True when Find is successful)
            .Text = Cls
            bRepeat = .Execute
        End With
    
        iCounter = iCounter + 1
    
        If iCounter = 50 Then
            MsgBox "Error in merging."
        End        
    Loop While bRepeat = True
End Sub

To explain, I am using a closing and opening delimiter to determine sections to be replaced. The procedure loops through a dictionary of key-value pairs and searches each key within the delimiters in order to replace it. It also contains recursion to allow for more complex replacements:

[foo] -> "blah"
[blahbar] -> "hello world"
[[foo]bar] -> "hello world"

This all works perfectly fine under normal cirumstances - a single document takes a second or three to produce. But when I have multiple layers of recursion and a lot of find-replace tokens, the resource requirements grow geometrically and start impacting performance - a single document may take even a coouple minutes, which is a problem when I need 500 documents.

However, the bottleneck is in the Word's Find-Replace functionality. I have another set of procedures following exactly the same principles, but working with pure strings (to achieve the same thing in an email, where I can access gthe underlying HTML Body as a text string); the Find-Replace operation on a string isn't impacted anywhere near this and remains lightning fast even with layers of recursion and lots of tokens.

So basically I need a way to work with the text in the Word document as with a regular string, avoinding Word's native Find-Replace functionality and using VBA's Replace function. However I just can't simply read the Word document text into the string because often I need to preserve the document formatting, so I hoped for a way to access the raw Word XML data instead.

17
  • 3
    So how about posting your code? You haven't told us anything about whether you're using early/late binding, whether it's a wildcard Find/Replace, or much else that is relevant. Commented Jun 26 at 10:22
  • 2
    Since you're unwilling to post your code, I'm voting to close this thread. All you're doing is leaving us with a guessing game. It's easy to avoid all those "complex string splitting shenanigans", but not without seeing your code - which might benefit from other optimizations, too. Commented Jun 27 at 21:57
  • 2
    FYI this question is off-topic as written, as there is nothing specific other than mentioning "complex string splitting shenanigans" (no way to know what this really is), a general find/replace requirement, and vague observations such as "performance leaves a lot to be desired" and "doesn't scale super well" - neither of these being quantifiable. Please edit to provide a minimal reproducible example specifically about the work you did, and a specific issue you're having. Commented Jul 3 at 11:46
  • 1
    Pretty sure nothing in the in the Word object model is going to do anything faster than an appropriate Find/Replace so if Find/Replace isn't enough, you almost certainly would have to start looking at doing something with the XML, and frankly I suspect that that would be extremely difficult in the general case, so you'd be looking for a library, which is off-topic here. Commented Jul 5 at 16:33
  • 1
    @Eleshar for thé XML stuff, you either have to work with the .docx directly, I.e. not using the Word object model but, say, the Office Open SDK, or you have to access the XML via the Word object model, using e.g. Range.XML, WordOpenXML, InsertXML. Just my opinion, but doing the latter would be unlikely to help with the efficiency aspect unless you were able to get and process all the XML in one chunk or large chunks before re-inserting it. Commented Jul 17 at 8:18

3 Answers 3

2

You haven't told us how your dictionary is being populated all you've given us is:

set dictionary = Get_Dictionary

Without recourse to that (or having to split longer replacement strings), you could use something like:

Sub BulkWordDocFindReplace()
'Note: VBA References to MS Word and the Microsoft Forms 2.0 Object Library are required
Dim DocWkSht As Worksheet, FRWkSht As Worksheet, wdApp As New Word.Application, wdDoc As Word.Document
Dim i As Long, r As Long, ArrFnd() As String, ArrRep() As String, StrFnd As String, StrRep As String
'Initialize Microsoft Forms 2.0 Object Library
Dim DataObj As New MSForms.DataObject
' Reference Document List & Find/Replace WorkSheets
Set DocWkSht = ThisWorkbook.Sheets("Sheet1"): Set FRWkSht = ThisWorkbook.Sheets("Sheet2")
i = FRWkSht.Cells(FRWkSht.Rows.Count, 1).End(xlUp).Row - 1: ReDim ArrFnd(i): ReDim ArrRep(i)
'Get the F/R strings
For r = 2 To i
  ArrFnd(r - 1) = FRWkSht.Cells(r, 1).Value: ArrRep(r - 1) = FRWkSht.Cells(r, 2).Value
Next
'Process the documents
For i = 1 To DocWkSht.Cells(DocWkSht.Rows.Count, 1).End(xlUp).Row
  Set wdDoc = wdApp.Documents.Open(DocWkSht.Cells(i, 1).Value, AddToRecentFiles:=False)
  With wdDoc.Content.Find
    .Forward = True
    .Wrap = wdFindContinue
    .MatchCase = False
    .MatchWholeWord = False
    For r = 1 To UBound(ArrFnd)
      StrFnd = ArrFnd(r): StrRep = ArrRep(r)
      If Len(StrRep) > 255 Then DataObj.SetText StrRep: DataObj.PutInClipboard: StrRep = "^c"
      .Execute FindText:=StrFnd, ReplaceWith:=StrRep, Replace:=wdReplaceAll, Wrap:=wdFindContinue
      wdApp.DoEvents
    Next
  End With
  wdDoc.Close True
Next
wdApp.Quit: Set wdDoc = Nothing: Set wdApp = Nothing: Set DocWkSht = Nothing: Set FRWkSht = Nothing
MsgBox "Finished!", vbInformation
End Sub

The key is the automatic use of the MS Forms clipboard for replacement strings longer than 255 characters.

Sign up to request clarification or add additional context in comments.

1 Comment

Tested. Your solution is 3 times faster than mine.
0

The solution to apply long replacement strings is below. The idea is just to find the sought string then replace it outside of Find.Execute:

Sub Test()
  FR Array("<<ad>>", "<<bc>>"), _
    Array("==buirsdfg iuhuierhf aesgh iuhewr iuhiuher uhbuirsdfg iuhuierhf aesgh iuhewr iuhiuher uhbuirsdfg iuhuierhf aesgh iuhewr iuhiuher uhbuirsdfg iuhuierhf aesgh iuhewr iuhiuher uhbuirsdfg iuhuierhf aesgh iuhewr iuhiuher uhbuirsdfg iuhuierhf aesgh iuhewr iuhiuhe==261", _
    "==jfgsjkerfg iouhwuioerg iuoohuiosdrg iuohiuoh uiowerer hkjfgsjkerfg iouhwuioerg iuoohuiosdrg iuohiuoh uiowerer hkjfgsjkerfg iouhwuioerg iuoohuiosdrg iuohiuoh uiowerer hkjfgsjkerfg iouhwuioerg iuoohuiosdrg iuohiuoh uiowerer hkjfgsjkerfg iouhwuioerg iuoohuiosdrg iuohiuoh uiowerer hkjfgsjkerfg iouhwuioerg iuoohuiosdrg iuohiuoh uiowerer hkjfgsjkerfg iouhwuioerg iuoohuiosdrg iuohiuoh uiowerer hkjfgsjkerfg iouhwuioerg iuoohuiosdrg iuohiuoh uiowerer hkjfgsjkerfg iouhwuioerg iuoohuiosdrg iuohiuoh uiowerer hkjfgsjkerfg iouhwuioerg iuoohuiosdrg iuohiuoh uiowerer hkjfgsjkerfg iouhwuioerg iuoohuiosdrg iuohiuoh uiowerer hkjfgsjkerfg iouhwuioerg iuoohuiosdrg iuohiuoh uiowerer hkjfgsjkerfg iouhwuioerg iuoohuiosdrg iuohiuoh uiowerer hkjfgsjkerfg iouhwuioerg iuoohuiosdrg iuohiuoh uiowere==787")
End Sub

Sub FR(find, repl)
  Dim r As Word.Range, j&
  With ThisDocument ' the macro is in the Word document to be processed
    For j = LBound(find) To UBound(find)
      Set r = .Content
      While r.find.Execute(FindText:=find(j))
        r.Text = repl(j)
        r.Start = r.End
        r.End = .Content.End
      Wend
    Next
  End With
End Sub

Performance test on a real document

The real 45 MB Word document consists of 102520 words. Two strings to search of 788 and 595 entries. Replacement strings as per macro above. The procedure takes ~30 s.

Looking at the overall Word performance I'm not able to say what this is too inefficient.

P. S.

Macro to generate a text for testing:

Sub GenerateSampleText()
  Dim s$, w$, i&, j&
  For i = 1 To 100
    w = ""
    For j = 1 To 2 + 7 * Rnd()
      w = w & Chr$(65 + 60 * Rnd())
    Next
    s = s & " " & w
  Next
  s = s & Chr$(13)
  For i = 1 To 1000
    ThisDocument.Content.InsertAfter s
  Next
End Sub

The test on the generated text took ~15 s or less.

Tests show that performance is dependent of paragraphs' length. The test for the text generated as the single paragraph (remove s = s & Chr$(13)) took 300 s.

4 Comments

That is not inserting the replacement text into the Find/Replace expression. All you're doing is using a loop to insert the string from the array. Very inefficient.
This is not the place to argue about an answer or sling insults. Also, the OP's question is off-topic as written: zero code, vague non-quantifiable references to performance, and just "how to do this more efficiently?" The answer here (efficient or not) has nothing to do with any work the OP did. Best not to answer off-topic code-something-for-me questions.
@rotabor - There is no reason for macropod to post any answer at this point. The OP hasn't asked a specific question. They posted requirements. Stack Overflow isn't set up for "post requirements and have people implement those requirements." It's for specific programming Q&A.
@rotabor My use case is less "replace two strings in a 45 MB document" but rather "replace 100 nested/layered strings in a document 500 times in a row".
0

Courtesy to user @jonsson and his comment below my question I figured out the solution. Word allows accessing the documents underlying XML, which turn the slow and limited .Find.Execute ReplaceWith into a lightning fast string operation.

Dim DocXML As String: DocXML = appWD.ActiveDocument.Range.WordOpenXML
Dim DocXMLReplaced As String: DocXMLReplaced = DoReplacements(DocXML, Opn, Cls, Dictionary)
appWD.ActiveDocument.Content.Delete
appWD.Selection.InsertXML sXMLReplaced

The .WordOpenXML reads the internal XML from the document into a string.
The .InsertXML replaces the internal XML with a string. It is important to keep in mind that this actually needs to be proper XML, so its functional characters need to be escaped accordingly (took me a while to realise this).
DoReplacements is essentially for each key in dictionary, within the passed string replace the delimited key by the value.

I still haven't optimised all the functionalities (e.g. inline formating, line breaks etc.) but already the speed increased sevenfold by not relying on Word's built-in find-replace.

2 Comments

How many keys are actually being replaced? If the number is over 32000 then consider using FastDictionary instead of the Windows Scripting.Dictionary.
In one iteration, about a hundred keys, but often recursion is necessary because of nesting. At the moment, I am happy with the performance but I suspect bottlenecks are not in the dictionary but rather the logistics: creating a copy of the template and saving it, especially when I need it as a PDF. Then some other functionalities have certain innate bottlenecks I have no power to resolve.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.