
I will be receiving a .dat file which contains multiple PDF files encoded as base64 strings, separated by a newline or some other character.

Initial approach: read -> payload splitBy "\n" -> for each -> decode base64 -> save as .pdf

It works fine if the .dat file is small. However, it started throwing a heap memory error; my hunch is that splitBy loads the entire content into memory as a single string.

How can this be fixed? Is there a better way to solve this problem?

<flow name="dat-to-pdfFlow" doc:id="7f23d7a6-7187-454b-bd60-8e0319b52028" >
        <file:listener doc:name="Read .DAT" doc:id="aba64085-5b24-48b6-a6d6-f10658c991f1" config-ref="File_Config" directory="/Users/test/Work/POC/input" autoDelete="true" recursive="false" outputMimeType="application/octet-stream; streaming=true">
            <scheduling-strategy >
                <fixed-frequency />
            </scheduling-strategy>
        </file:listener>
        <logger level="INFO" doc:name="Logger" doc:id="ebd52647-c467-459f-bdeb-a30a997aba76" message="Read .DAT from #[attributes.path]"/>
        <ee:transform doc:name="Transform Message" doc:id="dbc765a9-ba1f-49be-b996-a7a883c8a6c5">
            <ee:message>
                <ee:set-payload><![CDATA[%dw 2.0
output application/java
---
payload splitBy "\n"]]></ee:set-payload>
            </ee:message>
        </ee:transform>
        <parallel-foreach doc:name="Parallel For Each" doc:id="7ba70e6b-630a-4259-9ad6-fb4b5c197402">
            <vm:publish doc:name="Publish" doc:id="7d1c9b34-6150-4195-b142-45ef43a9e2db" config-ref="VM_Config" queueName="write" />
        </parallel-foreach>
        <logger level="INFO" doc:name="Logger" doc:id="72bc4cfc-383d-4c95-add9-409bd4fdfeeb" message="Completed" />
    </flow>
    <flow name="consume-pdf" doc:id="9e531c3b-b4b6-4348-9848-bef76df138bb" >
        <vm:listener doc:name="Listener" doc:id="5907406d-d4cf-4d5b-a82a-8a3829fcc425" config-ref="VM_Config" queueName="write" outputMimeType="text/plain"/>
        <ee:transform doc:name="Transform Message" doc:id="aacf98b0-b212-46bf-b615-abadfddc87f7">
            <ee:message>
                <ee:set-payload><![CDATA[%dw 2.0
import * from dw::core::Binaries
output multipart/form-data
---
{
    parts:{
        base64Content:{
            headers:{
                "Content-Type":"application/pdf"
            },
            content: fromBase64(payload)
            },
        }
}
]]></ee:set-payload>
            </ee:message>
        </ee:transform>
        <set-payload value="#[payload]" doc:name="Set Payload" doc:id="215630ca-ebd1-4eb7-9325-536f034eaff3" mimeType="application/pdf" />
        <file:write doc:name="Write" doc:id="87623dd2-6051-4918-ab36-f76bf1c9544e" config-ref="File_Config" path="#['/Users/test/Work/POC/output/' ++ uuid() ++ '.pdf']" mode="APPEND" />
    </flow>

**

java.lang.OutOfMemoryError: Java heap space
Dumping heap to /Applications/AnypointStudio.app/Contents/Eclipse/plugins/org.mule.tooling.server.4.9.ee_7.21.0.202502030106/mule/logs/dump_mule-393ef4bd-6139-49d5-bc8a-3401c8045277.hprof ...
JVM received a signal SIGKILL (9).
Heap dump file created [419386973 bytes in 0.255 secs]
#
# java.lang.OutOfMemoryError: Java heap space
# -XX:OnOutOfMemoryError=""/Applications/AnypointStudio.app/Contents/Eclipse/plugins/org.mule.tooling.server.4.9.ee_7.21.0.202502030106/mule/bin/kill.sh" %p"
#   Executing ""/Applications/AnypointStudio.app/Contents/Eclipse/plugins/org.mule.tooling.server.4.9.ee_7.21.0.202502030106/mule/bin/kill.sh" 66579"...
JVM process is gone.
JVM process exited with a code of 1, setting the Wrapper exit code to 1.
JVM exited unexpectedly.
Automatic JVM Restarts disabled.  Shutting down.
<-- Wrapper Stopped

**

  • Add the stack trace of the OutOfMemoryException to the question. Commented Aug 22 at 18:58
  • @aled Updated the question Commented Aug 22 at 19:02
  • Apparently there is no stack trace in the error. Is there a hs_err_pid<pid>.log file generated when this error happens? There may be more information in it. Commented Aug 22 at 19:40
  • Why do you use a parallel foreach to send the values to the VM queue? A VM queue is inherently asynchronous. Not sure if there is a relationship with the issue. Commented Aug 22 at 19:42
  • @aled The heap memory error arises even before reaching the loop. Looking for an alternative to splitBy: something like a Java method which receives the stream, splits it on newlines, and chunks it to disk under temp names. Not sure how to achieve this. Commented Aug 22 at 20:09

2 Answers


payload splitBy "\n" loads all the content into memory, which throws the heap memory error.

It's solved by passing the stream to a Java class which processes the stream and writes it to the /tmp directory without blowing up the heap.

Inspiration taken from Mule's file repeatable streaming strategy.
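For reference, a minimal sketch of such a Java class, assuming the accepted approach as described (the class name, method name, and temp-file naming are illustrative, not the author's actual code): it reads the stream line by line, decodes each base64 line, and writes each PDF to its own temp file, so the heap only ever holds one line at a time instead of the whole payload.

```java
import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;

public class DatSplitter {

    // Reads the .dat stream line by line; each line is one base64-encoded PDF.
    // Each line is decoded and written to a temp file immediately, so memory
    // usage is bounded by the largest single line, not the whole file.
    public static List<String> splitToTempPdfs(InputStream in) throws Exception {
        List<String> written = new ArrayList<>();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.US_ASCII))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isBlank()) continue;            // skip empty separator lines
                byte[] pdfBytes = Base64.getDecoder().decode(line.trim());
                Path tmp = Files.createTempFile("pdf-", ".pdf");
                Files.write(tmp, pdfBytes);              // flush the decoded PDF to disk
                written.add(tmp.toString());
            }
        }
        return written;
    }
}
```

Note that each line still has to fit in memory once (a whole PDF per line); if individual PDFs are very large, `Base64.getDecoder().wrap(...)` can decode in chunks instead.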


2 Comments

You need to be careful if the Java class assumes that the payload is a stream. Any changes to the Mule implementation may change the type of the payload. It may be more resilient to implement a custom Mule module using the Mule SDK instead.
Thanks aled for the advice. Will take this into account.

splitBy("\n") is reading the entire file into a single in-memory string, and for a .dat containing hundreds of PDFs, that’s going to blow the heap.

What you need is streaming processing instead of batch loading. In Mule 4, there are a few ways to do this cleanly:

<file:listener 
    doc:name="Read .DAT" 
    config-ref="File_Config" 
    directory="/Users/test/Work/POC/input" 
    autoDelete="true" 
    recursive="false"
    outputMimeType="application/java">
    <scheduling-strategy>
        <fixed-frequency />
    </scheduling-strategy>
</file:listener>

<ee:transform doc:name="Transform to lines">
    <ee:message>
        <ee:set-payload><![CDATA[%dw 2.0
import * from dw::core::Streams
output application/java
---
readLines(payload)  // Streaming, not full load
]]></ee:set-payload>
    </ee:message>
</ee:transform>

<foreach>
    <!-- Each line (base64 PDF) handled individually -->
    <vm:publish queueName="write" />
</foreach>

1 Comment

Worst answer ever in the history of SO.
