Mule 4 Batch Process on large input file

Question

I have a huge file that can have anywhere from few hundred thousand to 5 million records. Its tab-delimited file. I need to read the file from ftp location , transform it and finally write it in a FTP location.

I was going to use FTP connector get the repeatable stream and put it into mule batch. Inside mule batch process idea was to use a batch step to transform the records and finally in batch aggregate FTP write the file to destination in append mode 100 records at a time.

Q1. Is this a good approach or is there some better approach?

Q2. How does mule batch load and dispatch phase work (https://docs.mulesoft.com/mule-runtime/4.3/batch-processing-concept#load-and-dispatch ) Is it waiting for entire stream of millions of records to be read in memory before dispatching a mule batch instance ?

Q3. While doing FTP write in batch aggregate there is a chance that parallel threads will start appending content to FTP at same time thereby corrupting the records. Is that avoidable. I read about File locks (https://docs.mulesoft.com/ftp-connector/1.5/ftp-write#locks) . My assumption is it will simply raise File lock exception and not necessarily wait to write FTP in append mode.

aled · Accepted Answer · 2021-04-08 21:37:57Z

3

Q1. Is this a good approach or is there some better approach?

See answer Q3, this might not work for you. You could instead use a foreach and process the file sequentially though that will increase the time for processing significantly.

Q2. How does mule batch load and dispatch phase work (https://docs.mulesoft.com/mule-runtime/4.3/batch-processing-concept#load-and-dispatch ) Is it waiting for entire stream of millions of records to be read in memory before dispatching a mule batch instance ?

Batch doesn't load big numbers of records in memory, it uses file based queues. And yes, it loads all records in the queue before starting to process them.

Q3. While doing FTP write in batch aggregate there is a chance that parallel threads will start appending content to FTP at same time thereby corrupting the records. Is that avoidable. I read about File locks (https://docs.mulesoft.com/ftp-connector/1.5/ftp-write#locks) . My assumption is it will simply raise File lock exception and not necessarily wait to write FTP in append mode

The file write operation will throw a FILE:FILE_LOCK error if the file is already locked. Note that Mule 4 doesn't manage errors through exceptions, it uses Mule errors.

If you are using DataWeave flatfile to parse the input file, note that it will load the file in memory and use significantly more memory than the file itself to process it, so you probably are going to get an out of memory error anyway.

answered Apr 8, 2021 at 21:37

aled

26.5k4 gold badges36 silver badges52 bronze badges

Sign up to request clarification or add additional context in comments.

2 Comments

pathshala Over a year ago

i have a ffd file (flat file) . Is the handling different in a csv vs flat file? Does the repeatable stream option not work to stream one row at a time . For loop is taking 8 hours and need to improve performance hence looking for a better solution.

aled Over a year ago

Flat file can have memory usage 40 to 1 of the original file size when processing. That is documented at docs.mulesoft.com/mule-runtime/4.3/dataweave-formats-flatfile. You'll want to be mindful of the memory usage of the transformation. Using non repeatable streaming may perform a bit better.

Collectives™ on Stack Overflow

Mule 4 Batch Process on large input file

1 Answer 1

2 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

2 Comments

Your Answer

Sign up or log in

Post as a guest

Related