I'm trying to parallelise the export of many individual Excel files with openxlsx and its writeData function. Each individual file should also be summarised in a central Excel file.
As the reprex below shows, I first create the central Excel file before starting the parallel processes, and then create the individual files in their respective worker processes.
The problem is that nothing (i.e. no 'summary', here simply the Row counter) is written to the central Excel file, even though I exported its workbook object to the parallel processes.
lapply(c("openxlsx", "parallel"), library, character.only = TRUE)
NameOutputFolder <- "Output"
NameOutputSubfolder <- "Individual files"
OutputFolder <- file.path(".", NameOutputFolder)
if(!dir.exists(OutputFolder)) dir.create(OutputFolder)
OutputSubfolder <- file.path(".", NameOutputFolder, NameOutputSubfolder)
if(!dir.exists(OutputSubfolder)) dir.create(OutputSubfolder)
OutputFile_Central <- file.path(OutputFolder, "Excel_Central.xlsx")
Workbook_Central <- createWorkbook()
addWorksheet(wb = Workbook_Central, sheetName = "Summary", zoom = 80, gridLines = FALSE)
no_cores <- detectCores()
print(paste0("Configuring parallelisation (", no_cores, " cores found) and setting up clusters"))
MyCluster <- makePSOCKcluster(no_cores - 1)
clusterEvalQ(MyCluster, {
  library(openxlsx)
})
clusterExport(MyCluster, c("OutputSubfolder", "Workbook_Central"))
parLapply(cl = MyCluster, X = 1:10, fun = function(Row){
  OutputFile_Individual <- file.path(OutputSubfolder, paste0("Excel_Individual_", Row, ".xlsx"))
  Workbook_Individual <- createWorkbook()
  writeData(wb = Workbook_Central, sheet = "Summary", x = Row, startCol = 1, startRow = Row)
  saveWorkbook(wb = Workbook_Individual, file = OutputFile_Individual, overwrite = TRUE)
})
stopCluster(MyCluster)
saveWorkbook(wb = Workbook_Central, file = OutputFile_Central, overwrite = TRUE)
I guess the solution would be to create a temporary central Excel file in each parallel process (indexed by Sys.getpid()) and then merge them into the single central Excel file after the parallel code has run, right?
Is there no other solution?
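For comparison, here is a minimal sketch of one possible alternative to the temporary-file idea. It rests on the assumption that each worker can simply return its summary value to the parent process, so that only the parent ever writes to the central workbook. This works around the fact that clusterExport sends each worker its own copy of Workbook_Central, so anything written there never reaches the parent's object. The tempdir() location, the "Data" sheet name, the two-worker cluster and the names Summaries and SummaryTable are hypothetical choices for this illustration only.

```r
library(openxlsx)
library(parallel)

# Hypothetical output location, used for this sketch only
OutputFolder <- file.path(tempdir(), "Output")
OutputSubfolder <- file.path(OutputFolder, "Individual files")
dir.create(OutputSubfolder, recursive = TRUE, showWarnings = FALSE)

# Two workers are enough for the illustration
MyCluster <- makePSOCKcluster(2)
clusterEvalQ(MyCluster, library(openxlsx))
clusterExport(MyCluster, "OutputSubfolder")

# Each worker writes its own file and RETURNS its summary row
Summaries <- parLapply(cl = MyCluster, X = 1:10, fun = function(Row) {
  Workbook_Individual <- createWorkbook()
  addWorksheet(Workbook_Individual, "Data")
  writeData(Workbook_Individual, sheet = "Data", x = Row)
  saveWorkbook(
    Workbook_Individual,
    file = file.path(OutputSubfolder, paste0("Excel_Individual_", Row, ".xlsx")),
    overwrite = TRUE
  )
  data.frame(Row = Row)
})
stopCluster(MyCluster)

# Only the parent process touches the central workbook
SummaryTable <- do.call(rbind, Summaries)
Workbook_Central <- createWorkbook()
addWorksheet(Workbook_Central, "Summary")
writeData(Workbook_Central, sheet = "Summary", x = SummaryTable)
saveWorkbook(
  Workbook_Central,
  file = file.path(OutputFolder, "Excel_Central.xlsx"),
  overwrite = TRUE
)
```

With this layout no merging of per-process files is needed, because parLapply already collects the returned values in input order. Whether it fits depends on how large the per-file summaries are.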