We have a Dataproc cluster on which we run an INSERT OVERWRITE query through the Hive CLI, and the query fails with java.lang.OutOfMemoryError: Java heap space.
We adjusted the memory configuration for reducers and Tez tasks, including parameters such as mapreduce.reduce.memory.mb and tez.task.resource.memory.mb, and tuned the ORC buffer settings. Despite these changes, tasks continue to fail with java.lang.OutOfMemoryError: Java heap space, so further tuning or a different resource strategy seems necessary.
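For context, the failing statement is shaped roughly like the sketch below; every table and column name is a placeholder, but the structure (a join feeding a dynamic-partition insert into an ORC table) matches the operator frames in the stack trace further down.

```sql
-- Sanitized skeleton of the failing statement; all names are placeholders.
-- The shape (merge join -> select -> ORC file sink with dynamic partitions)
-- mirrors the CommonMergeJoinOperator / SelectOperator / FileSinkOperator
-- frames in the stack trace below.
INSERT OVERWRITE TABLE target_db.target_table PARTITION (part_col)
SELECT a.id,
       a.payload,
       b.attribute,
       a.part_col          -- dynamic partition column comes last
FROM   source_db.fact_table a
JOIN   source_db.dim_table  b
  ON   a.id = b.id;
```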
These are some of the parameters we set and tested, without success:
set hive.exec.max.dynamic.partitions=4000;
set hive.exec.max.dynamic.partitions.pernode=500;
set hive.exec.dynamic.partition.mode=nonstrict;
set mapreduce.map.java.opts=-Xmx3686m;
set mapreduce.reduce.java.opts=-Xmx3686m;
set mapred.child.java.opts=-Xmx10g;
set hive.tez.container.size=16384;
set tez.task.resource.memory.mb=16384;
set tez.am.resource.memory.mb=8192;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager;
set hive.support.concurrency=false;
set hive.vectorized.execution.enabled=true;
set hive.vectorized.execution.reduce.enabled=true;
set hive.exec.orc.split.strategy=BI;
set hive.exec.reducers.max=150;
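One thing we are unsure about: our understanding is that on Hive-on-Tez the mapreduce.*.java.opts and mapred.child.java.opts values above are ignored, and the task JVM heap instead comes from hive.tez.java.opts (or tez.task.launch.cmd-opts), conventionally sized to about 80% of the container. A sketch of how we believe those settings should align; the -Xmx values are illustrative, not something we have verified on our cluster:

```sql
-- Assumed alignment between Tez container sizes and JVM heaps;
-- the -Xmx values below are illustrative only.
set hive.tez.container.size=16384;       -- task container memory in MB
set hive.tez.java.opts=-Xmx13107m;       -- ~80% of hive.tez.container.size
set tez.task.resource.memory.mb=16384;   -- keep in sync with the container
set tez.am.resource.memory.mb=8192;
set tez.am.launch.cmd-opts=-Xmx6554m;    -- ~80% of tez.am.resource.memory.mb
```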
Error thrown:
```
Status: Failed
Vertex failed, vertexName=Reducer 4, vertexId=vertex_1731327513546_0052_5_08, diagnostics=[Task failed, taskId=task_1731327513546_0052_5_08_000045, diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) : java.lang.OutOfMemoryError: Java heap space
at java.base/java.io.BufferedOutputStream.<init>(BufferedOutputStream.java:75)
at com.google.cloud.hadoop.fs.gcs.GoogleHadoopOutputStream.createOutputStream(GoogleHadoopOutputStream.java:198)
at com.google.cloud.hadoop.fs.gcs.GoogleHadoopOutputStream.<init>(GoogleHadoopOutputStream.java:177)
at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem.lambda$create$5(GoogleHadoopFileSystem.java:547)
at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem$$Lambda$273/0x000000080077e040.apply(Unknown Source)
at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.invokeTrackingDuration(IOStatisticsBinding.java:547)
at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.lambda$trackDurationOfOperation$5(IOStatisticsBinding.java:528)
at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding$$Lambda$274/0x000000080077d040.apply(Unknown Source)
at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.trackDuration(IOStatisticsBinding.java:449)
at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem.create(GoogleHadoopFileSystem.java:521)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1234)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1211)
at org.apache.orc.impl.PhysicalFsWriter.<init>(PhysicalFsWriter.java:95)
at org.apache.orc.impl.WriterImpl.<init>(WriterImpl.java:187)
at org.apache.hadoop.hive.ql.io.orc.WriterImpl.<init>(WriterImpl.java:94)
at org.apache.hadoop.hive.ql.io.orc.OrcFile.createWriter(OrcFile.java:334)
at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.write(OrcOutputFormat.java:95)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:990)
at org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:994)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:940)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:927)
at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
at org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:994)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:940)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:927)
at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.internalForward(CommonJoinOperator.java:816)
at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.createForwardJoinObject(CommonJoinOperator.java:504)
at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genObject(CommonJoinOperator.java:661)
at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genJoinObject(CommonJoinOperator.java:533)
at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:936)
at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.joinObject(CommonMergeJoinOperator.java:331)
at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.joinOneGroup(CommonMergeJoinOperator.java:294)
, errorMessage=Cannot recover from this error:java.lang.OutOfMemoryError: Java heap space
at java.base/java.io.BufferedOutputStream.<init>(BufferedOutputStream.java:75)
at com.google.cloud.hadoop.fs.gcs.GoogleHadoopOutputStream.createOutputStream(GoogleHadoopOutputStream.java:198)
at com.google.cloud.hadoop.fs.gcs.GoogleHadoopOutputStream.<init>(GoogleHadoopOutputStream.java:177)
at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem.lambda$create$5(GoogleHadoopFileSystem.java:547)
at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem$$Lambda$273/0x000000080077e040.apply(Unknown Source)
at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.invokeTrackingDuration(IOStatisticsBinding.java:547)
at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.lambda$trackDurationOfOperation$5(IOStatisticsBinding.java:528)
at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding$$Lambda$274/0x000000080077d040.apply(Unknown Source)
at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.trackDuration(IOStatisticsBinding.java:449)
at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem.create(GoogleHadoopFileSystem.java:521)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1234)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1211)
at org.apache.orc.impl.PhysicalFsWriter.<init>(PhysicalFsWriter.java:95)
at org.apache.orc.impl.WriterImpl.<init>(WriterImpl.java:187)
at org.apache.hadoop.hive.ql.io.orc.WriterImpl.<init>(WriterImpl.java:94)
at org.apache.hadoop.hive.ql.io.orc.OrcFile.createWriter(OrcFile.java:334)
at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.write(OrcOutputFormat.java:95)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:990)
at org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:994)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:940)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:927)
at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
at org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:994)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:940)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:927)
at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.internalForward(CommonJoinOperator.java:816)
at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.createForwardJoinObject(CommonJoinOperator.java:504)
at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genObject(CommonJoinOperator.java:661)
at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genJoinObject(CommonJoinOperator.java:533)
at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:936)
at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.joinObject(CommonMergeJoinOperator.java:331)
at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.joinOneGroup(CommonMergeJoinOperator.java:294)
]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:49, Vertex vertex_1731327513546_0052_5_08 [Reducer 4] killed/failed due to:OWN_TASK_FAILURE]
DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0
```
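For what it's worth, the trace shows the heap already exhausted while FileSinkOperator opens yet another ORC writer against GCS, which looks consistent with many dynamic-partition writers being held open per task. One mitigation we have seen suggested, but have not yet verified on our cluster, is to sort rows by the partition key so each reducer keeps only one writer open at a time:

```sql
-- Untested on our side: route and sort rows by the dynamic partition key so
-- each reducer writes one partition at a time, holding a single ORC writer.
-- On recent Hive releases this behavior is governed by
-- hive.optimize.sort.dynamic.partition.threshold instead of this boolean.
set hive.optimize.sort.dynamic.partition=true;
```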