Optimizing a slow insert overwrite partition under Hive on Spark

Date: 2022-07-23

    Hive version: 2.1.1, Spark version: 1.6.0

    Over the past few days I noticed that insert overwrite partition statements were running very slowly. The job was using the Hive on Spark engine, which is normally much faster than MapReduce, yet this time it felt several times slower: it had been running for over an hour without finishing.

    I pulled the SQL out and ran it by hand with hive -f file.sql; the Spark stage stayed at 0 and showed almost no progress, as shown in List-1.

List-1

[xx@xxxx xx]# hive -f sql.sql 
...
Query ID = root_20200807155008_80726145-e8f2-4f4e-8222-94083907a70c
Total jobs = 1
Launching Job 1 out of 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Spark Job = d5e51d11-0254-49e3-93c7-f1380a89b3d5
Running with YARN Application = application_1593752968338_0506
Kill Command = /usr/local/hadoop/bin/yarn application -kill application_1593752968338_0506

Query Hive on Spark job[0] stages:
0

Status: Running (Hive on Spark job[0])
Job Progress Format
CurrentTime StageId_StageAttemptId: SucceededTasksCount(+RunningTasksCount-FailedTasksCount)/TotalTasksCount [StageCost]
2020-08-07 15:50:47,501	Stage-0_0: 0(+2)/3	
2020-08-07 15:50:50,530	Stage-0_0: 0(+2)/3	
2020-08-07 15:50:53,555	Stage-0_0: 0(+2)/3	
2020-08-07 15:50:56,582	Stage-0_0: 0(+2)/3	
2020-08-07 15:50:57,590	Stage-0_0: 0(+3)/3	
2020-08-07 15:51:00,620	Stage-0_0: 0(+3)/3	
2020-08-07 15:51:03,641	Stage-0_0: 0(+3)/3	
2020-08-07 15:51:06,662	Stage-0_0: 0(+3)/3	
2020-08-07 15:51:09,680	Stage-0_0: 0(+3)/3	
2020-08-07 15:51:12,700	Stage-0_0: 0(+3)/3	
...
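    For reference, the statement in sql.sql was a dynamic-partition insert overwrite. The sketch below only shows its general shape; target_table, src_table and the partition column dt are hypothetical names used for illustration:

-- Hypothetical shape of the statement in sql.sql (real table/column names differ)
insert overwrite table target_table partition (dt)
select id, name, dt
from src_table;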

    After more than an hour it was still stuck in that state. Something felt wrong, so I searched around; others had run into the same problem, but I did not find a good solution.

    As a temporary workaround I switched this job to MapReduce as the execution engine with set hive.execution.engine=mr instead of Spark, which got rid of the hang.
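    A minimal way to apply the switch only to this job rather than cluster-wide, assuming the statement still lives in sql.sql as above, is to put the set at the top of the script or to pass it on the command line:

-- At the top of sql.sql, before the insert; affects only this session
set hive.execution.engine=mr;

-- Or from the shell, without editing the file:
-- hive --hiveconf hive.execution.engine=mr -f sql.sql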

    Hive then failed with a new error, complaining that the maximum number of dynamic partitions per node had been exceeded, as shown in List-2.

List-2

...
	at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:499)
	at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:160)
	... 8 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveFatalException: [Error 20004]: Fatal error occurred when node tried to create too many dynamic partitions. The maximum number of dynamic partitions is controlled by hive.exec.max.dynamic.partitions and hive.exec.max.dynamic.partitions.pernode. Maximum was set to 100 partitions per node, number of dynamic partitions on this node: 101
	at org.apache.hadoop.hive.ql.exec.FileSinkOperator.getDynOutPaths(FileSinkOperator.java:933)
	at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:704)
	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:879)
	at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:879)
	at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:130)
	at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:149)
	at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:489)
	... 9 more


FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 3   HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
...
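    The two limits named in the error can be checked in the Hive CLI by issuing set with just the property name, which prints the current value (the default per-node limit is 100, matching the error above):

set hive.exec.max.dynamic.partitions;
set hive.exec.max.dynamic.partitions.pernode;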

    Next I raised hive.exec.max.dynamic.partitions and hive.exec.max.dynamic.partitions.pernode, as shown in List-3.

List-3

set hive.execution.engine=mr;
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.max.dynamic.partitions.pernode=100000;
set hive.exec.max.dynamic.partitions=100000;
...
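    Putting it together, the top of the script ends up looking roughly like the following, reusing the hypothetical names from the earlier sketch; a quick count of distinct partition values in the source is an easy way to judge how high the limits really need to be:

-- Hypothetical end-to-end script: overrides first, then the insert
set hive.execution.engine=mr;
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.max.dynamic.partitions.pernode=100000;
set hive.exec.max.dynamic.partitions=100000;

-- Optional sanity check: how many dynamic partitions will this insert create?
select count(distinct dt) from src_table;

insert overwrite table target_table partition (dt)
select id, name, dt
from src_table;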

    As for the hang itself, a quick Google search turned up a Spark JIRA issue about it; it is described as a bug that was fixed in a later release.

    That got the job through. MapReduce is still slow, though, and short of upgrading the Hive/Spark versions or patching the Spark source ourselves there is not much else to do, so MapReduce stays as the temporary fix for now.