Boosting spark.yarn.executor.memoryOverhead

Posted 2020-05-23 18:02

I'm trying to run a (py)Spark job on EMR that will process a large amount of data. Currently my job is failing with the following error message:

Reason: Container killed by YARN for exceeding memory limits.
5.5 GB of 5.5 GB physical memory used.
Consider boosting spark.yarn.executor.memoryOverhead.

So I googled how to do this, and found that I should pass the spark.yarn.executor.memoryOverhead parameter along with the --conf flag. I'm doing it this way (the step is built as a Python format string, hence the %s placeholders):

"aws emr add-steps\
--cluster-id %s\
--profile EMR\
--region us-west-2\
--steps Name=Spark,Jar=command-runner.jar,\
Args=[\
/usr/lib/spark/bin/spark-submit,\
--deploy-mode,client,\
/home/hadoop/%s,\
--executor-memory,100g,\
--num-executors,3,\
--total-executor-cores,1,\
--conf,'spark.python.worker.memory=1200m',\
--conf,'spark.yarn.executor.memoryOverhead=15300',\
],ActionOnFailure=CONTINUE" % (cluster_id, script_name)

But when I rerun the job it keeps giving me the same error message, still with 5.5 GB of 5.5 GB physical memory used, which implies that the memory did not increase. Any hints on what I am doing wrong?

EDIT

Here are details on how I initially create the cluster:

aws emr create-cluster\
--name "Spark"\
--release-label emr-4.7.0\
--applications Name=Spark\
--bootstrap-action Path=s3://emr-code-matgreen/bootstraps/install_python_modules.sh\
--ec2-attributes KeyName=EMR2,InstanceProfile=EMR_EC2_DefaultRole\
--log-uri s3://emr-logs-zerex\
--instance-type r3.xlarge\
--instance-count 4\
--profile EMR\
--service-role EMR_DefaultRole\
--region us-west-2

Thanks.

2 Answers

Viruses. · 2020-05-23 18:22

After a couple of hours I found the solution to this problem. When creating the cluster, I needed to pass the following flag as a parameter:

--configurations file://./sparkConfig.json\

With the JSON file containing:

[
    {
      "Classification": "spark-defaults",
      "Properties": {
        "spark.executor.memory": "10G"
      }
    }
  ]

This allows me to increase the memoryOverhead in the next step by using the parameter I initially posted.
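
Putting it together, the create-cluster call from my edit then looks roughly like this (same flags as before, just with --configurations added; treat it as a sketch, since the bucket names, key name, and bootstrap path are specific to my setup):

# sparkConfig.json is the file shown above, in the directory the command is run from
aws emr create-cluster\
--name "Spark"\
--release-label emr-4.7.0\
--applications Name=Spark\
--bootstrap-action Path=s3://emr-code-matgreen/bootstraps/install_python_modules.sh\
--ec2-attributes KeyName=EMR2,InstanceProfile=EMR_EC2_DefaultRole\
--configurations file://./sparkConfig.json\
--log-uri s3://emr-logs-zerex\
--instance-type r3.xlarge\
--instance-count 4\
--profile EMR\
--service-role EMR_DefaultRole\
--region us-west-2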

别忘想泡老子 · 2020-05-23 18:27

If you are logged into an EMR node and want to alter Spark's default settings further without dealing with the AWS CLI tools, you can add a line to the spark-defaults.conf file. On EMR, Spark's configuration lives under /etc, so you can edit /etc/spark/conf/spark-defaults.conf directly.

So in this case we'd append spark.yarn.executor.memoryOverhead to the end of the spark-defaults file. The end of the file then looks something like this:

spark.driver.memory              1024M
spark.executor.memory            4305M
spark.default.parallelism        8
spark.logConf                    true
spark.executorEnv.PYTHONPATH     /usr/lib/spark/python
spark.driver.maxResultSize       0
spark.worker.timeout             600
spark.storage.blockManagerSlaveTimeoutMs 600000
spark.executorEnv.PYTHONHASHSEED 0
spark.akka.timeout               600
spark.sql.shuffle.partitions     300
spark.yarn.executor.memoryOverhead 1000M
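
If you'd rather not open an editor, a one-liner like the following can append the setting (a sketch, assuming you have sudo on the node; adjust the overhead value to your workload):

# requires sudo on the EMR node; 1000M matches the example value above
echo "spark.yarn.executor.memoryOverhead 1000M" | sudo tee -a /etc/spark/conf/spark-defaults.conf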

Similarly, the heap size can be controlled with the --executor-memory=xg flag or the spark.executor.memory property.
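
For example, a minimal spark-submit invocation that sets both values from the command line might look like this (the script path /home/hadoop/my_job.py is just a placeholder, and the sizes are illustrative):

# heap via --executor-memory, overhead via --conf; tune both for your cluster
spark-submit \
  --deploy-mode client \
  --executor-memory 4g \
  --conf spark.yarn.executor.memoryOverhead=1000 \
  /home/hadoop/my_job.py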

Hope this helps...
