title | description | keywords | services | documentationcenter | author | manager | editor | ms.assetid | ms.service | ms.workload | ms.tgt_pltfrm | ms.devlang | ms.topic | ms.date | ms.author |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
General Spark Performance Troubleshooting Guidance | Microsoft Docs |
Use the Spark FAQ for answers to common questions on Spark on the Azure HDInsight platform. |
Azure HDInsight, FAQ, troubleshooting guide, common problems, remote submission |
Azure HDInsight |
na |
Sunilkc |
multiple |
na |
na |
na |
article |
03/01/2018 |
sunilkc |
- Was this application working fine before?
- If the application was working fine, what changed between the working and non-working scenarios?
- How are the Spark applications submitted to HDInsight?
- Spark applications can be submitted using notebooks (Jupyter/Zeppelin), using ODBC clients that connect to the Spark Thrift Server, or directly from the headnode using spark-shell.
- If the application is submitted using notebooks, is it a batch or an interactive session?
- Is this a Spark SQL job, a Spark Streaming job, or just a batch job?
Minimum data required to better understand any Spark Application performance or Spark Application failure issues.
Spark applications are usually submitted to HDInsight clusters from Azure Data Factory, Jupyter, Zeppelin, JDBC, SSH, or Livy directly using a curl command.
Front End | Service Used on HDInsight |
---|---|
Jupyter | Livy |
ADF | Livy |
Zeppelin | Interpreter(Livy, Spark) |
Curl | Livy |
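Because most of these front ends go through Livy, it can help to reproduce a submission with a plain Livy request when isolating a problem. The following is a minimal sketch, not a definitive client: it only builds the HTTP request for a Livy batch and does not send it. The cluster name, credentials, and jar path are hypothetical placeholders, and the endpoint shape (`https://<cluster>.azurehdinsight.net/livy/batches` with an `X-Requested-By` header) is an assumption based on the commonly documented HDInsight Livy endpoint; verify it against your cluster.

```python
import base64
import json
import urllib.request


def build_livy_batch_request(cluster_name, user, password, app_file,
                             class_name=None, args=None):
    """Build (but do not send) a POST request that submits a Livy batch job."""
    # Assumed HDInsight Livy endpoint; adjust if your cluster differs.
    url = f"https://{cluster_name}.azurehdinsight.net/livy/batches"

    # Livy batch payload: "file" is required, the other keys are optional.
    payload = {"file": app_file}
    if class_name:
        payload["className"] = class_name
    if args:
        payload["args"] = list(args)

    # HTTP basic authentication with the cluster login (placeholder values).
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    headers = {
        "Content-Type": "application/json",
        "X-Requested-By": user,
        "Authorization": f"Basic {token}",
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    request = build_livy_batch_request(
        "mycluster", "admin", "secret",
        "wasbs:///example/jars/app.jar", class_name="com.example.App",
    )
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
```

Passing the returned object to `urllib.request.urlopen` would send the request. If the same application fails from a notebook or ADF but succeeds this way, the front end rather than Livy is the more likely place to look.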
* Slow Performance: The Spark application takes more time than on another HDInsight cluster but still completes successfully.
* Unexpected Failure: The Spark application starts processing data but fails to complete with some exception.
* Application fails with Exception: The Spark application starts processing data but fails to complete with some exception.
* Application hangs: The application never reaches the finished state.
* Spark application fails to start when submitted from Spark-CLI
- If the Spark application submission is initiated from Azure Data Factory, Jupyter, or any other client application such as curl that uses Livy, follow the steps below.
  - From the Ambari UI, confirm that the Livy server is started on HN0. If it is stopped, start the service.
  - If the Livy server does not start and you see `java.lang.OutOfMemoryError: unable to create new native thread` in the Livy log `/var/log/livy/livy-livy-server.out`, follow the steps detailed in livy-nativethread-exhaustion.
  - If the exception mentioned in the previous step is not found, capture the Livy logs from the cluster (`/var/log/livy/livy-livy-server.out`).
  - Get the Jupyter logs (`/var/log/jupyter/`) when troubleshooting Spark application issues that were submitted using a Jupyter notebook. Jupyter uses Livy to submit Spark applications, so get the Livy logs as well.
- If the application is submitted using JDBC, which uses the Spark Thrift Server, get the Spark Thrift driver logs from `/var/log/spark/sparkthriftdriver.log`.
- If the Spark job is submitted from spark-shell, get the complete spark-submit command.
- For any Spark application performance issue (including the scenarios listed above), first note the Application ID, then capture the YARN logs for the application that is experiencing the performance issue (slow or hung) or the failure.
  - See How do I download Yarn logs from HDInsight cluster?; that article details the different options to capture YARN logs.
  - Download all Application Master logs.
  - Get the logs for all containers (driver and executors).
- Get a screenshot of the YARN UI showing the start datetime, end datetime, and status of the failed application.
- If this application completed successfully earlier, capture the start and end datetime, the application status, and the YARN logs for that successful run as well (see How do I download Yarn logs from HDInsight cluster?).
- Apart from the YARN logs, get details about the environment.
  - Number of worker nodes.
  - Executors.
  - Source and sink (for example, Kafka to Storage).
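Two of the checks in the list above are easy to script on the headnode: looking for the native-thread exhaustion signature in the Livy log, and building the `yarn logs` command for a given Application ID. The following is a minimal sketch under stated assumptions: the function names are hypothetical helpers, the log path and exception text are the ones quoted in this article, and the application ID in the example is made up.

```python
import shlex
from pathlib import Path

# Signature and log path as quoted in the checklist above.
OOM_SIGNATURE = "java.lang.OutOfMemoryError: unable to create new native thread"
LIVY_LOG = "/var/log/livy/livy-livy-server.out"


def livy_log_has_thread_exhaustion(log_path=LIVY_LOG):
    """Return True if the Livy log contains the native-thread OOM signature."""
    path = Path(log_path)
    if not path.is_file():
        # A missing log is reported as "not found" rather than raising.
        return False
    with path.open("r", encoding="utf-8", errors="replace") as handle:
        return any(OOM_SIGNATURE in line for line in handle)


def yarn_logs_command(application_id):
    """Build the `yarn logs` command that dumps the aggregated logs of one application."""
    if not application_id.startswith("application_"):
        raise ValueError(f"not a YARN application ID: {application_id!r}")
    return ["yarn", "logs", "-applicationId", application_id]


if __name__ == "__main__":
    if livy_log_has_thread_exhaustion():
        print("Livy log shows native thread exhaustion")
    else:
        print("Signature not found; capture", LIVY_LOG)
    # Hypothetical application ID for illustration only.
    print(shlex.join(yarn_logs_command("application_1520000000000_0001")))
```

The command is returned as a list so it can be passed directly to `subprocess.run` with its output redirected to a file; `shlex.join` (Python 3.8+) is used only to print a copy-pasteable form.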
For General Spark Tuning Refer