Common Spark-It Validation and Troubleshooting Steps

This article describes common validation and troubleshooting steps you can perform if you encounter issues when setting up Spark-It. For more information about Spark or configuring Spark, refer to the "How Zoomdata Uses Spark" article.

By default, Spark Proxy runs on the same machine as Zoomdata. Keep in mind that "Spark Proxy" and "Spark server" are two separate entities: the Spark server is where Spark itself runs and is what you connect to using Spark-It (or SparkSQL), while Spark Proxy is the service that Zoomdata uses to connect to your Spark server.

Assuming this default setup, try the following if you encounter a "connection refused" error when setting up Spark-It:

  1. Verify that the /etc/hosts file on both the Zoomdata server and the Spark server maps the Zoomdata and Spark hostnames to the correct IP addresses.
  2. Verify that port 9292 is open wherever your Spark Proxy is running (by default, the Zoomdata server) and that port 7077 is open on your Spark server. For guidance about the ports used for communication, refer to Prerequisite #5 in our Support article "Configuring a Connection to a Standalone Spark Server".
  3. In your zoomdata.conf file (located in /etc/zoomdata), try adding the following parameter:
    spark.proxy.host=<Zoomdata hostname or IP>
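The first two checks can also be run from a shell. The following is a minimal sketch, assuming bash with /dev/tcp support and the coreutils timeout command; the function names (hosts_lookup, port_open) and host names such as sparkhost and zoomdata-host are hypothetical and are not part of Zoomdata or Spark.

```shell
#!/usr/bin/env bash
# Minimal sketch of the "connection refused" checks. The function names and
# example host names are hypothetical; they are not part of Zoomdata or Spark.

# Step 1: print the IP address(es) that a hosts-format file maps to <hostname>.
# Usage: hosts_lookup /etc/hosts <hostname>
hosts_lookup() {
  local file="$1" name="$2"
  # Strip comments, then print field 1 (the IP) for lines that list the name.
  awk -v n="$name" '
    { sub(/#.*/, "") }
    { for (i = 2; i <= NF; i++) if ($i == n) { print $1; break } }
  ' "$file"
}

# Step 2: return 0 if a TCP connection to <host>:<port> succeeds within 3 s.
# Assumes bash with /dev/tcp support and the coreutils `timeout` command.
port_open() {
  local host="$1" port="$2"
  timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Example usage (hypothetical host names):
#   hosts_lookup /etc/hosts sparkhost          # run on both servers and compare
#   port_open zoomdata-host 9292 && echo "Spark Proxy port is open"
#   port_open sparkhost 7077 && echo "Spark master port is open"
```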

If you are using a standalone Spark server and are encountering a "smoke-test query is hanged up" error, try the following:

  1. Check whether your Spark instance is running by opening the Spark admin console. Unless it has been reconfigured, the console listens on port 8080. Use a URL with the following format: http://<spark_host_ip>:8080
  2. Navigate to your Spark install location (SPARK_HOME) and run the following command:
    ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://<sparkhost>:7077 lib/spark-examples*.jar 10
    If this command executes successfully, the terminal output includes the estimated value of pi (a line beginning "Pi is roughly"). This indicates that your Spark master is properly configured and that Zoomdata should be able to connect to your Spark server.
    You can also obtain the value for <sparkhost> from the Spark admin console, which displays the master URL (spark://<sparkhost>:7077).
  3. Before continuing, try the troubleshooting steps recommended above for the "connection refused" error.
  4. In your spark-env.sh file (found in SPARK_HOME/conf; if it does not exist, create it by copying spark-env.sh.template), set the following parameters:
    SPARK_LOCAL_IP=<Spark host IP>
    SPARK_PUBLIC_DNS=<Spark hostname>
  5. ONLY if your Spark master node and slave node are on the same machine (a single standalone instance), try replacing the existing entry in the slaves file (located in SPARK_HOME/conf) with the Spark hostname.
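Steps 1, 2, and 4 above can be scripted along the following lines. This is a minimal sketch, assuming curl is installed, SPARK_HOME is set, and the examples jar lives at the lib/spark-examples*.jar location used in this article (newer Spark releases may place it elsewhere). The function names are hypothetical, and set_kv rewrites the file in place, so back up spark-env.sh first.

```shell
#!/usr/bin/env bash
# Minimal sketch of the standalone-server checks. SPARK_HOME, the host names,
# and the function names are assumptions for illustration only.

# Step 1: return 0 if the Spark admin console responds (default port 8080).
spark_ui_up() {
  local host="$1" port="${2:-8080}"
  curl --silent --fail --max-time 5 --output /dev/null "http://${host}:${port}/"
}

# Step 2: read spark-submit output on stdin and print the SparkPi estimate.
# SparkPi prints a line beginning "Pi is roughly"; exit 1 if none is found.
extract_pi() {
  awk '/Pi is roughly/ { print $NF; found = 1 } END { exit !found }'
}

# Step 2: run SparkPi against spark://<host>:7077 (jar path as in this article).
run_sparkpi() {
  local host="$1"
  "$SPARK_HOME"/bin/spark-submit \
    --class org.apache.spark.examples.SparkPi \
    --master "spark://${host}:7077" \
    "$SPARK_HOME"/lib/spark-examples*.jar 10 | extract_pi
}

# Step 4: set KEY=VALUE in a file such as spark-env.sh. Any existing
# (optionally exported) assignment of KEY is replaced with a plain KEY=VALUE
# line; if there is none, the line is appended.
set_kv() {
  local file="$1" key="$2" value="$3" tmp
  touch "$file"
  tmp="$(mktemp)"
  awk -v k="$key" -v v="$value" '
    $0 ~ ("^[ \t]*(export[ \t]+)?" k "=") { print k "=" v; done = 1; next }
    { print }
    END { if (!done) print k "=" v }
  ' "$file" > "$tmp" && cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Example usage (hypothetical values):
#   spark_ui_up 10.0.0.5 && echo "Spark admin console is up"
#   run_sparkpi sparkhost
#   set_kv "$SPARK_HOME/conf/spark-env.sh" SPARK_LOCAL_IP 10.0.0.5
#   set_kv "$SPARK_HOME/conf/spark-env.sh" SPARK_PUBLIC_DNS sparkhost
```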

If you are using a Spark cluster and encountering a "smoke-test query is hanged up" error, try the following:

  1. Run through the previously suggested validation/troubleshooting steps first.
  2. Verify network connectivity between the Spark master and slave nodes, and verify that all the required ports are open on both your Zoomdata server and your Spark nodes. Also check that the Spark master can SSH into each slave node without a password, because Spark's standalone launch scripts use SSH to start the workers. Refer to the Spark standalone mode documentation for more information.
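From the Spark master, the SSH and port checks can be looped over every host in the slaves file. This is a minimal sketch, assuming one hostname per line in the slaves file, password-less SSH keys, and bash with /dev/tcp support; the function names and the ports you pass in are assumptions, not a Spark tool.

```shell
#!/usr/bin/env bash
# Minimal sketch of cluster connectivity checks, run from the Spark master.
# Assumes the slaves file lists one worker host per line; the function names
# are hypothetical.

# Print the worker host names in a slaves file, skipping comments and blanks.
list_workers() {
  awk '{ sub(/#.*/, "") } NF { print $1 }' "$1"
}

# Return 0 if password-less SSH to <host> works; BatchMode makes ssh fail
# instead of prompting for a password.
ssh_ok() {
  ssh -n -o BatchMode=yes -o ConnectTimeout=5 "$1" true 2>/dev/null
}

# Return 0 if a TCP connection to <host>:<port> succeeds within 3 s.
port_open() {
  timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Check SSH and each given port for every worker; return 1 if anything fails.
check_workers() {
  local slaves_file="$1" host port status=0
  shift
  for host in $(list_workers "$slaves_file"); do
    if ssh_ok "$host"; then echo "OK   ssh  $host"; else echo "FAIL ssh  $host"; status=1; fi
    for port in "$@"; do
      if port_open "$host" "$port"; then
        echo "OK   port $host:$port"
      else
        echo "FAIL port $host:$port"; status=1
      fi
    done
  done
  return "$status"
}

# Example usage (8081 is the default Spark worker web UI port):
#   check_workers "$SPARK_HOME/conf/slaves" 8081
```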