Integrating Spark with SQL and Hive in MapR

1. Choose the Spark master node, the worker nodes, and the history server node.
2. Install mapr-spark on all the worker nodes, and the Spark history server on any one node.
3. Install spark-master on the master node.
4. Run the configure.sh -R script on all the nodes.
5. Set up passwordless SSH between the master node and all the worker nodes:

ssh-keygen -t rsa
ssh-copy-id -i <RSA Public key Path> <hostname>

6. In the /opt/mapr/spark/spark-1.6.1/conf directory, rename slaves.template to slaves
and add the hostnames or IP addresses of the worker nodes, one per line.
7. Once configure.sh -R has run, the Spark master and the Spark history server are started automatically by the Warden service.
8. Start the worker nodes with the following command:

/opt/mapr/spark/spark-1.6.1/sbin/start-slaves.sh

This script starts the worker daemons on all the nodes listed in the slaves file.
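
Steps 5 and 6 above can be scripted. The following is a minimal sketch, not a MapR-provided tool: the function names and the SPARK_CONF_DIR variable are illustrative assumptions, and the default path simply mirrors the one used in this post. Note that the stock template may already contain a localhost entry; remove it if you do not want a worker on the master node.

```shell
#!/usr/bin/env bash
# Sketch only: SPARK_CONF_DIR and the function names below are assumptions.
SPARK_CONF_DIR="${SPARK_CONF_DIR:-/opt/mapr/spark/spark-1.6.1/conf}"

# Create conf/slaves from slaves.template if it does not exist yet,
# then append every host given as an argument that is not already listed.
write_slaves_file() {
    local slaves="$SPARK_CONF_DIR/slaves"
    if [ ! -f "$slaves" ] && [ -f "$SPARK_CONF_DIR/slaves.template" ]; then
        cp "$SPARK_CONF_DIR/slaves.template" "$slaves"
    fi
    touch "$slaves"
    local host
    for host in "$@"; do
        grep -qxF "$host" "$slaves" || echo "$host" >> "$slaves"
    done
}

# Print the worker hosts from the slaves file, skipping comments and blank lines.
list_workers() {
    grep -Ev '^[[:space:]]*(#|$)' "$SPARK_CONF_DIR/slaves"
}

# Copy the public key to every listed worker; each host asks for its password once.
push_key_to_workers() {
    local key="${1:-$HOME/.ssh/id_rsa.pub}" host
    for host in $(list_workers); do
        ssh-copy-id -i "$key" "$host"
    done
}

# Example (hostnames are placeholders):
#   write_slaves_file worker1 worker2 worker3
#   push_key_to_workers
```

Calling write_slaves_file repeatedly with the same hostnames leaves the file unchanged, so the script is safe to re-run when workers are added later.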

Checking Spark functionality

Run the sample SparkPi program:

MASTER=spark://<Spark Master node hostname>:7077 /opt/mapr/spark/spark-<version>/bin/run-example org.apache.spark.examples.SparkPi 10
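
SparkPi reports its result on a line that starts with "Pi is roughly". As a quick health check, that line can be picked out of the output. The sketch below assumes the driver logs go to stderr; extract_pi is a hypothetical helper name, not part of Spark.

```shell
#!/usr/bin/env bash
# Sketch only: extract_pi is an illustrative helper, not a Spark command.
SPARK_HOME="${SPARK_HOME:-/opt/mapr/spark/spark-1.6.1}"

# Read SparkPi output on stdin and print the estimate from the
# "Pi is roughly <value>" line; prints nothing if the line is absent.
extract_pi() {
    sed -n 's/^Pi is roughly //p'
}

# Usage sketch (replace the hostname placeholder as in the command above):
#   MASTER=spark://<Spark Master node hostname>:7077 \
#     "$SPARK_HOME/bin/run-example" org.apache.spark.examples.SparkPi 10 2>/dev/null | extract_pi
```

An empty result suggests the job did not complete, in which case the full output (including stderr) is the place to look.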

Configuring spark-sql

1. Copy hive-site.xml from the Hive conf directory to the Spark conf directory.
2. Specify the Hive version in the /opt/mapr/spark/spark-1.6.1/mapr-util/compatibility.version file.
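
The two steps above might look like the following sketch. The Hive conf path is an assumption (adjust HIVE_CONF_DIR to your installed Hive version), and copy_hive_config is a hypothetical helper name. The script only prints compatibility.version for review rather than editing it, since the exact entry depends on your MapR release.

```shell
#!/usr/bin/env bash
# Sketch only: HIVE_CONF_DIR is an assumed path; copy_hive_config is illustrative.
SPARK_HOME="${SPARK_HOME:-/opt/mapr/spark/spark-1.6.1}"
HIVE_CONF_DIR="${HIVE_CONF_DIR:-/opt/mapr/hive/hive-1.2/conf}"

# Copy hive-site.xml into Spark's conf directory, then show the
# compatibility file so the Hive version entry can be checked by hand.
copy_hive_config() {
    if [ ! -f "$HIVE_CONF_DIR/hive-site.xml" ]; then
        echo "hive-site.xml not found in $HIVE_CONF_DIR" >&2
        return 1
    fi
    cp "$HIVE_CONF_DIR/hive-site.xml" "$SPARK_HOME/conf/hive-site.xml"
    cat "$SPARK_HOME/mapr-util/compatibility.version"
}

# Example:
#   HIVE_CONF_DIR=/path/to/hive/conf copy_hive_config
```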

Checking Spark on Hive functionality

MASTER=<master-url> <spark-home>/bin/run-example sql.hive.HiveFromSpark

This connects to Hive and runs queries against its tables.

You can also use the spark-sql script under bin to connect to Hive:

[mapr@sat-node1 bin]$ ./spark-sql
16/10/03 22:02:30 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/10/03 22:02:50 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
16/10/03 22:02:51 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
SET spark.sql.hive.metastore.sharedPrefixes=com.mysql.jdbc,org.postgresql,com.microsoft.sqlserver,oracle.jdbc,com.mapr.fs.shim.LibraryLoader,com.mapr.security.JNISecurity,com.mapr.fs.jni
SET hive.support.sql11.reserved.keywords=false
SET spark.sql.hive.version=1.2.1
SET spark.sql.hive.version=1.2.1
spark-sql> show tables;
decode_test     false
h1_test2        false
ic_batch_run_info_test  false
m       false
src     false
t1      false
t2      false
test    false
test10  false
xxx     false
Time taken: 1.921 seconds, Fetched 10 row(s)
spark-sql>
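
The same check can be run non-interactively, which is handy in scripts. spark-sql accepts -e to execute a quoted query string. The sketch below assumes the output has the table name in the first column, as in the session above; list_hive_tables and the SPARK_SQL variable are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch only: list_hive_tables and SPARK_SQL are illustrative names.
SPARK_HOME="${SPARK_HOME:-/opt/mapr/spark/spark-1.6.1}"
SPARK_SQL="${SPARK_SQL:-$SPARK_HOME/bin/spark-sql}"

# Run "show tables" through spark-sql, discard the log messages on stderr,
# and print only the first column (the table name) of each result row.
list_hive_tables() {
    "$SPARK_SQL" -e "show tables;" 2>/dev/null | awk '{print $1}'
}

# Example:
#   list_hive_tables
```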
