Integrating Spark with SQL and Hive in MapR

- September 16, 2016

1. Choose the Spark master node, the worker nodes, and the history server node.
2. Install mapr-spark on all the worker nodes, and install the Spark history server package (mapr-spark-historyserver) on any one node.
3. Install the spark-master package (mapr-spark-master) on the master node.
4. Run the configure.sh -R script on all the nodes.
5. Set up passwordless SSH between the master node and all the worker nodes:

ssh-keygen -t rsa
ssh-copy-id -i <RSA public key path> <hostname>

6. In the /opt/mapr/spark/spark-1.6.1/conf directory, rename slaves.template to slaves and add the hostnames or IP addresses of the worker nodes, one per line.
7. Once configure.sh -R has run, the warden service starts the Spark master and the Spark history server automatically.
8. Start the worker nodes with the command below:

/opt/mapr/spark/spark-1.6.1/sbin/start-slaves.sh

This script starts the worker daemons on all the nodes listed in the slaves file.
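
Step 6 can be scripted. The following is a minimal sketch, not a MapR-provided tool: write_slaves_file is a hypothetical helper name, and the conf directory and hostnames passed to it are assumptions for illustration. It copies slaves.template to slaves if needed and appends each worker once.

```shell
# Sketch: build conf/slaves from slaves.template and a list of workers.
# write_slaves_file is a hypothetical helper, not part of Spark or MapR.
write_slaves_file() {
    local conf_dir="$1"
    shift
    local slaves="$conf_dir/slaves"

    # Start from the shipped template when no slaves file exists yet.
    if [ ! -f "$slaves" ] && [ -f "$conf_dir/slaves.template" ]; then
        cp "$conf_dir/slaves.template" "$slaves"
    fi
    touch "$slaves"

    # Make sure the file ends with a newline before appending to it.
    if [ -n "$(tail -c 1 "$slaves")" ]; then
        echo >> "$slaves"
    fi

    # Append each worker once; hosts that are already listed are skipped.
    local host
    for host in "$@"; do
        grep -qxF "$host" "$slaves" || echo "$host" >> "$slaves"
    done
}
```

For example, write_slaves_file /opt/mapr/spark/spark-1.6.1/conf worker1 worker2 would add two workers. Any entry already present in the template is kept, so review the resulting file before running start-slaves.sh.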

Checking the Spark functionality

Running the sample Spark Pi program

MASTER=spark://<Spark Master node hostname>:7077 /opt/mapr/spark/spark-<version>/bin/run-example org.apache.spark.examples.SparkPi 10
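
The command above can be wrapped so that the master URL is built in one place. This is only a sketch: spark_master_url and run_spark_pi are hypothetical helper names, and the default port 7077 is taken from the command above.

```shell
# Sketch: build the standalone master URL and run the SparkPi example.
# Both function names are hypothetical helpers, not Spark commands.
spark_master_url() {
    local host="$1"
    local port="${2:-7077}"   # 7077 as used in the command above
    echo "spark://${host}:${port}"
}

run_spark_pi() {
    local spark_home="$1"
    local master_host="$2"
    local slices="${3:-10}"
    # MASTER is exported only for this one run-example invocation.
    MASTER="$(spark_master_url "$master_host")" \
        "$spark_home/bin/run-example" org.apache.spark.examples.SparkPi "$slices"
}
```

A call such as run_spark_pi /opt/mapr/spark/spark-1.6.1 <Spark Master node hostname> 10 is equivalent to the one-line command. On success, the SparkPi example prints a line beginning with "Pi is roughly".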

Configuring Spark SQL

1. Copy hive-site.xml to the Spark conf directory.
2. Specify the Hive version in the /opt/mapr/spark/spark-1.6.1/mapr-util/compatibility.version file.
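
The two steps above might be scripted as follows. This is a hedged sketch: copy_hive_site and show_hive_compat are hypothetical helper names, and the Hive conf directory you pass in depends on where Hive is installed on your cluster. The second helper only prints the Hive-related lines of compatibility.version so you can check them by hand; it does not edit the file.

```shell
# Sketch: copy hive-site.xml into the Spark conf directory and inspect
# the Hive entry in compatibility.version. Hypothetical helpers only.
copy_hive_site() {
    local hive_conf="$1"
    local spark_conf="$2"
    if [ ! -f "$hive_conf/hive-site.xml" ]; then
        echo "hive-site.xml not found in $hive_conf" >&2
        return 1
    fi
    cp "$hive_conf/hive-site.xml" "$spark_conf/hive-site.xml"
}

show_hive_compat() {
    local spark_home="$1"
    # Print every line that mentions Hive; the exact key name is whatever
    # the file shipped with your Spark package already uses.
    grep -i 'hive' "$spark_home/mapr-util/compatibility.version"
}
```

For instance, copy_hive_site <hive-home>/conf /opt/mapr/spark/spark-1.6.1/conf followed by show_hive_compat /opt/mapr/spark/spark-1.6.1 lets you confirm that the listed version matches the installed Hive.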

Checking the Spark on Hive functionality

MASTER=<master-url> <spark-home>/bin/run-example sql.hive.HiveFromSpark

This connects to Hive and runs queries on tables.

We can also use spark-sql, under the bin directory, to connect to Hive:

[mapr@sat-node1 bin]$ ./spark-sql
16/10/03 22:02:30 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/10/03 22:02:50 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
16/10/03 22:02:51 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
SET spark.sql.hive.metastore.sharedPrefixes=com.mysql.jdbc,org.postgresql,com.microsoft.sqlserver,oracle.jdbc,com.mapr.fs.shim.LibraryLoader,com.mapr.security.JNISecurity,com.mapr.fs.jni
SET hive.support.sql11.reserved.keywords=false
SET spark.sql.hive.version=1.2.1
SET spark.sql.hive.version=1.2.1
spark-sql> show tables;
decode_test false
h1_test2 false
ic_batch_run_info_test false
m false
src false
t1 false
t2 false
test false
test10 false
xxx false
Time taken: 1.921 seconds, Fetched 10 row(s)
spark-sql>
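
The same check can be run non-interactively. The sketch below assumes the output format shown in the session above (table name in the first column, followed by the temporary flag); table_exists is a hypothetical helper name that reads that output on standard input.

```shell
# Sketch: check whether a table name appears in "show tables" output.
# table_exists is a hypothetical helper; it reads the listing from stdin
# and compares the first whitespace-separated column of each line.
table_exists() {
    local table="$1"
    awk -v t="$table" '$1 == t { found = 1 } END { exit found ? 0 : 1 }'
}
```

One possible use is ./spark-sql -e "show tables;" 2>/dev/null | table_exists src && echo "src is visible from Spark", which relies on the -e option of spark-sql to run a single statement and exit.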
