Integrating Spark with SQL and Hive in MapR
1. Choose the Spark master node, Worker nodes and history server nodes.
2. Install mapr-spark on all the worker nodes and spark-history server on any one of the nodes
3. Install spark-master on the master node
4. Run the configure.sh -R script on all the nodes
5. Set up passwordless SSH between the master node and all the worker nodes:
ssh-keygen -t rsa
ssh-copy-id -i <RSA Public key Path> <hostname>
6. In the /opt/mapr/spark/spark-1.6.1/conf directory, rename slaves.template to slaves and add the hostnames or IP addresses of the worker nodes, one per line
7. After configure.sh -R has run, the warden service starts the Spark master and the Spark history server automatically
8. Start the worker nodes with the following command:
/opt/mapr/spark/spark-1.6.1/sbin/start-slaves.sh
This script starts the worker daemons on every node listed in the slaves file.
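As a minimal sketch of step 6, the shell function below creates the slaves file from the template and appends worker hostnames. The name write_slaves_file and the hostnames in the usage comment are hypothetical; the function is not part of Spark or MapR. It copies the template instead of renaming it so the original stays available, which is a design choice rather than a requirement.

```shell
# write_slaves_file is a hypothetical helper, not a Spark or MapR tool.
# $1: Spark conf directory, remaining arguments: worker hostnames or IPs.
write_slaves_file() {
  conf_dir="$1"
  shift
  # Start from the shipped template when no slaves file exists yet.
  if [ -f "$conf_dir/slaves.template" ] && [ ! -f "$conf_dir/slaves" ]; then
    cp "$conf_dir/slaves.template" "$conf_dir/slaves"
  fi
  for host in "$@"; do
    # Append each worker once; skip hosts that are already listed.
    if ! grep -qxF -- "$host" "$conf_dir/slaves" 2>/dev/null; then
      echo "$host" >> "$conf_dir/slaves"
    fi
  done
}

# Example (hypothetical hostnames):
# write_slaves_file /opt/mapr/spark/spark-1.6.1/conf worker1 worker2
```

If the template already lists an entry such as localhost, review the resulting file before running start-slaves.sh, because every uncommented line is treated as a worker.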
Checking the Spark functionality
Running the sample Spark Pi program
MASTER=spark://<Spark Master node hostname>:7077 /opt/mapr/spark/spark-<version>/bin/run-example org.apache.spark.examples.SparkPi 10
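The SparkPi example prints a line beginning with "Pi is roughly" when the job completes. A small sketch for picking that line out of the surrounding log output follows; check_spark_pi is a hypothetical helper name, not something shipped with Spark.

```shell
# check_spark_pi is a hypothetical helper.
# Reads run-example output on stdin, prints the first result line,
# and returns non-zero if no result line was found.
check_spark_pi() {
  grep -m 1 'Pi is roughly'
}

# Example, using the same placeholders as the command above:
# MASTER=spark://<Spark Master node hostname>:7077 \
#   /opt/mapr/spark/spark-<version>/bin/run-example org.apache.spark.examples.SparkPi 10 2>&1 \
#   | check_spark_pi
```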
Configuring Spark-sql
1. Copy hive-site.xml to the conf directory of Spark
2. Specify the Hive version in the /opt/mapr/spark/spark-1.6.1/mapr-util/compatibility.version file
Checking the Spark on Hive functionality
MASTER=<master-url> <spark-home>/bin/run-example sql.hive.HiveFromSpark
This connects to Hive and runs queries on tables.
You can also use spark-sql under bin to connect to Hive:
[mapr@sat-node1 bin]$ ./spark-sql
16/10/03 22:02:30 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/10/03 22:02:50 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
16/10/03 22:02:51 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
SET spark.sql.hive.metastore.sharedPrefixes=com.mysql.jdbc,org.postgresql,com.microsoft.sqlserver,oracle.jdbc,com.mapr.fs.shim.LibraryLoader,com.mapr.security.JNISecurity,com.mapr.fs.jni
SET hive.support.sql11.reserved.keywords=false
SET spark.sql.hive.version=1.2.1
SET spark.sql.hive.version=1.2.1
spark-sql> show tables;
decode_test false
h1_test2 false
ic_batch_run_info_test false
m false
src false
t1 false
t2 false
test false
test10 false
xxx false
Time taken: 1.921 seconds, Fetched 10 row(s)
spark-sql>
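For scripted checks, spark-sql can also run a single statement with the -e option instead of opening the interactive prompt. The sketch below filters show tables output down to the table names, assuming the two-column layout shown above (table name followed by true or false). The function name list_table_names is hypothetical.

```shell
# list_table_names is a hypothetical helper.
# Reads "show tables" output on stdin and prints only the table names,
# keeping rows that have exactly two fields with true/false in the second.
list_table_names() {
  awk 'NF == 2 && ($2 == "true" || $2 == "false") { print $1 }'
}

# Example, run from the Spark bin directory:
# ./spark-sql -e "show tables;" 2>/dev/null | list_table_names
```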