Posts

Featured Article

Setting Up Spark Development Environment on IntelliJ IDE

Here are the steps to set up a Spark development environment in the IntelliJ IDEA IDE.

Prerequisites:
1. Download and install IntelliJ IDEA on Mac or Windows: https://www.jetbrains.com/idea/download/?section=mac
2. Install the Java JDK on Mac, or download the tar archive and extract it to a path of your choice.

Setup procedure:
1. Create a new project via File >> New Project and choose "Python Project". If "Python" does not appear as an option, install the plugin by clicking "More Via Plugins" at the bottom left and searching for "Python".
2. Creating the project with the above procedure will automatically create the project structure.
3. Create a requirements.txt file with the desired pyspark version. Available pyspark releases can be found here: https://pypi.org/project/pyspark/#history
4. Open the terminal from the IDE (bottom-left icon) and activate the virtual environment.
5. Install the pyspark package from the requirements.txt file.
6. Once necessary packages ins...
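The terminal steps above (pin the release, activate the virtualenv, install from requirements.txt) can be sketched as below; the project directory, virtualenv name, and pyspark version are illustrative assumptions, not values from the post.

```shell
# Sketch only: directory, venv name, and pyspark version are assumed examples.
mkdir -p /tmp/pyspark-demo && cd /tmp/pyspark-demo

# Pin the desired pyspark release (pick one from the PyPI release history)
echo "pyspark==3.5.1" > requirements.txt

# Create and activate the virtual environment (IntelliJ normally creates .venv for you)
python3 -m venv .venv
. .venv/bin/activate

# Install the pinned packages into the virtualenv
pip install -r requirements.txt

# Quick check that pyspark is importable
python -c "import pyspark; print(pyspark.__version__)"
```

On a real project the IDE creates and activates the virtualenv for you, so only the requirements.txt and the `pip install` step are usually needed in the IDE terminal.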

OpenSSL - Creating Self-Signed Certificates

OpenSSL - Create a Self-Signed Certificate

Here are the steps involved in creating a self-signed certificate:
a) Create a key
b) Generate a certificate signing request (.csr)
c) Sign your certificate (as it is self-signed)

a) Create a key
Create a directory first (to avoid confusion):
#mkdir certificates
#cd certificates
Generate the key file:
#openssl genrsa 1024 > private.key

b) Generate a certificate signing request (.csr)
#openssl req -new -key private.key > certrequest.csr

c) Sign your certificate (as it is self-signed)
#openssl x509 -in certrequest.csr -out selfsigned.crt -req -signkey private.key -days 365
Here -days denotes the validity of the certificate in days. After this, the certificate is available as the file "selfsigned.crt".

KAFKA commands - 1

A) Create Kafka topics

Command format:
#kafka-topics --create --topic [TOPIC NAME] --zookeeper [ZOOKEEPER HOSTNAME] --replication-factor [NUMBER OF REPLICAS] --partitions [NUMBER OF PARTITIONS]

Example:
#kafka-topics --create --topic mytesttopic --zookeeper myzkhost.cluster.com --replication-factor 2 --partitions 3

Here:
[TOPIC NAME] = the name of the topic to be created
[ZOOKEEPER HOSTNAME] = the zookeeper hostname
[NUMBER OF REPLICAS] = the number of replicas for each partition; it should not be greater than the number of brokers
[NUMBER OF PARTITIONS] = the number of partitions for the topic

B) Describe a topic

Command format:
#kafka-topics --describe --topic [TOPIC NAME] --zookeeper [ZOOKEEPER HOSTNAME]

Example:
#kafka-topics --describe --topic mytesttopic --zookeeper myzkhost.cluster.com
Topic:mytesttopic  PartitionCount:3  ReplicationFactor:2  Configs:
Topic: mytesttopic  Partition: 0  Leader: 10  Replicas: 10,30  Isr: 10,30
Topic: myt...

OpenSSL Certificate creations

Create a Self-Signed Certificate

Here are the steps involved in creating a self-signed certificate:
a) Create a key
b) Generate a certificate signing request (.csr)
c) Sign your certificate (as it is self-signed)

a) Create a key
Create a directory first (to avoid confusion):
#mkdir certificates
#cd certificates
Generate the key file:
#openssl genrsa 1024 > private.key

b) Generate a certificate signing request (.csr)
#openssl req -new -key private.key > certrequest.csr

c) Sign your certificate (as it is self-signed)
#openssl x509 -in certrequest.csr -out selfsigned.crt -req -signkey private.key -days 365
Here -days denotes the validity of the certificate in days. After this, the certificate is available as the file "selfsigned.crt".
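The three steps can be run end to end as the script below. The -subj value is an illustrative placeholder (run interactively, openssl req prompts for these fields instead), the working directory is moved to /tmp to keep it self-contained, and the final command just prints the certificate's subject as a sanity check.

```shell
# Work in a scratch directory (the post uses a relative "certificates" directory)
mkdir -p /tmp/certificates && cd /tmp/certificates

# a) Create a key (1024-bit RSA as in the post; 2048 or more is common today)
openssl genrsa 1024 > private.key

# b) Generate the certificate signing request
#    (-subj supplies placeholder fields non-interactively)
openssl req -new -key private.key -subj "/CN=myhost.mydomain.com" > certrequest.csr

# c) Sign the request with our own key, valid for 365 days
openssl x509 -in certrequest.csr -out selfsigned.crt -req -signkey private.key -days 365

# Inspect the resulting certificate's subject
openssl x509 -in selfsigned.crt -noout -subject
```

Since the certificate is signed by its own key, browsers and clients will flag it as untrusted unless it is explicitly added to their trust store; self-signed certificates are best suited for testing and internal use.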

Setting up Simple MIT KDC Server - CentOS and RHEL Environments

Here are the precise steps for configuring an MIT Kerberos server on CentOS; you can follow the same steps for RHEL servers as well.

Prerequisites:
1. Make sure your servers have a proper FQDN (Fully Qualified Domain Name), for example myhost.mydomain.com.
2. This can be verified by issuing the (hostname -f) and (hostname) commands.
3. Additionally, make sure you have proper DNS resolution for the hosts (if present).
4. If DNS is not configured on those nodes, make sure there is a valid (/etc/hosts) file pointing to the correct hostnames.
5. Make sure you have access to either a local yum repository or an Internet repository.

Installation:
1. Install the packages:
yum install krb5-libs krb5-server krb5-workstation
Here krb5-server is the actual Kerberos server, and krb5-workstation is the client package which needs to be installed on client nodes (for example NodeManagers, DataNodes, etc.).
2. Configure the KDC server. Edit /etc/krb5.conf as ...
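The excerpt cuts off at the krb5.conf edit. For orientation, a minimal /etc/krb5.conf for a single-KDC setup typically looks like the sketch below; EXAMPLE.COM and kdc.mydomain.com are placeholder values, not taken from the post.

```
[libdefaults]
    default_realm = EXAMPLE.COM
    dns_lookup_kdc = false

[realms]
    EXAMPLE.COM = {
        kdc = kdc.mydomain.com
        admin_server = kdc.mydomain.com
    }

[domain_realm]
    .mydomain.com = EXAMPLE.COM
    mydomain.com = EXAMPLE.COM
```

The realm name is conventionally the DNS domain in upper case, and the same file (with the same realm stanza) is distributed to every client node alongside krb5-workstation.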

MapR - Drill does not start after enabling Kerberos - Fails with exception [ YouShouldntSeeThisErrorUnlessYourJVMhadoop.loginPropertiesAreBad ]

You will probably see this kind of exception message when your Drillbit does not start after enabling Kerberos authentication by following this article. From the drillbit.out file:

Caused by: java.io.IOException: Login failure for mapr/my52cluster@SATZ.COM from keytab /opt/mapr/conf/mapr.keytab: javax.security.auth.login.LoginException: unable to find LoginModule class: YouShouldntSeeThisErrorUnlessYourJVMhadoop.loginPropertiesAreBad
        at org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytab(UserGroupInformation.java:907)
        at org.apache.drill.exec.server.BootStrapContext.login(BootStrapContext.java:145)
        ... 5 more
Caused by: javax.security.auth.login.LoginException: unable to find LoginModule class: YouShouldntSeeThisErrorUnlessYourJVMhadoop.loginPropertiesAreBad
        at javax.security.auth.login.LoginContext.invoke(LoginContext.java:794)
    ...

Integrating Spark with SQL and Hive in MapR

1. Choose the Spark master node, worker nodes, and history server node.
2. Install mapr-spark on all the worker nodes, and the Spark history server on any one of the nodes.
3. Install spark-master on the master node.
4. Run the configure.sh -R script on all the nodes.
5. Set up passwordless ssh between the master node and all the worker nodes:
ssh-keygen -t rsa
ssh-copy-id -i <RSA public key path> <hostname>
6. Edit the node configuration file under the /opt/mapr/spark/spark-1.6.1/conf directory: rename slaves.template to slaves and add the list of worker node hostnames or IP addresses.
7. Once you have run configure.sh -R, the Spark master and Spark history server will be started automatically by the warden service.
8. Start the worker nodes using the command below:
/opt/mapr/spark/spark-1.6.1/sbin/start-slaves.sh
This script will start the worker daemons on all the nodes configured in the slaves file.

Checking the Spark functionality
Running Sample...
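Step 6 above (populating the slaves file) can be illustrated as follows; this sketch works in a scratch directory with made-up worker hostnames, whereas on a real cluster you would edit /opt/mapr/spark/spark-1.6.1/conf directly on the master node.

```shell
# Scratch stand-in for the conf directory (real path: /opt/mapr/spark/spark-1.6.1/conf)
CONF=/tmp/spark-conf-demo
mkdir -p "$CONF"
printf '# hosts listed here each run a Spark worker\n' > "$CONF/slaves.template"

# Rename the template and append the worker hostnames (placeholder names)
cp "$CONF/slaves.template" "$CONF/slaves"
printf 'worker1.cluster.com\nworker2.cluster.com\n' >> "$CONF/slaves"

# start-slaves.sh, run on the master, reads this file and
# starts a worker daemon on each listed host over ssh
cat "$CONF/slaves"
```

Because start-slaves.sh reaches each listed host over ssh, the passwordless-ssh setup in step 5 must be in place for every hostname that appears in this file.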