HADOOP

Hadoop is open-source software for storing data and running applications on clusters of commodity hardware. It provides massive storage for any kind of data, enormous processing power, and the ability to handle virtually limitless concurrent tasks or jobs.

RED HAT ANSIBLE

Ansible is a radically simple IT automation engine that automates cloud provisioning, configuration management, application deployment, intra-service orchestration, and many other IT needs.

Designed for multi-tier deployments, Ansible models your IT infrastructure by describing how all of your systems inter-relate, rather than just managing one system at a time.

Task Objective:

Configure Hadoop and start cluster services using Ansible Playbook.

Configuration plan

step 1: Connect to the managed node

step 2: download JDK and Hadoop

step 3: install JDK and Hadoop

step 4: update core-site.xml and hdfs-site.xml

step 5: Format Namenode

step 6: start services

To get a true feel for automation, use playbooks only; do not run any manual commands in between.

NOTE: All the source code is on GitHub; the link is provided at the end.

Let’s Get Started

To install Ansible, run the following command:

pip3 install ansible

You need to create the Ansible directory at:

/etc/ansible/                          // ansible directory

Inventory File [data.txt]

Ansible reads this file to find the hosts it manages and how to connect to them; it is a kind of IP database.

[MasterNode]
192.168.0.112  ansible_connection=ssh  ansible_user=root  ansible_ssh_pass=root

[SlaveNode]
192.168.0.113  ansible_connection=ssh  ansible_user=root  ansible_ssh_pass=root

Configuration File [ansible.cfg]

Ansible reads its settings from this file.

It is located at /etc/ansible/ansible.cfg; you need to create the file yourself.

[defaults]
inventory = /data.txt
host_key_checking = False
deprecation_warnings = False

step 1: Connect to the managed node

To check connectivity to the managed nodes, run the following command:

ansible all -m ping
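Since the goal is to use playbooks only, the same connectivity check can also be written as a small playbook. This is a minimal sketch, not the playbook from the repository: the play and task names are my own, and it assumes the inventory and ansible.cfg shown above are in place.

```yaml
# Hypothetical playbook: verify that every host in the inventory is reachable.
- name: Check connectivity to all managed nodes
  hosts: all
  gather_facts: false
  tasks:
    - name: Ping each host over SSH
      ansible.builtin.ping:
```

Because the inventory uses ansible_ssh_pass, password-based SSH needs the sshpass program on the control node.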

step 2: download JDK and Hadoop

NOTE: I have already downloaded JDK and Hadoop using an Ansible playbook.

Download From Here:

JDK: https://github.com/frekele/oracle-java/releases/download/8u171-b11/jdk-8u171-linux-x64.rpm

Hadoop: https://archive.apache.org/dist/hadoop/core/hadoop-1.2.1/hadoop-1.2.1-1.x86_64.rpm
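A download playbook for these two files could look roughly like the sketch below. It uses the get_url module with the two URLs above; the destination directory /root/ is an assumption of mine, so adjust it to wherever you want the RPM files to land.

```yaml
# Hypothetical playbook: download both RPMs directly on the managed nodes.
- name: Download the JDK and Hadoop RPMs
  hosts: all
  tasks:
    - name: Download the JDK RPM
      ansible.builtin.get_url:
        url: https://github.com/frekele/oracle-java/releases/download/8u171-b11/jdk-8u171-linux-x64.rpm
        dest: /root/jdk-8u171-linux-x64.rpm   # assumed download location

    - name: Download the Hadoop RPM
      ansible.builtin.get_url:
        url: https://archive.apache.org/dist/hadoop/core/hadoop-1.2.1/hadoop-1.2.1-1.x86_64.rpm
        dest: /root/hadoop-1.2.1-1.x86_64.rpm   # assumed download location
```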

To run a playbook, use the following command:

ansible-playbook FileName.yml

To transfer files between Linux systems, run the following command:

scp -r file1 file2  'ip of destination:/path to save file'
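If the RPM files are already on the control node, the copy module can do the same job as scp from inside a playbook. The sketch below assumes the two files sit next to the playbook and that /root/ is the target directory on the managed nodes; both are illustrative choices, not taken from the repository.

```yaml
# Hypothetical playbook: push local RPM files to the managed nodes.
- name: Copy already-downloaded RPMs from the control node
  hosts: all
  tasks:
    - name: Copy the JDK and Hadoop RPMs
      ansible.builtin.copy:
        src: "{{ item }}"      # looked up relative to the playbook directory
        dest: /root/           # assumed target directory
      loop:
        - jdk-8u171-linux-x64.rpm
        - hadoop-1.2.1-1.x86_64.rpm
```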

step 3: install JDK and Hadoop

To install JDK and Hadoop, we use an Ansible playbook, which you can find in my GitHub repository.

We will use Ansible's shell module because JDK and Hadoop are not available from the yum repositories.
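As a rough illustration of this step (the actual playbook is in the GitHub repository), the install could be sketched as below. The /root/ paths are assumed, and the --force flag on the Hadoop RPM is also an assumption, included in case the package conflicts with files already on the system; drop it if the install succeeds without it.

```yaml
# Hypothetical playbook: install both RPMs with the shell module.
- name: Install JDK and Hadoop
  hosts: all
  tasks:
    - name: Install the JDK RPM
      ansible.builtin.shell: rpm -ivh /root/jdk-8u171-linux-x64.rpm

    - name: Install the Hadoop RPM (--force is an assumption, see above)
      ansible.builtin.shell: rpm -ivh /root/hadoop-1.2.1-1.x86_64.rpm --force
```

Note that rpm -i returns an error when the package is already installed, so the JDK task in this sketch fails on a second run unless you add a guard.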

step 4: update core-site.xml and hdfs-site.xml

In this step, we edit core-site.xml and hdfs-site.xml and add the required properties. This configuration is important because without it Hadoop does not know which directory to use for storage or on which address and port the service is offered.

NOTE: Only hdfs-site.xml differs between the master and data nodes. core-site.xml is the same on both.

hdfs-site.xml (Master Node; differs between the two nodes)

<configuration>
  <property>
    <name>dfs.name.dir</name>   <!-- tells Hadoop to use the following directory -->
    <value>/master</value>      <!-- folder path -->
  </property>
</configuration>

core-site.xml (same on both nodes)

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://192.168.0.112:9001</value>   <!-- you can use any free port -->
  </property>
</configuration>

NameNode or MasterNode

DataNode or SlaveNode

hdfs-site.xml (Slave Node; differs between the two nodes)

<configuration>
  <property>
    <name>dfs.data.dir</name>   <!-- tells Hadoop to use the following directory -->
    <value>/slave</value>       <!-- folder path -->
  </property>
</configuration>
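One way to push these files from a playbook is the copy module with inline content, using the MasterNode and SlaveNode groups from the inventory. This is only a sketch: the /etc/hadoop/ directory is my assumption about where the Hadoop RPM keeps its configuration, and the repository playbook may do this differently.

```yaml
# Hypothetical playbook: write the Hadoop config files on each group of hosts.
- name: Write core-site.xml on every node
  hosts: all
  tasks:
    - name: Point all nodes at the NameNode
      ansible.builtin.copy:
        dest: /etc/hadoop/core-site.xml   # assumed config directory
        content: |
          <configuration>
            <property>
              <name>fs.default.name</name>
              <value>hdfs://192.168.0.112:9001</value>
            </property>
          </configuration>

- name: Write hdfs-site.xml on the master
  hosts: MasterNode
  tasks:
    - name: Set the NameNode storage directory
      ansible.builtin.copy:
        dest: /etc/hadoop/hdfs-site.xml   # assumed config directory
        content: |
          <configuration>
            <property>
              <name>dfs.name.dir</name>
              <value>/master</value>
            </property>
          </configuration>

- name: Write hdfs-site.xml on the slave
  hosts: SlaveNode
  tasks:
    - name: Set the DataNode storage directory
      ansible.builtin.copy:
        dest: /etc/hadoop/hdfs-site.xml   # assumed config directory
        content: |
          <configuration>
            <property>
              <name>dfs.data.dir</name>
              <value>/slave</value>
            </property>
          </configuration>
```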

step 5: Format Namenode

Hadoop uses its own file system, HDFS (Hadoop Distributed File System), so the NameNode has to be formatted. Only the MasterNode needs formatting; the slave node does not, because management is handled by the MasterNode.

The following command does this, but we will run it through an Ansible playbook as well:

hadoop namenode -format

The command has to be passed to the shell module as

echo Y | hadoop namenode -format

because it can ask for user confirmation while formatting the NameNode.
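A playbook for this step could be sketched as follows. The creates argument is an addition of mine, not from the source: it assumes that a formatted NameNode leaves a current/VERSION file under the /master directory configured earlier, and it skips the task when that file already exists so a rerun does not wipe the metadata.

```yaml
# Hypothetical playbook: format the NameNode on the master only.
- name: Format the NameNode
  hosts: MasterNode
  tasks:
    - name: Run the format command, answering the prompt with Y
      ansible.builtin.shell:
        cmd: echo Y | hadoop namenode -format
        creates: /master/current/VERSION   # assumed marker of a formatted NameNode
```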

step 6: start services of MasterNode and DataNode

Note: The MasterNode is already started, so we will look at the DataNode.

There are commands to start the cluster services, but we will run them through ansible-playbook only.

Master
hadoop-daemon.sh start namenode

Slave
hadoop-daemon.sh start datanode

// use 'stop' instead of 'start' to stop services
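Wrapped in a playbook, the two start commands might look like the sketch below. It relies on the MasterNode and SlaveNode groups from the inventory and on hadoop-daemon.sh being on the PATH of the remote user, which is an assumption; the play and task names are illustrative.

```yaml
# Hypothetical playbook: start the NameNode first, then the DataNode.
- name: Start the NameNode
  hosts: MasterNode
  tasks:
    - name: Start the namenode daemon
      ansible.builtin.shell: hadoop-daemon.sh start namenode

- name: Start the DataNode
  hosts: SlaveNode
  tasks:
    - name: Start the datanode daemon
      ansible.builtin.shell: hadoop-daemon.sh start datanode
```

The plays run in order, so the NameNode is started before the DataNode tries to register with it.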

All the services are now configured, so we can open the Hadoop web GUI.

To access the GUI, use the following URL in the browser:

http://'MasterNode IP':50070     // 50070 is the GUI port of Hadoop

GitHub Link:

https://github.com/venkateshpensalwar/ARTH/tree/main/Ansible/Configure%20Hadoop

Conclusion:

We have learned how to configure software that uses clustering technology. In this way, we can set up Hadoop clusters with the help of Ansible automation.

Hope this blog is helpful to you!!!!
