Adding Elasticity to a Hadoop DataNode Using LVM

 

So what is Hadoop?

Apache Hadoop is open-source software for building and maintaining a distributed storage and computing cluster. In real-world use we need far more storage than a single device can provide, and this is where Hadoop comes into play: it combines the storage of all its slave nodes and presents it as one pool, so that the client sees what looks like a single device with a huge amount of storage. Hadoop is mainly used for achieving greater computational power at a reduced cost.

So what is LVM?

LVM (Logical Volume Manager) is a tool for logical volume management that allows us to create dynamic storage units which can be resized as per our needs without worrying about data loss. Each step is done with a single command.

In this article, we will do the following (the full command sequence is sketched right after this list):

  • First, we will create a physical volume.
  • Next, we will create a volume group.
  • Next, we will use the volume group as a storage device: we will create a logical volume from it and format it just as we always do with a partition.
  • We will then mount this logical volume on the DataNode folder.
  • Finally, we will test the power of LVM by expanding the partition size on the fly.
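
For reference, here is a minimal sketch of the whole workflow, using the example names that appear later in this article ("my_vg", "my_partition", DataNode folder /dn1); adjust the device name and sizes to match your own setup:

pvcreate /dev/xvdf                               # 1. turn the raw disk into a physical volume
vgcreate my_vg /dev/xvdf                         # 2. pool it into a volume group
lvcreate --size 500M --name my_partition my_vg   # 3. carve out a logical volume
mkfs.ext4 /dev/my_vg/my_partition                # 4. format it as ext4
mount /dev/my_vg/my_partition /dn1               # 5. mount it on the DataNode directory
lvextend --size +500M /dev/my_vg/my_partition    # 6. grow the logical volume on the fly
resize2fs /dev/my_vg/my_partition                # 7. grow the filesystem to match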

→First, we have to install the LVM software before using any LVM commands:

For this, we will use dnf instead of yum, as dnf is the upgraded version of yum with many more features.

dnf install lvm2
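
If you want to confirm that the LVM tools are installed, checking the version is a quick, optional sanity check:

lvm version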

→So let's start with the Creation of a Physical Volume:

A Volume Group can only take storage from Physical Volumes, so we first need to convert our raw disk into a Physical Volume.

fdisk -l

As we can see, we have a 1 GB volume (/dev/xvdf) attached to our OS.
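
If you prefer a more compact view, lsblk also lists the attached block devices and their sizes:

lsblk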

To create the Physical Volume, we will use the command:

pvcreate /dev/xvdf

To check the status of our Physical Volume, we will use the command:

pvdisplay /dev/xvdf

We have successfully created a Physical Volume of 1 GB.

→Creating Volume Group:

To create a new Volume Group, we have to provide the Physical Volume(s) we want to attach to it. After creation, the Volume Group pools all the Physical Volumes connected to it into a single block of storage.

vgcreate "vg_name" /dev/xvdf
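
For example, with the names used later in this article and the Physical Volume created above, this would be:

vgcreate my_vg /dev/xvdf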

To check the status of the Volume Group, we can use the command:

vgdisplay "vg_name"

→Creating Logical Volume:

A Logical Volume behaves like a normal partition created without LVM. While creating it we have to give the size of the partition, and this size must not exceed the size of the Volume Group, because the storage for the Logical Volume is allocated from the Volume Group.

lvcreate --size 500M --name "lv_name" "vg_name"
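
With our example names, this becomes:

lvcreate --size 500M --name my_partition my_vg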

To check the status of the Logical Volume, we can use the command:

lvdisplay "vg_name"/"lv_name"

As we can see, my partition path is "/dev/my_vg/my_partition"; we will use this path when mounting the partition on a folder.

Now, if we check the Volume Group, its free size has decreased by 500 MB, because the VG allocated 500 MB to the Logical Volume.
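
For example, to confirm the free space remaining in the Volume Group created above:

vgdisplay my_vg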

Before the Logical Volume can store anything, we first need to format it; we will format our partition with the "ext4" filesystem:

mkfs.ext4 /dev/"vg_name"/"lv_name"
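
With our example names:

mkfs.ext4 /dev/my_vg/my_partition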

→Mounting the Partition on the DataNode folder:

Before doing any operation, remember that all volumes and disks have their own path or directory, much like a folder, and the Hadoop DataNode also uses a folder for the Hadoop File System. In our case the Hadoop DataNode directory is /dn1, and we will mount our Logical Volume on this folder.

mount /dev/"vg_name"/"lv_name" /"datanode_folder"
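
With our example names, and assuming /dn1 is the directory configured as the DataNode data directory (the dfs.datanode.data.dir property in hdfs-site.xml), the concrete commands would be as below; the mkdir is only needed if the folder does not exist yet:

mkdir -p /dn1
mount /dev/my_vg/my_partition /dn1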

We can confirm whether the partition is mounted on the DataNode folder or not:

df -h

Now, if we check the dfsadmin report, we can see the storage contributed by the DataNode.
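
The exact command depends on your Hadoop version; on Hadoop 2.x/3.x the report is typically fetched with the following (older Hadoop 1.x installs use hadoop dfsadmin -report instead):

hdfs dfsadmin -report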

As we can see, the storage contributed by the DataNode is slightly less than 500 MB, since part of the space is consumed by the ext4 filesystem's own metadata.

→Extending the Logical Volume size Elastically:

Now we will see the power of LVM by extending the size of the partition on the fly. For extending, we will use the command:

lvextend --size +"size_to_increase" /dev/"vg_name"/"lv_name"
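
For example, to add the extra 500 MB used in this walkthrough:

lvextend --size +500M /dev/my_vg/my_partition

Newer versions of lvextend also accept the -r (--resizefs) flag to resize the filesystem in the same step, but here we do it separately with resize2fs to show what is actually happening.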

Let's check whether the DataNode folder size has increased or not, using the "df -h" command.

Why is the size still 500 MB?

As I said earlier, we can only use the part of the storage that is formatted. In our case, we still have to format (extend the filesystem over) the extra 500 MB we just added, and for that we will use the command:

resize2fs /dev/"vg_name"/"lv_name"
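
With our example path:

resize2fs /dev/my_vg/my_partition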

This command grows the ext4 filesystem so that it covers the newly added space, updating the filesystem metadata in place without erasing the existing data.

Now, if we use the df -h command:

Our storage increased on the fly, without unmounting the DataNode folder and without tampering with the data that already existed in that folder.

Let's check the dfsadmin report to confirm the amount of storage now contributed by the DataNode.

The storage has increased since last time: I added just 500 MB of space, and now my DataNode is contributing around 960 MB of space to the Hadoop cluster.

Conclusion

As you can see above, we can dynamically control the amount of storage the DataNode provides to the Hadoop cluster and extend it easily on the fly, whenever and by however much it is needed. We also learned that LVM helps us provide elasticity in the storage device by using dynamic partitions.

Thanks for reading this article! Leave a comment below if you have any questions.
