Distributed storage on Debian made easy with GlusterFS

GlusterFS is a mature, elegant and powerful distributed filesystem targeted at very high capacity and availability. Sponsored by Red Hat Inc. and included in their storage server solution, this open-source software is also available as packages for several other Linux distributions, or as source code.

Unlike many other distributed solutions, Gluster does not require many computers before you can get a taste of its ease of use. A few spare minutes are enough to try it on your own computer. Note that the repository only provides packages for the amd64 architecture, so the following applies to 64-bit machines only.

First, add the GnuPG key for the repository and the corresponding entry for APT:

wget -O - | apt-key add -
echo "deb [ arch=amd64 ] wheezy main" >/etc/apt/sources.list.d/glusterfs.list

The arch option tells APT to fetch package lists for amd64 only. As documented in the Multiarch specs, this is useful if you are using multiarch and already have packages from a foreign architecture installed.

Next, update the packages database and install both the server and client packages:

apt-get update
apt-get install glusterfs-server glusterfs-client

Now, either you have a whole disk or partition available or, like me, you don’t. In that case, let’s just use a file as our disk. Either way, the goal is to format the disk, preferably with XFS, and mount it.

Doing it with a disk or a partition is left to the reader’s discretion and knowledge ;] With a file, it’s as easy as this (thanks to this libgfapi doc):

truncate -s 5GB /srv/xfsdisk
mkfs.xfs -i size=512 /srv/xfsdisk
mkdir -p /export/brick
echo "/srv/xfsdisk /export/brick xfs loop,inode64,noatime,nodiratime 0 0" >> /etc/fstab
mount /export/brick
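Using a file as a disk costs less space than it seems, because truncate creates a sparse file: the 5 GB are only reserved on paper until data is written. A small sketch to see this for yourself, run in a scratch directory rather than /srv (the scratch path and variable names are only illustrative, and the exact allocated size depends on the underlying filesystem):

```shell
# Create a sparse image in a throwaway directory (hypothetical location).
scratch=$(mktemp -d)
img="$scratch/xfsdisk"
truncate -s 5GB "$img"

# Logical size in bytes versus space actually allocated on disk, in KiB.
apparent=$(stat -c %s "$img")
allocated=$(du -k "$img" | cut -f1)
echo "apparent: $apparent bytes, allocated: $allocated KiB"

# Clean up the scratch directory.
rm -rf "$scratch"
```

Note that truncate reads the GB suffix as powers of 1000; use 5G if you prefer powers of 1024.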

One last tip before starting our cluster: since Gluster doesn’t accept localhost as a valid node hostname, we add another name for our loopback network:

echo " localnode" >>/etc/hosts
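Before going further, it may be worth checking that the new name actually resolves. A minimal sketch using getent, which reads /etc/hosts through the normal resolver configuration (the variable names are only illustrative):

```shell
# Look up the extra hostname and keep the first address returned, if any.
name=localnode
addr=$(getent hosts "$name" | awk '{ print $1; exit }')

if [ -n "$addr" ]; then
    msg="$name resolves to $addr"
else
    msg="$name does not resolve; check /etc/hosts"
fi
echo "$msg"
```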

Now the real work with Gluster may begin. First, create a directory in the dedicated mount point and use it as a brick for our new volume:

mkdir /export/brick/b1
gluster volume create test localnode:/export/brick/b1

Last, start the volume and enjoy: it’s working.

gluster volume start test
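To confirm that the volume really is up, the CLI can report its state. A minimal sketch, assuming the volume is named test as above; the guard and the state variable are only there so the snippet fails gracefully on a machine without Gluster installed:

```shell
# Report the configuration and runtime state of the "test" volume.
if command -v gluster >/dev/null 2>&1; then
    gluster volume info test      # volume type, brick list and status
    gluster volume status test    # per-brick process information
    state=checked
else
    echo "gluster CLI not found; install glusterfs-server first" >&2
    state=skipped
fi
```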

And now…? Now you may play a little with the powerful gluster CLI; gluster help will output the available commands. You may also be a client of your own storage cluster (yes, you can) by simply mounting the volume somewhere, like:

mkdir /mnt/gluster
mount -t glusterfs localnode:/test /mnt/gluster
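A quick way to see the client and the brick working together is to write a file through the mount and then look for it on the brick side. This is only a sketch: it assumes the paths used above, and that a single-brick volume with no replication or striping stores files as-is in the brick directory (hello.txt is just an example name):

```shell
# Write through the client mount, then look for the same file in the brick.
if mountpoint -q /mnt/gluster; then
    echo "hello from the client side" > /mnt/gluster/hello.txt
    ls -l /export/brick/b1/hello.txt   # where the file is expected to land
    result=written
else
    echo "/mnt/gluster is not mounted; run the mount command first" >&2
    result=skipped
fi
```

Always write through the mounted volume, never directly into the brick directory, so that Gluster keeps track of the files.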

Comment (1)

  1. Hi Racker Hacker,

     We have been using GlusterFS version 2.08 for a couple of years with good results until a few weeks ago. Our problems started when the concurrent traffic to our PHP site increased to about 200 concurrent users on each front end server. At this point the performance of the site was so bad that page requests would just time out and the users would be unable to use the system.

     Our system has the following characteristics:

     - Load balancer.
     - Two front end machines with Xeon Quad Core X3360 2.83 GHz, 4 GB of RAM and 7200 RPM disks in a RAID 1 configuration. Each of these machines runs an Apache server. GlusterFS was installed on both front end machines and was used to keep the files submitted by the users, as well as the website code and its contents (PHP, images, css, js, html), synchronized between the two machines.
     - Database server with 2x Xeon Quad Core E5410 2.33 GHz, 8 GB of RAM and 10k RPM disks in a RAID 10 configuration. This server runs an Oracle database.

     In order to replicate the problem faced by the users, I used Apache’s ab tool to simulate 200 concurrent requests against one of the front end servers. Using top and iostat, I saw that the I/O wait time went through the roof and that GlusterFS was using 30-40% of the server’s CPU. Once 200 concurrent requests were being made, the server became unable to serve PHP pages: users would wait so long for each page that the requests timed out.

     Based on these results I disabled GlusterFS, and now each server is able to serve over 400 concurrent requests at a time (I haven’t tested it with more requests, but I am pretty sure it wouldn’t have any problem serving them). However, based on what I have read about GlusterFS being able to handle cloud infrastructures that contain petabytes of data, I am pretty sure that I must be missing something.

     Can you please help me with this?

     Thanks and Regards,
     Fred