Thursday, 12 September 2013

Using Apache Whirr & Amazon EC2

So once you are done with the basics of installing Hadoop locally and running some "Hello World" stuff (word count, in Hadoop parlance), it is time to put Hadoop to some real use to understand its value. A good infrastructure is required to do heavy lifting with a fairly large amount of data. I went through some of the cloud providers and figured out that Amazon EC2 works out cheapest. I was also promised $50 of AWS credit for participating in a Red Hat survey.

There are plenty of posts on how to set up Hadoop on EC2. I followed this one. I also found that installing Hadoop in the cloud is fairly cumbersome - if you are doing it by hand. Apache Whirr, a tool for installing and running services in the cloud, can help get things going pretty fast.
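As a concrete sketch, a Whirr recipe for a small Hadoop cluster on EC2 looks roughly like the following. The cluster name and instance counts here are example values, and the credentials are assumed to come from your AWS environment variables:

```properties
# hadoop.properties - a minimal Whirr recipe for a Hadoop cluster on EC2
whirr.cluster-name=myhadoopcluster
# one master (namenode + jobtracker) and two workers (datanode + tasktracker)
whirr.instance-templates=1 hadoop-namenode+hadoop-jobtracker,2 hadoop-datanode+hadoop-tasktracker
whirr.provider=aws-ec2
whirr.identity=${env:AWS_ACCESS_KEY_ID}
whirr.credential=${env:AWS_SECRET_ACCESS_KEY}
whirr.private-key-file=${sys:user.home}/.ssh/id_rsa
whirr.public-key-file=${sys:user.home}/.ssh/id_rsa.pub
```

The cluster is then brought up with `whirr launch-cluster --config hadoop.properties` and torn down later with `whirr destroy-cluster --config hadoop.properties`.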

So using Apache Whirr and Amazon EC2, I was able to set up my first Hadoop cluster and run some word count MapReduce jobs.

Saturday, 29 June 2013

Quick hands-on with Hadoop for Java Professionals

Apache Hadoop is a suite of components aimed at handling large-scale data. The traditional n-tier architecture with an RDBMS-based data store does not scale once data volumes grow into the petabyte range, and this is the space where Hadoop comes into play. Hadoop is not a replacement for the traditional architecture, but complements it in the high-volume data space.

This series of posts aims to give Java professionals a head start through some practical hands-on exercises.

1. Download Cygwin.
Make sure that you install the optional packages: OpenSSH and Python 2.6 or later.

2. JDK - Install a Java distribution, version 1.6 or later.

3. Download the Hadoop core distribution.

4. Follow the instructions here to complete the rest of the setup.

5. You could try some of the examples built into the Hadoop package. The equivalent of "Hello World" in Hadoop is the word count problem, where the occurrences of each word in a given data set are counted.
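To make the word count idea concrete before touching a cluster, here is a plain-Java sketch of the same map/reduce logic: the "map" phase emits a count of 1 per word, and the "reduce" phase sums the counts per word. This is only an illustration of the idea in a single JVM, not the Hadoop `Mapper`/`Reducer` API:

```java
import java.util.HashMap;
import java.util.Map;

public class WordCount {
    // Count word occurrences: split into words ("map" emits (word, 1)),
    // then sum the counts per word (the "reduce" step).
    public static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : text.toLowerCase().split("\\s+")) {
            if (word.isEmpty()) continue;
            counts.merge(word, 1, Integer::sum); // reduce: sum per key
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(count("hello world hello"));
    }
}
```

On a real installation the bundled example is run from the shell with something like `hadoop jar hadoop-examples.jar wordcount <input> <output>` (the exact jar name varies by Hadoop version).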

In the next post, we will look at a more practical use case with Hadoop.

Common Issues faced while setting up Hadoop
Install all packages in a folder whose path contains no spaces. For example, if your JDK is in c:/program files/java, you may face issues with the path.