Saturday 29 June 2013

Quick hands-on with Hadoop for Java Professionals

Apache Hadoop is a suite of components aimed at handling large-scale data. The traditional n-tier architecture with an RDBMS-based data store does not scale well once data volumes grow into the petabyte range, and this is the space where Hadoop comes into play. Hadoop is not a replacement for the traditional architecture; it complements it in the high-volume data space.

This series of posts aims to give Java professionals a head start through some practical hands-on exercises.

1. Download Cygwin from http://cygwin.com/setup.exe
Make sure you install the optional packages: SSH (OpenSSH) and Python 2.6 or later.

2. JDK - Install a Java distribution at version 1.6 or later.

3. Hadoop core - Download a stable Hadoop release from hadoop.apache.org

4. Follow the instructions here to complete the rest of the setup (a small verification sketch follows this list)
http://hadoop.apache.org/docs/stable/single_node_setup.html

5. You could try some of the examples bundled with the Hadoop package. The equivalent of Hello World in Hadoop is the word count problem, where the occurrences of each word in a given data set are counted; a minimal mapper/reducer sketch is shown below.
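Once the single-node setup from step 4 is up and running, a small Java program can confirm that HDFS is reachable before you run any jobs. This is only a minimal sketch; the address hdfs://localhost:9000 is an assumption and should be changed to whatever fs.default.name you configured in core-site.xml.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsSmokeTest {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Assumed NameNode address - change it to match fs.default.name in your core-site.xml
        conf.set("fs.default.name", "hdfs://localhost:9000");
        FileSystem fs = FileSystem.get(conf);
        // Listing the HDFS root directory is a simple way to confirm the cluster is up
        for (FileStatus status : fs.listStatus(new Path("/"))) {
            System.out.println(status.getPath());
        }
        fs.close();
    }
}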

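For reference, here is a minimal sketch of the word count job written against the org.apache.hadoop.mapreduce API; the example bundled with the Hadoop distribution is along these lines.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Mapper: emits (word, 1) for every token in each input line
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reducer: sums the counts emitted for each word
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));    // input directory in HDFS
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // output directory (must not exist yet)
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Packaged into a jar, it can be submitted with something like bin/hadoop jar wordcount.jar WordCount <input> <output>, where the jar and class names are only illustrative; the input and output paths are HDFS directories, and the output directory must not already exist.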
In the next post, we will look at a more practical use case with Hadoop.

Common Issues faced while setting up Hadoop
Install all packages in folders whose paths contain no spaces. For example, if your JDK is in c:/program files/java, you may face issues with the path.
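A throwaway check like the one below will flag this; the environment variable names JAVA_HOME and HADOOP_HOME are assumptions here, so adjust them to whatever you have actually set.

public class PathSpaceCheck {
    public static void main(String[] args) {
        // Warn if either install path contains a space, which tends to break Hadoop scripts on Windows/Cygwin
        for (String var : new String[] {"JAVA_HOME", "HADOOP_HOME"}) {
            String value = System.getenv(var);
            if (value != null && value.contains(" ")) {
                System.out.println("Warning: " + var + " contains a space: " + value);
            } else {
                System.out.println(var + " = " + value);
            }
        }
    }
}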