iTnews dared open source IT consultant Dez Blanchfield to build a Hadoop testbed that even a lowly tech journalist could build for themselves - you're about to enjoy the result.

Below we have posted step-by-step instructions on building a Hadoop instance in little over an hour. And while it's best deployed on a server, it's small enough to run on your laptop.

You will be downloading just under 500MB of software, which once unpacked amounts to around 1.7 GB of disk space on your machine.

We suggest you leave this window open on the machine you are running Hadoop on (you'll need to cut and paste a few commands!) but perhaps also on a tablet/laptop to refer to when you're knee-deep in command lines.

Dez will be hosting a Reddit AMA (ask me anything) on Wednesday October 9 at 3pm for those of you that get stuck.

We recommend our 'zero-to-hero' guide to help you understand the underpinnings of Hadoop. But if you have any problems, we've also dropped a pre-built appliance (image) into a dropbox as either a .zip (Windows) or .tar (Unix) file that you can import and run. We call this the "easy-way-out". You'll find instructions on this process on page two.

Best of luck with your first Hadoop build!

Introduction

In this DIY test bed project we will show you how to do the following:

set up a hypervisor to run a Linux virtual machine to host your lab machine

build a Linux appliance to build your Hadoop lab on

install and configure Java

install and configure a single node Hadoop instance

First up, here are some basic requirements to build your test bed:

A personal computer or server of some form.

Reasonably powerful x86 hardware: a recent Intel or AMD processor (an Intel-based Windows PC, Intel-based Mac or Intel-based Linux machine) with at least 2 GB of RAM and 2 GB of hard drive space free.

Note: You are going to be running a full virtual computer on top of your own computer, so you need to consider the performance impact, i.e. it could slow your PC down a little while you are running the Hadoop VM under VirtualBox.
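If your own computer runs Linux or Mac OS X, a quick way to confirm the disk space requirement is a short shell check like the one below. This is only a sketch: the helper names free_kb and has_free_space are our own, and it checks the current folder, so run it from the folder you plan to download into.

```shell
#!/bin/sh
# Sketch: check free disk space before downloading (Linux or Mac host).
# free_kb and has_free_space are hypothetical helper names, not standard tools.

# Print the free space, in KB, of the filesystem holding directory $1.
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Succeed if the filesystem holding directory $1 has at least $2 GB free.
has_free_space() {
    need_kb=$(( $2 * 1024 * 1024 ))
    [ "$(free_kb "$1")" -ge "$need_kb" ]
}

if has_free_space . 2; then
    echo "OK: at least 2 GB free in this folder"
else
    echo "Warning: less than 2 GB free in this folder"
fi
```

The -P flag asks df for the portable one-line-per-filesystem layout, which is what makes picking the fourth column reliable.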

1. Download

The first thing we need you to do is download the following two key components:

Virtualbox:

This is the hypervisor platform we’ll be running the test bed within.

Download and install Virtualbox from:

https://www.virtualbox.org/wiki/Downloads

Windows:

http://download.virtualbox.org/virtualbox/4.2.18/VirtualBox-4.2.18-88781-Win.exe

Mac:

http://download.virtualbox.org/virtualbox/4.2.18/VirtualBox-4.2.18-88780-OSX.dmg

Linux (select for your distro from):

http://download.virtualbox.org/virtualbox/4.2.18

Linux virtual appliance:

This is a tiny Linux system “appliance” virtual machine we’ll use to install and run Hadoop on.

We will be importing this self configuring Linux appliance with Virtualbox to build the linux virtual machine (VM) we need to start from.

Download base linux virtual machine OVF (from TurnKey Linux):

http://www.turnkeylinux.org/download?file=turnkey-core-12.1-squeeze-amd64-ovf.zip

Save it to a folder where you will set up your Hadoop test bed.

Expand the following downloaded file:

turnkey-core-12.1-squeeze-amd64-ovf.zip



Note: this will expand to a folder called: turnkey-core-12.1-squeeze-amd64

2. Install

Install the Virtualbox hypervisor:

The installation of Virtualbox is very simple: just locate the installer you downloaded, open it (i.e. double click on it), and follow the prompts.

Under Windows simply double click the download and it will lead you from there.

Under Linux and Mac OS X, you need to open the downloaded disk image or TAR file, and run the installer from within.

Follow the prompts. The defaults will do what we need, so you do not need to change anything during the install.

Simply double-click the base installer, follow the prompts and accept all the defaults, and in a few minutes you will have a full working version of Virtualbox installed and ready to run and import your Linux appliance.

Install and configure the base Linux VM:

The setup of the Linux virtual machine is a little more detailed but the key steps are pretty straightforward.

If you get lost, just close the Appliance Import window and start again.

The whole process should not take more than about 10 minutes from start to finish.

Let’s get started. First run Virtualbox.

From the main "File" menu select "Import Appliance"

+ a new window will open titled "Appliance to import"
+ click on the "Open appliance" button
+ navigate to the "turnkey-core-12.1-squeeze-amd64" folder
+ select the file "turnkey-core-12.1-squeeze-amd64.ovf" and click "Open"
+ click "Continue"
+ click "Import"

Note: you will now have a new virtual machine called "vm"

We now need to change a few settings

+ right click on "vm" and select "Settings"
+ rename the VM from "vm" to "Hadoop"
+ click on the "System" icon
+ change the "Base memory" from 256 MB to 1024 MB (1 GB)
+ in the "Boot order" window unselect "Floppy" and "CD" (leave Hard Disk checked)
+ click on "OK" to save settings
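If you prefer a command line to clicking through menus, Virtualbox also installs a tool called VBoxManage that can do the same import and settings changes. The sketch below only prints the commands so you can review them before pasting them into a terminal. It assumes you run it from the folder where you expanded the download, and that the imported VM is called "vm" as described above. The function name vbox_commands is our own.

```shell
#!/bin/sh
# Sketch only: prints the VBoxManage commands that mirror the GUI steps.
# Review the output, then paste the lines you want into a terminal.
# Assumes the appliance was expanded into the current folder.
OVF="turnkey-core-12.1-squeeze-amd64/turnkey-core-12.1-squeeze-amd64.ovf"

# vbox_commands is a hypothetical helper name.
vbox_commands() {
    # Import the appliance; it shows up as a VM called "vm".
    echo "VBoxManage import \"$OVF\""
    # Rename it to "Hadoop" and raise base memory from 256 MB to 1024 MB.
    echo "VBoxManage modifyvm vm --name Hadoop --memory 1024"
    # Boot from the hard disk only (no floppy, no CD).
    echo "VBoxManage modifyvm Hadoop --boot1 disk --boot2 none --boot3 none --boot4 none"
}

vbox_commands
```

Printing rather than running keeps the script harmless on a machine where Virtualbox is not installed yet.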

Now you can start up your Hadoop VM.

Double click on the "Hadoop" VM listed as "Powered Off" to start it

Note: you can also single click on the Hadoop VM icon and then click the START button

+ the Hadoop VM will start up and auto-boot
+ you will be prompted for a new "Root Password"
+ set it to "hadoop" so it's easy to remember
+ it will ask you for the password twice to confirm you didn't make any typos
+ you are then asked to "Initialise Hub services"
+ press the TAB key to select "Skip" and press return once
+ you are then asked to install "Security updates"
+ press the TAB key to select "Skip" and press return once
+ your VM will then boot up and be running
+ you will have a window displaying URLs you can use to connect to your new VM

Note: this is only your "base" linux OS, we have not installed Hadoop yet. But you're doing great!

Congratulations, you’ve successfully installed Virtualbox and imported and configured your Linux appliance.

To confirm you can connect to your Hadoop virtual machine via a web browser, make a note of the IP address displayed on the final screen when your Linux VM finishes booting (it will show up in the URLs on that screen). Then use a web browser to connect to that IP address on port 12320, which is the built-in web shell. For example, if the IP address was 10.10.10.50 then connect to:

http://10.10.10.50:12320
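Your VM will almost certainly show a different IP address from our example. As a small sketch, the helper below (webshell_url is our own name, not part of TurnKey) builds the web shell address from whatever IP address your VM displays, and complains if what you typed does not look like an IPv4 address.

```shell
#!/bin/sh
# Sketch: build the web shell URL from the IP address shown by the VM.
# webshell_url is a hypothetical helper; 12320 is the web shell port.
webshell_url() {
    if printf '%s\n' "$1" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$'; then
        printf 'http://%s:12320\n' "$1"
    else
        echo "not an IPv4 address: $1" >&2
        return 1
    fi
}

# Replace 10.10.10.50 with the address your own VM displays.
webshell_url 10.10.10.50
```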

You will be presented with what looks like a terminal console. You can now login using the root user account and password, i.e.:

core login: root

Password: hadoop

You are now ready to proceed to download and install the Oracle Java development kit (JDK) version 7, and the core distribution of Hadoop - we’ll be using version 1.2.1.



3. Setup and configure your Linux VM and Hadoop

To begin this section you need to be connected to your Hadoop VM. Do this via the web shell console using a web browser.

Use a web browser to connect to the IP address displayed on the final screen of the Linux VM once it has booted up, on port 12320, to reach the built-in web shell:

http://10.10.10.50:12320

You will be presented with what looks like a terminal console. You can now login using the root user account and password, i.e.:

core login: root

Password: hadoop

If this was successful you will now be logged in as the root user. You will see a screen similar to the following, ending in a “#” prompt:

Welcome to Core, TurnKey Linux 12.1 / Debian 6.0.7 Squeeze
System information (as of date)
System load: 0.00 Memory usage: 12%
Processes: 72 Swap usage: 0%
Usage of /: 3.4% of 16.73GB IP address for eth0: 10.10.10.50
TKLBAM (Backup and Migration): NOT INITIALIZED
To initialize TKLBAM, run the "tklbam-init" command to link this
system to your TurnKey Hub account. For details see the man page or
go to:
http://www.turnkeylinux.org/tklbam
Last login: Thu Oct 1 08:55:05 2013 from 10.10.10.123
root@core ~#

Note: once you are logged in as root, you are in fact the super user, so tread gently as you have the power to break the system!

The first thing we will do is setup a “group” for Hadoop with the following command:

addgroup hadoop

It should look like this (commands are in bold):

root@core ~# addgroup hadoop

Adding group `hadoop' (GID 1001) ...

Done.

Now we need to add a user for Hadoop with the following command line:

adduser --ingroup hadoop hduser

It should look like this (commands are in bold):

Note: at each prompt, press the “enter” key (or the “return” key) after typing your answer. You will be prompted to enter a password twice (to catch typos); use hadoop. Leave the name and other details blank as they are not required, and at the end enter a capital Y and press enter.

root@core ~# adduser --ingroup hadoop hduser

Adding user `hduser' ...

Adding new user `hduser' (1000) with group `hadoop' ...

Creating home directory `/home/hduser' ...

Copying files from `/etc/skel' ...

Enter new UNIX password: hadoop

Retype new UNIX password: hadoop

passwd: password updated successfully

Changing the user information for hduser

Enter the new value, or press ENTER for the default

Full Name []:

Room Number []:

Work Phone []:

Home Phone []:

Other []:

Is the information correct? [Y/n] Y

Now add our Hadoop user “hduser” to the sudo group ( so it can run commands as root ):

adduser hduser sudo

It should look like this (commands are in bold):

root@core ~# adduser hduser sudo

Adding user `hduser' to group `sudo' ...

Adding user hduser to group sudo

Done.
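If you want to double-check that the group and user were created, a short sketch like this will report on them. The helper names group_exists and user_exists are our own; getent and id are standard commands on the Debian-based VM we are using.

```shell
#!/bin/sh
# Sketch: confirm the "hadoop" group and "hduser" account exist.
# group_exists and user_exists are hypothetical helper names.

# Succeed if a group called $1 exists.
group_exists() {
    getent group "$1" >/dev/null 2>&1
}

# Succeed if a user called $1 exists.
user_exists() {
    id -u "$1" >/dev/null 2>&1
}

for g in hadoop sudo; do
    if group_exists "$g"; then
        echo "group $g: present"
    else
        echo "group $g: missing"
    fi
done

if user_exists hduser; then
    # List every group hduser belongs to; expect hadoop and sudo.
    echo "hduser belongs to: $(id -nG hduser)"
else
    echo "user hduser: missing"
fi
```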

Now we are going to generate Secure Shell “keys” (we’ll explain what these are in the Webinar):

ssh-keygen -t rsa -P ""

It should look like this (commands are in bold):

(Note: when asked where to save the key, just press the “enter” key or the “return” key to accept the default.)

root@core ~# ssh-keygen -t rsa -P ""

Generating public/private rsa key pair.

Enter file in which to save the key (/root/.ssh/id_rsa):

Created directory '/root/.ssh'.

Your identification has been saved in /root/.ssh/id_rsa.

Your public key has been saved in /root/.ssh/id_rsa.pub.

The key fingerprint is:

cb:41:83:4f:6f:52:d6:d3:c7:f7:b8:1b:29:c0:ac:b0 root@core

The key's randomart image is:

+--[ RSA 2048]----+

| |

| . . . . |

| . + o o . +|

| + B . oo|

| . S * . .|

| + * . o |

| E + . + |

| . o |

| . |

+-----------------+

Now add our new public key to the authorized keys file:

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

It should look like this (commands are in bold):

root@core ~# cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
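One thing to be aware of: if you run that cat command more than once, the same key is appended again each time. That is harmless but untidy. Here is a sketch of a more careful version, assuming the default file names used above. The helper add_key_once is our own invention, and the real call is left commented out so the script changes nothing until you decide to use it.

```shell
#!/bin/sh
# Sketch: append a public key to authorized_keys only if it is not there yet.
# add_key_once is a hypothetical helper name.
add_key_once() {
    pub="$1"
    auth="$2"
    touch "$auth"
    # -x matches whole lines, -F treats the key as plain text, not a pattern.
    grep -qxF -- "$(cat "$pub")" "$auth" || cat "$pub" >> "$auth"
    # Keep the file private to its owner.
    chmod 600 "$auth"
}

# Usage on the VM, as root:
# add_key_once ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys
```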

Now let’s confirm that our new SSH keys work and we can log in without entering a password.

This step is also needed to save your local machine’s host key fingerprint to the root user’s known_hosts file.

ssh localhost

It should look like this (commands are in bold):

Note: you need to type “yes” and press enter when it asks you if you want to continue connecting:

root@core ~# ssh localhost

The authenticity of host 'localhost (127.0.0.1)' can't be established.

RSA key fingerprint is 24:96:3b:ce:08:93:43:b3:0e:58:44:05:f9:48:82:7b.

Are you sure you want to continue connecting (yes/no)? yes

Warning: Permanently added 'localhost' (RSA) to the list of known hosts.

Welcome to Core, TurnKey Linux 12.1 / Debian 6.0.7 Squeeze

System information (as of date)

System load: 0.00 Memory usage: 12%

Processes: 72 Swap usage: 0%

Usage of /: 3.4% of 16.73GB IP address for eth0: 10.10.10.50

TKLBAM (Backup and Migration): NOT INITIALIZED

To initialize TKLBAM, run the "tklbam-init" command to link this

system to your TurnKey Hub account. For details see the man page or

go to:

http://www.turnkeylinux.org/tklbam

Last login: Thu Oct 1 08:55:05 2013 from 10.10.10.123

root@core ~#



What we’ve done now is connect to our own system using a stored SSH key, so we don’t need to type in our account password. This allows Hadoop to run commands on the system without needing to know or enter the password.

Now exit from the login to your own server with this simple command line:

exit

It should look like this (commands are in bold):

root@core ~# exit

logout

Connection to localhost closed.

4. Download and setup Java

We will be using Oracle Java version 7 update 40, which you can download directly from the following URL:

http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html

You will need to open the above URL in your web browser and click on a button confirming you “Accept License Agreement”. Once you click on the check box for this, you will be able to download the Java JDK version 7 update 40 from the following URL:

http://download.oracle.com/otn-pub/java/jdk/7u40-b43/jdk-7u40-linux-x64.tar.gz

As we are installing the Java JDK on a 64-bit Debian-based Linux distribution, we will download the 64-bit Linux version.

Oracle assume you are using a desktop web browser to download, but we’re doing it from a Linux command line, so we need a slightly more detailed command that sends a cookie to pretend we are a desktop web browser (note: this is a single long line but the text is being wrapped here as it’s too long to fit on one line - you can cut & paste this to save having to type it all in):

wget --no-check-certificate --no-cookies --header "Cookie: gpw_e24=http%3A%2F%2Fwww.oracle.com" "http://download.oracle.com/otn-pub/java/jdk/7u40-b43/jdk-7u40-linux-x64.tar.gz"

It should look like this (commands are in bold):

root@core ~# wget --no-check-certificate --no-cookies --header "Cookie: gpw_e24=http%3A%2F%2Fwww.oracle.com" "http://download.oracle.com/otn-pub/java/jdk/7u40-b43/jdk-7u40-linux-x64.tar.gz"
--2013-10-03 10:31:28-- http://download.oracle.com/otn-pub/java/jdk/7u40-b43/jdk-7u40-linux-x64.tar.gz
Resolving download.oracle.com... 23.205.115.73, 23.205.115.75
Connecting to download.oracle.com|23.205.115.73|:80... connected.
HTTP request sent, awaiting response... 302 Moved Temporarily
Location: https://edelivery.oracle.com/otn-pub/java/jdk/7u40-b43/jdk-7u40-linux-x64.tar.gz [following]
--2013-10-03 10:31:29-- https://edelivery.oracle.com/otn-pub/java/jdk/7u40-b43/jdk-7u40-linux-x64.tar.gz
Resolving edelivery.oracle.com... 23.53.150.140
Connecting to edelivery.oracle.com|23.53.150.140|:443... connected.
WARNING: certificate common name `www.oracle.com' doesn't match requested host name `edelivery.oracle.com'.
HTTP request sent, awaiting response... 302 Moved Temporarily
Location: http://download.oracle.com/otn-pub/java/jdk/7u40-b43/jdk-7u40-linux-x64.tar.gz?AuthParam=1380796415_1ccd08e79a9e1d8c453240d244958632 [following]
--2013-10-01 10:31:35-- http://download.oracle.com/otn-pub/java/jdk/7u40-b43/jdk-7u40-linux-x64.tar.gz?AuthParam=1380796415_1ccd08e79a9e1d8c453240d244958632
Reusing existing connection to download.oracle.com:80.
HTTP request sent, awaiting response... 200 OK
Length: 138021223 (132M) [application/x-gzip]
Saving to: `jdk-7u40-linux-x64.tar.gz.1'
100%[===============================>] 138,021,223 1.05M/s in 2m 9s
2013-10-01 10:33:45 (1.02 MB/s) - `jdk-7u40-linux-x64.tar.gz' saved

We can quickly check that our download worked by listing the contents of the current directory:

ls -l

It should look like this (commands are in bold):

root@core ~# ls -l
total 134788
-rw-r--r-- 1 root root 138021223 Oct 1 10:32 jdk-7u40-linux-x64.tar.gz

So now we have a file called “jdk-7u40-linux-x64.tar.gz” of approx. 138 MB in size.
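If you would rather not compare the numbers by eye, the sketch below checks the size for you. The expected figure of 138021223 bytes comes from the "Length" line in our wget output above. The helper name size_matches is our own. Note that a matching size is only a rough sanity check: it tells you the download was not cut short, not that the file is intact.

```shell
#!/bin/sh
# Sketch: compare a downloaded file's size with the size wget reported.
# size_matches is a hypothetical helper name.

# Succeed if file $1 is exactly $2 bytes long.
size_matches() {
    [ "$(wc -c < "$1" | tr -d ' ')" -eq "$2" ]
}

FILE="jdk-7u40-linux-x64.tar.gz"
EXPECTED=138021223   # taken from the "Length:" line of the wget output

if [ ! -f "$FILE" ]; then
    echo "$FILE not found in this directory"
elif size_matches "$FILE" "$EXPECTED"; then
    echo "$FILE is the expected size"
else
    echo "$FILE is the wrong size, try downloading it again"
fi
```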

Now we extract the gzip-compressed tar archive and move it into the /usr/local directory under a shorter name, to avoid typing long directory names, with the following steps:

Extract the JDK tar.gz file:

tar zxvf jdk-7u40-linux-x64.tar.gz

It should look like this (commands are in bold):

root@core ~# tar zxvf jdk-7u40-linux-x64.tar.gz

jdk1.7.0_40/

jdk1.7.0_40/COPYRIGHT

jdk1.7.0_40/README.html

jdk1.7.0_40/THIRDPARTYLICENSEREADME.txt

jdk1.7.0_40/lib/

…truncated…

Next we need to move it into the /usr/local directory:

(note: we’re going to rename it to “jdk-7-oracle” in the process)

mv jdk1.7.0_40 /usr/local/jdk-7-oracle

It should look like this ( commands are in bold ):

root@core ~# mv jdk1.7.0_40 /usr/local/jdk-7-oracle

Note: we’ll add the Java bin directory to our PATH environment variable in a few steps.

5. Download and install Hadoop

Download Hadoop version 1.2.1 from:

wget https://www.apache.org/dist/hadoop/core/hadoop-1.2.1/hadoop-1.2.1.tar.gz

Note: this is all one single long line without breaks; it may wrap on the page.



It should look like this (commands are in bold):



root@core ~# wget https://www.apache.org/dist/hadoop/core/hadoop-1.2.1/hadoop-1.2.1.tar.gz
--2013-10-01 11:12:21-- https://www.apache.org/dist/hadoop/core/hadoop-1.2.1/hadoop-1.2.1.tar.gz
Resolving www.apache.org... 192.87.106.229, 140.211.11.131
Connecting to www.apache.org|192.87.106.229|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 63851630 (61M) [application/x-gzip]
Saving to: `hadoop-1.2.1.tar.gz'
100%[===============================>] 147,456 1.05M/s in 1m 19s
2013-10-01 10:33:45 (1.02 MB/s) - `hadoop-1.2.1.tar.gz' saved

Now extract the GZIP’ed Tape Archive using the following command:

tar zxvf hadoop-1.2.1.tar.gz



It should look like this (commands are in bold):

root@core ~# tar zxvf hadoop-1.2.1.tar.gz

hadoop-1.2.1/

hadoop-1.2.1/.eclipse.templates/

hadoop-1.2.1/.eclipse.templates/.externalToolBuilders/

hadoop-1.2.1/.eclipse.templates/.launches/

hadoop-1.2.1/bin/

…truncated…

Now move it to the /usr/local directory with this command line:

mv hadoop-1.2.1 /usr/local

It should look like this (commands are in bold):

root@core ~# mv hadoop-1.2.1 /usr/local

Next, create a softlink for /usr/local/hadoop with this command line:

ln -s /usr/local/hadoop-1.2.1 /usr/local/hadoop

It should look like this (commands are in bold):

root@core ~# ln -s /usr/local/hadoop-1.2.1 /usr/local/hadoop
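Why bother with a softlink? Everything else can point at the short /usr/local/hadoop name, and only the link has to change if you later install Hadoop into a different versioned directory. The sketch below demonstrates the idea safely inside a temporary directory. The helper name link_version is our own, and the -n option it uses is supported by the GNU and BSD versions of ln rather than being required by every Unix.

```shell
#!/bin/sh
# Sketch: point a short, stable name at a versioned directory.
# link_version is a hypothetical helper name.

# Create (or re-point) the symbolic link $2 so it refers to directory $1.
link_version() {
    ln -sfn "$1" "$2"
}

# Demonstration in a throwaway directory, so nothing in /usr/local changes.
demo=$(mktemp -d)
mkdir "$demo/hadoop-1.2.1"
link_version "$demo/hadoop-1.2.1" "$demo/hadoop"
# Show where the short name now points.
readlink "$demo/hadoop"
rm -rf "$demo"
```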

Now we need to setup a couple of environment variables and update our command path.

To do this we need to edit the .bashrc (dot bash rc) file in the root user's home directory (/root) and add the following lines (cut and paste them to save typing them in):

export HADOOP_HOME=/usr/local/hadoop

export JAVA_HOME=/usr/local/jdk-7-oracle

export PATH=$PATH:$JAVA_HOME/bin

export PATH=$PATH:/usr/local/hadoop/bin

If you’re familiar with Linux use your editor of choice. I’m a vi user myself, but if you’re new to Linux you may want to use the nano editor. vi users will know their way around adding these lines to the .bashrc file. If you use the nano editor, add these extra lines just below the existing PATH setting so it looks like this:

Existing PATH setting: PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

Add these lines below it:

export HADOOP_HOME=/usr/local/hadoop

export JAVA_HOME=/usr/local/jdk-7-oracle

export PATH=$PATH:$JAVA_HOME/bin

export PATH=$PATH:/usr/local/hadoop/bin

To put these changes into effect in our current shell we need to re-spawn a new shell with the following command:

exec bash

It should look like this (commands are in bold):

root@core ~# exec bash

We can quickly check that our command shell’s PATH environment variable can now find the java and hadoop commands with the following commands.

Check we can find the java command - it should look like this (commands are in bold):

root@core ~# which java

/usr/local/jdk-7-oracle/bin/java

Next we should confirm the version of Java installed (1.7.0_40):

root@core ~# java -version

java version "1.7.0_40"

Java(TM) SE Runtime Environment (build 1.7.0_40-b43)

Java HotSpot(TM) 64-Bit Server VM (build 24.0-b56, mixed mode)
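If "which java" prints nothing, the most likely cause is that the new directories never made it onto your PATH. This sketch checks for both directories we added. The helper name path_contains is our own.

```shell
#!/bin/sh
# Sketch: check whether the Java and Hadoop bin directories are on the PATH.
# path_contains is a hypothetical helper name.

# Succeed if directory $1 is one of the entries in $PATH.
path_contains() {
    # Wrapping both sides in colons lets us match whole entries only.
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

for dir in /usr/local/jdk-7-oracle/bin /usr/local/hadoop/bin; do
    if path_contains "$dir"; then
        echo "on PATH: $dir"
    else
        echo "NOT on PATH: $dir (check your .bashrc, then run: exec bash)"
    fi
done
```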

6. Configure Hadoop as a single node instance

You're almost there! Next we need to make a directory for Hadoop to use for storage (we’ll reference it in the configuration in the next few steps), then set its permissions and its owner and group:

mkdir -p /usr/local/hadoop/tmp
chmod 750 /usr/local/hadoop/tmp
chown -R hduser.hadoop /usr/local/hadoop/tmp

It should look like this ( commands are in bold ):

root@core hadoop/conf# mkdir -p /usr/local/hadoop/tmp
root@core hadoop/conf# chmod 750 /usr/local/hadoop/tmp
root@core hadoop/conf# chown -R hduser.hadoop /usr/local/hadoop/tmp

Now we need to make a couple of changes to the Hadoop configuration and set it up as a single node instance.

First change into the Hadoop conf directory using this command line:

cd /usr/local/hadoop/conf

It should look like this ( commands are in bold ):

root@core ~# cd /usr/local/hadoop/conf

Now we need to make the following changes to the respective files (edit and change to the following configuration).

Use your preferred editor to add / edit the files listed below to include the following lines. You can cut and paste to save having to type it all in manually:

File: core-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/tmp</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

File: mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/usr/local/hadoop/tmp/dfs/data</value>
  </property>
</configuration>

File: hdfs-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
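Each of these files follows the same pattern: a configuration element wrapping one property element per setting, with a name and a value inside. If you would rather generate the text than paste it, here is a sketch that prints a site file from name and value pairs. The helper names property and site_xml are our own, and the sketch does not escape special XML characters, which is fine for the simple values used here.

```shell
#!/bin/sh
# Sketch: print a Hadoop site file from name/value pairs.
# property and site_xml are hypothetical helper names.
# Values are not XML-escaped, so avoid &, < and > in them.

# Print one <property> element for name $1 and value $2.
property() {
    printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' "$1" "$2"
}

# Print a whole file; arguments are name value name value ...
site_xml() {
    printf '<?xml version="1.0"?>\n<configuration>\n'
    while [ "$#" -ge 2 ]; do
        property "$1" "$2"
        shift 2
    done
    printf '</configuration>\n'
}

# Print what core-site.xml should contain.
site_xml hadoop.tmp.dir /usr/local/hadoop/tmp fs.default.name hdfs://localhost:9000
```

To write a file rather than print it, redirect the output, for example by adding "> core-site.xml" to the end of the last line while you are in the Hadoop conf directory.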

That’s all the configuration changes done!

Now for the next step!

Format the Hadoop Distributed File System (HDFS), with the following command:

hadoop namenode -format

It should look like this (commands are in bold):

root@core local/hadoop# hadoop namenode -format
13/10/03 12:13:32 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = core/127.0.1.1
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 1.2.1
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1503152; compiled by 'mattf' on Mon Jul 22 15:23:09 PDT 2013
STARTUP_MSG: java = 1.7.0_40
************************************************************/
13/10/03 12:13:33 INFO util.GSet: Computing capacity for map BlocksMap
13/10/03 12:13:33 INFO util.GSet: VM type = 64-bit
13/10/03 12:13:33 INFO util.GSet: 2.0% max memory = 1013645312
13/10/03 12:13:33 INFO util.GSet: capacity = 2^21 = 2097152 entries
13/10/03 12:13:33 INFO util.GSet: recommended=2097152, actual=2097152
13/10/03 12:13:33 INFO namenode.FSNamesystem: fsOwner=root
13/10/03 12:13:33 INFO namenode.FSNamesystem: supergroup=supergroup
13/10/03 12:13:33 INFO namenode.FSNamesystem: isPermissionEnabled=true
13/10/03 12:13:33 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
13/10/03 12:13:33 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
13/10/03 12:13:33 INFO namenode.FSEditLog: dfs.namenode.edits.toleration.length = 0
13/10/03 12:13:33 INFO namenode.NameNode: Caching file names occuring more than 10 times
13/10/03 12:13:34 INFO common.Storage: Image file /usr/local/hadoop/tmp/dfs/name/current/fsimage of size 110 bytes saved in 0 seconds.
13/10/03 12:13:34 INFO namenode.FSEditLog: closing edit log: position=4, editlog=/usr/local/hadoop/tmp/dfs/name/current/edits
13/10/03 12:13:34 INFO namenode.FSEditLog: close success: truncate to 4, editlog=/usr/local/hadoop/tmp/dfs/name/current/edits
13/10/03 12:13:34 INFO common.Storage: Storage directory /usr/local/hadoop/tmp/dfs/name has been successfully formatted.
13/10/03 12:13:34 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at core/127.0.1.1
************************************************************/

And that’s it – you’re all done! You can now start up your single node Hadoop cluster and check the core components are running as expected.

To do this we use the following command:

/usr/local/hadoop/bin/start-all.sh

It should look like this (commands are in bold):

root@core local/hadoop# /usr/local/hadoop/bin/start-all.sh
starting namenode, logging to /usr/local/hadoop-1.2.1/libexec/../logs/hadoop-root-namenode-core.out
localhost: starting datanode, logging to /usr/local/hadoop-1.2.1/libexec/../logs/hadoop-root-datanode-core.out
localhost: starting secondarynamenode, logging to /usr/local/hadoop-1.2.1/libexec/../logs/hadoop-root-secondarynamenode-core.out
starting jobtracker, logging to /usr/local/hadoop-1.2.1/libexec/../logs/hadoop-root-jobtracker-core.out
localhost: starting tasktracker, logging to /usr/local/hadoop-1.2.1/libexec/../logs/hadoop-root-tasktracker-core.out

We can now check that all of the required Hadoop daemons started up ok and are operational with the following command:

jps

It should look like this (commands are in bold):

root@core local/hadoop# jps
4406 DataNode
4777 TaskTracker
4269 NameNode
4553 SecondaryNameNode
4637 JobTracker
4926 Jps

If you have NameNode, SecondaryNameNode, JobTracker, TaskTracker and DataNode processes running (Jps is just the command we entered, of course), then Hadoop is running.
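Rather than ticking the five daemons off by eye, you can let the shell do it. The sketch below reads jps output and prints the name of any daemon that is missing, so no output means all five are up. The helper name missing_daemons is our own.

```shell
#!/bin/sh
# Sketch: read jps output on standard input and list any missing daemons.
# missing_daemons is a hypothetical helper name.
missing_daemons() {
    out=$(cat)
    for d in NameNode SecondaryNameNode DataNode JobTracker TaskTracker; do
        # Compare the second column exactly, so that "NameNode" is not
        # satisfied by a line that says "SecondaryNameNode".
        printf '%s\n' "$out" |
            awk -v d="$d" '$2 == d { found = 1 } END { exit !found }' ||
            echo "$d"
    done
}

if command -v jps >/dev/null 2>&1; then
    jps | missing_daemons
else
    echo "jps not found: is the JDK bin directory on your PATH?"
fi
```

If a name is printed, the matching log file under the Hadoop logs directory is the first place to look.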

Congratulations, you’ve just successfully built your very own DIY Hadoop test bed.

When you are finished, shut down the Hadoop cluster with the following command:

/usr/local/hadoop/bin/stop-all.sh

It should look like this (commands are in bold):

root@core local/hadoop# /usr/local/hadoop/bin/stop-all.sh
stopping jobtracker
localhost: stopping tasktracker
stopping namenode
localhost: stopping datanode
localhost: stopping secondarynamenode

To shut down and power off your Linux VM you can simply use the following command before exiting Virtualbox.

halt

It should look like this ( commands are in bold ):

root@core local/hadoop# halt

So, that’s it folks, hope you had fun.

Register for Dez' Reddit AMA (Wednesday October 9) or iTnews' Big Data webinar (Wednesday October 16) and we'll provide some Java-based example apps to show off what your Hadoop instance can do.

If you've had any dramas with the install, click through to page two for our 'appliance install' option.