Hands on with Hadoop
Posted by J Singh | Filed under Amazon EC2, Hadoop
This post begins a series of exercises on Hadoop and its ecosystem. Here is a rough outline of the series:
- Hello Hadoop World, including loading data into and copying results out of HDFS.
- Hadoop programming environments with Python, Pig and Hive.
- Importing data into HDFS.
- Importing data into HBase.
- Other topics TBD.
We assume that the reader is familiar with the concepts of Map/Reduce. If not, feel lucky you have Google. Here are two of my favorite introductions: (1) The Story of Sam and (2) Map Reduce: a really simple introduction.
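For a quick refresher, here is the canonical word-count example as a single-process Python sketch. It runs in memory and only imitates the map, shuffle/sort and reduce phases that Hadoop distributes across a cluster. The function names (map_phase, reduce_phase, run_word_count) are illustrative and are not part of any Hadoop API.

```python
from itertools import groupby
from operator import itemgetter


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reduce_phase(word, counts):
    """Reducer: sum all the counts emitted for a single word."""
    return word, sum(counts)


def run_word_count(lines):
    # Map: apply the mapper to every input record.
    pairs = [pair for line in lines for pair in map_phase(line)]
    # Shuffle/sort: bring all values for the same key together.
    # Hadoop does this step for you between the map and reduce phases.
    pairs.sort(key=itemgetter(0))
    # Reduce: one reducer call per distinct key.
    return dict(
        reduce_phase(word, (count for _, count in group))
        for word, group in groupby(pairs, key=itemgetter(0))
    )


print(run_word_count(["the cat sat", "The dog sat"]))
```

The mapper and reducer never see each other's data directly; everything flows through the sort-by-key step, which is what lets Hadoop run many mappers and reducers in parallel.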
Hello Hadoop World
We build this first environment on Amazon EC2. We begin with ami-e450a28d, an EBS-backed 64-bit Ubuntu image that's pretty bare-bones. To run Hadoop, we need at least an m1.large instance; using a micro instance is just too painful.
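If you prefer to script the launch rather than click through the AWS console, the following is a minimal sketch using boto3's EC2 client (`run_instances`). It assumes boto3 is installed, AWS credentials are configured, and the AMI lives in us-east-1; the region and the optional key pair name are assumptions, not details from this post. The AMI ID and instance type are the ones named above, though m1.large is now a previous-generation type and an AMI this old may no longer be available.

```python
def build_launch_request(ami_id="ami-e450a28d", instance_type="m1.large", key_name=None):
    """Return keyword arguments for boto3's EC2.Client.run_instances."""
    request = {
        "ImageId": ami_id,              # EBS-backed 64-bit Ubuntu AMI from the post
        "InstanceType": instance_type,  # m1.large or bigger; micro is too small
        "MinCount": 1,
        "MaxCount": 1,
    }
    if key_name is not None:
        # Hypothetical key pair name (assumption): needed to SSH in afterwards.
        request["KeyName"] = key_name
    return request


def launch_instance(region="us-east-1", **kwargs):
    """Launch one instance and return its ID (region is an assumption)."""
    import boto3  # imported here so build_launch_request works without boto3

    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.run_instances(**build_launch_request(**kwargs))
    return response["Instances"][0]["InstanceId"]
```

For example, `launch_instance(key_name="my-hadoop-key")` would start a single m1.large instance from the AMI above, where "my-hadoop-key" stands in for whatever key pair you have created.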
- Make sure Python 2.6 ...