The Hadoop Ecosystem
Posted by J Singh | Filed under Map Reduce, Hadoop, NoSQL
Here's a more detailed outline of my talk on March 12.
Introduction
- What Hadoop is, and what it's not
- Origins and History
- Hello Hadoop, how to get started.
The Hadoop Bestiary
- Core: Hadoop Map Reduce and Hadoop Distributed File System (HDFS)
- Data Access: HBase, Pig and Hive
- Algorithms: Mahout
- Data Import: Flume, Sqoop and Nutch
The Hadoop Providers
- Apache
- Cloudera
- What to do if your data is in a database
The Hadoop Alternatives
- Amazon EMR
- Google App Engine
Big Data Executive Briefing
Posted by J Singh | Filed under Executive Briefing, Big Data, NoSQL, Map/Reduce
- What is Big Data and what problems does it solve? What opportunities does it present that weren't available before?
- These terms are often seen in articles about Big Data: NoSQL, Map Reduce, Analytics, Schema-less Databases. How do they relate to each other and how do they differ?
- What problems are you better off solving with traditional database solutions?
- The various types of NoSQL databases and what each is appropriate for:
- Key-value stores (e.g., Riak, Redis, Voldemort, Tokyo Cabinet)
- Document stores (e.g., MongoDB, CouchDB)
- Wide column stores (e.g., Cassandra, Hypertable)
- Graph Databases (e.g., Neo4J)
- Analysis and Visualization
- What Map/Reduce is and the opportunities it presents for your business.
- The cost / performance trade-offs of running Map Reduce in the cloud, on Amazon EC2 or on Google, or in-house.
Big data: Does size matter?
Posted by J Singh | Filed under big data, nosql, hadoop, map reduce, statistical analysis, numerical methods
Big data is about so many things:
- Size, of course, but you don't have to be Google-scale to need big data technologies. Heck, a few hundred gigabytes will suffice.
- Ad-hoc. Big Data platforms enable ad-hoc analytics on non-relational (i.e., unmodelled) data. This lets you uncover answers to questions you never thought to ask.
- Streaming. You cannot deliver true analytics of Big Data relying only on batch insights. You must deliver streaming and real-time analytics.
- Inconsistent. Air or water quality is measured in impurities-per-million. Perhaps we should have similar consistency metrics for data?
Mongo Boston 2011
Posted by J Singh | Filed under MongoDB, Map Reduce, Data Analysis
Mongo Boston 2011 was held at the New England Research & Development (NERD) Center on October 3.
Convergence of Analysis and Data
Posted by J Singh | Filed under SQL, Map Reduce, Big Data, NoSQL, Hadoop
Moon in the Water
Moon-in-the-water happens when the moon shines in the water. When there is no moon, there is no moon-in-the-water. When there is no water, there is no moon-in-the-water.
The Convergence
BigData Solutions
Hands on Hadoop with Amazon EC2
Posted by J Singh | Filed under Amazon AWS, Map Reduce, Hadoop
A few months ago, I gave a talk at the Chelmsford Technology Skill Share Group. It was focused on the whys and wherefores of NoSQL and Map/Reduce. If you are interested in a copy of the presentation, please contact me.
Next week (9/14/11, 4:00 pm, Chelmsford Public Library, McCarthy Meeting Room), I'll be giving a hands-on introduction to running Hadoop on Amazon EC2.
It will be as hands-on as the previous talk was conceptual. For the actual material, we will use the excellent tutorial by Michael Noll. Michael first wrote it in 2007 and has kept it up to date. The tutorial helps you install map/reduce and use it for computing the count of every word in a large text. We will use Ulysses by James Joyce as our sample text. Can we do this in 90 minutes? Yes, we can. But the goal is to get the most out of the journey.
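For a feel of what the word count involves before we touch a cluster, here is a minimal sketch of the map, shuffle and reduce steps in plain Python. It runs in a single process and is not the tutorial's code or Hadoop's API; the function names (`map_words`, `shuffle`, `reduce_counts`, `word_count`) and the two-line sample text are my own stand-ins for illustration.

```python
import re
from collections import defaultdict


def map_words(line):
    """Map step: emit a (word, 1) pair for every word in one line of text."""
    for word in re.findall(r"[a-z']+", line.lower()):
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group the emitted values by key.

    On a real cluster the framework does this between map and reduce;
    here it is just an in-memory dictionary (an assumption of this sketch).
    """
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(word, counts):
    """Reduce step: sum all the counts collected for one word."""
    return word, sum(counts)


def word_count(lines):
    """Run map, shuffle and reduce over an iterable of text lines."""
    pairs = (pair for line in lines for pair in map_words(line))
    return dict(reduce_counts(w, c) for w, c in shuffle(pairs).items())


if __name__ == "__main__":
    # Hypothetical stand-in for the full text of Ulysses.
    sample = ["yes I said yes", "I will Yes"]
    for word, count in sorted(word_count(sample).items()):
        print(word, count)
```

The point of splitting the work this way is that the map and reduce functions see only one line or one word at a time, so the framework is free to run many copies of each on different machines.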
To get the most out of the talk, you should be prepared to sign up for an Amazon account. They require a credit card but the credit card won't actually get charged because our usage will be a few ...