Big data: Does size matter?
Posted by J Singh | Filed under big data, nosql, hadoop, map reduce, statistical analysis, numerical methods
Big data is about so many things:
- Size, of course, but you don't have to be Google-scale to need big data technologies. Heck, a few hundred gigabytes will suffice.
- Ad-hoc. Big Data platforms enable ad-hoc analytics on non-relational (i.e., unmodelled) data. This lets you uncover answers to questions you never thought to ask.
- Streaming. You cannot deliver true analytics of Big Data relying only on batch insights. You must deliver streaming and real-time analytics.
- Inconsistent. Air and water quality are measured in parts per million of impurities. Perhaps we should have a similar metric for data consistency?
But the biggest difference is in the tools we use to analyze and present big data. Big data analysis involves a heavy dose of numerical analysis, statistical methods, algorithms for teasing signals from noise, and techniques that would be more familiar to a scientist than a database analyst.
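As a rough illustration of the consistency metric suggested in the list above, here is a minimal Python sketch that counts inconsistent records per million, by analogy with parts per million. The function name, the sample readings, and the validity rule are all assumptions made up for this example; what counts as "inconsistent" depends entirely on the data set.

```python
def inconsistencies_per_million(records, is_consistent):
    """Return the number of inconsistent records per million records checked."""
    records = list(records)
    if not records:
        raise ValueError("need at least one record")
    # Count every record that fails the caller-supplied consistency check.
    bad = sum(1 for r in records if not is_consistent(r))
    return bad / len(records) * 1_000_000


# Hypothetical sensor readings (illustrative data, not from any real source).
readings = [21.5, 22.0, None, 21.8, -999.0, 22.1, 21.9, 22.3]


def looks_valid(value):
    # Assumed rule: a reading is consistent if it is present and non-negative.
    return value is not None and value >= 0


ipm = inconsistencies_per_million(readings, looks_valid)
print(f"{ipm:,.0f} inconsistent records per million")
```

Passing the check in as a function keeps the metric itself generic: the same count can be reused with whatever rule a given data set calls for.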
Mongo Boston 2011
Posted by J Singh | Filed under MongoDB, Map Reduce, Data Analysis
Mongo Boston 2011 was held at the New England Research & Development (NERD) Center on October 3.
Here is our presentation from the conference.