Clustering Big Data


Join Us for a discussion on Clustering Big Data

When: Thursday, June 4, noon – 1:00 pm EST.
Where: (Virtual Meeting) Contact Us for coordinates.

Description: Approximate nearest-neighbor methods for clustering and indexing have been actively researched ever since the k-means algorithm was published in 1975 (and coded in FORTRAN). A recent book lists about 300 variants and related topics.

The 50th Anniversary issue of Communications of the ACM in 2008 cited two pieces of "Breakthrough Research". One was MapReduce; the other was clustering based on Locality Sensitive Hashing (LSH). LSH is designed for large data sets and alleviates many of the issues seen with k-means. Want to see if a body of code has remarkable similarity to a public GitHub repo? Want to find "similar" fragments of DNA that are common to several species? LSH will get you there faster than most other techniques.
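As a rough illustration of the idea, here is a minimal sketch of one common LSH scheme: MinHash signatures over character shingles, followed by banding. It is not OpenLSH's code or API. The function names (shingles, minhash_signature, candidate_pairs) and the parameter values (5-character shingles, 64 hashes, 16 bands) are assumptions chosen for the example.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def shingles(text, k=5):
    """Return the set of overlapping k-character substrings of text."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def _hash(seed, token):
    """64-bit hash of token; each seed acts as a different hash function."""
    digest = hashlib.blake2b(
        token.encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(items, num_hashes=64):
    """MinHash signature: the minimum hash value of the set under each seed.

    items must be non-empty (text shorter than k yields no shingles).
    """
    return [min(_hash(seed, s) for s in items) for seed in range(num_hashes)]


def candidate_pairs(signatures, bands=16):
    """Split each signature into bands; documents that agree on every row
    of at least one band land in the same bucket and become candidates."""
    rows = len(next(iter(signatures.values()))) // bands
    buckets = defaultdict(set)
    for name, sig in signatures.items():
        for b in range(bands):
            key = (b, tuple(sig[b * rows:(b + 1) * rows]))
            buckets[key].add(name)
    pairs = set()
    for members in buckets.values():
        pairs.update(combinations(sorted(members), 2))
    return pairs


if __name__ == "__main__":
    # Hypothetical inputs: two near-duplicate snippets and an unrelated one.
    docs = {
        "repo_a": "for i in range(10): total += values[i] * weights[i]",
        "repo_b": "for j in range(10): total += values[j] * weights[j]",
        "other": "SELECT name, email FROM users WHERE active = 1",
    }
    sigs = {name: minhash_signature(shingles(text)) for name, text in docs.items()}
    print(candidate_pairs(sigs))
```

The point of the banding step is that only documents sharing a bucket are ever compared, so the expensive all-pairs comparison is avoided. More bands with fewer rows each make the scheme more permissive; fewer bands with more rows make it stricter.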

The talk will demonstrate OpenLSH, an open source implementation of LSH we have been working on.

Speaker Bios: Dr. J Singh is a Principal at DataThinks.org, a Cloud and Big Data consulting company. He is a frequent speaker on NoSQL, Hadoop/MapReduce, and analytics of social media. He is the originator ...

OpenLSH - a framework for locality sensitive hashing


Presentation at Pivotal IO Meetup in New York