Scalable Machine Learning with Apache Mahout

Seattle Hadoop Day Advanced Track presentation on doing distributed machine learning on Hadoop with Mahout

Jake Mannix

on 2 October 2010

Transcript of Scalable Machine Learning with Apache Mahout

Scalable Machine Learning

Linear, or near-linear, scaling:
  • Double the data, and it should take twice the time
  • or: doubling the cluster size should halve the time

Note: mention of "cluster size" implies that the algorithm is parallelizable. This is very helpful, but neither necessary nor sufficient to imply "scalable".

  • Collaborative Filtering
  • Clustering
  • Dimensional Reduction
  • Frequent Pattern Mining
  • Classification
  • Text Utilities
  • Math

  • Random Forests
  • Perceptron / Winnow
  • (Complementary) Naive Bayes

  • Dirichlet Process
  • KMeans / Fuzzy KMeans
  • Canopy
  • Mean-Shift

  • Recommenders: User
  • SVD
  • Online/Offline
  • Taste Web App

  • SingularValueDecomposition
  • Latent Dirichlet Allocation

  • Lucene (index as input)
  • Text to Vectors
  • Collocations (nGrams / frequent phrases)
  • Examples: Wikipedia

  • COLT
  • Primitive Collections
  • Distributed Matrices
  • Online Stats
  • Log-Likelihood

Jake Mannix
Unemployed Layabout (work done while Principal Search Engineer at LinkedIn)
(starting Monday @twitter)
jake.mannix@gmail.com
@pbrane
/in/jakemannix

Resources:
  • Mahout in Action (Manning EAP)
    Sean Owen, Robin Anil, Ted Dunning, and Ellen Friedman

Powered By
Here are a few companies powered by Mahout
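
The working definition of "scalable" from the start of the talk (double the data, twice the time; double the cluster size, half the time) can be checked against measured run times. The following is a minimal sketch, not part of Mahout: the function name `scaling_exponent` and the timings are hypothetical. It fits a straight line to log(time) against log(size), so a slope near 1 suggests linear scaling in data size, and a slope near -1 suggests that run time halves when the cluster doubles.

```python
import math


def scaling_exponent(sizes, times):
    """Least-squares slope of log(time) against log(size).

    `sizes` is either data sizes or cluster sizes; `times` holds the
    matching run times. At least two distinct sizes are required.
    """
    xs = [math.log(s) for s in sizes]
    ys = [math.log(t) for t in times]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


# Hypothetical timings (seconds), invented for illustration only.
data_exponent = scaling_exponent([1e6, 2e6, 4e6], [100.0, 200.0, 400.0])
node_exponent = scaling_exponent([10, 20, 40], [400.0, 200.0, 100.0])
print(data_exponent, node_exponent)
```

In practice measured timings are noisy, so the slopes would only be approximately 1 and -1, and fixed job start-up costs tend to pull the node exponent toward 0 on small inputs.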