Scalable Machine Learning with Apache Mahout

Seattle Hadoop Day Advanced Track presentation on doing distributed machine learning on Hadoop with Mahout
by Jake Mannix on 2 October 2010

Transcript of Scalable Machine Learning with Apache Mahout

Scalable Machine Learning


Linear, or near-linear, scaling: double the data, and it should take twice the time; or: doubling the cluster size should halve the time.

Note: mention of "cluster size" implies that the algorithm is parallelizable. This is very helpful, but neither necessary nor sufficient to imply "scalable".

Collaborative Filtering
Clustering
Dimensional Reduction
Frequent Pattern Mining
Classification
Text Utilities
Math

Classification: Random Forests, Perceptron / Winnow, (Complementary) Naive Bayes

Clustering: Dirichlet Process, KMeans / Fuzzy KMeans, Canopy, Mean-Shift

Recommenders: User, Item, SVD; Online/Offline; JDBC-connectors; Taste Web App

Dimensional Reduction: SingularValueDecomposition, Latent Dirichlet Allocation

Text Utilities: Lucene Text to Vectors (index as input), Collocations (nGrams / frequent phrases)

Examples: Wikipedia, GroupLens, NetFlix, 20Newsgroups

Math: COLT, Primitive Collections, Distributed Matrices, Online Stats, Log-Likelihood

Jake Mannix
Unemployed Layabout (work done while Principal Search Engineer at LinkedIn)
(starting Monday @twitter)
jake.mannix@gmail.com, @pbrane, /in/jakemannix

Resources:
http://mahout.apache.org
Mahout in Action (Manning EAP)
Sean Owen, Robin Anil, Ted Dunning, and Ellen Friedman

Powered By
Here are a few companies powered by Mahout
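
The scaling criteria stated earlier in this transcript (double the data, twice the time; double the cluster size, half the time) can be checked against timing measurements. The sketch below is not from the talk. It assumes run time follows a power law in the quantity being varied and estimates the exponent with a least-squares fit on log-log values. An exponent near 1 for data size, or near -1 for cluster size, would match the linear-scaling criteria. The function name `scaling_exponent` and all timing numbers are hypothetical and invented for illustration.

```python
import math


def scaling_exponent(sizes, times):
    """Least-squares slope of log(time) against log(size).

    Assumes time is roughly proportional to size ** exponent.
    """
    xs = [math.log(s) for s in sizes]
    ys = [math.log(t) for t in times]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


# Hypothetical measurements, invented for illustration only.
data_sizes = [1_000_000, 2_000_000, 4_000_000, 8_000_000]  # records, fixed cluster
run_seconds = [60.0, 121.0, 239.0, 482.0]

nodes = [5, 10, 20, 40]  # cluster size, fixed data
node_seconds = [400.0, 205.0, 104.0, 55.0]

data_exponent = scaling_exponent(data_sizes, run_seconds)
node_exponent = scaling_exponent(nodes, node_seconds)

print(f"exponent vs. data size:    {data_exponent:.2f}")
print(f"exponent vs. cluster size: {node_exponent:.2f}")
```

With real measurements, an exponent that drifts well above 1 as data grows, or well above -1 as nodes are added, would suggest the job is not scaling linearly in that dimension.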