Spark ML

This is an intro talk on machine learning and on how to use Spark MLlib and Spark ML. The code is in this repo: https://github.com/zoltanctoth/spark-ml-intro. If you have any questions, drop me a line at zoltanctoth@gmail.com.
by Zoltan C. Toth on 14 June 2016

Transcript of Spark ML

RDD
SchemaRDD
DataFrame
1.0
SparkSQL
1.3
hdfs
hdfs :(
:)
:)
>
API
lazy
immutable
local API = cluster API
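The "lazy" and "immutable" properties above can be illustrated with a toy sketch in plain Python. This is not Spark's API: `LazyDataset` and its methods are hypothetical names chosen for illustration. Each transformation returns a new object and only records what to do; the work happens when `collect()` is called.

```python
class LazyDataset:
    """Toy stand-in for an RDD (hypothetical, not Spark code)."""

    def __init__(self, source, ops=()):
        self._source = tuple(source)  # never modified after construction
        self._ops = tuple(ops)        # recorded transformations, not yet run

    def map(self, fn):
        # Returns a NEW dataset; the original is left untouched.
        return LazyDataset(self._source, self._ops + (("map", fn),))

    def filter(self, pred):
        # Also only records the step; nothing is evaluated here.
        return LazyDataset(self._source, self._ops + (("filter", pred),))

    def collect(self):
        # The action: only now are the recorded transformations applied.
        items = iter(self._source)
        for kind, fn in self._ops:
            items = map(fn, items) if kind == "map" else filter(fn, items)
        return list(items)
```

Calling `LazyDataset([1, 2, 3, 4]).map(lambda x: x * x)` does no computation by itself; the squares are produced only once `collect()` runs, and the original dataset still collects to its original values.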
|
MLlib
Spark ML
Unified API for Data Preprocessing / Transformation / ML
Estimator (learning algorithm), e.g. Logistic Regression
Transformer (learned model, feature transformer, ...)
Pipelines
DataFrame support from day 1
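A rough sketch of how these pieces fit together, in plain Python rather than Spark: an estimator's `fit` returns a fitted model, the model is itself a transformer, and a pipeline chains stages. All class names below (`Scaler`, `MeanThresholdEstimator`, `ThresholdModel`, and the toy `Pipeline` and `PipelineModel`) are hypothetical, and lists of dicts stand in for DataFrames.

```python
class Scaler:
    """Transformer: multiplies column "x" by a fixed factor."""
    def __init__(self, factor):
        self.factor = factor

    def transform(self, rows):
        return [{**r, "x": r["x"] * self.factor} for r in rows]


class ThresholdModel:
    """Transformer produced by fitting: adds a "prediction" column."""
    def __init__(self, threshold):
        self.threshold = threshold

    def transform(self, rows):
        return [{**r, "prediction": 1 if r["x"] > self.threshold else 0}
                for r in rows]


class MeanThresholdEstimator:
    """Estimator: learns a threshold (the mean of "x") from the data."""
    def fit(self, rows):
        mean = sum(r["x"] for r in rows) / len(rows)
        return ThresholdModel(mean)


class PipelineModel:
    """Fitted pipeline: applies every fitted stage in order."""
    def __init__(self, stages):
        self.stages = stages

    def transform(self, rows):
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


class Pipeline:
    """Estimator made of stages; fitting it yields a PipelineModel."""
    def __init__(self, stages):
        self.stages = list(stages)

    def fit(self, rows):
        fitted = []
        for stage in self.stages:
            if hasattr(stage, "fit"):
                stage = stage.fit(rows)   # estimator becomes a transformer
            fitted.append(stage)
            rows = stage.transform(rows)  # feed the output to the next stage
        return PipelineModel(fitted)
```

The point of the design is that preprocessing and learning share one interface, so `Pipeline([Scaler(2.0), MeanThresholdEstimator()]).fit(rows)` yields a single object whose `transform` replays the whole chain on new data.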
ML
:(
hdfs
hdfs
https://github.com/zoltanctoth/spark-ml-intro
Machine Learning
+
=
at a glance
Regression
Classification
Clustering
dimensionality
x^2+y^2 = 1
overfitting
Logistic Regression
Featurisation
Training
Evaluation
identify features
extract features
transform features
create feature vectors
X-fold cross-validation
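The evaluation step can be sketched as a minimal cross-validation loop (often called k-fold) in plain Python. The function names and the `fit` / `score` callbacks are assumptions made for illustration; for simplicity the folds are contiguous and the data is not shuffled, which a real setup would normally do.

```python
def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validate(xs, ys, k, fit, score):
    """Average the score over k train/held-out splits.

    fit(train_x, train_y) returns a model;
    score(model, test_x, test_y) returns a number.
    """
    scores = []
    for held_out in k_fold_indices(len(xs), k):
        held = set(held_out)
        # Train on everything except the held-out fold.
        train_x = [x for i, x in enumerate(xs) if i not in held]
        train_y = [y for i, y in enumerate(ys) if i not in held]
        model = fit(train_x, train_y)
        # Evaluate only on the fold the model has not seen.
        test_x = [xs[i] for i in held_out]
        test_y = [ys[i] for i in held_out]
        scores.append(score(model, test_x, test_y))
    return sum(scores) / len(scores)
```

Because every example is held out exactly once, the averaged score is a less optimistic estimate than accuracy on the training data, which is one way to notice overfitting.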
Developed to be used in a parallel environment.
Modern algorithms
k-means||
distributed random forests
use Weka, Scikit-learn on a single node
Works on RDDs
supervised
example
sensitivity