Pipes (or Pipelines) in R (or in Python)

How to carry out data transformations in R and Python in a reproducible and compact way.
by Carlos Ortega on 7 October 2018

Transcript of Pipes (or Pipelines) in R (or in Python)

TubeRías ("Pipes", or Pipelines)
in R (or Python)

Plumbers
R for Data Science
http://r4ds.had.co.nz/
Problem
After importing the data I want to:
Remove NAs
Transform some variables
Create bins
Center
Scale
...
But ...
I want to do it with:
Chained functions.
Parametrizable functions.
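The wish list above (chained, parametrizable functions) can be sketched in plain Python. This is only an illustration of the idea, not code from the talk: the helper names (drop_missing, center, scale, cut_into_bins, pipeline) and the toy data are assumptions, and None stands in for R's NA.

```python
from functools import partial, reduce
from statistics import mean, pstdev

def drop_missing(values):
    """Remove missing entries (None stands in for R's NA)."""
    return [v for v in values if v is not None]

def center(values):
    """Subtract the mean so the values average to zero."""
    m = mean(values)
    return [v - m for v in values]

def scale(values):
    """Divide by the population standard deviation (assumes it is non-zero)."""
    s = pstdev(values)
    return [v / s for v in values]

def cut_into_bins(values, n_bins):
    """Assign each value to one of n_bins equal-width bins (0-based).

    Assumes the values are not all identical.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it to the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def pipeline(*steps):
    """Chain steps left to right into a single callable."""
    return lambda data: reduce(lambda acc, step: step(acc), steps, data)

# Each pipeline is built once and can be reused on any input.
preprocess = pipeline(drop_missing, center, scale)
# partial() fixes a step's parameter, which keeps the step parametrizable.
binning = pipeline(drop_missing, partial(cut_into_bins, n_bins=3))

raw = [2.0, None, 4.0, 6.0, None, 8.0]  # hypothetical toy data
scaled = preprocess(raw)
bins = binning(raw)
```

Because every step takes a list and returns a list, steps can be reordered or swapped without touching the rest of the chain.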
... in the style of
https://topepo.github.io/recipes/index.html
https://topepo.github.io/recipes/articles/Custom_Steps.html
recipe()
prep()
bake()
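In the recipes package, recipe() declares the steps, prep() estimates whatever those steps need from the training data, and bake() applies the prepared steps to any data set. The Python class below is a toy analogy of that workflow, not the R API: the Recipe class, its step_center and step_scale methods, and the data are all hypothetical.

```python
from statistics import mean, pstdev

class Recipe:
    """Toy stand-in for the recipe -> prep -> bake workflow (not the R API)."""

    def __init__(self):
        self.steps = []      # step names, in the order they were declared
        self.params = None   # filled in by prep()

    def step_center(self):
        self.steps.append("center")
        return self          # returning self allows chaining

    def step_scale(self):
        self.steps.append("scale")
        return self

    def prep(self, training):
        """Estimate the statistics the steps need from the training data only."""
        self.params = {"mean": mean(training), "sd": pstdev(training)}
        return self

    def bake(self, new_data):
        """Apply the declared steps to any data using the stored statistics."""
        if self.params is None:
            raise RuntimeError("call prep() before bake()")
        out = list(new_data)
        for step in self.steps:
            if step == "center":
                out = [v - self.params["mean"] for v in out]
            elif step == "scale":
                out = [v / self.params["sd"] for v in out]
        return out

train = [2.0, 4.0, 6.0, 8.0]  # hypothetical training values
test = [5.0, 10.0]            # hypothetical new values

rec = Recipe().step_center().step_scale().prep(train)
baked_train = rec.bake(train)
baked_test = rec.bake(test)   # uses the training mean and sd, not the test ones
```

Separating prep() from bake() is what makes the transformation reproducible: new data is always processed with the statistics learned from the training set.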
http://www.business-science.io/business/2017/11/28/customer_churn_analysis_keras.html
PIPELEARNER
https://s3-us-west-2.amazonaws.com/kevinykuo/rsconf-sparklyr/rstudio-conf-2018-sparklyr.html#1
https://www.rstudio.com/resources/webinars/creating-and-preprocessing-a-design-matrix-with-recipes/
https://spark.rstudio.com/guides/pipelines/
https://github.com/drsimonj/pipelearner
CONCLUSIONS
Python (scikit-learn) includes a solution for building "pipelines" in a very compact way.

R, at last, has alternatives that are, if anything, even more varied for building "pipelines" (including Spark).
For its variety, breadth, and integration with tidyverse, "recipes" is the recommended option.
Let's see what's available
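On the Python side, the compact scikit-learn solution mentioned in the conclusions is the Pipeline class. The sketch below is a minimal illustration and not code from the talk: the data and step names are made up, and missing values are imputed with the column mean rather than removed, because scikit-learn transformers do not drop rows.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: two numeric columns with a few missing values.
X_train = np.array([[1.0, 10.0], [2.0, np.nan], [3.0, 30.0], [np.nan, 40.0]])
X_new = np.array([[2.0, 20.0], [np.nan, 50.0]])

preprocess = Pipeline(steps=[
    ("impute", SimpleImputer(strategy="mean")),  # fill NaNs with the column mean
    ("scale", StandardScaler()),                 # center and scale each column
])

# fit_transform learns the means and scales from the training data;
# transform reuses them on new data.
X_train_t = preprocess.fit_transform(X_train)
X_new_t = preprocess.transform(X_new)

# Steps stay parametrizable after construction via "<step>__<param>" names.
preprocess.set_params(impute__strategy="median")
X_train_median = preprocess.fit_transform(X_train)
```

The fit/transform split plays the same role as prep() and bake() in recipes: parameters are estimated once and then applied unchanged to new data.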