Platform Madness

Tanya Schlusser

on 30 May 2015

Transcript of Platform Madness

original image: http://www.tomrichmond.com/blog/2014/12/01/monday-madness-caricature-deja-vu/
Did I miss any? *
Tanya Schlusser
4 February 2015
(Making the sausage)
Analytics is:
Data + Algorithms + UI
Distributed file system.
Custom code follows the MapReduce paradigm.
Command line.
Java with options for Scala, Python, and piped scripts.
Distributed like Hadoop.
Subset of Mahout's, with nPath and sessionization. Custom code follows MapReduce.
PostgreSQL; works with any client. Options for piped scripts and Java.
Your choice. Vertica is parallel in a different way than Hadoop.
Sessionization, gap filling (Vertica). R bindings, face recognition, QR code reading, PDF-to-text conversion (IDOL).
Query-based Web API.
Your choice.
Your choice.
Your choice.
GUI drag-and-drop, but you must code in the blocks.
Connectors for R, plus built-in clustering, classification, regression and some text.
Like Heroku. Web GUI to connect blocks, then connect with Github.
Includes connectors to Watson, voice, and your custom code in the language you choose.
Links to trial versions of all the named platforms:
* please ping me @tanyaschlusser to point out others - yeah!
Anything that was ever rattling around in Stephen Wolfram's brain. You have to watch the video.
Query-based Web API
a distributed file system and a distributed computation platform
a Machine Learning and parallel linear algebra library that has a Spark + Scala shell implementing the distributed matrix libraries only (with plans for the rest). Written in Java.

Best place to find parallel machine learning algorithms:
an in-memory parallel computation library that sources data from Hadoop's HDFS or local files. It has API bindings for Scala and Python, a shell (Python and Scala dialects) for interactive coding, and a smaller Machine Learning library.
Web dashboard for demo namenode:
(Tanya's master dashboard)
The best explanation of MapReduce I've ever heard
Problem: we want pizza -- but we're broke --
Must count our change:

MAP step
Everyone empties their pockets and brings the change to the table
I (the master / queen) sort the coins into piles by denomination
Everyone counts a pile
We query for everyone's counts and sum them
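The pizza-money analogy can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the pocket contents are made up, and the split into separate map, shuffle, and reduce functions is one reading of the steps above, not code from any of the platforms named in this talk.

```python
from collections import defaultdict

# Hypothetical pocket contents, in cents, for three friends.
pockets = {
    "alice": [25, 10, 10, 1],
    "bob": [25, 25, 5],
    "carol": [10, 5, 1, 1],
}


def map_step(coins):
    """Emit one (denomination, 1) pair for every coin in a pocket."""
    return [(coin, 1) for coin in coins]


def shuffle(pairs):
    """Sort the emitted pairs into one pile per denomination."""
    piles = defaultdict(list)
    for denomination, one in pairs:
        piles[denomination].append(one)
    return piles


def reduce_step(denomination, ones):
    """Count a single pile and return its value in cents."""
    return denomination * sum(ones)


# Everyone empties their pockets onto the table.
emitted = [pair for coins in pockets.values() for pair in map_step(coins)]
# The master sorts the coins by denomination.
piles = shuffle(emitted)
# Each pile is counted independently (this is the part that parallelizes).
pile_values = {d: reduce_step(d, ones) for d, ones in piles.items()}
# Finally, query every count and sum.
total_cents = sum(pile_values.values())

print(pile_values)
print(total_cents)
```

Each pile is counted without looking at any other pile, which is why the counting can be handed out to as many people (or machines) as there are denominations.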
My trial license expired but here is a video demo of an app using IDOL:
Free widgets from a GUI -- basically a structured WolframAlpha query. Otherwise a real API (bindings for .NET, Perl, Python, Ruby, PHP, Mathematica)
Thank you for coming - The End.
Aster Management Console
(GUI cluster management)
Queries are performed in PostgreSQL. The main novelty is their 'nPath' -- a way to label events given boolean tests (like URL click) and then track different customer journeys. The output is in the database and can be plotted via Aster Lens - a GUI server that connects to the database.
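To make the idea of labeling events and tracking journeys concrete, here is a rough plain-Python sketch. It is not Aster's nPath syntax and does not call any Aster API; the click log, the label names, and the helper functions `label` and `journeys` are all hypothetical and exist only to show the concept.

```python
from collections import Counter, defaultdict

# Hypothetical click log: (customer_id, timestamp, url).
events = [
    ("c1", 1, "/home"),
    ("c1", 2, "/product/42"),
    ("c1", 3, "/checkout"),
    ("c2", 1, "/home"),
    ("c2", 2, "/product/7"),
    ("c3", 1, "/home"),
    ("c3", 2, "/product/42"),
    ("c3", 3, "/checkout"),
]

# Each label is defined by a boolean test on the clicked URL.
LABEL_TESTS = [
    ("HOME", lambda url: url == "/home"),
    ("PRODUCT", lambda url: url.startswith("/product/")),
    ("CHECKOUT", lambda url: url == "/checkout"),
]


def label(url):
    """Return the first label whose boolean test matches the URL."""
    for name, test in LABEL_TESTS:
        if test(url):
            return name
    return "OTHER"


def journeys(events):
    """Build one time-ordered string of labels per customer."""
    paths = defaultdict(list)
    for customer, _, url in sorted(events, key=lambda e: (e[0], e[1])):
        paths[customer].append(label(url))
    return {customer: " > ".join(path) for customer, path in paths.items()}


customer_paths = journeys(events)
# Count how many customers followed each distinct journey.
journey_counts = Counter(customer_paths.values())

for path, count in journey_counts.most_common():
    print(count, path)
```

In the real product the equivalent result would sit in a database table, ready to be plotted; here it is just a `Counter` in memory.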
A dashboard API that resembles Heroku -- to choose and combine tools
A platform to run these tools -- hosts an internal git repo named 'jazz' and hosts your app
Developer console: https://console.ng.bluemix.net/
Git repo: https://hub.jazz.net/
A GUI: drag-and-drop connections and analytics. Microsoft at its best.