MetaR and the NextflowWorkbench

Bio in Docker Symposium, London, Nov 9 2015
by Fabien Campagne on 20 January 2016

Transcript of MetaR and the NextflowWorkbench

Data Analysis:
GUI or Programming?

User Interfaces
There is a galaxy of programming languages

Great?
Not so much. Each language behaves like its own planet
Each planet is isolated from the others

Cross-language programming is tedious if it works at all
Java
C
Forth
Perl
Python
Erlang

Best to automate analyses

Traditional programming languages rely on compiler technology

Compilers do not support language composition

Languages Are Not Composable
Lisp
Haskell
Gentle learning curve for beginners, but inefficient for repetitive analyses
beginners
experts
Even worse: both technologies foster fragmentation within their own user communities
GUIs: using several GUI tools is rarely seamless
Compiler technology has resulted in hundreds of languages, ~10 of them common.
beginners
experts
GUIs
Compilers
R
C++
Language Workbench Technology
Ideas introduced in the 90s (Simonyi C 1995)
Support Composable Languages (CLs)
Robust implementations refined in the last 10 years:
Meta-Programming System (JetBrains, open-source)
Intentional Software (commercial)
Spoofax (open-source)
XText (open-source)
Developed for the software engineering domain
Erdweg, S., van der Storm, T., Völter, M., Boersma, M., Bosman, R., Cook, W. R., et al (2013). The state of the art in language workbenches. In Software Language Engineering (pp. 197–217). Springer.
Language Workbench Technology
Ideas introduced in the 90s (Simonyi C 1995)
First open-source implementations ~2005
Workbench: interactive work environment
Support Composable Languages (CLs)
Several robust implementations available today
Developed for software engineering
2013: First application to bioinformatics

Summer 2014: First Application to Data Analysis
Now in press at PeerJ
https://peerj.com/articles/800/
https://peerj.com/articles/241/
Application to biomarker development
project configuration
selection of modeling approaches
Assay platform and dataset(s)
prediction endpoints
A composable R language
The R language is used widely for data analysis in Bioinformatics

We developed a composable R language
Provides the ability to blend user interfaces and scripting, in the R world.
Enables first-class meta-programming in R.

Acknowledgments
Manuele Simi (MetaR, NextflowWorkbench, GobyWeb)
Jason P Kurs (NextflowWorkbench)
William ER Digan (MetaR)
JetBrains (MPS)
Paolo Di Tommaso, CRG, Barcelona (Nextflow.io)
NIH NIAID award 5R01AI107762
Weill Cornell Medical College CTSC (UL1 RR024996, NIH, National Center for Research Resources)
http://metaR.campagnelab.org
http://workflow.campagnelab.org/
http://jetbrains.com/mps
Data Analysis:
from data to understanding
"Piled Higher and Deeper" by Jorge Cham
And yet, automating analysis in Bioinformatics is still hard!
Real-world analyses are done with tens of bioinformatics software tools
Installing each of these tools on one machine can be difficult
Installing on tens of nodes is not practical without automation
What if you could have the advantages of BOTH, in the same platform?
This is, in part, what electronic notebooks (IPython, Jupyter, Beaker) are trying to do...
languages non-programmers can use
the ability to extend and compose languages
Here's your heatmap
A simple data analysis language for beginners with no programming experience
Works with Git & SVN
example of language composition
but offer:
MetaR blends GUI and scripting, all in the same user interface
Platform supports designing objects to help with specific analysis tasks.
Here, annotation of the columns in a table of data
User Training
We teach MetaR to beginners in training sessions of about 1 h 30 min.
We have taught more than 100 beginners to perform differential expression analysis and build a heatmap with MetaR.
MetaR analysis scripts generate R code that executes on the user's laptop.
We found the installation of R packages (both CRAN and Bioconductor) to be unreliable.
Users who take the MetaR training often have never used R before and need a reliable way to install R and the package dependencies.
We tried the R checkpoint package; it works for CRAN packages but not for Bioconductor.
We needed a better solution to achieve consistent installation on the trainees' machines
Building the Docker Image
We use rocker-base to provide an image with Linux and R.

We pre-install inside the image the packages that we need for the training session.

Installing some packages requires specific versions of system libraries (e.g., the R Cairo package depends on a specific libcairo2-dev).

If the image builds, containers started from the image will run with a consistent environment.
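
The image-building steps above can be sketched as a small Python helper that writes out a Dockerfile. This is only an illustration, not the Dockerfile used for the MetaR training: the function name `render_dockerfile` is hypothetical, the base image name `rocker/r-base` is an assumption about how the rocker base image is published, and the CRAN mirror URL is an assumption too.

```python
def render_dockerfile(base_image, apt_packages, cran_packages):
    """Return Dockerfile text: base image, system libraries, then R packages."""
    lines = [f"FROM {base_image}"]
    if apt_packages:
        # System libraries must be present before the R packages that need them.
        lines.append(
            "RUN apt-get update && apt-get install -y --no-install-recommends "
            + " ".join(apt_packages)
        )
    if cran_packages:
        quoted = ", ".join(f"'{name}'" for name in cran_packages)
        lines.append(
            f"RUN Rscript -e \"install.packages(c({quoted}), "
            "repos='https://cloud.r-project.org')\""
        )
    return "\n".join(lines) + "\n"


# Assumed image and package names, for illustration only.
dockerfile_text = render_dockerfile(
    base_image="rocker/r-base",
    apt_packages=["libcairo2-dev"],
    cran_packages=["Cairo"],
)
print(dockerfile_text)
```

Because packages are installed while the image is built, a missing library shows up at build time rather than on a trainee's laptop.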
Running MetaR Analyses Inside Docker Containers
Trainees know neither programming nor scripting.
Starting Docker on the command line is out of the question.
We needed a seamless solution to run MetaR inside a Docker container
Since MetaR is built with MPS, we can customize how analyses are run
We need a bit of information to interface with Docker
This Preferences panel integrates with the MPS platform
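
A minimal sketch of the kind of command a tool could assemble from such preferences, so that the trainee never types it. This is not MetaR's actual implementation: the function name, the image name `example/metar-training`, and the paths are all hypothetical. Only standard `docker run` options are used (`--rm`, `-v`, `-w`).

```python
import shlex


def docker_run_command(docker_executable, image, host_dir, script_name):
    """Build the argument list that runs an R script inside a container."""
    return [
        docker_executable, "run", "--rm",
        "-v", f"{host_dir}:/work",  # share the analysis folder with the container
        "-w", "/work",              # run from the shared folder
        image,
        "Rscript", script_name,
    ]


command = docker_run_command(
    docker_executable="docker",         # would come from the preferences
    image="example/metar-training",     # hypothetical image name
    host_dir="/home/trainee/analysis",  # hypothetical folder
    script_name="heatmap.R",            # hypothetical generated script
)
print(shlex.join(command))
```

Keeping the command as a list of arguments avoids shell-quoting problems if it is later passed to `subprocess.run`.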
For bioinformaticians who develop analysis pipelines
Built with Language Workbench Technology (MPS, from JetBrains)
Where MetaR builds on R, NextflowWorkbench builds on Nextflow
All the advantages of Nextflow:
Dataflow paradigm for implicit parallelization
Workflow portability (Local execution, Sun Grid Engine, Slurm, etc.)
Docker integration

In a language designed from the ground-up to provide:
Modularity (reusable Processes can be organized in libraries)
Interactive assistance
Extensibility and composability
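
The dataflow paradigm mentioned above can be illustrated with a toy Python sketch. It is not how Nextflow is implemented, and the process and sample names are hypothetical. Each process reads from an input channel and writes to an output channel, so a downstream process starts working as soon as its first input arrives, without any explicit scheduling.

```python
import queue
import threading

DONE = object()  # sentinel marking the end of a channel


def start_process(task, inbox, outbox):
    """Run `task` on every item from `inbox`, sending results to `outbox`."""
    def worker():
        while True:
            item = inbox.get()
            if item is DONE:
                outbox.put(DONE)  # propagate end-of-stream downstream
                return
            outbox.put(task(item))

    thread = threading.Thread(target=worker)
    thread.start()
    return thread


def run_pipeline(samples):
    """Connect two toy processes with channels and collect the final outputs."""
    reads, aligned, counted = queue.Queue(), queue.Queue(), queue.Queue()
    threads = [
        start_process(lambda s: f"{s}.aligned", reads, aligned),
        start_process(lambda a: f"{a}.counts", aligned, counted),
    ]
    for sample in samples:
        reads.put(sample)
    reads.put(DONE)

    results = []
    while True:
        item = counted.get()
        if item is DONE:
            break
        results.append(item)
    for thread in threads:
        thread.join()
    return results


results = run_pipeline(["sampleA", "sampleB"])
print(results)
```

The second process can count `sampleA` while the first is still aligning `sampleB`; the parallelism follows from how the channels are connected.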




Auto-completion for language elements facilitates learning
Intentions & Refactorings
An effective language type system helps write correct workflows
Execute workflows from inside MPS to help development

NextflowWorkbench is an Integrated Development Environment
Here's what Workflows and Processes look like in the Workbench
Processes can be reused across Workflows
You can run workflows directly from the Workbench (code generates to Nextflow as part of running)
We needed a better solution than manual installation of tools
Since release 1.2
Pressing this button will pull the image to the development machine
An interactive-use container waiting for docker exec commands
Demonstrating interactive auto-completion for files inside the Docker container
Also introduced in 1.2
Docker IDE features
Composable Dockerfile Language
Also supports tagging an image, then pushing it to a Docker registry
Since release 1.2
Dorff et al., PLOS ONE 2013
Using image artifacts/software, the workbench can automatically install the following software and data resources:
Excerpt from the NextflowWorkbench documentation booklet
http://gobyweb.campagnelab.org
@FabienCampagne
Requests installation of specific resources
Salmon index for human transcriptome
Release 1.3
Makes it easier to create frozen docker images that include a specific set of GobyWeb resources
Useful for:
Clinical applications where analysis pipelines need to be frozen
Training to simplify installation on participants' laptops
To learn more about MetaR, see this preprint
Resources will be installed before the script starts
Release 1.3.1
You have developed a BASH script or Process and specified which resources it needs.

A simple intention produces a Dockerfile.
Use it to freeze these resources into an image.
To recap
Draft with BASH
Easily convert to a Nextflow Process/Workflow
Build a Frozen Docker Image
Enjoy the workbench's interactive features

Auto-completion for:
syntax
resources
files/dirs inside an image
Typesystem
Seamless execution

Reproducibility does not have to be hard
LWT helps us hide most technical details from the end-user and create simple GUIs
or zoom in here
Our lab is recruiting:
post-doc
research assistant level