
Open Source Configuration

of Bioinformatics Infrastructure

This presentation:

Beyond CloudBioLinux

Implemented on top of git submodules

http://bit.ly/bosc2013

github.com:chapmanb/cloudbiolinux.git

config/

puppet/

modules/

lwr

biocloudcentral

apache

.....

...

github.com:bioconfig/puppet-lwr.git

Upshot:

github.com:bioconfig/puppet-biocloudcentral.git

John Chilton¹, Pratik Jagtap¹, Benjamin Lynch¹, Brad Chapman², Timothy Griffin³

github.com:puppetlabs/puppetlabs-apache.git

They can be easily integrated the same way by institutions or teams with their own Chef or Puppet repositories or by tools such as Globus Provision.

1 University of Minnesota Supercomputing Institute

2 Harvard School of Public Health

3 University of Minnesota

Community?

biopython, bioperl, biojava...

bioconfig?

Initial Applications

http://github.com/bioconfig/XXXXX

LWR is a tool to stage and run Galaxy jobs on remote servers.

https://lwr.readthedocs.org/

Hope to get this tightly integrated into CloudMan instances by default, potentially a path forward

for cloud bursting Galaxy instances.

Puppet module for configuring LWR

has been integrated into CloudBioLinux.

https://github.com/bioconfig/puppet-lwr

Clearing house for high quality

interoperable modules for use with

CloudBioLinux, Globus Provision, or institutional repositories.

- LWR

- Globus

- BioCloudCentral

The Globus Toolkit provides utilities for federated data transfer, identity management, etc...

https://github.com/bioconfig/chef-globus

Fork of the Globus Provision Chef recipes.

Instructions for using GridFTP to transfer data into a CBL instance created with Globus.

http://bit.ly/cbl-gridftp

Django application allowing users to easily launch CloudBioLinux and CloudMan instances

https://github.com/bioconfig/puppet-biocloudcentral

Powers https://biocloudcentral.msi.umn.edu

allowing end users to easily launch Galaxy-P

instances on Amazon.

...we can do better

with Puppet and Chef!

CloudBioLinux Extensions

High Level

Fabric is a low-level procedural library. Chef & Puppet are DSLs with higher level constructs for services, dependencies, packages, etc...
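The contrast can be illustrated in plain Python. This is only a sketch of the declarative idea, not Puppet or Chef syntax: the resource names and the `apply_order` helper are hypothetical. Resources declare what they require, and the ordering is derived from those declarations instead of being written out step by step.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical resources: each name maps to the set of resources it requires.
RESOURCES = {
    "package:apache": set(),
    "file:apache.conf": {"package:apache"},
    "service:apache": {"file:apache.conf"},
}


def apply_order(resources):
    """Return resource names in an order that satisfies every dependency."""
    return list(TopologicalSorter(resources).static_order())


print(apply_order(RESOURCES))
```

With a procedural library the author has to keep this ordering correct by hand; a DSL with dependency constructs works it out from the declarations.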

Extended CBL to allow use of Puppet modules and Chef cookbooks.

Composable

Built-in easy templating (great for config files).

Puppet/Chef are installed remotely as needed; packages are bundled up, shipped to the remote server, and applied there.
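The bundling step can be sketched with the standard library alone. This is an illustration under assumptions: `bundle_modules` and the directory layout are hypothetical, and shipping and applying the bundle on the remote server (done via Fabric in CloudBioLinux) is left out.

```python
import os
import tarfile
import tempfile


def bundle_modules(modules_dir, module_names, out_path):
    """Pack the named module directories into one gzipped tarball."""
    with tarfile.open(out_path, "w:gz") as tar:
        for name in module_names:
            # arcname keeps archive paths relative, e.g. "modules/lwr/..."
            tar.add(os.path.join(modules_dir, name),
                    arcname=os.path.join("modules", name))
    return out_path


if __name__ == "__main__":
    # Build a throwaway module tree so the sketch runs anywhere.
    with tempfile.TemporaryDirectory() as tmp:
        manifests = os.path.join(tmp, "src", "lwr", "manifests")
        os.makedirs(manifests)
        with open(os.path.join(manifests, "init.pp"), "w") as handle:
            handle.write("class lwr {}\n")
        bundle = bundle_modules(os.path.join(tmp, "src"), ["lwr"],
                                os.path.join(tmp, "bundle.tar.gz"))
        with tarfile.open(bundle) as tar:
            print(tar.getnames())
```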

Applications broken down into packages

that can be easily shared.

Integrates with existing CBL structure for 'properties' and 'packages'.

A huge wealth of existing best-practice configurations is available.

Apache, Firewalls, etc...

Can set Puppet and Chef properties via Fabric

Testable

Can define which modules/cookbooks are configured via new YAML package types.

Great unit testing frameworks available.

CloudBioLinux a Start but...

Fabric is the library used by CBL to remotely

run install commands.

Fabric is great at recreating identical

deployments on multiple machines.

The Problem: Different institutions/teams want to build different environments with applications configured differently.

Fabric is NOT a configuration management tool.

bit.ly/prodcloudman-slides

Core Idea

Configuring complex applications is hard!

Building on open source frameworks can

simplify this task.

CloudBioLinux (& CloudMan) is an example.

Packages (YAML)

bio_nextgen:
  - bio-linux-fastqc
  - fastx-toolkit
  - maq
  - plink

bio_proteomics:
  - xsltproc
  - libxml-sax-expat-perl
  - libgd2-xpm-dev
  - libbz2-dev

Can be OS packages,

language libraries, or

custom installs
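As a rough illustration of how such package groups might be consumed, the sketch below flattens selected groups into one install list. The dictionary repeats the two groups listed above; `packages_for` is a hypothetical helper, and CloudBioLinux itself reads these lists from its YAML files.

```python
# Package groups copied from the slide. Writing them inline is an assumption
# made for this sketch; in practice they would be loaded from the YAML file.
PACKAGES = {
    "bio_nextgen": ["bio-linux-fastqc", "fastx-toolkit", "maq", "plink"],
    "bio_proteomics": ["xsltproc", "libxml-sax-expat-perl",
                       "libgd2-xpm-dev", "libbz2-dev"],
}


def packages_for(groups, packages=PACKAGES):
    """Return a de-duplicated install list for the requested groups, in order."""
    selected = []
    for group in groups:
        for name in packages[group]:
            if name not in selected:
                selected.append(name)
    return selected


print(packages_for(["bio_nextgen"]))
```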

Fabric (Python) Methods

import re

@_if_not_installed("bfast")
def install_bfast(env):
    """BFAST: Blat-like Fast Accurate Search Tool.

    http://sourceforge.net/apps/mediawiki/bfast/index.php?title=Main_Page
    """
    default_version = "0.7.0a"
    version = env.get("tool_version", default_version)
    # Raw string so the regex escapes are not read as string escapes.
    major_version_regex = r"\d+\.\d+\.\d+"
    major_version = re.search(major_version_regex, version).group(0)
    url = "http://downloads.sourceforge.net/project/bfast/bfast/%s/bfast-%s.tar.gz" \
        % (major_version, version)
    # _if_not_installed, _get_install and _configure_make are helpers defined elsewhere in CloudBioLinux.
    _get_install(url, env, _configure_make)
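The version handling in install_bfast can be checked without Fabric. The sketch below repeats only the URL construction; `bfast_download_url` is a hypothetical helper name, not part of CloudBioLinux.

```python
import re

# Hypothetical helper (not part of CloudBioLinux): isolates the URL logic
# from install_bfast so it can be run on its own.
MAJOR_VERSION_REGEX = r"\d+\.\d+\.\d+"
URL_TEMPLATE = ("http://downloads.sourceforge.net/project/bfast/bfast/"
                "%s/bfast-%s.tar.gz")


def bfast_download_url(version="0.7.0a"):
    """Return the download URL for a BFAST version string such as '0.7.0a'."""
    match = re.search(MAJOR_VERSION_REGEX, version)
    if match is None:
        raise ValueError("no x.y.z version found in %r" % version)
    # The directory uses the numeric part; the file name keeps the full version.
    return URL_TEMPLATE % (match.group(0), version)


print(bfast_download_url())
# prints http://downloads.sourceforge.net/project/bfast/bfast/0.7.0/bfast-0.7.0a.tar.gz
```

The suffix "a" is dropped from the directory component but kept in the file name, which is why the regex extracts only the numeric x.y.z part.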

Background
