Deploying Production Galaxy Instances on OpenStack with CloudBioLinux and CloudMan

by

John Chilton

on 8 October 2013

Transcript of Deploying Production Galaxy Instances on OpenStack with CloudBioLinux and CloudMan

John Chilton
Minnesota Supercomputing Institute

Deploying Production Galaxy
Instances with CloudBioLinux
and CloudMan

Private Cloud Computing is Coming
Businesses large and small are flocking to Amazon et al. because they are cheap.
Storage Costs
High Utilization
Data Access Policies
Reasons research institutions might not immediately switch to Amazon
Enter OpenStack
Deploy open source cloud infrastructure on your own hardware.
Galaxy
OpenStack
Python
Open Source
Vibrant Community
Admin
Developer
DB Server
File Server
App Server
Repository
Galaxy is not the code in the repository; it is the whole stack on the application server.
Cloud Infrastructure
(OpenStack)
DB Server
File Server
Application VMs (Web and Compute)
Repository
Common Scenario
Two people or teams need to be intimately familiar with Galaxy and must frequently communicate.
Opportunity to reduce workload by building Galaxy using a common community template.
CloudBioLinux
CloudMan
"A fully automated infrastructure installs software and data, with packages specified in simple configuration files."
https://github.com/chapmanb/cloudbiolinux
Do not waste effort manually installing software; automate it.
"CloudMan is a cloud manager that orchestrates all of the steps required to provision a complete compute cluster environment on a cloud infrastructure; subsequently, it allows one to manage the cluster, all through a web browser."
However...
Saving money, however, is not the only reason to employ cloud computing. As I will argue for the specific case of Galaxy, cloud computing can also help manage complexity.
Why?
What?
How?
What is OpenStack and private cloud computing?
Why deploy Galaxy in a (private) cloud?
How to build production Galaxy instances for the cloud.
User-Data
A block of YAML text used to configure a VM at launch time.
http://wiki.galaxyproject.org/CloudMan/UserData
Splitting Galaxy into Multiple Processes
CloudMan uses it to configure the virtual machine: Galaxy, nginx, NFS, and arbitrary other files.
configure_multiple_galaxy_processes: True
web_thread_count: 2
handler_thread_count: 2
galaxy_conf_dir: /mnt/galaxyTools/galaxy-central/conf.d
user-data
galaxy_conf_dir
https://bitbucket.org/galaxy/galaxy-central/pull-request/44/
Very useful in non-cloud contexts as well. Allows universe_wsgi.ini to be split into a directory of files (à la /etc/sudoers.d or /etc/apache/conf.d).
Benefits
Allows some properties to be set in the repository and others in the runtime environment.
Easier for configuration management tools such as Puppet or Chef to work with.
Separates development/production properties and/or developer/admin properties.
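The conf.d idea can be illustrated with a minimal Python sketch. This is not the implementation from the pull request; it only assumes that the files in the directory are read in sorted name order and that later files override earlier ones. The file names and option values are hypothetical.

```python
import configparser
import tempfile
from pathlib import Path


def load_conf_dir(conf_dir):
    """Merge every *.ini file in conf_dir, in sorted filename order.

    Assumption: options in later files override those in earlier files.
    """
    parser = configparser.ConfigParser(interpolation=None)
    for path in sorted(Path(conf_dir).glob("*.ini")):
        parser.read(str(path))
    return parser


def demo():
    # Hypothetical layout: one file kept in the repository,
    # one written by the runtime environment at launch.
    with tempfile.TemporaryDirectory() as conf_dir:
        Path(conf_dir, "10_repository.ini").write_text(
            "[app:main]\nrequire_login = False\nuse_remote_user = False\n"
        )
        Path(conf_dir, "20_runtime.ini").write_text(
            "[app:main]\nrequire_login = True\n"
        )
        merged = load_conf_dir(conf_dir)
        return dict(merged["app:main"])
```

Calling demo() returns the merged [app:main] options: require_login takes the runtime value, while use_remote_user keeps the value from the repository file.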
External Authentication
galaxy_conf_dir: /mnt/galaxyTools/galaxy-central/conf.d
galaxy_universe_use_remote_user: True
galaxy_universe_remote_user_maildomain: <domain_name>
galaxy_universe_remote_user_logout_href: https://logout@<galaxy_url>/
galaxy_universe_require_login: True
User-Data
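The naming pattern in this user-data suggests that each galaxy_universe_* key carries one Galaxy option. A rough sketch of that mapping follows, assuming the prefix is simply stripped and the option lands in the [app:main] section of a file under galaxy_conf_dir. Both points are assumptions made for illustration, and the helper names are hypothetical, not CloudMan functions.

```python
PREFIX = "galaxy_universe_"


def universe_options(user_data):
    """Return the options carried by galaxy_universe_* keys, prefix removed."""
    return {
        key[len(PREFIX):]: value
        for key, value in user_data.items()
        if key.startswith(PREFIX)
    }


def render_ini(options, section="app:main"):
    """Render options as an ini fragment (the section name is an assumption)."""
    lines = ["[%s]" % section]
    lines.extend("%s = %s" % (name, options[name]) for name in sorted(options))
    return "\n".join(lines) + "\n"


# A subset of the user-data shown on the slide.
user_data = {
    "galaxy_conf_dir": "/mnt/galaxyTools/galaxy-central/conf.d",
    "galaxy_universe_use_remote_user": True,
    "galaxy_universe_require_login": True,
}
fragment = render_ini(universe_options(user_data))
```

Here galaxy_conf_dir is left out of the fragment because it does not carry the galaxy_universe_ prefix; it only names the directory the fragment would be written to.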
Galaxy Reports Application
A powerful tool that provides a wealth of valuable data on every job that Galaxy has run, as well as disk usage accounting, etc.
Implemented CloudMan "service" for this...
user-data
services:
- name: Galaxy
- name: GalaxyReports
- name: Postgres
SSL
conf_files:
  - path: /usr/nginx/conf/key
    content: <base64 encoding of key>
  - path: /usr/nginx/conf/cert
    content: <base64 encoding of cert>
user-data
Configure arbitrary config files on VM
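Producing the base64 content for conf_files can be scripted. The sketch below is one way to do it with the Python standard library; the placeholder bytes stand in for a real key and certificate, and the helper names are hypothetical. The YAML is rendered by hand so that no YAML library is required.

```python
import base64


def conf_file_entry(path, content):
    """Return one conf_files entry: target path plus base64-encoded content."""
    encoded = base64.b64encode(content).decode("ascii")
    return {"path": path, "content": encoded}


def render_conf_files(entries):
    """Render the entries as a YAML fragment for the user-data block."""
    lines = ["conf_files:"]
    for entry in entries:
        lines.append("  - path: %s" % entry["path"])
        lines.append("    content: %s" % entry["content"])
    return "\n".join(lines) + "\n"


# Placeholder bytes; in practice these would be read from the key and cert files.
entries = [
    conf_file_entry("/usr/nginx/conf/key", b"PLACEHOLDER KEY"),
    conf_file_entry("/usr/nginx/conf/cert", b"PLACEHOLDER CERT"),
]
fragment = render_conf_files(entries)
```

The resulting fragment can be pasted into the user-data alongside the other options.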
server {
    listen 80;
    server_name galaxyp.msi.umn.edu;
    rewrite ^ https://$server_name$request_uri? permanent;
}

server {
    listen 443 default_server ssl;
    ssl_certificate /usr/nginx/conf/cert;
    ssl_certificate_key /usr/nginx/conf/key;
    ....
}
nginx.conf
(Infrastructure Engineer?)
(Application Engineer?)
Mounting External File Systems
master_prestart_commands:
- "mkdir -p /mnt/galaxyData"
- "mount -t nfs4 -o sec=sys spider.msi.umn.edu:/export/galaxyp /mnt/galaxyData/"
- "mkdir -p /project/db"
- "mount -t nfs4 -o ro buzzard.msi.umn.edu:/zprod2/misc/db /project/db/"
worker_prestart_commands:
- "mkdir -p /mnt/galaxyData"
- "mount -t nfs4 -o sec=sys spider.msi.umn.edu:/export/galaxyp /mnt/galaxyData/"
- "mkdir -p /project/db"
- "mount -t nfs4 -o ro buzzard.msi.umn.edu:/zprod2/misc/db /project/db/"
user-data
Run arbitrary commands on master and worker nodes at startup...
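Because the master and worker command lists are identical, they can be generated from a single list of mounts. This is a minimal sketch under that assumption; the MOUNTS structure and function names are hypothetical, while the hosts, paths, and mount options are the ones from the slide.

```python
import json

# One entry per external file system (values taken from the slide).
MOUNTS = [
    {"source": "spider.msi.umn.edu:/export/galaxyp",
     "target": "/mnt/galaxyData", "options": "sec=sys"},
    {"source": "buzzard.msi.umn.edu:/zprod2/misc/db",
     "target": "/project/db", "options": "ro"},
]


def mount_commands(mounts):
    """Return the mkdir and mount commands for each external file system."""
    commands = []
    for mount in mounts:
        commands.append("mkdir -p %s" % mount["target"])
        commands.append("mount -t nfs4 -o %s %s %s/" % (
            mount["options"], mount["source"], mount["target"]))
    return commands


def render_prestart(mounts):
    """Render the same commands for master and worker nodes as YAML."""
    commands = mount_commands(mounts)
    lines = []
    for key in ("master_prestart_commands", "worker_prestart_commands"):
        lines.append("%s:" % key)
        # json.dumps yields a double-quoted string, which is also valid YAML.
        lines.extend("  - %s" % json.dumps(command) for command in commands)
    return "\n".join(lines) + "\n"


fragment = render_prestart(MOUNTS)
```

Adding a file system then means adding one entry to MOUNTS rather than editing two command lists.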
Accessing External Compute Resources
... via the LWR
https://lwr.readthedocs.org/
https://bitbucket.org/jmchilton/lwr/
Run normal Galaxy jobs from normal tools on a remote server without requiring shared file systems.
Run jobs on *nix or Windows.
Ephemeral cloud VM submitting jobs to dedicated compute resources... a little backwards but it works.
galaxy_tool_runner_proteinpilot: "lwr://https://<secretkey>@remotehost:8913/"
user-data
Any tool id
Any job runner URL, not just LWR urls
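The structure of the user-data key and the runner URL can be made explicit with a short sketch. It only illustrates how the string on the slide decomposes; it is not how Galaxy or CloudMan parses job runner URLs, and the function names and the example key are hypothetical.

```python
from urllib.parse import urlsplit


def runner_option(tool_id, runner_url):
    """Build the user-data key/value pair that assigns a runner to a tool id."""
    return {"galaxy_tool_runner_%s" % tool_id: runner_url}


def split_runner_url(runner_url):
    """Split a URL such as lwr://https://<key>@host:port/ into its parts."""
    runner, _, target = runner_url.partition("://")
    parts = urlsplit(target)
    return {
        "runner": runner,          # e.g. "lwr"
        "scheme": parts.scheme,    # transport used to reach the remote server
        "secret": parts.username,  # the <secretkey> placeholder on the slide
        "host": parts.hostname,
        "port": parts.port,
    }


url = "lwr://https://examplekey@remotehost:8913/"
option = runner_option("proteinpilot", url)
info = split_runner_url(url)
```

The first helper reflects the two points above: the suffix of the key can be any tool id, and the value can be any job runner URL.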
A Different Paradigm
location / {
    ...

    location /admin/jobs {
        proxy_pass http://localhost:8079;
    }
}
nginx.conf
Hack to fix admin panel.
...i.e. enable Load Balancing
Creating Cloud Images
Launch vanilla Ubuntu instance
Install CloudMan via CloudBioLinux
Create image from configured instance
Launch new instance from this image
CloudMan runs, configures Galaxy
Production Instances
Load balancing
External Authentication
SSL
Advanced Reporting
Utilize External Resources
Databases
File Servers
Compute
http://bit.ly/prodcloudman
Full details @ ReadTheDocs
Configure CloudBioLinux
Configure with CloudMan
nginx_enable_module_ldap = true
fabricrc
CloudBioLinux must compile nginx with LDAP
Setup nginx.conf
Specify LDAP connection
Modify root location to require authorization
Create /api location where auth is optional.
http://bit.ly/prodcloudman-auth
Thanks...
Jim Johnson; Pratik Jagtap, Ph.D.; Daniel Debertin; Kevin Silverstein, Ph.D.; Anne-Françoise Lamblin, Ph.D.; Benjamin Lynch, Ph.D.
Minnesota Supercomputing Institute Galaxy and Cloud Teams
Principal Investigator
Timothy Griffin, Ph.D.
Pull Request Acceptors
Funding
This work was funded by the Minnesota Partnership for Biotechnology and Medical Genomics and the National Science Foundation.
Enis Afgan, Ph.D.
Brad Chapman, Ph.D.
Nate Coraor and Dannon Baker
https://bitbucket.org/galaxy/cloudman
https://github.com/chapmanb/cloudbiolinux
https://bitbucket.org/galaxy/galaxy-central
Reuse
Replace e-mail with API calls,
documentation with scripts.
Redistribute
There is no GalaxyAdmin walkthrough - each large Galaxy installation represents numerous innovations.
Galaxy App
App Runtime Environment
System Environment
API
Display Application
Tool Wrappers
Workflow Engine
User Interface
System Libraries
Bioinformatics Applications
Cron Jobs
Init Scripts
Job Scheduler
xvfb
virtualenv
Proxy Webserver
Poorly Defined Boundaries
Log Management
Bio Data
(at least)
Galaxy App
App Runtime Environment
System Environment
&
Big And Small
Building more of the stack on community templates provides a way to share more innovations back with the community.
The rest of this presentation describes contributions I have made along these lines.
External Database Server
user-data
services:
- name: Galaxy
- name: GalaxyReports
galaxy_universe_database_connection: postgres://user:password@host:port/schema
no postgres
Future Work
Other Large CloudMan Deployments
NBIC
http://galaxy.nbic.nl/
http://www.nbic.nl/about-nbic/news-press/bioinformatics-news/detail/article/galaxy-on-cloud/
"production"-y innovations by others
loggly.com Integration
https://bitbucket.org/galaxy/cloudman/pull-request/23/added-optional-loggly-based-cloud-logging/diff
My Ongoing Work
CloudBioLinux Deployer
Proteomics, Proteomics, Proteomics...
CloudBioLinux + Chef
CloudMan+CloudBioLinux Alternatives
Globus Provision
http://www.globus.org/provision/
http://www.cse.buffalo.edu/faculty/tkosar/datacloud2012/papers/datacloud2012_paper_6.pdf
CloudBioLinux Deployer
https://github.com/jmchilton/cloudbiolinux/tree/deploy
CloudBioLinux + Galaxy without CloudMan (formerly galaxy-vm-launcher)
Configured with chef recipes, Globus online integration.
Related Work
Finer-grained grid engine configuration
Monitoring
libcloud based deployment scripts
How does one do nagios in the Cloud?
novnc for web accessible console access
Deploy Galaxy from git
https://github.com/jmchilton/cloudbiolinux/tree/proteomics
https://github.com/jmchilton/proteomics-wine-env
https://bitbucket.org/galaxyp/galaxyp-central
https://github.com/jmchilton/cloudbiolinux/tree/chef
Ability to package tool shed installs
Good for cloud & traditional staging/production environments.