Hauptseminar

by Martin Steinegger on 27 September 2013

Transcript of Hauptseminar

How it works
MapReduce
Discussion
Idea
Use Hadoop for MapReduce
Easy Deployment
Fault Tolerance
Proven Tools
Trivial horizontal Scaling
Problem
Results
Solution
Results
1.) AWS Image
current architecture
cloud architecture
Infrastructure (IaaS)
Platform (PaaS)
Application (SaaS)
Elastic MapReduce
Elastic Compute Cloud
S3 Storage
Elastic Block Store
Pricing
Definition:
Cloud computing is a "model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction"
National Institute of Standards and Technology (NIST)
PredictProtein
BLAST NCBI
2010
2011
Local cluster cost
Spot Market
be patient and save money
"Translational bioinformatics in the cloud: an affordable alternative", Dudley et al., Genome Medicine 2010
spot market
Magic cloud
1000 Genomes Project
Ensembl
Unigene
GenBank
Public Data Sets
MapReduce
map( twice, [2,3,5] ) result=[4,6,10]
reduce(plus, 0, [2,3,5]) result=10
Functional programming
"MapReduce: Simplified Data Processing on Large Clusters"
Jeffrey Dean and Sanjay Ghemawat
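The two functional-programming examples above can be reproduced in a few lines of Python. This is only a sketch of the map and reduce primitives themselves, not of Hadoop or of the system described by Dean and Ghemawat; the names `twice`, `values`, `doubled` and `total` are chosen here for illustration.

```python
from functools import reduce
from operator import add


def twice(x):
    """Double a single value."""
    return 2 * x


values = [2, 3, 5]

# map applies the function to every element independently,
# which is why the work can be spread over many machines
doubled = list(map(twice, values))

# reduce folds the list into a single value, starting from the initial value 0
total = reduce(add, values, 0)

print(doubled)  # [4, 6, 10]
print(total)    # 10
```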
Courtesy of Eric Lander, Broad Institute
http://upload.wikimedia.org/wikipedia/commons/e/e0/Google%E2%80%99s_First_Production_Server.jpg
"Resources and costs for microbial sequence analysis evaluated using virtual machines and cloud computing", Angiuoli et al., PLoS ONE 2011
"Translational bioinformatics in the cloud: an affordable alternative", Dudley et al., Genome Medicine 2010
"The case for cloud computing in genome informatics", Stein, Genome Biology 2010
Predicted runtimes using varying bid prices
"CloudBLAST: Combining MapReduce and Virtualization on Distributed Resources for Bioinformatics Applications", Matsunaga et al., International Conference on eScience 2008
CloudBLAST outperforms mpiBLAST
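As a rough illustration of why sequence search fits the MapReduce pattern, the sketch below splits a set of query sequences into chunks, processes each chunk in a separate map task, and merges the partial results. It assumes that queries are independent of each other. It is not the CloudBLAST implementation: `search_chunk` is a hypothetical placeholder that only reports sequence lengths so the example runs without BLAST or Hadoop, and all other names are likewise chosen for illustration.

```python
from itertools import chain


def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, parts = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(parts)
            header, parts = line[1:], []
        else:
            parts.append(line)
    if header is not None:
        yield header, "".join(parts)


def chunk(records, size):
    """Split the record list into chunks of at most `size` records."""
    return [records[i:i + size] for i in range(0, len(records), size)]


def search_chunk(records):
    """Hypothetical map task: placeholder for searching one chunk of queries.

    It returns the sequence length instead of real search hits.
    """
    return [(header, len(seq)) for header, seq in records]


def merge(results):
    """Reduce step: collect the per-chunk results into one dictionary."""
    merged = {}
    for header, value in chain.from_iterable(results):
        merged[header] = value
    return merged


queries = ">q1\nMKV\nLA\n>q2\nMSTN\n>q3\nMA\n"
records = list(parse_fasta(queries))
chunks = chunk(records, 2)                    # three queries -> two map tasks
partial = [search_chunk(c) for c in chunks]   # on a cluster these run in parallel
hits = merge(partial)
print(hits)
```

Because each chunk is handled on its own, adding more workers only changes how the chunks are distributed, not the code of the map task.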
Conclusion
cloud computing is best positioned to address the bioinformatics data challenge
easy to use
is getting steadily cheaper
optimal platform for crowdsourcing
Public cloud -> public over the Internet
Community cloud -> infrastructure shared between organisations
Hybrid cloud -> classical cluster combined with the cloud
Private cloud -> private infrastructure that is managed with cloud technology
"The case for cloud computing in genome informatics", Stein, Genome Biology 2010
personal example
Starcluster
Time to build a cluster in the cloud -> 8 minutes
Time to build a local cluster -> 1 month
"StarCluster Brings HPC to the Amazon Cloud", Riley et al., High Performance Computing (HPC) in the Cloud 2010
2.) PredictProtein in the Cloud