Hadoop as the future of cloud based parallel computing: Bioinformatics

Juan Böehler

on 4 December 2012

Transcript of Hadoop as the future of cloud based parallel computing: Bioinformatics

Hadoop Cluster
Bioinformatics

How Hadoop Works
Master Nodes: NameNode
Basic scheme based on three categories
Master Nodes: JobTracker
Safe Mode Status

What is Hadoop?
Cloud-based platform
Distributed filesystem
Parallel computing

Load Balancer
Map Function: Data Processing
Reduction Function: Data Harvesting

Important Facts
Possible Implementations?
Local nodes, own infrastructure.
Cloud-based platforms (PaaS).
In 24 hours we could identify all SNPs in the human genome with only 10 local nodes.
In only 3 hours we can process DNA sequencing or resequencing of the entire human genome with 320 cores, or 40 nodes, on the AWS cloud-based platform for a little less than US$100.

Thanks
Juan Carlos Behler
Juan Pablo Behler

Bioinformatic Applications
DNA sequencing and short-read alignment.
Analysis and identification of SNPs.

Short-Read Alignment
Compares sequencing reads against the genome under study.
For greater efficiency, two computational algorithms were developed: global alignment and local alignment.

Burrows Algorithm
Compresses the output file of the aligned genome. It is based on base-by-base permutation, grouping of identical bases, and optimization or deletion of those bases.

SNP Analysis
A SNP is a single-nucleotide variation in the DNA sequence.
One method for detecting SNPs is restriction fragment length polymorphism (SNP-RFLP).

Parallel Applications
Crossbow
A cloud-based platform that combines two major apps:
Bowtie: An ultrafast short-read aligner. It aligns to the human genome at a rate of 25-35 million bp reads per hour.
SOAP-SNP: A resequencing utility that identifies polymorphisms by comparing the consensus sequence with the reference, with 99% accuracy.

So that's all?
Since Google introduced MapReduce in 2004, parallel applications have been growing faster and faster, with uses in physics particle tracking, architectural simulation, petabyte-scale data processing, and bioinformatics.
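
The Map Function (data processing) and Reduction Function (data harvesting) named in the slides can be illustrated with a small in-process sketch. This is not the Hadoop API: the function names (map_read, shuffle, reduce_counts, run_job) and the base-counting task are assumptions chosen for illustration, and everything runs in a single Python process rather than across a cluster.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_read(read: str) -> Iterator[Tuple[str, int]]:
    """Map step (data processing): emit a (base, 1) pair per nucleotide."""
    for base in read.upper():
        yield base, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key, the work a framework does between map and reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step (data harvesting): collapse all values for one key into a total."""
    return key, sum(values)


def run_job(reads: List[str]) -> Dict[str, int]:
    """Hypothetical driver: map every read, shuffle, then reduce each key."""
    mapped = (pair for read in reads for pair in map_read(read))
    grouped = shuffle(mapped)
    return dict(reduce_counts(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(run_job(["ACGT", "aacc"]))
```

In a real cluster each call to map_read could run on a different node over a different slice of the input, which is why the map step must not depend on any other read.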

Many corporations use this system for a wide range of purposes, including Amazon, AOL, FOX, IBM, and Google, among others.

HDFS Architecture
Manages large data sets through distributed replication of blocks.
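
To make "distributed replication of blocks" concrete, here is a minimal sketch of the idea, assuming a simple round-robin placement. Real HDFS placement is rack-aware and decided by the NameNode, so the helper names (split_into_blocks, place_replicas, survives_failure), the node labels, and the tiny block size are all hypothetical.

```python
from typing import Dict, List


def split_into_blocks(data: bytes, block_size: int) -> List[bytes]:
    """Cut a file into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes: List[str], replication: int) -> Dict[int, List[str]]:
    """Assign each block to `replication` distinct nodes (round-robin assumption)."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    placement: Dict[int, List[str]] = {}
    for block_id in range(num_blocks):
        placement[block_id] = [
            nodes[(block_id + offset) % len(nodes)] for offset in range(replication)
        ]
    return placement


def survives_failure(placement: Dict[int, List[str]], failed_node: str) -> bool:
    """True if every block still has at least one replica on a live node."""
    return all(
        any(node != failed_node for node in replicas)
        for replicas in placement.values()
    )


if __name__ == "__main__":
    blocks = split_into_blocks(b"ACGTACGTAC", block_size=4)
    layout = place_replicas(len(blocks), ["n1", "n2", "n3", "n4"], replication=3)
    print(layout)
    print(survives_failure(layout, "n3"))
```

Because every block lives on several nodes, losing one node leaves the data readable, and map tasks can be scheduled on whichever node already holds a copy of the block they need.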