Bioinformatics pipelines for NGS data analysis

by Ruy Jauregui

on 13 March 2013

Transcript of Bioinformatics pipelines for NGS data analysis

The 16S rRNA
- Part of the ribosome
- Essential to all living organisms
- Has variable and conserved regions
[Figure: a sequence alignment]

Metagenomics
We host thousands of bacterial species
Our community of interest

Description of NGS technology.
Bacterial community characterization at family level.

Bioinformatics pipelines for the analysis of NGS data:
Metagenomes, transcriptomes and genome assemblies.

Results
Pygmies: a = adults / c = children

Illumina Genome Analyzer

VIII. DNA Library Sequencing

VII. Sample Preparation for Illumina Sequencing
Samples are prepared for Illumina using New England Biolabs NEBNext® DNA Sample Prep Kit Set 1, Custom Adaptor, Index, PCR and Sequencing Oligonucleotides.
[Figure labels: Ligate Adaptors; PCR Enrich Fragments; dA-Tail Fragment Ends; Repair Fragment Ends; Fragment cDNA; Remove poly-A tails (BpmI digestion)]

Transcriptomics (first trial)

1. Strains and conditions
S. aureus SH1000
S. aureus 6850
Log growth/stationary phase

2. Extract and purify total RNA
Bead-beating and clean-up with RNA columns

3. Enrich bacterial mRNA
5´-Phosphate-Dependent Exonuclease digests RNA having a 5´ monophosphate, i.e. 16S and 23S rRNA

4. mRNA amplification and ds-cDNA synthesis
[Figure labels: PAP; T4 DNA polymerase; E. coli DNA polymerase; DNA ligase; Double-stranded cDNA; Round 2 Second-Strand cDNA synthesis; aRNA (cRNA); RNase H; In Vitro Transcription; Round 1 Second-Strand cDNA synthesis; Round 2 First-Strand cDNA synthesis; Round 1 First-Strand cDNA synthesis; polyadenylated mRNA (AAAA, with 5' and 3' ends marked)]

Total bacterial mRNA
Some Numbers…

~3.5 – 4.2 million paired-end reads for each strain/condition.

~40 – 80% of reads were assigned to 16S/23S rDNA and removed.

All reads with ambiguous bases or homopolymers > 8 nt were removed (1.5% – 20%, avg. 7.5%).

~0.6 – 1.8 million paired-end mRNA reads remained for each strain/condition.

Read copies were “collapsed” into single representatives (unique reads).

Unique reads were blasted against individual genes from strains SH1000 and 6850.

Global transcriptomic profile
nMDS plot ordinating the global transcriptomic profile of two Staphylococcus aureus strains (SH1000 and 6850) during both mid-log and stationary growth phases. An unamplified sample of the mRNA from SH1000 during mid-log phase was included to compare with the amplified sample (75% similar).

Normalization
RPKM = Reads Per Kilobase per Million mapped reads (Mortazavi et al., 2008)

Expression of some control genes
S. aureus SH1000 (8325)
S. aureus 6850

Two component regulatory systems

Thank you!
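
The read clean-up and RPKM normalization named in the transcriptomics slides above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the pipeline used in the talk: the function names (`passes_read_filter`, `collapse_reads`, `rpkm`) are hypothetical, any character other than A, C, G or T is treated as an ambiguous base, and the numbers in the usage lines are made up.

```python
import re
from collections import Counter

# A base followed by 8 or more copies of itself, i.e. a homopolymer run > 8 nt.
LONG_HOMOPOLYMER = re.compile(r"([ACGT])\1{8,}")
# Assumption: anything other than A, C, G or T counts as an ambiguous base.
AMBIGUOUS_BASE = re.compile(r"[^ACGT]")


def passes_read_filter(read: str) -> bool:
    """Return False for reads with ambiguous bases or homopolymers > 8 nt."""
    read = read.upper()
    if AMBIGUOUS_BASE.search(read):
        return False
    return LONG_HOMOPOLYMER.search(read) is None


def collapse_reads(reads) -> Counter:
    """Collapse identical reads that pass the filter into unique
    representatives, keeping the number of copies of each."""
    return Counter(r.upper() for r in reads if passes_read_filter(r))


def rpkm(read_count: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads Per Kilobase per Million mapped reads for one gene."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


if __name__ == "__main__":
    # Hypothetical toy reads: one duplicate, one with an N, one with 9 A's.
    toy_reads = ["ACGT", "acgt", "ACGN", "A" * 9 + "C", "A" * 8 + "C"]
    print(collapse_reads(toy_reads))
    # Hypothetical gene: 500 reads on a 2,000 bp gene, 1 million mapped reads.
    print(rpkm(500, 2000, 1_000_000))  # 250.0
```

Counting copies while collapsing keeps the abundance information that RPKM needs later, so the unique reads can be aligned once and their counts summed per gene.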
Transcriptome analysis

Genome sequencing

Amplicon sequencing

Genome Analyzer:
> 20 million (2x) reads per lane
1 lane - up to 12 samples (indices)
1 index - 50 subsamples (barcodes)
Subsample: 40,000 reads for 4 €
1 read - up to 150 nt

MiSeq:
>6 million (2x) reads, single lane.
1 read - up to 250 nt.

HiSeq:
300 million - 6 billion reads
1 read - up to 150 nt. (Metzker, 2010)

Illumina – paired end sequencing

High throughput methods to assess anterior nare community structures

Deep sequencing
[Figure labels: 1st PCR; Illumina primer; Illumina index primer; 2nd PCR; PCR Index Primer; Multiplexing PCR Primer]
Pool equal amounts of PCR products

Multiplexing Illumina paired-end sequencing
[Figure labels: Illu FBCx Primer; Illu RevAdap Primer; 16S rDNA V1-region; Adaptor; Primer; Primer; Linker; Barcode; Adaptor]

Deep sequencing
[Flow diagram labels: 3 million reads; Quality filter; 220 representative sequences; 19,000 representative reads; 36,000 representative reads; 2 million reads; Filtering; Pre-clustering]

Filter:
Keep representative reads if:
- Present in at least one sample at an abundance >1%
- Present in >2% of samples at an abundance >0.1%

Pre-cluster unique rare reads with 2 mismatches
Clustering at 98% similarity

Trim low quality raw data
Remove:
- primers and barcodes
- reads with ambiguous bases and homopolymers > 8 nt
Trim forward end to 80 bp
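
The abundance filter listed above ("Keep representative reads if ...") could be sketched roughly as follows. The slides do not say whether the two conditions are alternatives or must both hold, so this sketch assumes either one is enough and exposes a hypothetical `require_both` switch for the stricter reading. The function name, parameter names and toy counts are illustrative assumptions, not part of the original pipeline.

```python
def keep_representative(counts, sample_totals,
                        high_abundance=0.01,       # ">1%" in at least one sample
                        low_abundance=0.001,       # ">0.1%" ...
                        min_sample_fraction=0.02,  # ... in ">2% of samples"
                        require_both=False):       # assumption: either rule suffices
    """Decide whether one representative read survives the abundance filter.

    counts        -- copies of this representative read in each sample
    sample_totals -- total reads in each sample, same order (assumed > 0)
    """
    relative = [c / t for c, t in zip(counts, sample_totals)]
    # Rule 1: relative abundance above 1% in at least one sample.
    rule_one_sample = any(a > high_abundance for a in relative)
    # Rule 2: relative abundance above 0.1% in more than 2% of the samples.
    n_above_low = sum(a > low_abundance for a in relative)
    rule_many_samples = n_above_low / len(relative) > min_sample_fraction
    if require_both:
        return rule_one_sample and rule_many_samples
    return rule_one_sample or rule_many_samples


if __name__ == "__main__":
    totals = [1000] * 100  # hypothetical: 100 samples, 1,000 reads each
    # 2% of one sample, absent elsewhere: kept by rule 1 only.
    print(keep_representative([20] + [0] * 99, totals))
    # 0.2% in three samples (3% of samples): kept by rule 2 only.
    print(keep_representative([2] * 3 + [0] * 97, totals))
    # 0.2% in two samples (exactly 2% of samples): dropped.
    print(keep_representative([2] * 2 + [0] * 98, totals))
```

Working on relative abundances per sample rather than raw counts keeps the thresholds comparable between samples sequenced to different depths.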
Collapse reads into single representatives (unique sequences)

97 samples - a semi-nomadic Babongo Pygmy tribe, Gabon (GAB)
92 samples - Lower Saxony and North Rhine-Westphalia (EUR)
to assess anterior nare community structures
Deep sequencing
Adults / Children

The genome of novel benzene degrading Pseudomonas veronii strains

METHODS
Assembly strategy
[Figure labels: EDENA assembled; 159; 119; Minimus 2; First and Second Illumina run, 9M reads -- 3 M reads, 145 nt; 376; 320; VELVET assembled; 1292; 2639; 281; 344; P. veronii 1YB2; First Illumina run - CLC Workbench (Velvet) assembled - ca. 3 million reads, 65 nt; P. veronii 1YdBTEX2]

RESULTS
“RAST”-based comparison of metabolic functions
- Nearly 80% of the metabolic functions of a given strain were also observed in the others
- P. veronii 1YB2 and 1YdBTEX2 share more metabolic functions with each other than with P. fluorescens SBW25
- 102 functions are unique to 1YdBTEX2 and 228 are unique to 1YB2
Highest similarity was observed with P. fluorescens SBW25
Venn diagram
dlt operon

INTRODUCTION
Why did we sequence these isolates?
They were predominant in that environment
Soil samples
Isolates
Benzene degrader
Benzene/Toluene degrader
Pseudomonas veronii 1YB2
Pseudomonas veronii 1YdBTEX2
Witzig et al., 2006, Appl. Environ. Microbiol 72:3504-3514
Industrial area, highly contaminated with benzene

RESULTS
Benzene and/or Toluene degradation pathway
EXDO A cluster (ring cleavage)
[Figure labels: (Naphthalene cluster); (Salicylate cluster); 4-oxalocrotonate decarboxylase; 4-hydroxy-2-oxovalerate aldolase; 2-HMS hydrolase; P. veronii 1YB2; Naphthalene dioxygenase beta and alpha subunits; Naphthalene dihydrodiol dehydrogenase; Salicylaldehyde dehydrogenase; EXDO A; dmpR type regulator; EXDO A-2; Mercury resistance cluster; Ferredoxin; EXDO A; 2-HMS dehydrogenase; 2-oxo pentenoate hydratase; Acetaldehyde dehydrogenase; 4-oxalocrotonate tautomerase; Salicylate hydroxylase; P. veronii 1YdBTEX2]