Surfing into NGS sequencing data
Transcript of Surfing into NGS sequencing data
Exome/Target sequencing projects
From raw data to causal variants

The bioinformatic point of view
Heuristic filters

3 step process:
Primary analysis
Downstream analysis
Integrate annotation

Biological advantages:
From 3 billion to less than 60 million bases
Sequencing only (?) the coding part of the genome
> 20k genes in one shot
Select candidate genes and build a molecular platform for a given disease (Target)

Different purpose, same technology:
Family (trios)

Economic/Practical advantages:
Easy sample preparation, (relatively) low sequencing cost
Different commercial kits available for Exome, Kinome...
1 Illumina lane, 1 sample with high-depth sequencing (30-60x, Exome)
Scalable technology adaptable for clinical application (custom target design)

Bioinformatic analysis:
How to analyze and speed up the whole process, and interpret/communicate the NGS data?
Millions of reads
Many samples
Complex disease
Data interpretation

Performance metrics:
Sampling error
GC content
Bad alignment (SW, repetitive regions...)
Minimum depth
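
The figures quoted above (3 billion bases down to at most 60 million, and 30-60x exome depth from one lane) can be tied together with a rough back-of-the-envelope calculation. The sketch below is only illustrative: the read count, read length and on-target fraction are hypothetical values chosen for the example, not numbers from the slides, and the function name is made up for this sketch.

```python
# Rough depth arithmetic for a targeted (exome) experiment.
# Only GENOME_SIZE and TARGET_SIZE come from the slides; every other
# number below is a hypothetical value used for illustration.

GENOME_SIZE = 3_000_000_000  # "3 billion" bases
TARGET_SIZE = 60_000_000     # "less than 60 million bases" (upper bound)


def expected_mean_depth(n_reads, read_length, target_size, on_target_fraction=1.0):
    """Approximate mean depth: on-target sequenced bases / target size."""
    if target_size <= 0:
        raise ValueError("target_size must be positive")
    return n_reads * read_length * on_target_fraction / target_size


# Share of the genome that is actually sequenced.
target_fraction = TARGET_SIZE / GENOME_SIZE

# Hypothetical run: 40 million reads of 100 bases, 75% landing on target.
depth = expected_mean_depth(
    n_reads=40_000_000,
    read_length=100,
    target_size=TARGET_SIZE,
    on_target_fraction=0.75,
)

print(f"target is {target_fraction:.0%} of the genome")
print(f"expected mean depth: {depth:.0f}x")
```

With these assumed inputs the target is 2% of the genome and the expected mean depth is 50x, inside the 30-60x range mentioned for exomes. A mean says nothing about uniformity: GC content and bad alignments can still leave individual regions below the minimum depth.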

Filtering out common variants (dbSNP, 1000g, ESP)
Pathogenicity prediction filters (SIFT, PolyPhen2, MutationTaster)
Variant classification (missense, stop, splice site...)
Cross-reference database integration (HGMD, COSMIC, OMIM)

50/50 red and black:
Large sampling reflects the 50/50 ratio
Small sampling... does not

High-Quality Mapping
SNV
Annotation
Prioritization
Data visualization
Candidate variants

Project design prospective...
Exome: few selected samples, many genes (20k)
Target: many samples (>50), few selected genes
Causal variants

BWA, SAMtools, BEDTools, quality filters, GBrowse, IGV, MySQL, SIFT, PolyPhen2, GERP++, ESP, dbSNP, 1000g

Looking at the future:
Future... Time cost
The new bottleneck... 40-60% novel variants are unknown

NGS (bioinformatic) "dictionary":
Read: string of nucleotides coming from ultra-deep sequencers -> millions of reads in a single NGS experiment
Exome: all the exons of our genome -> 1% of the total genome (30-50 Mb)
Coverage: in percentage, how many bases of our target (exome) are "covered" by at least 1 read
Depth: how many reads cover that particular gene/target (e.g. 30x exome mean depth)
Mismatch: single-nucleotide difference between reads and reference (human genome)
Insertion or Deletion (INDEL): insertion or deletion of nucleotides between reads and reference

Take home messages:
Lots of data does not mean lots of results
Bioinformatics is not magic:
"Good" samples > Good analysis
"Poor" samples > Poor analysis
A good bioinformatician can help you to interpret the data, not only a technician

Allele A = RED bean
Allele B = BLACK bean
High-depth sequencing
Low-depth sequencing
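
The red/black bean picture is binomial sampling: each read covering a heterozygous site is a draw of allele A or allele B with probability 0.5. As a minimal sketch of why depth matters, the code below computes exact binomial probabilities rather than simulating draws. The 0.3-0.7 "balanced" window and the function names are assumptions made for this illustration, not thresholds from the slides.

```python
from math import comb


def prob_allele_a_count(depth, k, p=0.5):
    """Binomial probability of exactly k allele-A reads among `depth` reads."""
    return comb(depth, k) * p**k * (1 - p) ** (depth - k)


def prob_balanced(depth, low=0.3, high=0.7, p=0.5):
    """Probability that the observed allele-A fraction lies in [low, high].

    The [0.3, 0.7] window is an illustrative assumption, not a standard.
    """
    if depth <= 0:
        raise ValueError("depth must be positive")
    return sum(
        prob_allele_a_count(depth, k, p)
        for k in range(depth + 1)
        if low <= k / depth <= high
    )


def prob_single_allele_only(depth, p=0.5):
    """Probability that every read shows the same allele (heterozygote missed)."""
    if depth <= 0:
        raise ValueError("depth must be positive")
    return p**depth + (1 - p) ** depth


# Low-depth versus high-depth sequencing of the same 50/50 site.
for depth in (4, 10, 30, 60):
    print(
        f"depth {depth:>2}: "
        f"P(balanced) = {prob_balanced(depth):.3f}, "
        f"P(one allele only) = {prob_single_allele_only(depth):.2e}"
    )
```

At depth 4 the observed ratio falls inside the window only 37.5% of the time, and one time in eight all four reads show the same allele, so the site would look homozygous. At 30x and above the observed ratio almost always sits near 50/50, which is the "large sampling" case of the analogy.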