Mixed Files
1. When the dragonfly reads were assembled alone, only about 20% of the reads were used.
2. When the cotton reads were assembled alone, about 87% of the reads were used.
3. When the two files were mixed and assembled together, the result was the same
(20% of the dragonfly reads were used,
80% of the cotton reads were used).
K-mer analysis.
1. Our program found that more than 100 distinct 10-mers at the beginning of the reads were each repeated more than 1000 times.
2. However, when the 10-mer analysis was run over the full length of the reads, no clear signal of an abnormal number of repeats was found.
3. When the top 20, 100, 200, 500, and 1000 repeated sequences were filtered out, no significant improvement in the assembly was achieved.
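The k-mer checks above can be sketched roughly as follows. This is only an illustration, not the program used in the project: the function names are hypothetical, reads are assumed to be plain strings already extracted from the SFF files, and "filtering" is assumed to mean dropping reads that start with one of the most frequent 10-mers.

```python
from collections import Counter

def count_prefix_kmers(reads, k=10):
    """Count the k-mer found at the very start of each read."""
    return Counter(read[:k].upper() for read in reads if len(read) >= k)

def count_all_kmers(reads, k=10):
    """Count every k-mer at every position of every read."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def drop_reads_with_top_prefixes(reads, prefix_counts, top_n, k=10):
    """Remove reads that start with one of the top_n most frequent prefixes.

    Assumption: the original filter may instead have trimmed the prefix
    or filtered on k-mers anywhere in the read.
    """
    top = {kmer for kmer, _ in prefix_counts.most_common(top_n)}
    return [read for read in reads if read[:k].upper() not in top]
```

Comparing `count_prefix_kmers` with `count_all_kmers` mirrors the two analyses on the slide: a leftover tag would show up as a few very frequent prefixes that do not stand out in the whole-read counts.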
Not enough coverage (Mathematical analysis)
1. N = ln(1 - P) / ln(1 - a/b)
where N = number of reads,
P = probability that the library contains a desired piece of DNA,
a = average read size,
b = genome size.
2. For our data, we obtained P = 0.897.
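The formula can be solved for either N or P. A minimal sketch, using the same symbols as above; the function names are hypothetical, and any numbers passed in are placeholders, since the slide does not give the N, a, and b used for the dragonfly data.

```python
import math

def reads_needed(p, a, b):
    """N = ln(1 - P) / ln(1 - a/b): reads needed to reach probability P."""
    return math.log1p(-p) / math.log1p(-a / b)

def probability_covered(n, a, b):
    """Same formula solved for P: P = 1 - (1 - a/b)^N."""
    return -math.expm1(n * math.log1p(-a / b))
```

`log1p` and `expm1` are used only for numerical stability when a/b is very small, which is the usual case for short reads and a large genome.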
Not enough coverage (Analysis using simulated data)
1. Using the E. coli genome, we created simulated 454 reads with MetaSim.
2. By changing the number of reads, we ran the assembler multiple times to see how the percentage of reads used changed.
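To get a feel for how coverage grows with the number of reads, a toy simulation can stand in for the real pipeline. This sketch is not MetaSim and does not run an assembler: it assumes error-free reads of fixed length placed uniformly at random, and it reports the fraction of genome positions covered rather than the percentage of reads used. All names and parameters are hypothetical.

```python
import random

def simulate_covered_fraction(genome_size, read_len, n_reads, seed=0):
    """Place n_reads uniform random reads and return the covered fraction."""
    rng = random.Random(seed)
    covered = bytearray(genome_size)  # 0 = uncovered, 1 = covered
    for _ in range(n_reads):
        start = rng.randrange(genome_size - read_len + 1)
        covered[start:start + read_len] = b"\x01" * read_len
    return sum(covered) / genome_size

def coverage_curve(genome_size, read_len, read_counts, seed=0):
    """Covered fraction for each read count, e.g. to plot against N."""
    return [
        (n, simulate_covered_fraction(genome_size, read_len, n, seed))
        for n in read_counts
    ]
```

Running `coverage_curve` over increasing read counts gives a saturation curve that can be compared, qualitatively, with how the assembler's statistics change as input grows.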
Not enough coverage (Compare with simulated data)
1. We split our dragonfly reads into subsets of different sizes and ran the assembler three times.
2. The number of input reads doubled with each run.
3. We compared the results with those from the simulated data.
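One way to build the doubling inputs is to shuffle the reads once and take nested prefixes. This is a sketch under assumptions: the slide does not say how the subsets were chosen, so random nested subsets are used here, and the function name is hypothetical.

```python
import random

def doubling_subsets(reads, n_steps=3, seed=0):
    """Return n_steps nested random subsets, each twice the size of the last.

    The largest subset holds (almost) all reads; the smallest holds
    len(reads) // 2**(n_steps - 1) reads.
    """
    rng = random.Random(seed)
    shuffled = list(reads)
    rng.shuffle(shuffled)
    size = len(shuffled) // (2 ** (n_steps - 1))
    subsets = []
    for _ in range(n_steps):
        subsets.append(shuffled[:size])
        size *= 2
    return subsets
```

Each subset would then be written out and assembled separately, and the percentage of reads used recorded per run.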
BLAST against Drosophila melanogaster
1. We created a local database from sequence data downloaded from GenBank.
2. No alignment with an E-value smaller than 0.001 was found.
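The E-value cutoff can be applied to BLAST results with a few lines of code. This sketch assumes the search was saved in BLAST+ tabular format (`-outfmt 6`) with the default twelve columns, where the E-value is the eleventh field; the slide does not say which output format was used, and the function name is hypothetical.

```python
def significant_hits(lines, max_evalue=1e-3):
    """Return (query, subject, evalue) for hits with evalue < max_evalue.

    Expects BLAST+ tabular lines with default columns:
    qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore
    """
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        evalue = float(fields[10])
        if evalue < max_evalue:
            hits.append((fields[0], fields[1], evalue))
    return hits
```

An empty result from this filter corresponds to the outcome reported on the slide.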
Running a gene-finding program
1. We used a pipeline called MAKER.
2. No candidate genes could be found.
When you have short reads from a sequencer that do not produce good contigs or scaffolds
1. Run a k-mer analysis to check whether the reads still contain barcodes.
2. Change the input size, check how it affects the percentage of reads used, and compare the results with simulated data.
What could have been done instead of WGS?
1. If building a phylogeny was the sole purpose, we could have used genomic reduction to get better coverage.
2. If genes were to be compared, we could have used EST sequencing alone.
1. Introduction to the assembly process.
2. Problems we faced while attempting to assemble the dragonfly genome.
3. What you can do when bad input data is suspected.
Reads containing tag sequences.
1. The assembler gave a warning about a high number of repeats.
2. We suspected that the tag sequences remained in our SFF files.
3. We ran a k-mer analysis.
Not enough coverage.
1. Mathematical analysis.
2. Comparison of the coverage with a different experiment that assembled normally.
3. Analysis using simulated data.
Experiment with mixed organisms.
1. We suspected that the sequenced reads were contaminated or that the files contained reads from different organisms.
2. We assembled the reads together with reads from a different organism (cotton).
3. There was no significant change in the percentage of reads used.
1. Short contigs due to lack of paired-end reads.
2. The Newbler assembler used only a low percentage of the reads to build contigs.
Next-generation sequencing has changed the methods used to solve many problems, thanks to its lower cost.
We attempted to sequence the dragonfly genome using whole-genome shotgun (WGS) sequencing to provide useful information for a phylogeny with Crustacea.
1. The data came from the sequencing center at BYU.
2. A 454 sequencer was used to produce short reads.
3. The Drosophila genome obtained from the Berkeley Drosophila Genome Project was used as a reference.
1. Sequencing after genomic reduction.
2. Sequencing ESTs only.
1. Do we have enough reads for the assembler to work with?
2. Is our data contaminated? Do we have sequences from a different organism?
3. Do the reads in the SFF files still contain tag sequences?
Not enough coverage (Analysis by comparison)