Thank you for your attention!
Acknowledgments
- My parents for driving me to MIT and always being supportive
- Dr. Gil Alterovitz for his guidance in my work
- Dr. Jeremy Warner for being a great mentor and helping me when I had difficulties
- Kshitij Marwah for helping me get started on my project
- Dr. Tanya Khovanova for being a great mentor and giving me helpful suggestions
Conclusion
Using the T, N, and M staging definitions, the cancer stage can be computed fairly accurately.
The other five methods perform no better than chance (kappa near 0%), indicating that the cancer data is not random.
Further Work
- Make the Cancer Data Trees more accurate
- Use the RNA sequencing data to make the Data Tree work better
- Incorporate other useful information into the Data Tree (days until death, etc.)
- Make Data Tree functions for more cancer types
For example, looking at survival rates over time by cancer stage could give extra information for determining a patient's cancer stage.
Introduction
- Cancer patients are assigned a cancer stage based on their symptoms
- The clinical cancer stage is determined by the patient's T, N, and M stages
- T: Tumor
- N: Lymph nodes
- M: Metastasis
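How T, N, and M combine into a single stage can be sketched as below. The poster does not list its exact rules, so `breast_stage` is a hypothetical helper based on a simplified reading of the AJCC 7th edition breast cancer stage groups, collapsed to the main stages I-IV used in the results. Stage 0 (Tis), substages (IA, IIB, ...) and micrometastasis categories such as N1mi are ignored, and inputs are assumed to already be main categories like "T2" or "N1".

```python
from typing import Optional


def breast_stage(t: str, n: str, m: str) -> Optional[str]:
    """Hypothetical sketch: map main T, N, M categories to stage I-IV.

    Simplified from the AJCC 7th edition breast stage groups; returns
    None for combinations not handled here (e.g. Tis, T0 N0, or unknown X).
    """
    t, n, m = t.upper(), n.upper(), m.upper()
    if m == "M1":
        return "IV"   # any T, any N, distant metastasis
    if m != "M0":
        return None   # metastasis status unknown (e.g. MX)
    if n == "N3":
        return "III"  # any T with N3 (IIIC)
    if t == "T4" and n in ("N0", "N1", "N2"):
        return "III"  # IIIB
    if n == "N2" and t in ("T0", "T1", "T2", "T3"):
        return "III"  # IIIA
    if t == "T3" and n == "N1":
        return "III"  # IIIA
    if (t == "T3" and n == "N0") or (t == "T2" and n in ("N0", "N1")):
        return "II"   # IIA / IIB
    if t in ("T0", "T1") and n == "N1":
        return "II"   # IIA
    if t == "T1" and n == "N0":
        return "I"    # IA
    return None       # not covered by this simplified table
```

Returning `None` instead of guessing keeps uncomputable patients visible, so they can be handed to one of the replacement methods instead.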
Motivation
Big Picture
- Anatomic staging
- Surgery is required to find stages
- Expensive, harmful
- A new taxonomy model should be developed
My Research Project
- Some patients' cancer stages are missing
- Develop a function to accurately retrieve missing cancer stage information
- Focused on Breast Cancer first
Source:
Toward Precision Medicine: Building a Knowledge Network for Biomedical Research and a New Taxonomy of Disease
http://books.google.com/books?id=eK-Zca8U1hwC&num=11
Integrated Gene Expression Probabilistic Models for Cancer Staging
Andrew Xia
Jeremy Warner, Gil Alterovitz, Mentors
TCGA Database
TCGA (The Cancer Genome Atlas) is a cancer database founded in 2006 to fight the "War on Cancer"
Database covering many cancer types and thousands of patients
Methods
- Computed stages from the T, N, and M stages, compared against TCGA-given stages
- Naive replacement of unknown stage by most common stage (by data set)
- Naive replacement of unknown stage by most common stage (by external source)
- Equally weighted stage probabilities assigned to unknown stage
- Replacement by weighted probabilities (determined from data set)
- Replacement by weighted probabilities (determined from external source)
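The replacement methods can be sketched as follows. All function names are hypothetical, stages are assumed to be coded "I" to "IV" with `None` for unknown, and `DATA_SET` and `NCDB` hold the stage distributions reported in the results. `expected_accuracy` gives the accuracy a random guess should reach by chance, assuming the evaluated patients follow the data-set distribution: the sum over stages of (guess probability) × (true frequency).

```python
import random
from collections import Counter
from typing import Dict, List, Optional

STAGES = ["I", "II", "III", "IV"]

# Stage distributions reported on this poster (fractions of patients).
DATA_SET = {"I": 0.180, "II": 0.586, "III": 0.214, "IV": 0.020}
NCDB = {"I": 0.47, "II": 0.335, "III": 0.14, "IV": 0.055}
UNIFORM = {s: 0.25 for s in STAGES}


def most_common_stage(stages: List[Optional[str]]) -> str:
    """Most frequent known stage in the data set (used by the naive method)."""
    return Counter(s for s in stages if s is not None).most_common(1)[0][0]


def fill_constant(stages: List[Optional[str]], stage: str) -> List[str]:
    """Naive replacement: every unknown (None) becomes the same stage."""
    return [s if s is not None else stage for s in stages]


def fill_weighted(stages: List[Optional[str]], weights: Dict[str, float],
                  rng: random.Random) -> List[str]:
    """Random replacement: each unknown is drawn independently from `weights`."""
    w = [weights[s] for s in STAGES]
    return [s if s is not None else rng.choices(STAGES, weights=w)[0]
            for s in stages]


def expected_accuracy(guess: Dict[str, float], truth: Dict[str, float]) -> float:
    """Chance that an independent random guess matches: sum_i p_i * q_i."""
    return sum(guess[s] * truth[s] for s in STAGES)
```

For example, `expected_accuracy(DATA_SET, DATA_SET)` is about 0.422 and `expected_accuracy(NCDB, DATA_SET)` about 0.312, close to the measured 42.4% and 31.0%. This is consistent with those methods agreeing with the TCGA stages only by chance.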
Second Annual MIT PRIMES Conference
May 20th, 2012
Data/Results by Various Methods
1. Computed Stages
Accuracy: 91.7%
Kappa: 86.8% unweighted, 78.1% weighted
2. Naive, most common stage within data
Stage II most common
Accuracy: 58.6%
Kappa: 0% weighted and unweighted
3. Naive, most common stage by external source
Stage I most common
Source: According to NCDB 2001-2002 analysis (AJCC 7th edition page 438)
Accuracy: 18.0%
Kappa: -0.4% unweighted, -0.1% weighted
4. Evenly Weighted Stages
Stage I: 25%
Stage II: 25%
Stage III: 25%
Stage IV: 25%
Accuracy: 25.1%
Kappa: 0% weighted and unweighted
5. Intelligent, weighted probabilities by data set
Stage I: 18.0%
Stage II: 58.6%
Stage III: 21.4%
Stage IV: 2.0%
Accuracy: 42.4%
Kappa: 1.0% unweighted, 1.4% weighted
6. Intelligent, weighted probabilities by external source
Source: According to NCDB 2001-2002 analysis (AJCC 7th edition page 438)
Stage I: 47%
Stage II: 33.5%
Stage III: 14%
Stage IV: 5.5%
Accuracy: 31.0%
Kappa: 1.7% weighted, -2.2% unweighted
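Accuracy and kappa for each method can be computed as sketched below. Cohen's kappa is written as 1 minus (observed disagreement / chance disagreement), which equals (p_o - p_e) / (1 - p_e) in the unweighted case. The poster does not say which weighting its weighted kappa used; linear weights are an assumption here, and the function names are hypothetical.

```python
import math
from typing import List, Sequence

STAGES = ["I", "II", "III", "IV"]


def confusion_matrix(true: Sequence[str], pred: Sequence[str]) -> List[List[int]]:
    """m[i][j] = number of patients with true stage i and predicted stage j."""
    idx = {s: i for i, s in enumerate(STAGES)}
    m = [[0] * len(STAGES) for _ in STAGES]
    for t, p in zip(true, pred):
        m[idx[t]][idx[p]] += 1
    return m


def accuracy(true: Sequence[str], pred: Sequence[str]) -> float:
    return sum(t == p for t, p in zip(true, pred)) / len(true)


def cohen_kappa(true: Sequence[str], pred: Sequence[str], weighted: bool = False) -> float:
    """Cohen's kappa as 1 - observed disagreement / chance disagreement.

    Unweighted: every mismatch costs 1. Weighted (assumed linear here):
    a mismatch costs |i - j| / (k - 1), so I vs II costs less than I vs IV.
    """
    m = confusion_matrix(true, pred)
    k = len(STAGES)
    n = sum(map(sum, m))
    rows = [sum(r) for r in m]                                   # true-stage totals
    cols = [sum(m[i][j] for i in range(k)) for j in range(k)]    # predicted totals

    def cost(i: int, j: int) -> float:
        if weighted:
            return abs(i - j) / (k - 1)
        return 0.0 if i == j else 1.0

    observed = sum(cost(i, j) * m[i][j] for i in range(k) for j in range(k)) / n
    chance = sum(cost(i, j) * rows[i] * cols[j]
                 for i in range(k) for j in range(k)) / n ** 2
    if chance == 0:
        return math.nan  # undefined: no disagreement is possible by chance
    return 1.0 - observed / chance
```

When every prediction is the same stage, as in the naive methods, observed and chance disagreement are equal, so both kappas are 0 regardless of accuracy. This matches the 0% kappa reported for method 2 despite its 58.6% accuracy.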
RNA Sequencing
- Map sequencing reads onto a reference genome
- Count the number of short reads mapped to each gene
- Find which genes are expressed differently across cancer stages
- Use the correlations found to better predict a patient's cancer stage
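The RNA sequencing steps can be sketched on toy data as below. This is a simplified stand-in, not the project's pipeline: reads are assumed to be already aligned and represented only by their start positions on one reference sequence, counts are normalized to counts per million (CPM), and Pearson correlation with the numeric stage (1-4) is used as the "correlation found". All names are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def count_reads(genes: Dict[str, Tuple[int, int]], read_starts: List[int]) -> Dict[str, int]:
    """Count mapped reads whose start position lies in each gene's [start, end).

    A read whose start falls in two overlapping genes is counted for both.
    """
    counts = {g: 0 for g in genes}
    for pos in read_starts:
        for g, (start, end) in genes.items():
            if start <= pos < end:
                counts[g] += 1
    return counts


def cpm(counts: Dict[str, int]) -> Dict[str, float]:
    """Counts per million: corrects for different sequencing depth per sample."""
    total = sum(counts.values())
    return {g: c * 1e6 / total for g, c in counts.items()}


def pearson(x: List[float], y: List[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # no variation: treat as uncorrelated
    return sxy / math.sqrt(sxx * syy)


def stage_correlations(samples: List[Dict[str, int]], stages: List[int]) -> Dict[str, float]:
    """Correlation of each gene's CPM with the numeric stage across patients."""
    normalized = [cpm(s) for s in samples]
    return {g: pearson([n[g] for n in normalized], stages) for g in normalized[0]}
```

Genes with the largest absolute correlation would be the candidates to feed into a stage predictor.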