
881 Presentation

Bioinformatics Algorithms


Bioinformatics Algorithms: Predicting Protein-Protein Interaction using One-class Classification
Prudvi Raj & Vijaya Ragavan

Agenda
Protein-Protein Interaction
Data Source
Feature Encoding
Support Vector Machines
Questions
References

Introduction
Proteins are large molecules built from small units known as amino acids.
Proteins are an important part of foods such as milk, eggs and meat.
Exercise breaks down muscle tissue, and new protein is then used to rebuild the muscles larger and stronger.
Protein-Protein Interaction
Protein-protein interaction refers to the association of protein molecules, including long-range interactions through the electrolyte.
These interactions form the basis for several studies.
Example: signals from the exterior of a cell are mediated to the inside of that cell by protein-protein interactions.
Protein-Protein Interaction
Several methods are used to study these interactions:
Biochemical methods: Bimolecular Fluorescence Complementation, Affinity Electrophoresis, Tandem Affinity Purification, etc.
Biophysical and theoretical methods: Dual Polarization Interferometry, Static Light Scattering, Dynamic Light Scattering, etc.
Data Source
The protein domain data was collected from the Protein Families database (Pfam).
Proteins contain domains; each record lists a protein and one of its domains.
Example: YMR056C 2719
Protein-protein interaction data are also available, as pairs of interacting proteins.
Example: YMR056C YBR217W
A manual count of the domain interactions was obtained.
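The slides do not show how the data files were read or how the domain interactions were counted. A minimal Python sketch, assuming whitespace-separated records like the examples above and assuming that a domain interaction is counted once for every interacting protein pair whose two proteins carry the two domains (all helper names are hypothetical):

```python
from collections import Counter, defaultdict
from itertools import product


def load_protein_domains(lines):
    """Map each protein ID to the set of domain IDs it contains.
    Assumes one 'PROTEIN DOMAIN' record per line, e.g. 'YMR056C 2719'."""
    domains = defaultdict(set)
    for line in lines:
        parts = line.split()
        if len(parts) == 2:
            protein, domain = parts
            domains[protein].add(domain)
    return domains


def load_interactions(lines):
    """Read 'PROTEIN_A PROTEIN_B' records, e.g. 'YMR056C YBR217W'."""
    pairs = []
    for line in lines:
        parts = line.split()
        if len(parts) == 2:
            pairs.append((parts[0], parts[1]))
    return pairs


def count_domain_interactions(domains, pairs):
    """Count each unordered domain pair once per interacting protein pair
    in which one protein has the first domain and the other has the second.
    (Assumed definition; the slides only say the count was done manually.)"""
    counts = Counter()
    for a, b in pairs:
        # .get() avoids creating empty entries in the defaultdict.
        for da, db in product(domains.get(a, ()), domains.get(b, ())):
            counts[tuple(sorted((da, db)))] += 1
    return counts
```

For example, with the records `YMR056C 2719` and a hypothetical `YBR217W 100`, the interaction `YMR056C YBR217W` adds one count to the domain pair `("100", "2719")`.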
[Screenshot: Data Architecture]
[Screenshot: Code Execution]
[Screenshot: Output]

Feature Encoding
Each protein pair is represented by a protein domain feature vector.
Each domain feature in the vector takes the value 0, 1 or 2:
The value is 0 if neither protein in the pair contains the domain.
The value is 1 if exactly one protein of the pair contains the domain.
The value is 2 if both proteins of the pair contain the domain.
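The 0/1/2 rule amounts to adding 1 for each protein in the pair that contains a given domain. A minimal sketch; `encode_pair` and the fixed `domain_order` list are illustrative names, not taken from the project:

```python
def encode_pair(domains_a, domains_b, domain_order):
    """Return the domain feature vector for one protein pair.

    domains_a, domains_b: sets of domain IDs for the two proteins.
    domain_order: list fixing which vector position each domain occupies.
    Each entry is 0 (neither protein has the domain), 1 (exactly one has it)
    or 2 (both have it).
    """
    return [int(d in domains_a) + int(d in domains_b) for d in domain_order]


# Hypothetical domains A-D: A only in the first protein, B in both,
# C only in the second, D in neither.
print(encode_pair({"A", "B"}, {"B", "C"}, ["A", "B", "C", "D"]))  # [1, 2, 1, 0]
```

Fixing `domain_order` once (for example, every domain ID seen in the data, sorted) keeps all pair vectors aligned to the same positions.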
Support Vector Machines
Support vector machines (SVMs) are a set of related supervised learning methods used for classification and regression.
Tiger-elephant example.
What happens when a new animal is found?
[Figure labels: 500 samples, 300 samples]

LIBSVM Tool
LIBSVM is integrated software for support vector classification, regression and distribution estimation (one-class SVM).
For our project, we used an SVM classifier built with the LIBSVM tool.
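LIBSVM reads training data in a sparse text format, `<label> <index1>:<value1> <index2>:<value2> ...`, with indices in ascending order starting at 1, so zero-valued features can be left out. A sketch of writing one encoded protein pair in that format, assuming the label +1 for a known interacting pair (the function name is hypothetical):

```python
def to_libsvm_line(label, features):
    """Format one feature vector as a LIBSVM input line.

    Indices are 1-based and ascending; zero-valued features are omitted,
    which LIBSVM's sparse format allows.
    """
    items = [f"{i}:{v}" for i, v in enumerate(features, start=1) if v != 0]
    return " ".join([str(label)] + items)


print(to_libsvm_line(1, [1, 2, 1, 0]))  # 1 1:1 2:2 3:1
```

Omitting zeros matters here because a pair typically contains only a few of the many Pfam domains, so most feature values are 0.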
Difficult tasks involved:
a. Parameter selection
b. Cross-validation to improve the results
LIBSVM Tool: Significance of Parameters
-s svm-type
-t kernel-type (linear, polynomial, radial and sigmoid)
-g gamma value
-c cost value
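These parameters are command-line options of LIBSVM's `svm-train` tool. One-class SVM is `-s 2`, and the kernels are numbered 0 (linear), 1 (polynomial), 2 (radial basis function) and 3 (sigmoid). A sketch that assembles such a command, with a hypothetical file name and hypothetical parameter values; note that one-class SVM uses `-n` (nu) rather than `-c` (cost):

```python
# LIBSVM option codes for -s and -t.
SVM_TYPES = {"c-svc": 0, "nu-svc": 1, "one-class": 2, "epsilon-svr": 3, "nu-svr": 4}
KERNEL_TYPES = {"linear": 0, "polynomial": 1, "radial": 2, "sigmoid": 3}


def svm_train_args(train_file, svm_type="one-class", kernel="radial",
                   gamma=None, cost=None, nu=None, folds=None):
    """Build an argument list for LIBSVM's svm-train command-line tool."""
    args = ["svm-train",
            "-s", str(SVM_TYPES[svm_type]),
            "-t", str(KERNEL_TYPES[kernel])]
    if gamma is not None:   # -g: gamma in the polynomial, RBF and sigmoid kernels
        args += ["-g", str(gamma)]
    if cost is not None:    # -c: used by C-SVC, epsilon-SVR and nu-SVR
        args += ["-c", str(cost)]
    if nu is not None:      # -n: used by nu-SVC, one-class SVM and nu-SVR
        args += ["-n", str(nu)]
    if folds is not None:   # -v n: run n-fold cross-validation instead of saving a model
        args += ["-v", str(folds)]
    args.append(train_file)
    return args


print(svm_train_args("pairs.train", gamma=0.5, nu=0.1, folds=3))
```

If LIBSVM is installed and `svm-train` is on the PATH, the list can be passed to `subprocess.run`; with `-v 3`, `svm-train` reports the three-fold cross-validation result instead of writing a model file.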
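Cross Validation
We performed two-fold, three-fold, five-fold and ten-fold cross-validation.
How is the validation performed?
Example: in three-fold cross-validation, the data is split into three equal parts. In each of three iterations, two parts (2/3 of the data) are used for training and the remaining part (1/3) is used for testing, so every sample is tested exactly once.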
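To make the rotation concrete, here is a minimal sketch that generates k-fold splits by hand. It assigns samples to folds round-robin without shuffling, an assumption made for readability; LIBSVM's `-v` option performs its own fold assignment internally:

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, test_indices) for k-fold cross-validation.

    Samples are assigned to folds round-robin; each fold serves as the
    test set exactly once while the other k-1 folds form the training set.
    """
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f in range(k) if f != i for j in folds[f]]
        yield train, test


for train, test in k_fold_splits(9, 3):
    print(len(train), len(test))  # 6 3 on each of the three iterations
```

In practice the samples would usually be shuffled before being assigned to folds, so that each fold contains a representative mix.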
The accuracy and F-score results are shown in tables.

Results
[Table: Two-Fold Cross Validation]
[Table: Three-Fold Cross Validation]
[Table: Five-Fold Cross Validation]
[Table: Ten-Fold Cross Validation]

Detailed Iteration for Three-Fold Cross Validation
The overall accuracy is the average of the accuracies of these 3 iterations.
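The per-fold scores and their average can be computed as follows. This sketch assumes labels +1 and -1, as produced by one-class SVM prediction in LIBSVM, and treats +1 (interacting) as the positive class for the F-measure:

```python
from statistics import mean


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f_measure(y_true, y_pred, positive=1):
    """F-measure (F1): harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def cross_validated_score(fold_results, score):
    """Average a score over folds; fold_results holds (y_true, y_pred) per fold."""
    return mean(score(t, p) for t, p in fold_results)
```

For a three-fold run, `fold_results` would hold the three (true labels, predicted labels) pairs, and `cross_validated_score(fold_results, accuracy)` gives the averaged accuracy described above.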
Change in F-Measure with Different Kernel Functions
Since the radial kernel gave the best results, we used it for our SVM classifier.
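For reference, LIBSVM defines the radial basis function (RBF) kernel as K(u, v) = exp(-gamma * |u - v|^2); this is where the `-g` gamma parameter enters. A small sketch computing it for two feature vectors:

```python
import math


def rbf_kernel(u, v, gamma):
    """Radial basis function kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)


print(rbf_kernel([1, 2, 1, 0], [1, 2, 1, 0], 0.5))  # 1.0 for identical vectors
```

The kernel value falls from 1 toward 0 as two pair vectors differ more, and a larger gamma makes it fall faster, which is why gamma is one of the parameters that needs tuning.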
Effect of Negative Data on Results
Challenges faced: generating negative data is difficult because the number of non-interacting protein pairs is very large, and it is not possible to list all of them.
So we generated a limited set of negative data and used it in our project. As a result, we observed an increase in accuracy and F-measure.

Conclusion and Future Work
We have successfully presented results for predicting protein-protein interactions using one-class classification.
We recommend LIBSVM to anyone interested in machine learning; it is a wonderful tool.
Our assumption about non-interacting proteins is not accurate. We will employ a better strategy for identifying the negative samples (replacing misclassified protein pairs).
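The slides do not describe how the limited negative set was chosen. One simple possibility, shown here only as an assumption, is to sample random protein pairs that do not appear among the known interactions; such pairs are merely presumed non-interacting, which is the weakness of the assumption about non-interacting proteins. The function name and seed are hypothetical:

```python
import random


def sample_negative_pairs(proteins, positive_pairs, n, seed=0):
    """Draw n distinct random protein pairs that are not known interactions.

    Pairs are unordered. A sampled pair is only *assumed* not to interact.
    """
    proteins = sorted(set(proteins))
    protein_set = set(proteins)
    known = {frozenset(p) for p in positive_pairs}
    # Known interactions between two distinct proteins from the list.
    known_here = {p for p in known if len(p) == 2 and p <= protein_set}
    available = len(proteins) * (len(proteins) - 1) // 2 - len(known_here)
    if n > available:
        raise ValueError(f"only {available} candidate negative pairs exist")
    rng = random.Random(seed)
    negatives = set()
    while len(negatives) < n:
        pair = frozenset(rng.sample(proteins, 2))
        if pair not in known:
            negatives.add(pair)
    return sorted(tuple(sorted(p)) for p in negatives)
```

Fixing the seed makes the sampled negative set reproducible across experiments, so changes in accuracy and F-measure can be attributed to the model rather than to a different negative sample.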
Questions?

References
Bo Li, Feng Luo. Predicting yeast synthetic lethal genetic interactions using protein domains. Clemson University.
H. Yu. Single-Class Classification with Mapping Convergence.
Thank you, Dr. Luo.