Classification in EDM & LA

Presentation about Classification methods and application in Educational Data Mining and Learning Analytics

Hesham Omran

on 26 January 2015

Transcript of Classification in EDM & LA

All About Classification
Popular tools & Platforms in the industry
Data Mining tool
Application of Classification
Discussion & Future Insights
in Educational Data Mining & Learning Analytics
Classification Methods
Comparison of Classification Methods
Briefly define EDM & LA.

Define classification, the different methods & how they differ.

Popular Classification tools in the industry.

Application of Classification in EDM & LA
Steps :
Choosing a Classifier
Data Preprocessing
Separation of training & testing Data
Classifier Accuracy Evaluation
Probabilistic or Discriminative Classifier?
Linear or Non-Linear Class Boundaries?
Data Cleaning
Feature Extraction
Filling missing values
Try correcting errors in values
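As one illustration of filling missing values, the sketch below replaces missing entries with the mean of the observed ones. Mean imputation is only one possible strategy, chosen here for brevity; the function name and the quiz scores are hypothetical and not taken from the presentation.

```python
from statistics import mean


def fill_missing_with_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to impute from")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


# Hypothetical quiz scores; None marks a missing value.
quiz_scores = [70, None, 90, 80, None]
cleaned_scores = fill_missing_with_mean(quiz_scores)
print(cleaned_scores)
```

Whether the mean, the median, or a model-based estimate is appropriate depends on the attribute and on why the values are missing.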
Feature Selection
The goodness of feature extraction and selection cannot be evaluated before the classifier is learned and tested. If the number of attributes is large, all possibilities cannot be tested.
New attributes are produced by combining and transforming the original data
* Analyzing the dependencies between the class attribute & explanatory attributes
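One way to analyze the dependency between the class attribute and an explanatory attribute is information gain, the reduction in class entropy after splitting on that attribute. The sketch below assumes categorical attributes; the attribute names and data are hypothetical, and other dependency measures (for example correlation or chi-square) could be used instead.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Class entropy minus the weighted entropy after grouping by the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical data: did the student pass the course?
passed = ["yes", "yes", "no", "no"]
did_homework = ["y", "y", "n", "n"]             # separates the classes perfectly
seat_row = ["front", "back", "front", "back"]   # carries no class information

gain_homework = information_gain(did_homework, passed)
gain_seat = information_gain(seat_row, passed)
print(gain_homework, gain_seat)
```

Attributes with a gain near zero are candidates for removal before the classifier is trained.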
Decision Tree
Bayesian classifier
Neural Network
K-nearest Neighbor
Linear Regression
Support Vector Machine
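To make one of the listed methods concrete, here is a minimal sketch of a k-nearest neighbor classifier using Euclidean distance and majority voting. The feature meaning (hours studied, exercises completed), the data, and the choice of k=3 are assumptions for illustration only.

```python
from collections import Counter
from math import dist


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the majority label among the k training points closest to query."""
    neighbours = sorted(
        zip(train_points, train_labels),
        key=lambda pair: dist(pair[0], query),
    )[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical students described by (hours studied, exercises completed).
train_points = [(1, 2), (2, 1), (2, 3), (8, 9), (9, 8), (9, 10)]
train_labels = ["fail", "fail", "fail", "pass", "pass", "pass"]

print(knn_predict(train_points, train_labels, (8, 8)))
print(knn_predict(train_points, train_labels, (1, 1)))
```

k-NN builds no explicit model and produces non-linear class boundaries, which is one of the dimensions along which the methods above differ.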
* Use a learning algorithm and select the most important attributes in the resulting classifier.
To Measure it ...
Receiver Operating Characteristics (ROC)
value and
can be used for assessing the goodness of a predictor
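An ROC curve is often summarized by the area under it (AUC), which equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below computes it by comparing all positive/negative pairs, counting ties as half; the scores and labels are made up for illustration.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical predicted probabilities of "student passes" and true outcomes.
scores = [0.9, 0.8, 0.4, 0.3]
labels = [1, 0, 1, 0]
auc = roc_auc(scores, labels)
print(auc)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking. The pairwise loop is quadratic, so it is only suitable for small data sets.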
m-fold cross-validation
If the data set is already small, it is not advisable to reduce the training set any further.
Data used for training is never used for testing, and vice versa
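The m-fold idea can be sketched as follows: the data is divided into m folds, and each fold serves once as the test set while the remaining folds form the training set, so no item is ever in both at once. The function name is hypothetical, and the sketch splits by index without shuffling; in practice the data would normally be shuffled first.

```python
def m_fold_splits(n_items, m):
    """Yield (train_indices, test_indices) pairs for m-fold cross-validation."""
    if not 2 <= m <= n_items:
        raise ValueError("m must be between 2 and the number of items")
    indices = list(range(n_items))
    # Fold i takes every m-th index starting at i, so fold sizes differ by at most 1.
    folds = [indices[i::m] for i in range(m)]
    for i, test_fold in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test_fold


splits = list(m_fold_splits(10, 5))
for train, test in splits:
    print("train:", train, "test:", test)
```

Because every item is used for testing exactly once, the whole data set contributes to the accuracy estimate without shrinking the training set as much as a single hold-out split would.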
Classification Accuracy
Overfitting
Data Size Vs Model
Underfitting
Data Size Vs Model
E-Learning and Assessment tools
Supervised Vs. Unsupervised
Emergence of Bayesian Networks
Classification and Clustering Converging
Tool Limitations & a New Workbench
Removing outliers
And More ...
Cognitive Tutor , Moodle, ITS ... etc
Needed Definitions of EDM & LA
Classification ("supervised"): you have a set of predefined classes and want to know which class a new object belongs to.

Clustering ("unsupervised"): tries to group a set of objects and find whether there is some relationship between them.
Publication Categorization
Customization and Adaptation of Behavior
Method Improvement and optimization
Providing Monitoring and Feedback for instructors
Predicting student's performance
Student Modeling
Bayesian networks are used for reasoning under uncertainty (Pearl, 1988).
Bayesian networks are also known as causal networks or belief networks.
Probability theory and graph theory form their basis: random variables are nodes and conditional dependencies are edges in a directed acyclic graph.
Edges typically point from cause to effect
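A minimal sketch of such a network has two binary nodes with a single edge from cause to effect: "student knows the skill" pointing to "student answers correctly". Observing the effect updates the belief about the cause via Bayes' rule. All probabilities below are hypothetical values chosen for illustration.

```python
# Hypothetical parameters for the network: Skill -> Correct
p_skill = 0.6                                # P(skill is known)
p_correct_given = {True: 0.9, False: 0.2}    # P(correct | skill), P(correct | no skill)


def posterior_skill(observed_correct):
    """P(skill | observation), computed by enumerating both states of the cause."""
    def joint(skill):
        prior = p_skill if skill else 1.0 - p_skill
        p_correct = p_correct_given[skill]
        likelihood = p_correct if observed_correct else 1.0 - p_correct
        return prior * likelihood

    return joint(True) / (joint(True) + joint(False))


print(posterior_skill(True))    # belief in the skill after a correct answer
print(posterior_skill(False))   # belief in the skill after a wrong answer
```

Larger networks apply the same principle: the joint distribution factorizes along the edges of the graph, and evidence on any node updates the beliefs about the others.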
What are Bayesian Networks ?
Why Bayesian Networks ?
Adoption By Wider educational research and practice communities
developing the labels that support supervised learning
distilling relevant and appropriate data features
setting up appropriate cross-validation
configuration and building algorithms for classification
Bottleneck
Will it be solved in 2012 ?!
it's a Start ...
Version 1.0 of the EDM Workbench
Log Import
Feature Distillation
Clip Generation
Data Sampling
Expected Future Work
In the coming months
Allows learning scientists to
label previously collected educational log data with behavior categories of interest.
automatically distill additional information from log files for use in machine learning.
collaborate with others in labeling data.
(1) The automatically distilled features are hard-coded; future releases will make it easier to alter the feature list.

(2) The process of amending XML to create new features will be made more user-friendly.

(3) The coders cannot change the way in which the text replays are displayed.

(4) Users can currently only sample data and assemble it into clips in a limited number of fashions; we intend to implement more sophisticated sampling and clip-creation strategies
Publication Review
Resource Bin
EDM proceedings 2010
EDM Proceedings 2011
EDM Proceedings 2012
Analysis of Publications in Terms of Charts
Bayesian Networks Vs Decision Trees
Although both can be similar in accuracy,
BNs still have an advantage with small
samples ... Handling incomplete data?!
Hesham Omran
Fatemeh Salehian
Supervised by
Dr. Mohamed Amine Chatti
Presented By
Discussion & insights About the Future
Thank You
for your attention