Thank you very much!!

Meta Data View – Before Data Cleaning

961 rows

Attribute Information

6 Attributes in total (1 goal field, 1 non-predictive, 4 predictive attributes) 
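As an illustration, the six attributes can be written down as a small record type. This is only a sketch: the class name `MassRecord` and the field names are hypothetical, the role of BI-RADS as the non-predictive attribute follows the UCI data set description, and the sample values are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MassRecord:
    """One row of the data set (hypothetical field names)."""
    bi_rads: int   # BI-RADS assessment (non-predictive per the UCI description)
    age: int       # patient's age in years
    shape: int     # mass shape code
    margin: int    # mass margin code
    density: int   # mass density code
    severity: int  # goal field: 0 = benign, 1 = malignant


PREDICTIVE = ("age", "shape", "margin", "density")
NON_PREDICTIVE = ("bi_rads",)
GOAL = "severity"

# Made-up example row, only to show how the features are extracted.
record = MassRecord(bi_rads=5, age=67, shape=3, margin=5, density=3, severity=1)
features = [getattr(record, name) for name in PREDICTIVE]
```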

Data Set Background

Summary

Source: UCI Machine Learning Repository, Mammographic Mass Data Set:

http://archive.ics.uci.edu/ml/datasets/Mammographic+Mass

Mammography is the most effective method for breast cancer screening available today. Still, about 70% of the biopsies it leads to are unnecessary, with benign outcomes.

Several computer-aided diagnosis (CAD) systems have been developed to help decide between:

  • breast biopsy or
  • short term follow-up examination.

Predict the severity (benign or malignant) of a mammographic mass lesion from:

  • BI-RADS attributes (BI-RADS: Breast Imaging Reporting and Data System; MRI, Ultrasound)
  • patient's age

How well does a CAD system perform compared to radiologists?

  • Data Set Background
  • Attribute Information
  • Meta Data View
  • Before Data Cleaning
  • After Data Cleaning
  • Data Cleaning Process
  • Data Distribution
  • Histograms
  • Scatter Matrix
  • Decision Tree
  • Graph View
  • Text View
  • Modelling Process
  • Neural Network Classification
  • SVM Classification
  • Interpreting the Results
  • Conclusions
  • References


Meta Data View – After Data Cleaning

830 rows (13.6% loss)

Data Cleaning Process

After this, nominal-to-numeric conversion (to give input data to the classifiers):

we save the output to a CSV file and then re-import the data as «numerical» (we had problems otherwise).
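The cleaning and conversion step can be sketched in plain Python. This is a minimal sketch under stated assumptions: the slides do not say how the rows were removed, so dropping every row that contains a missing value is an assumption, as is the `?` marker for missing values (the convention of the UCI file). The sample rows and the name `clean_rows` are hypothetical.

```python
import csv
import io

# Hypothetical excerpt; column order assumed to be
# BI-RADS, Age, Shape, Margin, Density, Severity.
RAW = """5,67,3,5,3,1
4,43,1,1,?,1
5,58,4,5,3,1
4,28,1,1,3,0
5,74,1,5,?,1
"""


def clean_rows(text):
    """Drop rows with a missing value ('?') and convert the rest to integers."""
    rows = []
    for record in csv.reader(io.StringIO(text)):
        if not record or "?" in record:
            continue  # assumption: incomplete rows are discarded
        rows.append([int(value) for value in record])
    return rows


cleaned = clean_rows(RAW)

# Row loss reported on the slides: 961 rows before, 830 rows after cleaning.
reported_loss = (961 - 830) / 961
```

Converting to integers at load time plays the same role as re-importing the CSV as numerical data: the classifiers only ever see numbers.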

Data Distribution – Scatter Matrix

Data Distribution – Histograms

Decision Tree – Graph View

Decision Tree – Text View

BI-RADS = 0
| Age > 63.500: 1 {1=3, 0=0}
| Age ≤ 63.500: 0 {1=0, 0=2}
BI-RADS = 2: 0 {1=0, 0=7}
BI-RADS = 3
| Age > 59: 1 {1=2, 0=1}
| Age ≤ 59
| | Age > 40.500
| | | Age > 42.500: 0 {1=1, 0=11}
| | | Age ≤ 42.500: 1 {1=1, 0=1}
| | Age ≤ 40.500: 0 {1=0, 0=7}
BI-RADS = 5: 1 {1=286, 0=31}
BI-RADS = 6
| Margin = 1: 0 {1=0, 0=2}
| Margin = 3: 1 {1=3, 0=0}
| Margin = 4: 1 {1=2, 0=0}
| Margin = 5: 1 {1=2, 0=0}
BI-RADS = 4
| Margin = 1: 0 {1=26, 0=259}
| Margin = 2
| | Age > 52.500: 1 {1=4, 0=2}
| | Age ≤ 52.500: 0 {1=0, 0=5}
| Margin = 3
| | Age > 40
| | | Age > 43.500
| | | | Shape = 1: 0 {1=0, 0=2}
| | | | Shape = 2: 0 {1=0, 0=6}
| | | | Shape = 3
| | | | | Age > 55: 1 {1=5, 0=4}
| | | | | Age ≤ 55: 0 {1=0, 0=3}
| | | | Shape = 4: 0 {1=4, 0=9}
| | | Age ≤ 43.500: 1 {1=2, 0=0}
| | Age ≤ 40: 0 {1=0, 0=10}
| Margin = 4: 0 {1=47, 0=53}
| Margin = 5
| | Age > 67.500: 1 {1=5, 0=0}
| | Age ≤ 67.500: 0 {1=10, 0=12}
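The text view above can be read as a set of nested rules. The following is a minimal sketch of those rules in Python, assuming each leaf predicts the class printed before its braces (1 = malignant, 0 = benign, as in the UCI encoding) and that every `Age > x` branch is paired with its complementary `Age ≤ x` branch. The function name `predict_severity` is hypothetical, and attribute combinations the tree does not list return `None`.

```python
def predict_severity(bi_rads, age, shape, margin):
    """Follow the printed tree; return 1, 0, or None if no branch applies."""
    if bi_rads == 0:
        return 1 if age > 63.5 else 0
    if bi_rads == 2:
        return 0
    if bi_rads == 3:
        if age > 59:
            return 1
        if age > 40.5:
            return 0 if age > 42.5 else 1
        return 0
    if bi_rads == 5:
        return 1
    if bi_rads == 6:
        # Margin = 2 does not appear under BI-RADS = 6 in the tree.
        return {1: 0, 3: 1, 4: 1, 5: 1}.get(margin)
    if bi_rads == 4:
        if margin == 1:
            return 0
        if margin == 2:
            return 1 if age > 52.5 else 0
        if margin == 3:
            if age <= 40:
                return 0
            if age <= 43.5:
                return 1
            if shape == 3:
                return 1 if age > 55 else 0
            return 0 if shape in (1, 2, 4) else None
        if margin == 4:
            return 0
        if margin == 5:
            return 1 if age > 67.5 else 0
    return None  # e.g. BI-RADS = 1 has no branch in the printed tree
```

For example, `predict_severity(4, 70, 1, 5)` follows BI-RADS = 4, Margin = 5, Age > 67.500 and returns 1.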

Neural Network

Modelling Process

Settings with best results:

  • Training cycles: 500
  • learning rate: 0.009
  • momentum: 0.5
  • error epsilon: 1.0E
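To give a feel for what the learning rate and momentum settings control, here is a sketch of the classical momentum update applied to a toy one-dimensional problem. Whether the tool used for the project applies exactly this update rule is an assumption; the function name `momentum_step` and the quadratic objective are purely illustrative.

```python
def momentum_step(weight, velocity, gradient, learning_rate=0.009, momentum=0.5):
    """One classical momentum update: blend the previous step with the new gradient."""
    velocity = momentum * velocity - learning_rate * gradient
    return weight + velocity, velocity


# Toy objective f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w, v = 0.0, 0.0
for _ in range(500):  # 500 training cycles, as in the settings above
    grad = 2.0 * (w - 3.0)
    w, v = momentum_step(w, v, grad)
```

With these settings the weight settles very close to the minimum at 3 within the 500 cycles; a larger momentum carries more of the previous step into the next one.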

The X-Validation operator is used to test the classifier models.
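Cross-validation splits the data into k folds and tests on each fold in turn while training on the rest. The sketch below shows the index bookkeeping only. The slides do not state the number of folds, so k = 10 is an assumption, and the name `k_fold_indices` is hypothetical. A real run would typically also shuffle or stratify the rows first.

```python
def k_fold_indices(n_rows, k):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_rows))
        yield train, test
        start += size


# 830 rows remain after cleaning; k = 10 is an assumed value.
folds = list(k_fold_indices(830, 10))
```

Every row appears in exactly one test fold, so each model is always evaluated on rows it was not trained on.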

Neural Network and Support Vector Machine

Support Vector Machine

DATA EXPLORATION & DATA MINING (2011/2012)

ASSIGNMENT No. 3: PROJECT ON CLASSIFIERS

Group 10

Andrea Pravato 63626

Saswata Banerjee 66092

Settings with best results:

  • kernel type: dot
  • kernel cache: 300
  • convergence epsilon: 4.0E-4
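A dot kernel is the plain inner product of two feature vectors, which makes the SVM a linear classifier. The sketch below shows how a trained model with this kernel would score a new row. The support vectors, coefficients and bias are made-up toy values, not results from the project, and the function names are hypothetical.

```python
def dot_kernel(x, y):
    """Dot (linear) kernel: the inner product of two feature vectors."""
    return sum(a * b for a, b in zip(x, y))


def svm_decision(x, support_vectors, coefficients, bias):
    """Classify x from the kernel expansion; each coefficient is alpha_i * y_i."""
    score = bias + sum(
        c * dot_kernel(sv, x) for sv, c in zip(support_vectors, coefficients)
    )
    return 1 if score >= 0 else 0


# Toy model with two support vectors (illustrative values only).
support_vectors = [[1.0, 2.0], [2.0, 0.0]]
coefficients = [0.5, -1.0]
bias = 0.1

label_a = svm_decision([1.0, 1.0], support_vectors, coefficients, bias)
label_b = svm_decision([0.0, 2.0], support_vectors, coefficients, bias)
```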

Unlike ANNs:

  • The computational complexity of SVMs does not depend on the dimensionality of the input space.
  • The solution to an SVM is global and unique.

(http://www.svms.org/anns.html)

Tabulated Performance

Interpretation of Results

Conclusions

  • The Decision Tree was helpful in ordering the attributes according to their importance:
  • BI-RADS > Margin > Age > Shape > Density
  • The BI-RADS assessment was expected to be the most important attribute, as it is a scientific assessment already in use for predicting malignant tumors.
  • The Age attribute is more significant than most of the other predictive attributes. Hence, both classifiers assign it more weight.
  • The Density attribute proved to be the least significant, as we had predicted in the data study phase. Removing this attribute from the modelling data set was found to improve performance.
  • Neither classifier can handle nominal data, so nominal attributes must be converted to numeric before they are fed into the classifiers.
  • k-NN and Naive Bayes classifiers can handle nominal data, but they are primitive compared to the Neural Network and SVM. Their accuracy, precision and recall are not as good as those of the NN and SVM.
  • Accuracy was similar for both classifiers, at about 85%, though a higher value would be desirable.
  • Both classifiers are quick to run, the SVM being the faster of the two (less computation).
  • Each has its own merits, though these are not visible on the current data set.
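The accuracy, precision and recall compared above all derive from the confusion matrix of a classifier. A minimal sketch follows, taking malignant (1) as the positive class. The counts are made-up toy numbers, not the project's results, and the function name `classification_metrics` is hypothetical.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision and recall from confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,  # share of all rows classified correctly
        "precision": tp / (tp + fp),    # share of predicted malignant that are malignant
        "recall": tp / (tp + fn),       # share of malignant cases that were found
    }


# Toy counts for illustration only.
metrics = classification_metrics(tp=8, fp=2, fn=1, tn=9)
```

In a screening setting, recall matters most for missed malignant cases, while precision reflects how many biopsies would have been unnecessary.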

DEPARTAMENTO DE ELECTRÓNICA, TELECOMUNICAÇÕES E INFORMÁTICA

UNIVERSIDADE DE AVEIRO 2011/2012
