Classification Mammographic Mass Data Set

Andrea Pravato

Updated May 4, 2012

Transcript

Thank you very much!

Meta Data View – Before Data Cleaning

961 rows

Attribute Information

6 Attributes in total (1 goal field, 1 non-predictive, 4 predictive attributes)

Data Set Background

Summary

Source: UCI Machine Learning Repository, Mammographic Mass Data Set:

http://archive.ics.uci.edu/ml/datasets/Mammographic+Mass

Mammography is the most effective method for breast cancer screening available today. Still, about 70% of the biopsies that follow mammogram interpretation are unnecessary, with benign outcomes.

Several computer-aided diagnosis (CAD) systems have been proposed to help physicians decide between:

breast biopsy or
short term follow-up examination.

Predict the severity (benign or malignant) of a mammographic mass lesion from:

BI-RADS attributes (BI-RADS: Breast Imaging Reporting and Data System; MRI, Ultrasound)
patient's age

Decision Tree
Graph View
Text View
Modelling Process
Neural Network Classification
SVM Classification
Interpreting the Results
Conclusions
References

How well does a CAD system perform compared to radiologists?

Data Set Background
Attribute Information
Meta Data View
Before Data Cleaning
After Data Cleaning
Data Cleansing Process
Data Distribution
Histograms
Scatter Matrix

Meta Data View – After Data Cleaning

830 rows (131 of 961 rows removed, a 13.6% loss)

Data Cleaning Process

After this, nominal-to-numeric conversion (to give input data to the classifiers):

We save the output to a CSV file and then import the data as «numerical» (we had problems otherwise).
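A minimal sketch of this cleaning step in plain Python, assuming that cleaning meant dropping every row that contains a missing value (marked `?` in the UCI file) and reading the remaining fields as integers. That assumption is consistent with the 961 to 830 row counts but is not confirmed by the slides. The column order follows the UCI description, and the four sample rows are illustrative only, not taken from the project's files.

```python
import csv
import io

# Column order as given in the UCI description of the data set.
COLUMNS = ["BI-RADS", "Age", "Shape", "Margin", "Density", "Severity"]

# Illustrative rows only (hypothetical, not the project's data).
RAW = """5,67,3,5,3,1
4,43,1,1,?,1
4,28,1,1,3,0
5,?,4,5,3,1
"""


def clean(raw_text):
    """Drop rows with a missing value ('?') and convert the rest to integers."""
    rows = []
    for record in csv.reader(io.StringIO(raw_text)):
        if not record or "?" in record:
            continue  # skip blank lines and incomplete rows
        rows.append(dict(zip(COLUMNS, (int(value) for value in record))))
    return rows


cleaned = clean(RAW)
print(len(cleaned), "of 4 rows kept")
```

Reading every field as an integer plays the role of the «numerical» import described above; the original work did this inside its own tool rather than in Python.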

Data Distribution – Scatter Matrix

Data Distribution - Histograms

Decision Tree – graph view

Decision Tree – text view

BI-RADS = 0

| Age > 63.500: 1 {1=3, 0=0}

| Age ≤ 63.500: 0 {1=0, 0=2}

BI-RADS = 2: 0 {1=0, 0=7}

BI-RADS = 3

| Age > 59: 1 {1=2, 0=1}

| Age ≤ 59

| | Age > 40.500

| | | Age > 42.500: 0 {1=1, 0=11}

| | | Age ≤ 42.500: 1 {1=1, 0=1}

| | Age ≤ 40.500: 0 {1=0, 0=7}

BI-RADS = 5: 1 {1=286, 0=31}

BI-RADS = 6

| Margin = 1: 0 {1=0, 0=2}

| Margin = 3: 1 {1=3, 0=0}

| Margin = 4: 1 {1=2, 0=0}

| Margin = 5: 1 {1=2, 0=0}

BI-RADS = 4

| Margin = 1: 0 {1=26, 0=259}

| Margin = 2

| | Age > 52.500: 1 {1=4, 0=2}

| | Age ≤ 52.500: 0 {1=0, 0=5}

| Margin = 3

| | Age > 40

| | | Age > 43.500

| | | | Shape = 1: 0 {1=0, 0=2}

| | | | Shape = 2: 0 {1=0, 0=6}

| | | | Shape = 3

| | | | | Age > 55: 1 {1=5, 0=4}

| | | | | Age ≤ 55: 0 {1=0, 0=3}

| | | | Shape = 4: 0 {1=4, 0=9}

| | | Age ≤ 43.500: 1 {1=2, 0=0}

| | Age ≤ 40: 0 {1=0, 0=10}

| Margin = 4: 0 {1=47, 0=53}

| Margin = 5

| | Age > 67.500: 1 {1=5, 0=0}

| | Age ≤ 67.500: 0 {1=10, 0=12}
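The leaf counts in the text view can be tallied into a confusion matrix. The sketch below transcribes each leaf as (predicted label, count of class 1, count of class 0) and assumes the UCI encoding, 1 = malignant and 0 = benign. The figures it produces describe how the tree fits the data it was built on; they are not cross-validated results.

```python
# Each leaf: (predicted_label, class_1_count, class_0_count),
# transcribed from the text view; 1 = malignant, 0 = benign (assumed).
LEAVES = [
    (1, 3, 0), (0, 0, 2),                          # BI-RADS = 0
    (0, 0, 7),                                     # BI-RADS = 2
    (1, 2, 1), (0, 1, 11), (1, 1, 1), (0, 0, 7),   # BI-RADS = 3
    (1, 286, 31),                                  # BI-RADS = 5
    (0, 0, 2), (1, 3, 0), (1, 2, 0), (1, 2, 0),    # BI-RADS = 6
    (0, 26, 259),                                  # BI-RADS = 4, Margin = 1
    (1, 4, 2), (0, 0, 5),                          # BI-RADS = 4, Margin = 2
    (0, 0, 2), (0, 0, 6), (1, 5, 4), (0, 0, 3),
    (0, 4, 9), (1, 2, 0), (0, 0, 10),              # BI-RADS = 4, Margin = 3
    (0, 47, 53),                                   # BI-RADS = 4, Margin = 4
    (1, 5, 0), (0, 10, 12),                        # BI-RADS = 4, Margin = 5
]

tp = sum(m for pred, m, b in LEAVES if pred == 1)  # malignant, predicted malignant
fp = sum(b for pred, m, b in LEAVES if pred == 1)  # benign, predicted malignant
fn = sum(m for pred, m, b in LEAVES if pred == 0)  # malignant, predicted benign
tn = sum(b for pred, m, b in LEAVES if pred == 0)  # benign, predicted benign
total = tp + fp + fn + tn

accuracy = (tp + tn) / total
precision = tp / (tp + fp)
recall = tp / (tp + fn)
print(f"rows={total} accuracy={accuracy:.3f} "
      f"precision={precision:.3f} recall={recall:.3f}")
```

The leaf counts add up to 830 rows, which matches the row count after data cleaning, and the tree labels 703 of them correctly (about 84.7%).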

Neural Network

Modelling Process

Settings with best results:

Training cycle: 500
learning rate: 0.009
momentum: 0.5
error epsilon: 1.0E
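To give a feel for how the learning rate and momentum settings interact, here is a small sketch of the classical momentum update applied to a one-dimensional toy loss. It uses the training cycles, learning rate, and momentum values listed above. The toy loss, the `minimise` helper, and the exact form of the update are assumptions for illustration; the tool used in the project may implement the update differently.

```python
LEARNING_RATE = 0.009
MOMENTUM = 0.5
TRAINING_CYCLES = 500


def minimise(gradient, w0):
    """Gradient descent with classical momentum on a single weight."""
    w, velocity = w0, 0.0
    for _ in range(TRAINING_CYCLES):
        # Blend the previous step with the new gradient step.
        velocity = MOMENTUM * velocity - LEARNING_RATE * gradient(w)
        w += velocity
    return w


# Toy loss (w - 3)^2, whose gradient is 2 * (w - 3); the minimum is at w = 3.
w_final = minimise(lambda w: 2.0 * (w - 3.0), w0=0.0)
print(w_final)
```

With a small learning rate, the momentum term carries part of each previous step forward, so the weight moves faster along a consistent gradient direction than it would with the learning rate alone.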

The X-Validation operator is used to test the classifier models.
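Cross-validation splits the rows into k folds and, in turn, holds each fold out for testing while training on the rest. The sketch below shows only the index bookkeeping. The number of folds (10) is an assumption, since the slides do not state it, and the sketch uses contiguous folds without the shuffling or stratified sampling a real validation operator would normally apply.

```python
def k_fold_indices(n_rows, k):
    """Split row indices 0..n_rows-1 into k contiguous folds of near-equal size."""
    base, extra = divmod(n_rows, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)  # first `extra` folds get one more row
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def train_test_splits(n_rows, k):
    """Yield (train_indices, test_indices), holding out one fold per iteration."""
    folds = k_fold_indices(n_rows, k)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# 830 rows remain after cleaning; k = 10 is an assumed setting.
splits = list(train_test_splits(830, 10))
for train, test in splits[:2]:
    print(len(train), len(test))
```

Every row is used for testing exactly once, so the averaged performance does not depend on a single lucky or unlucky train/test split.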

Neural Network and Support Vector Machine

Support Vector Machine

EXPLORAÇÃO de DADOS & DATA MINING (2011/2012)

ASSIGNMENT Nº 3 PROJECT ON CLASSIFIERS

Group 10

Andrea Pravato 63626

Saswata Banerjee 66092

Settings with best results:

kernel type: dot
kernel cache: 300
convergence epsilon: 4.0E-4
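The "dot" kernel is the plain inner product of two attribute vectors, which makes the SVM a linear classifier. A minimal sketch follows; the `decide` helper, the support vectors, coefficients, and bias are hypothetical values chosen for illustration, not the model trained in the project.

```python
def dot_kernel(x, y):
    """Dot (linear) kernel: the inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def decide(x, support_vectors, coefficients, bias):
    """Classify x as 1 or 0 from a kernel expansion over the support vectors."""
    score = sum(c * dot_kernel(sv, x)
                for sv, c in zip(support_vectors, coefficients)) + bias
    return 1 if score >= 0 else 0


# Hypothetical two-attribute model, for illustration only.
support_vectors = [[1.0, 0.0], [0.0, 1.0]]
coefficients = [1.0, -1.0]
bias = 0.0

print(dot_kernel([1, 2, 3], [4, 5, 6]))
print(decide([2.0, 1.0], support_vectors, coefficients, bias))
print(decide([1.0, 2.0], support_vectors, coefficients, bias))
```

With this kernel the decision boundary is a hyperplane in the original attribute space, so no extra kernel parameters need tuning.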

Unlike ANNs:

The computational complexity of SVMs does not depend on the dimensionality of the input space.

The solution to an SVM is global and unique.

(http://www.svms.org/anns.html)

Tabulated Performance

Interpretation of Results

Conclusions

The Decision Tree was helpful in ordering the attributes according to their importance:
BI-RADS > Margin > Age > Shape > Density
The BI-RADS assessment was expected to be the most important attribute, as it is a scientific assessment already in use for predicting malignant tumors.
The Age attribute is more significant than the Shape and Density attributes. Hence, it is assigned more weight by both classifiers.
The Density attribute proved to be the least significant, as we had predicted in the Data Study Phase. Removing this attribute from the Modelling dataset was found to enhance the performance.

Neither classifier can handle nominal data, so type conversion is necessary before nominal data can be fed into them.
K-NN and Naive Bayes classifiers can handle nominal data, but they are primitive compared to Neural Network and SVM. Their accuracy, precision, and recall are not as good as those of NN and SVM.
Accuracy was similar for both classifiers, though ideally it would be higher than about 85%.
Both classifiers are quick to process, with SVM the faster of the two (less computation).
Each has its own merits, though these are not visible with the current dataset.

DEPARTAMENTO DE ELECTRÓNICA TELECOMUNICAÇÕES E INFORMÁTICA

UNIVERSIDADE DE AVEIRO 2011/2012
