Fast Clustering Algorithm
by Deepthi Chavan, 10 June 2014

A Fast Clustering-Based Feature Subset Selection Algorithm for High-Dimensional Data
Clustering High-Dimensional Data
Clustering high-dimensional data is the cluster analysis of data with anywhere from a few dozen to many thousands of dimensions.
Need for FAST
Reducing dimensionality
Removing irrelevant data
Increasing learning accuracy
Improving result comprehensibility
Overview
The FAST algorithm works in two steps. In the first step, features are divided into clusters using graph-theoretic clustering methods. In the second step, the most representative feature, i.e., the one most strongly related to the target classes, is selected from each cluster to form a subset of features.
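In the published FAST algorithm, how strongly a feature is related to the target classes (its T-relevance) is measured with symmetric uncertainty. Below is a minimal Java sketch of that measure (Java being the platform named in the requirements later in this deck); the class and method names are illustrative, and feature and class values are assumed to be discretized into integer codes.

import java.util.HashMap;
import java.util.Map;

public class SymmetricUncertainty {

    // Shannon entropy H(X) of a discrete variable, in bits.
    static double entropy(int[] x) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (int v : x) counts.merge(v, 1, Integer::sum);
        double h = 0.0, n = x.length;
        for (int c : counts.values()) {
            double p = c / n;
            h -= p * (Math.log(p) / Math.log(2));
        }
        return h;
    }

    // Joint entropy H(X, Y) over paired samples.
    static double jointEntropy(int[] x, int[] y) {
        Map<Long, Integer> counts = new HashMap<>();
        for (int i = 0; i < x.length; i++) {
            long key = ((long) x[i] << 32) | (y[i] & 0xFFFFFFFFL);
            counts.merge(key, 1, Integer::sum);
        }
        double h = 0.0, n = x.length;
        for (int c : counts.values()) {
            double p = c / n;
            h -= p * (Math.log(p) / Math.log(2));
        }
        return h;
    }

    // SU(X, Y) = 2 * IG(X | Y) / (H(X) + H(Y)), where the information gain
    // IG(X | Y) = H(X) + H(Y) - H(X, Y). SU is 1 for identical variables
    // and 0 for independent ones.
    static double su(int[] x, int[] y) {
        double hx = entropy(x), hy = entropy(y);
        if (hx + hy == 0.0) return 0.0;   // both variables are constant
        double gain = hx + hy - jointEntropy(x, y);
        return 2.0 * gain / (hx + hy);
    }
}

Features whose SU with the class falls below a user-chosen threshold are treated as irrelevant and discarded before clustering.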
Literature Survey
Feature Subset Selection
Graph-Based Clustering
Any non-uniform data contains an underlying structure due to the heterogeneity of the data. The process of identifying this structure by grouping the data elements is called clustering, also known as data classification.
Proposed Model
The proposed FCFS algorithm works in two steps. In the first step, features are divided into clusters using graph-theoretic clustering methods. In the second step, the most representative feature that is strongly related to the target classes is selected from each cluster to form a subset of features.
The proposed FCFS algorithm uses a minimum-spanning-tree-based method to cluster features. It does not assume that data points are grouped around centers or separated by a regular geometric curve. The application of cluster analysis has been demonstrated to make the algorithm more effective than traditional feature selection algorithms.
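As a concrete illustration of the clustering step, here is a hedged Java sketch. It assumes a precomputed matrix sim[i][j] of pairwise feature correlations (for example the symmetric uncertainty above) and a vector relevance[i] of each feature's correlation with the class. The MST is built with Prim's algorithm; the edge-cutting rule, dropping an edge whose two endpoint features are less correlated with each other than each is with the class, follows the published FAST paper, but the structure and names here are illustrative rather than the authors' implementation.

import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

public class MstFeatureClustering {

    // Prim's algorithm on a dense feature-similarity matrix. Edge weight is
    // 1 - similarity, so strongly correlated features end up adjacent in the
    // MST. Returns parent[v] for each vertex (parent[0] == -1 for the root).
    static int[] primMst(double[][] sim) {
        int n = sim.length;
        int[] parent = new int[n];
        double[] best = new double[n];
        boolean[] inTree = new boolean[n];
        Arrays.fill(best, Double.POSITIVE_INFINITY);
        Arrays.fill(parent, -1);
        best[0] = 0.0;
        for (int step = 0; step < n; step++) {
            int u = -1;
            for (int v = 0; v < n; v++)
                if (!inTree[v] && (u == -1 || best[v] < best[u])) u = v;
            inTree[u] = true;
            for (int v = 0; v < n; v++) {
                double w = 1.0 - sim[u][v];
                if (!inTree[v] && w < best[v]) { best[v] = w; parent[v] = u; }
            }
        }
        return parent;
    }

    // Cut weak MST edges to form feature clusters, then keep the most
    // class-relevant feature of each cluster as its representative.
    static List<Integer> selectFeatures(double[][] sim, double[] relevance) {
        int n = sim.length;
        List<Integer> selected = new ArrayList<>();
        if (n == 0) return selected;
        int[] parent = primMst(sim);

        // Keep edge (u, v) unless the two features are less correlated with
        // each other than each one is with the class.
        List<List<Integer>> adj = new ArrayList<>();
        for (int i = 0; i < n; i++) adj.add(new ArrayList<>());
        for (int v = 1; v < n; v++) {
            int u = parent[v];
            if (!(sim[u][v] < relevance[u] && sim[u][v] < relevance[v])) {
                adj.get(u).add(v);
                adj.get(v).add(u);
            }
        }

        // Each connected component of the cut tree is one feature cluster.
        boolean[] seen = new boolean[n];
        for (int s = 0; s < n; s++) {
            if (seen[s]) continue;
            Deque<Integer> stack = new ArrayDeque<>();
            stack.push(s);
            seen[s] = true;
            int rep = s;
            while (!stack.isEmpty()) {
                int u = stack.pop();
                if (relevance[u] > relevance[rep]) rep = u;
                for (int v : adj.get(u))
                    if (!seen[v]) { seen[v] = true; stack.push(v); }
            }
            selected.add(rep);
        }
        return selected;
    }
}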

Advantages
The proposed system removes both irrelevant and redundant features from the feature set.
The proposed system is applicable to any type of data set (text, image, or microarray).
Disadvantages
The proposed system gives proper results only with large datasets.
Data Flow Diagram - Level 0
Data Flow Diagram - Level 1

HARDWARE REQUIREMENTS

Hard disk: 20 GB
RAM: 128 MB
Processor: Intel Pentium 4

SOFTWARE REQUIREMENTS
Operating System: Windows 7
Platform: Java
Front End: Eclipse JEE development environment
Use Case Diagram
Sequence Diagram
Control Flow Diagram
System Architecture
Existing Systems
Embedded Systems
Wrapper Systems
Filter Systems
Hybrid Systems
Testing
Unit Testing
Integration Testing
Functional Testing
System Testing
Unit Testing
Unit test case for Fast Users
Unit test case for Fast Log
Unit test case for Fast Upload
Integration Testing
Integration testing for Fast Login
Integration testing for Fast Logout
Integration testing for Fast Upload
Functional Testing
There are four types of functional testing for the User Module.
There are four types of functional testing for the Login Module.
User Module Example
Login Module Example
System Testing
System Testing for OS version
System Testing for Processor types
System Testing for IDE versions
Conclusion
We have developed a novel clustering-based feature subset selection algorithm for high-dimensional data.
The algorithm involves removing irrelevant features, constructing a minimum spanning tree from the relevant ones, and partitioning the MST to select representative features.
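For completeness, here is a sketch of the full pipeline under the same assumptions as the two snippets above (discretized integer-coded data, and the illustrative SymmetricUncertainty and MstFeatureClustering classes); the relevance threshold is a free parameter here, not a value prescribed by the paper.

import java.util.ArrayList;
import java.util.List;

public class FcfsPipeline {

    // data[i] holds the discretized values of feature i across all samples;
    // labels holds the class of each sample. Returns the indices of the
    // selected feature subset.
    static List<Integer> run(int[][] data, int[] labels, double threshold) {
        int m = data.length;

        // Remove irrelevant features: T-relevance below the threshold.
        double[] allRel = new double[m];
        List<Integer> kept = new ArrayList<>();
        for (int i = 0; i < m; i++) {
            allRel[i] = SymmetricUncertainty.su(data[i], labels);
            if (allRel[i] > threshold) kept.add(i);
        }
        if (kept.isEmpty()) return kept;

        // Pairwise correlations and relevance for the surviving features.
        int k = kept.size();
        double[][] sim = new double[k][k];
        double[] rel = new double[k];
        for (int a = 0; a < k; a++) {
            rel[a] = allRel[kept.get(a)];
            for (int b = 0; b < k; b++)
                sim[a][b] = SymmetricUncertainty.su(data[kept.get(a)], data[kept.get(b)]);
        }

        // Cluster via the MST and keep one representative per cluster.
        List<Integer> selected = new ArrayList<>();
        for (int r : MstFeatureClustering.selectFeatures(sim, rel))
            selected.add(kept.get(r));
        return selected;
    }
}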
References
Almuallim H. and Dietterich T.G., Algorithms for Identifying Relevant Features, In Proceedings of the 9th Canadian Conference on AI, pp 38-45, 1992.
Almuallim H. and Dietterich T.G., Learning Boolean concepts in the presence of many irrelevant features, Artificial Intelligence, 69(1-2), pp 279-305, 1994.
Arauzo-Azofra A., Benitez J.M. and Castro J.L., A feature set measure based on Relief, In Proceedings of the Fifth International Conference on Recent Advances in Soft Computing, pp 104-109, 2004.
Baker L.D. and McCallum A.K., Distributional clustering of words for text classification, In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp 96-103, 1998.
Cardie C., Using decision trees to improve case-based learning, In Proceedings of the Tenth International Conference on Machine Learning, pp 25-32, 1993.
Biesiada J. and Duch W., Feature selection for high-dimensional data: a Pearson redundancy based filter, Advances in Soft Computing, 45, pp 242-249, 2008.
Bell D.A. and Wang H., A formalism for relevance and its application in feature subset selection, Machine Learning, 41(2), pp 175-195, 2000.
Thank You!