Fast Clustering Algorithm
A Fast Clustering-Based Feature Subset Selection Algorithm for High Dimensional Data
Clustering High Dimensional Data
Clustering high-dimensional data is the cluster analysis of data with anywhere from a few dozen to many thousands of dimensions.
Need for FAST
- removing irrelevant data
- increasing learning accuracy
- improving result comprehensibility
The FAST algorithm works in two steps. In the first step, features are divided into clusters by using graph-theoretic clustering methods. In the second step, the most representative feature that is strongly related to target classes is selected from each cluster to form a subset of features.
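The published FAST algorithm measures how strongly a feature relates to the target classes with symmetric uncertainty (SU), an information-theoretic correlation in [0, 1]; features whose SU with the class falls below a threshold are treated as irrelevant. A minimal sketch of that relevance measure (class name, method names, and the example data are illustrative, not taken from the paper's implementation):

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch of FAST's relevance measure: symmetric uncertainty (SU)
 *  between a discrete feature and the class. SU = 1 means the feature
 *  determines the class; SU near 0 means it carries no information. */
public class SymmetricUncertainty {

    /** Shannon entropy (base 2) of a discrete variable. */
    static double entropy(int[] xs) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (int x : xs) counts.merge(x, 1, Integer::sum);
        double h = 0.0, n = xs.length;
        for (int c : counts.values()) {
            double p = c / n;
            h -= p * (Math.log(p) / Math.log(2));
        }
        return h;
    }

    /** Joint entropy H(X, Y) over paired observations. */
    static double jointEntropy(int[] xs, int[] ys) {
        Map<Long, Integer> counts = new HashMap<>();
        for (int i = 0; i < xs.length; i++)
            counts.merge(((long) xs[i] << 32) | (ys[i] & 0xffffffffL), 1, Integer::sum);
        double h = 0.0, n = xs.length;
        for (int c : counts.values()) {
            double p = c / n;
            h -= p * (Math.log(p) / Math.log(2));
        }
        return h;
    }

    /** SU(X, Y) = 2 * (H(X) + H(Y) - H(X, Y)) / (H(X) + H(Y)). */
    static double su(int[] xs, int[] ys) {
        double hx = entropy(xs), hy = entropy(ys);
        if (hx + hy == 0) return 0.0; // both constant: no information either way
        return 2.0 * (hx + hy - jointEntropy(xs, ys)) / (hx + hy);
    }

    public static void main(String[] args) {
        int[] cls      = {0, 0, 1, 1, 0, 1, 0, 1};
        int[] relevant = {0, 0, 1, 1, 0, 1, 0, 1}; // mirrors the class exactly
        int[] noise    = {0, 1, 0, 1, 0, 1, 0, 1}; // unrelated to the class
        System.out.println(su(cls, relevant)); // 1.0
        System.out.println(su(cls, noise));    // small value
    }
}
```

The same measure can be applied between pairs of features to detect redundancy, which is what drives the clustering step that follows.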
Feature Subset Selection
Graph Based Clustering
Any non-uniform data contains an underlying structure due to the heterogeneity of the data. The process of identifying this structure by grouping the data elements is called clustering, also known as data classification.
The proposed FCFS algorithm works in two steps. In the first step, features are divided into clusters using graph-theoretic clustering methods. In the second step, the most representative feature that is strongly related to the target classes is selected from each cluster to form a subset of features.
The proposed FCFS algorithm uses a minimum-spanning-tree-based method to cluster features. It does not assume that data points are grouped around centers or separated by a regular geometric curve. The application of cluster analysis has been demonstrated to make the algorithm more effective than traditional feature selection algorithms.
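The MST-based clustering step can be sketched as follows. This is a simplified illustration: it builds a complete graph whose edge weights are feature-to-feature distances (e.g. 1 - SU), takes its minimum spanning tree with Kruskal's algorithm, and cuts the heaviest edges so the surviving connected components form the feature clusters. Note that the published FAST criterion removes MST edges by comparing feature-to-feature correlation against each endpoint's correlation with the class, rather than cutting a fixed number of heaviest edges; the distance matrix and names below are illustrative.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Sketch of MST-based feature clustering: Kruskal's MST over the
 *  feature-distance graph, then cut the (k - 1) heaviest MST edges
 *  so the remaining connected components are k feature clusters. */
public class MstFeatureClustering {

    static int[] parent; // union-find forest over feature indices

    static int find(int x) { return parent[x] == x ? x : (parent[x] = find(parent[x])); }

    static boolean union(int a, int b) {
        int ra = find(a), rb = find(b);
        if (ra == rb) return false;
        parent[ra] = rb;
        return true;
    }

    /** Returns a cluster label per feature, for k clusters. */
    static int[] cluster(double[][] dist, int k) {
        int n = dist.length;
        List<int[]> edges = new ArrayList<>();
        for (int i = 0; i < n; i++)
            for (int j = i + 1; j < n; j++) edges.add(new int[]{i, j});
        edges.sort(Comparator.comparingDouble(e -> dist[e[0]][e[1]]));

        // Kruskal: the accepted edges, in increasing weight order, form the MST.
        parent = new int[n];
        for (int i = 0; i < n; i++) parent[i] = i;
        List<int[]> mst = new ArrayList<>();
        for (int[] e : edges)
            if (union(e[0], e[1])) mst.add(e);

        // Keep only the (n - k) lightest MST edges, dropping the (k - 1)
        // heaviest; connected components are then the clusters.
        parent = new int[n];
        for (int i = 0; i < n; i++) parent[i] = i;
        for (int i = 0; i < n - k; i++) union(mst.get(i)[0], mst.get(i)[1]);

        int[] label = new int[n];
        for (int i = 0; i < n; i++) label[i] = find(i);
        return label;
    }

    public static void main(String[] args) {
        // Four features: {0,1} are near-duplicates, {2,3} are near-duplicates,
        // and the two pairs are far apart; expect two clusters.
        double[][] dist = {
            {0.0, 0.1, 0.9, 0.8},
            {0.1, 0.0, 0.85, 0.9},
            {0.9, 0.85, 0.0, 0.1},
            {0.8, 0.9, 0.1, 0.0}
        };
        int[] label = cluster(dist, 2);
        System.out.println(label[0] == label[1]); // true
        System.out.println(label[2] == label[3]); // true
        System.out.println(label[0] != label[2]); // true
    }
}
```

From each resulting cluster, the algorithm then keeps the single feature most strongly related to the target classes as that cluster's representative, yielding the final feature subset.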
The proposed system removes both irrelevant and redundant features from the feature set.
The proposed system is applicable to any type of dataset (text, image, or microarray).
The proposed system gives proper results even with large datasets.
Data flow Diagram- level 0
Data flow Diagram- level 1
Hard disk : 20GB
RAM : 128MB
Processor : Intel Pentium 4
Operating System : Windows 7
Platform : Java
Front End : Eclipse JEE development environment
Use case diagram
Control Flow Diagram
The Embedded Systems
The Wrapper Systems
The Filter Systems
Unit Test Case for Fast Users
Unit Test Case for Fast Log
Unit Test Case for Fast Upload
Integration Testing for Fast Login
Integration Testing for Fast Logout
Integration Testing for Fast Upload
There are four types of functional testing for the User Module.
There are four types of functional testing for the Login Module.
User Module Example
Login Module Example
System Testing for OS version
System Testing for Processor types
System Testing for IDE versions
We have developed a novel clustering-based feature subset selection algorithm for high dimensional data.
The algorithm involves removing irrelevant features, constructing a minimum spanning tree from the relevant ones, and partitioning the MST to select representative features.
Almuallim H. and Dietterich T.G., Algorithms for Identifying Relevant Features, In Proceedings of the 9th Canadian Conference on AI, pp 38-45, 1992.
Almuallim H. and Dietterich T.G., Learning boolean concepts in the presence of many irrelevant features, Artificial Intelligence, 69(1-2), pp 279- 305, 1994.
Arauzo-Azofra A., Benitez J.M. and Castro J.L., A feature set measure based on relief, In Proceedings of the fifth international conference on Recent Advances in Soft Computing, pp 104-109, 2004.
Baker L.D. and McCallum A.K., Distributional clustering of words for text classification, In Proceedings of the 21st Annual international ACM SIGIR Conference on Research and Development in information Retrieval, pp 96- 103, 1998.
Cardie, C., Using decision trees to improve case-based learning, In Proceedings of the Tenth International Conference on Machine Learning, pp 25-32, 1993.
Biesiada J. and Duch W., Feature selection for high-dimensional data: a Pearson redundancy based filter, Advances in Soft Computing, 45, pp 242-249, 2008.
Bell D.A. and Wang, H., A formalism for relevance and its application in feature subset selection, Machine Learning, 41(2), pp 175-195, 2000.