
How Does K-Means Clustering Work?

K-Means clustering is an algorithm for grouping things based on their common attributes or features.

The K in K-Means stands for the number of groups the things are divided into.

For example, suppose a teacher asks 20 students to line up by height in ascending order. Forming a line from shortest to tallest is fairly easy. The problem arises when the teacher asks the same 20 students to line up by both height and weight, because two attributes no longer give a single obvious ordering. This is where K-Means clustering becomes useful.

In business, K-Means clustering is a useful data mining method because managers can base decisions on how the company's data is grouped. Here's how K-Means clustering can help in decision making.

Average Customer Satisfaction

Budget For Employee Training

Average Food Preparation Time

Loans

The next step is to choose how many groups there should be, that is, what K should be.

In this case, K is equal to 2, which means the attributes should be grouped into 2 according to...

Having figured out how many groupings there would be, the next step is choosing the centroids for each group.

A centroid is the geometric center of a group.

In the first iteration, choose the centroids at random. In this case, the centroids are the data from months 7 and 8.

The next step is to compute the distances of the attributes from the centroids

using the Euclidean distance formula:
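The formula itself does not survive in this transcript. As a hedged reconstruction, the standard Euclidean distance between one month's attribute values x and a centroid c, taken over n attributes, is shown here. The symbols x, c, and n are notation introduced for this sketch, not taken from the slides.

```latex
d(x, c) = \sqrt{\sum_{i=1}^{n} (x_i - c_i)^2}
```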

You will get something like this.

After getting the distances of each month's attributes from Centroid 1 and Centroid 2, the next step is to assign each month's attributes to one of the two centroids.

The attributes nearer to Centroid 1 are grouped together, and the same goes for the attributes nearer to Centroid 2.

A 1 indicates that the attributes of that month belong to that centroid.
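The distance and assignment steps can be sketched in Python as follows. This is a minimal illustration, not the presenter's worksheet: the function names `euclidean` and `assign` are made up here, and the points are hypothetical two-attribute rows rather than the food company's data.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    labels = []
    for p in points:
        distances = [euclidean(p, c) for c in centroids]
        labels.append(distances.index(min(distances)))
    return labels

# Hypothetical data for illustration only.
points = [(1.0, 1.0), (2.0, 1.0), (8.0, 9.0), (9.0, 8.0)]
centroids = [(1.0, 1.0), (9.0, 8.0)]
labels = assign(points, centroids)
print(labels)  # [0, 0, 1, 1]
```

Index 0 plays the role of Centroid 1 and index 1 the role of Centroid 2. On a tie, this sketch keeps the first centroid, which is an arbitrary choice.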

The last step is to find the coordinates of the new centroids. Each new centroid is the average of the coordinates of the points in its group, as shown in this table...
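A small Python sketch of this averaging step, again on hypothetical points and with an illustrative function name (`update_centroids`), might look like this. It assumes every group has at least one member.

```python
def update_centroids(points, labels, k):
    """Average the points in each group, attribute by attribute."""
    centroids = []
    for g in range(k):
        members = [p for p, label in zip(points, labels) if label == g]
        # Assumption: no group is empty, so len(members) is never zero.
        centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
    return centroids

# Hypothetical data for illustration only.
points = [(1.0, 1.0), (2.0, 1.0), (8.0, 9.0), (9.0, 8.0)]
labels = [0, 0, 1, 1]
new_centroids = update_centroids(points, labels, 2)
print(new_centroids)  # [(1.5, 1.0), (8.5, 8.5)]
```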

The new grouping would look something like this...

The same steps are then repeated:

1. Compute the distance of each month's attributes from each centroid using the Euclidean distance formula

2. Determine which centroid, Centroid 1 or Centroid 2, each month's attributes are nearer to

3. Check whether the groupings have changed

4. If they have not, the attributes are already in their final grouping for a K of 2

5. If they have, recompute the centroids and repeat the steps until the groupings no longer change

This is what the whole process looks like:
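Put together, the repeat-until-stable loop could be sketched as below. This is an illustration under stated assumptions, not the presenter's exact procedure: the `kmeans` function, its `max_iter` safety cap, and the sample points are all hypothetical, and no group is assumed to become empty.

```python
import math

def kmeans(points, centroids, max_iter=100):
    """Repeat the assign and update steps until the groupings stop changing."""
    labels = None
    for iteration in range(1, max_iter + 1):
        # Steps 1 and 2: distance to every centroid, keep the nearest one.
        new_labels = []
        for p in points:
            distances = [math.dist(p, c) for c in centroids]
            new_labels.append(distances.index(min(distances)))
        # Steps 3 and 4: stop when the groupings are unchanged.
        if new_labels == labels:
            return labels, centroids, iteration
        labels = new_labels
        # Step 5: recompute each centroid as the mean of its group.
        centroids = []
        for g in range(max(labels) + 1):
            members = [p for p, label in zip(points, labels) if label == g]
            centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
    return labels, centroids, max_iter

# Hypothetical data; the first two points serve as the initial centroids.
points = [(1.0, 1.0), (2.0, 1.0), (4.0, 3.0), (5.0, 4.0)]
labels, centroids, iterations = kmeans(points, [points[0], points[1]])
print(labels, centroids, iterations)
```

With these made-up points the groupings settle as `[0, 0, 1, 1]` and the loop stops on the third pass, the first one in which no point changes group.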

1st Iteration

2nd Iteration

3rd Iteration

4th Iteration

Since there was no change in the groupings after the 4th iteration, the algorithm has converged and the data can be considered to belong to their final clusters.

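The example uses an annual data set of a food company, recorded per month, with the attributes listed earlier: average customer satisfaction, budget for employee training, average food preparation time, and loans.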
