Send the link below via email or IMCopy
Present to your audienceStart remote presentation
- Invited audience members will follow you as you navigate and present
- People invited to a presentation do not need a Prezi account
- This link expires 10 minutes after you close the presentation
- A maximum of 30 users can follow your presentation
- Learn more about this feature in our knowledge base article
Do you really want to delete this prezi?
Neither you, nor the coeditors you shared it with will be able to recover it again.
Make your likes visible on Facebook?
You can change this under Settings & Account at any time.
Transcript of Google Algorithm
Neethu S Kumar
Pooja S. Bhandary Contents Facts
Introduction to Page Rank
The Final Result
Page Rank Zero Facts 30 million pages indexed in 1998
1 billion pages indexed in 2000
8 billion pages indexed in 2004
1.8 million company shares given to Stanford University for its page rank patent. Stanford sold these in 2005 for $336 million
1 trillion pages indexed in 2008
Google search handles over 1 billion searches everyday
7.2 billion daily page views
87.8 billion monthly worldwide searches on Google websites
A 'Noogler' is a new Googler Introduction Page Rank is a link analysis algorithm, named after Larry Page and used by the Google Internet search engine, that assigns a numerical weighting to each element of a hyper linked set of documents, such as the World Wide Web, with the purpose of "measuring" its relative importance within the set. The algorithm may be applied to any collection of entities with reciprocal quotations and references. The Basic Idea Page Rank is a probability distribution used to represent the likelihood that a person randomly clicking on links will arrive at any particular page. Page Rank can be calculated for collections of documents of any size. It is assumed in several research papers that the distribution is evenly divided among all documents in the collection at the beginning of the computational process. The Page Rank computations require several passes, called "iterations", through the collection to adjust approximate Page Rank values to more closely reflect the theoretical true value.
A probability is expressed as a numeric value between 0 and 1. A 0.5 probability is commonly expressed as a "50% chance" of something happening. Hence, a Page Rank of 0.5 means there is a 50% chance that a person clicking on a random link will be directed to the document with the 0.5 Page Rank. Simplified Algorithm The basic idea is that every link leading to a page improves its rank. The amount by which its page rank is increased by each link however depends on the source page. Assume a small universe of 4 web pages: A, B, C and D. Page Rank is initialized to the same value for all pages. Probability distribution ranges between 0 and 1 and hence an initial page rank of 0.25 is chosen for each page. The Page Rank transferred from a given page to the targets of its outbound links upon the next iteration is divided equally among all outbound links. Simplified Algorithm ctd... If the only links in the system were from pages B, C, and D to A, each link would transfer 0.25 Page Rank to A upon the next iteration, for a total of 0.75.
Suppose instead that page B had a link to pages C and A, while page D had links to all three pages. Thus, upon the next iteration, page B would transfer half of its existing value, or 0.125, to page A and the other half, or 0.125, to page C. Since D had three outbound links, it would transfer one third of its existing value, or approximately 0.083, to A. Simplified Algorithm ctd... In other words, the Page Rank conferred by an outbound link is equal to the document's own Page Rank score divided by the number of outbound links L( ).
In the general case, the Page Rank value for any page u can be expressed as: Damping factor The Page Rank theory holds that even an imaginary surfer who is randomly clicking on links will eventually stop clicking. The probability, at any step, that the person will continue is a damping factor d. Various studies have tested different damping factors, but it is generally assumed that the damping factor will be set around 0.85. Damping Factor The damping factor is subtracted from 1 (and in some variations of the algorithm, the result is divided by the number of documents (N) in the collection) and this term is then added to the product of the damping factor and the sum of the incoming Page Rank scores. That is, The Final Result The formula uses a model of a random surfer who gets bored after several clicks and switches to a random page. The Page Rank value of a page reflects the chance that the random surfer will land on that page by clicking on a link. Taking all these and other factors into consideration, we have : Additional Factors Along with the basic factors effecting page rank that have been discussed so far, several other factors also effect the ranking of a page which greatly increase the complexity of the algorithm. Some of these additional factors include:
Visibility of a link
Position of a link within a document
Distance between web pages
Importance of a linking page
Up to dateness of a linking page Page Rank Zero This was a technique used by Google to block obscene web pages or web pages that had been reported by users from search results. Such web pages were not actually blocked by Google, they were instead given a page rank of zero so that they appeared at the absolute end of any search result after thousands of other results. If one actually had the patience, one could find these results by reaching the end of any search result. Doesn't a users boredom effect page ranking ? Yes it does !!! How does Google filter spam and block content ?? Thank You