Blown to Bits Chapter 4
Transcript of Blown to Bits Chapter 4
5. Determine the relevance of each possible result to the query.
6. Determine the ranking of the relevant results.
7. Present the results.
Search is Power
When it comes to web searches, who pays for what?
Needles in the Haystack
How it Works
Mark Wilson ;)
Mark Wilson <3
After being separated due to Soviet oppression, the Polotsky cousins were reunited after 70 years because of a web search.
1) Gather Information
-Search engines don't index everything
-Most likely a master list of sites to visit
-The software used to "crawl" the internet is called a spider
-A website may stipulate it doesn't want spiders to visit
-Spidering is not cost free!
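A minimal sketch of how a spider might honor a site's request not to be visited, assuming the site publishes its wishes in a robots.txt file. The robots.txt text, the spider name "ExampleSpider", and the example.com URLs are all made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a site might publish to keep spiders out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

def allowed_urls(urls, robots_txt=ROBOTS_TXT, agent="ExampleSpider"):
    """Return only the URLs that the robots.txt rules let this spider fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if parser.can_fetch(agent, u)]

# Hypothetical master list of pages the spider plans to visit.
master_list = [
    "http://example.com/index.html",
    "http://example.com/private/notes.html",
]
print(allowed_urls(master_list))
```

A real spider would download robots.txt from each site before crawling it; here the rules are passed in as text so the sketch runs without a network connection.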
1) Gather Information
2) Keep copies
3) Build an index
4) Understand the query
"The digital explosion grants the power of both instant communication and instant retraction—but almost every digital action leaves digital fingerprints."
Yahoo is an acronym for: "Yet Another Hierarchical Officious Oracle"
The Fall of Hierarchy
-URLs classify the different kinds of information we are trying to find, but if you don't know a website's URL, the site can be hard to find
-Information is being pumped out so fast that it sometimes cannot be recorded or used
-Soon, technology may be able to make human memory available to every individual
- High-tech inventions make progress attainable, but they are also what pushes us towards progress
5) Determining Relevance
- Finding all the relevant documents is referred to as “recall.”
-Total recall is unachievable—but it is also unimportant.
-Degree of relevance always trumps level of recall.
HOW DO WE KNOW WHICH ARE RELEVANT?
We do calculations of simple relevance
- Count the # of times each term appears
- Add them up
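The simple relevance calculation above can be sketched as follows. The function name, the word-splitting rule, and the two sample documents are assumptions for illustration; real engines use far more elaborate scoring.

```python
import re
from collections import Counter

def simple_relevance(query, document):
    """Count how often each query term appears in the document and add the counts up."""
    words = Counter(re.findall(r"[a-z0-9']+", document.lower()))
    terms = re.findall(r"[a-z0-9']+", query.lower())
    return sum(words[t] for t in terms)

# Hypothetical documents to score against the query "search power".
docs = {
    "a": "Search engines rank pages. Search is power.",
    "b": "Libraries keep books on shelves.",
}
scores = {name: simple_relevance("search power", text) for name, text in docs.items()}
print(scores)
```

Document "a" scores higher because "search" appears twice and "power" once, while document "b" contains neither term.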
6) Determine Ranking
Ranking is critical in making the search useful.
A search may return thousands of relevant results, and users want to see only a few of them.
The simplest ranking is by relevance
One of Google’s innovations was PageRank
“Importance” or “Reputation” can be extracted
Documents can be "good results" not just by relevance but quality as well.
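A rough sketch of the idea behind PageRank, where a page's "importance" is extracted from the links that point to it. This is the simplified textbook form, not Google's actual formula (which stays secret). The damping value 0.85, the iteration count, and the tiny four-page web are assumptions for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = sorted(set(links) | {p for targets in links.values() for p in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with equal importance
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its importance on, split among its links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank

# Hypothetical web: every other page links to "home"; nothing links to "blog".
web = {"home": ["about", "news"], "about": ["home"], "news": ["home"], "blog": ["home"]}
ranks = pagerank(web)
print(sorted(ranks, key=ranks.get, reverse=True))
```

In this toy web, "home" ends up with the highest rank because every other page links to it, and "blog" ends up lowest because no page links to it.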
WHAT MAKES A PAGE SEARCHABLE?
The formulas remain secret, but there are some factors that might be taken into account:
- Keywords in heading and title;
- Keywords in body text
- Site is “Trustworthy”
- Links on page are relevant
- Age of the Page
- Quality of text.
After all the hard work of steps 1-6, search engines typically provide the results in a format that is older than Aristotle: the simple, top-to-bottom list.
Introducing these new forms of navigation may shift the balance of power in the search equation. Being at the top of the list may no longer have the same economic value, but something else may replace the currently all-important rank of results.
Therefore, no matter how the results are presented, something else appears alongside them, and probably always will.
- Search engines are a way to find specific things
-With old internet, everything was hard to find
-Now, it is a contributory information source
-Searching is the power users have to find information in vast digital networks
Who Pays for what?
90% use web searches
40% use web searches daily
Why are they popular?
TONS of information, at the click of a button
-Free to use
-No fine print
The most important part
So who pays for all this information?
Universities and the federal government paid for information retrieval
Web Crawler (1994)
Search Research needed more money
Big Businesses saw opportunity for more money and got involved with search engines
AOL was one of the first and more popular:
-Had its own search engine
-Used "targeted advertising" to make money
-AOL came close to the ethical line by providing search results that could be changed by paid advertising
-Brin and Page's paper warned of "mixed motives"
-They thought paid advertising and unbiased organic search results could not be reconciled
-Open Text: sold placement in search results to companies
Payments, Clicks, and Auctions
Overture: charged advertisers to be searchable
Argument : If they can spend money on advertising, they can spend money on their own webpage
Believed it would make consumers happy.
Auctions started happening
"Pay-per-click" pricing and auctions became the preferred model
Uncle Sam Takes Notes
Search engines offered different ways for their pay-for-placement ads
Terms used to label sponsored ads were at times very vague
Didn't label their ads
Government stayed out of it
Ralph Nader and the watchdog group Commercial Alert
FTC decided to protect consumers from being tricked by search engines, and forced them to clarify organic and sponsored results
Google Finds Balance
In the early 2000s, search engines were struggling ethically
Started to use the PageRank algorithm
This was good because it did not interfere with organic searches
Google was able to achieve balance
Brin and Page were wrong
Regulating or Replacing the Brokers
-Search engines have become a central point of control in the digital world
-Access to information has greater value than the creation of information
Should anyone watch over the industry?
-Search engines have the right to ban anything they want
-Even with this, the FTC stays out
-Foreign Leaders are scared of American search engines
Algorithmic Does Not Mean Unbiased
- Having a lot of money will not buy you a high rank
- Search engines are likely to favor fresh items over older and perhaps more comprehensive sources
Not All Search Engines Are Equal
- When we use a search engine, we may think that what we are getting is a representative sample of what’s available, but this is very far from reality
- A study comparing queries to Google, Yahoo!, ASK, and MSN showed that the results returned on the first page were unique 88 percent of the time
- Ranking determines visibility
- If users don’t find what they are looking for, more than 80% start the search over with the same search engine, changing the keywords, as though confident that the search engine “knows” the right answer but they haven’t asked the right question.
Search Results Can Be Manipulated
- Search is a remarkable business
- For many students, for example, the library is an information source of last resort
-Because ranking is algorithmic, a set of rules followed with diligence and precision, it must be possible to manipulate the results
- The key to search engine optimization is to understand how particular engines do their ranking
Search Engines Don’t See Everything
- Standard search engines fail to index a great deal of information that is accessible via the Web
Search Control and Mind Control
- Removing information in the digital world does not require removing the documents
-Controlling “findability”
- Google had a yes-or-no decision
- Completely universal accessibility was already more than Google could lawfully accomplish
You Searched for WHAT? Tracking Searches
- Search engine companies can store everything you look for, and everything you click on
- But why would search companies bother to keep every keystroke and click?
- Search quality can improve if search histories are retained
-The more "important" your search is, the faster it will appear
-Because of the Web being unstructured, there is no correct order to visit pages.
-Some general utility search indexes are: Google, Yahoo and Ask.
-Other search engines are domain specific
2) Keep Copies
-A copy of every web page "spidered" is downloaded
-Caching is another blow to the web as library metaphor
-Pages of dangerous information survive
-The digital explosion grants power of instant communication and retraction
-What about copyrights?
FINDING DELETED PAGES
An easy experiment on finding deleted pages is to search using Google for an item that was sold on craigslist. You can use the “site” modifier in the Google search box to limit your search to the craigslist web site.
The results will likely return pages for items that are no longer available, but for which the cached pages will still exist.
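As a small illustration of the "site" modifier, the sketch below builds a Google search URL limited to one web site. The search term "futon" and the helper name site_search_url are made up for this example, and the sketch only builds the URL rather than running the search.

```python
from urllib.parse import urlencode

def site_search_url(terms, site):
    """Build a Google search URL restricted to one site with the site: modifier."""
    query = f"{terms} site:{site}"
    return "https://www.google.com/search?" + urlencode({"q": query})

# Hypothetical item to look for on craigslist.
url = site_search_url("futon", "craigslist.org")
print(url)
```

Typing the same text, futon site:craigslist.org, straight into the search box has the same effect.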
3) Build an Index
-A search engine's index is like a book's index
-Indexes are critical because searching every page sequentially would be far too slow
-Search engines do not start at beginning of index and go through in order
-Binary search is faster than linear search
-How big is the index?
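The points above can be sketched with a tiny index, built like a book's index, and two ways to look a word up in it. The three sample pages and all function names are assumptions for illustration; a real search engine's index is vastly larger and more elaborate.

```python
from bisect import bisect_left

def build_index(pages):
    """Map each word to the sorted list of page names that contain it."""
    index = {}
    for name, text in pages.items():
        for word in set(text.lower().split()):
            index.setdefault(word, []).append(name)
    terms = sorted(index)  # alphabetical order, like a book's index
    return terms, [sorted(index[t]) for t in terms]

def linear_lookup(terms, postings, word):
    """Start at the beginning of the index and go through it in order."""
    for i, term in enumerate(terms):
        if term == word:
            return postings[i]
    return []

def binary_lookup(terms, postings, word):
    """Halve the sorted term list at each step instead of scanning it."""
    i = bisect_left(terms, word)
    if i < len(terms) and terms[i] == word:
        return postings[i]
    return []

# Hypothetical pages a spider might have copied.
pages = {
    "p1": "spiders crawl the web",
    "p2": "the index speeds up search",
    "p3": "binary search beats linear search",
}
terms, postings = build_index(pages)
print(binary_lookup(terms, postings, "search"))
```

Both lookups return the same pages. The difference is the work involved: the linear scan may touch every term, while binary search needs only about log2 of the number of terms, which matters when the index holds millions of words.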
4) Understand the query
-Reducing returned results
-Computers are STUPID!