Research Conducted in Partial Fulfillment of the Requirements for the Degree Bachelor of Arts in Computer Science at The College of Wooster
By Max Rafferty
Advised by Sofia Visa
College of Wooster - 2010
However, sharing is limited by a problem as old as the Internet.
What is wrong with Centralization?
Centralized servers do not scale to usage.
A special class of web cache
BEDIC is a 4-phase algorithm
Note that an individual is working through 2 or 3 central machines just to connect
Keep in mind -
Centralization is sometimes necessary!
Servers subdivide their total available upload bandwidth amongst all their users. If there are many users, each user's portion becomes small.
Central server machines are very expensive, both to buy and to maintain.
Each server can handle only a certain number of users at once. The only way to allow more users is to install new hardware.
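The bandwidth subdivision described above can be illustrated with a minimal sketch. It assumes the server splits its upload capacity evenly among active users, which is a simplification; the function name and the 1000 Mbps figure are hypothetical and chosen only for illustration.

```python
def per_user_bandwidth(total_upload_mbps: float, active_users: int) -> float:
    """Evenly split a server's upload bandwidth among its active users.

    Assumption: an even split. Real servers may allocate unevenly.
    """
    if active_users <= 0:
        raise ValueError("active_users must be positive")
    return total_upload_mbps / active_users


if __name__ == "__main__":
    # Hypothetical server with 1000 Mbps of total upload capacity.
    for users in (10, 100, 1000, 10000):
        share = per_user_bandwidth(1000.0, users)
        print(f"{users:>6} users -> {share:.2f} Mbps each")
```

Each tenfold increase in users cuts every user's share by a factor of ten, which is the scaling problem in miniature.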
A lack of scalability causes problems for three main reasons:
Additional web users currently have a universally negative impact on network performance
Thousands of new users access the Internet daily. Because performance does not scale to the number of users, each one makes Internet access more expensive for everyone.
Consider people trying to access resources across country, from New York to LA and vice versa.
Internet requests are passed along by routing servers (routers) on the way to their destination, and each router has a finite capacity shared by all users passing through it. That means this communication takes bandwidth and can slow or prevent other connections.
However, add a few geographically distributed data centers, and slow and intermediate hubs are relieved of unnecessary transfers. "A chain is only as strong as its weakest link" is true of networks, so short chains are advantageous.
Keep in mind, however: such useful proxies come at a cost!
While there are many free sites available to post content, such sites simply allow users to share for free. There is no guarantee that any centralized service will always be available.
There are many reasons a single server could go down, but all result in inaccessibility of data
<- Proxies and Web Servers
The concept of web caching is simple: Build as many proxy servers as are needed to serve every single user.
P2P doesn't imply decentralized. The infamous Napster program was completely dependent on a central server. Actual content, however, was transferred directly between users.
P2P filesharing has been demonized as a result of its usefulness for internet piracy - which is in turn the result of its extreme efficiency at distribution of content.
P2P filesharing programs essentially make the user's PC a proxy for whatever files they have downloaded. Anyone who downloads the file then has a copy of it, and can share it with others. That way, as more users request the file, there are more users available who have it to share. This is also essentially the definition of a Distributed Web Cache.
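To see why "more users requesting means more users sharing" matters, here is a toy comparison, not a model of any real protocol. It assumes time proceeds in rounds, that every holder of the file can upload exactly one full copy per round, and that a central server uploads a fixed number of copies per round. Both function names are hypothetical.

```python
def rounds_to_serve_central(users: int, uploads_per_round: int) -> int:
    """Rounds a single server needs if it sends `uploads_per_round` copies per round."""
    return -(-users // uploads_per_round)  # ceiling division


def rounds_to_serve_p2p(users: int) -> int:
    """Rounds needed when every holder uploads one copy per round.

    Starts from a single seeder; each user who receives the file
    becomes a holder and uploads in later rounds.
    """
    holders = 1   # the original seeder
    served = 0    # users who have received the file so far
    rounds = 0
    while served < users:
        new_copies = min(holders, users - served)
        served += new_copies
        holders += new_copies
        rounds += 1
    return rounds


if __name__ == "__main__":
    print(rounds_to_serve_central(1000, 1))  # 1000
    print(rounds_to_serve_p2p(1000))         # 10
```

Under these assumptions the number of holders doubles every round, so 1000 users are served in 10 rounds rather than 1000.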
What exactly is a peer, you ask? Looking at our basic chart of the Internet, all of the little PCs (with private IP addresses) are peers. All of the larger shapes are other network components, and communication between them is called host-to-host.
BitTorrent (BT)
A proxy server is a web cache. It copies all the content off of the server it is proxying, and then when users geographically near it request that data, it pretends it is the original server.
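The caching behavior can be sketched in a few lines. Note one deliberate difference from the description above: this sketch fills its cache on demand (the first request for a URL goes to the origin) rather than copying everything up front. The class name, the `fetch_from_origin` callable, and the `origin_requests` counter are all hypothetical and exist only for illustration.

```python
from typing import Callable, Dict


class CachingProxy:
    """A toy web cache: answers repeat requests without contacting the origin."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self._fetch = fetch_from_origin      # how to reach the real server
        self._cache: Dict[str, bytes] = {}   # URL -> cached response body
        self.origin_requests = 0             # how often the origin was contacted

    def get(self, url: str) -> bytes:
        if url not in self._cache:
            # Cache miss: fetch once from the origin and remember the result.
            self.origin_requests += 1
            self._cache[url] = self._fetch(url)
        # Cache hit (or freshly filled): serve the local copy.
        return self._cache[url]
```

However many nearby users request the same URL, the origin server is contacted only once; every later request is answered from the proxy's copy.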
The most popular and advanced P2P filesharing system available today is BT.
When extended, BT is fully decentralized and is also capable of downloading files from a URL, unlike any other P2P system.
BT is immune to slow speeds from slow connections and many users
The only limit on the number of users in BT is net bandwidth
BT maximizes even slow or unreliable connections
BT can operate with only the users sharing files
Files downloaded with BT are impossible to tamper with or corrupt
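The tamper-resistance claim in the list above rests on piece hashing: in the original BitTorrent protocol, a file is split into fixed-size pieces and the torrent metadata carries a SHA-1 hash of each piece, so a downloader can reject any piece that does not match. The sketch below illustrates that check only; the helper names and the tiny piece length are hypothetical, and it is not a torrent parser.

```python
import hashlib
from typing import List


def split_pieces(data: bytes, piece_length: int) -> List[bytes]:
    """Split data into fixed-size pieces (the last piece may be shorter)."""
    return [data[i:i + piece_length] for i in range(0, len(data), piece_length)]


def piece_hashes(data: bytes, piece_length: int) -> List[bytes]:
    """SHA-1 digest of every piece, as a publisher would compute them."""
    return [hashlib.sha1(piece).digest() for piece in split_pieces(data, piece_length)]


def verify_piece(piece: bytes, expected_hash: bytes) -> bool:
    """Accept a downloaded piece only if its SHA-1 digest matches the expected one."""
    return hashlib.sha1(piece).digest() == expected_hash


if __name__ == "__main__":
    original = b"abcdefghij"
    hashes = piece_hashes(original, 4)         # pieces: b"abcd", b"efgh", b"ij"
    print(verify_piece(b"abcd", hashes[0]))    # True: intact piece
    print(verify_piece(b"abcX", hashes[0]))    # False: altered piece is rejected
```

A peer that sends altered or damaged data therefore only costs the downloader a retry of that piece; the bad piece never makes it into the finished file.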
"The Internet", or interconnected networks, refers to all the physical machines and software that make it up. The "World Wide Web" is all of the data that is URL-accessible, or can be found with a browser. This is the data we are concerned with.
It looks like this (lines are links between pages):
----------------------
Wikipedia, Twitter, Digg, YouTube, Flickr... the list of easy ways of sharing data goes on. This presentation was made on one such site, Prezi.com
Every day, trillions of bytes of data are added to the World Wide Web by users - and doing so is only getting easier.
Many people are unaware that everything that comes from the Internet is at least temporarily downloaded to their local machine - there is no such thing as viewing a file "online"
BitTorrent Enhanced Distributed Internet Caching
As expected for large downloads, IE8 performed worst. BEDIC was best at every point, only ever falling behind Vuze when comparing outliers to outliers.
However, because web files must exist on one's computer to view or use them, all web files can potentially be shared from peer to peer using BitTorrent.
Results are almost identical to the 200MB trials; however, this marks the only time Vuze averages a better time than BEDIC, and only at one point. BEDIC's performance remains consistent and good.
By pairing this with BT's ability to download from the web (web-seeding), BEDIC will theoretically always be equal to or better than current browsing techniques.
Our implementation required us to write a full client program to actually download the pages at hand, a BHO extension for Internet Explorer to display downloaded files and a handful of modifications to the libtorrent library used for BT functions.
As expected, Vuze performed poorly with small file sizes, as it was never intended for them. BEDIC maintains superiority when any peer seeds are present, though 10MB appears to be close to the point where IE8 pulls ahead for small files.
Results here are almost identical to the 10MB tests; however, IE8 has just overtaken BEDIC, showing the critical point to be somewhere between 5MB and 10MB. Further small-file optimization in BEDIC could certainly lower this point.
Digital IS Sources
Thanks To:
Laptop Icon- Daniel Clemente. Wikimedia Commons, 13 June 2006. http://commons.wikimedia.org/wiki/File:Laptop_icon.svg