BitTorrent-Enhanced Distributed Internet Caching (BEDIC):

Research Conducted in Partial Fulfillment of the Requirements for the Degree Bachelor of Arts in Computer Science at The College of Wooster

By Max Rafferty

Advised by Sofia Visa

College of Wooster - 2010

Everyone's Web...

However,

sharing is limited by a problem

as old as the Internet.

The Problem

What is wrong with Centralization?

Centralized servers do not scale to usage.

Web Caching

A cache is simply a place where something is stored. In the case of computers, we are storing data.

Peer-to-Peer (P2P)

A special class of web cache

BitTorrent for

web-browsing

BEDIC is a 4-phase algorithm

IMPLEMENTATION

Results

Our experiments here measured raw download time, to see the extent to which multiple peers helped speeds, as well as to test our HTTP/BT hybrid client against a pure BT client with the same seeds. We ran 4 sets of tests with differently sized downloads on Internet Explorer 8, the Vuze BT client, and our BEDIC implementation.

We hypothesized that each program tested would perform well for some specific sizes, with BEDIC landing somewhere in between the other two. As shown by the charts to the left, we were fairly uniformly correct.

P.S. - Don't forget to check the extra pages on the outside of the network for more info about the Internet! Advance the slide once more for sources.

A basic model of the Internet.

Note that an individual is working through 2 or 3 central machines just to connect

SPEED

COST

Keep in mind -

Centralization is sometimes necessary!

Servers subdivide their total available upload bandwidth amongst all their users. If there are many users, each user's portion becomes small.
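The effect is simple division. A minimal sketch, assuming the server splits its upload bandwidth evenly; the function name and the 1000 Mbit/s figure are illustrative assumptions, not measurements from our tests:

```python
def per_user_bandwidth(total_upload_mbps: float, users: int) -> float:
    """Evenly split a server's upload bandwidth among its concurrent users."""
    if users < 1:
        raise ValueError("users must be at least 1")
    return total_upload_mbps / users


# Hypothetical 1000 Mbit/s server link shared by a growing audience.
for n in (1, 10, 100, 1000):
    print(f"{n:>5} users -> {per_user_bandwidth(1000.0, n):.1f} Mbit/s each")
```

Real servers rarely split bandwidth perfectly evenly, but the trend is the same: each user's share shrinks in proportion to the crowd.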

Central server machines are very expensive, both to buy and to maintain.

Each server can handle only a certain number of users at once. The only way to allow more users is to install new hardware.

A lack of scalability causes problems for three main reasons:

1

2

Additional web users currently have a universally negative impact on network performance

A Web Cache Example

Thousands of new users access the Internet daily. Because performance does not scale to the number of users, each one makes Internet access more expensive for everyone.

Consider people trying to access resources across the country, from New York to LA and vice versa.

Internet requests are passed along by routing servers (routers) on the way to their destination, and each server has a fixed capacity shared by all of its users. That means that this communication is taking bandwidth and potentially slowing or preventing other connections.

ACCESS

However, add a few geographically distributed data centers, and slow and intermediate hubs are relieved of unnecessary transfers. "A chain is only as strong as its weakest link" is true of networks, so short chains are advantageous.

Keep in mind, however - Such useful proxies come at a cost!

While there are many free sites available to post content, such sites simply allow users to share for free. There is no guarantee that any centralized service will always be available.

3

There are many reasons a single server could go down, but all result in inaccessibility of data.

A (Temporary) Solution

<- Proxies and Web Servers

The concept of web caching is simple: Build as many proxy servers as are needed to serve every single user.

+

=

P2P doesn't imply decentralized. The infamous Napster program was completely dependent on a central server. Actual content, however, was transferred directly between users.

P2P filesharing has been demonized as a result of its usefulness for internet piracy - which is in turn the result of its extreme efficiency at distribution of content.

P2P filesharing programs essentially make the user's PC a proxy for whatever files they have downloaded. Anyone who downloads the file then has a copy of it, and can share it with others. That way, as more users request the file, there are more users available who have it to share. This is also essentially the definition of a Distributed Web Cache.
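To see why this scales, here is a rough sketch using the standard textbook lower bounds on distribution time for client-server versus P2P delivery. It is an idealized model, not our measured data, and every number below (file size, link speeds) is an assumption chosen for illustration:

```python
def client_server_time(file_mbit: float, n_users: int,
                       server_up: float, min_down: float) -> float:
    """Lower bound (seconds) when the server alone uploads every copy."""
    return max(n_users * file_mbit / server_up, file_mbit / min_down)


def p2p_time(file_mbit: float, n_users: int, server_up: float,
             min_down: float, peer_up: float) -> float:
    """Lower bound (seconds) when every downloader also uploads to others."""
    total_up = server_up + n_users * peer_up  # capacity grows with the crowd
    return max(file_mbit / server_up,
               file_mbit / min_down,
               n_users * file_mbit / total_up)


# Assumed values: 1600 Mbit file (about 200MB), 100 Mbit/s server upload,
# 20 Mbit/s slowest download, 5 Mbit/s upload per peer.
for n in (10, 100, 1000):
    cs = client_server_time(1600, n, 100, 20)
    pp = p2p_time(1600, n, 100, 20, 5)
    print(f"{n:>5} users: client-server >= {cs:.0f}s, P2P >= {pp:.0f}s")
```

In this model the client-server bound grows linearly with the number of users, while the P2P bound levels off because each new user brings upload capacity along.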

What exactly is a peer, you ask? Looking at our basic chart of the Internet, all of the little PCs (with private IP addresses) are peers. All of the larger shapes are other network components, and communication between them is called host-to-host.

BitTorrent (BT)

A proxy server is a web cache. It copies all the content off of the server it is proxying, and then when users geographically near it request that data, it pretends it is the original server.
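The idea can be sketched in a few lines. This is a simplified illustration rather than real proxy software: the class name, the `fake_origin` stand-in, and the URL are all hypothetical, and unlike the description above it copies content on first request instead of copying everything up front:

```python
from typing import Callable, Dict, List


class ProxyCache:
    """Answers requests from a local copy, asking the origin only on a miss."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self._fetch = fetch_from_origin
        self._store: Dict[str, bytes] = {}
        self.hits = 0
        self.misses = 0

    def get(self, url: str) -> bytes:
        if url in self._store:
            self.hits += 1          # served locally, origin is not contacted
            return self._store[url]
        self.misses += 1            # first request: go to the origin server
        body = self._fetch(url)
        self._store[url] = body
        return body


origin_calls: List[str] = []


def fake_origin(url: str) -> bytes:
    """Stand-in for the distant origin server; records each request it gets."""
    origin_calls.append(url)
    return b"<html>hello</html>"


cache = ProxyCache(fake_origin)
first = cache.get("http://example.com/page")
second = cache.get("http://example.com/page")
print(len(origin_calls), cache.hits, cache.misses)
```

The second request never reaches the origin, which is exactly the relief the intermediate hubs get from a nearby proxy.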

Our P2P system of choice

The most popular and advanced P2P filesharing system available today is BT.

When extended, BT is fully decentralized and is also capable of downloading files from a URL, unlike any other P2P system.

Notable BT improvements over proxy caching:

Original (unextended) BitTorrent

Add-ons, or extensions, have made BT what it is today. While it was still very effective in its unextended form, it was a centralized system, and it lacked the ability to communicate with web (HTTP) servers, giving it a greater risk of download failure.

  • Load Balance

  • Scalability

  • Robustness

  • Decentralization

  • Verification

A leech is a peer with none, or only part, of the file it wants.

A seed is a peer with a complete copy of a file.
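One way to picture the two roles is a per-peer list recording which pieces of the file are already held. The sketch below is illustrative only; the function names are our own and not part of any BT library:

```python
from typing import List


def is_seed(have: List[bool]) -> bool:
    """A seed holds every piece of the file."""
    return len(have) > 0 and all(have)


def missing_pieces(have: List[bool]) -> List[int]:
    """Indexes of the pieces a leech still needs to download."""
    return [i for i, held in enumerate(have) if not held]


leech = [True, False, True, False]   # holds pieces 0 and 2 only
seed = [True, True, True, True]      # holds the complete file
print(is_seed(leech), missing_pieces(leech))
print(is_seed(seed), missing_pieces(seed))
```

A leech becomes a seed the moment its list of missing pieces is empty, and it can share the pieces it already holds long before that.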

BT is immune to slow speeds from slow connections and many users

The only limit on the number of users in BT is net bandwidth

BT maximizes even slow or unreliable connections

BT can operate with only the users sharing files

Files downloaded with BT are verified piece by piece, so tampering and corruption are detected
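Verification works because the original BitTorrent format stores a SHA-1 hash for every fixed-size piece of the file, and a client checks each piece it receives against that hash. A minimal sketch using Python's standard `hashlib`; the 4-byte piece length and the sample data are assumptions chosen to keep the demo tiny (real torrents use much larger pieces):

```python
import hashlib
from typing import List


def piece_hashes(data: bytes, piece_length: int) -> List[bytes]:
    """Split data into fixed-size pieces and return the SHA-1 digest of each."""
    return [hashlib.sha1(data[i:i + piece_length]).digest()
            for i in range(0, len(data), piece_length)]


def verify_piece(index: int, piece: bytes, expected: List[bytes]) -> bool:
    """Accept a received piece only if its digest matches the published one."""
    return hashlib.sha1(piece).digest() == expected[index]


original = b"hello, BEDIC"
expected = piece_hashes(original, 4)          # published with the torrent

print(verify_piece(0, b"hell", expected))     # genuine first piece
print(verify_piece(0, b"h3ll", expected))     # altered in transit
```

A piece that fails the check is discarded and requested again, possibly from a different peer.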

A web cache is a cache that exclusively stores data relating to the Internet

- very often web pages

"The Internet", or interconnected networks, refers to all the physical machines and software that make it up. The "World Wide Web" is all of the data that is URL-accessible, or can be found with a browser. This is the data we are concerned with.

It looks like this (lines are links between pages):

----------------------

The issue of

Centralization

(1-way data from a central server to everyone else)

Facebook

Wikipedia, Twitter,

Digg, YouTube, Flickr... the list of easy ways of sharing data goes on. This presentation was made

on one such site, Prezi.com

Every day, trillions of bytes of data are added to the World Wide Web by users - and doing so is only getting easier.

Web

Server

Applying Peer-to-Peer Protocols to Enhance Performance and Scalability of Browser-Accessible Internet Resources

This presentation is also represented by a content sharing P2P network.

It represents how the World Wide Web could work were BEDIC applied widely.

We call it...

BEDIC

200MB

Many people are unaware that everything that comes from the Internet is at least temporarily downloaded to their local machine - there is no such thing as viewing a file "online"

BitTorrent Enhanced Distributed Internet Caching

As expected for large downloads, IE8 performed worst. BEDIC was best at every point, falling behind Vuze only when comparing outliers to outliers.

72MB

However, because web files must exist on one's computer to view or use them, all web files can potentially be shared from peer to peer using BitTorrent.

Results were almost identical to the 200MB trials; however, this marks the only time Vuze averaged a better time than BEDIC, and only at one point. BEDIC's performance is consistent and good.

By pairing this with BT's ability to download from the web (web-seeding), BEDIC will theoretically always be equal to or better than current browsing techniques.
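The fallback logic can be sketched as follows: take a piece from a peer when one has it, otherwise request the same bytes from the web server with a standard HTTP `Range` header. This is a hedged illustration, not our client code and not the libtorrent API; the function names, peer address, and URL are hypothetical:

```python
from typing import Dict, List, Tuple


def piece_byte_range(index: int, piece_length: int,
                     file_size: int) -> Tuple[int, int]:
    """Inclusive first and last byte offsets of a piece within the file."""
    start = index * piece_length
    if not 0 <= start < file_size:
        raise IndexError("piece index out of range")
    end = min(start + piece_length, file_size) - 1  # last piece may be short
    return start, end


def plan_piece(index: int, piece_length: int, file_size: int,
               peers_with_piece: List[str],
               web_seed_url: str) -> Dict[str, str]:
    """Prefer a peer; fall back to an HTTP range request to the web server."""
    if peers_with_piece:
        return {"source": "peer", "peer": peers_with_piece[0]}
    start, end = piece_byte_range(index, piece_length, file_size)
    return {"source": "http", "url": web_seed_url,
            "Range": f"bytes={start}-{end}"}


url = "http://example.com/file.bin"
print(plan_piece(0, 4, 10, ["10.0.0.5:6881"], url))
print(plan_piece(2, 4, 10, [], url))
```

Because the web server is always available as a last resort, a download with no peers at all degrades to ordinary HTTP rather than failing.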

10MB

Our implementation required us to write a full client program to actually download the pages at hand, a Browser Helper Object (BHO) extension for Internet Explorer to display downloaded files, and a handful of modifications to the libtorrent library used for BT functions.

As expected, Vuze performed poorly with small filesizes, as that was never what it was intended for. BEDIC maintains superiority when any peer seeds are present, though 10MB appears to be close to the point where IE8 will move up for small files.

The fact that even our simple BEDIC implementation was able to consistently match and beat heavily developed programs like IE8 and Vuze speaks to the effectiveness of the system.

5MB

Results here are almost identical to the 10MB tests; however, IE8 has just overtaken BEDIC, showing the critical point to be somewhere between 5MB and 10MB. Further small-file optimization in BEDIC could certainly lower this point.

Digital IS Sources

Thanks To:

  • Laptop Icon- Daniel Clemente. Wikimedia Commons, 13 June 2006. http://commons.wikimedia.org/wiki/File:Laptop_icon.svg

  • Cursor Source- Meßelektronik "Otto Schön" Dresden. http://commons.wikimedia.org/wiki/File:Robotron-KC87_Z9001-OS.gif
  • 4Chan Face- User: Nicoxan1. Photobucket, 2010. URL http://media.photobucket.com/image/4chan%20face/nicoxan1/4chan-happy-face.png
  • File Icon- Andrew Enyart, Wikimedia Commons, 23 April 2006. http://commons.wikimedia.org/wiki/File:38254-new_folder-12.svg
  • Dead Face Icon- User: Cäsium137 (T.). Wikimedia Commons, 6 June 2009. http://commons.wikimedia.org/wiki/File:SMirC-dead.svg
  • Jon Sullivan. Explosions. Wikimedia Commons, 13 Nov 2004. http://commons.wikimedia.org/wiki/File:Explosions.jpg
  • Basic Internet Model – Bryan Ford, Pyda Srisuresh, and Dan Kegel. Peer-to-Peer Communication Across Network Address Translators. In USENIX Annual Technical Conference, pages 179-192, 2005.
  • Generic Web Cache- Jia Wang. A Survey of Web Caching Schemes for the Internet. ACM Computer Communications Review, 29:36-46, 1999.
  • Napster Chart – Jeff Tyson. How the Old Napster Worked. World Wide Web, 2000.
  • Arvid Norberg. Libtorrent. World Wide Web, 2005. URL http://www.rasterbar.com/products/libtorrent
  • John Sudds. Building Browser Helper Objects with Visual Studio 2005. World Wide Web, Oct 2006.
  • Vuze.com
  • Facebook.com
  • Wikipedia.org
  • wikimedia.org
  • Digg.com
  • YouTube.com
  • Flickr.com
  • Prezi.com