BU Bingle

A simple P2P Search Engine for the BU Website

Prashant Vaidyanathan

on 30 April 2013

Transcript of BU Bingle

[Architecture diagram: Crawling List, URLs, Crawlers, Internet, Updaters]

BU - Bingle
A simple P2P Search Engine for the Boston University Website.

Yi Chen
Heng Du
William Chapin
Prashant Vaidyanathan

Topics: Parsing & Searching, Data Transaction, Searching Peers

What is peer-to-peer?
• Fully distributed
• No central server/controller
• Robust
• Cost effective
• Powerful

Finding Peers
• A new peer periodically broadcasts a codeword to the local subnetwork (TCP).
• If a peer already exists, it receives the codeword and the greeting protocol is initiated. If not, the new peer listens.
• The newcomer is given the existing peer list and a list of unindexed URLs to begin crawling.
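The greeting step can be sketched in Java from the existing peer's side. This is a minimal in-memory sketch under stated assumptions: the codeword value `BU-BINGLE-HELLO`, the classes `GreetingPeer` and `JoinReply`, and the rule that the existing peer hands over up to `maxUrls` of its own unindexed URLs are all hypothetical, and the TCP transport is left out.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical reply sent to a newcomer after its codeword is accepted. */
final class JoinReply {
    final List<String> peers;        // addresses the newcomer should know about
    final List<String> urlsToCrawl;  // unindexed URLs handed over to the newcomer

    JoinReply(List<String> peers, List<String> urlsToCrawl) {
        this.peers = peers;
        this.urlsToCrawl = urlsToCrawl;
    }
}

/** Existing peer's side of the greeting protocol (in memory; no sockets). */
final class GreetingPeer {
    static final String CODEWORD = "BU-BINGLE-HELLO"; // hypothetical value

    private final String selfAddress;
    private final List<String> knownPeers = new ArrayList<>();
    private final Deque<String> unindexedUrls = new ArrayDeque<>();

    GreetingPeer(String selfAddress, List<String> seedUrls) {
        this.selfAddress = selfAddress;
        this.unindexedUrls.addAll(seedUrls);
    }

    /** Returns null when the codeword does not match, so strangers are ignored. */
    synchronized JoinReply onGreeting(String codeword, String newcomer, int maxUrls) {
        if (!CODEWORD.equals(codeword)) {
            return null;
        }
        List<String> peers = new ArrayList<>(knownPeers);
        peers.add(selfAddress); // the newcomer also learns about this peer
        List<String> batch = new ArrayList<>();
        while (batch.size() < maxUrls && !unindexedUrls.isEmpty()) {
            batch.add(unindexedUrls.pollFirst()); // assumption: hand over part of our own backlog
        }
        knownPeers.add(newcomer);
        return new JoinReply(peers, batch);
    }

    synchronized List<String> peers() {
        return new ArrayList<>(knownPeers);
    }

    synchronized int pendingUrls() {
        return unindexedUrls.size();
    }
}
```

A greeting with the wrong codeword returns `null`; a valid one returns the current peer list (including the answering peer) and a batch of URLs the newcomer can start crawling.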

Parsing
HTML tags were searched for, and the relevant data in each was extracted using string manipulation. The Parser also controlled the continuity of the crawler.

Searching Peers
• When a query is submitted, it is sent to all peers in the subnetwork.
• Each peer runs the query locally and returns the results to the searcher.
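A minimal sketch of the query fan-out, assuming each peer can be modelled as a Java `Function<String, List<String>>` that runs the query against its local data and returns matching URLs. `PeerFanOut` and `queryAll` are hypothetical names; the real system sent queries over the network, and treating a failing peer as returning no results is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

final class PeerFanOut {
    /**
     * Sends the query to every peer at once and gathers one result list per peer, in peer order.
     * A peer is modelled as a function from query to matching URLs (an assumption).
     */
    static List<List<String>> queryAll(List<Function<String, List<String>>> peers, String query) {
        List<List<String>> results = new ArrayList<>();
        if (peers.isEmpty()) {
            return results;
        }
        ExecutorService pool = Executors.newFixedThreadPool(peers.size());
        try {
            List<Future<List<String>>> pending = new ArrayList<>();
            for (Function<String, List<String>> peer : peers) {
                pending.add(pool.submit(() -> peer.apply(query))); // each peer runs the query locally
            }
            for (Future<List<String>> f : pending) {
                try {
                    results.add(f.get());
                } catch (ExecutionException e) {
                    results.add(List.of()); // a failing peer contributes no results
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                    results.add(List.of());
                }
            }
        } finally {
            pool.shutdownNow();
        }
        return results;
    }
}
```

Because every peer is queried concurrently, the searcher waits roughly as long as the slowest peer rather than the sum of all peers.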

Each site yielded new URLs, which were added to the Crawler List.
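The way new URLs keep the crawler going can be sketched as a crawling list plus a set of already-seen URLs. In this hypothetical `CrawlFrontier`, links are pulled out with plain `indexOf`/`substring` calls, in the spirit of the string-manipulation parsing described above; the scope prefix and the choice to mark URLs as seen when they are queued (rather than only after crawling, as the `indexedList` table does) are assumptions.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class CrawlFrontier {
    private final String allowedPrefix;                // hypothetical scope filter, e.g. the BU site
    private final Deque<String> crawlingList = new ArrayDeque<>();
    private final Set<String> seen = new HashSet<>();  // URLs already crawled or queued

    CrawlFrontier(String allowedPrefix) {
        this.allowedPrefix = allowedPrefix;
    }

    /** Pulls every href="..." value out of the page with indexOf/substring, no HTML parser. */
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        String marker = "href=\"";
        int from = 0;
        while (true) {
            int start = html.indexOf(marker, from);
            if (start < 0) {
                break;
            }
            start += marker.length();
            int end = html.indexOf('"', start);
            if (end < 0) {
                break; // unterminated attribute: stop scanning
            }
            links.add(html.substring(start, end));
            from = end + 1;
        }
        return links;
    }

    /** Queues a URL if it is in scope and has not been seen before; returns true if queued. */
    boolean offer(String url) {
        if (!url.startsWith(allowedPrefix) || !seen.add(url)) {
            return false;
        }
        crawlingList.addLast(url);
        return true;
    }

    /** Feeds the links found on a crawled page back into the crawling list. */
    int addLinksFrom(String html) {
        int added = 0;
        for (String link : extractLinks(html)) {
            if (offer(link)) {
                added++;
            }
        }
        return added;
    }

    /** Next URL to crawl, or null when the crawling list is empty. */
    String next() {
        return crawlingList.pollFirst();
    }
}
```

Calling `next()` in a loop, fetching each page, and passing its HTML to `addLinksFrom` keeps crawling until the list runs dry. `extractLinks` only recognises double-quoted `href` attributes, a limitation of plain string search.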

The Parser computed the frequency of each word on a website.
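A rough sketch of the per-tag extraction and word counting, using only string search as the slides describe. `TagWordCounter` is a hypothetical name; it only matches bare tags such as `<p>` (not `<p class="...">`), and, following the Data Structures note that both keywords and their lower-case versions were kept, it counts each word as written and, when different, in lower case as well.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class TagWordCounter {
    /** Returns the text between every <tag>...</tag> pair, found with indexOf (no HTML parser). */
    static List<String> textInside(String html, String tag) {
        List<String> out = new ArrayList<>();
        String open = "<" + tag + ">";
        String close = "</" + tag + ">";
        int from = 0;
        while (true) {
            int start = html.indexOf(open, from);
            if (start < 0) {
                break;
            }
            start += open.length();
            int end = html.indexOf(close, start);
            if (end < 0) {
                break; // unclosed tag: stop scanning
            }
            out.add(html.substring(start, end));
            from = end + close.length();
        }
        return out;
    }

    /** Counts words, recording both the word as written and its lower-case form. */
    static Map<String, Integer> wordFrequency(List<String> texts) {
        Map<String, Integer> freq = new HashMap<>();
        for (String text : texts) {
            for (String word : text.split("\\W+")) {
                if (word.isEmpty()) {
                    continue; // split can yield a leading empty token
                }
                freq.merge(word, 1, Integer::sum);
                String lower = word.toLowerCase(Locale.ROOT);
                if (!lower.equals(word)) {
                    freq.merge(lower, 1, Integer::sum);
                }
            }
        }
        return freq;
    }
}
```

For example, `wordFrequency(textInside(html, "p"))` gives paragraph word counts that could then be stored per page.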

Searching
The search algorithm sent queries to the SQL database and ranked the results by relevance.
• This method is simple and works for our small application.
• If the system needed to support a large number of peers, a graph representation would likely be used.

HTML SOURCE CODE
<p> Paragraphs </p>
<img src="Image Link">
<a href="Hyperlinks">
<title>Title</title>
<h> Headers </h>

[Diagram: Crawler List, Words Frequency, DataBase]

Implementation
Implementation of tables

Data Structures & Main Methods
• Update = delete + add
• Both keywords and their lower-case versions are stored
• A hashmap is used to avoid duplicates
• Connection pool: a singleton for multi-threading, with a synchronized list of connections

Features

Future Scope
• Implement Huffman coding to store the data.

• Improve and increase search results for images.
• Try to decrease search time further.
• Find peers across the broader internet using the Chord data structure.
• Release this software as a domain-based P2P search engine.

A MySQL database is used to store and retrieve data:
• Open source
• Cross-platform operability
• Indexing type: B-tree
• Basic storage engines: MyISAM & InnoDB
• Time complexity O(log n) for both insert and search

Challenges
• Java!!!
• GUI development
• Implementing SQL
• Debugging cross-platform issues

Two tables in the database:
• data: the main table, storing all crawled information
• indexedList: contains URLs that have already been crawled; it is dropped every time the program restarts

[Screenshots: Main Search GUI, Peer Controller Panel; diagram: Crawlers, Internet, Updaters, URLs]

The search function sends a query to:
1) The local database
2) All available peers
The intermediate results of the database queries are stored in a hashmap, and the final results are returned as a list.
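The combining step can be sketched with a `LinkedHashMap` as the intermediate hashmap, keyed by URL so that a page returned by both the local database and a peer (or by several peers) appears only once, in line with the Data Structures note about using a hashmap to avoid duplicates. The class name `SearchFunction`, the `combine` method, the URL key, and the first-seen ordering of the returned list are assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class SearchFunction {
    /**
     * Combines hits from the local database and from every peer into one list.
     * The map (URL -> first source that reported it) holds the intermediate results,
     * so a URL reported by several sources is kept once, in the order first seen.
     */
    static List<String> combine(List<String> localHits, List<List<String>> peerHits) {
        Map<String, String> intermediate = new LinkedHashMap<>();
        for (String url : localHits) {
            intermediate.putIfAbsent(url, "local");
        }
        for (int i = 0; i < peerHits.size(); i++) {
            for (String url : peerHits.get(i)) {
                intermediate.putIfAbsent(url, "peer-" + i);
            }
        }
        return new ArrayList<>(intermediate.keySet()); // final results as a list
    }
}
```

Local hits come first, so when the same URL is also stored by a peer, the local copy is the one recorded in the map.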

This list is passed to a function that checks for occurrences of the word and assigns a rank based on the following parameters:
i) +1000 if the word appears in the title of the page
ii) +100 for every occurrence of the word in a header
iii) +10 for every occurrence of the word in a paragraph

• The time to find peers grows linearly with the number of nodes in the subnetwork.

Demo
[Demo screenshots: Crawlers, Data Panel, URLs, Image Search, Crawling List, Database]
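The ranking rules (+1000 for a title match, +100 per header occurrence, +10 per paragraph occurrence) can be sketched as a scoring function. This is a minimal sketch: the `Ranker.Page` fields, whole-word case-insensitive matching, and keeping zero-score pages in the output are assumptions, not details given in the presentation.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

final class Ranker {
    /** Text of one crawled page, split the way the parser extracted it (hypothetical shape). */
    static final class Page {
        final String url;
        final String title;
        final List<String> headers;
        final List<String> paragraphs;

        Page(String url, String title, List<String> headers, List<String> paragraphs) {
            this.url = url;
            this.title = title;
            this.headers = headers;
            this.paragraphs = paragraphs;
        }
    }

    /** Counts whole-word, case-insensitive occurrences of a lower-case word in text. */
    static int occurrences(String text, String lowerWord) {
        int n = 0;
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (token.equals(lowerWord)) {
                n++;
            }
        }
        return n;
    }

    /** +1000 if the word is in the title, +100 per header occurrence, +10 per paragraph occurrence. */
    static int score(Page page, String word) {
        String w = word.toLowerCase(Locale.ROOT);
        int score = occurrences(page.title, w) > 0 ? 1000 : 0;
        for (String header : page.headers) {
            score += 100 * occurrences(header, w);
        }
        for (String paragraph : page.paragraphs) {
            score += 10 * occurrences(paragraph, w);
        }
        return score;
    }

    /** Returns a copy of the pages ordered from highest to lowest score. */
    static List<Page> rank(List<Page> pages, String word) {
        List<Page> sorted = new ArrayList<>(pages);
        sorted.sort(Comparator.comparingInt((Page p) -> score(p, word)).reversed());
        return sorted;
    }
}
```

For a page whose title contains the word, with one header occurrence and two paragraph occurrences, `score` returns 1000 + 100 + 20 = 1120. Because the title bonus is a flat +1000, a single title match outweighs up to nine header occurrences.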