Transcript of BU Bingle
BU - Bingle
Prashant Vaidyanathan
Parsing & Searching
Data Transaction
Searching Peers
• A new peer periodically broadcasts a codeword to the local sub-network (TCP).
• If there is an existing peer, the codeword will be received and the greeting protocol initiated. If not, the peer will listen.
• The newcomer will be given the existing peer list and a list of unindexed URLs to begin crawling.
• When a query is submitted, it is sent to all peers in the sub-network.
• Each peer runs the query locally and returns the results to the searcher.
Parsing
HTML tags were searched for, and the relevant data was extracted.
Each of these was extracted using string manipulation.
Parser
The parser controlled the continuity of the crawler.
Each site yielded new URLs, which were added to the crawler list.
The parser would compute the frequency of each word in a website.
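A minimal sketch of the parsing steps described above: tag contents and hyperlinks pulled out with plain string manipulation, then a word-frequency count. The class name `SimpleParser` and its method names are hypothetical, not taken from the project, and the sketch assumes lower-case tags and double-quoted `href` values.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not the project's actual parser.
final class SimpleParser {

    // Returns the text between each <tag ...> and </tag> pair, located with indexOf only.
    static List<String> extractTagText(String html, String tag) {
        List<String> out = new ArrayList<>();
        String open = "<" + tag;
        String close = "</" + tag + ">";
        int pos = 0;
        while (true) {
            int start = html.indexOf(open, pos);
            if (start < 0) break;
            int after = start + open.length();
            char next = after < html.length() ? html.charAt(after) : ' ';
            // Skip longer tag names that merely share a prefix, e.g. <pre> when looking for <p>.
            if (next != '>' && !Character.isWhitespace(next)) {
                pos = after;
                continue;
            }
            int contentStart = html.indexOf('>', start);
            if (contentStart < 0) break;
            int end = html.indexOf(close, contentStart);
            if (end < 0) break;
            out.add(html.substring(contentStart + 1, end).trim());
            pos = end + close.length();
        }
        return out;
    }

    // Collects every href="..." value; these are the new URLs for the crawler list.
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        String marker = "href=\"";
        int pos = 0;
        while ((pos = html.indexOf(marker, pos)) >= 0) {
            int start = pos + marker.length();
            int end = html.indexOf('"', start);
            if (end < 0) break;
            links.add(html.substring(start, end));
            pos = end + 1;
        }
        return links;
    }

    // Counts how often each lower-cased word occurs in the given text.
    static Map<String, Integer> wordFrequency(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!word.isEmpty()) counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String html = "<title>Bingle</title><p>Peer to peer search</p>"
                + "<a href=\"http://example.com/a\">peer link</a>";
        System.out.println(extractTagText(html, "title"));
        System.out.println(extractLinks(html));
        System.out.println(wordFrequency(String.join(" ", extractTagText(html, "p"))));
    }
}
```

Plain `indexOf` scanning is fragile on malformed or nested HTML; it is used here only because the slides mention string manipulation.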
The search algorithm would send queries to the SQL database and rank the results based on relevance.
• This method of searching peers is simple and works for our small application.
• If the system needed to support a large number of peers, a graph representation would likely be used.
HTML Source Code
<p>Paragraphs</p>
<img src="Image Link"/>
<a href="Hyperlinks"/>
<title>Title</title>
<h>Headers</h>
Crawler List
Words
Frequency
Database Implementation
Searching
Implementation of tables
Data Structures & Main Methods
Update
delete + add
Both key words and the lower case version
Use hashmap to avoid duplicates
Singleton for multi-thread
Synchronized list of connections
Features
Future Scope
Implement Huffman coding to store the data
Improve and increase search results for Images.
Try to decrease search time further.
Find Peers across broader internet - Chord data structure.
Release this software as a domain-based P2P search engine.
Use MySQL database to store and retrieve
Cross-platform operability
Indexing type: B-tree
Basic storage engines: MyISAM & InnoDB
Time complexity O(log n) for both insert and search
Challenges
Java!!!
Debugging cross-platform issues
Q & A
Two tables in the database
data: the main table to store all crawled information
indexedList: contains urls that have already been crawled
will be dropped every time the program is restarted
[Diagram labels: Main Search GUI Peer Controller Panel; Crawler; Internet; Updater; url]
The search function sends a query to:
1) The local database
2) All available peers
The intermediate search results from the database are stored in a hashmap, and the final results are returned as a list.
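The two-step search above can be sketched as follows. This is an illustration only: the class `FanOutSearch`, the use of `Function` to stand in for the local database and for each peer, and the choice to key the intermediate map by URL are all assumptions. A `LinkedHashMap` is used in place of a plain hashmap so the result order is predictable.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: each source maps a query word to the URLs that match it.
final class FanOutSearch {

    static List<String> search(String query,
                               Function<String, List<String>> localDatabase,
                               List<Function<String, List<String>>> peers) {
        List<Function<String, List<String>>> sources = new ArrayList<>();
        sources.add(localDatabase); // 1) the local database
        sources.addAll(peers);      // 2) all available peers

        // Intermediate results: URL -> number of sources that reported it.
        Map<String, Integer> hitsByUrl = new LinkedHashMap<>();
        for (Function<String, List<String>> source : sources) {
            List<String> urls;
            try {
                urls = source.apply(query);
            } catch (RuntimeException unreachable) {
                continue; // assumption: a peer that fails to answer is skipped
            }
            for (String url : urls) {
                hitsByUrl.merge(url, 1, Integer::sum);
            }
        }
        // Final results are returned as a list with duplicates removed.
        return new ArrayList<>(hitsByUrl.keySet());
    }

    public static void main(String[] args) {
        Function<String, List<String>> local = q -> List.of("http://a.example/");
        Function<String, List<String>> peer =
                q -> List.of("http://a.example/", "http://b.example/");
        System.out.println(search("bingle", local, List.of(peer)));
    }
}
```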
This list of results is passed to a function that checks for occurrences of the word and assigns a rank based on the following parameters:
i) +1000: if the word appears in the title of the page
ii) +100: for every time the word occurs in a header
iii) +10: for every time the word occurs in a paragraph
• The time to find peers grows linearly with the number of nodes in the sub-network.
Demo
[Diagram labels: Crawler; Data Panel; url; Image Search; Crawling List; Database]
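The ranking weights listed above (+1000 for the title, +100 per header occurrence, +10 per paragraph occurrence) can be sketched as below. The class `PageRanker` and its signature are hypothetical, and matching whole lower-cased words is an assumption about how occurrences were counted.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of the scoring rule, not the project's actual function.
final class PageRanker {
    static final int TITLE_WEIGHT = 1000;
    static final int HEADER_WEIGHT = 100;
    static final int PARAGRAPH_WEIGHT = 10;

    // Counts whole-word, case-insensitive occurrences of word in text.
    static int countOccurrences(String text, String word) {
        String target = word.toLowerCase(Locale.ROOT);
        int count = 0;
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (token.equals(target)) count++;
        }
        return count;
    }

    static int rank(String word, String title, List<String> headers, List<String> paragraphs) {
        int score = 0;
        // The title bonus is awarded once, however often the word appears there.
        if (countOccurrences(title, word) > 0) score += TITLE_WEIGHT;
        for (String header : headers) {
            score += HEADER_WEIGHT * countOccurrences(header, word);
        }
        for (String paragraph : paragraphs) {
            score += PARAGRAPH_WEIGHT * countOccurrences(paragraph, word);
        }
        return score;
    }

    public static void main(String[] args) {
        int score = rank("peer", "Peer search",
                List.of("Peer to peer", "About"),
                List.of("A peer finds a peer.", "No match"));
        System.out.println(score); // prints 1220 = 1000 + 2*100 + 2*10
    }
}
```

Results would then be sorted by this score in descending order before being shown to the searcher.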