Search Engine for the World Wide Web


The project is divided into four components:

Web crawler: This module crawls the World Wide Web and extracts meta information from each web page that describes what the page is about. The crawler then stores this information in an inverted-index database and assigns an initial page rank to each page.
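As a rough illustration of the storage step, here is a minimal in-memory sketch of an inverted index with an initial rank per page. The names `InvertedIndex`, `add_page`, `tokenize`, and `INITIAL_RANK`, the starting value of 1.0, and the example URLs are all assumptions made for this sketch; the project does not specify them, and a real crawler would also need fetching, parsing, and persistent storage.

```python
from collections import defaultdict
import re

INITIAL_RANK = 1.0  # assumed starting score; the project does not state a value


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    """Hypothetical in-memory stand-in for the crawler's database."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of URLs containing it
        self.rank = {}                    # URL -> current page rank

    def add_page(self, url, meta_text):
        """Index a crawled page's meta text and give it an initial rank."""
        for term in tokenize(meta_text):
            self.postings[term].add(url)
        # Keep an existing rank if the page was already crawled before.
        self.rank.setdefault(url, INITIAL_RANK)


index = InvertedIndex()
index.add_page("https://example.com/a", "Python web crawler tutorial")
index.add_page("https://example.com/b", "Web search basics")
print(sorted(index.postings["web"]))
```

Mapping each term to the set of pages that contain it lets a query term be resolved with a single dictionary lookup instead of a scan over every page.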

Search algorithm: This module takes the search query as input and returns related web pages as output. It parses the query, extracts signals about what the user might be looking for, and uses these signals to retrieve results from the database indexed by the crawler. After each return, the search algorithm updates the page rank of each web page.
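The query flow described here could be sketched as follows. This is only one possible reading: the stop-word filter stands in for "extracting signals", the ordering by number of matched terms is an assumed scoring rule, and the fixed `RANK_BOOST` applied to returned pages is an assumed update rule, since the project does not say how ranks change. The `postings` and `rank` dictionaries are hypothetical sample data.

```python
import re

# Hypothetical data in the shape a crawler might have produced.
postings = {
    "web": {"a.example", "b.example"},
    "crawler": {"a.example"},
    "search": {"b.example"},
}
rank = {"a.example": 1.0, "b.example": 1.0}

STOP_WORDS = {"a", "an", "the", "for", "of", "to"}
RANK_BOOST = 0.1  # assumed update rule: returned pages gain a small boost


def extract_signals(query):
    """Reduce the query to lowercase terms, dropping common stop words."""
    terms = re.findall(r"[a-z0-9]+", query.lower())
    return [t for t in terms if t not in STOP_WORDS]


def search(query):
    """Return matching URLs, best first, then update their ranks."""
    matches = {}  # URL -> number of query terms the page contains
    for term in extract_signals(query):
        for url in postings.get(term, ()):
            matches[url] = matches.get(url, 0) + 1
    # Order by matched terms, then by rank, then by URL for a stable result.
    results = sorted(matches, key=lambda u: (-matches[u], -rank[u], u))
    for url in results:
        rank[url] += RANK_BOOST
    return results


print(search("the web crawler"))
```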

Autocomplete algorithm: This service is called every time the user types a character in the query input. Initially, it holds the entire English dictionary in a tree-based data structure called a trie. The frequency stored at each node is the number of words ending at that node. After each search query is successfully executed, the query is added to the data structure for future reference.
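A minimal sketch of such a trie is shown below. The class and method names (`TrieNode`, `Trie`, `insert`, `suggest`), the `limit` parameter, and the choice to order suggestions by frequency are assumptions for illustration. Loading the English dictionary is replaced here by a few sample words.

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # character -> child TrieNode
        self.frequency = 0   # number of inserted words/queries ending here


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        """Add a word (or an executed query) and bump its end-node frequency."""
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.frequency += 1

    def suggest(self, prefix, limit=5):
        """Return up to `limit` completions of `prefix`, most frequent first."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []  # no stored word starts with this prefix
        found = []
        stack = [(node, prefix)]
        while stack:  # iterative depth-first walk of the prefix's subtree
            current, text = stack.pop()
            if current.frequency > 0:
                found.append((text, current.frequency))
            for ch, child in current.children.items():
                stack.append((child, text + ch))
        found.sort(key=lambda item: (-item[1], item[0]))
        return [text for text, _ in found[:limit]]


trie = Trie()
for word in ["search", "sea", "seal", "search"]:
    trie.insert(word)
print(trie.suggest("sea"))
```

Because every executed query is inserted again, queries that are issued often accumulate a higher frequency and move toward the top of the suggestions.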

Front end: This module interacts with the user: it takes the search query from the user and displays the related web pages. It has two important tasks: to call the autocomplete service every time the user types a character, and to pass the submitted query to the search algorithm.