How a Search Engine Collects Information

Every time you search for a word, term or phrase, the search engine compares it to records it has already stored in its database. Its aim is to find the most relevant results to match the keywords that have just left your fingers.

Google is the world’s most popular search engine, so I will describe how Google works rather than how “a search engine” works. OK, let’s kick it off.

The Google spider is constantly crawling the web: indexing content, recording links between sites, and noting what names are given to pages, how those names relate to the content within the pages, and in turn how the pages relate to the overall theme of the website. The word “relevant” is very important here. Google wants to return the results that are most relevant to the search phrase used.

The spider has two main tasks: Gather and Record.


  • Gather (Crawl)            The spider crawls its way from site to site, through pages and pages, looking for content and links. These links can be internal links that connect to other pages within the site that the spider is currently on, or they can be links out to another site.


  • Record (Index)            We all know that trawling through paper files for specific information is far easier if we have a system of reference, or index. We also know how important it is to be able to trust the index to help us find what we are looking for.
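To make the two tasks concrete, here is a minimal sketch in Python. It is only an illustration, not how Google's spider is actually built: the three-page "web" (TOY_WEB), the page names and the functions crawl and build_index are all made up for this example, and a real spider would fetch pages over the network instead of reading them from a dictionary.

```python
from collections import deque

# Hypothetical in-memory "web": page name -> (page text, outgoing links).
TOY_WEB = {
    "site-a/home": ("welcome to our soccer club", ["site-a/fixtures", "site-b/home"]),
    "site-a/fixtures": ("soccer fixtures for this season", ["site-a/home"]),
    "site-b/home": ("local groundcare and landscaping", []),
}

def crawl(start):
    """Gather: follow links from page to page, visiting each page once."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        url = queue.popleft()
        order.append(url)
        _, links = TOY_WEB[url]
        for link in links:
            # Internal and external links are followed in the same way.
            if link in TOY_WEB and link not in seen:
                seen.add(link)
                queue.append(link)
    return order

def build_index(urls):
    """Record: map each word to the set of pages that contain it."""
    index = {}
    for url in urls:
        text, _ = TOY_WEB[url]
        for word in text.split():
            index.setdefault(word, set()).add(url)
    return index

pages = crawl("site-a/home")
index = build_index(pages)
print(pages)
print(sorted(index["soccer"]))
```

Looking up a word in the index returns every page that mentions it without re-reading the pages themselves, which is the same reason a filing index saves time.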


Google’s continued success rides on its ability to return relevant results.

Imagine the frustration a school secretary would feel if she went looking for the file containing the names of the members of this year’s soccer panel, only to find that the Sports/soccer section held a document about some recent landscaping done on the school grounds. It gets worse: when she goes to place this document where it belongs, in the Contractors/local/groundcare section, she finds the school policy document on bullying.

If this sort of thing were to happen every time you used Google, you’d most likely stop using it.

Once gathered, what information does Google record?


  • The content and structure of each page


  • All pages linked to from this page, both internal (within the site) and external (to another site)


  • All links that come into this page


  • HTML tags (page title, headings, text describing images etc.)
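As a rough illustration of what such a record might look like, the sketch below pulls the title, headings, outgoing links and image descriptions out of a single page using Python's standard html.parser module. The class name PageRecorder, the layout of the record and the sample page are assumptions made for this example; what Google actually stores is not public. Links coming into a page cannot be read from the page itself, so they would have to be worked out by comparing the records of many pages.

```python
from html.parser import HTMLParser

class PageRecorder(HTMLParser):
    """Collects the title, headings, link targets and image alt text of one page."""

    def __init__(self):
        super().__init__()
        self.record = {"title": "", "headings": [], "links": [], "image_alt": []}
        self._current = None  # tag whose text is currently being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("title", "h1", "h2", "h3"):
            self._current = tag
        elif tag == "a" and attrs.get("href"):
            self.record["links"].append(attrs["href"])
        elif tag == "img" and attrs.get("alt"):
            self.record["image_alt"].append(attrs["alt"])

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._current == "title":
            self.record["title"] = text
        elif self._current in ("h1", "h2", "h3"):
            self.record["headings"].append(text)

# A made-up page used only to exercise the recorder.
SAMPLE_HTML = """
<html><head><title>School Soccer Panel</title></head>
<body>
<h1>This Year's Panel</h1>
<a href="/fixtures">Fixtures</a>
<a href="https://example.com/league">League table</a>
<img src="team.jpg" alt="Team photo">
</body></html>
"""

recorder = PageRecorder()
recorder.feed(SAMPLE_HTML)
print(recorder.record)
```

In this sample, "/fixtures" is an internal link and the example.com address is an external one.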


Once recorded, what does Google do with this information?


Google applies its own algorithms, or formulae, to the records. These are sets of calculations used to determine just how relevant a website’s content is to a particular search term. The algorithms are sophisticated, complicated and very secret. Each of the major search engines has its own algorithm, and this is essentially what makes them different from each other.

The main factors that Google considers in forming its decision on relevancy are:

  • The number and authority of related websites that link to the page


  • The relevance of the content on the page to the particular search query
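Since the real algorithms are secret, the following is only a toy sketch of how two factors like these could be combined into one score. Everything in it is an assumption made for illustration: the sample pages, the inbound link counts, the way content relevance is measured (the share of a page's words that match the query) and the equal weighting of the two factors.

```python
# Hypothetical records: page -> (text, number of inbound links from related sites).
PAGES = {
    "club/panel": ("soccer panel names for this year soccer team", 8),
    "club/grounds": ("landscaping work at the school grounds", 3),
    "blog/soccer": ("soccer soccer soccer", 0),
}

def content_score(text, query):
    """Fraction of the page's words that match a word in the query."""
    words = text.lower().split()
    terms = set(query.lower().split())
    if not words:
        return 0.0
    return sum(1 for w in words if w in terms) / len(words)

def link_score(inbound, max_inbound):
    """Inbound link count scaled to the range 0 to 1."""
    return inbound / max_inbound if max_inbound else 0.0

def rank(query, link_weight=0.5):
    """Return (score, page) pairs, best first, for pages that mention the query."""
    max_inbound = max(inbound for _, inbound in PAGES.values())
    scored = []
    for url, (text, inbound) in PAGES.items():
        content = content_score(text, query)
        if content == 0:
            continue  # the page does not mention the query at all
        score = (1 - link_weight) * content + link_weight * link_score(inbound, max_inbound)
        scored.append((round(score, 3), url))
    return sorted(scored, reverse=True)

print(rank("soccer panel"))
```

With these made-up numbers, the panel page outranks the page that simply repeats the word “soccer”, because the links pointing to it count as well as its content.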




If Google really wants to return the appropriate results and you want your website to be at the top end of these results, then you must meet it at least halfway. That’s where Search Engine Optimisation (SEO) comes in.