Search 101 - how search engines work - part 2
In Part 1 we covered how a search engine crawler visits web pages. In this part we're going to investigate how words on web pages are indexed.
You'll recall the three phases of a search engine:
- Crawling (or spidering) the web, finding pages people want to search
- Indexing words on web pages
- Searching the index (i.e. the bit that happens when you type a search into Google)
A search engine index works very much like the index in a book. In a book, each word in the index lists the page numbers the word appears on. In a search index, each word has a list of the web pages the word appears on.
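Here is a minimal sketch of building this kind of word-to-pages index, usually called an "inverted index". It assumes pages are identified by hypothetical integer IDs rather than URLs, and it uses a deliberately simple tokenising rule (lower-case, split on anything that isn't a letter or digit). Real search engines do a great deal more, so treat this as an illustration of the idea rather than how any particular engine does it.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lower-case the text and split it into words (a deliberately simple rule)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Map each word to the sorted list of page IDs it appears on.

    `pages` is a dict of {page_id: page_text}; the integer page IDs are
    hypothetical stand-ins for URLs.
    """
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in tokenize(text):
            index[word].add(page_id)
    # Store each list sorted, like the page numbers in a book index.
    return {word: sorted(ids) for word, ids in index.items()}

# A tiny, made-up collection of three pages.
pages = {
    1: "Blue widgets for sale",
    2: "Red widgets and blue gadgets",
    3: "A guide to blue paint",
}
index = build_index(pages)
print(index["blue"])     # [1, 2, 3]
print(index["widgets"])  # [1, 2]
```

Keeping each list sorted costs a little at indexing time but makes combining lists at search time cheap.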
Here's what happens when you search for "blue widgets":
- Get the list of pages containing the word "blue"
- Get the list of pages containing the word "widgets"
- Return the pages that appear in both lists
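The "appear in both lists" step can be sketched as a merge-style intersection of two sorted lists: walk through both lists at once and keep only the page IDs they share. This is a minimal sketch, assuming the index maps each lower-case word to a sorted list of hypothetical integer page IDs; `intersect` and `search` are illustrative names, not any real engine's API.

```python
def intersect(list_a, list_b):
    """Return the page IDs present in both sorted lists, walking them in step."""
    i, j = 0, 0
    result = []
    while i < len(list_a) and j < len(list_b):
        if list_a[i] == list_b[j]:
            # Page appears in both lists: keep it and advance both.
            result.append(list_a[i])
            i += 1
            j += 1
        elif list_a[i] < list_b[j]:
            i += 1  # list_a is behind; move it forward
        else:
            j += 1  # list_b is behind; move it forward
    return result

def search(index, query):
    """Return pages containing every word in the query (simple AND search)."""
    words = query.lower().split()
    if not words:
        return []
    results = index.get(words[0], [])
    for word in words[1:]:
        # A word that isn't in the index has an empty list, so nothing matches.
        results = intersect(results, index.get(word, []))
    return results

# Hypothetical index in word -> sorted page ID form.
index = {
    "blue": [1, 2, 3],
    "widgets": [1, 2, 5],
}
print(search(index, "blue widgets"))  # [1, 2]
```

Because both lists are sorted, each one is read only once, so the work grows with the lengths of the two lists rather than with their product.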
The really clever part of a search engine is deciding which pages are most relevant and should be listed first. We'll cover that in Part 3.