The Definitive SEO Guide to Search Engines
Understanding Search Engine Optimization, or SEO, can only begin when you understand how search engines work. Search engine marketing relies on your knowledge of search engine behavior. Learning how all the pieces fit together is much easier than you’d think.
What you are about to learn are the fundamentals you need to execute your SEO strategy effectively, built on a solid understanding of how a search engine actually works.
This is a long post, but if you’re serious about learning exactly how search engine optimization or search engine marketing works, then stick with it. You’re going to find some nuggets along the way.
Bookmark this page or print it out so that you can always come back and review what you’ve learned.
Starting With Some Facts
Before we can begin to properly understand search engines, there are some things you need to get your head around. Right now I’m going to blow the lid off some popular misconceptions about search engines.
- MYTH: Search engines store web pages. BUSTED! Search engines don’t store web pages. They index the words in the content of those web pages and also keep track of link information and document locations.
- MYTH: Spiders or bots index web pages. BUSTED! Spiders, or bots, don’t index web pages. They find and crawl the contents of web pages and send that data to the search engine.
- MYTH: Search engines read and then rank web pages. BUSTED! Search engines don’t read pages and decide how they should rank. They receive a search query and rank the pages that best match that particular query.
- MYTH: Search engines are smart enough to work out what a searcher is thinking. BUSTED! Search engines are just big dumb computers. OK, maybe not completely dumb, but you get the point. It’s a computer program executing a predefined set of instructions, and it doesn’t think.
As you read through the sections of this post, you will learn how and why each of these myths is SO BUSTED!
Anatomy of a Search Engine
Search engines have two functions. The first is finding new content, or crawling the web. They use computer programs, known as bots or spiders, to wander through the web on an endless pilgrimage in search of content. A search engine spider uses the hyperlinks that appear on web pages to navigate its way through, just as we do.
It reads and collects all the content that it finds on each web page, and then it follows those links to other web pages, where it repeats the whole process endlessly.
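If it helps to see that read-then-follow cycle as code, here’s a minimal Python sketch. It’s an illustration only: the tiny in-memory `WEB` dictionary stands in for real pages fetched over HTTP, and `crawl` is a hypothetical helper, not how any real search engine is built.

```python
from collections import deque

# Hypothetical in-memory "web": each URL maps to (page content, outgoing links).
WEB: dict[str, tuple[str, list[str]]] = {
    "/home": ("Welcome to our widget shop", ["/products", "/about"]),
    "/products": ("Blue widgets and red widgets", ["/home", "/products/blue"]),
    "/products/blue": ("Our best blue widgets", ["/products"]),
    "/about": ("About the widget team", ["/home"]),
}


def crawl(start: str, web: dict[str, tuple[str, list[str]]]) -> dict[str, str]:
    """Breadth-first crawl: collect each page's content, then follow its links."""
    collected: dict[str, str] = {}
    frontier = deque([start])
    while frontier:
        url = frontier.popleft()
        if url in collected or url not in web:
            continue  # already crawled, or a link to a page that doesn't exist
        content, links = web[url]
        collected[url] = content  # read and collect the page's content...
        frontier.extend(links)    # ...then queue every link found on it
    return collected


pages = crawl("/home", WEB)
print(list(pages))  # ['/home', '/products', '/about', '/products/blue']
```

A real spider never runs out of links to follow, which is why its pilgrimage is endless; this toy one stops once it has visited every page it can reach.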
Secondly, search engines provide search results. The aim of the search engine is to provide the most relevant results for a user’s query. It considers and ranks each page using the collected data, and makes its best attempt at providing the user with the locations of the most useful websites for the query.
Some of the factors used in ranking are more important than others when it comes to SEO, but since this is not strictly an SEO lesson, we’ll keep that to a minimum. You won’t need to know every last detail about all of these factors, but you will become familiar with some of the most important ones as you continue to learn about how to perform absolutely bulletproof SEO.
For now just stay focused on the two primary functions of the search engines.
How Search Engines Find and Index Web Pages
There are several ways that search engines find and index pages. Some work better than others. None are guaranteed…except one!
The Old School Way
One way is to do a manual site submission via a web form. The search engines provide a web form to submit your site to their index. This tells the search engine that your site exists but won’t necessarily get any pages indexed.
The New Cool Way
Another way is to submit an XML sitemap. XML sitemaps tell the search engine about all the pages that exist in your site by providing it with a document map. But there is a big misconception about the XML Sitemap priority field.
The myth goes that setting a higher priority for certain pages will cause the spider to crawl your page faster or more frequently. The only problem with that is the search engines can choose to ignore it completely.
The priority field is more of an indication to the search engine about what you consider to be your important pages.
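To make the priority field concrete, here’s a small Python sketch that writes a sitemap with the standard library. The `<urlset>`, `<url>`, `<loc>` and `<priority>` elements come from the sitemaps.org protocol, where priority runs from 0.0 to 1.0; the `build_sitemap` helper and the example.com URLs are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages: list[tuple[str, float]]) -> str:
    """Return sitemap XML for a list of (URL, priority) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, priority in pages:
        if not 0.0 <= priority <= 1.0:
            raise ValueError(f"priority must be between 0.0 and 1.0, got {priority}")
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        # Only a hint about what you consider important; engines may ignore it.
        ET.SubElement(url, "priority").text = f"{priority:.1f}"
    declaration = '<?xml version="1.0" encoding="UTF-8"?>\n'
    return declaration + ET.tostring(urlset, encoding="unicode")


print(build_sitemap([
    ("https://www.example.com/", 1.0),
    ("https://www.example.com/about", 0.5),
]))
```

Whatever numbers you put in `<priority>`, nothing in the file forces the spider to crawl those pages sooner.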
The Expensive Way
Another option to get your site indexed faster or more frequently might be paid inclusion in a web directory. There are a small number of directories that Google still considers reputable. One example would be Yahoo! Directory paid listings.
Generally speaking, though, Google considers directories to be bad neighborhoods. Besides, if you learn SEO properly, the cost of paid inclusion seems massively inflated.
The Guaranteed Way
As I said, there is one way that is absolutely guaranteed to get your page crawled and indexed. In fact, it’s not only the method most favored by spiders for crawling and discovering new pages, it’s also considered the top factor in how well a page ranks.
The main way that search spiders find new content on your site is by following the links to your site from other web pages.
As the spider crawls web pages, it discovers links, checks them (and the page’s META data) for directives like nofollow or noindex, and then follows each link to the next page, which it then begins to crawl. So if you want the spider to crawl your site more often, get more links.
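Here’s a rough Python sketch of that link-discovery step using the standard library’s `html.parser`. It assumes the simplest case, a `rel="nofollow"` attribute on the link itself, and `LinkExtractor` is a hypothetical helper; how a real search engine treats nofollow links is up to that engine.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect <a href> links, separating nofollow links from followable ones."""

    def __init__(self) -> None:
        super().__init__()
        self.follow: list[str] = []
        self.nofollow: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if not href:
            return  # an anchor without a destination gives the spider nowhere to go
        rel = (attributes.get("rel") or "").lower().split()
        if "nofollow" in rel:
            self.nofollow.append(href)
        else:
            self.follow.append(href)


def extract_links(html: str) -> LinkExtractor:
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser


links = extract_links('<a href="/widgets">Widgets</a> <a href="/ads" rel="sponsored nofollow">Ad</a>')
print(links.follow)    # ['/widgets']
print(links.nofollow)  # ['/ads']
```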
Some of this might not be news to you so far. Well, that’s all about to change as we dive deeper into how search engines actually work.
A Closer Look at Crawling
Get this in your head now. The spider is not responsible for indexing your site. In fact, on close inspection, the spider is more like a drone. It simply crawls the pages, reads the content, the page location and a ton of other data, and stores it all to be processed later by the ‘actual’ search engine.
Whatever name you want to use (spider, robot, crawler), it’s all really the same thing: a big old computer program that wanders around the web, jumping from link to link and site to site, searching for content to tell the search engine about. The search engine, in turn, is just another big computer program and a massive database.
But…that’s really all it does.
Spiders do have to follow a few rules, defined in the ‘Robots Exclusion Protocol’, that tell them which content they can or can’t fetch. They also have to follow another protocol: HTTP, the same language your web browser speaks, which is how the spider fetches the page.
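Python ships a robots.txt parser, `urllib.robotparser`, which makes it easy to see how a well-behaved spider checks the rules before fetching anything. The robots.txt content, the `ExampleBot` user agent and the `allowed` helper below are made up for the example; a real crawler would download robots.txt over HTTP (for instance with `RobotFileParser.read()`) rather than parse a string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: every spider is asked to keep out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

rules = RobotFileParser()
rules.parse(ROBOTS_TXT.splitlines())


def allowed(url: str, agent: str = "ExampleBot") -> bool:
    """Ask the parsed rules whether this spider may fetch the URL."""
    return rules.can_fetch(agent, url)


print(allowed("https://www.example.com/widgets.html"))       # True
print(allowed("https://www.example.com/private/notes.html"))  # False
```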
The main thing to know about spiders, when it comes to optimizing your site, is that a spider is a computer program, and effectively, so is your web page. Computers read code systematically starting from the entry point at the top of the code and reading it line by line. Programs execute each instruction as it is encountered. That means it doesn’t view the content the same way we do.
Images, for example, are replaced with their alt and title text. Spiders also do a bunch of other things with some types of files, like video and images, but that’s far too advanced for this guide.
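As a very rough illustration of a page being read as code rather than viewed, this Python sketch flattens HTML into a stream of text, top to bottom, swapping each image for its alt text and falling back to the title attribute. That fallback rule and the `text_view` helper are assumptions for the example, not documented search engine behavior.

```python
from html.parser import HTMLParser


class TextView(HTMLParser):
    """Flatten HTML into text, reading it top to bottom like a program."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attributes = dict(attrs)
            # Assumption: use the alt text, falling back to the title attribute.
            replacement = attributes.get("alt") or attributes.get("title")
            if replacement:
                self.parts.append(replacement)

    def handle_data(self, data):
        if data.strip():
            self.parts.append(data.strip())


def text_view(html: str) -> str:
    parser = TextView()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


print(text_view('<h1>Widgets</h1><img src="w.jpg" alt="A blue widget"><p>On sale now</p>'))
# Widgets A blue widget On sale now
```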
What you need to grasp here is that the role of the spiders is only to fetch the contents of pages and store them to be looked at some time later for indexing by the search engine.
What The Search Engine Really Does!
It’s all really quite complex and thankfully you don’t need to know all the fine details. The simplest explanation is that the search engine is the index. It’s a set of automated programs that connect to a giant database to analyze and collate all of the data its spider retrieved from each web page. That data is then processed and indexed.
Get Ready Because This is The Stuff That Most People Just Don’t Get.
Peeling Back the Layers
The very first thing that the search engine does is to process, or parse, the data it has collected from your web page. It strips out all the information that the search engine regards as useless, things like iframes (and their contents), JavaScript and most of the formatting, leaving only what is useful to the indexing process.
What’s left is a collection of META data, URL locations, anchor text, and other information such as heading tags, underlines and italics, and where each word appeared on the page. It also takes note of the words surrounding each word, which is particularly useful when dealing with anchor text.
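To picture what survives the parse, here’s a simplified Python sketch. It throws away anything inside `<script>`, `<style>` or `<iframe>`, and records each remaining word with its position on the page and any emphasis tags (headings, bold, italics, underline) wrapped around it. The tag lists and the `parse_page` function are illustrative assumptions; real search engines don’t publish their parsers.

```python
import re
from html.parser import HTMLParser

SKIP = {"script", "style", "iframe"}                          # contents are thrown away
EMPHASIS = {"h1", "h2", "h3", "b", "strong", "i", "em", "u"}  # context worth keeping
VOID = {"br", "hr", "img", "input", "link", "meta"}           # tags that never close

Word = tuple[int, str, frozenset[str]]  # (position on page, word, emphasis tags)


class WordParser(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.open_tags: list[str] = []
        self.words: list[Word] = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating sloppy HTML.
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass

    def handle_data(self, data):
        if any(tag in SKIP for tag in self.open_tags):
            return  # JavaScript, CSS and iframe contents are discarded
        context = frozenset(tag for tag in self.open_tags if tag in EMPHASIS)
        for word in re.findall(r"[a-z0-9']+", data.lower()):
            self.words.append((len(self.words), word, context))


def parse_page(html: str) -> list[Word]:
    parser = WordParser()
    parser.feed(html)
    parser.close()
    return parser.words


for position, word, context in parse_page(
    '<h1>Blue Widgets</h1><script>var x = 1;</script><p>Buy <i>cheap</i> widgets</p>'
):
    print(position, word, sorted(context))
```

Because every word keeps its position, finding the words surrounding any given word is just a matter of looking at the neighboring positions.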
Most importantly for us, though, are the words from your content that the spider has collected from your page.
Words That Count
If you’ve read this far then well done! You’re on your way to knowing more about search engines, search engine marketing and search engine optimization than most people ever will.
Most people think that search engines actually store an entire web page, and then read it whenever people do a search.
The truth is, it doesn’t work that way!
A search engine indexes each of the words it finds on your page. Each word is stored along with the results for every other web page that uses that word.
Think of it like this.
Every word has its own container. These containers are used to store all the data collected by the spider. This happens for each word, on each web page, across the entire web. This is what we will call ranking data.
As the search engine indexes page content, it places ranking data about each word on a page into that word’s container. This includes all the links, anchor text and all those other pieces of data we’ve already discussed that are used for ranking purposes.
So if the word “widget” is found on your page, a reference to your page is placed in the “widget” container, cataloging all of your ranking data for that word alongside all the other sites that use the word “widget”.
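That container idea is what computer scientists call an inverted index. Here’s a minimal Python sketch, assuming the only ranking data we keep is each word’s position on each page; a real container would also hold link, anchor text and formatting data. The `build_index` function and the sample pages are hypothetical.

```python
import re
from collections import defaultdict


def build_index(pages: dict[str, str]) -> dict[str, dict[str, list[int]]]:
    """One container per word: word -> {page URL: positions of the word on that page}."""
    index: defaultdict[str, dict[str, list[int]]] = defaultdict(dict)
    for url, text in pages.items():
        for position, word in enumerate(re.findall(r"[a-z0-9']+", text.lower())):
            index[word].setdefault(url, []).append(position)
    return dict(index)


# Hypothetical pages, already stripped down to their words.
index = build_index({
    "/yours": "Blue widgets for sale",
    "/theirs": "Red widgets and blue paint",
})
print(index["widgets"])  # {'/yours': [1], '/theirs': [1]}
```

Notice that the pages themselves are not kept anywhere in `index`; only the words and where they appeared.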
It’s likely that anchor text and URLs are placed into a separate index, but there’s no proof of that... yet!
A search engine index is no more than a collection of words, with information about each site where each of those words appears.
How The Search Engine Decides
It all comes down to the ranking data. This is why it’s so important to learn exactly which ranking data is worth spending time on for better search results and higher placement.
Each time a user queries the search engine, it queries its own database, the index. Using all of the ranking data it collected and sorted, it analyzes and compares that data to produce a result that is hopefully what you were looking for.
SERPs, or Search Engine Results Pages, list the pages rated most likely to be relevant to the search query when compared with every other page that matched that query.
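Nobody outside the search engines knows the real ranking algorithms, so treat this Python sketch as a toy stand-in. It assumes a small hand-made index (word → page → positions), scores each page by simply counting how often the query words appear on it, and lists the highest scores first. The `INDEX` data, the `search` function and the counting rule are all assumptions for illustration.

```python
# Hypothetical index: word -> {page URL: positions of that word on the page}.
INDEX: dict[str, dict[str, list[int]]] = {
    "blue": {"/a": [0], "/b": [3]},
    "widgets": {"/a": [1], "/b": [1], "/c": [0, 4, 7]},
    "paint": {"/b": [4]},
}


def search(query: str, index: dict[str, dict[str, list[int]]] = INDEX) -> list[tuple[str, int]]:
    """Query the index (not the pages) and rank every page that holds a query word."""
    scores: dict[str, int] = {}
    for word in query.lower().split():
        for url, positions in index.get(word, {}).items():
            # Toy rule: one point per occurrence of a query word on the page.
            scores[url] = scores.get(url, 0) + len(positions)
    # Most points first; ties broken alphabetically so the order is stable.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


print(search("blue widgets"))  # [('/c', 3), ('/a', 2), ('/b', 2)]
```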
Taking A Different View
There are different types of searches, and a broad search is how most people use a search engine. A broad search runs a query for each of the words in the search query, and the results are then filtered further to find a winner among the best pages for each word. For example, the search ‘I want blue widgets’ will conduct a query for each word in that phrase.
A phrase match searches for the exact phrase, forcing the search engine to include all of the words in the phrase. You trigger one by surrounding the query in “double quotes”.
Some words are considered to add no value to a search and are referred to as stop words. Words like I, and, of, if, at, the and to are excluded from the final search query. In the example above, the word ‘I’ would be ignored in a broad search, leaving a search query of ‘want blue widgets’.
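Here’s how that broad-search preprocessing could look as a Python sketch. The stop-word list is just the handful of examples above (real engines use their own lists), and the `INDEX` of word → pages plus the `broad_terms` and `broad_match` helpers are hypothetical.

```python
STOP_WORDS = {"i", "and", "of", "if", "at", "the", "to"}  # the examples listed above

# Hypothetical index: word -> set of page URLs containing that word.
INDEX: dict[str, set[str]] = {
    "want": {"/wishlist"},
    "blue": {"/a", "/b"},
    "widgets": {"/a", "/c"},
}


def broad_terms(query: str) -> list[str]:
    """Split a broad-match query into words, dropping stop words."""
    return [word for word in query.lower().split() if word not in STOP_WORDS]


def broad_match(query: str, index: dict[str, set[str]] = INDEX) -> dict[str, set[str]]:
    """Run one lookup per remaining word; each word brings its own candidate pages."""
    return {word: index.get(word, set()) for word in broad_terms(query)}


print(broad_terms("I want blue widgets"))  # ['want', 'blue', 'widgets']
```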
You can alter the search type using search operators. To perform an exact match search, use the double quote operator like so: “search for this term”. If you’re keen, you can learn all about stop words and search operators.
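An exact match needs more than knowing which pages contain each word; it needs to know where on the page each word sits. This Python sketch assumes a positional index (word → page → positions) and keeps only the pages where the quoted words appear one after another, in order. The index data and the `phrase_match` function are hypothetical.

```python
# Hypothetical positional index: word -> {page URL: positions of that word}.
INDEX: dict[str, dict[str, list[int]]] = {
    "blue": {"/a": [0], "/b": [3]},
    "widgets": {"/a": [1], "/b": [1]},
}


def phrase_match(phrase: str, index: dict[str, dict[str, list[int]]] = INDEX) -> set[str]:
    """Pages where every word of the phrase appears consecutively, in order."""
    words = phrase.lower().strip('"').split()
    if not words:
        return set()
    # Every word in the phrase is required, so start with pages holding all of them.
    candidates = set(index.get(words[0], {}))
    for word in words[1:]:
        candidates &= set(index.get(word, {}))
    matches = set()
    for url in candidates:
        for start in index[words[0]][url]:
            # Word number k must sit exactly k positions after the first word.
            if all(start + offset in index[word][url] for offset, word in enumerate(words)):
                matches.add(url)
                break
    return matches


print(phrase_match('"blue widgets"'))  # {'/a'}
```

Both pages contain both words, so a broad search would return both; only `/a` has them side by side, so only `/a` survives the exact match.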
Considering most people perform a broad match search, most results pages are made up of the best results for each individual word in the query. This has to make you wonder whether chasing the long tail is actually helping much. I’ll leave that story for another day.
The actual algorithms that the search engines use are very closely guarded secrets. It’s unlikely anyone will ever know anything for certain besides the people who design them.
What we do know is that there are many things we can do to improve the chances of your web pages ranking on the front pages of the search results. As you learn more about SEO, you will become very familiar with them.
What We’ve Learned
Understanding the fundamentals of how a search engine works is critical in establishing an effective search engine optimization strategy or search engine marketing strategy.
Distinguishing the facts from the myths is vital to your success. There are many popular myths that should be properly understood and ignored. What we can and should always do is test everything we can think of.
Any of the information you are reading here that didn’t come directly from the search engines’ documentation came from testing. And it’s important to note that the only real way to know anything about search engines, search engine optimization and search engine marketing is to Test, Review, Adjust and Repeat!
In a nutshell:
- Search engines don’t index web pages, they index each of the words on a page.
- Search engines don’t query web pages; they query their own index.
- Spiders don’t index web pages. They collect and store the content for processing.
- Most web pages are found through links on other sites.
- Search results are decided by comparing the overall relevance and authority of all the data for each word on each indexed page. The page that has the most points wins.
- Search engine submission is more or less a waste of time. Your site will be indexed without a submission as soon as you have links to your site.
- XML Sitemaps are used to quickly let the search engine know about new pages. This means the spider doesn’t have to discover those pages through links first. But it does not mean you can influence the spider.
- XML Sitemaps cannot prioritize pages or influence the spider’s behavior in any way. Priorities in XML Sitemaps are more like a suggestion or recommendation you give to the search engine.
- Surrounding text is considered to have great value: the words and phrases you use around a term can help promote the relevance of your overall content.
- Writing content for humans makes more sense because a natural and logical grouping of associated and relevant word usage occurs. It can still be keyword targeted, while not losing sight of human visitors. And that’s exactly what the search engines want to see.
Now it’s time to take what you have learned and look at your own SEO or SEM strategy. How can you use this knowledge to improve your strategy and rank higher on the SERPs?

HelpfulAdvisor
28. Oct, 2009
WOW! What a definitive guide for search engine marketing. Andrew, you’ve left no stone unturned on this, and even I have a better understanding of this now.
Based on your examples, I’ve gone out and got an XML sitemap plugin for my site, and so I hope the increase in my traffic will be worth it.
Thanks for the detailed posting. Can’t wait to read more!
-Jay
Andrew
28. Oct, 2009
Good to see you, Jay. Don’t be too disappointed if you find the XML sitemap doesn’t seem to help. Sites that have fewer than a few hundred pages are not likely to see any significant change. If your site doesn’t yet have more than 200-300 pages, try using a human-readable (old-fashioned style) site map. By placing it on its own page, with a link to it in your header, you will ensure that the spider sees the site map each time it crawls your home page.
I’ll save the details for another post but I’m sure you get the idea.
Robomaster
29. Nov, 2009
Great post! I think it can be very helpful for SEO if you know how the search engines work. Now my main problem is finding ways to create quality backlinks…
Andrew
29. Nov, 2009
Glad you enjoyed it, and hopefully you were able to learn some new info or at least dispel some of the myths that are so rampant in the search engine world. Backlinks are always going to be the most difficult part of the SEO process. I believe that the most important part of getting backlinks is promotion rather than content. I’m not saying content isn’t important, but the world has a very diverse range of people and opinions. What some people think is total junk, others will see as the best bit of info they ever found. The more people who see your content, the more chance you have of acquiring natural backlinks. Of course, you could also do some banner promotion, text links, or site sponsorship, but these will never carry the same weight as a natural link.