In this post we continue our What is SEO? blog series in which we’re explaining all the aspects of SEO, so that when you’re comparing SEO services (or doing your own SEO) you can make an informed decision about what you want your SEO service to include.
Another important part of SEO is making sure search engines are able to find, read and list all the web pages on your site you want them to. At the same time, you also want to make sure they’re not reading and listing the pages you want hidden from the public.
Search engines are pretty smart, but they’re not mind readers.
Here are some situations where you need to give search engines some guidance:
- When you have content on your website you don’t want search engines to link to.
- When you are developing a new website, a new section of your current website, or new pages that are live but that you don’t want people finding yet.
- When your website has a step-by-step process you want people to follow, and you don’t want search engines to link to pages part way through the process.
- When you have pages or a section of your site that isn’t in your navigation menu but that you want listed in search engines.
Most websites don’t have any of these special situations; the goal is simply to have search engines list all of the site’s pages. If that describes your website, it’s still extremely important to make sure nothing on your site is telling search engines to stay away from some or all of your pages.
How to Direct Search Engines
There are two files used to tell search engines which pages to list and which not to list.
sitemap.xml – an XML file that lists the URLs for a site. It allows webmasters to include additional information about each URL: when it was last updated, how often it changes, and how important it is in relation to other URLs in the site. This allows search engines to crawl the site more intelligently.
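To give you a sense of what this looks like (not something you need to write yourself), here is a minimal sitemap.xml sketch listing two hypothetical pages — the URLs and dates are made-up placeholders:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- One <url> entry per page you want search engines to list -->
  <url>
    <loc>https://www.example.com/</loc>
    <lastmod>2016-05-01</lastmod>
    <changefreq>weekly</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>https://www.example.com/services</loc>
    <lastmod>2016-04-15</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
```

Only the `<loc>` (the page’s URL) is required; the last-updated date, change frequency, and priority entries are optional hints for the search engines.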
robots.txt – a text file used to tell search engine robots (the automated programs that read your pages) which areas of the website should not be processed or scanned.
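As an illustration, a robots.txt file that blocks search engines from a hypothetical in-progress section of a site (the directory name here is a made-up example) could look like this:

```
# Applies to all search engine robots
User-agent: *
# Don't crawl anything in the /new-site-preview/ directory
Disallow: /new-site-preview/
```

A one-character mistake in this file can hide your entire site from search engines, which is exactly why it’s worth checking even if you didn’t create it.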
If you want search engines to read and list all the pages of your website, you don’t need a robots.txt file at all. If one already exists in your hosting account, delete it so it can’t provide misinformation.
Specific instructions on how to create and format your website’s sitemap.xml and robots.txt files are beyond the scope of this blog post. Remember the purpose of this series is to educate so you know what needs to be done, not how to do it.
Once the sitemap.xml and robots.txt files have been created and uploaded to your website’s root folder (the main folder of your hosting account), the next step is to make sure search engines don’t have any problems viewing the pages of your website (aka crawl errors).
To do that, create an account with the Google Search Console (formerly Google Webmaster Tools) and Bing Webmaster Tools. You will need to verify ownership of your website by placing a small snippet of code on your website (not visible to your visitors). Then you can submit your sitemap to Google and Bing.
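One common verification method is pasting a small meta tag into your site’s `<head>` section. The content values below are made-up placeholders — each tool gives you your own unique code:

```html
<head>
  <!-- Placeholder verification tags; use the exact codes from each tool -->
  <meta name="google-site-verification" content="your-unique-google-code" />
  <meta name="msvalidate.01" content="your-unique-bing-code" />
</head>
```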
After a day or two, log back in to Google Search Console and Bing Webmaster Tools to verify they were able to read your sitemaps and to address any problems the search engines encountered when reading the pages of your website.
If this all seems like a hassle, it’s nothing compared to the hassle of finding out months down the road that some of your pages are not in the Google or Bing search results because Google or Bing couldn’t read them.
- Do you have sitemap.xml and robots.txt files for your website? Why or why not?