Perfect Custom robots.txt for Blogger Blog

Every search engine crawler checks the robots.txt file for a website's crawling rules before it crawls the site. That means robots.txt plays a critical role in the search engine optimization (SEO) of a Blogger blog. This article explains how to create a perfect custom robots.txt file for a Blogger / Blogspot blog and optimize the blog for search engines.

The robots.txt file tells search engines which pages should and shouldn't be crawled, so it allows us to control how search engine bots work on the site.

A valid robots.txt line consists of a field, a colon, and a value. Spaces are optional but recommended for readability, and whitespace at the beginning and end of a line is ignored. To include a comment, precede it with the # character; everything after the # character is ignored. The general format is <field>:<value><#optional-comment>.
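For example, a rule group in this format might look like the following (the Googlebot user agent and the /private/ path here are placeholders for illustration only):

User-agent: Googlebot # the rules in this group apply to Google's main crawler

Disallow: /private/ # a placeholder path that should not be crawled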


Google supports the following fields:

user-agent: identifies which crawler the rules apply to.

allow: a URL path that may be crawled.

disallow: a URL path that may not be crawled.

sitemap: the complete URL of a sitemap.


The default robots.txt file of a Blogger blog looks like this:

User-agent: Mediapartners-Google

Disallow:

User-agent: *

Disallow: /search

Allow: /

Sitemap: https://www.example.com/sitemap.xml

The first line (User-agent) of this file declares which bot the rules apply to. Here it is Mediapartners-Google, the Google AdSense crawler, and the empty Disallow on the second line means nothing is blocked for it. That means AdSense ads can appear throughout the website.

The next user agent is *, which applies to all search engine bots. They are disallowed from the /search pages, which covers both search result pages and label pages (they share the same URL structure, for example /search/label/SEO).

The Allow: / rule states that every page outside the disallowed section may be crawled.

The last line contains the post sitemap of the Blogger blog.


This is an almost perfect file for controlling search engine bots and instructing them which pages to crawl or not crawl. Note that allowing a page to be crawled does not guarantee that it will be indexed.


However, this file allows the archive pages to be crawled, which can cause a duplicate content issue and clutter the blog's presence in search results.


Now let's optimize the default robots.txt file to make it better for SEO.


How to Create a Perfect Custom robots.txt File for the Blogger Blog?

The default robots.txt allows the archive pages to be crawled, which causes the duplicate content issue. We can prevent it by stopping the bots from crawling the archive section. For this:

Use Disallow: /search* to block crawling of all search and label pages.

Add a Disallow rule for /20* to the robots.txt file to stop the crawling of the archive section.

The /20* rule will also block crawling of all posts, because Blogger post URLs start with the year as well. To avoid this, we have to apply a new Allow rule for /*.html that lets the bots crawl posts and pages, as the sketch after the file below illustrates.

So the new perfect custom robots.txt file for the Blogger blog will look like this:

User-agent: Mediapartners-Google

Disallow:


# The rules below apply to all search engines: block search, label and archive pages, allow all blog posts and pages.

User-agent: *

Disallow: /search*

Disallow: /20*

Allow: /*.html

# Sitemaps of the blog

Sitemap: https://www.example.com/sitemap.xml

Sitemap: https://www.example.com/sitemap-pages.xml
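To see how these Allow and Disallow patterns interact, here is a minimal Python sketch of Google-style path matching for the "User-agent: *" group above: the matching rule with the longest path pattern wins, and Allow wins a tie. It only illustrates the documented matching behaviour, not Google's actual implementation, and the example paths are hypothetical.

import re

RULES = [
    ("disallow", "/search*"),
    ("disallow", "/20*"),
    ("allow", "/*.html"),
]

def pattern_to_regex(pattern):
    # '*' matches any run of characters; '$' would anchor the end of the path.
    return re.compile("^" + re.escape(pattern).replace(r"\*", ".*").replace(r"\$", "$"))

def is_allowed(path):
    matches = [(len(p), kind) for kind, p in RULES if pattern_to_regex(p).match(path)]
    if not matches:
        return True                      # no rule matches, so the path is crawlable
    longest = max(length for length, _ in matches)
    winners = {kind for length, kind in matches if length == longest}
    return "allow" in winners            # Allow wins a tie at the same pattern length

# Hypothetical Blogger-style paths, used only for demonstration.
for path in ["/2023/05/my-post.html", "/search/label/SEO", "/2023/05/", "/p/about.html"]:
    print(path, "->", "crawl" if is_allowed(path) else "blocked")

Running it shows that post and page URLs ending in .html stay crawlable, while search, label, and archive paths are blocked.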


How to Edit the Default robots.txt File of Blogger to Add the Custom robots.txt?

Go to the Blogger dashboard and click on the Settings option,

Scroll down to the Crawlers and indexing section,

Enable Custom robots.txt with the toggle switch,

Click on Custom robots.txt, paste the robots.txt content into the window that opens, and save.


After updating the custom robots.txt file for the Blogger blog, check it by visiting https://www.example.com/robots.txt, where www.example.com should be replaced with your domain address.
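If you prefer to check it from a script, a small Python snippet (assuming your blog lives at www.example.com) can fetch and print the live file:

import urllib.request

# Fetch the published robots.txt and print it so the update can be confirmed.
# Replace www.example.com with your own blog's domain.
with urllib.request.urlopen("https://www.example.com/robots.txt") as response:
    print(response.read().decode("utf-8"))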

 
