What is Robots.txt? How to Create the Perfect Robots.txt File for SEO


Search engines such as Google follow a set of rules called the Robots Exclusion Protocol when they index web content. Robots.txt is the file webmasters create to apply that protocol: it instructs search engine robots (crawlers) which pages they may index and which pages to exclude from search engines.

Search engines like Google, Bing, and Yahoo send spiders, or crawlers, to crawl your website. When these crawlers reach your site, they first read your robots.txt file to check for any exclusion rules before crawling and indexing your pages.

If they find exclusion rules in your robots.txt file, they will skip the URLs those rules cover.
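If you want to see that check in action, here is a minimal sketch using Python's standard urllib.robotparser module, which performs the same fetch-then-check routine a crawler does. The domain yourwebsiteurl.com is just a placeholder; swap in a real site before running it.

from urllib.robotparser import RobotFileParser

# Placeholder domain; replace with a real site before running.
rp = RobotFileParser("https://yourwebsiteurl.com/robots.txt")
rp.read()  # download and parse the live robots.txt file

# Ask the same question a crawler asks before fetching a page.
print(rp.can_fetch("Googlebot", "https://yourwebsiteurl.com/wp-admin/"))
print(rp.can_fetch("Googlebot", "https://yourwebsiteurl.com/about/"))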

Let me share some examples:

#1 Blocking all web crawlers from all content

User-agent: *
Disallow: /

#2 Blocking a specific web crawler from a specific folder

User-agent: Googlebot
Disallow: /folder/
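To confirm that this rule affects only Googlebot, you can test it with a quick sketch, again using Python's urllib.robotparser (the paths are made up for illustration):

from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse(["User-agent: Googlebot", "Disallow: /folder/"])

print(rp.can_fetch("Googlebot", "/folder/page.html"))  # False: blocked for Googlebot
print(rp.can_fetch("Bingbot", "/folder/page.html"))    # True: other crawlers are unaffected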

#3 Blocking unneeded directories

User-agent: *
Disallow: /cgi-bin/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /xmlrpc.php
Disallow: /wp-content/plugins/
Disallow: /wp-content/cache/
Disallow: /wp-content/themes/
Disallow: /trackback/
Disallow: /feed/
Disallow: /comments/
Disallow: /category/
Disallow: /*?
Allow: /wp-content/uploads/
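You can sanity-check a rule set like this before deploying it. The sketch below feeds a trimmed version of the rules to Python's urllib.robotparser; example paths are placeholders. One caveat: the standard-library parser follows the original exclusion standard with plain prefix matching, so a wildcard pattern like /*? is not evaluated the way Google evaluates it.

from urllib.robotparser import RobotFileParser

rules = """
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Allow: /wp-content/uploads/
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("*", "/wp-admin/options.php"))         # False: blocked directory
print(rp.can_fetch("*", "/wp-content/uploads/logo.png"))  # True: explicitly allowed
print(rp.can_fetch("*", "/a-normal-post/"))               # True: no rule matches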

To recap what I explained above:

Robots.txt is a plain text file, available on most websites, that a webmaster uses to tell crawlers which parts of the site they may access. The pages restricted in your robots.txt file won't be crawled or indexed in search results. However, all those pages remain publicly viewable to human visitors.

If you want to check your blog's robots.txt file, just append robots.txt to your blog URL, e.g. https://yourwebsiteurl.com/robots.txt.
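You can do the same check from a script. Here is a minimal sketch using Python's standard urllib.request; the domain is a placeholder for your own site.

import urllib.request

# Placeholder domain; replace with your own site.
url = "https://yourwebsiteurl.com/robots.txt"

with urllib.request.urlopen(url) as response:
    print(response.read().decode("utf-8"))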

User-agent: *
Disallow: /

The User-agent: * line means this section applies to all robots. The Disallow: / line tells a robot that it should not visit any pages on the site.

Robots.txt file on Blogger: you can set it from the Blogger dashboard. It's very simple to configure and a must-have file for Blogger users as well. A typical setup looks like this:

User-agent: Mediapartners-Google
Disallow:
User-agent: *
Disallow: /search
Allow: /
Sitemap: http://yoursite.com/feeds/posts/default?orderby=UPDATED
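If you want to verify what these rules do, here is a sketch with Python's urllib.robotparser (the test paths are illustrative). The empty Disallow: gives Mediapartners-Google, the AdSense crawler, full access, while /search pages are blocked for everyone else.

from urllib.robotparser import RobotFileParser

blogger_rules = [
    "User-agent: Mediapartners-Google",
    "Disallow:",
    "",
    "User-agent: *",
    "Disallow: /search",
    "Allow: /",
]

rp = RobotFileParser()
rp.parse(blogger_rules)

print(rp.can_fetch("*", "/search/label/SEO"))           # False: search pages are blocked
print(rp.can_fetch("*", "/2020/01/my-post.html"))       # True: normal posts are crawlable
print(rp.can_fetch("Mediapartners-Google", "/search"))  # True: AdSense bot sees everything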

Wrapping it up

Make sure you disallow all the directories, categories, and pages that don't need to appear in search engines; those are the main things. If you are using WordPress, make sure you disallow tag pages as well. Also, use robots.txt to disallow dynamic URLs, for example with the Disallow: /*? rule shown above.
