Search engines such as Google use robots (crawlers) to index content on the web, and the rules that control this are known as the Robots Exclusion Protocol. A robots.txt file is a file that webmasters create to instruct search engine robots (crawlers) which pages to index and which to exclude from search results.
Search engines like Google, Bing, and Yahoo send spiders (crawlers) to crawl your website. When these crawlers reach your site, they first read your robots.txt file to check for any exclusion rules before crawling and indexing your pages.
If they find exclusion rules in the robots.txt file, they will skip the pages those rules cover.
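Before looking at the examples, it may help to see how a crawler evaluates these rules programmatically. This is a minimal sketch using Python's standard urllib.robotparser module; the site URL and rules here are hypothetical:

```python
from urllib import robotparser

# Hypothetical rule set: block Googlebot from /private/ only
rules = [
    "User-agent: Googlebot",
    "Disallow: /private/",
]

rp = robotparser.RobotFileParser()
rp.parse(rules)  # a polite crawler does this before fetching any page

# /private/ is excluded for Googlebot; everything else stays crawlable
print(rp.can_fetch("Googlebot", "https://example.com/private/page.html"))  # False
print(rp.can_fetch("Googlebot", "https://example.com/blog/post.html"))     # True
```

In a real crawler, the rules would be downloaded from the site's /robots.txt URL rather than hard-coded.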
Let me share some examples:
#1 Blocking all web crawlers from all content
User-agent: *
Disallow: /
#2 Blocking a specific web crawler from a specific folder
User-agent: Googlebot
Disallow: /folder/
#3 Blocking some unneeded directories
User-agent: *
Disallow: /cgi-bin/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /xmlrpc.php
Disallow: /wp-content/plugins/
Disallow: /wp-content/cache/
Disallow: /wp-content/themes/
Disallow: /trackback/
Disallow: /feed/
Disallow: /comments/
Disallow: /category/
Disallow: /*?
Allow: /wp-content/uploads/
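A quick way to sanity-check a rule set like the one above is Python's standard urllib.robotparser. The sketch below parses a shortened version of the WordPress rules (example.com is a placeholder). Note that wildcard patterns such as /*? are an extension not interpreted by every parser, so the sketch omits that line:

```python
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /wp-admin/",
    "Disallow: /wp-includes/",
    "Allow: /wp-content/uploads/",
])

# Admin pages are blocked, but uploaded media stays crawlable
print(rp.can_fetch("*", "https://example.com/wp-admin/options.php"))          # False
print(rp.can_fetch("*", "https://example.com/wp-content/uploads/photo.jpg"))  # True
```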
To recap,
Robots.txt is a plain text file, placed at the root of a website, that a webmaster uses to tell crawlers which pages they may access. The pages restricted in your robots.txt file won't be crawled or indexed in search results. However, those pages remain publicly viewable to human visitors.
If you want to check your blog's robots.txt file, just append robots.txt to your blog URL, e.g. https://yourwebsiteurl.com/robots.txt.
User-agent: *
Disallow: /

The line User-agent: * means this section applies to all robots. The line Disallow: / tells those robots not to visit any page on the site.
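The same standard-library parser confirms that this two-line file blocks everything. A minimal sketch, assuming the hypothetical host example.com:

```python
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /",
])

# No URL on the site may be fetched by any robot
print(rp.can_fetch("AnyBot", "https://example.com/"))           # False
print(rp.can_fetch("AnyBot", "https://example.com/some/page"))  # False
```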
Robots.txt on Blogger: you can set it from the Blogger dashboard. It's very simple to use, and a must-have file for Blogger users as well.
User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /search
Allow: /

Sitemap: http://yoursite.com/feeds/posts/default?orderby=UPDATED
Wrapping it up
Make sure you disallow all the directories, categories, and pages that don't need to appear in search engines. Those are the main things. If you are using WordPress, make sure you disallow tag archives as well. Also make sure you disallow dynamic URLs (e.g. with Disallow: /*?) in your robots.txt.