What is a Robots.txt File?
The robots.txt is a very small but important file located in the root directory of your website. It tells web crawlers (robots) which pages or directories can or cannot be crawled.
The robots.txt file can be used to block search engine crawlers entirely or just restrict their access to certain areas of your website. Below is an example of a very basic WordPress robots.txt file:
This can look a little confusing at first, so I will go over what each part means.
- User-agent: is there to specify directions to a specific robot. In this case we used “*”, which applies to all robots.
- Disallow: is there to tell the robots what files and folders they should not crawl.
- Allow: tells a robot that it is okay to crawl a file in a folder that has been disallowed.
- Sitemap: is used to specify the location of your sitemap.
There are other rules that can be used in the robots.txt file such as Host: and Crawl-delay: but these are uncommon and only used in specific situations.
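To see how these directives behave in practice, here is a minimal sketch using Python's standard-library urllib.robotparser. The file contents, the example.com domain and the sample URLs are assumptions for illustration only. Note that this particular parser applies the first rule that matches, so the sketch lists Allow before Disallow. Google instead picks the most specific matching rule, so the order does not matter there.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration; example.com is a placeholder.
ROBOTS_TXT = """\
User-agent: *
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-admin/
Sitemap: https://example.com/sitemap_index.xml
"""


def load_rules(text: str) -> RobotFileParser:
    """Parse robots.txt text without making any network request."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)

# "User-agent: *" means the rules apply to every crawler, whatever its name.
print(rules.can_fetch("Googlebot", "https://example.com/wp-admin/options.php"))     # False
print(rules.can_fetch("Googlebot", "https://example.com/wp-admin/admin-ajax.php"))  # True
print(rules.can_fetch("Googlebot", "https://example.com/blog/hello-world/"))        # True
```

Any URL that matches no rule at all, like the blog post in the last line, is treated as crawlable.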
What is the Robots.txt File Used For?
Every website that is crawled by Google has a crawl budget. Crawl budget is basically the limited number of pages that Google can and wants to crawl on your site in a given period. You don’t want to waste your crawl budget on pages that are low quality, spammy or not important.
This is where the robots.txt file comes in. You can use your robots.txt file to specify which pages, files and directories Google (and other search engines) should ignore. This will allow search engine bots to keep the priority on your important high-quality content.
Below are some important things you might want to consider blocking on your WordPress website:
- Faceted navigation and session identifiers
- On-site duplicate content
- Soft error pages
- Hacked pages
- Infinite spaces and proxies
- Low quality and spam content
This list comes straight from the Google Webmaster Central Blog. Wasting your crawl budget on pages like the ones listed above will reduce crawl activity on the pages that do actually have value. This can cause a significant delay in indexing the important content on your website.
What You Should Not Use the Robots.txt For
The robots.txt should not be used as a way to control what pages search engines index. If you’re trying to stop certain pages from being included in search engine results, you should use noindex tags or directives, or password-protect your page.
The reason is that the robots.txt file does not actually tell search engines not to index content. It just tells them not to crawl it. While Google will not crawl disallowed areas of your website, they do state that if an external link points to a page you have excluded, its URL may still get indexed without being crawled. Keep in mind that a crawler can only see a noindex tag on a page it is allowed to crawl, so don’t disallow a page that you want removed from the results.
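To make the difference between crawling and indexing more concrete, the sketch below checks a page for a noindex signal, either in a robots meta tag or in an X-Robots-Tag response header. The function name has_noindex and the sample inputs are hypothetical, and the check is deliberately rough (it only looks for the word "noindex").

```python
from html.parser import HTMLParser


class _RobotsMetaFinder(HTMLParser):
    """Collects the content of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            self.directives.append((attr.get("content") or "").lower())


def has_noindex(html: str, headers: dict) -> bool:
    """Return True if the HTML or the headers ask search engines not to index."""
    finder = _RobotsMetaFinder()
    finder.feed(html)
    # Assumes the header key is spelled exactly "X-Robots-Tag".
    header_value = headers.get("X-Robots-Tag", "").lower()
    in_meta = any("noindex" in d for d in finder.directives)
    return in_meta or "noindex" in header_value


page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(has_noindex(page, {}))                                    # True
print(has_noindex("<p>Hello</p>", {"X-Robots-Tag": "noindex"}))  # True
print(has_noindex("<p>Hello</p>", {}))                          # False
```

Both signals live in the page response itself, which is why the page has to stay crawlable for them to work.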
Is a Robots.txt File Required in WordPress?
Having a robots.txt file for your WordPress website is certainly not required. Search engines will still crawl and index your website as they normally would.
However, you will not be able to exclude any pages, files or folders that are unnecessarily draining your crawl budget. As I explained above, this can greatly increase the amount of time it takes Google (and other search engines) to discover new and updated content on your website.
So, all in all, I would say no, a robots.txt file is not required for WordPress, but it’s definitely recommended. The real question here should be, “Why would you not want one?”
How to Create a WordPress Robots.txt File
Now that you know what a robots.txt is and what it is used for, we will take a look at how you can create one. There are three different methods and below I will go over each one.
1. Use a Plugin to Create the Robots.txt
SEO plugins like Yoast have an option to create and edit your robots.txt file from within your WordPress dashboard. This is probably the easiest option.
2. Upload the Robots.txt Using FTP
Another option is to create a plain text file on your computer using Notepad (or something similar) and name it robots.txt. You can then upload the file to the root directory of your website using an FTP (File Transfer Protocol) client such as FileZilla.
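If you would rather generate the file than type it by hand, a small script can do it. This is only a sketch: the function name write_robots_txt and the placeholder rules are my own assumptions, and it writes to a temporary folder so that nothing on your computer gets overwritten. You would still upload the resulting file with your FTP client.

```python
import tempfile
from pathlib import Path

# Placeholder rules for illustration; adjust them for your own site.
DEFAULT_RULES = [
    "User-agent: *",
    "Disallow: /wp-admin/",
    "Allow: /wp-admin/admin-ajax.php",
]


def write_robots_txt(folder: Path, rules=DEFAULT_RULES) -> Path:
    """Write the rules to <folder>/robots.txt as UTF-8 text and return the path."""
    target = folder / "robots.txt"
    target.write_text("\n".join(rules) + "\n", encoding="utf-8")
    return target


# Demo: write into a throwaway folder and read the result back.
with tempfile.TemporaryDirectory() as tmp:
    path = write_robots_txt(Path(tmp))
    written = path.read_text(encoding="utf-8")

print(written)
```

The file name must be exactly robots.txt, all lowercase, which is why the function hard-codes it.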
3. Create the Robots.txt in cPanel
If neither of the above options works for you, you can always log into your cPanel and create the file manually. Make sure you create the file inside your root directory.
How to Optimize Your Robots.txt For WordPress
So, what should be in your WordPress robots.txt? You might find this surprising, but not a whole lot. Below, I will explain why.
Google (and other search engines) are constantly evolving and improving, so what used to be the best practice doesn’t necessarily work anymore. Nowadays Google not only fetches your website’s HTML but also fetches your CSS and JS files. For this reason, they do not like it when you block any files or folders needed to render a page.
In the past it was OK to block things like the /wp-includes/ and /wp-content/ folders. This is no longer the case. An easy way to test this is by logging into your Google Search Console account (formerly Google Webmaster Tools) and testing the live URL. If any resources are being blocked from Googlebot, they will be reported in the Page Resources tab.
Below, I have put together an example robots.txt file that I think would be a great starting point for anyone using WordPress.
User-agent: *
# Block the entire wp-admin folder.
Disallow: /wp-admin/
# Block referral links for affiliate programs.
Disallow: /refer/
# Block any pages you think might be spammy.
Disallow: /spammy-page/
# Block any pages that are duplicate content.
Disallow: /duplicate-content-page/
# Block any low quality or unimportant pages.
Disallow: /low-quality-page/
# Prevent soft 404 errors by blocking search pages.
Disallow: /?s=
# Allow the admin-ajax.php inside wp-admin.
Allow: /wp-admin/admin-ajax.php
# A link to your WordPress sitemap.
Sitemap: https://example.com/sitemap_index.xml
Some of the things I included in this file are just examples. If you don’t feel like any of your pages are duplicate, spammy or low quality, you don’t have to add those parts. This is just a guideline; everyone’s situation will be different.
Remember to be careful when making changes to your website’s robots.txt. While these changes can improve your search traffic, they can also do more harm than good if you make a mistake.
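You might wonder how the Allow line for admin-ajax.php can work when /wp-admin/ is disallowed earlier in the same file. Google resolves conflicts by using the most specific (longest) matching rule, and the least restrictive one on a tie. The sketch below is my own simplified take on that idea, not Google's actual implementation: it only handles plain path prefixes (no * or $ wildcards), and the is_allowed function and sample paths are hypothetical.

```python
# A few rules copied from the example file above: (directive, path prefix).
RULES = [
    ("disallow", "/wp-admin/"),
    ("disallow", "/refer/"),
    ("disallow", "/?s="),
    ("allow", "/wp-admin/admin-ajax.php"),
]


def is_allowed(path: str, rules=RULES) -> bool:
    """Apply the longest matching prefix; on a tie, allow wins.

    Simplified on purpose: the * and $ wildcards are not supported.
    """
    best_len, allowed = -1, True  # no matching rule means the path is allowed
    for directive, prefix in rules:
        if not path.startswith(prefix):
            continue
        is_allow = directive == "allow"
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len, allowed = len(prefix), is_allow
    return allowed


print(is_allowed("/wp-admin/options.php"))                 # False
print(is_allowed("/wp-admin/admin-ajax.php"))              # True
print(is_allowed("/?s=robots"))                            # False
print(is_allowed("/wp-content/themes/example/style.css"))  # True
```

The last check shows that nothing in the file blocks theme CSS, which is what you want so that Google can render your pages.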
Test Your WordPress robots.txt File
After you have created and customized your robots.txt, it’s always a good idea to test it. Sign in to your Google Search Console account and use the Robots Testing Tool. This tool operates as Googlebot would to check your robots.txt file and verifies that your URLs have been blocked properly.
Similar to the picture above, you will see a preview of your robots.txt file as Google would see it. Verify that everything looks correct and that there are no warnings or errors listed.
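Before you even open Google's tool, you can catch simple typos yourself. The sketch below is a rough pre-check, not a replacement for Google's tester. The lint_robots_txt function is hypothetical, the list of known directives only covers the ones mentioned in this article, and the three checks are just the mistakes I would expect to see most often.

```python
KNOWN_DIRECTIVES = {"user-agent", "disallow", "allow", "sitemap", "host", "crawl-delay"}


def lint_robots_txt(text: str) -> list:
    """Return a list of human-readable problems found in robots.txt text."""
    problems = []
    seen_user_agent = False
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if ":" not in line:
            problems.append(f"line {number}: missing ':' separator")
            continue
        name, value = (part.strip() for part in line.split(":", 1))
        name = name.lower()
        if name not in KNOWN_DIRECTIVES:
            problems.append(f"line {number}: unknown directive '{name}'")
        elif name == "user-agent":
            seen_user_agent = True
        elif name in {"allow", "disallow"}:
            if not seen_user_agent:
                problems.append(f"line {number}: rule appears before any User-agent line")
            if value and not value.startswith(("/", "*")):
                problems.append(f"line {number}: path should start with '/'")
    return problems


good = "User-agent: *\nDisallow: /wp-admin/\nSitemap: https://example.com/sitemap_index.xml\n"
bad = "Disallow: wp-admin/\nUser-agent: *\nDisalow: /refer/\n"

print(lint_robots_txt(good))  # []
for problem in lint_robots_txt(bad):
    print(problem)
```

An empty list only means that none of these three checks failed, so you should still run the file through Google's tool afterwards.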
That’s it! You should be set up and ready to go now.
My Final Thoughts
As you can see, the robots.txt is an important part of your website’s search engine optimization. If used properly, it can speed up your crawl rate and get your new and updated content indexed much faster. Nevertheless, misuse of this file can do a lot of damage to your search engine rankings, so be careful when making any changes.
Hopefully, this article has given you a better understanding of your robots.txt file and how to optimize it for your specific WordPress needs. Be sure to leave a comment if you have any further questions.
Amy Green is a freelance writer, front-end developer and entrepreneur. You can find her at Zuziko for writing tutorials, guides and reviews on the popular WordPress Content Management System.