Robots.txt is a standard file used by websites to communicate with web crawlers and other web robots, telling them which pages and areas of the website should not be crawled or indexed in search engine results.
What Does Robots.txt Do?
A robots.txt file tells web robots, also known as crawlers, which pages or files the domain owner doesn’t want them to ‘crawl’. Bots visit your website and then index (save) your web pages and files before listing them on search engine result pages.
If you don’t want certain pages or files to be listed by Google and other search engines, you need to block them using your robots.txt file.
You can check if your website has a robots.txt file by adding /robots.txt immediately after your domain name in the address bar at the top:
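As an illustration, a minimal robots.txt file might look like this (the paths and sitemap URL are hypothetical examples):

```
User-agent: *
Disallow: /admin/
Sitemap: https://www.example.com/sitemap.xml
```

`User-agent: *` means the rules apply to all crawlers, and each `Disallow` line lists a path that crawlers are asked not to visit.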
How Does Robots.txt Work?
Before a search engine crawls your website, it looks at your robots.txt file for instructions on which pages it is allowed to crawl and index in search engine results.
Robots.txt files are useful if you want search engines not to index:
1) Duplicate or broken pages on your website.
2) Internal search results pages.
3) Certain areas of your website or an entire domain.
4) Certain files on your website such as images and PDFs.
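The cases above map onto simple `Disallow` rules. A sketch, using hypothetical paths (note that wildcard patterns such as `*` and `$` are extensions honored by major crawlers like Googlebot and Bingbot, not part of the original standard):

```
User-agent: *
# Block internal search results pages
Disallow: /search/
# Block a certain area of the website
Disallow: /private/
# Block certain file types, such as PDFs
Disallow: /*.pdf$
```

To block an entire domain, a single `Disallow: /` line under `User-agent: *` disallows everything.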
Using robots.txt files allows you to exclude pages that add no value, so search engines focus on crawling the most important pages instead. Search engines have a limited "crawl budget" and can only crawl a certain number of pages per day, so you want to give them the best chance of finding your pages quickly by blocking all irrelevant URLs.
You may also implement a crawl delay, which tells robots to wait a few seconds before crawling certain pages, so as not to overload your server. Beware that Googlebot doesn't acknowledge this directive, so optimize your crawl budget instead for a more robust, future-proof solution.
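A crawl delay is set per user agent. A minimal sketch (the value is in seconds, though exactly how crawlers interpret it varies, and Googlebot ignores it entirely):

```
User-agent: Bingbot
Crawl-delay: 10
```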
How to Create a Robots.txt File?
If you don’t currently have a robots.txt file, it’s advisable to create one as soon as possible. To do so, you need to:
1) Create a new text file and name it “robots.txt” – Use a plain-text editor such as Notepad on Windows PCs or TextEdit on Macs, then “Save As” a plain-text file, ensuring that the file extension is “.txt”.
2) Upload it to the root directory of your website – This is usually a root-level folder called “htdocs” or “www” which makes it appear directly after your domain name.
3) Create a robots.txt file for each sub-domain – Only if you use any sub-domains.
4) Test – Check the robots.txt file by entering yourdomain.com/robots.txt into the browser address bar.
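Beyond loading the file in a browser, you can sanity-check your rules programmatically. A small sketch using Python's standard-library `urllib.robotparser`; the rules and URLs below are hypothetical examples:

```python
# Verify robots.txt rules with Python's built-in parser.
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content to test.
rules = """
User-agent: *
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# A disallowed path should not be fetchable; everything else should be.
print(parser.can_fetch("*", "https://www.example.com/admin/login"))  # False
print(parser.can_fetch("*", "https://www.example.com/blog/post"))    # True
```

This checks the rules as a crawler would interpret them, which catches mistakes (such as a stray `Disallow: /`) before they affect your search listings.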