First, let’s clear up what everything actually does. (If you already know all this, jump to the last section ‘But how do I actually remove a page?’).
What is Robots.txt?
A robots.txt file lets you control where crawlers go on your site. The fancy name for this is the ‘Robots Exclusion Protocol’, and it lets you specify which parts of your site shouldn’t be crawled. Note that it controls crawling, not indexing: a blocked URL can still show up in the search results if other pages link to it.
An unofficial noindex directive for the robots file did exist, but Google never documented it, sent mixed messages about whether they wanted you to use it, and announced in 2019 that they would stop honouring it.
Verdict: Do not use the robots.txt file to remove a page. It is not an effective way to remove an existing page from the index.
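To make the ‘crawling, not indexing’ point concrete, here is a minimal sketch using Python’s standard-library robots.txt parser. The robots.txt content, the example.com URLs and the helper name `is_crawl_allowed` are all made up for illustration. It only tells you whether a crawler is allowed to fetch a URL; it says nothing about whether that URL is in the index.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the /private/ path is an assumption.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_crawl_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt lets user_agent crawl url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "https://example.com/private/old-page",
        "https://example.com/blog/",
    ):
        print(url, is_crawl_allowed(ROBOTS_TXT, "Googlebot", url))
```

A `False` here means ‘do not crawl’, which is a different thing from ‘remove from the index’.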
Webmaster Tools URL removal tool (Search Console)
What is the URL removal tool?
Google’s Search Console is a treasure trove of useful tools and insights into your website, and everyone who owns a site should have it. The removal tool allows you to temporarily remove a page from the index.
Verdict: Effective in the short term. However, to keep the page from being re-indexed… keep readin’.
Meta Robots Noindex
What is the meta name="robots" content="noindex" tag?
This is an HTML tag that you can add to any page to tell Google not to index that page. It is written as follows:
<meta name="robots" content="noindex">
Check out the Google specifications here.
Verdict: Effective long term. This will keep your page from being indexed by the search engines if applied correctly.
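‘If applied correctly’ is the catch, so it is worth checking that the tag really is in the page source. Below is a rough sketch in Python that does this with the standard-library HTML parser. The class name `RobotsMetaParser`, the function `has_noindex` and the sample HTML are my own inventions for the example, and it only looks at the meta tag, not at anything else that might affect indexing.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives from every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() != "robots":
            return
        # content can hold several comma-separated directives
        for token in attr_map.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def has_noindex(html: str) -> bool:
    """Return True if the HTML carries a meta robots noindex directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Hypothetical page source for illustration.
SAMPLE = '<html><head><meta name="robots" content="noindex"></head><body></body></html>'

if __name__ == "__main__":
    print(has_noindex(SAMPLE))
```

You would feed it the HTML of the live page (view source, or however you normally fetch it) rather than the sample string.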
What is nofollow?
Depending on where the nofollow directive is added, it can mean two things. It is either applied to individual links with the rel="nofollow" attribute, or it is applied in a meta robots directive, which tells Google not to follow any of the links on the page you’ve added it to.
Verdict: No. Do not use this to remove an existing page from the index.
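If it helps to see the two meanings side by side, here is a small Python sketch that reports both for a chunk of HTML: whether the page has a page-level nofollow in its meta robots tag, and which individual links carry rel="nofollow". The names `NofollowParser` and `find_nofollow` and the sample markup are assumptions for the example only.

```python
from html.parser import HTMLParser


class NofollowParser(HTMLParser):
    """Tracks page-level nofollow and individually nofollowed links."""

    def __init__(self):
        super().__init__()
        self.page_level_nofollow = False
        self.nofollow_links = []

    def handle_starttag(self, tag, attrs):
        attr_map = {name: (value or "") for name, value in attrs}
        if tag == "meta" and attr_map.get("name", "").lower() == "robots":
            tokens = [t.strip().lower() for t in attr_map.get("content", "").split(",")]
            if "nofollow" in tokens:
                self.page_level_nofollow = True
        elif tag == "a":
            # rel is a space-separated list, e.g. rel="noopener nofollow"
            if "nofollow" in attr_map.get("rel", "").lower().split():
                self.nofollow_links.append(attr_map.get("href", ""))


def find_nofollow(html: str):
    """Return (page_level_nofollow, list of hrefs with rel=nofollow)."""
    parser = NofollowParser()
    parser.feed(html)
    return parser.page_level_nofollow, parser.nofollow_links


# Hypothetical markup for illustration.
SAMPLE = (
    '<head><meta name="robots" content="nofollow"></head>'
    '<body><a href="/a" rel="nofollow">A</a><a href="/b">B</a></body>'
)

if __name__ == "__main__":
    print(find_nofollow(SAMPLE))
```

Neither result says anything about whether the page itself is indexed, which is why nofollow is the wrong tool for removal.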
But how do I actually remove a page?
So, now we have all this information.
Firstly, if you do not have Webmaster Tools verified, go do that, then come back. Secondly, you need to have access to the HTML of individual pages on your website. If you are using a CMS like WordPress, install the Yoast plugin and then follow these instructions.
If you are using another content management system, search for ‘*your CMS* noindex a page’, e.g. ‘Drupal noindex a page’. If you are not using WordPress or another content management system and you have access to the HTML of your pages, add the following code between the HEAD tags of the offending page:
<meta name="robots" content="noindex">
Thirdly, go to your verified Webmaster Tools (Search Console) property, open the ‘Google Index’ dropdown and go to the ‘Remove URLs’ section. Paste in the URL that you want to remove (which should hopefully have the newly added meta robots tag on it) and Robert’s your father’s brother. Your page should be removed for good!
Make sure not to block (disallow) the page with robots.txt. If you do, Google will not be able to ‘see’ the noindex directive, and you will end up with the following message in the search results where your meta description should be: ‘A description for this result is not available because of this site’s robots.txt’. Then you can shout at your developer and say ‘see, I told you Simon.’
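That last gotcha is easy to check for before you submit the removal request. Here is a minimal sketch in Python that takes the text of your robots.txt, the URL and the page HTML, and tells you whether the noindex tag is present and whether a crawler would be allowed to reach it. The function name `removal_status`, its return strings and the example.com URL are all invented for this example, and it uses the standard-library parsers rather than anything Google provides, so treat the result as a sanity check only.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class _NoindexFinder(HTMLParser):
    """Sets self.noindex when a meta robots tag contains noindex."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        attr_map = {name: (value or "") for name, value in attrs}
        if tag == "meta" and attr_map.get("name", "").lower() == "robots":
            tokens = [t.strip().lower() for t in attr_map.get("content", "").split(",")]
            if "noindex" in tokens:
                self.noindex = True


def removal_status(robots_txt: str, url: str, html: str, user_agent: str = "Googlebot") -> str:
    """Classify a page as 'ok', 'missing noindex' or 'noindex hidden by robots.txt'."""
    finder = _NoindexFinder()
    finder.feed(html)
    if not finder.noindex:
        return "missing noindex"

    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    if not rules.can_fetch(user_agent, url):
        # The tag is there, but a disallowed crawler never gets to read it.
        return "noindex hidden by robots.txt"
    return "ok"


# Hypothetical inputs for illustration.
PAGE = '<html><head><meta name="robots" content="noindex"></head></html>'
BLOCKING_ROBOTS = "User-agent: *\nDisallow: /old-page\n"

if __name__ == "__main__":
    print(removal_status(BLOCKING_ROBOTS, "https://example.com/old-page", PAGE))
```

If it comes back with ‘noindex hidden by robots.txt’, remove the disallow rule for that page first, then go and find Simon.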
Learn more about robots.txt
- Introduction to robots.txt (Search Console Help)
- Robots.txt specifications (Google Search guides for Developers)
- How to create a robots.txt file (Bing Webmaster Help)
- Robots Exclusion Protocol (IETF)
- Robots Exclusion Standard (Wikipedia)
If you are having issues with this, or you have a more complex indexing problem that you need help with, get in touch and we can help you.