Google Search Console, robots.txt, meta robots, nofollow. What is the best way to remove a page from Google and keep it that way?
It’s an issue as old as the search engines themselves, and one of the simplest yet most confused topics around, causing arguments between developers and SEOs the world over. Speak to an old-school developer and some will confidently tell you that nofollowing the page or disallowing it in robots.txt is the best way. Despite the confusion, it is pretty straightforward. Take a look below at how to easily remove a single page from the search results.
First, let’s clear up what everything actually does. (If you already know all this, jump to the last section ‘But how do I actually remove a page?’).
Robots.txt
What is Robots.txt?
The robots.txt file allows you to control where the crawlers go on your site. The fancy name for this is the ‘Robots Exclusion Protocol’ and it lets you specify which parts of your site shouldn’t be crawled.
There used to be an unofficial noindex directive for the robots file, but Google never documented it and stopped honouring it on 1 September 2019. Blocking a page from being crawled is also not the same as removing it from the index: a disallowed URL can stay in the results if other pages link to it.
Verdict: Do not use the robots.txt file to remove a page. It is not an effective way to remove an existing page from the index.
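To make the crawl-control idea concrete, here is a minimal Python sketch using the standard library's `urllib.robotparser`. The robots.txt contents, the `example.com` URLs and the helper name `is_crawl_allowed` are all made up for illustration. It only shows which URLs a well-behaved crawler would be allowed to fetch; it says nothing about what is in Google's index.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example site (not taken from any real site).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_crawl_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt lets user_agent crawl url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "https://example.com/private/page.html",
        "https://example.com/blog/post.html",
    ):
        print(url, is_crawl_allowed(ROBOTS_TXT, "Googlebot", url))
```

Note that a `False` here means "do not crawl", not "do not index", which is exactly why the verdict above is what it is.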
WMT, GSC?? The URL removal tool (Google Search Console)
What is the URL removal tool?
Google’s Search Console is a treasure trove of useful tools and insights into your website, and everyone who owns a site should have it. The removal tool allows you to temporarily (for about 6 months) remove a page from the search results.
Verdict: Effective in the short term. However, to keep the page from being re-indexed… keep readin’.
Meta Robots Noindex
What is the meta name="robots" content="noindex" meta tag?
This is an HTML tag that you can add to any page to tell Google not to index it. It is written as follows: <meta name="robots" content="noindex">
Check out Google’s robots meta tag specifications for the details.
Verdict: Effective long term. This will keep your page from being indexed by the search engines if applied correctly.
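If you want to double-check that the tag really made it onto a page, a rough Python sketch like the one below can help. It uses the standard library's `html.parser`; the class name `RobotsMetaParser`, the helper `has_noindex` and the sample HTML are hypothetical, and it only looks at the generic `robots` meta tag, not crawler-specific ones.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        # The content attribute is a comma-separated list, e.g. "noindex, follow".
        for part in (attr.get("content") or "").split(","):
            if part.strip():
                self.directives.add(part.strip().lower())


def has_noindex(html: str) -> bool:
    """Return True if the HTML contains a robots meta tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Hypothetical page used only for illustration.
SAMPLE_PAGE = """
<html><head>
<title>Old page</title>
<meta name="robots" content="noindex">
</head><body>Nothing to see here.</body></html>
"""

if __name__ == "__main__":
    print(has_noindex(SAMPLE_PAGE))
```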
Nofollow
What is nofollow?
Depending on the context and where the nofollow directive is added, it can mean two things. It is either applied to individual links with the rel="nofollow" attribute, or it can be applied in a meta robots directive, which tells Google not to follow any of the links on the page you’ve added it to.
Verdict: No, do not use this to remove an existing page from the index.
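The two flavours of nofollow are easy to mix up, so here is a small Python sketch that reports both for a given page. The `NofollowParser` class, the `nofollow_report` helper and the sample HTML are assumptions made up for this example. Neither result says anything about whether the page itself is indexed.

```python
from html.parser import HTMLParser


class NofollowParser(HTMLParser):
    """Records page-level nofollow and individual rel="nofollow" links."""

    def __init__(self):
        super().__init__()
        self.page_level_nofollow = False
        self.nofollow_links = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            # Page-level form: <meta name="robots" content="nofollow">
            content = attr.get("content") or ""
            directives = [d.strip().lower() for d in content.split(",")]
            if "nofollow" in directives:
                self.page_level_nofollow = True
        elif tag == "a":
            # Link-level form: <a href="..." rel="nofollow">
            rel_tokens = (attr.get("rel") or "").lower().split()
            if "nofollow" in rel_tokens:
                self.nofollow_links.append(attr.get("href"))


def nofollow_report(html: str):
    """Return (page_level_nofollow, list of nofollowed link hrefs)."""
    parser = NofollowParser()
    parser.feed(html)
    return parser.page_level_nofollow, parser.nofollow_links


# Hypothetical page used only for illustration.
SAMPLE_NOFOLLOW_PAGE = (
    '<html><head><meta name="robots" content="nofollow"></head><body>'
    '<a href="/sponsored" rel="nofollow noopener">Sponsored</a>'
    '<a href="/about">About</a>'
    "</body></html>"
)

if __name__ == "__main__":
    print(nofollow_report(SAMPLE_NOFOLLOW_PAGE))
```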
But how do I actually remove a page?
So, now we have all this information.
Firstly, if you do not have Google Search Console verified, go do that, then come back.
Secondly, you need to have access to the HTML of individual pages on your website. If you are using a CMS like WordPress, install the Yoast plugin and then follow its instructions for noindexing a page. If you are using another content management system, search ‘*your CMS* noindex a page’, e.g. ‘Drupal noindex a page’. If you are not using WordPress or another content management system and you have access to the HTML of your pages, add the following code between the HEAD tags of the offending page: <meta name="robots" content="noindex">
Thirdly, go to your verified Search Console property and navigate to the ‘Google Index’ dropdown and then the ‘Remove URLs’ section (in the newer Search Console interface, this is the ‘Removals’ report). Paste the URL that you want to remove in there (which should hopefully have the newly added meta robots tag on it) and Robert’s your father’s brother. Your page should be removed for good!
*Thugnotes
Make sure not to block (disallow) the page with robots.txt, as Google will not be able to ‘see’ the noindex directive, and you will end up with the following message in the search results where your meta description should be: ‘A description for this result is not available because of this site’s robots.txt’. (For an example, Google ‘gmail’ and look at the top organic result. This is because the page is blocked by the robots file and has been for years… who knows why. Maybe it’s so I can give you this handy example.)
Then once all the above has been done and your page has been deindexed, you can shout at your developer and say ‘see, I told you Simon.’
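The robots.txt pitfall from the notes above is easy to check for before you submit a removal request. The Python sketch below is one rough way to do it, assuming you already have the robots.txt text and the page HTML to hand. The function name `removal_check`, its return strings and the example URLs are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class _NoindexFinder(HTMLParser):
    """Sets self.noindex when a robots meta tag contains noindex."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            directives = [d.strip().lower() for d in content.split(",")]
            if "noindex" in directives:
                self.noindex = True


def removal_check(robots_txt, page_url, page_html, user_agent="Googlebot"):
    """Classify a page as 'ok', 'missing noindex' or 'noindex hidden by robots.txt'."""
    finder = _NoindexFinder()
    finder.feed(page_html)
    if not finder.noindex:
        return "missing noindex"

    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    if not robots.can_fetch(user_agent, page_url):
        # The crawler is not allowed to fetch the page, so it never sees the tag.
        return "noindex hidden by robots.txt"
    return "ok"


# Hypothetical inputs used only for illustration.
NOINDEX_PAGE = '<html><head><meta name="robots" content="noindex"></head></html>'
BLOCKING_ROBOTS = "User-agent: *\nDisallow: /old-page\n"
OPEN_ROBOTS = "User-agent: *\nDisallow: /admin/\n"

if __name__ == "__main__":
    url = "https://example.com/old-page"
    print(removal_check(BLOCKING_ROBOTS, url, NOINDEX_PAGE))
    print(removal_check(OPEN_ROBOTS, url, NOINDEX_PAGE))
```

Only the ‘ok’ case matches the setup described in this post: the page is crawlable and carries the noindex tag.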
Learn more about robots.txt
- Introduction to robots.txt (Search Console Help)
- Robots.txt specifications (Google Search guides for Developers)
- How to create a robots.txt file (Bing Webmaster Help)
- Robots Exclusion Protocol (IETF)
- Robots Exclusion Standard (Wikipedia)
If you are having issues with this, or you have a more complex indexing problem that you need help with, get in touch and we can help you.