Why Pages Disallowed in robots.txt Still Appear in Google

Meaning of Robots.txt

Robots.txt is a plain-text file that sits in your website’s root directory and tells search engine crawlers which parts of the site they may fetch. One of the most useful directives is “Disallow”, which stops compliant crawlers from accessing private or irrelevant sections or pages of your website, e.g.

Disallow: /temp/
Disallow: /mypage.html

You can even block search engines from crawling every page on your domain, e.g.:

User-agent: *
Disallow: /
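To see how a compliant crawler reads rules like these, here is a minimal sketch using Python's standard urllib.robotparser module. The helper name is_crawl_allowed and the sample paths are illustrative assumptions; the rules are fed in as a string, so nothing is fetched over the network.

```python
from urllib.robotparser import RobotFileParser

# Rules modelled on the Disallow examples above (assumed sample, not a real site's file).
RULES = """\
User-agent: *
Disallow: /temp/
Disallow: /mypage.html
"""


def is_crawl_allowed(rules: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())  # parse from text, no HTTP request
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for path in ("/temp/draft.html", "/mypage.html", "/index.html"):
        url = "http://www.abc.com" + path
        print(path, "allowed" if is_crawl_allowed(RULES, url) else "blocked")
```

Note that this only answers "may a crawler fetch this URL?". It says nothing about whether the URL can show up in search results.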

Blocked Pages can Still Appear in Google – HOW?

Take a moment to understand how and why this happens. Assume you have a page at http://www.abc.com/mypage.html containing confidential information about your company’s new “coupon codes” project. You may want to share that page with partners, but you don’t want the information to be public knowledge just yet. Therefore, you block the page with a rule in http://www.abc.com/robots.txt:

User-agent: *
Disallow: /mypage.html

A few weeks later, you search for “coupon codes” in Google and find http://www.abc.com/mypage.html on the first page of results. How could this happen? Google is supposed to obey your robots.txt instructions, isn’t it?

However, this is not a violation of the robots.txt rules. Robots.txt stops Google from crawling the page, not from listing its URL. The explanation is simple: Google found the link elsewhere. If http://www.abc.com/mypage.html is linked from an external website, Google can discover the URL there without ever fetching the page itself. The title and description shown in the results also come from that external link, not from your page content.
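The reasoning can be illustrated with a deliberately simplified toy model. This is not Google's actual pipeline; the function build_toy_index, its data shapes, and the sample link text are all assumptions made for illustration. The point it shows is that a URL can be listed from an external link even when the page itself is never fetched.

```python
from urllib.robotparser import RobotFileParser


def build_toy_index(robots_rules, external_links, page_titles):
    """Toy model: list every linked URL, but read the page only if crawling is allowed.

    external_links: list of (url, anchor_text) pairs found on other websites.
    page_titles: dict mapping url -> the page's own title (stands in for a fetch).
    """
    parser = RobotFileParser()
    parser.parse(robots_rules.splitlines())

    index = {}
    for url, anchor_text in external_links:
        if parser.can_fetch("*", url):
            # Allowed: the listing is built from the page's own content.
            index[url] = {"title": page_titles.get(url, ""), "crawled": True}
        else:
            # Disallowed: the page is never read, yet the URL is still known,
            # so the listing falls back to the external link's text.
            index[url] = {"title": anchor_text, "crawled": False}
    return index


if __name__ == "__main__":
    rules = "User-agent: *\nDisallow: /mypage.html\n"
    links = [("http://www.abc.com/mypage.html", "abc.com coupon codes project")]
    titles = {"http://www.abc.com/mypage.html": "Confidential: Coupon Codes"}
    print(build_toy_index(rules, links, titles))
```

In this sketch the blocked page still gets an entry, marked as not crawled, with its title taken from the external link rather than from the page.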

There are several solutions that will stop your pages appearing in Google search results:

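  • Set a “noindex” meta tag: Google will not show your page in results or follow its links if you add this code to your HTML head section:

    <meta name="robots" content="noindex, nofollow">

    Google must be able to crawl the page to see this tag, so remove the matching Disallow rule from robots.txt; otherwise the tag is never read.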

  • Use the URL removal tool: Google offers a URL removal tool within Webmaster Tools (now called Search Console).

  • Add authentication: Apache, IIS, and most other web servers offer basic authentication facilities. The visitor must enter a user ID and password before the page can be viewed. This may not stop Google showing the page URL in results, but it will stop unauthorized visitors reading the content.
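For the meta tag option, it can help to verify that a page really carries the directive before relying on it. The following is a minimal sketch using Python's standard html.parser module; the class RobotsMetaParser and the function has_noindex are hypothetical names, and the sketch inspects only meta tags in an HTML string, not HTTP headers.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values may be None (e.g. <meta name>), so default to "".
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() in ("robots", "googlebot"):
            for directive in attr_map.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def has_noindex(html: str) -> bool:
    """Return True if the HTML asks search engines not to index the page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    # "none" is shorthand for "noindex, nofollow".
    return "noindex" in parser.directives or "none" in parser.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
    print(has_noindex(page))
```

A check like this could be run against the HTML of mypage.html after the tag is added, keeping in mind that the tag only takes effect once crawlers are allowed to fetch the page.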

Turn More of Your Website Visitors into Actual Customers – HOW?

Your website will only cost you money if its visitors don't become customers of your company. For that reason, it is important that your website is able to convert visitors into actual customers.

  • What is your website about and what's in it for the visitor?
    Clearly tell your website visitors which problem your product solves and why your solution is better than other solutions. This piece of information should be presented as prominently as possible.

  • Is your website trustworthy?
    Tell your visitors who you are and don't hide your address. If possible, show customer testimonials. Show your website visitors that you are a real company and that you can be trusted.

  • Does your website look professional?
    Another way to create trust is to use a professional website design. Avoid clutter and extreme colors. Use a clear design that makes it easy to read your web pages.

  • Make it as easy as possible for your website visitors:
    Add user instructions to every task. The less your website visitors have to think about how to complete the task, the better.

  • Make it a risk-free experience for your customers:
    Emphasize your respect for privacy and remove unnecessary fields from your sign-up forms. Avoid registration forms or make the forms as short as possible. The more fields a form has, the fewer people will fill it out.

  • Give search engines what they want while pleasing your website visitors:
    The title and the major tag line on your web page should contain the main selling proposition of your website and one or two good keywords. When a searcher sees your website in the search results, the title will show the searcher that he has found a good match. The content on your web page should confirm and clarify this with more detail.