There are various methods through which Google censors and manipulates SERPs, news, blog and other results. Google relies heavily upon its crawling methodology to produce results as per its own requirements. For instance, Google may report a 503 error or some other such error for a page. However, such an error is at least visible if you analyse Webmaster Tools.
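Whether a page is currently being served with such an error status can also be checked independently of Webmaster Tools. The following is a minimal sketch, assuming only Python's standard library and the blog's public address; it fetches a page the way a generic HTTP client would and prints the status code received (what Googlebot itself sees may, of course, differ):

# Minimal sketch: fetch a URL and print the HTTP status code received.
# This shows only what an ordinary HTTP client sees, not Googlebot.
import urllib.error
import urllib.request

url = "http://cyberforensicsofindia.blogspot.com/"
try:
    with urllib.request.urlopen(url, timeout=10) as response:
        print(url, "->", response.status)   # e.g. 200 when served normally
except urllib.error.HTTPError as err:
    # A 503 or any other error status returned by the server ends up here.
    print(url, "->", err.code)
except urllib.error.URLError as err:
    print(url, "-> network error:", err.reason)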
It seems Google has developed another innovative method of censoring posts that it finds controversial. Google appears to be manipulating the robots text file (robots.txt) to block even those posts and sections that are not, by design, supposed to be blocked.
The worst part is that it is done in a clandestine manner, and you cannot do much even if you thoroughly analyse Webmaster Tools. Today I spent an entire day trying to understand why the post titled “cyber forensics and Indian approach” was censored by Google.
I analysed Webmaster Tools and found a message telling me that the “health” of the blog titled Cyber Forensics in India is not in good shape. The exact message reads “Severe health issues are found on your site - Check site health”. Upon further analysis of the problem, Webmaster Tools reported that “some important page is blocked by robots.txt”.
I first analysed the message that reads “Is robots.txt blocking important pages?”, and it returned the message “The page you are trying to reach does not exist (404)”. I then tried to analyse the important page that had been blocked by the robots.txt file, and it gave me this page.
Before proceeding further, let us check the standard robots.txt file of Blogspot blogs. The standard robots.txt file of the present blog (which, except for the blog address, is identical to that of all other Blogspot blogs) is as follows:
User-agent: Mediapartners-Google
Disallow:
User-agent: *
Disallow: /search
Allow: /
Sitemap: http://cyberforensicsofindia.blogspot.com/feeds/posts/default?orderby=UPDATED
It is clear that the only thing disallowed by the Blogspot robots.txt file is the /search directory and its subdirectories. All other directories and their subdirectories are crawlable and accessible, not only to Google’s bots but also to the crawling bots of other search engines.
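This can be verified with a short sketch that feeds the above rules into Python's standard urllib.robotparser; the post URL used below is only an illustrative, hypothetical path on this blog:

# Minimal sketch: check the standard Blogspot robots.txt rules offline.
from urllib.robotparser import RobotFileParser

rules = [
    "User-agent: Mediapartners-Google",
    "Disallow:",
    "",
    "User-agent: *",
    "Disallow: /search",
    "Allow: /",
]

parser = RobotFileParser()
parser.parse(rules)

base = "http://cyberforensicsofindia.blogspot.com"
paths = [
    "/",                           # home page
    "/2012/01/example-post.html",  # hypothetical post URL
    "/search/label/censorship",    # a URL under /search
]
for path in paths:
    print(path, "allowed:", parser.can_fetch("Googlebot", base + path))

Only the /search path is reported as disallowed; ordinary post pages remain fully crawlable, which is why the disappearance of the post in question cannot be explained by this file.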
Now when we clicked on the important page that had been blocked by the robots.txt file of our blog, it took us here. This is absurd on at least two counts. Firstly, that page is bound to be blocked because the /search directory is blocked, so there is nothing unnatural about it. It cannot be termed a “severe health issue” for the blog.
Secondly, there is no entry or record at all of the post that has been censored by Google. There is no error, either crawling or indexing. There are no malware issues. There are no page removal issues involved either.
Clearly, whatever happened to that post happened at Google’s end, and Google owes us an explanation in this regard. We are aware that we are not alone in facing this issue, and there are plenty of examples where such issues have arisen and been resolved at Google’s end.
However, we saw no reason for the blocking,
filtering, censorship or deindexing of our post. It is time for
Google to explain.