I was somewhat surprised at how many page not found (404) errors Google Webmaster Tools found on mikelopez.info, so much so that it made me think they might be the reason behind my recent PR drop. About two weeks ago, Google reported only 600+ page not found errors, but when I checked today, it showed 3000+ 404 errors! Google’s got to be crawling these URLs from somewhere, and it’s definitely not from my site.
Here is a sample URL:
http://mikelopez.info/Acer_Travelmate_4502.cfm?…
Here’s a link to the complete CSV file that Google produced showing all the 404 errors. A common pattern I noticed is that they all use the .cfm (ColdFusion) extension, so I decided to just block such URLs from being crawled in robots.txt. My robots.txt now looks like this:
User-agent: *
Disallow: /*.cfm
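To sanity-check that a wildcard rule like this catches the URLs from the report, here is a minimal Python sketch. It assumes Google-style matching, where `*` stands for any run of characters and a trailing `$` anchors the end of the URL. The helper names `rule_to_regex` and `is_blocked` are made up for illustration, and the sketch ignores Allow rules, rule precedence, and percent-encoding, so treat it as a rough check rather than how Googlebot actually evaluates robots.txt. The sample URL is used without its query string, which is truncated above.

```python
import re
from typing import Iterable
from urllib.parse import urlsplit


def rule_to_regex(rule: str) -> "re.Pattern[str]":
    """Translate one Disallow path rule into a regex (assumed Google-style wildcards)."""
    # A trailing '$' anchors the rule to the end of the URL.
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape each literal piece, then join the pieces with '.*' where '*' was.
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(url: str, disallow_rules: Iterable[str]) -> bool:
    """Return True if the URL's path (plus query) matches any non-empty Disallow rule."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    # An empty Disallow value blocks nothing, so skip it.
    # re.match anchors at the start of the path, like a robots.txt prefix rule.
    return any(rule_to_regex(rule).match(target) for rule in disallow_rules if rule)


rules = ["/*.cfm"]
print(is_blocked("http://mikelopez.info/Acer_Travelmate_4502.cfm", rules))
print(is_blocked("http://mikelopez.info/", rules))
```

Under these assumptions the first call reports the .cfm URL as blocked and the second leaves the home page crawlable. Note that Python's standard `urllib.robotparser` does not interpret `*` as a wildcard, which is why the sketch does its own matching.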
Now, I’ll just wait and see what happens next.
Quick update: I also noticed the same problem for both science.mikelopez.info and religion.mikelopez.info. Both have 600+ 404s of the same kind.
Anybody else having problems similar to this?
October 29th, 2007 at 6:20 am
I’ve never experienced this kind of problem, but I did get spam containing porn links that are supposedly within my site’s domain. Something like http://jaypeeonline.net/wp-admin/porn.html, but no such file exists in that folder.
October 29th, 2007 at 8:47 am
How many spam URLs like these did you get?