
MPG/SFX got deep indexed by Google

At the end of 2008, we noticed that Googlebot had started to deep crawl the MPG/SFX link resolver by following distinct OpenURLs. This finding was surprising because we had not expected any freely available website to promote deep links to dynamic pages created by the MPG/SFX server. There are some indications that this assumption could have been wrong, e.g. Yahoo’s Site Explorer now counts 5,365 inlinks to "sfx.mpg.de" in total. This deserves some additional checking!

In addition, I just learned from a post on the Google webmaster blog that the web form offered by the MPG/SFX Citation Linker could have been used to crawl the server as well. But this does not seem very likely, because I’m pretty sure that Google doesn’t rate MPG/SFX as a "high-quality site".

The cause remained undetermined; however, the number of requests from Googlebot had a significant impact on the statistics created for the SFX service. Therefore, we refined the robots.txt last December to disallow crawling of the relevant directories and started to forget about it… until today. Today, we learned from user feedback that any Internet user may accidentally stumble upon an empty MPG/SFX menu and not feel well served, which is totally understandable. Unfortunately, it looks like Google’s index still includes a large number of links to "sfx.mpg.de":
[Image: Google search result]

Hm, it looks like modifying the robots.txt is not a very straightforward way to remove content from Google. In the meantime, we have used the URL removal request tool offered by Google Webmaster Tools. Let’s see if this will reduce the number of superfluous requests.
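
For illustration, here is a minimal sketch of how a robots.txt rule like the one described above could be checked with Python's standard `urllib.robotparser` module. The rule set and the `/sfx_local` path are assumptions made up for this example, since the directories actually disallowed on "sfx.mpg.de" are not listed in this post; `is_crawl_allowed` is likewise a hypothetical helper name. Note that such a rule only tells well-behaved crawlers not to fetch matching URLs. It does not by itself remove URLs that are already in a search index, which would fit what we observed.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: the real directory names on sfx.mpg.de are not
# given in this post, so "/sfx_local" is only an assumed example path.
ROBOTS_TXT = """\
User-agent: *
Disallow: /sfx_local
"""


def is_crawl_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Both URLs are illustrative; the OpenURL query string is shortened.
    for url in (
        "http://sfx.mpg.de/sfx_local?genre=article",
        "http://sfx.mpg.de/",
    ):
        verdict = "allowed" if is_crawl_allowed(ROBOTS_TXT, "Googlebot", url) else "disallowed"
        print(f"{url}: {verdict}")
```

A check like this only confirms that the rules say what was intended; whether a crawler honours them, and how quickly existing index entries disappear, is up to the search engine.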