Roy Schestowitz wrote:
__/ [Davémon] on Thursday 08 September 2005 19:48 \__
http://www.google.com/search?q=site:www.nightsoil.co.uk+eastenders&filter=0
All 69 results point to the same 2 pages, which are just database-driven
URLs with minor variants.
I can't recall the site structure that you previously had (it looked
nice), but it seems as if the older blogging tool pointed to entries by
invoking a PHP file and passing it some arguments.
The new one does the same, but uses mod_rewrite to make it look
otherwise. Much more SE friendly!
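
To illustrate the idea only (this is not the site's actual rule set, and it is Python rather than Apache configuration): a rewrite layer takes a clean-looking path and hands the same old script its argument. The /entry/<number>/ pattern and the index.php?page= target below are made-up names for the sketch.

```python
import re

# Hypothetical clean-URL pattern: /entry/<number>/ (trailing slash optional).
CLEAN_URL = re.compile(r"^/entry/(\d+)/?$")


def rewrite(path):
    """Map a clean path to the internal script URL, or None if it does not match."""
    match = CLEAN_URL.match(path)
    if match is None:
        return None
    # The script name and the 'page' argument are assumptions for illustration.
    return "/index.php?page=" + match.group(1)
```

The visitor and the spider only ever see /entry/155/, while the PHP file still receives page=155 behind the scenes.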
Some CMS packages will
have a single fixed address with an argument like ?page=155, in which case
search engines will index locations in a peculiar, non-elegant manner.
I know there were historical problems with the ? and id etc. What I find
amusing is that Google can't tell that 69 different urls which have
_exactly the same_ content on them are the same. (I'd conclude Google
does no pattern matching on content to test alike-ness or duplication,
and treats URLs as unique).
So when Google says it has a gazillion pages in its index, you can
safely assume a significant (?) percentage are urls with 'trailing
garbage' rather than actual different pages.
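
For what it's worth, spotting exact duplicates is cheap in principle. A minimal sketch, assuming the page bodies have already been fetched into a dict of URL to text (the function name and the sample URLs are hypothetical, and this says nothing about what Google actually does):

```python
import hashlib
from collections import defaultdict


def group_by_content(pages):
    """Group URLs whose bodies are identical after collapsing whitespace.

    pages: dict mapping URL -> page body (str).
    Returns a list of sorted URL lists, one per set of duplicates.
    """
    groups = defaultdict(list)
    for url, body in pages.items():
        normalised = " ".join(body.split())
        digest = hashlib.sha1(normalised.encode("utf-8")).hexdigest()
        groups[digest].append(url)
    # Only keep digests shared by more than one URL.
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


pages = {
    "/a?x=1": "hello  world",
    "/a?x=2": "hello world",
    "/b": "something else",
}
duplicates = group_by_content(pages)
```

This only catches bodies that match exactly once whitespace is collapsed; pages that differ by a date stamp or a session id in a link would need some fuzzier comparison.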
Be careful because PageRank can be dropped as a consequence. Maybe someone
can correct me and say that the custom error pages will preserve PageRank
given the right retrieval status.
Thanks for pointing that out. I hadn't really considered PageRank.
Hopefully next time Google spiders the site, it will drop the old stuff
and pick up the new. I'll think about a simple noindex,follow for the
'error' page, and possibly pre-running the search query with the q= from
the referer.
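
A rough sketch of the referer part, in Python rather than the site's PHP and assuming the referring search engine puts the query in a q= parameter (the function name is made up for illustration):

```python
from urllib.parse import urlparse, parse_qs


def query_from_referer(referer):
    """Return the q= value from a referring URL, or None if there is none."""
    if not referer:
        return None
    # parse_qs decodes '+' and percent-escapes in the query string.
    values = parse_qs(urlparse(referer).query).get("q")
    return values[0] if values else None
```

The returned string could then be fed into the site's own search on the 'error' page, while a robots meta tag with noindex,follow keeps that page itself out of the index.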
--
Davémon
http://www.nightsoil.co.uk