
Re: Google & dynamic pages weirdness

  • Subject: Re: Google & dynamic pages weirdness
  • From: Davémon <nospam@xxxxxxxxxx>
  • Date: Mon, 12 Sep 2005 09:06:33 +0100
  • In-reply-to: <dfr836$3b9$1@godfrey.mcc.ac.uk>
  • Newsgroups: alt.internet.search-engines
  • Organization: datanet.co.uk
  • References: <5a0a5$43208773$504427df$5106@datanet.co.uk> <dfr836$3b9$1@godfrey.mcc.ac.uk>
  • User-agent: Mozilla Thunderbird 1.0 (Windows/20041206)
  • Xref: news.mcc.ac.uk alt.internet.search-engines:66460
Roy Schestowitz wrote:
__/ [Davémon] on Thursday 08 September 2005 19:48 \__

http://www.google.com/search?q=site:www.nightsoil.co.uk+eastenders&filter=0

All 69 results point to the same 2 pages, which are just database-driven
URLs with minor variants.

I can't recall the site structure that you previously had (it looked nice), but it seems as if the older blogging tool pointed to entries by invoking a PHP file and passing it some arguments.

The new one does the same, but uses mod_rewrite to make it look otherwise. Much more SE friendly!
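
To illustrate the idea of a friendly address that is still served by one script with arguments, here is a rough Python sketch of the mapping a rewrite rule performs. The /archive/ path, the index.php script name and the page parameter are all made up for the example; the real rules on the site are not shown in this thread.

```python
import re

# Hypothetical rule: /archive/155 is served internally by /index.php?page=155
FRIENDLY = re.compile(r"^/archive/(\d+)/?$")

def rewrite(path: str) -> str:
    """Map a friendly path to the internal script URL; leave other paths unchanged."""
    match = FRIENDLY.match(path)
    if match is None:
        return path
    return "/index.php?page=" + match.group(1)

print(rewrite("/archive/155"))
print(rewrite("/about"))
```

The visitor and the spider only ever see the friendly form, so there is one stable address per entry and no query string to vary.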


Some CMS packages will
have a single fixed address with an argument like ?page=155, in which case
search engines will index locations in a peculiar, inelegant manner.

I know there were historical problems with the ? and id etc. What I find amusing is that Google can't tell that 69 different URLs which have _exactly the same_ content on them are the same page. (I'd conclude Google does no pattern matching on content to test for likeness or duplication, and treats every URL as unique.)
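
For what it's worth, the kind of content check I mean can be sketched in a few lines of Python. This is only an illustration of grouping URLs by a hash of their page body, not a claim about how any search engine works; the URLs and page bodies are invented for the example.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List

def content_fingerprint(html: str) -> str:
    """Hash the page body after collapsing whitespace and lowercasing."""
    normalized = " ".join(html.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def group_duplicates(pages: Dict[str, str]) -> List[List[str]]:
    """Return groups of URLs whose bodies share the same fingerprint."""
    groups = defaultdict(list)
    for url, html in pages.items():
        groups[content_fingerprint(html)].append(url)
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]

# Hypothetical URLs and bodies, standing in for fetched pages
pages = {
    "http://example.com/index.php?id=12": "<p>EastEnders  review</p>",
    "http://example.com/index.php?id=12&x=1": "<p>EastEnders review</p>",
    "http://example.com/index.php?id=13": "<p>Something else</p>",
}
duplicates = group_duplicates(pages)
print(duplicates)
```

An exact hash only catches pages that are identical after whitespace and case are normalised; pages that differ by a date stamp or a counter would need a fuzzier comparison.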


So when Google says it has a gazillion pages in its index, you can safely assume a significant (?) percentage are URLs with 'trailing garbage' rather than actual different pages.

Be careful because PageRank can be dropped as a consequence. Maybe someone
can correct me and say that the custom error pages will preserve PageRank
given the right retrieval status.

Thanks for pointing that out. I hadn't really considered PageRank. Hopefully next time Google spiders the site, it will drop the old stuff and pick up the new. I'll think about a simple noindex,follow for the 'error' page, and possibly pre-running the search query with the q= from the referer.
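
A rough sketch of the referer idea, in Python purely for illustration (the site itself runs PHP, and the function name here is made up). It pulls the q= value out of the referring URL so the error page could pre-run a site search with it, and shows the robots meta tag I have in mind for that page.

```python
from urllib.parse import urlparse, parse_qs

# Robots meta tag for the error page: keep it out of the index, still follow links
ROBOTS_META = '<meta name="robots" content="noindex,follow">'

def query_from_referer(referer: str) -> str:
    """Return the q= value of a referring URL, or '' if there is none."""
    params = parse_qs(urlparse(referer).query)
    values = params.get("q", [])
    return values[0] if values else ""

referer = "http://www.google.com/search?q=site:www.nightsoil.co.uk+eastenders&filter=0"
print(query_from_referer(referer))
```

The site: operator would still need stripping from the result before it is handed to the site's own search, and the Referer header can be absent or forged, so the value has to be treated as untrusted input.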



--

Davémon
http://www.nightsoil.co.uk
