Re: Google & dynamic pages weirdness

Subject: Re: Google & dynamic pages weirdness
From: Davémon <nospam@xxxxxxxxxx>
Date: Mon, 12 Sep 2005 09:06:33 +0100
In-reply-to: <dfr836$3b9$1@godfrey.mcc.ac.uk>
Newsgroups: alt.internet.search-engines
Organization: datanet.co.uk
References: <5a0a5$43208773$504427df$5106@datanet.co.uk> <dfr836$3b9$1@godfrey.mcc.ac.uk>
User-agent: Mozilla Thunderbird 1.0 (Windows/20041206)
Xref: news.mcc.ac.uk alt.internet.search-engines:66460

Roy Schestowitz wrote:

__/ [Davémon] on Thursday 08 September 2005 19:48 \__

http://www.google.com/search?q=site:www.nightsoil.co.uk+eastenders&filter=0

All 69 results point to the same 2 pages, which are just database-driven
URLs with minor variants.


I can't recall the site structure that you previously had (it looked
nice), but it seems as if the older blogging tool pointed to entries by
invoking a PHP file and passing it some arguments.

The new one does the same, but uses mod_rewrite to make it look otherwise. Much more SE-friendly!
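
To make the idea concrete, here is a rough sketch of the kind of mapping a rewrite rule performs, written in Python rather than as an actual Apache rule. The /archive/<id> path and the index.php?page= target are made-up examples, not the real URLs on my site.

```python
import re

# Hypothetical "pretty" URL shape: /archive/155 (optionally with a trailing slash).
PRETTY = re.compile(r"^/archive/(\d+)/?$")

def rewrite(path: str) -> str:
    """Map a friendly path onto the script-plus-argument URL behind it."""
    match = PRETTY.match(path)
    if match is None:
        # Anything that does not look like an archive entry is left alone.
        return path
    return f"/index.php?page={match.group(1)}"
```

The visitor and the spider only ever see the friendly path; the script still receives its argument as before.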

Some CMS packages will
have a single fixed address with an argument like ?page=155, in which case
search engines will index locations in a peculiar, non-elegant manner.

I know there were historical problems with the ? and id etc. What I find amusing is that Google can't tell that 69 different URLs which have _exactly the same_ content on them are the same. (I'd conclude Google does no pattern matching on content to test for likeness or duplication, and treats each URL as unique.)
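
By "pattern matching on content" I mean something along these lines. This is only a naive sketch of exact-duplicate detection by hashing the page body; I am not suggesting it is how any search engine works, and the function names are my own.

```python
import hashlib
from collections import defaultdict

def fingerprint(body: str) -> str:
    """Hash the page text after collapsing whitespace and lower-casing it."""
    normalised = " ".join(body.split()).lower()
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()

def group_duplicates(pages: dict) -> list:
    """Take a {url: body} mapping and return the lists of URLs sharing a body."""
    groups = defaultdict(list)
    for url, body in pages.items():
        groups[fingerprint(body)].append(url)
    # Only groups with more than one URL count as duplicates.
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]
```

A hash like this only catches identical text; pages that differ by a date stamp or a session id in the markup would need some fuzzier comparison.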

So when Google says it has a gazillion pages in its index, you can safely assume a significant (?) percentage are URLs with 'trailing garbage' rather than genuinely different pages.
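
For what it's worth, stripping that 'trailing garbage' is not hard if you know which arguments matter. A minimal sketch, assuming the only meaningful argument is one called page (that name and the example.com URLs are assumptions for illustration):

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical whitelist: the only query arguments that change the page content.
KEEP = {"page"}

def canonical(url: str) -> str:
    """Drop non-whitelisted arguments and the fragment, and lower-case the host."""
    parts = urlsplit(url)
    kept = sorted((k, v) for k, v in parse_qsl(parts.query) if k in KEEP)
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path, urlencode(kept), "")
    )

def count_distinct(urls: list) -> int:
    """Count how many different pages a list of URLs really refers to."""
    return len({canonical(u) for u in urls})
```

The catch is that an outsider cannot know the whitelist for somebody else's site, which may be part of why each URL ends up being treated as unique.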

Be careful because PageRank can be dropped as a consequence. Maybe someone
can correct me and say that the custom error pages will preserve PageRank
given the right retrieval status.

Thanks for pointing that out. I hadn't really considered PageRank. Hopefully next time Google spiders the site, it will drop the old stuff and pick up the new. I'll think about a simple noindex,follow for the 'error' page, and possibly pre-running the search query with the q= from the referer.
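
Roughly what I have in mind for the 'error' page, sketched in Python just to show the logic (the site itself is not written in Python, and the names here are placeholders). It assumes the visitor arrived from a search results URL that carries the query in a q= argument, as in the Google link above.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Robots meta tag for the error page: keep it out of the index, still follow its links.
NOINDEX_FOLLOW = '<meta name="robots" content="noindex,follow">'

def query_from_referer(referer: str) -> Optional[str]:
    """Return the decoded q= value from a referring URL, or None if there is none."""
    values = parse_qs(urlparse(referer).query).get("q")
    return values[0] if values else None
```

If a query comes back, the error page can run the site's own search with it; if not, it just shows the plain message. The Referer header is optional and can be blank or faked, so this is a convenience and nothing to rely on.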

--

Davémon
http://www.nightsoil.co.uk
