
Re: Google Sitemaps and Alias'd domains

  • Subject: Re: Google Sitemaps and Alias'd domains
  • From: Roy Schestowitz <newsgroups@xxxxxxxxxxxxxxx>
  • Date: Sat, 20 May 2006 17:43:50 +0100
  • Newsgroups: alt.internet.search-engines
  • Organization: schestowitz.com / MCC / Manchester University
  • References: <s8tt62t71pv5sb0jqe1b2gl2ue7barunr8@4ax.com> <8u2u625fb3kjses5hsu7d8ti3jhtji83he@4ax.com> <ah4u62dhcsbgrbhig94rvina8jta9ple5s@4ax.com> <235u629krj8jick7c6toi69s36omcesq64@4ax.com>
  • Reply-to: newsgroups@xxxxxxxxxxxxxxx
  • User-agent: KNode/0.7.2
[Commenting as I go along]

__/ [ Darren Tipton ] on Saturday 20 May 2006 12:09 \__

> Not sure if this is of any use to you, discovered today by experiment.
> 
> Got a main domain on a server tippy. .co . uk
> Got another domain which is aliased and then uses htaccess to point to a
> user directory on tippy .co .uk
> 
> I can register the main domain tippy .co .uk domain in google sitemaps, but
> when trying to register the alias'd domain always get a DNS timeout.
> 
> If you're getting problems submitting a site to sitemaps, might be worth
> ensuring it's on a server where the domain is not an alias, or uses
> htaccess to point to a different location (such as a user directory on the
> main site).
> 
> It seems the time taken for htaccess to do the redirect is too long for
> google to accept the domain exists.
> 
> Anybody else have experience with this?


Evidently no, as everyone realised from the following message.


__/ [ Darren Tipton ] on Saturday 20 May 2006 13:30 \__

> Disregard my above email....and everything I have said about google not
> spidering my sites.


And rather than yourself being insulted by Google, it turns out that Google
are the ones insulted for being denied. Poor sweet Google... *smile*


> My ISP have just informed me that they had blocked the google spider from
> my domain due to excessive spidering ! ! !
> 
> How pi**ed off am I.


FWIW, this happened to somebody else in this newsgroup and I believe it was
Cat, if not Dmitri. Actually, I believe that Cat once told me that her Web
host blocked crawlers from accessing any of the sites on the server (going
years into the past). How bloody pathetic and terrible! Not even a word of
warning, I suspect. I wonder what motivated you to contact the ISP. And
speaking of ISP, it makes it sound like you host your Web sites at home,
which conflicts with the following...



__/ [ Darren Tipton ] on Saturday 20 May 2006 14:11 \__

> On Sat, 20 May 2006 07:58:10 -0500, Paul B <lamewolf2004@xxxxxxxxx> wrote
> on the topic of "Re: Google Sitemaps and Alias'd domains":
> 
>>Ouch.
>>Can they do that ?


I believe so, but there should probably be something in the contract, if one
exists.


>>Could be open for legal proceedings if they do as it could be seen as
>>sabotage - esp if your main income is from google.


I suppose it's worth pursuing. Some Web hosts shove too many sites on the
same box and when the load average goes too high they can just exclude the
crawlers and reduce the load significantly. DDOS attacks likewise. Some
sites get 'isolated' for being Dugg or Slashdotted. Speaking of which, 2
stories which I had dugg reached the Digg front page this morning. By doing
so, I even managed to collapse one of the sites that I referenced. Stiiiirike!
*badabum SMASH*


>>I hope you can sort it out.
>>plh
>>Paul
> 
> For anyone who is interested, the hosting company is penguin-uk.com. Who,
> apart from this have been superb. Anybody with them, you might want to find
> out if your site was affected.


I have made a mental note. They give Tux a bad name, assuming they provide
Linux hosting. *smile*

I know that my host blocks a few IP addresses that are known to be nothing
but trouble. I at least get notified by looking at the 404 error logs.


> Luckily no, my main income is not from Google.  Block now removed and
> sitemaps working anyway.
> 
> I believe there is something in their T&C's that they can alter the way the
> servers work to maintain performance. As Google was crawling so quickly, it
> hammered the servers which affected all the sites on the box.


Just when you need them most, they pull the plug. Isn't that just precious?
"We would like you to do well, but not *TOO* well".


> Apparently, they block an IP range, my site just happened to be in it.
> 
> I'm trying to get them to commit to telling people when they do it again or
> pinpoint sites that are getting excessive crawling so we can use
> robots.txt to stop crawling of forums etc. Apparently it's usually
> temporary, 2 months to me does not sound temporary.


The cheeky admins should have sent you an E-mail beforehand. There are ways
of reducing the load of spiders without turning them away: control files
such as robots.txt, sitemaps in the case of Google, redesign, reductions,
rel="nofollow", contacting Google, etc.
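
To make the control-file route a bit more concrete, here is a rough sketch
in Python. The rules, the paths (/forum/, /search/) and the example.com
host are all made up for illustration and are not taken from Darren's site.
It only uses the standard library's robots.txt parser to check which URLs
a well-behaved crawler would skip under such rules; it says nothing about
how any particular crawler actually behaves.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep every crawler out of the heavy,
# low-value areas (a forum and a search page) but leave the rest open.
ROBOTS_TXT = """\
User-agent: *
Disallow: /forum/
Disallow: /search/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # example.com and both paths are placeholders.
    for path in ("/index.html", "/forum/viewtopic.php?t=1"):
        url = "http://www.example.com" + path
        print(path, is_allowed(ROBOTS_TXT, "Googlebot", url))
```

Testing the rules like this before uploading the file is a cheap way to
avoid locking crawlers out of pages you do want indexed.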

Best wishes,

Roy

-- 
Roy S. Schestowitz      |    Useless fact: 21978 x 4 = 21978 backwards
http://Schestowitz.com  | Free as in Free Beer ¦  PGP-Key: 0x74572E8E
  5:30pm  up 23 days  0:27,  11 users,  load average: 2.92, 2.16, 1.71
      http://iuron.com - semantic engine to gather information
