Archive for the ‘SEO’ Category

Googlebombing


“Googlebombing” is the term used to describe a situation where inappropriate SERPs get returned due to intentional (hence unnatural) manipulation of Google’s algorithms. The anchor text of links (the clickable text of a hyperlink, together with text in its vicinity) is used heavily in the classification of pages, and this observation lends itself to overt misuse.
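To illustrate the mechanism, here is a minimal sketch in Python of how a ranker that trusts anchor text can be skewed. The link data, the scoring rule and the URLs are all invented for illustration; this is a toy model, not Google’s actual algorithm.

```python
from collections import defaultdict

# Hypothetical inbound links: (anchor text, target URL). In a Googlebomb,
# many pages use the same anchor phrase to point at one target.
links = [
    ("miserable failure", "whitehouse.gov/president"),
    ("miserable failure", "whitehouse.gov/president"),
    ("miserable failure", "whitehouse.gov/president"),
    ("poker", "en.wikipedia.org/wiki/Poker"),
    ("failure analysis", "example.org/engineering"),
]

# Naive scoring: a page earns one point per inbound link whose anchor
# text contains the query term, regardless of the page's own content.
def score(query: str) -> dict:
    scores = defaultdict(int)
    for anchor, target in links:
        if query in anchor.lower():
            scores[target] += 1
    return dict(scores)

print(score("failure"))
# {'whitehouse.gov/president': 3, 'example.org/engineering': 1}
```

Because the target page’s own content never enters the score, a coordinated linking campaign is enough to dominate the result.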

I can recall that in the past, due to the large amount of poker-related spam, bloggers decided to link en masse to the Wikipedia page on poker, which in turn made it a prime search engine result for the term “poker”. The intention was to discourage poker and gambling sites, as well as the spammy practices they use to promote themselves, which were destroying cyberspace in the process.

More recently, there has been a lot of talk about the association between the term “failure” and the White House, in particular George Bush’s official page. The complaints have grown loud enough to justify a post on the Google Blog:

If you do a Google search on the word [failure] or the phrase [miserable failure], the top result is currently the White House’s official biographical page for President Bush. We’ve received some complaints recently from users who assume that this reflects a political bias on our part. I’d like to explain how these results come up in order to allay these concerns…

My vague recollection tells me that, several years ago, Google humour got involved when one searched for the term “weapons of mass destruction”.

PageRank Prediction and SEO Tools


There appears to have been recent growth in PageRank and SEO analysis tools. Such tools have an on-line front-end (interface), so they are easily and readily accessible. Here is a short survey of tools that I have not only used, but can also confirm are valuable:

Black Hat SEO

[Image: a cowboy hat. Caption: blackhat SEOs, under the illusion that they are talented gunslingers, shooting from the hip]

I have recently become aware of some highly dirty practices which sometimes get used by malevolent SEOs. ‘Googleating’ is one of the most vile SEO methods, perhaps second only to spam. I have been told about someone who developed a habit of snapping up deleted blogs as soon as they were freed. He could then promptly ‘fuel’ his own advert-filled public content sites with the merit (mainly in the form of Google PageRank) that was transferred by links in these newly-acquired blogs.

Further on the issue of dodgy practices: in order to compensate for banishment from search engines, that same person bought multiple domains, thus avoiding putting all his eggs in one basket.

How did I come to find this out? He is the one exception among the users of an otherwise benevolent SEO forum in which I participate. He uses blackhat techniques and gives the group a bad reputation that may sooner or later draw attention from search engines, which can inflict collective punishment on group participants. If justice prevails, bad practices will be choked and genuine sites will receive more referral traffic from search engines. The Internet is no place for mirrored public content, nor is it a place for link spam.

PageRank and Traffic

Google’s PageRank mechanism is reflected in the popularity estimates of most engines and directories, which presently adopt similar ranking methods based on the notion of citations (links).
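The citation idea can be made concrete in a few lines of code. Below is a minimal power-iteration sketch of PageRank in Python over a made-up four-page link graph; the damping factor of 0.85 follows the original PageRank paper, while the graph itself and the iteration count are purely illustrative.

```python
# Minimal PageRank by power iteration over a toy link graph.
# Each key links to the pages in its value list; the graph is made up.
graph = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}

def pagerank(graph, damping=0.85, iterations=50):
    n = len(graph)
    ranks = {page: 1.0 / n for page in graph}
    for _ in range(iterations):
        new_ranks = {}
        for page in graph:
            # A page's rank is a share of the rank of every page citing it.
            incoming = sum(
                ranks[other] / len(targets)
                for other, targets in graph.items()
                if page in targets
            )
            new_ranks[page] = (1 - damping) / n + damping * incoming
        ranks = new_ranks
    return ranks

for page, rank in sorted(pagerank(graph).items(), key=lambda kv: -kv[1]):
    print(f"{page}: {rank:.3f}")
```

Note how page “d”, which nothing cites, ends up with the minimum rank, while “c”, cited by three pages, rises to the top: citations, not content, drive the score.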

The average webmaster would fall victim to an illusion: it always appears as if the majority of other sites have higher ranks. A question then springs to mind: how come most pages being visited have PageRank 4 or higher, for example, if such pages are only a small minority? The fact of the matter is that much of the Web, including one’s own site, has a much lower rank. That remote part of the Web is often virtually invisible to the errant surfer.

Let us recognise the fact that Google, Yahoo and CNN, for example, receive far more traffic than other sites; their traffic is many orders of magnitude higher than that of the average site. To visualise this, I drew two exponential curves. Some figures suggest that the curves I drew for illustrative purposes should be far steeper: at any given PageRank level, the number of sites may be 3 times smaller and the traffic 3 times greater than at the previous level.

[Figure: PageRank versus traffic. The number of sites with PageRank 10 is tiny compared to the number of sites with PageRank 0; conversely, traffic is largely concentrated in sites with a high PageRank.]

To give an idea of how vast the World Wide Web actually is, the number of sites is slowly approaching 100 million (all registered global domains, as well as those registered in individual parts of the world). At present, PageRank 0 may account for 50 million sites (some are unidentified, unlisted, or parked), PageRank 2 for 5 million sites, PageRank 4 for 500,000 sites, and so forth. Of course this rough guess does not reproduce the true numbers, which end with just dozens of sites at PageRank 10.
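Under the rough assumption above (the number of sites shrinking by a constant factor at each successive PageRank level), the whole curve follows from just two numbers. A quick sketch, with the 50 million figure and the decay factor both taken as illustrative guesses:

```python
# Rough geometric model: sites at each PageRank level shrink by a
# constant factor. Both constants below are illustrative guesses.
SITES_AT_PR0 = 50_000_000
DECAY = 10 ** 0.5  # ~3.16x fewer sites per level (10x per two levels)

for pr in range(11):
    sites = SITES_AT_PR0 / DECAY ** pr
    print(f"PageRank {pr:2}: ~{sites:,.0f} sites")

# PageRank 2 comes out at ~5,000,000 and PageRank 4 at ~500,000,
# matching the figures quoted above; PageRank 10 comes out at ~500,
# which suggests the true decay must be somewhat steeper to leave
# only dozens of sites at the top.
```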

In conclusion, always remember that the vast majority of sites are on the left-hand side of the figure above, whereas the far more popular pages belong to the (comparatively) tiny number of sites on the right, e.g. Google’s search page, the BBC front page or the W3 Consortium pages.

Content Spam Prevention

Content spam grows worryingly fast. This new type of spam shows its face in a form different from the spam we know all too well as uninvited e-mail. Such spam is placed on the Web and later infiltrates search engine results pages (SERPs). If you have recently come across a page full of links and ads that provided no useful information, you will probably understand what content spam is.

Spam content pages are often generated automatically (robot-made). Such robots run a search, pick up HTML code from the results page, and finally add some advertisements. This generates a large volume of pages which gain presence in many SERPs. One very large site that I recently came across got tempted and uses this technique to woo visitors and offer them a subscription. If you come across such sites, be sure to report them, as it is our only chance of banning illicit sites and preventing content spam (and referrer spam, i.e. fake links/referrals to sites) from expanding. Google provide an on-line form for this very purpose.
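There is no published recipe for spotting such pages, but a crude heuristic conveys the idea: a page whose visible text is almost entirely anchor text is probably a machine-made link farm. The parser below and its 0.8 cut-off are invented for illustration only:

```python
from html.parser import HTMLParser

class LinkTextRatio(HTMLParser):
    """Crude spam heuristic: measure how much of a page's visible
    text sits inside <a> tags. A page that is nearly all anchor
    text is often a machine-made link farm."""
    def __init__(self):
        super().__init__()
        self.depth = 0          # nesting level inside <a> tags
        self.link_chars = 0
        self.total_chars = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "a" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        text = data.strip()
        self.total_chars += len(text)
        if self.depth > 0:
            self.link_chars += len(text)

# A toy page that is almost nothing but repeated advert links.
page = "<p>Buy now</p>" + "<a href='#'>cheap poker pills</a>" * 40
parser = LinkTextRatio()
parser.feed(page)
ratio = parser.link_chars / max(parser.total_chars, 1)
print(f"link-text ratio: {ratio:.2f}")  # ~0.99
if ratio > 0.8:  # arbitrary cut-off for this sketch
    print("looks like a link farm")
```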

Dangers of RSS Sitemaps

Sitemaps are probably the most crucial pages of all in a Web site. They might not be the most helpful to human visitors, but they greatly assist crawlers, thereby attracting, promoting and inviting more traffic from search engines. Site maps can be perceived as ‘crawling maps’, regardless of how shallow or deep they are. These maps are not just the spine, but possibly the entire skeleton of complex Web sites, where crawling will take a huge number of distinct routes through pages.

Google have recently introduced RSS sitemaps. This means that new site content is appended to a map which takes the form of a long aggregated feed, i.e. links with minimal content and without unimportant media and layout detail. This move by Google has encouraged many Webmasters to jump on the RSS wagon and XML-ify their Web sites. It benefits Google in a variety of ways. First of all, there is a clear pairing between content and dates; if the site delivers timely news, this becomes a significant factor. Secondly, as any site is described fully by its map, there is no need for repeated crawling. There are significant savings in terms of bandwidth, which appear to allow search engines to crawl a sparser portion of the Web, as well as reduce the burden on Web servers, whose load involves a great many pages being served to crawlers or robots.
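Since an RSS 2.0 feed is among the formats Google Sitemaps accepts, producing a bare-bones sitemap feed takes little effort. The sketch below hand-assembles one in Python; the URLs, dates and feed metadata are placeholders, and a real site would emit one item per page:

```python
from datetime import datetime, timezone
from email.utils import format_datetime

# Hypothetical list of (url, last-modified) pairs for the site's pages.
pages = [
    ("http://example.com/", datetime(2005, 7, 1, tzinfo=timezone.utc)),
    ("http://example.com/blog/rss-sitemaps", datetime(2005, 7, 10, tzinfo=timezone.utc)),
]

# One <item> per page, pairing each link with its date, which is
# precisely the content-to-date pairing discussed above.
items = "\n".join(
    f"    <item>\n"
    f"      <link>{url}</link>\n"
    f"      <pubDate>{format_datetime(modified)}</pubDate>\n"
    f"    </item>"
    for url, modified in pages
)

feed = f"""<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>example.com sitemap</title>
    <link>http://example.com/</link>
    <description>Pages on example.com, with last-modified dates</description>
{items}
  </channel>
</rss>"""

print(feed)  # save as sitemap.xml and submit its URL to Google Sitemaps
```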

There are hidden dangers in moving towards RSS sitemaps. Typical HTML/XHTML/other site maps get neglected, as they are laborious to maintain, and as time goes by they may appear redundant, much like an older generation of pages that have gone completely out of date. One of the worrying implications is that competitors (in this particular case MSN and Yahoo predominantly) get ‘robbed’ of the true, old-style site maps, which are conceded altogether or simply neglected in favour of RSS sitemaps. They must therefore follow suit and take advantage of the public RSS site maps, just as Google did. They might need to woo Webmasters and get those sitemaps submitted to them as well. Is that going to be an easy task? Probably not, especially while Google’s reach is by far the greatest.

The phenomenon above is the introduction and adoption of new technologies by force. Google are, though not malevolently, forcing trends in crawling and pushing for methods to change. Microsoft exhibited similar behaviour by introducing new elements to HTML without consent from the community. They incorporated these elements into Internet Explorer; they considered the Internet to be a source that serves Internet Explorer, rather than the reverse, whereby the Web is open and accessible to all. By doing so, Microsoft encouraged Web developers to construct Web pages that work exclusively under Internet Explorer and slowly killed their main opponent: Netscape Navigator. But that is all history, as Internet Explorer (version 6) now lags behind Opera, Firefox and arguably Safari too.

In practical terms, do RSS sitemaps lead to any gains? We have discussed this issue to death at the primary SEO-related newsgroup. Borek expressed his skepticism about Google sitemaps:

So far Google just fetches my sitemaps 4 times a day. One site is PR3 5 months old, second is PR2 several years old, redesigned in June. No signs of crawl on either (and there are not spidered pages on both sites).

The bottom line, from my point of view, is that for news-delivering sites, RSS sitemaps provide a good opportunity to conquer valuable SERPs very quickly. For most standard sites with a decent amount of bandwidth to spare, RSS sitemaps appear to be overkill, even when HTML-to-XML implementations are available virtually ‘off-the-shelf’. RSS sitemaps may also be valuable for blogs, where the nature of publication is linear rather than hierarchical or lateral.

Other related threads on the topic:

Related News: Google to Patent Ads in Feeds

Google Slips?

Figures have recently come out which break down search engine usage in the States. According to one report, Google has been losing some of its impact:

Google’s market share of US searches for June 2005 was at 36.9% compared with 37.5% in May 2005.

Then again, according to a different study cited on the very same site, Google’s usage has been rising lately:

Google’s share of U.S. searches hit 52% in June, up from 45% a year ago, according to Web analytics firm WebSideStory Inc.

A careful read reveals the obvious: the total volume of queries declines throughout the summer, and so does Web traffic in general. Google is probably more prolific outside the United States and among IT-savvy users.

[Figure: search engines chart showing June 2005 referrals to this domain]
