Archive for the ‘SEO’ Category

Tracking Inbound Links


The number of links that reach a given site can be probed (or estimated based on crawlers) using some special syntax in queries. The universally-accepted form for the query has become link: followed by a string, where the string can be either a domain name or an individual page that resides deeper inside the site.

Altavista seems to report the highest number of inbound links, yet not all of them are visible. In general, the overall number of results, as estimated at the very start, is always misleading. Not all results are reachable from the search engines, so a false impression is given. In terms of saturation of results (number of links reported), Yahoo comes next and Google only after that. MSN does not support this query syntax, yet it appears to be losing popularity anyway (“MSN’s search market share dropped from 14 percent to 11 percent”).

Technorati is a good tool for finding fresh links to a given site very quickly. Links are tracked almost in real time, owing to feeds and pinging services. In that respect, Technorati is Web 2.0-oriented.

Detection of Fake Content


THERE is a growing interest in snatching Web traffic from search engines. A notorious method for achieving this is to publish a large mass of useless and re-used content. This is backed by many inbound links, which are most commonly accumulated by spamming other Web sites. The overall outcome is a degradation in the quality of publication on the Web. Likewise in science (sometimes at least!).

There are real technical papers and fake ones as well. SCIGen was at one point utilised to output randomly-generated publications, one of which was actually accepted to be presented at a conference.

Finally, there is a new tool, which claims to be able to discern real technical papers from fake ones.

Authors of bogus technical articles beware. A team of researchers at the Indiana University School of Informatics has designed a tool that distinguishes between real and fake papers.

It’s called the Inauthentic Paper Detector — one of the first of its kind anywhere — and it uses compression to determine whether technical texts are generated by man or machine.
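
The quoted article says only that the detector "uses compression", so its actual method is not known from this post. As a rough illustration of the general idea, here is a minimal Python sketch that treats a text's compression ratio as a crude signal of redundancy. The function names and the threshold are hypothetical and chosen purely for illustration; nothing here reproduces the Indiana tool.

```python
import zlib


def compression_ratio(text: str) -> float:
    """Return compressed size divided by raw size (lower means more redundant)."""
    raw = text.encode("utf-8")
    if not raw:
        raise ValueError("text must not be empty")
    return len(zlib.compress(raw, 9)) / len(raw)


def looks_repetitive(text: str, threshold: float = 0.3) -> bool:
    """Flag text whose ratio falls below an arbitrary, uncalibrated threshold."""
    return compression_ratio(text) < threshold


if __name__ == "__main__":
    recycled = "the cat sat on the mat. " * 200
    varied = ("Search engines amass information from millions of Web sites, "
              "and rarely is the content truly understood by the software.")
    print(compression_ratio(recycled), looks_repetitive(recycled))
    print(compression_ratio(varied), looks_repetitive(varied))
```

A real classifier would need calibration against known genuine and generated papers; a single ratio like this mainly catches heavily recycled text.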

The Business Search Engines Create


COMMERCIAL effects of search engines can no longer be ignored. Now that Google Finance has gone live, even the economy is managed and supervised by major search engines (Yahoo Finance, MSN Money and Google Finance).

Various agencies and freelancers are finally offering services which promise manipulation of search results, while businesses perceive search engines as a vital source of revenue. The following recent article provides a decent primer, with special reference to the current state of affairs in the industry.

Part science, part art, search-engine marketing is perhaps the fastest-evolving segment of the Internet. A cottage industry of search-marketing techies labors to adjust keywords to a search engine’s algorithm, the set of rules used to rank search results. The algorithm used by Google, the market leader, has hundreds to thousands of components that determine the results.

The Importance of Link Anchor Text


MY site has just returned to Google’s top 50 for “roy”, after it had dropped into oblivion last year. I was never truly trying to get there. I never aimed for the site to be affiliated with the keyword “roy”, but this was a side effect of anchor text (words adjacent to or within hyperlinks). More oddly, my domain name and front page title make no mention of Roy. Thus, anchor text must matter a great deal; more than many of us realise, that’s for sure.

As another example, take the main developers of WordPress. They are ‘Googlebombing’ themselves in ‘out of the box’ installations of the software Web-wide. Try searching Google for “matt”, “mike”, or “alex”. You will see what I mean.
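
To see which words other pages attach to links that point at a site, one can collect the anchor text from their HTML. The following is a minimal Python sketch using only the standard library. The class and function names are hypothetical, and it says nothing about how Google actually weighs anchor text.

```python
from collections import Counter
from html.parser import HTMLParser


class AnchorTextCollector(HTMLParser):
    """Collect the visible text of every <a href=...> element."""

    def __init__(self):
        super().__init__()
        self.anchors = []   # list of (href, text) pairs
        self._href = None   # href of the link currently open, if any
        self._parts = []    # text fragments seen inside the open link

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._parts = []

    def handle_data(self, data):
        if self._href is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace inside the anchor text.
            text = " ".join("".join(self._parts).split())
            self.anchors.append((self._href, text))
            self._href = None


def anchor_words(html: str, target: str) -> Counter:
    """Count lower-cased anchor words over links whose href contains target."""
    parser = AnchorTextCollector()
    parser.feed(html)
    parser.close()
    counts = Counter()
    for href, text in parser.anchors:
        if target in href:
            counts.update(text.lower().split())
    return counts
```

For instance, feeding it a page with two links to example.org labelled "Roy" and "Roy's blog" would count "roy", "roy's" and "blog" once each, regardless of what the target page's own title says.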

Simultaneous Spam Reports


TO anyone who is interested: I have put together a page that enables content spam to be reported to several search engines in tandem. The purpose of this little ‘utility’ is to centralise various pages of interest, which motivates spam reports that reach more than just a single company. The ‘meat’ of the report can be conveniently copied and pasted from one frame to another. Report spammy sites that violate ethics.

Google and the Optimisation Trap

How does the notorious Google cookie affect your search results?

IT is no secret that Webmasters sometimes subvert (or “optimise” as they would call it) Google results pages, which in turn weakens the relevance of Google’s search results. This manipulation is done by insertion/injection of particular keywords, as well as navigation hacks, ‘organic’ inbound links and so forth.

While artificial changes to site content have an effect on other search engines, changes are done primarily with Google in mind. This urges Google to make their strategy (algorithms) more dynamic and permit results to ‘dance’ every now and then, sometimes owing to large updates (cf. Bourbon). In such circumstances, the ‘re-shuffle’ reaches an extent that resembles a complete overhaul of indices. The main purpose is to weed out spam, yet the borderline between “ham” and “spam” becomes rather vague, so genuine sites get affected and often penalised. Many lose vital revenue as a consequence.

The Internet is becoming a brutal battleground. There are many cases of black-hat SEO, which have become worryingly prevalent. Even if search engines can annul the effect of ‘noise’, this does not account for the issue of relevance, spam aside. This is possibly a case of becoming a victim of one’s own success. Google have become the main target of many harmful practices, including the rumour mill.

In my humble opinion, Google remain the best engine bar none, but new challenges are being posed every day. Black-hat practices accumulate more and more tricks, which can be ‘pulled out of the sleeve’ and then shared, soon to become a Web epidemic. Like a virus which quickly exploits flaws in servers and desktops, SEO hacks rely on flaws in search engines’ algorithms. The question remains: will Google be able to adapt to changes as soon as they occur, thereby annihilating the impact of Google-targeted site/page optimisations? I will illustrate using a very timely and true example.

This morning I wanted to know the precise definition of, and difference between, single-breasted and double-breasted garments. Putting “single breasted double breasted” in Wikipedia (I tend to use it for knowledge queries, as opposed to URL search), I got the perfect answers with clarifications, pictures included. In Google, one gets promotional stuff. Shops are trying to sell suits, so results 1 to 4 are purely irrelevant for my query (at least the one I had in mind). I had no intention of seeking a guide or a purchase. In other words, the financial incentive beat the informative source, which is sad. I did, however, find Wikipedia’s page lower down in Google’s results for the phrase “single breasted double breasted”. To me, this was not good enough. As a whole, it wasn’t a specific and satisfactory outcome as the first 4 places attempted to sell me something. Knowledge references should surpass any individual store, which is only one among the many that exist.

Search Engines and Benchmark Subjectivity

SEARCHING of the Web is no exact science. If it was, its exhaustive exploration would be infeasible. The more formidable search engines amass information from millions of Web sites, each containing huge lumps of information, both textual and media. That information, in turn, could be interpreted and/or indexed in a variety of different forms. Rarely is the content truly understood, which is my personal motivation for knowledge engines.

Mathematics and physics could be argued to be inexact sciences as well, at least when a variety of man-made, non-fundamental fields are introduced. Think of computer science, for example. Its fundamentals bear on this complex problem, which is searching the World Wide Web, yet the problem is associated with ad-hoc solutions. Computational theories which relate to Turing machines are not tractable enough for a single most correct and efficient algorithm to ever crop up and stand out.

Don Knuth has written his popular series of books on the issues of correctness and efficiency in common algorithms. It has proved an elegant reference for many computer science practitioners. Problems which are simple, such as element or number sorting, are still handled differently by different algorithms, and their efficiency is dependent upon the architecture involved, the scale of the problem, and its nature. The same goes for search algorithms, which is why they should be engineered differently depending on a number of key factors. Hence, judgement of the quality of search engines cannot be done objectively, but can only ever be estimated using test cases and artificial scoring schemes.
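
As an illustration of what such an artificial scoring scheme can look like, here is a minimal Python sketch of two common measures, precision at k and reciprocal rank, applied to a made-up test case. The result labels and the set of "relevant" items are hypothetical. Choosing them is exactly where the subjectivity creeps in.

```python
def precision_at_k(results, relevant, k):
    """Fraction of the top-k results that appear in the relevant set."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for item in results[:k] if item in relevant)
    return hits / k


def reciprocal_rank(results, relevant):
    """Return 1/position of the first relevant result, or 0.0 if none is found."""
    for position, item in enumerate(results, start=1):
        if item in relevant:
            return 1.0 / position
    return 0.0


if __name__ == "__main__":
    # Hypothetical ranking: four shops ahead of one reference page.
    ranking = ["shop-a", "shop-b", "shop-c", "shop-d", "encyclopedia"]
    relevant = {"encyclopedia"}
    print(precision_at_k(ranking, relevant, 4))
    print(reciprocal_rank(ranking, relevant))
```

A searcher who wanted to buy a suit would mark the shops as relevant instead, and the very same ranking would then score perfectly.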

Still, everyone wants to discover the perfect recipe for outperforming Google. Others try to reverse-engineer its algorithms and cheat (fame and riches owing to ‘Google juice’ that is channelled to one’s site/s). Many of us continue to favour and recommend Google, which brings the largest number of referrals to most sites in existence. There is a danger here though. Large search engines are the main target for deceit and they are easily confused by spam inasmuch as they are inclined to pick up rare and valuable content.

Quality of search is probably in the mind of the searcher and the result of hearsay, somewhat of a ‘cattle effect’. Even engines that spit out cruft might be defended unconditionally by their innocent users. This may lead the competition to forfeit the battle and invest fewer resources (e.g. datacentres) in the attempt to catch up. Phrases like “Google it” do not help either as they promote a search monoculture at best.

Related item: Search Engines and Biased Results

Original styles created by Ian Main (all acknowledgements) • PHP scripts and styles later modified by Roy Schestowitz • Help yourself to a GPL'd copy