
Re: Fuzzy Matches for Content Mirrors

__/ On Friday 26 August 2005 11:29, [John Bokma] wrote : \__

>> Sorry, but I must disagree. Let us say that T is the original page and
>> F (false) is the copy.
>> If F = T + A, where A is some extra content, then you have a problem.
> Not really. You can define similarity based on sentences, words, etc.
> You don't have to look for exact matches. Similar is close enough.

...and very computationally expensive. Search engines already have a hard
time indexing billions of pages and picking out keywords. Now you are asking
them to calculate similarities in a graph with billions of nodes?!
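A minimal sketch of the word-based similarity John describes, assuming 3-word
shingles and Jaccard similarity (neither is specified in the thread). It also
shows MinHash, a standard technique swapped in here to illustrate how each
comparison can be made cheap with fixed-size signatures. The sample texts and
the blake2b-based seeded hashing are illustrative assumptions.

```python
import hashlib
import re


def shingles(text, k=3):
    """Set of k-word shingles (overlapping runs of k consecutive words)."""
    words = re.findall(r"\w+", text.lower())
    return {" ".join(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}


def jaccard(a, b):
    """Exact Jaccard similarity: shared shingles / all distinct shingles."""
    return len(a & b) / len(a | b) if (a or b) else 1.0


def minhash_signature(shingle_set, num_hashes=128):
    """For each of num_hashes seeded hash functions, keep the smallest hash
    value seen over the set. Signatures have a fixed length regardless of
    page size, so they are cheap to store and compare."""
    return [
        min(
            int.from_bytes(
                hashlib.blake2b(f"{seed}:{s}".encode(), digest_size=8).digest(), "big"
            )
            for s in shingle_set
        )
        for seed in range(num_hashes)
    ]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of matching signature slots approximates Jaccard similarity."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


# Hypothetical example: F = T + A, where A is appended spam.
T = "the quick brown fox jumps over the lazy dog near the quiet river bank"
F = T + " buy cheap widgets here"
t, f = shingles(T), shingles(F)
print(jaccard(t, f))  # 0.75: 12 of 16 distinct shingles are shared
print(estimate_jaccard(minhash_signature(t), minhash_signature(f)))
```

Note that signatures only make each pairwise comparison cheaper; they do not
remove the need to find candidate pairs among billions of pages. Large-scale
systems typically add locality-sensitive hashing (grouping signature slots
into bands so only pages sharing a band get compared), which is not shown here.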

> I am sure a lot of research has already been done on this. For example,
> on catching students who copy papers written by others.

Yes, I know, but people have mocked such detectors for being unreliable.
Besides, you can easily run filters that shuffle the text around and replace
words with suitable synonyms. Brute force would do the job.
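To illustrate the substitution point, here is a sketch using the same assumed
3-word shingles and Jaccard measure, with a made-up three-entry synonym table
standing in for the kind of rewriting filter described above. The table and
the `substitute` helper are hypothetical, not taken from any real tool.

```python
import re


def shingles(text, k=3):
    """Set of k-word shingles (overlapping runs of k consecutive words)."""
    words = re.findall(r"\w+", text.lower())
    return {" ".join(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}


def jaccard(a, b):
    """Exact Jaccard similarity: shared shingles / all distinct shingles."""
    return len(a & b) / len(a | b) if (a or b) else 1.0


# Hypothetical synonym table a copier's filter might use (illustrative only).
SYNONYMS = {"quick": "fast", "lazy": "idle", "quiet": "calm"}


def substitute(text, table):
    """Replace each word found in the table; leave other words unchanged."""
    return " ".join(table.get(w, w) for w in text.lower().split())


T = "the quick brown fox jumps over the lazy dog near the quiet river bank"
F = substitute(T, SYNONYMS)
print(F)
print(jaccard(shingles(T), shingles(F)))  # 0.2: only 4 of 20 shingles shared
```

Three swapped words in a 14-word text drop the 3-word shingle overlap from 1.0
to 0.2, because each substituted word breaks up to k shingles. Shorter
shingles are less sensitive to this, at the cost of matching more unrelated
text.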

>> To a black hat SEO it would be no problem to automate this and deceive
>> the search engines. It is much easier to carry out a robbery than it
>> is for the police to spot the crook in a town of millions.
> You don't do exact matches in cases like this, just fuzzy matches.

Using that analogy again, that's like doing a house-to-house search and
questioning all the residents.


Roy S. Schestowitz        Useless fact: Every polar bear is left-handed
