Friday, November 3rd, 2006, 6:55 pm

Duplicates Detection in Social Bookmarking Sites

Duplicate entries are an unwelcome by-product of sites where many people share the editorial work. There are ways of preventing duplicates (dupes), but none is perfect.

I personally find Digg’s dupe detector somewhat flawed because, by the time the user is shown matches based on similarity, much of the entry has already been written and the effort already spent. The user is thus tempted to press on rather than retract the submission. Netscape, on the other hand, checks for title and URL matches in-line, although only for exact identity.
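To make the in-line identity check concrete, here is a minimal Python sketch. It is not how Netscape or Digg implement it; the function names (`normalize_url`, `normalize_title`, `is_exact_dupe`) and the normalization rules are assumptions chosen for illustration. The idea is simply that a check cheap enough to run while the user is still typing can compare a normalized URL and title against what is already stored.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Reduce a URL to a canonical form so trivial variants compare equal.

    Assumed rules (illustrative only): lowercase the scheme and host,
    drop a trailing slash from the path, and discard the fragment.
    """
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    # The query string is kept because it may identify the page.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def normalize_title(title: str) -> str:
    """Lowercase the title and collapse runs of whitespace."""
    return " ".join(title.lower().split())


def is_exact_dupe(url: str, title: str, existing) -> bool:
    """Return True if the URL or the title matches a stored entry.

    `existing` is a hypothetical iterable of (url, title) pairs.
    """
    key_url = normalize_url(url)
    key_title = normalize_title(title)
    return any(
        normalize_url(u) == key_url or normalize_title(t) == key_title
        for u, t in existing
    )
```

A real site would look the normalized keys up in an index rather than scan every entry, but the comparison itself would be the same.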

Wishlist items:

  • Have matches that are more ‘fluid’ appear on the side while input is entered (not just exact matches)
  • Permit the user to preview without choosing a channel and without entering tags. This lets the submitter check for dupes before supplying the supplemental information. I am aware that this requires some parsing of the text, which is harder than relying on tag-based similarity.
  • An option for supplemental items, URLs, and follow-up news would be nice. Maybe have a hierarchical connector between related items, or at least some linkage that connects an item with a correction, clarification, op-ed, etc.
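The first wishlist item, ‘fluid’ matches shown while the title is being typed, could be sketched roughly as follows. This is only an illustration using Python’s standard `difflib` module; the function name `similar_entries`, the 0.6 threshold and the limit of five suggestions are assumptions, not anything taken from an existing site.

```python
from difflib import SequenceMatcher


def similar_entries(draft_title, existing_titles, threshold=0.6, limit=5):
    """Return stored titles that resemble the draft, best match first.

    `threshold` and `limit` are arbitrary illustrative defaults.
    """
    draft = draft_title.lower().strip()
    scored = []
    for title in existing_titles:
        # ratio() is 1.0 for identical strings and 0.0 for nothing in common.
        ratio = SequenceMatcher(None, draft, title.lower().strip()).ratio()
        if ratio >= threshold:
            scored.append((ratio, title))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [title for _, title in scored[:limit]]


# Example: call this on each keystroke (or after a short pause) and show
# the result in a side panel next to the submission form.
suggestions = similar_entries(
    "Duplicate detection on social bookmarking sites",
    ["Duplicates Detection in Social Bookmarking Sites", "Weather report for Friday"],
)
```

Character-level similarity like this is cheap but crude. Comparing the body text as well, as the second wishlist item implies, would need heavier parsing.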

One Response to “Duplicates Detection in Social Bookmarking Sites”

  1. Kian Ann Says:

    Good thoughts. I don’t know if I’m the only one who feels this way, but I think there is really so much spammy and duplicate stuff on the Internet, and it sometimes gets irritating.

    For example, I can have a Google Alert on “Social Bookmarking” and then find that all three links lead to exactly the same content. Really irks me!

Original styles created by Ian Main (all acknowledgements) • PHP scripts and styles later modified by Roy Schestowitz • Help yourself to a GPL'd copy