Clay Shirky's presentation is called Ontology is Overrated: Links, Tags, and Post-hoc Metadata.
He starts off by defining ontology and telling us the parable of the travel agent. The
periodic table of the elements is one of the great examples of classification. The Library of Congress classification,
by contrast, is imbalanced: it gives very generic categories to large regions like Asia and Africa, because the criterion
used was the number of books on the shelf.
Yahoo! was the first significant attempt to bring order (categorization) to the web. They hired ontologists to
categorize the content. There were shortcuts to other categories for users who looked for a category in the wrong place
(e.g. a Books and Literature shortcut under Entertainment, in case users went there looking for Books and Literature). This
marked the change from hierarchical categorization to hierarchical categorization with links. Eventually the sheer quantity
of links made the hierarchy unnecessary, and that's when search appeared. Even Google adopted DMOZ at some point, but then
discontinued it because no one was using it.
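Here is a minimal Python sketch of a hierarchy with links, as I understand the Yahoo! approach: every category has one real home in the tree, and other nodes can hold shortcuts that point to it. The `Category` class, its method names, and placing Books and Literature under a hypothetical Arts node are all assumptions for illustration, not Yahoo!'s actual data model.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Category:
    """One node in a hypothetical Yahoo!-style directory tree."""
    name: str
    # Real sub-categories: each one has exactly one home in the hierarchy.
    children: Dict[str, "Category"] = field(default_factory=dict)
    # Cross-links ("shortcuts"): names listed under this node that point to
    # a category whose real home is elsewhere in the tree.
    shortcuts: Dict[str, "Category"] = field(default_factory=dict)

    def add_child(self, name: str) -> "Category":
        child = Category(name)
        self.children[name] = child
        return child

    def resolve(self, path: List[str]) -> Optional["Category"]:
        """Walk a path of names, following shortcuts as if they were children."""
        node: Optional["Category"] = self
        for part in path:
            if node is None:
                return None
            node = node.children.get(part) or node.shortcuts.get(part)
        return node


root = Category("Yahoo!")
# Hypothetical placement: the real category lives under "Arts".
books = root.add_child("Arts").add_child("Books and Literature")
entertainment = root.add_child("Entertainment")
# "Entertainment" only links to it; it does not own a second copy.
entertainment.shortcuts["Books and Literature"] = books

# Both paths lead to the same node, so users who guess "wrong" still arrive.
print(root.resolve(["Entertainment", "Books and Literature"]) is books)
```

The shortcut is what lets the hierarchy tolerate users' differing mental models; once there are enough such links, the tree itself matters less than the links, which is the point the talk makes.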
So when does ontological organization work well? Only when the domain is restricted and the participants are experts.
The web is not such a case.
Voodoo categorization happens when a categorization is forced on users. This causes:
Signal loss. E.g. Mac, Apple and OSX; Movies, Film and Cinema; Queer, Gay and Homosexual.
Makes it necessary to predict the future: e.g. "This book is about Dresden" vs. "This book is about Dresden and goes
in the category East Germany".
Merging ontologies is very difficult: do we merge categories or GUIDs?
In real life, minds don't think alike, and that's where del.icio.us comes onto the scene. The distribution of tagging
is a long tail: a few users with lots of tag entries and lots of users with few. The distribution of tags for a single
user is also a long tail: lots of tags about a few subjects and lots of infrequent tags. The distribution of how users
tag a single URL is also, you guessed it, a long tail: lots of people tag the URL with one or two tags.
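To make the per-URL long tail concrete, here is a small Python sketch that counts how many users applied each tag to one URL and ranks the tags. The bookmark data is invented for illustration (it is not real del.icio.us data), and `tag_distribution` is a hypothetical helper name.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical bookmarks of ONE URL: each inner list holds the tags a single
# user applied. Toy data, not real del.icio.us figures.
bookmarks = [
    ["design", "css"],
    ["design", "web"],
    ["design", "css", "inspiration"],
    ["css"],
    ["design", "toread"],
    ["design", "css", "layout"],
]


def tag_distribution(bookmarks: List[List[str]]) -> List[Tuple[str, int]]:
    """Count how many users applied each tag, most frequent first."""
    # dict.fromkeys drops a user's repeated tags while keeping their order,
    # so each user contributes at most 1 to any tag's count.
    counts = Counter(tag for tags in bookmarks for tag in dict.fromkeys(tags))
    return counts.most_common()


for rank, (tag, n) in enumerate(tag_distribution(bookmarks), start=1):
    print(f"{rank}. {tag}: {n} user(s)")
```

With this toy data, two tags (design and css) account for most of the tagging, and the rest appear once each, which is the shape of the tail described above.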
This is what's called organic categorization: user and time are core attributes; one-off categories are lost in the rear
end of the tail (the system is the editor); the semantics are in the users, not in the system; and merges are
probabilistic, not binary.
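One way to read "merges are probabilistic, not binary" is sketched below: instead of declaring that Movies and Film are the same category, measure how often the two tags land on the same URLs and report a score between 0 and 1. Using Jaccard similarity is an assumption here, not necessarily what del.icio.us or Shirky had in mind; the data and the function names `urls_by_tag` and `tag_overlap` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical (user, url, tag) triples; toy data for illustration only.
taggings = [
    ("ann", "u1", "movies"), ("ann", "u1", "film"),
    ("bob", "u1", "film"),
    ("cat", "u2", "movies"), ("cat", "u2", "cinema"),
    ("dan", "u2", "film"),
    ("eve", "u3", "movies"),
    ("fay", "u4", "recipes"),
]


def urls_by_tag(triples: Iterable[Tuple[str, str, str]]) -> Dict[str, Set[str]]:
    """Map each tag to the set of URLs it was applied to (by anyone)."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for _user, url, tag in triples:
        index[tag].add(url)
    return dict(index)


def tag_overlap(index: Dict[str, Set[str]], a: str, b: str) -> float:
    """Jaccard similarity of two tags' URL sets: 0.0 = disjoint, 1.0 = identical."""
    urls_a = index.get(a, set())
    urls_b = index.get(b, set())
    union = urls_a | urls_b
    if not union:
        return 0.0
    return len(urls_a & urls_b) / len(union)


url_index = urls_by_tag(taggings)
for a, b in [("movies", "film"), ("film", "cinema"), ("movies", "recipes")]:
    print(f"{a} ~ {b}: {tag_overlap(url_index, a, b):.2f}")
```

With this toy data, movies and film score about 0.67 while movies and recipes score 0.0. A system could treat a high score as "probably the same thing" without ever forcing a yes/no merge decision.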

1. Alberto, what is the parable of the travel agent?
Posted at 5:48AM on Dec 19th 2005 by Gary Potter