I have been doing a lot of reading and thinking, along with many others, about the relative merits of taxonomies and folksonomies in the corporate environment. Would you derive more value by spending effort creating a detailed taxonomy for your information environment, or would you be better off letting users categorize content by tagging it, thereby creating a folksonomy? Or perhaps a hybrid solution, where a shared vocabulary is used as the basis of your content tagging effort?
Since I am currently working with MOSS 2007, forgive me if I use some MOSS-specific terms throughout this post.
Obviously there are benefits and drawbacks to both, and their relative merits depend on the situation. I wouldn’t personally want to be searching for medical information on heart medication and be misled into thinking a laxative fell into that category because someone thought that would be funny – although I’m certainly not above seeing the humour there.
First let’s understand some relevant terms. It would be a terrible irony if we didn’t define these terms considering the topic of this post. Or would it?
Taxonomy: The practice of classifying things. The Dewey Decimal system comes to mind. In the context of this article it is the creation of metadata structures to organize and classify content. The structure is often hierarchical and really represents an attempt to create a shared vocabulary. I work in a manufacturing company, and an example taxonomy can be found here: http://en.wikipedia.org/wiki/Taxonomy_of_manufacturing_processes
Folksonomy: A non-hierarchical collection of user-generated terms used to describe content. Think tag cloud. The word is a portmanteau of “folk” and “taxonomy”. Sorry – I found that lovely French word while researching this post and felt like I had to include it.
Precision: In the context of document retrieval, precision is the percentage of retrieved documents that are relevant to the search. It’s a measure of exactness.
Recall: In the same context, recall is the percentage of all relevant documents that the search actually retrieved. It’s a measure of completeness.
Semantics: the study of meaning. Or in other words, there may be a semantic gap between your perception and your boss’ intentions when he tells you that as a manager this place could run just fine without you.
Metadata – data about data. Or in other words, descriptive data about specific content (author, file size, owner etc.)
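To make the precision and recall definitions above concrete, here is a minimal Python sketch. The function name and the document IDs are made up for illustration; nothing here is specific to MOSS or any particular search engine.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search, given document IDs."""
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = retrieved & relevant  # relevant documents the search actually returned
    # Guard against empty inputs so we never divide by zero.
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search: 4 documents came back, 3 of them relevant,
# while 6 relevant documents exist in total.
retrieved_docs = ["doc1", "doc2", "doc3", "doc4"]
relevant_docs = ["doc1", "doc2", "doc3", "doc7", "doc8", "doc9"]
precision, recall = precision_recall(retrieved_docs, relevant_docs)
print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=0.75 recall=0.50
```

In this made-up search most of what came back was useful (high precision), but half of the useful documents were never found (lower recall).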
Here is a quick table providing an overview of taxonomies and folksonomies. Credit goes to Mark Baartse from useyourweb.com for the information: http://www.useyourweb.com/blog/?p=62
| Taxonomy | Folksonomy |
| --- | --- |
| Brittle | Flexible |
| Accurate (if done well) | Less reliable |
| Compliance must be forced | Rewards but doesn’t force compliance |
| Harder to add to | Easy to add to |
| Centrally controlled | Democratically controlled |
| Predictable | Organic |
| Higher Precision | Higher Recall |
There is a very interesting article that describes the semantic gap between expert curators and laypeople in describing and categorizing artwork: http://www.nytimes.com/2007/03/8/arts/artsspecial/28social.html?_r=2&oref=slogin. I think this situation perfectly illustrates the difficulties of implementing a comprehensive taxonomy in a decentralized, global organization like my current employer.
Consider the situation many organizations find themselves in: a global organization working with numerous languages and cultures, composed of numerous functional areas and levels, that is sharing information with customers and suppliers. Add in a decentralized operating structure along with a sprinkle of rather ad-hoc document management practices and the challenges of implementing a taxonomy that will add any value start to become quite clear. There is a lot of legwork to do organizationally prior to even considering a technology solution.
“Can we all agree on what a “part” is? What about a customer? What about an HR policy? Most of the time we can’t agree where to go for lunch. And hey! All our ERP solutions are implemented differently anyway. Shouldn’t we spend 10 years doing that prior to doing this? And that guy from Iowa is a jerk anyway so there is no way I’m working with him. I’m hungry…about that lunch…”
Incrementally implementing a simple taxonomy that can leverage the enterprise search features of MOSS is straightforward and requires agreement of a smaller group of people.
In scenarios like this I have started to lean much more towards the use of folksonomies. They are far simpler to implement, and content editors have been open to the idea that they can “tag” things and have the tags show up in a cool tag cloud on the home page. Plus it seems very trendy and Web 2.0.
In fact I have to include an image of one of the first that we implemented. It figures that the most prominent word would be Die. Just my luck.
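For anyone curious about what sits behind a tag cloud, it comes down to counting how often each tag is used and scaling the display size accordingly. The Python sketch below shows one simple way to do that, using linear scaling between a minimum and maximum font size. The function name, the font sizes and the sample tags are all assumptions for illustration; this is not how any particular MOSS tagging add-on works.

```python
from collections import Counter


def tag_cloud_sizes(tagged_items, min_size=10, max_size=32):
    """Map each tag to a font size scaled linearly by how often it is used."""
    # Normalize case and whitespace so "Die" and "die" count as the same tag.
    counts = Counter(
        tag.strip().lower() for tags in tagged_items for tag in tags if tag.strip()
    )
    if not counts:
        return {}
    low, high = min(counts.values()), max(counts.values())
    spread = high - low
    sizes = {}
    for tag, count in counts.items():
        # When every tag is used equally often, give them all the smallest size.
        scale = (count - low) / spread if spread else 0.0
        sizes[tag] = round(min_size + scale * (max_size - min_size))
    return sizes


# Hypothetical tags applied to four documents.
documents = [
    ["Die", "stamping"],
    ["die", "tooling"],
    ["die", "stamping", "safety"],
    ["tooling"],
]
print(tag_cloud_sizes(documents))
```

With this sample data "die" is used most often, so it gets the largest size, which is roughly how it ended up dominating our cloud.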
Obviously this is fraught with the typical perils of user-generated tagging, but in our situation the content authors are reasonably well-trained. They are not that likely to input something malicious. They might input something that makes no sense to someone else, but that’s part of the process. It’s a hell of a lot easier to modify a tag than it is to make a change to your taxonomy. In fact, and this is very important, I see a real opportunity for a folksonomy to drive a more structured effort at creating a taxonomy over time. It will provide us with a glimpse into how the people who use the content perceive that same content.
The important thing to remember is that this approach provides content consumers with a new mechanism for finding the information that they are looking for. It is complementary to Enterprise Search. There are numerous free tagging solutions for MOSS 2007 available at CodePlex, and tagging will be included in SharePoint 2010 out of the box.
The major limitation I see with this approach is the lack of a semi-structured, contextual shared vocabulary. I’m referring to a way to provide content creators with contextual tagging suggestions based on the content they are trying to describe. If you take a look at my short post regarding OpenCalais you’ll get an idea of what I am referring to: http://www.intelligenceamong.us/?p=350.
When I think of a hybrid solution this is what I am referring to, along with the potential power of the Semantic Web or Web 3.0 or whatever we’ll be calling it in a few years. Imagine if content creators were able to filter their content through an engine or service that looked at the contents and was able to determine relationships, people, events, facts etc. about the content and return suggestions that made sense. The instances of incoherent or irrelevant tags would drop significantly. Based on what I have seen with OpenCalais, we are going to look at creating a prototype that integrates with their service to assist content creators with creating a reasonable folksonomy.
In summary, taxonomies and folksonomies are complementary to one another and will facilitate your understanding of your information from a variety of perspectives. Nothing is static, and gathering as much information as organically as possible will assist the evolution of the classification system that you choose. Adding semantic capabilities to your content tagging effort can make your folksonomy even more relevant. Good luck.
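As a very rough illustration of the suggestion idea, the Python sketch below scans a piece of content for phrases from a small shared vocabulary and proposes the matching preferred tags. To be clear, this is not OpenCalais and does not call its service; it is a much simpler stand-in based on plain phrase matching, and the vocabulary, function name and sample text are all hypothetical.

```python
import re

# Hypothetical shared vocabulary: preferred tag -> phrases that should trigger it.
# Phrases are assumed to be lowercase words separated by single spaces.
VOCABULARY = {
    "stamping": ["stamping", "stamped", "press line"],
    "tooling": ["tooling", "die set", "fixture"],
    "hr policy": ["hr policy", "vacation policy", "code of conduct"],
}


def suggest_tags(text, vocabulary=VOCABULARY):
    """Suggest preferred tags whose trigger phrases appear in the text."""
    # Lowercase the text and collapse punctuation and whitespace to single spaces.
    normalized = " ".join(re.findall(r"[a-z0-9]+", text.lower()))
    padded = f" {normalized} "
    suggestions = []
    for tag, phrases in vocabulary.items():
        # Pad with spaces so "die set" does not match inside "indie settings".
        if any(f" {phrase} " in padded for phrase in phrases):
            suggestions.append(tag)
    return sorted(suggestions)


sample = "Updated fixture drawings for the new press line; see the vacation policy memo."
print(suggest_tags(sample))  # ['hr policy', 'stamping', 'tooling']
```

A real semantic service would go well beyond this by recognizing entities and relationships rather than fixed phrases, but even a simple lookup like this nudges authors toward the same words for the same things.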


