Taxonomy and folksonomy

The traditional approach in technical information is for the writer or producer to create the metadata, marking up information chunks and topics with keywords and categories. It is not uncommon for information producers to create a category structure, sometimes called a taxonomy or an ontology, and to use the keywords of that structure to characterize all the content.

In participatory media, by contrast, a practice of collaborative categorization has developed. It is most typically seen in the tagging systems of “user-generated content” sites such as Flickr and Delicious. The outcome of such collaborative categorization is called a folksonomy (from “folk” plus “taxonomy”; refer to Wikipedia for a good introduction).

Professional information producers are sometimes concerned about giving up control over the metadata, worrying that the resulting metadata will be poorly structured and may even degrade into chaos. What if the users don’t understand the principles behind the information architecture? What if novice users spend a lot of time entering misleading tags? Why give up the benefits of a hierarchical metadata structure for a flat namespace? What about redundancy and irrelevance?

It is tempting to ask which approach is best and which one to choose, as if it were an either-or situation. I feel that would be a misguided question.

Instead, think of the two approaches as complementary.

In situations where an initial information set is created and deployed by an information producer, it makes good sense to provide producer-generated metadata as well. It helps structure production, it facilitates user search and access, it can give users an overview of a large body of material, and it can be used to handle customer regulations and other mandatory requirements on how the information set may be viewed.

However, if users are given the opportunity to participate in the development of the information set (as intended in our ongoing work on next-generation DocFactory), then it seems obvious that the users’ categorization skills and socially constructed domain knowledge should also be tapped to create a more useful and meaningful information set in the long run.

There are several ways to create a hybrid approach.

Given the rough concept design described below, a rather participatory approach would be to implement a straightforward tagging system connected to the annotation function. Simply put, users could annotate content with tags that they make up themselves or select from a list of existing tags. Since the user’s identity is stored with each tag (by way of the annotation function), this approach would support a powerful social search where you can, e.g., locate material based on tags created by selected colleagues. Moreover, the open tags would provide a good basis for computing clusters of related topics.
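A minimal sketch of such a tagging system, assuming each annotation records the user, the content item, and a free-form tag. The names `Annotation`, `TagStore`, `annotate`, `search_by_colleagues` and `related_tags` are hypothetical and not part of DocFactory. The sketch only counts tag co-occurrence on shared items; an actual clustering step would be built on top of counts like these.

```python
from __future__ import annotations

from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One tag applied by one user to one content item (hypothetical model)."""
    user: str
    item: str
    tag: str


class TagStore:
    """Hypothetical tag store built on top of an annotation function."""

    def __init__(self) -> None:
        self.annotations: list[Annotation] = []

    def annotate(self, user: str, item: str, tag: str) -> None:
        # Normalize case and whitespace so "Install " and "install" count as one tag.
        self.annotations.append(Annotation(user, item, tag.strip().lower()))

    def known_tags(self) -> list[str]:
        # Tags already created by anyone, offered to users as a pick list.
        return sorted({a.tag for a in self.annotations})

    def search_by_colleagues(self, tag: str, colleagues: set[str]) -> set[str]:
        # Social search: items carrying `tag` as applied by the selected users only.
        tag = tag.strip().lower()
        return {a.item for a in self.annotations
                if a.tag == tag and a.user in colleagues}

    def related_tags(self, tag: str, top: int = 3) -> list[tuple[str, int]]:
        # Co-occurrence: how often each other tag shares an item with `tag`.
        tag = tag.strip().lower()
        tags_per_item: dict[str, set[str]] = defaultdict(set)
        for a in self.annotations:
            tags_per_item[a.item].add(a.tag)
        counts: Counter[str] = Counter()
        for tags in tags_per_item.values():
            if tag in tags:
                counts.update(tags - {tag})
        return counts.most_common(top)


store = TagStore()
store.annotate("anna", "topic-17", "Install")
store.annotate("anna", "topic-17", "linux")
store.annotate("bo", "topic-42", "install")
store.annotate("bo", "topic-42", "troubleshooting")
print(store.search_by_colleagues("install", {"anna"}))  # {'topic-17'}
```

Keeping the user on every annotation is what makes the colleague filter possible; a flat list of anonymous tags would support tag search but not social search.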

A more producer-controlled approach would be to treat the user-generated tags as suggestions to be reviewed and possibly incorporated into the metadata taxonomy. A variation on this theme is not to create a separate tagging system at all, but rather to open the facet structure and its values to editing, or to editing suggestions, by the users.
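The review-based variant could look roughly like this sketch, which assumes the taxonomy is stored as facet names mapped to allowed values. `Taxonomy` and `SuggestionQueue` are hypothetical names. Users' terms never enter the taxonomy directly; they are counted as suggestions, and a producer approves or rejects them. Approving a value for a facet that does not yet exist also extends the facet structure itself, which corresponds to the second variation.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Taxonomy:
    """Hypothetical producer-owned facet structure: facet name -> allowed values."""
    facets: dict[str, set[str]] = field(default_factory=dict)


@dataclass
class SuggestionQueue:
    """Hypothetical review queue: user terms wait here until a producer decides."""
    taxonomy: Taxonomy
    pending: Counter = field(default_factory=Counter)

    def suggest(self, facet: str, value: str) -> None:
        # A user proposes a value for a facet; repeat proposals raise its count.
        value = value.strip().lower()
        if value not in self.taxonomy.facets.get(facet, set()):
            self.pending[(facet, value)] += 1

    def review_list(self, min_votes: int = 1) -> list[tuple[tuple[str, str], int]]:
        # Most frequently suggested values first, to focus the reviewer's effort.
        return [(key, n) for key, n in self.pending.most_common() if n >= min_votes]

    def approve(self, facet: str, value: str) -> None:
        # Accepted suggestions become part of the official taxonomy.
        value = value.strip().lower()
        self.taxonomy.facets.setdefault(facet, set()).add(value)
        self.pending.pop((facet, value), None)

    def reject(self, facet: str, value: str) -> None:
        # Rejected suggestions are simply dropped from the queue.
        value = value.strip().lower()
        self.pending.pop((facet, value), None)
```

Ranking suggestions by how many users made them is one simple way to address the worry about misleading tags from individual novice users: an isolated suggestion can be filtered out with `min_votes`.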

The general design guideline on the issue of taxonomy and folksonomy could be something like this:

Customer needs and contexts of information use can be expected to vary, and some cases will benefit more from user participation than others. It is in those cases that a more open folksonomy approach to categorization should be explored. The platform should ideally support a range of categorization approaches, from more to less participatory.
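One way to read this guideline is as a per-customer setting. The sketch below assumes three hypothetical levels, `CLOSED`, `MODERATED` and `OPEN`, and shows only the decision of what happens when a user applies a term; the level names and return values are illustrative, not a DocFactory design.

```python
from __future__ import annotations

from enum import Enum


class Participation(Enum):
    """Hypothetical per-customer setting, from least to most participatory."""
    CLOSED = 1     # only producer-defined taxonomy terms may be applied
    MODERATED = 2  # new user terms are queued as suggestions for review
    OPEN = 3       # new user terms are published immediately as tags


def handle_user_term(level: Participation, term: str, taxonomy: set[str]) -> str:
    """Decide what happens when a user applies `term` under the given policy."""
    term = term.strip().lower()
    if term in taxonomy:
        return "applied"  # existing taxonomy terms are allowed at every level
    if level is Participation.OPEN:
        return "published"
    if level is Participation.MODERATED:
        return "queued"
    return "rejected"
```

Because producer-defined terms are accepted at every level, the taxonomy remains the common baseline, and participation is layered on top of it rather than replacing it.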