Thesaurus Modelling & Evaluation

Thesauri can support various application scenarios such as Autocomplete, Faceted Search & Browsing, Recommendations or Glossaries. In these scenarios, thesauri usually serve to harmonise terminologies, control vocabularies and/or support users in browsing through a concept space.

Our research focus:

  • Thesaurus-based Search & Recommendation
  • Semi-automatic Generation of Seed Thesauri
  • Context-oriented Concept Alignment & Fusion
  • Collaborative Thesaurus Modelling
  • Thesaurus Quality Evaluation & Assurance

With the Linked Data initiative gaining momentum in recent years, SKOS (Simple Knowledge Organization System) has emerged as a common "standard" (currently a W3C recommendation) for expressing knowledge organization systems (KOS) such as thesauri or taxonomies. SKOS takes a concept-oriented approach: a concept is "An idea or notion; a unit of thought." (as defined by SKOS itself) and can be identified by a URI. A clear sign of the importance of controlled vocabularies in web-oriented formats like SKOS is that a growing number of existing vocabularies are being published in SKOS versions. Transformations have been made for thesauri like Agrovoc, Eurovoc, GEMET and the STW Thesaurus for Economics, but also for other types of controlled vocabularies such as subject headings (MeSH, LCSH etc.).
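To make the concept-oriented approach more tangible, the following minimal sketch models a single SKOS concept in plain Python and expands it into subject-predicate-object triples. The SKOS and RDF namespace URIs and the property names (prefLabel, altLabel, broader) are taken from the W3C vocabularies; the `Concept` class, the `to_triples` helper and the example.org URIs are assumptions made purely for illustration, and language tags and the distinction between literals and URIs are left out for brevity.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Namespace URIs of the W3C SKOS and RDF vocabularies.
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


@dataclass
class Concept:
    """Hypothetical in-memory view of one SKOS concept."""
    uri: str                                           # identifier of the concept
    pref_label: str                                    # preferred term
    alt_labels: List[str] = field(default_factory=list)  # synonyms
    broader: List[str] = field(default_factory=list)     # URIs of broader concepts


def to_triples(concept: Concept) -> List[Tuple[str, str, str]]:
    """Expand a concept into (subject, predicate, object) triples."""
    triples = [
        (concept.uri, RDF_TYPE, SKOS + "Concept"),
        (concept.uri, SKOS + "prefLabel", concept.pref_label),
    ]
    for label in concept.alt_labels:
        triples.append((concept.uri, SKOS + "altLabel", label))
    for parent in concept.broader:
        triples.append((concept.uri, SKOS + "broader", parent))
    return triples


# Illustrative concept; the example.org URIs are placeholders.
cats = Concept(
    uri="http://example.org/thesaurus/cats",
    pref_label="cats",
    alt_labels=["felines"],
    broader=["http://example.org/thesaurus/mammals"],
)
triples = to_triples(cats)
for triple in triples:
    print(triple)
```

The point of the sketch is that terms are attached to the concept rather than the other way round: the preferred term and its synonyms all hang off one URI, which is what allows a thesaurus to harmonise terminology across applications.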

Despite a long research tradition in thesaurus quality assurance, little attention has so far been paid to the interaction between the structural specificities of a thesaurus and the quality of its output across the differing application scenarios it supports. Although several international initiatives focus on thesaurus and metadata quality in terms of expressivity and structural soundness, these approaches do not take the envisioned application into account and are therefore of limited relevance for applied thesaurus modelling.

This is where our research starts. In recent years we have invested substantial effort in understanding the interaction between different ways of thesaurus modelling and the envisioned application types. Understanding these interactions helps to set up and manage thesaurus projects efficiently and to lower the costs of thesaurus maintenance.