DBpedia, UMBEL & the Future Web’s Ecology
12.11.2008
INTERVIEW: The Linked Open Data infrastructure is in a rapid process of maturing - the recent release of UMBEL’s webservice and DBpedia's incorporation of UMBEL classes are yet another confirmation of this exciting process. Andreas Blumauer spoke with two of the key players in the evolution of the Web of Data: DBpedia co-initiator, Triplify main developer and head of the AKSW research group Sören Auer, and UMBEL editor and Zitgist CEO Mike Bergman.
The background
DBpedia has become the largest RDF repository for encyclopaedic knowledge, extracting structured information from Wikipedia and making it available on the Web of Data. UMBEL, on the other hand, provides an OpenCyc-based, lightweight ontology structure for relating Web content and data to a standard set of subject concepts, which currently number 20,000. In the Linked Data Cloud, DBpedia and UMBEL map and cross-reference each other.
In practice this means that UMBEL provides classes to describe the concepts of which “things” are members. For instance, named entities from Wikipedia such as “John F. Kennedy” are mapped to subject concepts such as Leader, Person, Administrator and Graduate, with broader and equivalent classes in CYC and FOAF and broader subject concepts within UMBEL. A link is set to Wikipedia, as well as a ‘same as’ reference to DBpedia. A class structure enables faceted browsing and extraction, inferencing, and navigation and discovery for all datasets linked to that structure.
DBpedia, in turn, returns properties of 'John F. Kennedy' (e.g. abstracts in available Wikipedia languages, demographic information such as birth date and place, alma mater, predecessors and successors), and ‘same as’ references, e.g., to the JFK entry in Freebase (which recently released its RDF service) and the aforementioned page in UMBEL. Furthermore, DBpedia maps the URI to available RDF types, for instance foaf:Person or yago:AssassinatedAmericanPoliticians and, once again, to UMBEL’s subject concepts Person, Administrator, Graduate and Leader.
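To make the cross-referencing described above more concrete, here is a minimal sketch in Python that models the JFK example as plain subject-predicate-object triples. The prefixed names (dbpedia:, umbel-sc:, umbel-ne:, freebase:) are illustrative shorthand chosen for this sketch, not verified URIs from either project, and the helper functions are hypothetical.

```python
# Illustrative triples only: the prefixed names are shorthand placeholders
# assumed for this sketch, not verified URIs from DBpedia, UMBEL or Freebase.
JFK = "dbpedia:John_F._Kennedy"

triples = [
    # RDF types attached to the DBpedia resource
    (JFK, "rdf:type", "foaf:Person"),
    (JFK, "rdf:type", "yago:AssassinatedAmericanPoliticians"),
    # UMBEL subject concepts used as classes
    (JFK, "rdf:type", "umbel-sc:Person"),
    (JFK, "rdf:type", "umbel-sc:Leader"),
    (JFK, "rdf:type", "umbel-sc:Administrator"),
    (JFK, "rdf:type", "umbel-sc:Graduate"),
    # 'same as' references to the corresponding entries elsewhere
    (JFK, "owl:sameAs", "freebase:john_f_kennedy"),
    (JFK, "owl:sameAs", "umbel-ne:John_F._Kennedy"),
]


def objects(subject, predicate):
    """Return all objects of triples matching the subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def umbel_classes(subject):
    """Return the sorted UMBEL subject concepts a resource is typed with."""
    prefix = "umbel-sc:"
    return sorted(
        o[len(prefix):]
        for o in objects(subject, "rdf:type")
        if o.startswith(prefix)
    )


if __name__ == "__main__":
    print(umbel_classes(JFK))
    print(objects(JFK, "owl:sameAs"))
```

A real application would query the projects' published RDF rather than a hand-written list, but the shape of the data is the same: types place the entity in a class structure, and ‘same as’ links connect it to the matching entries in other datasets.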
Due to its reliance on Wikipedia, DBpedia does a great job of covering a range of knowledge as broad as the interests of the people participating in Wikipedia. Its strength lies in the area of named entities, i.e. entities such as persons, organizations and locations, which have a proper name but do not necessarily belong to a particular, acknowledged domain or discipline.
UMBEL’s most apparent advantage, on the other hand, is its reliance on OpenCyc, which brings the strong inferencing and logic capabilities of the CYC knowledge base to the Web of Data. DBpedia is a community project started by the University of Leipzig, Free University Berlin and OpenLink Software, while the open and free UMBEL is developed and hosted by Zitgist with support from, again, OpenLink Software.
Now, and in particular with the recent release of Zitgist’s web service endpoints and the incorporation of UMBEL classes in DBpedia, questions arise as to the relationship between the two projects and the role of OpenLink Software in the further process.
The interview
Andreas Blumauer: One could say that DBpedia’s goal is to lower the barrier for web developers and end-users in the actual use of the semantic web, while UMBEL aims at bringing "order to the chaos" that is inherent to user-generated, collective knowledge. Would you agree with this description – and is it a contradiction at all or the kind of dynamic the Semantic Web community has been waiting for?
Mike Bergman: Yes, I would agree with this description, though we have tried many others. For example, in various writings in the past, we have described UMBEL as a roadmap, or middleware, or a backbone, or a concept ontology, or an 'infocline', or a meta layer for metadata, and others. Today, what I tend to use, particularly in reference to DBpedia, is the TBox-ABox distinction in computer science and description logics. UMBEL is more of a class or structural and concept relationships schema -- a TBox -- while DBpedia is more of an instance and entity layer with attributes -- an ABox. I think they are pretty complementary.
Sören Auer: I very much agree with Mike, but would like to add that Wikipedia authors do not have in mind to create a coherent and consistent knowledge base when working on Wikipedia. I think the more we demonstrate the benefits of the semantic representations in DBpedia to the Wikipedia community, the more these people will start to organize and rearrange content to enable the use of Wikipedia as a knowledge base. Right now, Wikipedia authors just have not yet been confronted with the problem of synonymous infobox properties or the uncleanliness of the category system, for example. I think with a few small and non-invasive changes to Wikipedia, much of the current chaos can already be resolved.







