Bernhard Haslhofer: "Linked Data is an attempt to continue the well-established information organization tools known in libraries."
04.01.2010
Libraries have long been suspicious of the Semantic Web, but the latest developments show that Linked Data is paving the way for the adoption of Semantic Web standards in the library domain. Andreas Blumauer (SWC) talked to Bernhard Haslhofer from the University of Vienna about the cautious but promising relationship between libraries and the Semantic Web.
Bernhard, in November you gave a talk at the SWIB09 (Semantic Web in Libraries) workshop in Cologne. How was the feedback?
The goal of the workshop was to answer questions like “What is the Semantic Web and what is Linked Data?”, “Why should libraries and librarians care about these topics?”, and “How can libraries get involved in the Semantic Web?”. The target audience on the first day was decision makers, so the focus was more on the “What” and “Why” questions. The topics on the second day were more technical and clearly focused on the “How” question. In total, approximately 100 people participated in the workshop.
I gave my talk on the first day. I introduced Linked Data in general, pointed to existing Linked Data projects in the library community (e.g., LIBRIS, LCSH), and discussed some of the benefits, but also the problems, libraries may encounter if they get involved in this topic. My goal was to demystify the notions used in the Semantic Web and/or Linked Data area. The main message I tried to communicate was that, from a library perspective, we could even regard Linked Data as an attempt to continue the well-established information organization tools known in libraries (identifiers, metadata, controlled vocabularies), just at the technical level of the Web. Given that the Web is here now and will most likely remain the primary medium for accessing and exchanging information for the foreseeable future, we could see Linked Data as a natural step in the technical evolution of information organization, in libraries as well. Using commonly known basic Web technologies (URI, HTTP, RDF) to represent structured data will also lower the barriers to integrating library data with other applications and Linked Data sources.
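To make the “identifiers plus metadata, just at the level of the Web” idea concrete, here is a minimal sketch of a catalog record expressed as RDF triples and serialized in the N-Triples format. It assumes hypothetical `example.org` URIs for the book and its author; the predicates come from the Dublin Core Terms vocabulary. It uses only the Python standard library and is an illustration, not any library's actual data model.

```python
from typing import List, Optional, Tuple

# Dublin Core Terms namespace (a real, widely used metadata vocabulary).
DCT = "http://purl.org/dc/terms/"

Triple = Tuple[str, str, str]


def uri(u: str) -> str:
    """Wrap an absolute URI in angle brackets, as N-Triples requires."""
    return f"<{u}>"


def literal(text: str, lang: Optional[str] = None) -> str:
    """Quote a string literal, escaping backslashes, quotes and newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"@{lang}' if lang else f'"{escaped}"'


def to_ntriples(triples: List[Triple]) -> str:
    """Serialize (subject, predicate, object) triples, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


# Hypothetical identifiers a library might mint for a book and its author.
BOOK = "http://example.org/library/book/123"
AUTHOR = "http://example.org/library/person/kafka"

record: List[Triple] = [
    # Metadata: a language-tagged title.
    (uri(BOOK), uri(DCT + "title"), literal("Der Process", "de")),
    # A link to another identified thing instead of a plain name string.
    (uri(BOOK), uri(DCT + "creator"), uri(AUTHOR)),
]

if __name__ == "__main__":
    print(to_ntriples(record))
```

The point of the design is that the author is referenced by a URI rather than a text string, so any other dataset can link to, or say more about, the same person.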
At the SWIB workshop I was quite impressed by the number of libraries that have already started Linked Data projects or will do so in the near future. I was aware of projects like LIBRIS and LCSH, but I did not know much about ongoing projects in the German-speaking countries. Now I know, for instance, that the German National Library will start a Linked Data project with its authority files (Personennormdatei), that the German National Library of Economics is currently working on a SKOS representation of its thesaurus and will expose it as Linked Data, that the CERN library will also implement the Linked Data principles, and so on.
So my impression was that most attendees at the workshop knew pretty well what the Semantic Web and Linked Data are about, why they should care, and how they can get involved.
For quite a while the topic was not really accepted. How do you explain the growing acceptance of Linked Data in the library domain?
For a long time, the Semantic Web had the problem that many promises were made but never implemented in practice. I still remember the story about “intelligent” agents that float through the Web, understand the data and services they encounter, and automatically fulfill tasks on behalf of their users. This is definitely interesting from a research perspective, but to librarians and other information professionals it probably sounded a bit too abstract and far from reality. Likewise, the notion that the Semantic Web can solve all interoperability problems was, and still is, far from reality. For quite a while, “Semantic Web” was also regarded as a term that had to appear in research proposals in order to obtain funding. This, of course, didn't help us explain the possible benefits of the Semantic Web, because the discussions often stayed on a non-technical level.
The Linking Open Data project and the appearance of concrete realizations such as DBpedia were milestones that changed this perception. The Semantic Web, or at least its data-centric part, became visible, which made it more intuitive to understand what it is all about. Projects such as LIBRIS and the Library of Congress Subject Headings also triggered other Linked Data initiatives in the library domain.
From your point of view, what are the main benefits for libraries when they apply Linked Data or semantic technologies?
Libraries have applied semantic technologies for ages. Using identifiers, metadata, classification schemes, and thesauri for information and knowledge organization is not new to librarians. The main benefit of applying Semantic Web technologies is, in my opinion, that these well-known tools of information organization will continue to exist in the future, within a globally interconnected information network.
As Linked Data consumers, libraries could improve the user experience of their portals, much as the BBC is doing right now. If users receive high-quality, contextually relevant information in library portals rather than simply a list of available objects, they will probably consult libraries more often to fulfill their information needs. I believe this should be in libraries' interest if they want to continue fulfilling their social role as public providers of high-quality information and knowledge. They could achieve this, for instance, by incorporating information from third-party datasets, which can come from other libraries but also from community-driven datasets such as DBpedia. A library could, for instance, enrich the objects in its existing collections with links to objects in remote collections, or enrich its local object descriptions (e.g., about books and authors) with contextually relevant information (e.g., digital book excerpts, author biographies) from other sources. Also, especially in Europe, multilingual access could be a driving motivation for integrating Linked Data with library data. DBpedia provides data in 91 different languages. Isn't it almost self-evident to exploit this in libraries in order to provide multilingual access to their objects?
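The two enrichment ideas above, linking a local description to a remote resource and choosing a label in the user's language, can be sketched as follows. This is a minimal illustration under stated assumptions: the dictionary-based record structure is hypothetical, the link uses the real `owl:sameAs` property pointing to the DBpedia resource for Franz Kafka, and the multilingual title labels are illustrative sample data of the kind a portal might retrieve (e.g., DBpedia `rdfs:label` values) rather than fetched live.

```python
from typing import Dict, List, Optional

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"

# Hypothetical local record structure: property URI -> list of values.
Record = Dict[str, List[str]]


def enrich(record: Record, external_uri: str) -> Record:
    """Return a copy of `record` with an owl:sameAs link to an external resource."""
    enriched = {prop: list(values) for prop, values in record.items()}
    links = enriched.setdefault(OWL_SAME_AS, [])
    if external_uri not in links:  # avoid duplicate links on repeated enrichment
        links.append(external_uri)
    return enriched


def pick_label(labels: Dict[str, str], preferred: List[str],
               fallback: str = "en") -> Optional[str]:
    """Choose a display label by the user's language preferences, then a fallback."""
    for lang in preferred:
        if lang in labels:
            return labels[lang]
    return labels.get(fallback)  # None if even the fallback language is missing


# A local authority record, linked to the corresponding DBpedia resource.
local_author: Record = {FOAF_NAME: ["Kafka, Franz"]}
linked = enrich(local_author, "http://dbpedia.org/resource/Franz_Kafka")

# Illustrative language-tagged titles for one work (sample data, not fetched).
trial_labels = {"de": "Der Process", "en": "The Trial", "fr": "Le Procès"}

if __name__ == "__main__":
    print(linked[OWL_SAME_AS])
    print(pick_label(trial_labels, ["it", "fr"]))  # French user with Italian first
```

`enrich` returns a copy instead of mutating the local record, so the library's own data stays untouched and the external links can be kept or discarded separately.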
If libraries become Linked Data providers and implement an open data policy, they allow their data to be reused in other application contexts. This could trigger the development of novel search and browse user interfaces for libraries that go beyond the catalog-oriented front-ends we know from today's libraries. It would also enable different communities to build their own domain-specific “virtual” collections by referring to relevant objects provided by several real-world libraries around the globe. If, for instance, somebody specializes in the life and work of Franz Kafka, he or she could easily build a virtual Web collection dedicated to this topic by linking to and reusing the objects about Kafka available in real-world libraries around the world. This, however, requires that library objects be identifiable via URIs and that the data about these objects be accessible via HTTP. In the end, this may also increase the visibility of real-world libraries on the Web, because they would attract an audience they would not have reached otherwise.
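The requirement that objects be identifiable via URIs and their data accessible via HTTP is what makes such a virtual collection possible: a client dereferences each URI and uses HTTP content negotiation (the `Accept` header) to ask for RDF rather than an HTML page. The sketch below assumes hypothetical `example.org` object URIs standing in for holdings in two libraries and Turtle (`text/turtle`) as the requested format; it uses only `urllib.request` from the standard library.

```python
import urllib.request
from typing import List

# Hypothetical URIs of Kafka-related objects held by two different libraries.
VIRTUAL_COLLECTION: List[str] = [
    "http://library-a.example.org/object/kafka-letters",
    "http://library-b.example.org/object/der-process-1925",
]


def rdf_request(object_uri: str,
                media_type: str = "text/turtle") -> urllib.request.Request:
    """Build an HTTP GET that asks the server for RDF instead of an HTML page."""
    return urllib.request.Request(object_uri, headers={"Accept": media_type})


def fetch_description(object_uri: str, timeout: float = 10.0) -> str:
    """Dereference a URI and return the description the server sends back."""
    with urllib.request.urlopen(rdf_request(object_uri), timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)


# One request per object in the virtual collection (built, not yet sent).
pending = [rdf_request(u) for u in VIRTUAL_COLLECTION]

if __name__ == "__main__":
    for req in pending:
        print(req.get_method(), req.full_url, req.get_header("Accept"))
```

Calling `fetch_description` only works against servers that actually publish RDF at these URIs; since the URIs here are placeholders, the example stops at building the requests. `urlopen` follows redirects such as 303 See Other automatically, which is a common pattern when a URI names a real-world object rather than a document.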
The “Europeana” digital library is currently one of the flagship digital library projects in the area of the Semantic Web. How, precisely, will the Semantic Web be applied in this project?
The Europeana digital library aggregates metadata about library objects (books, videos, images, etc.) from several libraries and cultural institutions all over Europe. The EuropeanaConnect project is currently trying to make the implicit semantic connections between objects (e.g., books and images about Franz Kafka) explicit by introducing a semantic layer on top of Europeana. At the moment we are working on converting knowledge organization schemes that are already available in libraries to SKOS. Our ambition is to integrate them with Europeana's search and discovery mechanisms. A research prototype of a semantic search engine for Europeana is available in the Europeana ThoughtLab.
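What converting a knowledge organization scheme to SKOS involves can be illustrated with a small sketch. It assumes a simplified, hypothetical thesaurus format (a preferred term, “use for” synonyms, and broader-term references) and a hypothetical `example.org` concept namespace; the SKOS properties `prefLabel`, `altLabel`, and `broader` and the class `Concept` are from the W3C SKOS vocabulary. It is not the EuropeanaConnect conversion pipeline.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/thesaurus/"  # hypothetical concept namespace

Triple = Tuple[str, str, str]


@dataclass
class Term:
    """One thesaurus entry in a simplified, hypothetical legacy format."""
    pref: str                                          # preferred term
    alt: List[str] = field(default_factory=list)       # "use for" synonyms
    broader: List[str] = field(default_factory=list)   # ids of broader terms (BT)


def lit(text: str, lang: str) -> str:
    """Language-tagged literal with quotes and backslashes escaped."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def term_to_skos(term_id: str, term: Term, lang: str = "en") -> List[Triple]:
    """Map one thesaurus entry to a skos:Concept with labels and broader links."""
    concept = f"<{BASE}{term_id}>"
    triples = [
        (concept, f"<{RDF_TYPE}>", f"<{SKOS}Concept>"),
        (concept, f"<{SKOS}prefLabel>", lit(term.pref, lang)),
    ]
    triples += [(concept, f"<{SKOS}altLabel>", lit(a, lang)) for a in term.alt]
    triples += [(concept, f"<{SKOS}broader>", f"<{BASE}{b}>") for b in term.broader]
    return triples


# Illustrative two-entry thesaurus (sample data).
thesaurus: Dict[str, Term] = {
    "t1": Term(pref="Social sciences"),
    "t2": Term(pref="Economics", alt=["Political economy"], broader=["t1"]),
}

skos_triples = [t for tid, term in thesaurus.items() for t in term_to_skos(tid, term)]

if __name__ == "__main__":
    for s, p, o in skos_triples:
        print(s, p, o, ".")
```

Only `skos:broader` is emitted here; because SKOS defines `skos:narrower` as its inverse, a consumer can derive the narrower links if it needs them.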
Linked Data is also an issue in Europeana; here is a figure that illustrates the current Europeana data cloud. The integration of Europeana data with other datasets currently available in the Linked Data cloud (e.g., DBpedia) is also on our research agenda.
Which barriers and obstacles do you expect in the coming years when you think about implementing larger Linked Data infrastructures? Which non-technical barriers are the most critical ones?
The major goal in the early phase of Linked Data was to make data available on the Web, and this has succeeded pretty well. Now it is time to focus on the quality of the available data and the supporting infrastructure. This is especially important for institutions such as libraries, whose job it is to provide high-quality data. Data provenance and the problems caused by dataset dynamics (link integrity, synchronization of updates) are just two examples of issues that should be on our research agenda.
The number one non-technical argument I hear against Linked Data concerns intellectual property rights (IPR). Here it is important to say that “Linked Data” does not automatically mean that the provided data are open and freely available on the Web. This depends on the applied data policy. Even with Linked Data, institutions can still protect their data. But for libraries, in my opinion, it would definitely make sense to implement Linked Data in combination with an open data policy.
In any case, IPR should not be neglected. But it should not be abused as an excuse not to think about a possible future for libraries and related institutions in a globally connected information network, and not to take action. Alternative licensing schemes (e.g., Creative Commons, Public Domain), which already work pretty well in other domains (e.g., Open Source), are worth considering as an alternative to existing IPR policies. If existing libraries do not take action, others, who have the financial power to ignore IPR rules, will build the libraries on the Web and attract the library users of the future.
As I am not an IPR expert, I cannot give a general recommendation for a possible IPR solution for libraries. I can only say that the Web is not the future, it is the present. People are using the Web to fulfill their information needs, and the next generation will most likely continue to do so. What is not on the Web will hardly be perceived. This is true for digitized objects but also for the accompanying descriptive data. We can hardly change people's behavior, but we can adapt our view of intellectual property and think about alternative business models. Other areas, such as the music industry, have already gone through this process, and we have seen that exploiting the possibilities of new technical developments often turned out to be more successful than holding on to IPR models from the pre-Web era.
The SWIB workshop has shown that many librarians do not let the IPR discussion block them. They have started developing prototypes and have convinced people that Linked Data makes sense. Maybe this is the way to go...
About Bernhard Haslhofer
Bernhard Haslhofer is a member of the scientific staff in the Multimedia Systems Department at the University of Vienna. His research interests lie in the area of Linked Data / Web of Data, Metadata Interoperability and Metadata Mapping / Metadata Standards.