

George Anadiotis: "Linked Data brings value by offering an alternative approach to lightweight data integration and mashups."

09. December 2009, by Tassilo Pellegrini


Linked Data has become a hot topic over the past few months. What is your general opinion about this movement?

Indeed, Linked Data has been gaining popularity lately, and we think this is great. Having a Semantic Web background, we see Linked Data as the vehicle that can help this idea take off by demonstrating (some of) its practical benefits. In all fairness, though, we should also point out that we see the current stage as the Linked Data 'Peak of Inflated Expectations' - a phase that every new technology goes through: when things are new, promising and not completely understood, people put lots of hope in them and fully realize neither their potential nor the issues that come with them. It is very exciting, however, to see all the enthusiasm, and we're glad to be a part of the movement.

From a technological perspective: What are the merits and flaws of Linked Data? What are the critical issues with respect to data storage, scalability and distributed queries at the moment?

Probably the single most important advantage of Linked Data is the low entry barrier: the four well-known rules for getting started with Linked Data are easy to grasp and can be applied immediately. On the other hand, simplicity can be deceptive, as some recent high-profile Linked Data unveilings have shown: it's fairly easy to get started, but going deeper and getting it right is more demanding. The fact that Linked Data is based on Semantic Web technology is, in my opinion, where both its technical merits and flaws come from. So yes, we can use Linked Data, in quantities that are growing at a speed comparable to that of WWW content in its early days, to do reasoning and build mashups and intelligent applications, but we have only begun to scratch the surface of the Linked Data challenges.

Ultimately, these challenges are the same ones that the Semantic Web itself has to deal with in order to be successful on a large scale. There is an anecdote going around in the Semantic Web crowd about how earlier attempts to convince the industry that semantics can be valuable were met with scepticism, mostly about their ability to scale - the 'come back when you can do some million triples' statement. Well, we're now at the stage where we can do that and more: reasoners are improving constantly and introducing parallelization and optimization techniques to scale to unprecedented levels. But this is not what scale is really about in the Linked Data scenario. When Linked Data is used as a means to publish data against well-defined schemata using a simple access mechanism, as is the case today, things are generally straightforward. But the more data publishers enter the game, the more data are published and the more we come to expect of the applications we build on top of them, the more we'll start running into issues.

For example, what happens when source A states that resource X is Y, while source B states that resource X is Z? First of all, the issue of data provenance needs to be addressed if we are to distinguish who says what, and then we'd have to introduce uncertainty reasoning and some notion of trust - really tricky issues. In a sense, you can call all of these side effects of distributed querying in an open world, apart from the very specific algorithmic issues that distributed querying per se introduces. In this context, we see data storage as the least of our concerns, simply because our take on this is to get the ball rolling by leveraging existing technology. We would not be hasty to advise customers to give up their existing database investment and infrastructure and switch to RDF stores, as much progress as the latter have shown, because we find this approach rather disruptive. We would rather keep the existing infrastructure in place and use relational mapping to publish data where appropriate, while keeping purely semantic data in a dedicated store. However, in order to enable more sophisticated applications, relational mapping software has to evolve beyond its current state: it needs to support some sort of inference and improve performance.
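Editorial illustration (not part of the interview): the "source A says X is Y, source B says X is Z" situation can be sketched with statements stored as quads, so that every claim carries its source. This is only a minimal sketch under stated assumptions: the `ex:` identifiers, the source names and the trust scores are all hypothetical, and the predicate `ex:is` is assumed to be single-valued, since in an open world two different values are not necessarily a contradiction. The trust-based choice is deliberately naive and stands in for the much harder uncertainty reasoning mentioned above.

```python
from collections import defaultdict

# Each statement is a quad: (subject, predicate, object, source).
# All identifiers and source names are hypothetical.
statements = [
    ("ex:X", "ex:is", "ex:Y", "sourceA"),
    ("ex:X", "ex:is", "ex:Z", "sourceB"),
    ("ex:X", "rdfs:label", "Resource X", "sourceA"),
    ("ex:X", "rdfs:label", "Resource X", "sourceB"),
]


def find_conflicts(quads):
    """Return {(subject, predicate): {object: [sources]}} for every
    subject/predicate pair on which sources disagree about the object.
    Assumes each predicate is meant to be single-valued."""
    by_key = defaultdict(lambda: defaultdict(set))
    for s, p, o, source in quads:
        by_key[(s, p)][o].add(source)
    return {
        key: {o: sorted(srcs) for o, srcs in objs.items()}
        for key, objs in by_key.items()
        if len(objs) > 1
    }


def pick_by_trust(claims, trust):
    """Naive resolution: pick the object whose supporting sources
    have the highest total trust score (unknown sources count as 0)."""
    return max(claims, key=lambda o: sum(trust.get(s, 0.0) for s in claims[o]))


conflicts = find_conflicts(statements)
trust = {"sourceA": 0.9, "sourceB": 0.4}  # hypothetical trust scores

for (s, p), claims in conflicts.items():
    print(s, p, claims, "-> preferred:", pick_by_trust(claims, trust))
```

Keeping the source next to each statement is what makes "who says what" answerable at all; how trust scores would be obtained in practice is left open here.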

What about the business value of Linked Data? Does it go beyond the principles of a gift economy?

The 'gift economy' is but one use case for Linked Data technology. People with an enterprise ICT background saw the opportunity that Linked Data presents for BI, EAI and EII very early on, so we're glad to see that major players are realizing this as well, as the PriceWaterhouseCoopers Tech Forecast for Spring 2009 pointed out and others have also picked up. Though we're not at liberty to disclose details, I can tell you we're also aware of some major companies that are interested in the potential of Linked Data and are allocating resources to investigate applications. So it must be made clear that the Linking Open Data effort is one thing, a very worthwhile effort in fact that we strongly support, and Linked Data technology is another: it can be used to bring value in different business use cases, offering an alternative approach to lightweight data integration and mashups.

With your contribution "Linked Data for the masses" (1) to last year's Triplification Challenge, you investigated Linked Data powered end-user applications. Can you give us some details on your research?

Absolutely. Let me start by giving you the motivation for this work: IMC is very active in the area of eParticipation - using ICT to facilitate citizen participation in public debate and decision-making. We have developed our own methodology and platform, called eDialogos, that we use in our eParticipation projects. The methodology defines a process for eParticipation, and the platform uses many different technologies to support it. In the course of this process, communities of interest are formed, topics are raised and discussed, opinions are expressed and, as a result, content is created.

We wanted each of the communities to be able to capitalize on the insight gained by other communities that have dealt with similar topics of interest. In practical terms, this meant making content available in a structured, remotely queryable form. We wanted our 'community integration' to take place directly on the data level, removing the need to rely on proprietary APIs and using data that are well-defined in a self-describing way. As you can see, this use case is a perfect match for the Linked Data approach. By making community-created content available as Linked Data, every community becomes a Linked Data provider and consumer at the same time, mashing its own content dynamically with content provided by other communities (the 'inbound approach', as we call it) and also enabling other communities to do the same with the content it provides (the 'outbound approach'). This is exemplified by the use of tagging in the community: we rely on external providers (e.g. DBpedia) to ground each tag's contextual meaning and then export our content and its tags as Linked Data, thus making it not only available but also meaningfully annotated.

In order to realize this vision, we started experimenting with the technical infrastructure needed and created some proof-of-concept applications. Part of this work was enabling Linked Data access for the front-end infrastructure we used, Liferay portal. We decided on the appropriate vocabularies for the type of content we wanted to publish (mainly FOAF, SIOC and MOAT), delved into the internals of Liferay and used D2R to map its relational database to the vocabularies of choice, also using techniques to improve performance as much as possible. Since Liferay itself is also based on the notion of communities, we thought our work would be more widely applicable and useful, so we chose to submit it for review at the Triplification Challenge and make it available to the community as open source software. Our applications have gradually matured and are about to be deployed in our commercial projects. At the same time, we are making the Liferay Linked Data Module available as a Sourceforge project, and we are working with Liferay management to disseminate this effort to the community and to include it in a future release of the software.
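Editorial illustration (not part of the interview): the idea of mapping relational rows onto vocabularies such as FOAF and SIOC can be sketched in a few lines. This is not D2R and not the Liferay Linked Data Module; D2R uses a declarative mapping over a live database, whereas the sketch below simply dumps rows as N-Triples. The `users` and `posts` tables, their columns and the `http://example.org/` base URI are hypothetical and do not reflect Liferay's actual schema.

```python
import sqlite3

FOAF = "http://xmlns.com/foaf/0.1/"
SIOC = "http://rdfs.org/sioc/ns#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/"  # hypothetical base URI for minted resources


def uri(value):
    """Wrap a URI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(text):
    """Encode a plain string literal with basic N-Triples escaping."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def rows_to_triples(conn):
    """Yield N-Triples lines for the hypothetical users and posts tables."""
    for user_id, name in conn.execute("SELECT id, name FROM users ORDER BY id"):
        person = uri(f"{BASE}user/{user_id}")
        yield f"{person} {uri(RDF_TYPE)} {uri(FOAF + 'Person')} ."
        yield f"{person} {uri(FOAF + 'name')} {literal(name)} ."
    query = "SELECT id, author_id, body FROM posts ORDER BY id"
    for post_id, author_id, body in conn.execute(query):
        post = uri(f"{BASE}post/{post_id}")
        author = uri(f"{BASE}user/{author_id}")
        yield f"{post} {uri(RDF_TYPE)} {uri(SIOC + 'Post')} ."
        yield f"{post} {uri(SIOC + 'content')} {literal(body)} ."
        yield f"{post} {uri(FOAF + 'maker')} {author} ."


# Tiny in-memory database standing in for the portal's relational store.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, body TEXT);
    INSERT INTO users VALUES (1, 'Alice');
    INSERT INTO posts VALUES (10, 1, 'Open data matters.');
    """
)

triples = list(rows_to_triples(conn))
print("\n".join(triples))
```

The point of the design is the one made above: the relational store stays in place, and only a mapping layer decides which rows become which resources and properties.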

(1) Linked Data for the Masses

In a recent interview (2), Prof. Wendy Hall from the University of Southampton said that Linked Data is as much social as technical. Where is the social in Linked Data?

Well, we think she is right, and you're also right to call Linked Data a 'movement'. Indeed, the technical infrastructure needed to realize Linked Data is not what this is all about: although maturity of course comes with age, the infrastructure has been there for quite some time now. I would say this is mostly about a shift in paradigm - making people realize what linking data is, how they can use it and what benefits they can reap by doing so. Like the WWW, Linked Data technology is not really that revolutionary or new; in both cases, the value comes from the way people use it to publish, cross-link and aggregate content. And as Tim Berners-Lee himself put it, Linked Data is pretty much the WWW done right this time: not just content, but the underlying data - structured content. But in any case, the process of defining, publishing, linking and using this data is (or should be!) highly social every step of the way, since it is carried out by and for people in a collaborative setting.

In our opinion, a prime example that demonstrates the combination of the technical and social aspects of Linked Data is the opening up of public sector data. The recent initiatives by the U.S. and British administrations to make public sector data available as Linked Data have been met with enthusiasm - and with good reason. These are data that should be in the public sphere, and the decision to make them available is a political one, hence social in the broader sense. At the same time, the choice of Linked Data as the means to do this maximises the value one can derive from the data by making them well-defined and cross-linking them with other sources. We hope that other administrations will soon follow this lead.

(2) Interview with Wendy Hall

About IMC

IMC Technologies S.A. is an award-winning technology and consulting company founded in 2004 as a spin-off of NTUA, with headquarters in Athens, Greece. The company has a strong research background and a focus on knowledge management and knowledge technologies, in which it is considered a leader in the Greek market. It provides highly specialized products and services to the public and private sectors across several industries, and it is one of the fastest growing ICT companies in Greece. We are now at the point where we are creating product offerings, rebranding the company image and opening up to new markets, focusing on the areas of eParticipation and Cultural Asset Management, in which we have extensive experience.

