

Jeen Broekstra: "The importance of SPARQL cannot be overestimated"

6 April 2006, by Tassilo Pellegrini


About triple stores

SWS: "What's the whole idea behind triple stores?"

Jeen: "The term 'triple store' typically refers to any system that provides persistent storage for RDF data, that is, for triples. Triple stores come in many different varieties: some store data in a relational database, others use an XML file, and still others implement their own native storage and indexing format. It is perhaps important to point out, though, that frameworks such as Sesame are generally more than 'just' a triple store: they provide an application programming interface (API) for handling and manipulating RDF data that goes beyond simple storage and retrieval.

The motivation for such RDF-specific stores and frameworks is that the relational model and query languages like SQL are not particularly well suited to storing and retrieving RDF data. RDF is a graph data model, and as such one wants to be able to query 'the graph', that is, to write query expressions that describe paths through an RDF graph (rather than expressing relations between tables and columns in a relational schema). Triple stores came into existence to bridge this gap."
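To make both points concrete, here is a minimal Python sketch of the storage-and-retrieval core of a triple store, plus a simple path query over the resulting graph. It is a toy under stated assumptions: the class `TripleStore`, its methods, the use of `None` as a wildcard and the three-index layout are our own illustration, not how Sesame is built, and real triple stores add persistence on top. The `ex:` names are made up for the example.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]


class TripleStore:
    """Toy in-memory triple store (hypothetical; not Sesame's design)."""

    def __init__(self) -> None:
        # Three nested indexes, so a pattern with any single bound term
        # can be answered without scanning every stored triple.
        self._spo: Dict[str, Dict[str, Set[str]]] = defaultdict(lambda: defaultdict(set))
        self._pos: Dict[str, Dict[str, Set[str]]] = defaultdict(lambda: defaultdict(set))
        self._osp: Dict[str, Dict[str, Set[str]]] = defaultdict(lambda: defaultdict(set))

    def add(self, s: str, p: str, o: str) -> None:
        # Sets make adding the same triple twice a no-op.
        self._spo[s][p].add(o)
        self._pos[p][o].add(s)
        self._osp[o][s].add(p)

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> Iterator[Triple]:
        """Yield every stored triple matching the pattern; None is a wildcard."""
        if s is not None:
            for pred, objs in self._spo.get(s, {}).items():
                if p is None or pred == p:
                    for obj in objs:
                        if o is None or obj == o:
                            yield (s, pred, obj)
        elif p is not None:
            for obj, subjs in self._pos.get(p, {}).items():
                if o is None or obj == o:
                    for subj in subjs:
                        yield (subj, p, obj)
        elif o is not None:
            for subj, preds in self._osp.get(o, {}).items():
                for pred in preds:
                    yield (subj, pred, o)
        else:
            for subj, by_pred in self._spo.items():
                for pred, objs in by_pred.items():
                    for obj in objs:
                        yield (subj, pred, obj)

    def follow_path(self, start: str, path: List[str]) -> Set[str]:
        """Return all nodes reachable from `start` by following `path`, one predicate per step."""
        frontier = {start}
        for predicate in path:
            # Expand every current node along the next edge label.
            frontier = {obj for node in frontier
                        for _, _, obj in self.match(s=node, p=predicate)}
        return frontier


if __name__ == "__main__":
    store = TripleStore()
    store.add("ex:jeen", "ex:worksFor", "ex:aduna")
    store.add("ex:aduna", "ex:develops", "ex:sesame")
    store.add("ex:aduna", "ex:develops", "ex:autofocus")
    print(sorted(store.follow_path("ex:jeen", ["ex:worksFor", "ex:develops"])))
    # ['ex:autofocus', 'ex:sesame']
```

The path query reads naturally as "walk the graph along these edges". In a plain relational schema, each additional step would typically become another self-join over the data table.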

Relational databases and triple stores

SWS: "How will triple stores and relational databases work together in the future? Is scalability still a problem?"

Jeen: "I expect that in the future there will be a degree of 'fusion' between triple stores and relational databases. We already see this happening with Oracle, for example, which added some support for RDF storage and querying in its latest release. I expect this tendency to come from both sides: RDBMS developers will build more support for the RDF data model into their systems, which developers of RDF frameworks such as Sesame can then exploit more fully to increase performance and scalability.

The current generation of triple stores can scale up to the order of 100 million triples, and this will definitely continue to increase as the systems mature."
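A common starting point for keeping RDF inside a relational database is a single three-column "triples" table. The Python sketch below, using the standard `sqlite3` module, shows that baseline and a two-step graph path written in SQL. The table, index and function names are our own assumptions; production systems typically use more elaborate schemas (for example, mapping URIs and literals to integer IDs), and nothing here describes Oracle's or Sesame's actual storage layout.

```python
import sqlite3
from typing import Iterable, List, Tuple


def open_store(triples: Iterable[Tuple[str, str, str]]) -> sqlite3.Connection:
    """Create an in-memory SQLite database holding triples in one three-column table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE triples ("
        " s TEXT NOT NULL, p TEXT NOT NULL, o TEXT NOT NULL,"
        " PRIMARY KEY (s, p, o))"
    )
    # Extra indexes so lookups bound on predicate or object avoid full scans.
    conn.execute("CREATE INDEX triples_pos ON triples (p, o, s)")
    conn.execute("CREATE INDEX triples_osp ON triples (o, s, p)")
    # The primary key plus OR IGNORE silently drops duplicate triples.
    conn.executemany("INSERT OR IGNORE INTO triples VALUES (?, ?, ?)", triples)
    conn.commit()
    return conn


def two_step_path(conn: sqlite3.Connection, start: str, p1: str, p2: str) -> List[str]:
    """Follow start -p1-> x -p2-> y: each step of the path is one more self-join."""
    rows = conn.execute(
        """
        SELECT DISTINCT t2.o
        FROM triples AS t1
        JOIN triples AS t2 ON t2.s = t1.o
        WHERE t1.s = ? AND t1.p = ? AND t2.p = ?
        ORDER BY t2.o
        """,
        (start, p1, p2),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = open_store([
        ("ex:jeen", "ex:worksFor", "ex:aduna"),
        ("ex:aduna", "ex:develops", "ex:sesame"),
        ("ex:aduna", "ex:develops", "ex:autofocus"),
    ])
    print(two_step_path(conn, "ex:jeen", "ex:worksFor", "ex:develops"))
```

Longer paths mean more joins of the same table with itself, which is one reason RDF-aware indexing and query optimization inside the database engine is attractive. The `ex:` data is invented for the example.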

Sesame is very scalable

SWS: "What's the main advantage of Sesame compared to other triple stores?"

Jeen: "Sesame's strong points are its flexible and easy-to-use Java API, its small codebase, its easy deployment as a (Web) server, and its strong support for all relevant RDF standards (including RDFS entailment). Furthermore, Sesame is very scalable and has good query performance, even on very large datasets."


Sesame 2.0 and SPARQL

SWS: "What are the next steps on the Sesame roadmap? What role will SPARQL play in the future?"

Jeen: "We are currently working hard on Sesame 2.0, a complete redesign of the Sesame framework. Support for the SPARQL query language and protocol plays a significant role in this effort: the HTTP protocol used to communicate with a Sesame server will be a superset of the SPARQL protocol, and Sesame 2.0 will include a complete query engine for the SPARQL query language alongside its own SeRQL query engine.

Other important new features in Sesame 2.0 are strong support for context (in both the API and the query language), full transaction support, and a complete revision of the access API to make it easier to use for application developers.

An alpha version of Sesame 2 is currently available, and we are aiming to release the first stable version in May/June 2006.

In my opinion, the importance of SPARQL in general cannot be overestimated: SPARQL defines a standard way to communicate with RDF-based services on the Web. Until now, interoperability between different RDF tools has been hampered by the lack of such a common standard. With SPARQL, a Sesame-based application can freely communicate with, say, a Jena-based application over the Web by means of SPARQL queries and answers."
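What makes this kind of tool-to-tool exchange possible is that SPARQL standardizes both the request (the protocol, which sends the query text in a `query` parameter) and the shape of the answer (for example the SPARQL Query Results XML Format). The Python sketch below builds a query request without sending it and parses a canned XML answer using only the standard library. The endpoint URL and the sample result document are made up for illustration, and the parser handles only the common `uri`, `literal` and `bnode` bindings.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint; replace with the URL of a real SPARQL service.
ENDPOINT = "http://localhost:8080/sparql"

# Namespace of the W3C SPARQL Query Results XML Format (Clark notation).
SR = "{http://www.w3.org/2005/sparql-results#}"

# Invented sample answer, shaped like a real SPARQL XML result document.
SAMPLE_RESULTS = """<?xml version="1.0"?>
<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head><variable name="tool"/><variable name="label"/></head>
  <results>
    <result>
      <binding name="tool"><uri>http://example.org/sesame</uri></binding>
      <binding name="label"><literal xml:lang="en">Sesame</literal></binding>
    </result>
  </results>
</sparql>"""


def build_query_request(endpoint: str, query: str) -> Request:
    """Build (without sending) a SPARQL protocol query request using HTTP GET."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask the service for the standard XML results format.
    return Request(url, headers={"Accept": "application/sparql-results+xml"})


def parse_results(xml_text: str) -> List[Dict[str, Tuple[str, str]]]:
    """Turn a SPARQL XML result document into rows of {variable: (kind, value)}."""
    root = ET.fromstring(xml_text)
    rows = []
    for result in root.iter(SR + "result"):
        row = {}
        # Unbound variables have no <binding> element and are simply absent.
        for binding in result.findall(SR + "binding"):
            term = binding[0]  # exactly one child element: uri, literal or bnode
            kind = term.tag[len(SR):]
            row[binding.get("name")] = (kind, term.text or "")
        rows.append(row)
    return rows


if __name__ == "__main__":
    request = build_query_request(ENDPOINT, "SELECT ?tool ?label WHERE { ?tool ?p ?label }")
    print(request.full_url)
    # Sending would be urllib.request.urlopen(request); here we parse a canned answer.
    print(parse_results(SAMPLE_RESULTS))
```

Because neither function depends on which RDF framework sits behind the endpoint, the same client code could, in principle, talk to any SPARQL-compliant service.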
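The "context" support mentioned for Sesame 2.0 can be understood as attaching an extra identifier to each statement, such as the document or source it came from, so that statements can be queried or replaced per source. The Python sketch below combines that idea with staged, commit-or-rollback updates. Everything in it is hypothetical: `QuadStore`, `Transaction`, `begin` and `clear_context` are our own names, not Sesame 2.0's API, and the toy ignores concurrency and durability.

```python
from typing import Optional, Set, Tuple

# (subject, predicate, object, context)
Quad = Tuple[str, str, str, str]


class QuadStore:
    """Toy store in which every statement carries a context (hypothetical sketch)."""

    def __init__(self) -> None:
        self._quads: Set[Quad] = set()

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None, context: Optional[str] = None) -> Set[Quad]:
        """Return committed quads matching the pattern; None is a wildcard in any position."""
        pattern = (s, p, o, context)
        return {q for q in self._quads
                if all(want is None or want == got for want, got in zip(pattern, q))}

    def begin(self) -> "Transaction":
        return Transaction(self)


class Transaction:
    """Stages changes; they become visible only on commit() and are dropped on rollback()."""

    def __init__(self, store: QuadStore) -> None:
        self._store = store
        self._added: Set[Quad] = set()
        self._removed: Set[Quad] = set()

    def add(self, s: str, p: str, o: str, context: str) -> None:
        quad = (s, p, o, context)
        self._removed.discard(quad)  # re-adding cancels a staged removal
        self._added.add(quad)

    def clear_context(self, context: str) -> None:
        """Stage removal of every statement in `context`, whether committed or staged."""
        self._removed |= self._store.match(context=context)
        self._added = {q for q in self._added if q[3] != context}

    def commit(self) -> None:
        # Removals first, then additions, so "clear, then re-add" replaces a context.
        self._store._quads -= self._removed
        self._store._quads |= self._added
        self._reset()

    def rollback(self) -> None:
        self._reset()

    def _reset(self) -> None:
        self._added = set()
        self._removed = set()


if __name__ == "__main__":
    store = QuadStore()
    tx = store.begin()
    tx.add("ex:doc1", "ex:title", "Report", "ctx:crawl1")
    tx.add("ex:doc2", "ex:title", "Notes", "ctx:crawl2")
    tx.commit()
    print(sorted(store.match(context="ctx:crawl1")))
```

Grouping statements by context is what makes an operation like "replace everything we know about this document" a single, self-contained update.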

Triple store-based applications

SWS: "What kinds of triple-store-based applications are ready today?"

Jeen: "It depends on what you mean by 'ready for today'. Aduna, the company that develops Sesame, has a number of commercial products that use Sesame internally as a component for triple storage. For example, AutoFocus is a desktop search and navigation tool that stores the metadata it extracts from the documents it crawls and indexes in a Sesame store. Another Aduna product, Spectacle (http://aduna.biz/products/spectacle/), provides a web-based faceted browsing environment based on RDF metadata and is particularly suited to use in an enterprise environment."

About Jeen Broekstra

Jeen Broekstra is an Assistant Professor in the Department of Computer Science at the Technische Universiteit Eindhoven, where he is part of the Architecture of Information Systems (AIS) group. He is also a senior software developer at Aduna, a company that produces software for querying, searching and navigating large information spaces, and one of the lead designers of the Sesame RDF framework. In addition, he is involved in the W3C Data Access Working Group.

Jeen received his PhD from the Vrije Universiteit Amsterdam for work on storage, querying and inferencing for Semantic Web languages.

His current research is on software components and languages for the Semantic Web, such as RDF, RDF Schema, OWL and SPARQL.

Link: Jeen's Homepage
