As an open data fan, or as someone who simply wants to learn how to publish data on the Web and distribute it through the Semantic Web, you will face the question: “How do I describe the dataset that I want to publish?” People who apply for a publicly funded project at the European Commission and need a Data Management Plan ask the same question. Below we discuss options that help describe the dataset to be published.
The goal of publishing the data should be to make it available for access or download and to make it interoperable. One of the big benefits is that the data becomes available to software applications, which in turn means the datasets have to be machine-readable. From the perspective of a software developer, some information beyond just name, author, owner, date… would be helpful:
In a previous blog post I discussed the power of SPARQL to go beyond data retrieval to analytics. Here I look into the possibilities of implementing a product recommender entirely in SPARQL. Products are considered similar if they share relevant characteristics, and the higher the overlap, the higher the similarity. In the case of movies or TV programs there are static characteristics (e.g. genre, actors, director) and dynamic ones, such as the viewing patterns of the audience.
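To make the idea of "overlap" concrete, here is a minimal sketch in Python rather than SPARQL. It assumes the Jaccard index (shared characteristics divided by all characteristics) as the similarity measure, which is only one possible choice and not necessarily the one used in the post. The show names and characteristic labels are hypothetical placeholders, not real data.

```python
def jaccard(a: set, b: set) -> float:
    """Overlap of two characteristic sets: |a & b| / |a | b|."""
    if not a and not b:
        return 0.0  # two empty sets share nothing to compare
    return len(a & b) / len(a | b)


def rank_similar(target: str, catalog: dict) -> list:
    """Return (name, score) pairs for all other items, most similar first."""
    scores = [
        (name, jaccard(catalog[target], traits))
        for name, traits in catalog.items()
        if name != target
    ]
    # Sort by descending score, then by name for a stable order on ties.
    return sorted(scores, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical static characteristics (genre, actor, director) per show.
catalog = {
    "show_a": {"genre:sitcom", "actor:x", "director:p"},
    "show_b": {"genre:sitcom", "actor:x", "director:q"},
    "show_c": {"genre:drama", "actor:y", "director:q"},
}

ranking = rank_similar("show_a", catalog)
print(ranking)
```

Here `show_b` shares two of four distinct characteristics with `show_a` and therefore ranks above `show_c`, which shares none. In a SPARQL setting the same counts would come from grouping and counting shared values in the query itself.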
We can look up the static part in resources like DBpedia. If we look at the data related to the resource <http://dbpedia.org/resource/Friends> (which represents the TV show “Friends”), we can use, for example, the associated subjects (see the predicate dcterms:subject). In this case…
The ADEQUATe project builds on two observations: an increasing amount of Open Data is becoming available as an important resource for emerging businesses, and the integration of such open, freely re-usable data sources into organisations’ data warehouse and data management systems is seen as a key success factor for competitive advantage in a data-driven economy.
The project identifies crucial issues that have to be tackled to fully exploit the value of open data and to integrate it efficiently with other data sources: