
As Susanne Boll pointed out in an article in IEEE MultiMedia in 2007, most media-related Web 2.0 sites initially did not use any research results from areas such as multimedia content analysis or semantic content classification. It is exciting to see that many of these platforms are now adding media semantics in small steps, in the typically dynamic way in which they evolve. Features such as structured annotation, annotation of time segments or regions, face detection and similarity search are now available on many of these platforms.
These services put into practice the lesson learned from more than a decade of research in content-based multimedia retrieval: there are a number of very interesting methods, but in isolation they do not provide real benefit to the end user. What we see now is that these technologies are used in combination with other descriptive metadata and with the user in the loop to provide relevance feedback and iteratively refine the query, which leverages their potential.
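To make the idea of relevance feedback concrete, here is a minimal sketch of one classic way to refine a query from user feedback, the Rocchio update. It is an illustrative choice, not necessarily what any of the platforms mentioned above use. The function name, the weights `alpha`, `beta` and `gamma`, and the representation of queries and items as plain feature vectors are all assumptions made for this example.

```python
def rocchio_refine(query, relevant, non_relevant,
                   alpha=1.0, beta=0.75, gamma=0.15):
    """Return a refined query vector from one round of user feedback.

    query:        list of floats, the current query vector
    relevant:     feature vectors the user marked as relevant
    non_relevant: feature vectors the user marked as not relevant
    The weights are conventional defaults, assumed for illustration.
    """
    dims = len(query)

    def centroid(vectors):
        # Mean vector; all zeros if the user gave no feedback of this kind.
        if not vectors:
            return [0.0] * dims
        return [sum(v[i] for v in vectors) / len(vectors) for i in range(dims)]

    pos = centroid(relevant)
    neg = centroid(non_relevant)
    # Move towards relevant items, away from non-relevant ones,
    # and clip negative weights to zero.
    return [max(0.0, alpha * q + beta * p - gamma * n)
            for q, p, n in zip(query, pos, neg)]


# One feedback round: the user likes two items and rejects one.
refined = rocchio_refine(
    query=[1.0, 0.0],
    relevant=[[0.0, 1.0], [0.0, 1.0]],
    non_relevant=[[1.0, 0.0]],
)
```

In an iterative setting the refined vector would be used for the next search, and the loop repeated until the user is satisfied. Descriptive metadata could be used to filter the candidates before ranking, which this sketch leaves out.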
There is some truth in saying that a picture is worth a thousand words. Automatic analysis algorithms can currently decode just a few of them, mainly those related to what is actually depicted, while humans associate many concepts with a picture based on their context and experiences. Although a lot of progress has been made in automatic concept classification, the results of the TRECVID benchmark still show a difference of about a factor of 10 in the achieved precision between visually well represented concepts and more abstract ones. We can expect this to improve with the ongoing work on large-scale concept classification (several thousand classes), which also makes use of semantic relations between different concepts.
It is interesting to see that the annotations created with different approaches are complementary. There is a Dutch video labeling game that allows users to annotate broadcast archive content. It turns out that non-expert users annotate different aspects than professional archivists do, just as automatic tools can provide annotations that are difficult for humans to create, and vice versa. The key is to intelligently combine the strengths of these approaches, for example by using automatic methods to apply existing annotations to similar content and by using linked open data to enrich annotations.
There have been a number of proposals for multimedia ontologies and mappings of multimedia vocabularies (cf. the excellent report from the W3C MM Semantics XG), differing in complexity and expressivity. To address this fragmentation, the W3C has chartered a working group to develop an ontology and API for multimedia content on the Web. The group is developing a lightweight core set of metadata properties and an API specification for accessing these properties, which may come from metadata documents in different standards. Mappings to many relevant standards have therefore also been specified. The set of metadata properties will be formalized for interoperability with the Semantic Web. A W3C Recommendation is expected in 2010.
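The idea of a small core property set that is accessed uniformly through mappings can be sketched as follows. This is only an illustration of the principle, not the working group's API or its actual mapping tables. The function `get_property`, the table `MAPPINGS` and the `archive_db` schema with its keys are hypothetical names invented for this example.

```python
# Hypothetical mapping tables: key in the source format -> core property name.
# "archive_db" stands for an imaginary in-house archive schema.
MAPPINGS = {
    "dublin_core": {
        "dc:title": "title",
        "dc:creator": "creator",
        "dc:date": "date",
    },
    "archive_db": {
        "programme_name": "title",
        "producer": "creator",
        "broadcast_date": "date",
    },
}


def get_property(record, source_format, core_property):
    """Return all values in `record` that map to `core_property`.

    record:        dict holding metadata in its original format
    source_format: name of a mapping table in MAPPINGS
    core_property: name of a property in the (assumed) core set
    """
    mapping = MAPPINGS[source_format]
    return [value for key, value in record.items()
            if mapping.get(key) == core_property]


# The same call works regardless of the format the metadata came from.
dc_record = {"dc:title": "Evening News", "dc:date": "2009-12-02"}
db_record = {"programme_name": "Evening News", "producer": "Newsroom"}

title_from_dc = get_property(dc_record, "dublin_core", "title")
title_from_db = get_property(db_record, "archive_db", "title")
```

Returning a list rather than a single value is a design choice made here because a source document may contain several fields that map to the same core property, or none at all.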
Werner Bailer is a researcher at the Institute of Information Systems of JOANNEUM RESEARCH and works in the areas of audiovisual archiving, digital cinema production, digital film restoration and quality analysis, and interactive TV. He is interested in image and video processing algorithms and metadata modelling for audiovisual content, and is currently a member of the W3C Media Annotations WG. Since 2007 he has been working on a PhD thesis on the topic of multimedia content abstraction at the Technical University of Graz.
2-4 December 2009, Graz, Austria
The 4th International Conference on Semantic and Digital Media Technologies (SAMT '09) aims to narrow the large disparity between the low-level descriptors that can be computed automatically from multimedia content and the richness and subjectivity of semantics in user queries and human interpretations of audiovisual media: the Semantic Gap.