Arno Scharl: "Geographic platforms drive mash-ups and user-generated content"
26.02.2008
Arno Scharl heads the Department of New Media Technology at MODUL University Vienna. Andreas Blumauer (SWC) talked to him about the Geospatial Web, the US Election Monitor and semantic mashups.
INTERVIEW
SWC: Professor Scharl, your recent projects and publications are strongly related to a rather new research area: the Geospatial Web. What impact is the Geospatial Web going to have on our daily life, on the way we work together and on the way we consume media on the web?
The Geospatial Web seamlessly integrates very different types of information – from cartographic data (maps, street directories) and weather sensor readings to video feeds from points of interest, location-based news services, activities within virtual communities and environmental indicators such as emission levels and ozone concentrations. The sheer range of possible applications based on virtual globes revolutionizes the production, distribution and consumption of media products. For example, geographic platforms such as NASA World Wind, Google Earth and MS Virtual Earth are among the main drivers of mash-ups and user-generated content, recent phenomena commonly referred to as “Web 2.0”.
SWC: The Geospatial Web can be seen as a possible area of application of the Semantic Web. How do you think these two ways of collecting data, representing information and visualising complex relations are going to influence each other?
The diminished lifespan of information calls for new methods and tools to aggregate and analyze this information in real time. Semantic technologies address this need and create large knowledge repositories, which call for adequate visual representations that are useful to both expert analysts and casual users. Assuming a certain level of geospatial literacy, virtual globes are an ideal platform for such visual representations, as the line of reasoning followed in information discovery tasks is often spatial in nature.
"Correctly processing humor and sarcasm are among the most challenging problems of automated sentiment detection."
SWC: One of your latest projects, the US Election 2008 Web Monitor, has attracted a lot of attention. Could you explain what such an application is used for and how semantic technologies have been implemented in this project?
The Election Monitor tracks the candidates’ performance on the campaign trail. Weekly snapshots of web coverage (about 800,000 documents) reveal regional differences and contrast the perceptions of international news media, Fortune 1000 companies, bloggers and environmental organizations. Besides tracking recent developments, users can also cast their votes for their preferred candidates. An automated process identifies attention by counting references to a candidate. Semantic technologies measure sentiment towards the candidate by looking for positive and negative expressions that co-occur with these references, compute keywords that reflect the most important topics associated with a candidate, and annotate each document contained in the knowledge repository (e.g. parsing the text to assign spatial coordinates and map news articles onto a geographic interface).
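The attention measure described above, counting references to a candidate, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Election Monitor's actual code: the function name attention_shares, the surname-only matching and the normalization to a share of all candidate mentions are hypothetical choices made for the example.

```python
import re
from collections import Counter


def attention_shares(documents, candidates):
    """Return each candidate's share of all candidate mentions (0.0 to 1.0)."""
    counts = Counter()
    for doc in documents:
        text = doc.lower()
        for name in candidates:
            # Whole-word, case-insensitive match on the surname only (a simplification).
            pattern = r"\b" + re.escape(name.lower()) + r"\b"
            counts[name] += len(re.findall(pattern, text))
    total = sum(counts.values())
    # Assumed normalization: mentions of one candidate divided by mentions of all.
    return {name: (counts[name] / total if total else 0.0) for name in candidates}


# Two made-up documents, purely for demonstration.
docs = [
    "Obama met Clinton. Obama spoke.",
    "McCain answered Clinton.",
]
shares = attention_shares(docs, ["Clinton", "Obama", "McCain"])
```

A production system would also need to resolve first names, nicknames and ambiguous surnames, which this sketch ignores.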
SWC: One of the exciting new features of the US Election 2008 Web Monitor is the way in which the "semantic orientation" of a sentence is calculated. How exactly does this work? Can this application also identify statements that are meant ironically?
The sentiment detection algorithm calculates the distance between references to candidates and 8,000 positive and negative words from a tagged dictionary, then assigns a sentiment value to each sentence containing a reference. These individual values are then aggregated on a per-candidate basis. Correctly processing humor and sarcasm are among the most challenging problems of automated sentiment detection. The current system is optimized for throughput and does not account for such expressions. Our newest research project, RAVEN, which started in January and is funded by the FIT-IT Semantic Systems program, aims to increase the algorithm’s accuracy for these types of problems.
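The distance-based scoring described above might look roughly like the following Python sketch. It is not the project's implementation: the six-word lexicon stands in for the 8,000-word tagged dictionary, and the 1/distance weighting, the averaging step and all function names are assumptions made for illustration.

```python
import re
from collections import defaultdict

# Hypothetical miniature lexicon; the real tagged dictionary holds about 8,000 words.
LEXICON = {"strong": 1, "win": 1, "praised": 1,
           "weak": -1, "scandal": -1, "criticized": -1}


def sentence_sentiment(sentence, candidate):
    """Score one sentence for one candidate; None if the candidate is not mentioned."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    positions = [i for i, tok in enumerate(tokens) if tok == candidate.lower()]
    if not positions:
        return None
    score = 0.0
    for i, tok in enumerate(tokens):
        polarity = LEXICON.get(tok)
        if polarity is None:
            continue
        # Distance in tokens to the nearest reference to the candidate.
        distance = min(abs(i - p) for p in positions)
        if distance == 0:
            continue
        # Assumed weighting: sentiment words closer to the reference count more.
        score += polarity / distance
    return score


def aggregate(sentences, candidates):
    """Average the sentence scores per candidate (averaging is an assumption)."""
    values = defaultdict(list)
    for sentence in sentences:
        for name in candidates:
            score = sentence_sentiment(sentence, name)
            if score is not None:
                values[name].append(score)
    return {name: sum(v) / len(v) for name, v in values.items()}
```

Because the score depends only on word polarity and proximity, a sarcastic sentence such as "a strong plan, sure" would be scored as positive, which illustrates why irony is hard for this kind of approach.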
"We plan to release certain components such as the ontology visualization service under an open source license."
SWC: With another application called "Media Watch on Climate Change" you have demonstrated the power of your technology. What further ideas and applications could be realized with your framework? Do you have any plans to open source at least parts of your technologies?
The technology demonstrated by the “Media Watch on Climate Change” is quite generic in nature and can be applied across many different domains and usage scenarios. The RAVEN project mentioned above, for example, will include corporate knowledge repositories in addition to public web content. Our next public web portal will focus on the tourism sector and should be available in the second quarter of 2008. And yes, we plan to release certain components under an open source license, e.g. the ontology visualization service or the component to export "knowledge planets" to NASA World Wind.
SWC: Just recently Reuters has opened up an exciting new web service to the general public: OpenCalais could support the breakthrough of the semantic web. Do you agree?
The OpenCalais annotation service certainly is an important step in the right direction and, thanks to its open API, will make its way into many semantic applications. Our team is currently evaluating the service and how it might complement our own research efforts, with a special focus on the upcoming releases R3 (multi-language support) and R4 (development environment).
SWC: Last question: Who will be the next president of the U.S.?
In terms of relative media attention as of February 18th, it is a close call between the two Democratic front-runners Hillary Clinton and Barack Obama (about 25% each), followed by the Republican candidate John McCain with about 15% of mentions. In terms of sentiment, McCain has continually improved his position and took over the lead from Obama on February 11th. But it is a long time until November 2008, and the Election Monitor does not intend to predict the outcome of the current primaries or the actual election, but to contrast the perceptions of different stakeholder groups, and to reveal regional differences in coverage with a special focus on environmental issues.
SWC: Thank you, Prof. Scharl!
Arno Scharl heads the Department of New Media Technology at MODUL University Vienna. He edited two books in Springer’s Advanced Information and Knowledge Processing Series, “The Geospatial Web” and “Environmental Online Communication”, founded the ECOresearch Network, and served as co-chair of the 20th International Conference on Informatics for Environmental Protection. His current research interests focus on text mining, integrating semantic and geospatial Web technology, media monitoring, virtual communities and computer-mediated collaboration.
References
The Geospatial Web
MODUL University Vienna - New Media Technology
US Election 2008 Web Monitor
Media Watch on Climate Change
FIT-IT Semantic Systems
IDIOM Project
RAVEN Project