"Any open data strategy needs goals, but also policies and governance structures for information sharing"

5 May 2008, by Tassilo Pellegrini

Increased awareness of the benefits of Linking Open Data

SWC: Ken, you are Co-Chair of this year's LinkedData Planet conference. Could you tell us a bit more about this event? What audience are you expecting?

Ken North: The launch of LinkedData Planet is recognition that semantic technologies and data sharing have become important to both the enterprise and web computing communities. Our core theme is the confluence of enterprise and Internet computing, a phenomenon that’s contributing to the evolution of the Web from linked documents to linked data.

Linked data is a continuation of a historical trend, a long-standing recognition of data as an asset and a desire for data sharing. Several decades ago overcoming the limitations of application-specific data silos was the prime reason the concept of the database and database management system came into existence. Today that same motivation spurs linked data development, but we have a pipeline, the Internet, that enables information to flow globally.


"The motivation is the same as decades ago - overcoming the limitations of application-specific data silos - but now with the Internet we have a pipeline that enables information to flow globally."


Our audience is rather broad and it might be described as the 2.0/3.0 crowd. Version numbers don't convey precise meaning in this case, but generally speaking, we appeal to the Enterprise 2.0 and Web 3.0 audience. We offer a program for people interested in semantic technologies and information retrieval - for mashups, social networks and data integration for SOA, enterprise and web applications and services.
 

SWC: LinkingOpenData is one of the major efforts at W3C to make the Semantic Web more popular and useful. Interoperability is an important precondition for developing value-added services on mashed-up data. Just recently, representatives from Google, Facebook and Plaxo joined the Data Portability Working Group. Is there a change in consciousness taking place?

Ken North: Yes. It helps that Sir Tim Berners-Lee has been an advocate because, as a high-profile innovator, he is able to keep a spotlight focused on certain technologies. The size of the MySpace and Facebook communities has proven the continued evolution of the web offers tremendous potential for connecting people and sharing information.

The visibility of data providers such as Google Maps contributed to the surge of interest in taking enterprise computing and web computing to the next level. The potential is staggering when you consider, for example, that governments worldwide have a responsibility to make information available to billions of citizens.
 

SWC: In many domains keeping data closed is a business strategy to generate lock-in effects and protect IPRs. Hence opening up data is a strategic economic decision. What motives drive research communities and companies to open up their data and make it mashable?

Ken North: Research institutions share information because it helps to reach their goals. Research often requires a collaborative process, such as sequencing the human genome. And businesses open up data when it's helpful in selling services and products, servicing customers, and making life easier for employees and partners.


"It's not a new requirement that we need a policy for information sharing and publishing: The guidelines that applied to scientific journals and academic papers are applicable to open data on the web."


We've learned the Internet provides unprecedented access to a global community, one that includes criminals and malevolent users. So opening up data is a process that must weigh the risks against the benefits. Businesses and research institutions must adopt formal policies to govern the publication of data, but that's not a new requirement. Organizations and individuals have dealt with this issue in the past when deciding what information to share via scientific journals, academic papers and conference presentations. The guidelines for those forms of publishing data are applicable to the open data on the web scenario.

The future of semantic web based services

SWC: Just recently Google started to publish services based on semantic web specifications (the "Social Graph API"). A bit later Reuters published OpenCalais, another cornerstone for the realisation of Tim Berners-Lee's Semantic Web vision, and Yahoo!'s open search strategy is applying semantic web standards, too. It seems like the Semantic Web is taking off in 2008. Which new semantic web based services do you think will be the next ones?

Ken North: One area where I see progress is an uptick in the adoption of semantic technology for discovery of web services and grid services. Web Services Description Language (WSDL) is reminiscent of Web 0.5, when the results of web searches often depended on a webmaster's ability to code META tags.

Encoding data using microformats, RDF, OWL or other schemes is one approach to improving information-finding capabilities. But we'll also see progress from the machine intelligence community, which we're exploring with presentations at LinkedData Planet.


"We are going to need information-finding capabilities for rich types, such as audio and video, not just for text documents and simple data types."


The movement to impart semantics and improve our information retrieval technology has been largely oriented towards text. There's been explosive growth on the Internet of rich types, such as video, and there are specialized engines for searching video and audio. The Grid developed for CERN's Large Hadron Collider project has shown we'll use a much larger pipe in the future, with even more possibilities for serving video and other rich data.

That volume of digital information increases the need for intelligent search and information retrieval capabilities by an order of magnitude. We're going to need improved capabilities for imparting semantics for webs of multimedia data and composite documents, not just text documents and simple data types.

Decency, Privacy and Open Data

SWC: In Europe, there is a strong awareness in regard to decent data use and privacy. While Open Data is generally applied to non-critical data in regard to personal information (like the CIA Factbook or DBpedia), the unreflected turn towards open data might blur the boundaries between the private and the public sphere. Are these kinds of ethical issues addressed by the LinkingOpenData-Community? And what conclusions do you draw?

Ken North: Some technology solutions are found more easily than the solutions to the sociological problems the technology introduces. The technology solutions for linking data and open data are at hand, but we need to become a lot smarter to solve the sociological challenges. Commodity computers and the Internet infected the developed world with the technology bug and helped spur a modern Industrial Revolution. There have been incalculable benefits, but the Internet also enabled the creation of an entirely new industry that steals and sells credit card data and other confidential information.


"We need developers familiar with EU privacy laws as thought leaders on ethical issues related to publishing open data, and global awareness of privacy issues."


Besides the problem of criminal enterprises, governments have failed to safeguard data that facilitates fraud and identity theft. Organizations have also been negligent with medical records. So we must be aware of potential for abuse and avoid publishing open data that can be exploited by malevolent users for identity theft, denying insurance claims, manipulating stock prices or other nefarious purposes.

Many members of the emergent Linked Data and Open Data developer communities are Europeans who are familiar with EU privacy laws. We need them to serve as thought leaders on ethical issues related to publishing open data, although we’ll still need global awareness of privacy issues. We also need an international agreement that provides a process for resolving disputes about the accuracy or removal of private and confidential data.
 

SWC: What are your recommendations for an open data strategy?

Ken North: Establish goals and priorities for data sharing. Establish policies and a governance structure to review what information to share with different communities, such as citizens, customers, partners, suppliers, sister companies and the Web community.

Organizations can throw hardware at problems so supporting open data doesn't constrain critical applications. Invest in information servers, disk clusters, and load balancing.

About Ken North

Ken North is a consultant, author, industry veteran and company founder (www.KNComputing.com). He's developed systems, taught seminars and consulted in North America, Europe, Asia and South America. His systems programming background includes database management systems (DBMS), compiler extensions, query processors, terminal emulators, operating system security monitors and communications software for embedded systems. His projects have been diverse, including real-time tracking software for Apollo spacecraft, fixed asset accounting, transportation scheduling, stock trading, B2B order entry, and systems integration for TDRSS satellites and ground systems.

Ken has advised conference producers and chaired conferences in North America and Europe. He has programmed content for NextWare, DataServices World, Web Services Security, LinkedData Planet and the XML DevCon conference series. In this capacity, he's been able to identify trends, emerging technologies and heightened interest by early adopters. He believes the Linked Data phenomenon represents a natural progression, with web data benefiting from the maturity of database and query technology.

Ken recalled that even in 2001 there was strong interest in conference presentations about topic maps, RDF and grids serving data. He noted the subject of integrating SQL data and web data was of great interest to conference attendees by the late 1990s, with linked data being an indication of a long-standing desire to provide optimal solutions for querying and integrating database information for web users.

References

LinkedData Planet (conference website)

Ken North Computing

The World Wide Web Consortium (W3C)

The Data Portability Working Group

Google Social Graph API

Reuters OpenCalais

CERN - Large Hadron Collider Project

CIA Factbook

DBpedia

Jupiter Media Events

Mon, 05/05/2008
