Michael Alvers: "We believe that creating an ontology is a wise investment."
02.03.2007
The search engine market is highly competitive. Dr. Michael Alvers, CEO of the Germany-based company TRANSINSIGHT, has recently launched the life science search engines GoPubMed and MeshPubMed. In an interview with Andreas Blumauer (SWS) he shares some insights into the logic and success factors of special-interest search and their applicability to the life sciences.
Mr. Alvers, you are the CEO of Transinsight in Dresden. When do you get contacted by enterprises and what customer needs can you solve?
Our customers approach us when they realize that traditional search technologies are no longer sufficient. Companies are drowning in data, and knowledge management solutions are often described as unclear and complicated.
We offer a solution which intelligently sorts search results into a “Table of Contents” – an ontology. We do this in a fully automated way, extremely fast and with very high accuracy. As a result the user saves time and data are sorted, all with no additional effort for the customer.
The market for text-mining is highly competitive. What should customers pay attention to when, for example, installing a powerful search engine?
We always convince our customers through simplicity and usability, coupled with competence regarding content. Browsing search results using a simple tree creates confidence and transparency. With competitor systems, end users have difficulty judging whether the search results of one machine are better than those of another. Imagine one provider claims to produce better results than another; this is very difficult to confirm. Who is able and willing to compare two long lists of results and judge the order of one list against the other? In our solution, transparency is verifiable.
In the life sciences it is also very important to know customers and be able to scientifically tailor solutions. Unilever has chosen us not only because we can deal with huge amounts of text and large ontologies, but also because we really understand their problems.
The phrase “Semantic Desktop” hints at future intelligent computer-assisted desktops. A central technology therein is automated classification. What is state-of-the-art in this regard? What basic technologies are used and how reliable are they?
For me there are generally two different approaches. The first is “on-the-fly clustering”, where search results are searched for patterns post-retrieval and are presented to the user as categories or even hierarchies. The second way is to classify documents into an existing knowledge network such as an ontology. I believe in the second way! In my view it is not possible to derive a structure for search results from the results alone. For example: thousands of texts speak about neoplasm and thousands about cancer in the context of the tumour suppressor protein P53. Clustering would likely find two clusters: “neoplasm” and “cancer.” But the point is that biologists know both concepts are synonyms. In an ontology this fact is of course expressed.
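The synonym argument can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy synonym table, the three sample sentences, and the function names are hypothetical and are not taken from Transinsight's system. Grouping by literal word stands in for purely text-driven grouping; it is not a real clustering algorithm.

```python
from collections import Counter

# Hypothetical toy ontology: one concept with the synonyms named in the interview.
SYNONYMS = {"neoplasm": ["neoplasm", "cancer"]}

# Inverted lookup table: surface term -> canonical concept.
TERM_TO_CONCEPT = {
    term: concept for concept, terms in SYNONYMS.items() for term in terms
}

# Hypothetical sample texts.
DOCS = [
    "P53 mutations are frequent in human cancer",
    "Loss of P53 function promotes neoplasm growth",
    "P53 status predicts response in breast cancer",
]


def group_by_surface_term(docs, terms):
    """Count documents per literal term, ignoring any background knowledge."""
    counts = Counter()
    for doc in docs:
        words = set(doc.lower().split())
        for term in terms:
            if term in words:
                counts[term] += 1
    return counts


def group_by_concept(docs, term_to_concept):
    """Count documents per ontology concept, merging synonyms."""
    counts = Counter()
    for doc in docs:
        words = doc.lower().split()
        concepts = {term_to_concept[w] for w in words if w in term_to_concept}
        counts.update(concepts)
    return counts


print(group_by_surface_term(DOCS, ["neoplasm", "cancer"]))  # two separate groups
print(group_by_concept(DOCS, TERM_TO_CONCEPT))              # one merged group
```

Without the synonym table the texts fall into two groups; with it, all three land under the single concept.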
The following example also shows the power of knowledge-based searches. Assume thousands of texts mention start-ups in Dresden, Berlin, Hamburg, Frankfurt etc. When searching for start-ups in Germany, one will find these articles only with the knowledge that the cities are German cities.
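The start-up example can be sketched in a few lines. This is an illustration under stated assumptions only: the city-to-country table, the sample headlines, and the helper names are hypothetical, and simple substring matching stands in for a real search index.

```python
# Hypothetical background knowledge: city -> country.
LOCATED_IN = {
    "Dresden": "Germany",
    "Berlin": "Germany",
    "Hamburg": "Germany",
    "Frankfurt": "Germany",
    "Graz": "Austria",
}

# Hypothetical sample headlines.
ARTICLES = [
    "A biotech start-up in Dresden closes its first funding round",
    "Berlin start-up launches a search product",
    "New start-up founded in Graz",
    "Hamburg harbour reports record traffic",
]


def expand_place(place, located_in):
    """Return the place itself plus every city known to lie inside it."""
    return {place} | {city for city, country in located_in.items() if country == place}


def search(articles, topic, place, located_in):
    """Find articles that mention the topic and any place covered by the query."""
    places = expand_place(place, located_in)
    return [a for a in articles if topic in a and any(p in a for p in places)]


# With background knowledge: the Dresden and Berlin articles are found.
print(search(ARTICLES, "start-up", "Germany", LOCATED_IN))
# Without it: no article mentions "Germany" literally, so nothing is found.
print(search(ARTICLES, "start-up", "Germany", {}))
```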
For our customers the possibility of being able to freely model certain domains is a great advantage. Scientists at Unilever want to classify their documents and search results exactly the way they want. So, for each research project, a tailored search machine can be built showing exactly what is relevant.
Of course the question remains, where do the ontologies come from? My answer is twofold: first, many ontologies already exist which can be used and, if necessary, adapted. Secondly – and here the circle closes – good semi-automated methods exist today for creating domain-specific and context-sensitive ontologies through clustering. These generated “ontology suggestions” can be adapted to specific needs. We believe that creating an ontology is a wise investment. Not having to search in the real sense is a great advantage. An intelligent assistant, so to speak.
Search machines turn into finding machines and finally question answering systems. How long before we see an intelligent search machine able to answer questions given in natural language?
Our vision statement “Towards answering questions” shows that we are confident in aiming for a reachable goal. The spectrum of answering questions is wide: from “What’s the phone number of …?” to “How many investments took place in biotechnology in 2006?” to “What is the meaning of life?”. One doesn’t have to start with the last question. Our aim is to make answering questions more comfortable than it is today. The user should not have to be confronted with long lists of text snippets. The dialog should go far beyond the follow-up prompt “Did you mean…”. The following example is conceivable and soon to be realized in our search machines: “For the query for [the protein] Rab5, 12,345 articles were found. Rab5 has 30 synonyms for which 67,890 articles were found. The top authors in the context Rab5 are X, Y, and Z and most publications are found between 19XX and 200X.” Or, in the typical case of getting zero results for five query terms, which term should be dropped? Should the user test all combinations? No, an intelligent assistant should do this work for the user. Above all it is helpful to bring the search space home to the user!
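The "which term should be dropped?" idea can be sketched as a simple query relaxation. This is a hypothetical illustration, not the method used in GoPubMed: documents are modelled as plain sets of terms, the sample data and the names `count_hits` and `relax_query` are invented, and the brute-force search over subsets is only practical for short queries.

```python
from itertools import combinations

# Hypothetical documents, each reduced to its set of index terms.
DOCS = [
    {"rab5", "endosome", "fusion", "gtpase"},
    {"rab5", "endosome", "transport"},
    {"p53", "apoptosis", "cancer"},
]


def count_hits(terms, docs):
    """Number of documents that contain every query term."""
    return sum(1 for doc in docs if set(terms) <= doc)


def relax_query(terms, docs):
    """Try the full query first, then ever smaller subsets of it.

    Returns the largest subsets that produce at least one hit,
    each paired with its hit count.
    """
    for size in range(len(terms), 0, -1):
        hits = [
            (subset, count_hits(subset, docs))
            for subset in combinations(terms, size)
        ]
        hits = [(subset, n) for subset, n in hits if n > 0]
        if hits:
            return hits
    return []


query = ["rab5", "endosome", "fusion", "transport", "p53"]
for subset, n in relax_query(query, DOCS):
    print(f"{n} hit(s) for: {' '.join(subset)}")
```

For the five-term query above, no document matches all five or any four terms, so the sketch proposes the two three-term queries that do return results instead of leaving the user with an empty list.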
SemWeb technologies and standards as proposed by the W3C are not obligatory to realise intelligent search machines. However, RDF is recognised as a “Lingua Franca” for information integration. How are you addressing this area?
I believe the situation is like the times when [in Germany] no “standard time” existed. Only with the introduction of the railways did a standard time become absolutely necessary. The web brings everything together – now we are confronted with the challenge of unifying available data. RDF is a good start and we are monitoring the initiative. On the customer side I’d say enthusiasm is restrained, because the focus is more on solving concrete search problems. I think, however, that the RDF framework has a good chance of becoming the standard. We use OWL as an interchange format.
The semantic web has to and will come about – one way or another. The added value for a user who could, for example, instruct an agent to find all rental car offers for a weekend drive in a mid-class convertible, and get back structured results, would be gigantic. My personal wish would be an agent that collects all scientific TV shows every day and presents them in an info box – of course well-classified with an ontology.
Your company is positioned as a provider of search technologies for the life sciences. Recently you launched MeshPubMed and GoPubMed as intelligent search machines for the life sciences. What is the user benefit, and what other services result from it?
This example shows the benefit best: assume you are looking for documents containing information about P53 and disease. In MeSH over 4,000 diseases are hierarchically listed under the term disease. With the query term P53, and one mouse click in our system, you have the answer to your question. Try doing this with traditional search engines!
Or try to navigate the ontology – in this case MeSH – and observe the over 45,000 articles that you get with traditional search engines in the context of, for example, disease, cardiovascular disease and heart disease. With three mouse clicks you narrow down the 45,000 results to approximately 45 relevant to this context. Try finding these articles with traditional search engines. You’ll observe that you a) need much more time (at least one hour) and b) need extremely strong expertise in molecular biology and medicine. With our technology, questions can be answered much more easily than with today’s machines.
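Narrowing results by clicking down a hierarchy can be sketched as follows. The fragment of the hierarchy, the article annotations, and the function names are hypothetical and purely illustrative; real MeSH is far larger, and this sketch does not claim to reflect how MeshPubMed is implemented.

```python
# Hypothetical fragment of a disease hierarchy: child -> parent.
PARENT = {
    "cardiovascular disease": "disease",
    "heart disease": "cardiovascular disease",
    "neoplasm": "disease",
}

# Hypothetical annotations: article id -> most specific concept assigned to it.
ANNOTATIONS = {
    1: "heart disease",
    2: "neoplasm",
    3: "cardiovascular disease",
    4: "heart disease",
    5: "neoplasm",
}


def ancestors_and_self(concept, parent):
    """Walk from a concept up to the root, collecting every node on the way."""
    chain = [concept]
    while concept in parent:
        concept = parent[concept]
        chain.append(concept)
    return chain


def articles_under(node, annotations, parent):
    """Articles annotated with the node itself or with any of its descendants."""
    return sorted(
        doc
        for doc, concept in annotations.items()
        if node in ancestors_and_self(concept, parent)
    )


# Each "click" moves one level down the tree and shrinks the result list.
for node in ["disease", "cardiovascular disease", "heart disease"]:
    print(node, articles_under(node, ANNOTATIONS, PARENT))
```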
You were an exhibitor at SEMANTIC 2006. Will you join SEMANTIC 2007 in Graz?
Of course we’ll be joining Semantic 2007 in Graz!







