
Our customers approach us when they realize that traditional search technologies are no longer sufficient. Companies are drowning in data, and knowledge management solutions are often described as unclear and complicated.
We offer a solution which intelligently sorts search results into a “Table of Contents” – an ontology. We do this in a fully automated way, extremely fast and with very high accuracy. As a result the user saves time and data are sorted, all with no additional effort for the customer.
We win customers over through simplicity and usability, coupled with competence in the subject matter. Browsing search results in a simple tree creates confidence and transparency. With competitor systems, end users have difficulty judging whether the search results of one engine are better than those of another. Imagine one provider claims to produce better results than another; this is very difficult to confirm. Who can, and is willing to, compare two long lists of results and judge the order of one list against the other? In our solution, transparency is verifiable.
In the life sciences it is also very important to know your customers and to be able to tailor solutions scientifically. Unilever has chosen us not only because we can deal with huge amounts of text and large ontologies, but also because we genuinely understand their problems.
For me there are generally two different approaches. The first is “on-the-fly clustering”, where search results are scanned for patterns after retrieval and presented to the user as categories or even hierarchies. The second is to classify documents into a pre-existing knowledge network such as an ontology. I believe in the second way! In my view it is not possible to derive a structure for search results from the results alone. For example: thousands of texts speak about neoplasm and thousands about cancer in the context of the tumour suppressor protein P53. Clustering would likely find two clusters: “neoplasm” and “cancer.” But the point is that biologists know both concepts are synonyms. In an ontology this fact is, of course, expressed.
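The difference can be illustrated with a minimal Python sketch. The synonym table below is a hypothetical toy ontology made up for this illustration, not our actual data or implementation; it only shows how texts that use different words end up under the same concept.

```python
from collections import defaultdict

# Hypothetical mini-ontology: each surface term maps to one canonical concept.
SYNONYMS = {
    "neoplasm": "Neoplasms",
    "cancer": "Neoplasms",
    "p53": "Tumor Suppressor Protein p53",
}

def classify(documents):
    """Group document ids by the ontology concepts their words map to."""
    by_concept = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            concept = SYNONYMS.get(word.strip(".,;:()"))
            if concept:
                by_concept[concept].add(doc_id)
    return dict(by_concept)

docs = {
    1: "P53 mutations in neoplasm tissue",
    2: "Role of P53 in cancer progression",
}
result = classify(docs)
```

Both texts land under the single concept “Neoplasms”, whereas a purely word-based clustering would have no reason to put them together.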
The following example also shows the power of knowledge-based searches. Assume thousands of texts mention start-ups in Dresden, Berlin, Hamburg, Frankfurt etc. When searching for start-ups in Germany, one will find these articles only with the knowledge that the cities are German cities.
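A small Python sketch of this idea, assuming a hypothetical `LOCATED_IN` table as the background knowledge (the table, the function names and the sample texts are invented for illustration only):

```python
# Hypothetical geographic knowledge: city -> country.
LOCATED_IN = {
    "Dresden": "Germany",
    "Berlin": "Germany",
    "Hamburg": "Germany",
    "Frankfurt": "Germany",
    "Graz": "Austria",
}

def expand_place(place):
    """Return the place itself plus every city known to lie within it."""
    return {place} | {city for city, country in LOCATED_IN.items() if country == place}

def search(documents, topic, place):
    """Find documents mentioning the topic and any place covered by `place`."""
    places = expand_place(place)
    return [
        doc_id
        for doc_id, text in documents.items()
        if topic in text and any(p in text for p in places)
    ]

docs = {
    1: "A new start-up opens in Dresden",
    2: "Berlin start-up raises funding",
    3: "Graz start-up wins award",
    4: "Hamburg harbour festival",
}
hits = search(docs, "start-up", "Germany")
```

None of the sample texts contains the word “Germany”; documents 1 and 2 are found only because the query is expanded with the cities known to be German.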
For our customers the ability to model certain domains freely is a great advantage. Scientists at Unilever want to classify their documents and search results exactly the way they choose. So, for each research project, a tailored search engine can be built showing exactly what is relevant.
Of course the question remains: where do the ontologies come from? My answer is twofold. First, many ontologies already exist which can be used and, if necessary, adapted. Secondly – and here the circle closes – good semi-automated methods exist today for creating domain-specific and context-sensitive ontologies through clustering. These generated “ontology suggestions” can be adapted to specific needs. We believe that creating an ontology is a wise investment. Not having to search in the real sense is a great advantage: the system becomes an intelligent assistant, so to speak.
Our vision statement “Towards answering questions” shows that we are confident in aiming for a reachable goal. The spectrum of answering questions is wide: from “What’s the phone number of …?” to “How many investments took place in biotechnology in 2006?” to “What is the meaning of life?”. One doesn’t have to start with the last question. Our aim is to make answering questions more comfortable than it is today. The user need not be confronted with long lists of text snippets. The dialog should go far beyond the familiar “Did you mean…” prompt. The following example is conceivable and soon to be realized in our search engines: “For the query for [the protein] Rab5, 12,345 articles were found. Rab5 has 30 synonyms for which 67,890 articles were found. The top authors in the context of Rab5 are X, Y, and Z and most publications are found between 19XX and 200X.” Or, in the typical case of getting zero results for five query terms, which term should be dropped? Should the user test all combinations? No, an intelligent assistant should do this work for the user. Above all, it is helpful to make the search space clear to the user!
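The zero-results case can be sketched in a few lines of Python. This is only one possible approach under simple assumptions (documents as sets of terms, a hypothetical `relax_query` helper), not a description of how our engine does it: the assistant tries the full query first and then drops as few terms as possible until something matches.

```python
from itertools import combinations

def relax_query(terms, documents):
    """Return the largest subsets of `terms` that still match some document.

    Tries the full query first, then drops one term, then two, and so on.
    """
    for size in range(len(terms), 0, -1):
        matches = []
        for subset in combinations(terms, size):
            hits = [doc_id for doc_id, words in documents.items()
                    if set(subset) <= words]
            if hits:
                matches.append((subset, hits))
        if matches:
            return matches
    return []

# Hypothetical toy index: document id -> set of index terms.
docs = {
    1: {"rab5", "endosome", "fusion", "gtpase"},
    2: {"rab5", "endosome", "fusion", "kinase"},
    3: {"p53", "apoptosis"},
}
suggestions = relax_query(["rab5", "endosome", "fusion", "gtpase", "kinase"], docs)
```

Here the five-term query finds nothing, and the assistant can report that dropping either “kinase” or “gtpase” yields a hit. Testing every subset grows exponentially with the number of terms, so this brute-force version is only reasonable for short queries.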
I believe the situation resembles the era when [in Germany] no uniform time existed. Only with the arrival of the railways did a uniform time become an unavoidable necessity. The web brings everything together – now we are confronted with the challenge of unifying available data. RDF is a good start and we are monitoring the initiative. On the customer side I’d say enthusiasm is restrained, because the focus is more on solving concrete search problems. I think, however, that RDF has a good chance of becoming the standard. We use OWL as an interchange format.
The semantic web has to and will come about – one way or another. The added value for a user who could, for example, instruct an agent to find all rental car offers for a weekend drive in a mid-class convertible, and get back structured results, would be gigantic. My personal wish would be an agent that collects all scientific TV shows every day and presents them in an info box – well classified with an ontology, of course.
This example shows the benefit best: assume you are looking for documents containing information about P53 and disease. In MeSH over 4,000 diseases are hierarchically listed under the term disease. With the query term P53, and one mouse click in our system, you have the answer to your question. Try doing this with traditional search engines!
Or try to navigate the ontology – in this case MeSH – and look at the more than 45,000 articles that you get with traditional search engines in the context of, for example, disease, cardiovascular disease and heart disease. With three mouse clicks you narrow down the 45,000 results to approximately 45 relevant to this context. Try finding these articles with traditional search engines. You’ll find that you need a) much more time (at least one hour) and b) extremely strong expertise in molecular biology and medicine. With our technology, questions can be answered much more easily than with today’s search engines.
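The click-by-click narrowing can be pictured with a short Python sketch. The three-level hierarchy and the document annotations below are hypothetical stand-ins for MeSH and for real search results; each “click” simply restricts the results to one subtree.

```python
# Hypothetical fragment of a disease hierarchy: child -> parent.
PARENT = {
    "cardiovascular disease": "disease",
    "heart disease": "cardiovascular disease",
    "neoplasm": "disease",
}

def is_under(concept, ancestor):
    """True if `concept` equals `ancestor` or lies below it in the tree."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = PARENT.get(concept)
    return False

def narrow(annotations, node):
    """Keep the documents annotated with `node` or any of its descendants."""
    return sorted(doc for doc, concept in annotations.items() if is_under(concept, node))

# Hypothetical annotations: document id -> most specific concept assigned.
annotations = {
    1: "heart disease",
    2: "neoplasm",
    3: "cardiovascular disease",
    4: "heart disease",
}
step1 = narrow(annotations, "disease")
step2 = narrow(annotations, "cardiovascular disease")
step3 = narrow(annotations, "heart disease")
```

Each step returns a subset of the previous one, which is the effect the user sees when clicking down the tree.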
Of course we’ll be joining Semantic 2007 in Graz!