For the future, Milward says, "We're always trying to improve quality, get more precise answers to people's questions, and rule out noise in the results." Linguamatics is also working on improved integration, so that structured and unstructured data can be mined together, and on scalability.
"There are 20 million relevant articles in the biological domain," says Milward. "And if you're going into social media, for example, there are one billion tweets a week. It's huge amounts of information and what we're trying to do typically is pull out bits of information from that." Some of it may already be known, but more valuable is "creating new knowledge by combining documents from different sources".
In a business context, that might mean finding companies A and B linked in one document, and B and C in another. "In the science of genes and diseases, you link across from a particular compound to a particular disease through a particular gene you didn't know about before. So we're pulling together known bits of information but getting out a new hypothesis not known in the literature before."
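The A-to-B, B-to-C linking described above is the pattern often called the "ABC model" of literature-based discovery. The sketch below is a toy illustration of that general idea, not Linguamatics' method: it assumes entities have already been extracted from each document into a set, and the names `compound_X`, `gene_G` and `disease_D` (like the helper functions) are hypothetical. It reports bridges where A co-occurs with B in one document and B with C in another, but A and C never appear together.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_graph(documents):
    """Map each entity to the set of entities it shares a document with."""
    graph = defaultdict(set)
    for entities in documents:
        for a, b in combinations(set(entities), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def hypotheses(documents, start):
    """Return (start, bridge, target) triples where start-bridge and
    bridge-target each co-occur somewhere, but start-target never do."""
    graph = cooccurrence_graph(documents)
    direct = graph[start]  # entities already linked to `start` in the literature
    results = []
    for bridge in sorted(direct):
        for target in sorted(graph[bridge]):
            # Skip the starting entity itself and anything already known to link to it.
            if target != start and target not in direct:
                results.append((start, bridge, target))
    return results


# Hypothetical extracted entities: one set per document.
docs = [
    {"compound_X", "gene_G"},   # document 1 links the compound to a gene
    {"gene_G", "disease_D"},    # document 2 links the same gene to a disease
]
print(hypotheses(docs, "compound_X"))
# [('compound_X', 'gene_G', 'disease_D')]
```

A real system would weight and rank such candidates rather than listing every indirect path, since the number of bridges grows quickly with corpus size.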
But the big challenges facing the company are speed and accessibility – providing a graphical interface to help users create queries the computer can understand.
"You don't ask a question in English because the machine would have to interpret the questions as well as interpreting the text," he says. "In the past you had to be a programmer, a linguist, and a domain expert." Instead, "We wanted to have a toolbox where you can match different techniques."
A user might, for example, search rather generally for a particular disease at the document level but want to match that with a precise drug dosage pattern within those documents. In the long term, understanding the chain of reasoning within a document, and its context, will also be an important challenge. "We need to have that sort of contextual information because people write papers based on common understanding and common knowledge. That's a very hard problem to tackle."
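To make the mix of coarse and precise matching concrete, here is a minimal sketch under stated assumptions; it does not reflect Linguamatics' actual query language. A plain substring test stands in for the broad document-level disease search, and a regular expression stands in for the precise sentence-level dosage pattern. The `search` function, the `DOSAGE` pattern, its unit list and the sample documents are all hypothetical.

```python
import re

# Hypothetical dosage pattern: a number followed by mg/kg, mg or g.
# "mg/kg" is listed first so the longer unit wins over plain "mg".
DOSAGE = re.compile(r"\b(\d+(?:\.\d+)?)\s*(mg/kg|mg|g)\b", re.IGNORECASE)


def search(documents, disease):
    """Return (doc_id, sentence, dose, unit) for documents that mention
    `disease` anywhere, restricted to sentences with a dosage pattern."""
    hits = []
    for doc_id, text in documents.items():
        # Coarse, document-level filter: does the disease term appear at all?
        if disease.lower() not in text.lower():
            continue
        # Fine-grained, sentence-level match within the surviving documents.
        for sentence in re.split(r"(?<=[.!?])\s+", text):
            for m in DOSAGE.finditer(sentence):
                hits.append((doc_id, sentence, float(m.group(1)), m.group(2).lower()))
    return hits


docs = {
    "doc1": "Patients with asthma were enrolled. Each received 2.5 mg of drug Y daily.",
    "doc2": "A 10 mg dose of drug Z was tested in healthy volunteers.",
}
for hit in search(docs, "asthma"):
    print(hit)
```

Only `doc1` survives the broad filter, and within it only the second sentence matches the dosage pattern; `doc2` has a dose but never mentions the disease.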
None of that is what Google does. As web searching becomes more mainstream, search engines are increasingly fine-tuning their algorithms and interfaces to speed up the most common searches for information that's already known.
"For businesses," says Milward, "often the issue is trying to find new information that's less well-known and get new discoveries." The key thing, he says, "is that you can tailor what you get back according to what you need. And that you can pick up weak signals across a large amount of data – things you wouldn't have noticed in any one document."
Definition of officer: one who holds an office of authority or trust in an organization, such as a corporation or government.
Talking of language, who came up with the term CTO? Chief Technology 'Officer'? And that's for people not in the army, so why not Chief Technology Manager? What kind of nonsense is using 'officer'? Or was it originally some joke on the people working in offices, a quip on the word 'officer'?