Our technology discovers new insights from data, writes them up in perfect English, and visualizes them. All automated. Let's see how this technology fits into the broader picture.
One focus of data mining and big data has been making predictions, often to suggest an action. For example, is this customer likely to leave us in the near future? Is the airfare for my route likely to rise or fall? Another focus is on finding general patterns or correlations: young men who buy diapers in a supermarket on a Friday afternoon often also buy beer. But much human knowledge consists of factual statements about specific entities: There are 50 U.S. states. Mount Kilimanjaro is 5,895 meters high. The UK is a constitutional monarchy. Such knowledge doesn't need data mining. You just need to observe, record, and be able to retrieve the facts.
On the other hand, let’s consider these:
MIT has the most members of the Institute of Medicine of any university without a medical school.
If it were a country, California’s economy would be the tenth biggest in the world.
KIF26B is the only gene that both contains one of the top 20 accelerated exons and has highly significant acceleration in dN/dS.
Or, let’s say some Senator is the only member of his party up for re-election who voted for some bill.
Coming up with insights like these requires reasoning, whether by people or by software. Also, since data attributes are not just numbers (age, income, etc.) but also symbolic values (birthplace, profession, marital status, state) and even sets of values (languages spoken, colleges attended), factual-knowledge discovery must handle this diversity of attribute types.
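The senator example above hints at one shape such reasoning can take: finding the entity that is unique under some combination of attribute values. As a rough illustration only (the data, column names, and approach here are invented for this sketch and are not OnlyBoth's actual method), one could enumerate attribute combinations over a small table and keep the groups of size one:

```python
# Hypothetical sketch: discover "X is the only ... " facts in a small table
# by grouping rows on combinations of attributes. All data is invented.
from itertools import combinations

rows = [
    {"name": "Smith", "party": "R", "up_for_reelection": True,  "voted_yes": True},
    {"name": "Jones", "party": "R", "up_for_reelection": False, "voted_yes": True},
    {"name": "Lee",   "party": "R", "up_for_reelection": True,  "voted_yes": False},
    {"name": "Diaz",  "party": "D", "up_for_reelection": True,  "voted_yes": True},
]

def only_insights(rows, attrs, max_arity=2):
    """Return (entity, conditions) pairs where exactly one row
    satisfies all the conditions -- a candidate 'only' insight."""
    insights = []
    for arity in range(1, max_arity + 1):
        for combo in combinations(attrs, arity):
            # Group rows by their values on this combination of attributes.
            groups = {}
            for r in rows:
                key = tuple(r[a] for a in combo)
                groups.setdefault(key, []).append(r)
            # A group of size one is a uniqueness fact about its sole member.
            for key, members in groups.items():
                if len(members) == 1:
                    insights.append((members[0]["name"], dict(zip(combo, key))))
    return insights

attrs = ["party", "up_for_reelection", "voted_yes"]
for name, conds in only_insights(rows, attrs, max_arity=3):
    print(f"{name} is the only one with {conds}")
```

With `max_arity=3`, this surfaces the senator-style fact: Smith is the only Republican up for re-election who voted yes. A real system would of course also have to rank and filter such candidates for interestingness, since uniqueness alone becomes trivial as more attributes are combined.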
How about automated writing? Much less has been done there than in data mining, or even in automated reading, for which IBM's Watson project achieved fame with its triumph on the TV quiz show Jeopardy! Companies like Automated Insights and Narrative Science have been producing well-written stories that summarize recurring events like earnings reports and sports outcomes.
So there is a double technology gap of (1) factual knowledge discovery and (2) literate expression of ideas.
First conceived and developed at Carnegie Mellon in 1998 under a National Science Foundation grant, OnlyBoth’s technology aims to fill both gaps within the context of interesting and valuable applications on data tables, also known as structured data.
OnlyBoth can be seen as doing the reverse of IBM’s Watson and other ambitious text-mining projects, which read articles and compile databases of relations and properties. Instead, OnlyBoth goes from relations and properties to novel insights and then on to human-quality paragraphs.
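To make the last step of that pipeline concrete, here is a deliberately simplified sketch of turning a discovered uniqueness fact into an English sentence via templates. The phrasing table and function are invented for illustration; they are not OnlyBoth's actual text-generation machinery:

```python
# Hypothetical sketch: render a discovered "only" insight as an English
# sentence using a hand-written phrase table. Wording is invented.

PHRASES = {
    ("party", "R"): "is a Republican",
    ("party", "D"): "is a Democrat",
    ("up_for_reelection", True): "is up for re-election",
    ("voted_yes", True): "voted for the bill",
}

def render(entity, conditions):
    """Turn a (entity, conditions) insight into a readable sentence."""
    parts = [PHRASES[(attr, value)] for attr, value in conditions.items()]
    return f"{entity} is the only senator who {' and '.join(parts)}."

print(render("Smith", {"party": "R", "up_for_reelection": True, "voted_yes": True}))
```

Producing genuinely human-quality paragraphs is far harder than this, of course: it requires aggregation, pronoun choice, varied sentence structure, and grammatical agreement, which is precisely the second gap the text describes.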