- What does your core technology actually do?
First, it discovers insights in the data that express how individuals, or pairs of individuals, are unique, exceptional, surprising, or different. Then, it writes up the insights in perfect English. Then, it visualizes the insights, depending in each case on the form of the insight.
- Did a computer really write all the insights seen on the applications?
Yes, although people programmed the computer. We people also had teachers who taught us vocabulary, composition, and grammar, but our writings are ours, not our teachers’.
- Is this hard to do?
It’s hard but not impossible. To do it, it’s good to bone up on (1) epistemology - What knowledge about individuals do people express, appreciate, and understand? (2) linguistics - What are alternate ways to accurately express knowledge, and which are best? and (3) computer science - What algorithms, heuristics, and architecture will do it all in software?
- What else can this be applied to besides colleges, hospitals, and baseball?
Good candidates are any structured data where people care about individual entries - versus caring only about general patterns - and care about what makes individuals special, unique, distinctive, surprising, unusual, flawed, etc. When the technology originated as a Carnegie Mellon research project, we developed prototypes for baseball players and teams, human genes, members of Congress, and world languages.
- New technologies often make unforeseen connections. Thoughts?
This technology makes plain, in excellent English, what is hidden in masses of data. When applied to usage or performance data, it promotes self-improvement by revealing exceptional outcomes, whether good, bad, ambivalent, or neutral, in the context of significant peer groups. When applied to public data, such as data on hospitals or universities, it also encourages transparency in areas of great public interest. All of this justifies our vision of Universal Betterment through Automated Benchmarking.
- I have all this data I’d like to explore for insights. Can you do something with it?
Maybe. Write to us.
- Can I use your data or insights in a commercial project?
Please refer to our Terms of Service.
- What are the References?
This section explains where the data comes from, or how we created the data ourselves, or sometimes the interpretations needed to assign an attribute value to an entry. For example, colleges historically have arisen, disappeared, merged, changed names, etc., so sometimes interpretation is needed. Also, for baseball we needed a single overall measure of a player's achievement or team's success during a season, so we defined one ourselves.
- What happens when I choose to Select Attributes? What does that really do?
For the chosen entry, it limits the insights for that entry to those that include one of the selected attributes.
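As a rough illustration of that filter, here is a minimal Python sketch. The `Insight` record, the `select_attributes` function, and the sample data are all hypothetical names and values invented for this example; they are not taken from our software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insight:
    entry: str             # the entry the insight is about
    text: str              # the written-up insight
    attributes: frozenset  # names of the attributes the insight mentions


def select_attributes(insights, entry, selected):
    """Keep the entry's insights that mention at least one selected attribute."""
    selected = set(selected)
    return [i for i in insights if i.entry == entry and i.attributes & selected]


# Made-up placeholder data, for illustration only.
insights = [
    Insight("College A", "insight about tuition and enrollment",
            frozenset({"tuition", "enrollment"})),
    Insight("College A", "insight about founding year",
            frozenset({"founding_year"})),
    Insight("College B", "insight about tuition",
            frozenset({"tuition"})),
]

kept = select_attributes(insights, "College A", ["tuition", "salary"])
print([i.text for i in kept])
```

An insight is kept if it mentions any one of the selected attributes, not necessarily all of them.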
Showcase Technology Demos
- Where does your colleges data come from?
Most of the data comes from the federal IPEDS data as collected by the National Center for Education Statistics. Other sources cover college rankings (both serious and fun), alumni who played NFL football, basketball Final Four appearances, property crimes, career salaries, members of the National Academy of Sciences, Rhodes scholars, alumni who became U.S. Presidents, and even the number of search results that match a query topic directed at a college’s website. If you strongly believe that an attribute value associated with a college is wrong, please refer to the data source, which is cited in the References section below each result.
- How often do you update the college application insights?
The federal government issues updated data every year or two, and we intend to follow that schedule.
- What’s the Surprising feature?
People employ slightly different logic in stating why something is unique, distinctive, unusual, or surprising. We have studied this logic, codified it, and designed algorithms that employ it in the course of devising facts about individuals.
- What’s the difference between Similar and Neighbors?
Peers are similar if they share a large set of characteristics. Neighbors are just peers that are geographically nearby.
- I noticed that college A lists college B on its Similar page; but college B doesn’t list college A on its Similar page. Why not?
There's no guarantee that top-10 similarity will always be symmetric, although it usually is.
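A toy Python sketch shows why a top-k list need not be symmetric. It assumes a single made-up numeric feature per entry and uses k = 1 instead of 10; the function name `top_k_similar` and the distance measure are assumptions for illustration, not a description of how our similarity is actually computed.

```python
def top_k_similar(features, name, k):
    """Return the k other entries whose feature value is closest to name's."""
    others = [n for n in features if n != name]
    others.sort(key=lambda n: abs(features[n] - features[name]))
    return others[:k]


# Made-up feature values: B is A's closest peer, but C is even closer to B.
features = {"A": 0.0, "B": 1.0, "C": 1.5}

print(top_k_similar(features, "A", 1))  # B appears on A's list
print(top_k_similar(features, "B", 1))  # A does not appear on B's list
```

B is the entry closest to A, but B has a closer peer of its own (C), so A is pushed off B's list. The same effect can occur with ten slots and many characteristics.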
- What am I really looking at when I use the Combine option?
You’re asking this: How are these two entries (e.g., colleges) jointly unique or special?
- Can I use the Compare option to compare all entries vs all other entries?
Right now we just offer comparisons to the top-10 or so most similar, and to other data-specific peers, such as neighboring colleges, athletic conference rivals, or baseball players who played the same position for the same franchise.
- What’s the Profile section?
The profile is a verbal description of much of an entry's data, also written by software.
- Some of the college pages show a Rivals option. Is that related to sports?
If the college is in a football athletic conference - as last reported by the National Center for Education Statistics - which has at least six members, then clicking on Rivals will allow comparisons against each of the conference rivals.
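The rule above can be sketched in a few lines of Python. The mapping `conference_of`, the function `rivals`, and the sample conferences are hypothetical, and the sketch assumes the six-member minimum counts the college itself.

```python
MIN_CONFERENCE_SIZE = 6  # "at least six members", assumed to include the college


def rivals(conference_of, college):
    """Return the college's conference rivals, or [] if the rule is not met."""
    conference = conference_of.get(college)
    if conference is None:
        return []
    members = [c for c, conf in conference_of.items() if conf == conference]
    if len(members) < MIN_CONFERENCE_SIZE:
        return []
    return [c for c in members if c != college]


# Made-up data: one six-member conference and one three-member conference.
conference_of = {f"Big {i}": "Big Conference" for i in range(1, 7)}
conference_of.update({f"Small {i}": "Small Conference" for i in range(1, 4)})

print(rivals(conference_of, "Big 1"))    # five rivals
print(rivals(conference_of, "Small 1"))  # conference too small: no Rivals option
```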
- In the baseball application, a player entry is a player/team/year triplet. Why did you formulate the entries this way?
In baseball, it's common to compare a player's season with that same player's other seasons, or a team's season with that same team's other seasons, and we wanted to enable this. Sometimes an insight will refer to an attribute that doesn't change, or doesn't change much, across seasons, e.g., a player's height, birthplace, and throwing arm, or whether the player is a future manager. In those cases involving static attributes, we take account of the fact that careers span multiple seasons while generating the outputs, including the comparison peer groups.
- How does the colleges application differ from the baseball application?
The same software analyzes the data and generates the web application, but the underlying data models are different. For colleges, there is no time dimension. For baseball teams, the same entity (i.e., team) performs differently across various years or seasons. The players data model is the most elaborate, since players also perform across various years, but also belong to a team, possibly even multiple teams in the same year.
- What if a player is traded mid-season, or even traded again to the same team?
A player entry is a player/team/year and a team entry is a team/year. If a player is traded to a second team, then the statistics for that team will constitute a separate entry. If the player is then traded back to the first team (as has actually occurred in baseball history), then the player's statistics during the same team/season are combined into one entry.
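One way to picture this is to key each entry by player/team/year and sum the stints that share a key. The Python sketch below is only an illustration under that assumption: `build_player_entries`, the stint tuples, and the numbers are made up, and plain summing is only appropriate for counting statistics (a rate such as a batting average would have to be recomputed from the combined counts).

```python
from collections import defaultdict


def build_player_entries(stints):
    """Combine stints that share a (player, team, year) key into one entry."""
    entries = defaultdict(lambda: defaultdict(int))
    for player, team, year, stats in stints:
        for stat, value in stats.items():
            entries[(player, team, year)][stat] += value
    return {key: dict(stats) for key, stats in entries.items()}


# Made-up stints: traded from Team 1 to Team 2, then back to Team 1.
stints = [
    ("Player X", "Team 1", 1990, {"games": 40, "hits": 35}),
    ("Player X", "Team 2", 1990, {"games": 50, "hits": 48}),
    ("Player X", "Team 1", 1990, {"games": 30, "hits": 22}),
]

entries = build_player_entries(stints)
for key, stats in entries.items():
    print(key, stats)
```

Three stints produce two entries: one for Team 2, and one for Team 1 that combines both stints there.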
- Where are you based?
Pittsburgh, Pennsylvania, USA.
- Where does this software run?
In the cloud. That is, on remote servers of standard capacity.
- Who are your investors, and are you looking for new ones?
We are self-funded and not looking for investors at this time.
- Where does your name come from?
The software occasionally generates statements of the form “Only A is both P and Q” or “Only A has both as many X and as many Y”. This latter formulation is relatively rare in human-written text, but is elegantly concise and precise, and so inspired our name. Furthermore, only OnlyBoth both discovers new insights in data and writes them up in perfect English.
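For readers curious what an “Only A is both P and Q” statement amounts to, here is a brute-force Python sketch over yes/no properties. It is a toy under stated assumptions, not our discovery algorithm: the function `only_both`, the property names, and the four sample entries are all invented for illustration.

```python
from itertools import combinations


def only_both(entries):
    """Find property pairs (P, Q) that exactly one entry holds together.

    entries maps an entry name to the set of yes/no properties it has.
    """
    props = sorted(set().union(*entries.values()))
    holders_of = {p: {n for n, s in entries.items() if p in s} for p in props}
    found = []
    for p, q in combinations(props, 2):
        # Skip pairs where one property alone already singles out an entry;
        # "Only A is P" would be the simpler statement in that case.
        if len(holders_of[p]) < 2 or len(holders_of[q]) < 2:
            continue
        both = holders_of[p] & holders_of[q]
        if len(both) == 1:
            (name,) = both
            found.append(f"Only {name} is both {p} and {q}.")
    return found


# Made-up entries and properties.
entries = {
    "A": {"public", "urban"},
    "B": {"public", "rural"},
    "C": {"private", "urban"},
    "D": {"private", "rural"},
}

for sentence in only_both(entries):
    print(sentence)
```

Checking every pair is quadratic in the number of properties, which is fine for a toy but is one reason real data calls for the algorithms and heuristics mentioned earlier.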