October 26, 2014

Analysts are mostly working on the “known unknowns”, while the “unknown unknowns” remain the “holy grail” of analytics.

The concept of “knowns and unknowns” is attributed to Donald Rumsfeld, who as US Secretary of Defense said in 2002:

“… there are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns -- the ones we don't know we don't know.” (see for instance Wikipedia)

Rumsfeld spoke in reference to counter-terrorism intelligence, but the concept is applicable in “civilian” analytics, research, and Business Intelligence as well.

An unscientific model of knowns and unknowns, and the efforts, methods and tools existing in each space:

Obviously there are a lot of “known knowns” out there, often as openly available information on the internet (news reports, blogs and websites, wikis and so forth). Inside corporations there’s usually an abundance of reports and statistics on various business parameters like revenue, turnover, logistics and more.

Organizations are capable of developing “blind spots” that might be considered “unknown knowns”. This is knowledge that is forgotten, displaced, overlooked or misunderstood/misinterpreted, and needs to be rediscovered or re-learned. Modern information systems, databases and intranets help build a “collective memory” for the organization. Findability is often key in keeping “unknown knowns” from becoming a problem.

In analytics and Business Intelligence, the focus so far has been on the “known unknowns”. That is, analysts, researchers and data scientists are attempting to find answers to specific challenges facing the organization (or society at large). There are many tools and methods available: search engines, data mining, text mining, NLP, sentiment analysis, and more. Analysis of “known unknowns” can, for instance, find valuable information on customers and markets, identify causes of revenue changes, or map the spread of viral diseases. By using historical data and predictive modelling, analysts can forecast future developments and enable businesses to stay ahead of the game.

The “holy grail” of analytics, though, is to find the “unknown unknowns”: identify patterns in data that we didn’t set out looking for; get early warning of events we don’t know will occur; discover relations between people, places or entities we’re not even aware of yet.

In the era of Big Data, the quest for the “unknown unknowns” is gathering pace. Endless amounts of data, vast computing resources, and increasingly sophisticated algorithms make it possible to uncover hidden patterns and reveal unseen relationships.

Machine Learning (ML) and Artificial Intelligence (AI) are being developed as the primary technologies for dealing with the “unknown unknowns”. Open source software, standards for data formats and integration, cloud storage and cloud computing are enablers in this process.

Still, the human analyst has an important role to play in identifying “unknown unknowns”, determining their importance, and alerting decision makers. Employing Agile analytics methods, with the ability to “pivot” when the data points you in a new direction, will be necessary.

By combining top-notch analytical capabilities with smart tools and Big Data, the “holy grail” of analytics may be within reach.
