Until recently, the validity of results from statistical modeling rested on whether the model’s underlying assumptions were met and whether the sample accurately represented the whole population. Cost, limited computing power and slow analysis made it difficult or impossible to record, aggregate and analyze very large data sets. Sampling, theory and assumptions were necessary, but their accuracy was difficult to verify.
The recent advent of Big Data tools, which combine high-powered computing with software to access, integrate and analyze data and report the results, means that we are no longer restricted to sampling, theory and assumption. These new tools allow us to move beyond those assumptions and analyze entire data sets of enormous size.
New Tools Let Us Ask New Questions
Recent advances in technology not only give us the computing power to analyze these Big Data sets in ways that are cost-efficient and timely, they also greatly expand our access to data. The new integration applications can reach out to the cloud to gather data beyond our organizations, integrate that data in ways that make it useful, and cache the data in its analysis-ready state. New analytic tools allow us to look at the data in new ways and do it quickly and cheaply (relatively speaking). Combine all that with new, user-friendly reporting mechanisms, and you get insights that are actionable.
The real value of Big Data in the life sciences is that it gives us the confidence to act on our results. We no longer analyze samples and hope our results are true for the total population. We can be far more confident that the results apply to the entire population, because our data sets are so large that the sampling errors inherent in small data sets largely disappear. The data are our model. Access to large data sets also allows us to verify the accuracy of our sampling techniques and the statistical models we use.
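As a rough illustration of how a complete data set can be used to check a sampling technique, the sketch below compares estimates from random samples of different sizes with the value computed from all of the records. The data are synthetic, and the function name `sampling_error` is an assumption made for this example, not part of any particular analytics product.

```python
import random
import statistics


def sampling_error(population, sample_size, seed=0):
    """Return (full_mean, sample_mean, absolute_error) for one random sample."""
    rng = random.Random(seed)
    sample = rng.sample(population, sample_size)
    full_mean = statistics.fmean(population)
    sample_mean = statistics.fmean(sample)
    return full_mean, sample_mean, abs(full_mean - sample_mean)


# Synthetic "full population": 100,000 made-up lab values (not real data).
data_rng = random.Random(42)
population = [data_rng.gauss(100.0, 15.0) for _ in range(100_000)]

# Compare samples of increasing size against the complete data set.
for n in (50, 500, 5_000):
    full, estimate, error = sampling_error(population, n, seed=n)
    print(f"n={n:>5}: full mean={full:.2f}, sample mean={estimate:.2f}, error={error:.2f}")
```

With the full data set in hand, the gap between a sample estimate and the true value can be measured directly rather than assumed. Note that this only checks random sampling error; it says nothing about records that never made it into the data set in the first place.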
Data Sets Continue to Grow
To make full use of these new tools, you need access to large data sets that encompass many variables. Fortunately, the variety of data now being recorded is unlike anything we’ve seen before. Electronic medical records, wearable devices, social media, web clicks and a host of other sources allow us to look at variables that were once inaccessible. This allows us to ask new questions and look at old problems in new ways. The unexplained variability that we had to accept in the past can often be accounted for when these new variables are added to the data. And we now have the ability to test interactions between multiple variables quickly, efficiently and at low cost, expanding our knowledge and understanding.
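To make the point about unexplained variability concrete, here is a minimal sketch using made-up data. It measures how much of the variation in an outcome is explained by a treatment flag alone, and then again once a second, previously unrecorded variable (a hypothetical activity level from a wearable device) is added. The variable names, effect sizes and the helper `r_squared` are all assumptions for illustration.

```python
import random
import statistics


def r_squared(outcomes, groups):
    """Share of outcome variance explained by group membership (0 to 1)."""
    grand_mean = statistics.fmean(outcomes)
    total_ss = sum((y - grand_mean) ** 2 for y in outcomes)

    # Collect the outcomes that fall into each group.
    by_group = {}
    for y, g in zip(outcomes, groups):
        by_group.setdefault(g, []).append(y)

    # Variation left over after each group is described by its own mean.
    residual_ss = 0.0
    for values in by_group.values():
        group_mean = statistics.fmean(values)
        residual_ss += sum((y - group_mean) ** 2 for y in values)

    return 1.0 - residual_ss / total_ss


# Synthetic patients: the outcome depends on treatment AND activity level.
rng = random.Random(7)
treatment, activity, outcome = [], [], []
for _ in range(10_000):
    t = rng.choice([0, 1])
    a = rng.choice(["low", "high"])
    y = 50.0 + 5.0 * t + (10.0 if a == "high" else 0.0) + rng.gauss(0.0, 5.0)
    treatment.append(t)
    activity.append(a)
    outcome.append(y)

# Old view: treatment only. New view: treatment combined with activity.
r2_old = r_squared(outcome, treatment)
r2_new = r_squared(outcome, list(zip(treatment, activity)))
print(f"explained by treatment alone:        {r2_old:.2f}")
print(f"explained by treatment and activity: {r2_new:.2f}")
```

Grouping by every combination of the two variables also captures any interaction between them, which is why the second figure can never be lower than the first. In real data the gain depends entirely on whether the added variable actually matters.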
Speed Turns Long Journeys into Day Trips
Beyond adding confidence to our answers, the new analytic power makes possible new journeys, especially in the area of genomics. The size of the data in a human genome is so immense that until a few years ago, analyzing even one genome could take months or even years. With the computing power and analytic techniques currently available, analysis of a tumor cell genome can be done in a matter of hours. The implications for drug development and for matching drugs to individual patients are enormous.
Even with the New Tools, You Still Have to Ask the Right Questions
A widely accepted recipe for effective use of analytics states: “Right question + large data set (Big Data) + simple model = actionable results.” Note that the first ingredient is the right question. If you ask questions that can’t be answered with the data at hand, or if the answers rely on variables that aren’t included in the analysis, the results will be meaningless. That’s why you need collaboration between the business and clinical leaders, who know what problems need to be solved, and data scientists, who can help access and integrate the right data to answer the question.
Though not all IT departments have data scientists on staff, most life sciences organizations have data scientists in their research divisions. Use those experts to help craft analytics projects in non-medical areas, such as operations, finance, manufacturing and marketing, at least at the outset. Over the long term, CIOs should plan on adding data science skills to their staff, to help business and clinical leaders formulate questions appropriately and access the data needed.
A good consultant can also help you assess opportunities for data mining and analysis and help you select a software suite that covers all three major areas of concern: data access, integration and management; analytics; and reporting.
CIOs Should Lead
While there are many point solutions in the analytics market that can help answer questions in a limited realm, these can be more trouble than they are worth, because they tend to create and reinforce data silos. That’s the opposite of what you want. The future of analytics lies in the ability to use a wide scope of data for better insights, and data silos are anathema to that future.
However, if the CIO actively leads on analytics, partnering with the individual stakeholders and establishing a comprehensive data strategy, the organization can avoid the proliferation of these point solutions. Rational data governance and powerful, scalable and flexible analytics tools will build a program that will give you actionable insights. Those insights can help you be more competitive and more efficient, and, on a broader level, help you solve the biggest healthcare problem of all: how to achieve better outcomes at lower cost.