Multivariate vs Univariate Analysis in the Pharma Industry: Analyzing Complex Data

Data Analytics
Jun 18, 2019  |  4 min read

The pharmaceutical industry, from R&D and manufacturing through to product sales and use, generates a great deal of data. The question is how we can understand that data better, get more out of it, and unlock its potential as rationally as possible to obtain the knowledge we need. And how can we gain control over our research, or over the processes needed to produce a stable, reliable product that consistently meets regulatory requirements? The answer is Multivariate Data Analysis.

This article is posted on our Science Snippets Blog.

Gaining control of and optimizing processes requires more than univariate data analysis: multivariate data analysis is key to meeting regulatory requirements.

The Challenge of Multiple Data Points

Mike Tobyn, Research Fellow at Bristol-Myers Squibb, leads an international team studying the physical properties of active pharmaceutical ingredients (APIs) and excipients. As a pharmaceutical materials scientist, he may need to determine the properties of an API composed of billions of crystals, starting with morphological methods that describe thousands of particles, each with 20 descriptors. The result is a database of thousands of materials, each with millions of data points.

The challenge is to extract key information from this data to help in developing a robust method of formulation. Similar challenges must be met in many other areas, such as basic R&D, small molecule API production, excipients, upstream and downstream processes for large molecule production, and real-time process control.

As Tobyn found, extracting the information needed from such data requires modeling or data-analytical techniques. And even once the system is understood, robustness must be established despite day-to-day changes in a large number of variables, such as humidity, raw materials, new operators, and temperature.

Taking a univariate approach to data analysis, that is, examining each variable one at a time (for example, with simple regression), is less effective than examining how all the variables interact to influence the outcome.
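To make the point concrete, here is a toy sketch with synthetic data. The parameter names `humidity` and `temperature` and the response model are assumptions for illustration only, not Tobyn's data. The response depends solely on the interaction of the two parameters, so each one looks almost uncorrelated with the outcome when examined on its own, while a model that considers them jointly explains nearly all of the variation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Hypothetical, independent process parameters (synthetic, centred data)
humidity = rng.normal(size=n)
temperature = rng.normal(size=n)

# Assumed response that depends only on the interaction of the two parameters
response = humidity * temperature + 0.1 * rng.normal(size=n)

# Univariate view: correlate each parameter with the response on its own
r_humidity = np.corrcoef(humidity, response)[0, 1]
r_temperature = np.corrcoef(temperature, response)[0, 1]

# Joint view: least-squares fit using both parameters and their product
X = np.column_stack([np.ones(n), humidity, temperature, humidity * temperature])
coef, *_ = np.linalg.lstsq(X, response, rcond=None)
residuals = response - X @ coef
r_squared = 1 - residuals @ residuals / np.sum((response - response.mean()) ** 2)

print(f"univariate r: humidity={r_humidity:.2f}, temperature={r_temperature:.2f}")
print(f"joint model R^2: {r_squared:.2f}")
```

The joint model here is an ordinary regression with an interaction term rather than a full MVDA method; it simply shows why one-variable-at-a-time analysis can miss the effect entirely.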

Multivariate vs Univariate Data Analysis

As Tobyn points out, our world is dominated by data from multiple, complex, multivariate sources, which means that analyzing each parameter individually will not give the full picture. Historically, many pharmaceutical manufacturers took a univariate approach to evaluating and managing their R&D and production processes. But that is no longer the most effective or valid way of doing things.

In pharmaceutical manufacturing, as in all manufacturing, it is vital to understand the relationships between parameters, because events are generally caused by a combination of factors rather than by individual parameters. This is why Multivariate Data Analysis (MVDA, or MVA), as used by Tobyn and his team, rather than univariate data analysis, has become the most commonly used method for extracting information from large data sets in the pharmaceutical industry.
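One common multivariate check is Hotelling's T² statistic, which measures how far an observation lies from the reference data while accounting for the correlations between parameters. The sketch below assumes two synthetic, strongly correlated parameters and uses a chi-square approximation for the control limit; both the data and the limit are illustrative assumptions. The new batch passes a ±3 standard deviation check on each parameter separately, yet its combination of values is flagged.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic reference batches: two standardized parameters with correlation 0.9
true_cov = np.array([[1.0, 0.9], [0.9, 1.0]])
reference = rng.multivariate_normal(mean=[0.0, 0.0], cov=true_cov, size=1000)

mean = reference.mean(axis=0)
std = reference.std(axis=0, ddof=1)
inv_cov = np.linalg.inv(np.cov(reference, rowvar=False))

# Approximate 99% limit: chi-square quantile with 2 degrees of freedom (-2 ln 0.01).
# With an estimated mean and covariance an F-based limit is more exact; this is a simplification.
T2_LIMIT = 9.21


def outside_univariate_limits(x, k=3.0):
    """Per-parameter check: is each value outside mean +/- k standard deviations?"""
    return np.abs(x - mean) > k * std


def hotelling_t2(x):
    """Hotelling's T^2 distance of one observation from the reference mean."""
    d = x - mean
    return float(d @ inv_cov @ d)


# Each value is unremarkable on its own, but one high and one low
# contradicts the strong positive correlation seen in the reference data
new_batch = np.array([1.5, -1.5])
print("univariate flags:", outside_univariate_limits(new_batch))
print(f"T^2 = {hotelling_t2(new_batch):.1f} (limit {T2_LIMIT})")
```

A batch at `[1.5, 1.5]`, which follows the usual correlation, stays well inside the T² limit even though its individual values are the same distance from the mean.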

This is a summary of Tobyn’s case for using MVDA in the pharmaceutical industry:

Unlike univariate data analysis, MVDA analyzes multiple variables simultaneously and summarizes them using a few underlying latent variables, taking into account the relationships between parameters.

By compressing the variation contained in a large multivariate data set onto a smaller number of latent variables, you gain a simpler representation of the variability contained within the data. This enables easier interpretation of the key information contained in the data.
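Principal component analysis (PCA) is one widely used way to obtain such latent variables. Below is a minimal NumPy sketch, assuming synthetic data in which ten measured variables are driven by two hidden factors plus noise. After autoscaling, the first two principal components capture almost all of the variance, so each sample can be described by two scores instead of ten measurements.

```python
import numpy as np

rng = np.random.default_rng(2)
n_samples, n_variables = 200, 10

# Synthetic data: 10 measured variables driven by 2 hidden factors plus noise
hidden = rng.normal(size=(n_samples, 2))
mixing = rng.normal(size=(2, n_variables))
X = hidden @ mixing + 0.1 * rng.normal(size=(n_samples, n_variables))

# Autoscale: centre each variable and scale it to unit variance
Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# PCA via singular value decomposition of the autoscaled matrix
U, s, Vt = np.linalg.svd(Xs, full_matrices=False)
explained = s**2 / np.sum(s**2)  # fraction of total variance per component

n_components = 2
scores = U[:, :n_components] * s[:n_components]  # samples expressed in the latent variables
loadings = Vt[:n_components].T                   # weight of each original variable

print("variance explained:", np.round(explained[:4], 3))
print(f"first {n_components} components: {explained[:n_components].sum():.1%}")
print("data shape:", X.shape, "-> scores shape:", scores.shape)
```

The scores give the simpler representation of the samples, while the loadings show which original variables drive each latent variable, which is where most of the interpretation happens.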

MVDA has a number of valuable properties for pharmaceutical R&D and manufacturing:

  • MVDA can cope with large amounts of data from a range of formats.
  • The analysis is transparent and reproducible, to aid internal understanding and also to meet regulatory requirements.
  • It is possible to validate the analysis and the models it builds, demonstrating the predictability and reproducibility of the method.
  • The method of analysis can evolve as new data is incorporated, and this evolution can be documented and validated.
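Validation of MVDA models is often done by cross-validation. The sketch below shows one way to do it, not the procedure from Tobyn's webinar. It implements a basic PLS1 regression (partial least squares with a single response) using the NIPALS algorithm and computes the cross-validated Q² = 1 − PRESS / SS for different numbers of latent variables. The data set, the fold count, and the helper names `pls1_fit`, `pls1_predict`, and `q2` are hypothetical.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Fit PLS1 with the NIPALS algorithm; returns (x_mean, y_mean, coefficients)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = E.T @ f
        w /= np.linalg.norm(w)   # weight vector
        t = E @ w                # scores
        tt = t @ t
        p = E.T @ t / tt         # X loadings
        c = f @ t / tt           # y loading
        E = E - np.outer(t, p)   # deflate X
        f = f - c * t            # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)  # regression vector in original X space
    return x_mean, y_mean, coef


def pls1_predict(model, X):
    x_mean, y_mean, coef = model
    return (X - x_mean) @ coef + y_mean


def q2(X, y, n_components, n_folds=5, seed=0):
    """Cross-validated Q^2 = 1 - PRESS / total sum of squares."""
    idx = np.random.default_rng(seed).permutation(len(y))
    press = 0.0
    for test in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, test)
        model = pls1_fit(X[train], y[train], n_components)
        press += np.sum((y[test] - pls1_predict(model, X[test])) ** 2)
    return 1.0 - press / np.sum((y - y.mean()) ** 2)


# Synthetic example: 8 measured variables and one quality attribute,
# all driven by the same 2 hidden factors (illustrative data only)
rng = np.random.default_rng(4)
hidden = rng.normal(size=(60, 2))
X = hidden @ rng.normal(size=(2, 8)) + 0.1 * rng.normal(size=(60, 8))
y = hidden @ np.array([1.0, -0.5]) + 0.1 * rng.normal(size=60)

for a in range(1, 5):
    print(f"{a} latent variable(s): Q^2 = {q2(X, y, a):.3f}")
```

Because every prediction is made on samples left out of the fit, Q² estimates predictive ability rather than goodness of fit, and the same script can simply be rerun as new data is added to document how the model evolves.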

MVDA can also be used to mine historical databases to help predict the properties of new materials. Process Analytical Technology (PAT) relies on MVDA, for example to monitor drift in the properties of excipients. In its most advanced form, MVDA is a powerful PAT tool that makes the most of chemometrics to enable real-time release of materials. Overall, MVDA is making major contributions across manufacturing, leading to complete end-to-end understanding.
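A simplified sketch of drift monitoring, assuming synthetic "excipient" data from a hypothetical process with six measured properties and two underlying factors: a PCA model is built from historical batches, and new batches are projected onto it. Hotelling's T² flags unusual positions within the model, while the squared prediction error (SPE, also called the Q residual) flags batches whose correlation structure no longer matches the history. The control limits are simply empirical 99th percentiles of the historical batches, a simplification of the limits used in practice.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical process: 6 measured excipient properties driven by 2 hidden factors
MIXING = np.array([[1.0, 0.8, 0.6, 0.4, 0.2, 0.0],
                   [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]])


def simulate_batches(n, drift=0.0):
    """Synthetic batches; `drift` shifts property 0 only, breaking the usual pattern."""
    X = rng.normal(size=(n, 2)) @ MIXING + 0.1 * rng.normal(size=(n, 6))
    X[:, 0] += drift
    return X


# Build a 2-component PCA model from historical batches
historical = simulate_batches(300)
mu, sd = historical.mean(axis=0), historical.std(axis=0, ddof=1)
Z = (historical - mu) / sd
_, s, Vt = np.linalg.svd(Z, full_matrices=False)
P = Vt[:2].T                            # loadings
score_var = s[:2] ** 2 / (len(Z) - 1)   # variance of each score column


def monitor(X_new):
    """Return Hotelling's T^2 and SPE (Q residual) for each new batch."""
    Zn = (X_new - mu) / sd
    T = Zn @ P
    t2 = np.sum(T**2 / score_var, axis=1)
    spe = np.sum((Zn - T @ P.T) ** 2, axis=1)
    return t2, spe


# Simplification: empirical 99th-percentile limits from the historical batches
t2_hist, spe_hist = monitor(historical)
t2_limit, spe_limit = np.quantile(t2_hist, 0.99), np.quantile(spe_hist, 0.99)


def flagged(X_new):
    t2, spe = monitor(X_new)
    return (t2 > t2_limit) | (spe > spe_limit)


normal_rate = flagged(simulate_batches(50)).mean()
drift_rate = flagged(simulate_batches(50, drift=1.5)).mean()
print(f"flagged: normal {normal_rate:.0%}, drifted {drift_rate:.0%}")
```

The drifted batches are caught mainly by SPE: the shift in one property does not fit the historical correlation pattern, even though the shifted values are not extreme on their own.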

An Aid to Regulatory Compliance

Any tools we use must fit into the framework of guidance that defines best practices for meeting binding regulatory requirements. The transparency and relative ease of validation of MVDA, together with its power to aid communication with colleagues and regulators, has led to its integration into the regulatory framework. This means that, provided we have used MVDA correctly and in line with the relevant guidance documents, we can be confident that our conclusions are valid.

Getting Started

MVDA is a robust, transparent suite of techniques that can be used in a wide range of data-rich applications. Using robust multivariate analysis software to support MVDA will help optimize the process. Where necessary, and when used correctly, MVDA can be validated and included in a workflow to better understand, and therefore control, the production process. It is a powerful method, but making the most of it means developing expertise based on both fundamental understanding and practical experience, and there is a well-developed ecosystem of users and products to support this.

Mike Tobyn presented a webinar that explains more about why MVDA is an essential tool for today's pharmaceutical manufacturers: ‘Analyzing complex data in the pharmaceutical industry: The case for multivariate analysis.’

Want to Know More?

Further reading
Using advanced data analytics to make the shift to continuous process manufacturing

Multivariate Analysis in the Pharmaceutical Industry. Editors: Ana Patricia Ferreira, Jose C. Menezes, and Mike Tobyn. eBook ISBN: 9780128110669. Paperback ISBN: 9780128110652. Academic Press, 2018