How Data Analytics Supports Commercialization of Omics Research

Data Analytics
Jan 13, 2021  |  7 min read

In the post-genomic era, biological studies that lead to new pharmaceutical successes are often characterized by a series of technologies known collectively as "omics", which include fields such as genomics, proteomics, metabolomics, lipidomics, cytomics, phenomics and others. Omics studies rely on large-scale analyses of biological samples using high-throughput analytical approaches and bioinformatics, which produce vast amounts of data that are complicated and time-consuming to interpret.

This article is posted on our Science Snippets Blog.


Making sense of Omics data requires advanced data reduction and visualization techniques that can only be achieved with powerful data analytics tools, methods and software. The practical application of Omics techniques in pharmaceutical research is essential to drug target discovery, toxicity evaluation, personalized medicine, and new applications for traditional medicines.

Like data from other bio-analytical methods, Omics data are characterized by a large number of correlated variables and relatively few observations. Multivariate methods distill the important information from these data into relevant insights by finding the correlations that exist among the variables.

This article contains additional information that will help your scientists, researchers and data analysts understand how multivariate data analytics (MVDA) tools can support biopharma commercialization and speed time to market for drug discovery, toxicity evaluation and more.

How MVDA Supports Omics Research

SIMCA multivariate data analytics software provides:

  • Multivariate projection methods to simplify complex Omics data 
  • Powerful visualization techniques for spotting outliers and patterns (using PCA)
  • Tools to recognize classes and identify the genes, proteins or metabolites responsible for differences (OPLS-DA)

In short, SIMCA makes analysis of Omics data easy and is designed for use by scientists.

Common Areas of Application

Toxicological Studies – Finding Differences Between Groups 

A common objective in Omics studies is to differentiate between groups of individuals (for example, control vs. treated, or diseased vs. healthy) with the aim of early diagnosis or early detection of toxic effects in drug candidates. PCA is a data visualization technique that highlights outliers, trends and clusters within data.

Managing Variation Between Plates, Chips and Gels 

One of the difficulties with Omics studies is producing consistent results across different plates, chips, gels, days, etc. Using techniques borrowed from multivariate statistical process control (MSPC), it is possible to track experimental variation even with massive datasets. Outliers and time drifts may be monitored and experimental procedures tightened to ensure continuous improvement in reproducibility.

Quality Control and Biological Variation 

Due to the complexity of the analytical techniques used in Omics, a key question is whether the variation between individuals is greater or smaller than the experimental error. By including repeats and standards in such studies, multivariate methods can be used to highlight the dominant sources of variation with ease.
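As a rough illustration of this question, the sketch below compares the spread of repeats around their own individual with the spread between individuals, measured in the space of the first two principal components. The synthetic data, the noise levels and the variable names are assumptions made for illustration only; this is not how SIMCA reports sources of variation.

```python
import numpy as np

rng = np.random.default_rng(2)
n_individuals, n_repeats, n_vars = 6, 3, 40

# Hypothetical data: each individual has its own profile,
# and each repeat adds measurement noise on top of it
profiles = rng.normal(scale=1.0, size=(n_individuals, n_vars))
labels = np.repeat(np.arange(n_individuals), n_repeats)
X = profiles[labels] + rng.normal(scale=0.2, size=(labels.size, n_vars))

# PCA scores via SVD of the mean-centered table
Xc = X - X.mean(axis=0)
U, s, _ = np.linalg.svd(Xc, full_matrices=False)
scores = U[:, :2] * s[:2]

# Spread of repeats around their individual's centre (experimental error)
# versus spread of the individual centres (biological variation)
centres = np.array([scores[labels == i].mean(axis=0) for i in range(n_individuals)])
within = np.mean(np.sum((scores - centres[labels]) ** 2, axis=1))
between = np.mean(np.sum((centres - centres.mean(axis=0)) ** 2, axis=1))
print("within-individual spread:", within)
print("between-individual spread:", between)
```

If the repeats scatter as widely as the individuals do, the experimental error is likely to mask the biological differences of interest.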

The Importance of Pre-Processing Data

When working with biological and Omics data, one complication that can slow the understanding of results is the need to get all data into a similar format. The need to pre-process data manually can be eliminated by using advanced data analytics tools like SIMCA.

Advanced data analytics software like SIMCA uses a number of workhorse methods, such as PCA, PLS, OPLS and O2PLS. These form the basis of multivariate data analysis. Their most important use is to represent one or more multivariate data tables as a smaller list of summary indices (latent variables) in order to observe trends and outliers. This overview may uncover the relationships between observations (rows of the data tables) and variables (columns of the data tables), and among the data tables themselves.
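To make the idea of summary indices concrete, here is a minimal PCA sketch built on NumPy's singular value decomposition. The function name `pca_scores_loadings`, the synthetic data table and the choice of two components are assumptions for illustration; this is not SIMCA's implementation.

```python
import numpy as np

def pca_scores_loadings(X, n_components=2):
    """Return scores, loadings and explained-variance fractions for table X."""
    Xc = X - X.mean(axis=0)                             # mean-center each column
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]     # one row per observation
    loadings = Vt[:n_components].T                      # one row per variable
    explained = (s ** 2) / np.sum(s ** 2)               # fraction of variance per component
    return scores, loadings, explained[:n_components]

# Synthetic table: 10 observations, 50 correlated variables driven by one hidden factor
rng = np.random.default_rng(0)
factor = rng.normal(size=(10, 1))
X = factor @ rng.normal(size=(1, 50)) + 0.1 * rng.normal(size=(10, 50))

scores, loadings, explained = pca_scores_loadings(X)
print(scores.shape, loadings.shape)
print(explained)
```

The scores summarize the observations (rows) and the loadings summarize the variables (columns), so 50 correlated columns are reduced here to two latent variables.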

However, standardizing and regularizing (or "pre-processing") the data prior to starting any data analytics modeling is crucial. Before PCA, PLS, OPLS and O2PLS can be performed, the original data must be transformed into a form suitable for analysis. That means reshaping the data in order to fulfill important assumptions. Pre-processing the data can make the difference between a useful model and no model at all. 

Some methods of pre-processing data include:

  • Scaling
  • Mean-centering
  • Transformation
  • Advanced scaling
  • Data correction and compression

The latest version of SIMCA (16) includes a tool called Multiblock Orthogonal Component Analysis (MOCA), which removes the need to pre-process the data.

Scaling of data

Variables often have substantially different numerical ranges. A variable with a large range has a large initial variance, whereas a variable with a small range has a small initial variance. Since PCA, PLS, OPLS and O2PLS are maximum variance or co-variance projections, it follows that a variable with a large initial variance is more likely to be expressed in the modeling than a low-variance variable. 
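The effect is easy to reproduce. In the sketch below, two hypothetical variables on very different scales are analyzed without any scaling, and the first principal component is computed from the mean-centered data. The data and the scales chosen are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 30

# Two hypothetical variables with very different numerical ranges
small = rng.normal(loc=5.0, scale=0.5, size=n)        # spread of a few tenths
large = rng.normal(loc=5000.0, scale=500.0, size=n)   # spread of several hundred
X = np.column_stack([small, large])

# First principal component of the mean-centered, unscaled data
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
first_loading = Vt[0]
print(np.abs(first_loading))   # weight of each variable in the first component
```

Without scaling, the first component points almost entirely along the high-variance variable, regardless of whether that variable carries the information of interest.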

Unit variance scaling

There are many ways to scale the data, but the most common technique is unit variance (UV) scaling. UV-scaling is the default in SIMCA, but how is it done? For each variable (column), you calculate the standard deviation and obtain the scaling weight as the inverse of the standard deviation. Subsequently, each column of the data table X is multiplied by its scaling weight. Each scaled variable then has equal (unit) variance.

(above) The effect of variance scaling is shown. The vertical axis represents the “length” of the variables and their numerical values. Each bar corresponds to one variable and the short horizontal line inside each bar represents the mean value. Prior to any pre-processing, the variables have different variances and mean values. After scaling to unit variance, the “length” of each variable is identical. The mean values, however, remain different.
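The UV-scaling steps described above can be sketched in a few lines of NumPy. The function name `uv_scale` and the example table are hypothetical, and the sketch assumes the sample standard deviation (`ddof=1`) and no constant columns; SIMCA's own calculation may differ in such details.

```python
import numpy as np

def uv_scale(X):
    """Multiply each column by the inverse of its standard deviation."""
    std = X.std(axis=0, ddof=1)    # sample standard deviation per column (assumed)
    weights = 1.0 / std            # scaling weight = inverse standard deviation
    return X * weights, weights

# Hypothetical table: 4 observations (rows), 3 variables (columns)
X = np.array([[1.0, 1000.0, 0.10],
              [2.0, 1100.0, 0.50],
              [3.0, 1050.0, 0.30],
              [4.0, 1150.0, 0.20]])

X_uv, weights = uv_scale(X)
print(X_uv.var(axis=0, ddof=1))   # variance of each scaled column
print(X_uv.mean(axis=0))          # mean of each scaled column
```

After scaling, every column has the same variance, while the column means still differ because no centering has been applied yet.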

Like any projection method, PCA is sensitive to scaling. That means that by modifying the variance of the variables, it is possible to attribute different importance to them. This gives the possibility of down-weighting irrelevant or noisy variables. However, you must be careful not to choose the scaling subjectively in order to achieve the model you want.

Mean-centering data

Mean-centering is the second part of the standard procedure for pre-processing data. With mean-centering, the average value of each variable is calculated and then subtracted from the data. This improves the interpretability of the model. A graphic interpretation of mean-centering is shown below.
 


(above) After mean-centering and unit variance scaling, all variables will have equal “length” and mean value zero. Another name for this scaling method is “auto-scaling.”
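Combining both steps gives auto-scaling, which might look like the following sketch. The function name `autoscale` and the example table are hypothetical, and the sample standard deviation (`ddof=1`) is an assumption.

```python
import numpy as np

def autoscale(X):
    """Mean-center each column, then scale it to unit variance."""
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)    # sample standard deviation (assumed)
    return (X - mean) / std

# Hypothetical table: 4 observations (rows), 3 variables (columns)
X = np.array([[1.0, 1000.0, 0.10],
              [2.0, 1100.0, 0.50],
              [3.0, 1050.0, 0.30],
              [4.0, 1150.0, 0.20]])

X_auto = autoscale(X)
print(X_auto.mean(axis=0))          # column means after auto-scaling
print(X_auto.var(axis=0, ddof=1))   # column variances after auto-scaling
```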

Skins enable scaling, or mean-centering

Mean-centering and UV-scaling procedures are applied by default when using SIMCA software. However, it's possible to enable other default settings by using special Omics or spectroscopy "skins" in SIMCA (custom views available in SIMCA). For example, when running the spectroscopy skin, mean-centering without scaling is the default. When running the Omics skin, either mean-centering only or Pareto scaling can be chosen, depending on which data is imported. Being able to modify the default settings and select different plotting configurations is one of the reasons to use these skins.
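For readers unfamiliar with Pareto scaling, it is commonly defined as mean-centering followed by division by the square root of each variable's standard deviation, which sits between no scaling and UV-scaling. The sketch below assumes that common definition; the function name `pareto_scale` and the example data are hypothetical, and the exact calculation in SIMCA may differ.

```python
import numpy as np

def pareto_scale(X):
    """Mean-center each column, then divide by the square root of its standard deviation."""
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)    # sample standard deviation (assumed)
    return (X - mean) / np.sqrt(std)

# Hypothetical intensities: one abundant variable and one low-level variable
X = np.array([[1000.0, 1.0],
              [1400.0, 1.2],
              [ 800.0, 0.9],
              [1200.0, 1.1]])

X_par = pareto_scale(X)
print(X.std(axis=0, ddof=1))       # original spread per variable
print(X_par.std(axis=0, ddof=1))   # spread after Pareto scaling
```

The spread of each scaled variable equals the square root of its original standard deviation, so large variables are reduced in influence but still weigh more than small ones.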

Omics Skin in SIMCA

An Omics skin is a customized view within SIMCA® software designed to help people who typically work in various biological fields such as proteomics, genomics, metabolomics or transcriptomics. The Omics skin graphical user interface (GUI) is specifically designed to help with the complex analysis of biological or gene data obtained through methods such as mass spectrometry.

Easy Data Analysis for Novices and Pros Alike

If you’re a biologist, an analytical chemist, or another type of researcher, the Omics skin may be just the solution you need to help you gain meaningful insights from your carefully collected data. If you’re used to working with high-tech microarray instruments and gathering a lot of intricate data, you may have complex analytics needs that could be met more effectively using an Omics skin.

The key benefit of using an Omics skin is that, with only a minimum of training and a basic understanding of multivariate data analysis (MVDA), you can swiftly turn your data set into a list of discriminating variables that helps you further your research. In addition to reliable data analytics, you’ll be able to identify more easily a short list of potential biomarkers or discriminating variables that separate the groups of samples in a meaningful way.

A Wizard Makes It Easier

One of the beneficial elements of the Omics skin is that it includes a wizard. The analysis wizard uses a workflow that can guide even an inexperienced user all the way from data import, through the key analysis steps, to a report of the most interesting findings. The wizard and skin were developed so that you do not necessarily have to know a lot about multivariate data analysis in order to use them successfully.

A custom-built wizard and Omics skin let you focus on the data types that are most useful in evaluating biological samples, such as MS, NMR, identified metabolites and chromatographic data, although any data type can be analyzed. If you are working in an Omics field, the SIMCA Omics skin could be the solution you need to get the right information from your data.


