Data Analytics
Jul 12, 2018
| 12 min read

How to Use Data Analytics to Understand Consumer Preferences

Product development and innovation are important elements for the survival of many companies. Whether introducing a new food flavor or adding new product features, understanding consumer preferences can help guide both design and production decisions. The right decisions can make a product launch more successful, and ultimately more profitable.

But making sense of a myriad of data points about consumer preferences across a number of different product concepts or versions can be complicated. That’s where multivariate data analysis, specifically conjoint analysis using OPLS (orthogonal partial least squares) and O2PLS (an extension of OPLS), can be useful. Short for “consider jointly,” conjoint analysis is a technique based on design of experiments (DOE), whereby a product’s key attributes are modified according to a design or plan. Customers are then asked for their opinions about the different versions of the product.

This article is posted on our Science Snippets Blog.

Preference mapping using conjoint analysis can help companies decide which factors are most important when creating new products or adding features that customers want, such as choosing between using digital or analog controls.

In addition, preference mapping using OPLS and O2PLS can find relationships between customer preferences and sensory attributes across a number of product versions or types of food, for example.

Preference Mapping to Determine Product Features

Preference mapping is useful in market investigations to understand which products are better appreciated than others. We can use it to answer questions such as:

  • Which particular characteristics or attributes of a product influence consumer preferences?
  • Are there any significant segments in the preference profiles?
  • Which sensory properties affect consumer preferences?

How preference mapping is implemented depends on the type of product, the field of application and the type of data you have available. For example, it can be used to better understand which particular technical characteristics of a product influence preferences and determine if there is any way to segment customers according to different preference profiles.

In addition, sometimes you want to co-analyze consumer preference data against other types of data, perhaps external data such as physical and chemical measurements, ingredient concentrations, manufacturing variables, material attributes and so on. All of this results in a large set of disjointed data tables that require advanced data analytics tools to manage, compare and contrast the information residing in those data tables.

We use O2PLS to help us uncover information overlap that is present in multiple blocks of data.

What Are OPLS and O2PLS?

OPLS (orthogonal partial least squares) is a type of data analysis used to model multiple Xs and multiple Ys, and it addresses the regression problem. Here we use an extension of OPLS called O2PLS, which addresses the data integration problem: finding joint information (information found in all data tables) and unique information (information found in only one data table).

The O2PLS model has three compartments (as seen in the illustration above):

1. The X-Y Joint Variation. The center part (in green) may be familiar if you have been working with the orthogonal partial least squares (OPLS) model. Here you have scores and loadings of what we call the predictive components, which contain or express the information that is found in both blocks of data at the same time (the joint information, or the information overlap).

2. Orthogonal in X. On the left are one or more components expressing information that is found only in the X block and is not linearly correlated to anything in the Y block. We call these orthogonal in X components, and they account for the unique information in the X part.

3. Orthogonal in Y. Similarly, on the right side, we have the components that are found only in the Y block. They express the information that is unique to Y.
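To make the three compartments more concrete, here is a minimal sketch on simulated data. It is not the full O2PLS algorithm and not how SIMCA computes its models; it only illustrates the underlying idea that joint variation shows up in the cross-product of the two blocks, while variation unique to one block does not. All names, dimensions and scale factors below are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200  # number of observations (rows shared by X and Y)

# Three mutually orthogonal score vectors: joint, unique to X, unique to Y.
q, _ = np.linalg.qr(rng.normal(size=(n, 3)))
t_joint, t_x_only, u_y_only = (q * np.sqrt(n)).T


def unit(v):
    """Scale a vector to unit length."""
    return v / np.linalg.norm(v)


# Hypothetical loading vectors for each compartment.
w_joint = unit(rng.normal(size=6))   # joint loadings in X
c_joint = unit(rng.normal(size=5))   # joint loadings in Y
p_x_only = unit(rng.normal(size=6))  # loadings of the orthogonal in X part
q_y_only = unit(rng.normal(size=5))  # loadings of the orthogonal in Y part

# Each block = joint part + (larger) unique part + small noise.
X = (np.outer(t_joint, w_joint) + 2.0 * np.outer(t_x_only, p_x_only)
     + 0.1 * rng.normal(size=(n, 6)))
Y = (np.outer(t_joint, c_joint) + 2.0 * np.outer(u_y_only, q_y_only)
     + 0.1 * rng.normal(size=(n, 5)))

# The joint directions appear as the leading singular vectors of X'Y.
u, s, vt = np.linalg.svd(X.T @ Y, full_matrices=False)
w_est, c_est = u[:, 0], vt[0]

# Absolute cosine between estimated and simulated joint loadings.
cos_w = abs(w_est @ w_joint)
cos_c = abs(c_est @ c_joint)
print(f"alignment with joint X loadings: {cos_w:.3f}")
print(f"alignment with joint Y loadings: {cos_c:.3f}")
```

In this simulation the unique parts carry more variance than the joint part, yet the leading singular vectors of the cross-product still line up with the joint loadings. That is the sense in which the predictive components express the information overlap between the two blocks.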

Two Examples of Preference Mapping

Let’s take a closer look at these two kinds of analysis, conjoint analysis and preference mapping, using two tutorial datasets in which we determine customer preferences by aligning attributes of the X and Y data.

First, we’ll look at how conjoint analysis can be used to screen preliminary product versions (concepts) in order to highlight respondent preferences. We’ll use product attributes as the X data and the respondent preferences as the Y data to see if there is some overlap between the two blocks.

Then, we’ll look at how preference mapping can be used to find relationships between sensory data and consumer data in understanding which sensory characteristics drive consumer preferences. In this second example, using varieties of apples, the X block will be verdicts from sensory judges and the Y block will be likings as expressed by the consumers tasting the apples.

Example 1: Sensor for Control of Indoor Climate

In this example, we use a design of experiments (DOE) approach along with conjoint analysis to find out which characteristics of the climate control sensor are important for the customer.

First, we present preliminary product versions (concepts) to respondents to determine their preferences. Then, we use data analytics to find the most significant product attributes across a broad range of possible factors from design styles to features to price.

In this case, the DOE product investigation involved five factors. Four of them were changed according to a full factorial design and the fifth was overlaid for a total of 16 test versions.

The five factors investigated included:

1. Price (“look”). The price is of interest to the customer. However, a user might be less sensitive to the price than the homeowner, who is paying for it. Price also affects whether the product has more of a luxury look to it or an unsophisticated appearance.

2. Status indicator. The status indicator (light) is used to see whether the sensor is in operating mode or not.

3. Logotype (Brand). The brand is usually of great importance when choosing a product. Both the company’s logotype and a competitor’s were tested. In this blog the brand information is blinded.

4. Control screens. The control screens of the device can be made in either digital or analog models.

5. Product design. The product design (“shape”) is important since the climate sensor will be mounted in a well-exposed position in a room. Four different product designs were tested in the survey: rectangular, square, circular and beveled (square form with slanted edges). The four shapes were distributed across the 16 concepts, so each shape was tested four times.
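The design described above can be sketched in a few lines of Python. Four two-level factors in a full factorial give 2 x 2 x 2 x 2 = 16 concepts, and the four-level shape factor is then overlaid so that each shape appears four times. The way the shape is assigned below (from two interaction columns) is an assumption for illustration, since the article does not say how the overlay was done, and the status indicator level names are likewise assumed.

```python
from collections import Counter
from itertools import product

# Two-level factors. Level names for price, brand and control screen follow
# the article; the status indicator levels are assumed for illustration.
two_level_factors = {
    "price": ("low", "high"),
    "status_indicator": ("without", "with"),  # assumed level names
    "brand": ("AB", "KL"),
    "control_screen": ("analog", "digital"),
}
shapes = ("rectangular", "square", "circular", "beveled")

concepts = []
for levels in product((0, 1), repeat=len(two_level_factors)):
    a, b, c, d = levels
    # Assumed overlay: derive the four-level shape from two interaction
    # columns (a xor b, c xor d). The study's actual assignment is not given.
    shape = shapes[2 * (a ^ b) + (c ^ d)]
    concept = {
        name: options[level]
        for (name, options), level in zip(two_level_factors.items(), levels)
    }
    concept["product_design"] = shape
    concepts.append(concept)

shape_counts = Counter(concept["product_design"] for concept in concepts)
print(len(concepts))       # number of product concepts
print(dict(shape_counts))  # how often each shape appears
```

Any balanced assignment would satisfy the "each shape four times" condition. The interaction-column approach is just one common way to fit a four-level factor into a 16-run two-level design.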

In this experiment, 32 respondents were surveyed. They were asked to rank the 16 product concepts in order of preference from 1 to 16 (1 is the best, 16 is the worst). The group included:

  • Seven users (labeled U1-U7)
  • Seven consultants/resellers (labeled C1-C7)
  • Twelve company employees (labeled E1-E12)
  • Six house owners (labeled O1-O6)

We can use O2PLS to search for information overlap between the product attributes (X block) and the respondent likings (Y block).

In this case, O2PLS results in a model with a total of 11 components (2 + 4 + 5). Following the compartmentalized division, it has two predictive components, four components that are orthogonal in X and five that are orthogonal in Y.

While we can’t look at all of the variables, we can start with two score vectors. Here (image above) we see 16 points, one for each of the 16 product concepts (16 versions of the sensor). We can see a split: the right-hand side of the cluster holds the low numbers 1-8, and the left-hand side holds the higher numbers 9-16. This split coincides with a DOE factor that is coded in a plus-minus way, in this case whether the control screen is analog or digital. So that is the feature of the sensors to which the consumers react most strongly.

Which attribute is most important?

Additionally, we can look at the loading plot to figure out which of the attributes is most important.

On this loading plot the 16 attributes are given names (price, brand, status, etc.) and the letter/number combinations (C1-5, U1-5, E1-5, etc.) represent the individual respondents.

If we start with the brand (Brand KL, Brand AB), we can see that because these points are very close to the origin, it means that the logotype has very little influence. The respondents don't react to that element.

The price (Price High, Price Low) is something that the respondents react to more strongly. If points are at the top of the grid, they have been weighted up (and on this scale are less popular; remember that 16 is least liked). The respondents have up-weighted the price, which means they prefer a less expensive product.

And an even more important product attribute is the control screen (Control screen), which is on the far left (analog) and far bottom right (digital) of the grid.

Then we have the status indicator (Status indicator), which is also relatively close to the origin, so it’s not something to which the respondents react very strongly.

But there is one more attribute that causes people to react, and that is the product design (Product design), which has four settings. The one at the top is the circular design, which means the respondents really do not like the circular design. They prefer the square or beveled design. The rectangular design is close to the origin, which means that respondents are more or less indifferent to it.

So in summary, our conclusions for the manufacturer of this product are:

  • Control panel is the most important attribute. Both analog and digital are equally well liked so the manufacturer should make one product with digital and one with analog screens.
  • The sophistication of the look (price) has some influence so we would recommend that the manufacturer use the less sophisticated (less expensive-looking) design.
  • The manufacturer should avoid using a circular shape for the climate sensor (beveled or square are preferred).

Example 2: Preference Mapping for Apple Varieties

In this second example, we create a preference mapping model for 13 varieties of apples using data from 12 sensory judges and consumers. We have:

  • 70 sensory attributes (X-variables) across 12 judges (70 x 12 data points)
  • 108 consumer likings (Y-variables), expressed on a nine-grade scale (here 9 is high and 1 is low).

The grading scale in this example differs from the one in the previous example. There, each number from 1 to 16 had to be used exactly once in a ranking, with 1 as the best. Here we have a grading scale, and its direction is reversed: 9 is high (best) and 1 is low (worst). We have to remember that when we interpret the model.
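If you want both examples to point in the same direction before comparing plots, one option is to flip the ranks so that higher always means better. The helper below is a small sketch of that idea. The function name is hypothetical, and the article itself interprets the ranks as they are rather than converting them.

```python
def rank_to_score(rank: int, n_items: int = 16) -> int:
    """Convert a rank (1 = best) into a score where higher means better."""
    if not 1 <= rank <= n_items:
        raise ValueError(f"rank must be between 1 and {n_items}")
    return n_items + 1 - rank


ranks = [1, 8, 16]
scores = [rank_to_score(r) for r in ranks]
print(scores)  # [16, 9, 1]
```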

The sensory attributes recorded included:

  • 1_ is a first bite attribute
  • E_ is an external appearance attribute
  • EA_ is external aroma
  • A_ is astringent aftertaste
  • F_ is flavor
  • I_ is internal appearance
  • T_ is texture

We can use O2PLS to co-analyze the sensory judges and consumer liking data for these 13 apples and create a preference map. In this case, the O2PLS analysis results in a 4 + 0 + 2 model: four predictive components, none orthogonal in X and two orthogonal in Y.

Let’s start by interpreting the joint information. Every point on the plot below represents one of these 13 apple varieties.

We can see that one apple variety, Egremont Russet, is situated far (bottom left) from the main cluster of the apples, meaning that this apple is very different from the others. In order to find out what drives the separation, we have to look at the loadings (the attributes). What we find out with interpretation of the attributes on the loading scatter plot (below) is that the Egremont Russet is not as juicy as the other apples.

(X-part of the loading plot – sensory attributes)

Several sensory attributes drive this separation. Egremont Russet scores numerically higher than the other apples on flavor attributes such as almond, pear-like and stale, on pear-like internal appearance, on granular texture and on brownish external appearance. This is the X-part of the loading plot, but we can also look at the Y-part of the loading space (consumer likings) (below).

(Y-part of the loading plot: The consumer likings).

First of all, we see that there are no subgroups among the consumers. Sometimes you can see islands of dense points grouped tightly together and separated by lots of space. If that had been the case, we would have identified groups of consumers with similar likes and dislikes, and there would have been different correlation structures between the sensory attributes and consumer likings depending on the cluster.

But in this case we have a uniform distribution of the consumer likings, going basically from left to right in the top part of the loading plot. There is one deviator at the bottom: one consumer (C20) has completely different opinions about the apples. Because of this distribution and because of where Egremont Russet is, the conclusion is that Egremont Russet is the least popular apple variety.

Comparing Two Orthogonal in Y Components at Once

We had two orthogonal in Y components expressing information in the consumer likings that was not contained in the sensory judges’ data set. Here (below) we show how you can look at the orthogonal in Y variation by plotting the scores and loadings of these two components.

Remember that they represent only 16% of the variance in the Y block, so we have to be a little bit careful with the interpretation. But we can see that these two components are defined and spanned by Cameo (top left) and Braeburn (bottom center). So there is some liking information for these two varieties that is expressed by the consumers but not by the sensory judges. Apart from that, there are no subgroups among the consumers, indicating that we have a very homogeneous data set to work with.
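A figure like "16% of the variance" is the share of the block's total sum of squares that a set of components reproduces. The sketch below shows how such a share can be computed. It uses random toy data and the first two principal components as stand-ins, so it does not reproduce the 16% from the apple model, and the function name is an assumption.

```python
import numpy as np


def fraction_explained(Y, scores, loadings):
    """Share of the sum of squares in column-centered Y captured by scores @ loadings.T."""
    Yc = Y - Y.mean(axis=0)
    residual = Yc - scores @ loadings.T
    return 1.0 - (residual ** 2).sum() / (Yc ** 2).sum()


# Toy data only: 13 rows (like 13 apples) and 4 columns, not the article's data.
rng = np.random.default_rng(1)
Y = rng.normal(size=(13, 4))
Yc = Y - Y.mean(axis=0)
u, s, vt = np.linalg.svd(Yc, full_matrices=False)

# First two principal components used as stand-in components.
scores = u[:, :2] * s[:2]
loadings = vt[:2].T
share = fraction_explained(Y, scores, loadings)
print(f"{share:.2f}")
```

When the share is small, as it is for the two orthogonal in Y components here, patterns in their score and loading plots rest on a thin slice of the data, which is why a cautious reading is advisable.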

Summarizing this second application, we can say:

  • O2PLS shows that both consumers and judges found the same main types of latent factors of the apples to be important.
  • O2PLS pointed out some systematic variation among the consumers not expressed by the judges.
  • However, because of the small size of the data set, it’s important to be careful when interpreting the results.


These examples show how uncovering patterns of similar preferences and noting differences in preference patterns among groups of consumers can be used to determine which elements of a product or which products are most appreciated by consumers. Preference mapping along with conjoint analysis and design of experiments can be an effective way to create product roadmaps that align with customer preferences and achieve market success.

SIMCA software contains a number of tools that can be combined in different ways for a variety of applications from preference mapping to MVDA. These include:

  • PCA for data overview
  • PLS for regression modeling
  • OPLS for enhanced interpretation
  • O2PLS for data comparison and data integration
  • HCA for bottom-up clustering
  • PLS-tree for top-down clustering
  • SIMCA for classifier training
  • OPLS-DA for discriminant analysis
  • Hierarchical modeling for summarizing multiple data blocks

See A Demo

Watch a recorded webinar showing how SIMCA software can be used to do preference mapping using the data sets illustrated above. This recording provides the original literature references to the cited example datasets. (Registration required).
