## Documentation Center |

On this page… |
---|

Introduction to Multidimensional Scaling Classical Multidimensional Scaling |

One of the most important goals in visualizing data is to get a sense of how near or far points are from each other. Often, you can do this with a scatter plot. However, for some analyses, the data that you have might not be in the form of points at all, but rather in the form of pairwise similarities or dissimilarities between cases, observations, or subjects. There are no points to plot.

Even if your data are in the form of points rather than pairwise distances, a scatter plot of those data might not be useful. For some kinds of data, the relevant way to measure how near two points are might not be their Euclidean distance. While scatter plots of the raw data make it easy to compare Euclidean distances, they are not always useful when comparing other kinds of inter-point distances, city block distance for example, or even more general dissimilarities. Also, with a large number of variables, it is very difficult to visualize distances unless the data can be represented in a small number of dimensions. Some sort of dimension reduction is usually necessary.

Multidimensional scaling (MDS) is a set of methods that address all these problems. MDS allows you to visualize how near points are to each other for many kinds of distance or dissimilarity metrics and can produce a representation of your data in a small number of dimensions. MDS does not require raw data, but only a matrix of pairwise distances or dissimilarities.

This example shows how to use `cmdscale` to perform classical (metric) multidimensional scaling, also known as principal coordinates analysis.

`cmdscale` takes as an input a matrix of inter-point distances and creates a configuration of points. Ideally, those points are in two or three dimensions, and the Euclidean distances between them reproduce the original distance matrix. Thus, a scatter plot of the points created by `cmdscale` provides a visual representation of the original distances.

As a very simple example, you can reconstruct a set of points from only their inter-point distances. First, create some four dimensional points with a small component in their fourth coordinate, and reduce them to distances.

rng('default'); % For reproducibility X = [normrnd(0,1,10,3),normrnd(0,.1,10,1)]; D = pdist(X,'euclidean');

Next, use `cmdscale` to find a configuration with those inter-point distances. `cmdscale` accepts distances as either a square matrix, or, as in this example, in the vector upper-triangular form produced by `pdist`.

[Y,eigvals] = cmdscale(D);

`cmdscale` produces two outputs. The first output, `Y`, is a matrix containing the reconstructed points. The second output, `eigvals`, is a vector containing the sorted eigenvalues of what is often referred to as the "scalar product matrix," which, in the simplest case, is equal to `Y*Y'`. The relative magnitudes of those eigenvalues indicate the relative contribution of the corresponding columns of `Y` in reproducing the original distance matrix `D` with the reconstructed points.

format short g [eigvals eigvals/max(abs(eigvals))]

ans = 35.41 1 11.158 0.31511 1.6894 0.04771 0.1436 0.0040553 7.1529e-15 2.02e-16 4.8817e-15 1.3786e-16 2.4053e-15 6.7927e-17 -2.0017e-17 -5.6528e-19 -5.371e-16 -1.5168e-17 -4.0008e-15 -1.1298e-16

If `eigvals` contains only positive and zero (within round-off error) eigenvalues, the columns of `Y` corresponding to the positive eigenvalues provide an exact reconstruction of `D`, in the sense that their inter-point Euclidean distances, computed using `pdist`, for example, are identical (within round-off) to the values in `D`.

```
maxerr4 = max(abs(D - pdist(Y))) % Exact reconstruction
```

maxerr4 = 7.1054e-15

If two or three of the eigenvalues in `eigvals` are much larger than the rest, then the distance matrix based on the corresponding columns of `Y` nearly reproduces the original distance matrix `D`. In this sense, those columns form a lower-dimensional representation that adequately describes the data. However it is not always possible to find a good low-dimensional reconstruction.

maxerr3 = max(abs(D - pdist(Y(:,1:3)))) % Good reconstruction in 3D maxerr2 = max(abs(D - pdist(Y(:,1:2)))) % Poor reconstruction in 2D

maxerr3 = 0.043142 maxerr2 = 0.98315

The reconstruction in three dimensions reproduces `D` very well, but the reconstruction in two dimensions has errors that are of the same order of magnitude as the largest values in `D`.

max(max(D))

ans = 5.8974

Often, `eigvals` contains some negative eigenvalues, indicating that the distances in `D` cannot be reproduced exactly. That is, there might not be any configuration of points whose inter-point Euclidean distances are given by `D`. If the largest negative eigenvalue is small in magnitude relative to the largest positive eigenvalues, then the configuration returned by `cmdscale` might still reproduce `D` well.

This example shows how to construct a map of 10 US cities based on the distances between those cities, using `cmdscale`.

First, create the distance matrix and pass it to `cmdscale`. In this example, `D` is a full distance matrix: it is square and symmetric, has positive entries off the diagonal, and has zeros on the diagonal.

cities = ... {'Atl','Chi','Den','Hou','LA','Mia','NYC','SF','Sea','WDC'}; D = [ 0 587 1212 701 1936 604 748 2139 2182 543; 587 0 920 940 1745 1188 713 1858 1737 597; 1212 920 0 879 831 1726 1631 949 1021 1494; 701 940 879 0 1374 968 1420 1645 1891 1220; 1936 1745 831 1374 0 2339 2451 347 959 2300; 604 1188 1726 968 2339 0 1092 2594 2734 923; 748 713 1631 1420 2451 1092 0 2571 2408 205; 2139 1858 949 1645 347 2594 2571 0 678 2442; 2182 1737 1021 1891 959 2734 2408 678 0 2329; 543 597 1494 1220 2300 923 205 2442 2329 0]; [Y,eigvals] = cmdscale(D);

Next, look at the eigenvalues returned by `cmdscale`. Some of these are negative, indicating that the original distances are not Euclidean. This is because of the curvature of the earth.

format short g [eigvals eigvals/max(abs(eigvals))]

ans = 9.5821e+06 1 1.6868e+06 0.17604 8157.3 0.0008513 1432.9 0.00014954 508.67 5.3085e-05 25.143 2.624e-06 4.049e-10 4.2256e-17 -897.7 -9.3685e-05 -5467.6 -0.0005706 -35479 -0.0037026

However, in this case, the two largest positive eigenvalues are much larger in magnitude than the remaining eigenvalues. So, despite the negative eigenvalues, the first two coordinates of `Y` are sufficient for a reasonable reproduction of `D`.

Dtriu = D(find(tril(ones(10),-1)))'; maxrelerr = max(abs(Dtriu-pdist(Y(:,1:2))))./max(Dtriu)

maxrelerr = 0.0075371

Here is a plot of the reconstructed city locations as a map. The orientation of the reconstruction is arbitrary.

plot(Y(:,1),Y(:,2),'.') text(Y(:,1)+25,Y(:,2),cities) xlabel('Miles') ylabel('Miles')

The function `mdscale` performs
nonclassical multidimensional scaling. As with `cmdcale`,
you use `mdscale` either to visualize dissimilarity
data for which no "locations" exist, or to visualize
high-dimensional data by reducing its dimensionality. Both functions
take a matrix of dissimilarities as an input and produce a configuration
of points. However, `mdscale` offers a choice of
different criteria to construct the configuration, and allows missing
data and weights.

For example, the cereal data include measurements on 10 variables
describing breakfast cereals. You can use `mdscale` to
visualize these data in two dimensions. First, load the data. For
clarity, this example code selects a subset of 22 of the observations.

load cereal.mat X = [Calories Protein Fat Sodium Fiber ... Carbo Sugars Shelf Potass Vitamins]; % Take a subset from a single manufacturer mfg1 = strcmp('G',cellstr(Mfg)); X = X(mfg1,:); size(X) ans = 22 10

Then use `pdist` to transform
the 10-dimensional data into dissimilarities. The output from `pdist` is
a symmetric dissimilarity matrix, stored as a vector containing only
the (23*22/2) elements in its upper triangle.

dissimilarities = pdist(zscore(X),'cityblock'); size(dissimilarities) ans = 1 231

This example code first standardizes the cereal data, and then uses city block distance as a dissimilarity. The choice of transformation to dissimilarities is application-dependent, and the choice here is only for simplicity. In some applications, the original data are already in the form of dissimilarities.

Next, use `mdscale` to perform metric MDS.
Unlike `cmdscale`, you must specify the desired number
of dimensions, and the method to use to construct the output configuration.
For this example, use two dimensions. The metric STRESS criterion
is a common method for computing the output; for other choices, see
the `mdscale` reference page in
the online documentation. The second output from `mdscale` is
the value of that criterion evaluated for the output configuration.
It measures the how well the inter-point distances of the output configuration
approximate the original input dissimilarities:

[Y,stress] =... mdscale(dissimilarities,2,'criterion','metricstress'); stress stress = 0.1856

A scatterplot of the output from `mdscale` represents
the original 10-dimensional data in two dimensions, and you can use
the `gname` function to label selected
points:

plot(Y(:,1),Y(:,2),'o','LineWidth',2); gname(Name(mfg1))

Metric multidimensional scaling creates a configuration of points
whose inter-point distances approximate the given dissimilarities.
This is sometimes too strict a requirement, and non-metric scaling
is designed to relax it a bit. Instead of trying to approximate the
dissimilarities themselves, non-metric scaling approximates a nonlinear,
but monotonic, transformation of them. Because of the monotonicity,
larger or smaller distances on a plot of the output will correspond
to larger or smaller dissimilarities, respectively. However, the nonlinearity
implies that `mdscale` only attempts to preserve
the ordering of dissimilarities. Thus, there may be contractions or
expansions of distances at different scales.

You use `mdscale` to perform nonmetric MDS
in much the same way as for metric scaling. The nonmetric STRESS criterion
is a common method for computing the output; for more choices, see
the `mdscale` reference page in the online documentation.
As with metric scaling, the second output from `mdscale` is
the value of that criterion evaluated for the output configuration.
For nonmetric scaling, however, it measures the how well the inter-point
distances of the output configuration approximate the disparities.
The disparities are returned in the third output. They are the transformed
values of the original dissimilarities:

[Y,stress,disparities] = ... mdscale(dissimilarities,2,'criterion','stress'); stress stress = 0.1562

To check the fit of the output configuration to the dissimilarities, and to understand the disparities, it helps to make a Shepard plot:

distances = pdist(Y); [dum,ord] = sortrows([disparities(:) dissimilarities(:)]); plot(dissimilarities,distances,'bo', ... dissimilarities(ord),disparities(ord),'r.-', ... [0 25],[0 25],'k-') xlabel('Dissimilarities') ylabel('Distances/Disparities') legend({'Distances' 'Disparities' '1:1 Line'},... 'Location','NorthWest');

This plot shows that `mdscale` has found a
configuration of points in two dimensions whose inter-point distances
approximates the disparities, which in turn are a nonlinear transformation
of the original dissimilarities. The concave shape of the disparities
as a function of the dissimilarities indicates that fit tends to contract
small distances relative to the corresponding dissimilarities. This
may be perfectly acceptable in practice.

`mdscale` uses an iterative algorithm to find
the output configuration, and the results can often depend on the
starting point. By default, `mdscale` uses `cmdscale` to
construct an initial configuration, and this choice often leads to
a globally best solution. However, it is possible for `mdscale` to
stop at a configuration that is a local minimum of the criterion.
Such cases can be diagnosed and often overcome by running `mdscale` multiple
times with different starting points. You can do this using the `'start'` and `'replicates'` parameters.
The following code runs five replicates of MDS, each starting at a
different randomly-chosen initial configuration. The criterion value
is printed out for each replication; `mdscale` returns
the configuration with the best fit.

opts = statset('Display','final'); [Y,stress] =... mdscale(dissimilarities,2,'criterion','stress',... 'start','random','replicates',5,'Options',opts); 35 iterations, Final stress criterion = 0.156209 31 iterations, Final stress criterion = 0.156209 48 iterations, Final stress criterion = 0.171209 33 iterations, Final stress criterion = 0.175341 32 iterations, Final stress criterion = 0.185881

Notice that `mdscale` finds several different
local solutions, some of which do not have as low a stress value as
the solution found with the `cmdscale` starting point.

Was this topic helpful?