In statistics, **Mahalanobis distance** is a distance measure introduced by P. C. Mahalanobis in 1936. It is based on correlations between variables by which different patterns can be identified and analysed. It is a useful way of determining *similarity* of an unknown sample set to a known one. It differs from Euclidean distance in that it takes into account the correlations of the data set.

Formally, the Mahalanobis distance from a group of values with mean $ \mu = ( \mu_1, \mu_2, \mu_3, \dots , \mu_p ) $

and covariance matrix $ \Sigma $

for a multivariate vector $ x = ( x_1, x_2, x_3, \dots, x_p ) $

is defined as:

- $ D_M(x) = \sqrt{(x - \mu)^T \Sigma^{-1} (x-\mu)}.\, $

Mahalanobis distance can also be defined as dissimilarity measure between two random vectors $ \vec{x} $

and $ \vec{y} $ of the same distribution with the covariance matrix

$ \Sigma $

:

- $ d(\vec{x},\vec{y})=\sqrt{(\vec{x}-\vec{y})^T\Sigma^{-1} (\vec{x}-\vec{y})}.\, $

If the covariance matrix is the identity matrix then it is the same as
Euclidean distance. If covariance matrix is diagonal, then it is called *normalized Euclidean distance*:

- $ d(\vec{x},\vec{y})= \sqrt{\sum_{i=1}^p {(x_i - y_i)^2 \over \sigma_i^2}}, $

where $ \sigma_i $

is the standard deviation of the $ x_i $ over the sample set.