Distance in Statistics
2017-03-28 18:02
Minkowski distance
The Minkowski distance is a metric in a normed vector space which can be considered as a generalization of both the Euclidean distance and the Manhattan distance.
Definition
The Minkowski distance of order $p$ between two points $X=(x_1,x_2,\dots,x_n)$ and $Y=(y_1,y_2,\dots,y_n)\in\mathbb{R}^n$ is defined as
$$\left(\sum_{i=1}^{n}|x_i-y_i|^p\right)^{1/p}$$
For $p\geq 1$, the Minkowski distance is a metric as a result of the Minkowski inequality. When $0<p<1$, the distance between $(0,0)$ and $(1,1)$ is $2^{1/p}>2$, but the point $(0,1)$ is at a distance 1 from both of these points. Since this violates the triangle inequality, the Minkowski distance is not a metric for such $p$.
The Minkowski distance is typically used with $p$ equal to 1 or 2. The latter is the Euclidean distance, while the former is sometimes known as the Manhattan distance. In the limiting case of $p$ approaching infinity, we obtain the Chebyshev distance:
$$\lim_{p\to\infty}\left(\sum_{i=1}^{n}|x_i-y_i|^p\right)^{1/p}=\max_{i=1}^{n}|x_i-y_i|$$
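Similarly, for $p$ approaching negative infinity (assuming all component-wise differences are nonzero), we have:
$$\lim_{p\to-\infty}\left(\sum_{i=1}^{n}|x_i-y_i|^p\right)^{1/p}=\min_{i=1}^{n}|x_i-y_i|$$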
The Minkowski distance can also be viewed as a multiple (by the factor $n^{1/p}$) of the power mean of the component-wise absolute differences between $X$ and $Y$.
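The special cases above can be checked numerically. The following is a minimal sketch in plain Python; the function name `minkowski_distance` and the sample points are illustrative choices, not part of the original text. It treats `p = ±inf` as the limiting cases and assumes `p` is nonzero.

```python
import math

def minkowski_distance(x, y, p):
    """Minkowski distance of order p between two equal-length sequences.

    Assumes p is nonzero; p = +inf and p = -inf are handled as the limits.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if math.isinf(p):
        # Limiting cases: Chebyshev distance for +inf, smallest difference for -inf.
        return max(diffs) if p > 0 else min(diffs)
    return sum(d ** p for d in diffs) ** (1.0 / p)

x, y = (0.0, 0.0), (3.0, 4.0)
manhattan = minkowski_distance(x, y, 1)         # |3| + |4|
euclidean = minkowski_distance(x, y, 2)         # sqrt(3^2 + 4^2)
chebyshev = minkowski_distance(x, y, math.inf)  # max(|3|, |4|)

# With p = 0.5 the triangle inequality fails for (0,0), (0,1), (1,1):
# going directly is "longer" than the detour through (0,1).
direct = minkowski_distance((0, 0), (1, 1), 0.5)
detour = (minkowski_distance((0, 0), (0, 1), 0.5)
          + minkowski_distance((0, 1), (1, 1), 0.5))
```

Here `direct` evaluates to $2^{1/0.5}=4$ while `detour` is $1+1=2$, which is the violation described in the text.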
Mahalanobis distance
Unlike the Euclidean distance, the Mahalanobis distance takes into account the correlations between each pair of dimensions.
Definition
For random vectors $X\in\mathbb{R}^m$ and $Y\in\mathbb{R}^n$, the $m\times n$ cross-covariance matrix is equal to
$$\operatorname{Cov}(X,Y)=E\left[(X-E[X])(Y-E[Y])^T\right]=E[XY^T]-E[X]E[Y]^T$$
Similarly, the covariance matrix $\Sigma$ of a random vector $X\in\mathbb{R}^n$ is
$$\Sigma(X)=\operatorname{Cov}(X,X)$$
The Mahalanobis distance between two points $X$ and $Y$ is defined as
$$D_M(X,Y)=\sqrt{(X-Y)^T\Sigma^{-1}(X-Y)}$$
If the covariance matrix is the identity matrix, the Mahalanobis distance reduces to the Euclidean distance. If the covariance matrix is diagonal, then the resulting distance is called a normalized Euclidean distance:
$$d(X,Y)=\sqrt{\sum_{i=1}^{n}\frac{(x_i-y_i)^2}{\sigma_i^2}}$$
where $\sigma_i$ is the standard deviation of the $i$-th component.
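The definition and its two special cases can be illustrated with a small sketch in plain Python. The helper `solve_linear_system`, the function name `mahalanobis_distance`, and the three example covariance matrices are assumptions made for illustration; the code solves $\Sigma z = X-Y$ by Gauss-Jordan elimination rather than forming $\Sigma^{-1}$ explicitly.

```python
import math

def solve_linear_system(a, b):
    """Solve a z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * w for v, w in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def mahalanobis_distance(x, y, cov):
    """sqrt((x - y)^T cov^{-1} (x - y)) for a given covariance matrix."""
    diff = [a - b for a, b in zip(x, y)]
    z = solve_linear_system(cov, diff)  # z = cov^{-1} (x - y)
    return math.sqrt(sum(d * v for d, v in zip(diff, z)))

identity = [[1.0, 0.0], [0.0, 1.0]]
diagonal = [[4.0, 0.0], [0.0, 9.0]]    # variances 4 and 9
correlated = [[2.0, 1.0], [1.0, 2.0]]

d_identity = mahalanobis_distance((0, 0), (3, 4), identity)      # Euclidean case
d_diagonal = mahalanobis_distance((0, 0), (2, 3), diagonal)      # normalized Euclidean case
d_correlated = mahalanobis_distance((0, 0), (1, 1), correlated)  # off-diagonal terms matter
```

With the identity matrix the result is the ordinary Euclidean distance 5. With the diagonal matrix it is $\sqrt{2^2/4+3^2/9}=\sqrt{2}$, matching the normalized Euclidean formula. In practice the covariance matrix would be estimated from data, which this sketch does not cover.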
Hamming distance
In information theory, the Hamming distance between two strings of equal length is the number of positions at which the corresponding symbols are different. In other words, it measures the minimum number of substitutions required to change one string into the other, or the minimum number of errors that could have transformed one string into the other.
Examples
The Hamming distance between:
“karolin” and “kathrin” is 3.
“karolin” and “kerstin” is 3.
1011101 and 1001001 is 2.
2173896 and 2233796 is 3.
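The examples above can be reproduced with a short sketch in Python. The function name `hamming_distance` is an illustrative choice, and the numeric examples are treated as strings of digits.

```python
def hamming_distance(s, t):
    """Number of positions at which two equal-length sequences differ."""
    if len(s) != len(t):
        raise ValueError("Hamming distance requires sequences of equal length")
    return sum(a != b for a, b in zip(s, t))

examples = [
    ("karolin", "kathrin"),
    ("karolin", "kerstin"),
    ("1011101", "1001001"),
    ("2173896", "2233796"),
]
distances = [hamming_distance(s, t) for s, t in examples]
# distances == [3, 3, 2, 3]
```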
Jaccard similarity coefficient
The Jaccard similarity coefficient measures similarity between finite sample sets, and is defined as the size of the intersection divided by the size of the union of the sample sets:
$$J(A,B)=\frac{|A\cap B|}{|A\cup B|}$$
The Jaccard distance is then defined as
$$J_\delta(A,B)=1-J(A,B)$$
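A minimal sketch of both quantities using Python's built-in sets follows. The function names and sample sets are illustrative, and returning a similarity of 1 for two empty sets is a convention assumed here, since the formula is undefined in that case.

```python
def jaccard_similarity(a, b):
    """|A ∩ B| / |A ∪ B| for two finite collections, treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Assumed convention: two empty sets are considered identical.
        return 1.0
    return len(a & b) / len(union)

def jaccard_distance(a, b):
    """Jaccard distance: 1 minus the Jaccard similarity."""
    return 1.0 - jaccard_similarity(a, b)

set_a = {1, 2, 3, 4}
set_b = {3, 4, 5, 6}
similarity = jaccard_similarity(set_a, set_b)  # 2 shared elements out of 6 in total
distance = jaccard_distance(set_a, set_b)
```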
Pearson Correlation coefficient
The Pearson correlation coefficient is the covariance of two variables divided by the product of their standard deviations:
$$\rho_{X,Y}=\frac{\operatorname{Cov}(X,Y)}{\sigma_X\sigma_Y}$$
Then, the correlation distance is defined as
$$D_{X,Y}=1-\rho_{X,Y}$$
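As a sketch, the coefficient can be estimated from paired samples in plain Python. The function names and sample data are illustrative assumptions. Because the same $1/n$ factor appears in the covariance and in both variances, it cancels and is omitted in the code.

```python
import math

def pearson_correlation(xs, ys):
    """Sample estimate of the Pearson correlation of two paired sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of centered products; the common 1/n factor cancels in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

def correlation_distance(xs, ys):
    """Correlation distance: 1 minus the Pearson correlation, in [0, 2]."""
    return 1.0 - pearson_correlation(xs, ys)

xs = [1, 2, 3, 4, 5]
d_same_trend = correlation_distance(xs, [2, 4, 6, 8, 10])      # perfectly correlated
d_opposite_trend = correlation_distance(xs, [10, 8, 6, 4, 2])  # perfectly anti-correlated
```

The distance is 0 for perfectly correlated variables and 2 for perfectly anti-correlated ones.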
Information entropy
Information entropy is a measure of the dispersion of a distribution:
$$\operatorname{Entropy}(X)=\sum_{i=1}^{n}-p_i\log_2 p_i$$
where
$n$: the number of classes in the sample set $X$
$p_i$: the probability that a sample belongs to the $i$-th class
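The formula can be sketched in Python for a list of class labels. Estimating each $p_i$ by the relative frequency of the class in the sample is an assumption of this example, as are the function name and the sample labels.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy in bits of a sample, with p_i estimated by relative frequency."""
    total = len(labels)
    counts = Counter(labels)
    # Classes that do not occur contribute nothing, so only observed classes are summed.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

h_balanced = entropy(["a", "a", "b", "b"])     # two equally likely classes
h_pure = entropy(["a", "a", "a", "a"])         # a single class, no uncertainty
h_four_classes = entropy(["a", "b", "c", "d"])  # four equally likely classes
```

Two equally likely classes give 1 bit, a single class gives 0, and four equally likely classes give 2 bits.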