data mining notes
2015-02-05 17:12
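The dissimilarity between two objects i and j can be computed from the ratio of mismatches:
d(i,j) = (p-m)/p;
where m is the number of matches (i.e., the number of attributes for which i and j are in the same state), and p is the total number of attributes describing the objects.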
Similarity
sim(i,j)=1-d(i,j);
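The mismatch ratio can be sketched in a few lines of Python. This is a minimal illustration, assuming each object is a plain sequence of nominal values; the function names nominal_dissimilarity and nominal_similarity are made up for this example and are not from the notes.

```python
def nominal_dissimilarity(a, b):
    """d(i,j) = (p - m) / p for two equal-length sequences of nominal values."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("objects must have the same non-zero number of attributes")
    p = len(a)                                  # total number of attributes
    m = sum(1 for x, y in zip(a, b) if x == y)  # number of matching attributes
    return (p - m) / p


def nominal_similarity(a, b):
    """sim(i,j) = 1 - d(i,j)."""
    return 1.0 - nominal_dissimilarity(a, b)


# Hypothetical objects described by three nominal attributes.
obj_i = ["red", "small", "round"]
obj_j = ["red", "large", "round"]
print(nominal_dissimilarity(obj_i, obj_j))  # one mismatch out of three attributes
print(nominal_similarity(obj_i, obj_j))
```

Here two of the three attributes match, so the dissimilarity is 1/3 and the similarity is 2/3.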
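For symmetric binary attributes, both states are equally important. Dissimilarity based on symmetric binary attributes is called symmetric binary dissimilarity.
d(i,j)=(r+s)/(q+r+s+t);
where q is the number of attributes equal to 1 for both objects, r the number equal to 1 for i but 0 for j, s the number equal to 0 for i but 1 for j, and t the number equal to 0 for both.
For asymmetric binary attributes, the two states are not equally important. In the asymmetric binary dissimilarity, the number of negative matches t is considered unimportant and is dropped:
d(i,j)=(r+s)/(q+r+s);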
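Both binary measures can be computed from the same four counts. The sketch below assumes the usual contingency-table convention (q: both 1, r: 1 for i and 0 for j, s: 0 for i and 1 for j, t: both 0) and objects encoded as lists of 0/1 integers; the function names are hypothetical.

```python
def binary_counts(a, b):
    """Return (q, r, s, t) for two equal-length 0/1 sequences."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("objects must have the same non-zero number of attributes")
    counts = {(1, 1): 0, (1, 0): 0, (0, 1): 0, (0, 0): 0}
    for x, y in zip(a, b):
        counts[(x, y)] += 1  # raises KeyError if a value is not 0 or 1
    return counts[(1, 1)], counts[(1, 0)], counts[(0, 1)], counts[(0, 0)]


def symmetric_binary_dissimilarity(a, b):
    """d(i,j) = (r + s) / (q + r + s + t)."""
    q, r, s, t = binary_counts(a, b)
    return (r + s) / (q + r + s + t)


def asymmetric_binary_dissimilarity(a, b):
    """d(i,j) = (r + s) / (q + r + s); negative matches t are ignored."""
    q, r, s, _ = binary_counts(a, b)
    if q + r + s == 0:
        raise ValueError("undefined when both objects are 0 on every attribute")
    return (r + s) / (q + r + s)


# Hypothetical objects with six binary attributes: q=1, r=1, s=1, t=3.
obj_i = [1, 0, 1, 0, 0, 0]
obj_j = [1, 0, 0, 0, 1, 0]
print(symmetric_binary_dissimilarity(obj_i, obj_j))
print(asymmetric_binary_dissimilarity(obj_i, obj_j))
```

With these objects the symmetric measure is 2/6 and the asymmetric measure is 2/3: ignoring the three negative matches makes the objects look less alike.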
Dissimilarity of numeric attributes: Euclidean distance, Manhattan distance, Minkowski distance;
Euclidean distance: d(i,j)=sqrt(power((x1-y1),2) + power((x2-y2),2) + ... + power((xn-yn),2));
Manhattan distance: d(i,j)=abs(x1-y1)+abs(x2-y2)+...+abs(xn-yn);
Supremum distance: d(i,j)=max(abs(x1-y1), abs(x2-y2), ..., abs(xn-yn)), i.e. the largest absolute difference in any single dimension of the two objects
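The numeric distances can be sketched as below. The notes name the Minkowski distance without giving its formula, so the general form used here, (sum of abs(xf-yf)^h)^(1/h), is the standard textbook definition rather than something taken from the notes; all function names are illustrative.

```python
import math


def minkowski_distance(x, y, h):
    """General form: (sum |xf - yf|^h)^(1/h), for h >= 1."""
    return sum(abs(a - b) ** h for a, b in zip(x, y)) ** (1.0 / h)


def euclidean_distance(x, y):
    """Minkowski distance with h = 2."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan_distance(x, y):
    """Minkowski distance with h = 1."""
    return sum(abs(a - b) for a, b in zip(x, y))


def supremum_distance(x, y):
    """Largest absolute difference over all dimensions (limit as h grows)."""
    return max(abs(a - b) for a, b in zip(x, y))


# Hypothetical two-dimensional objects.
x = (1, 2)
y = (3, 5)
print(euclidean_distance(x, y))   # sqrt(2^2 + 3^2)
print(manhattan_distance(x, y))   # 2 + 3
print(supremum_distance(x, y))    # max(2, 3)
```

For these two points the Manhattan distance is 5 and the supremum distance is 3, with the Euclidean distance sqrt(13) in between.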
-------------------------------------------------------
weighted Euclidean distance
d(i,j)=sqrt(w1*power((x1-y1),2) + w2*power((x2-y2),2) + ... + wn*power((xn-yn),2)), where wf is the weight given to attribute f
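A minimal sketch of the weighted variant, assuming one non-negative weight per attribute (the notes write a single "weight", so per-attribute weights are an assumption here); the function name and the sample weights are made up for illustration.

```python
import math


def weighted_euclidean_distance(x, y, weights):
    """sqrt(sum of w_f * (x_f - y_f)^2) with one weight per attribute."""
    if not (len(x) == len(y) == len(weights)):
        raise ValueError("x, y and weights must have the same length")
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights)))


# Hypothetical objects; the second attribute counts a quarter as much.
x = (1, 2)
y = (3, 5)
weights = (1.0, 0.25)
print(weighted_euclidean_distance(x, y, weights))  # sqrt(1*4 + 0.25*9)
```

Setting every weight to 1 gives back the ordinary Euclidean distance.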
--------------------------------------------------------
So how can we calculate the dissimilarity between objects that have attributes of mixed types?
One method is to group the attributes by type and then carry out a separate
data mining analysis for each type. However, in real applications, analyzing each attribute type
separately is unlikely to produce compatible results.
A better way is to process all attribute types together in a single analysis. One such technique combines the different attributes into a single dissimilarity matrix
and brings all meaningful attributes onto the common interval [0.0,1.0].
Assume that the dataset contains p attributes of mixed types. The dissimilarity between
objects i and j is then computed by combining their per-attribute dissimilarities, each scaled to [0.0,1.0].
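One common way to combine mixed attributes, sketched here as an assumption because the notes do not spell out the formula, is to average the per-attribute dissimilarities over the attributes that are usable for the pair: d(i,j) = sum(delta_f * d_f) / sum(delta_f), where delta_f is 0 when a value is missing (or when an asymmetric binary attribute is 0 for both objects) and 1 otherwise. The attribute kinds, the ranges argument and the function name below are hypothetical.

```python
def mixed_dissimilarity(a, b, kinds, ranges):
    """Average per-attribute dissimilarity in [0, 1] over usable attributes.

    kinds[f]  is "numeric", "nominal" or "asym_binary" (illustrative labels).
    ranges[f] is the (min, max) of numeric attribute f over the dataset.
    A value of None marks a missing measurement.
    """
    total = 0.0
    usable = 0
    for f, (x, y, kind) in enumerate(zip(a, b, kinds)):
        if x is None or y is None:
            continue  # delta_f = 0: missing value
        if kind == "asym_binary" and x == 0 and y == 0:
            continue  # delta_f = 0: negative match is ignored
        if kind == "numeric":
            lo, hi = ranges[f]
            d = abs(x - y) / (hi - lo) if hi > lo else 0.0  # scale to [0, 1]
        elif kind in ("nominal", "asym_binary"):
            d = 0.0 if x == y else 1.0
        else:
            raise ValueError("unknown attribute kind: %r" % (kind,))
        total += d
        usable += 1
    if usable == 0:
        raise ValueError("no usable attributes for this pair")
    return total / usable


# Hypothetical schema: colour (nominal), score (numeric, 0..100), test (asym. binary).
kinds = ["nominal", "numeric", "asym_binary"]
ranges = {1: (0.0, 100.0)}
obj_i = ["red", 20.0, 1]
obj_j = ["blue", 70.0, 0]
print(mixed_dissimilarity(obj_i, obj_j, kinds, ranges))  # (1 + 0.5 + 1) / 3
```

Because every per-attribute term already lies in [0.0, 1.0], the combined value does too, which is what lets different attribute types share one dissimilarity matrix.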
-------------------------------------------------
the cosine similarity:
s(i,j)=(i*j)/(|i|*|j|)=((x1*y1)+(x2*y2)+...+(xn*yn))/(sqrt(power(x1,2)+power(x2,2)+...+power(xn,2))*sqrt(power(y1,2)+power(y2,2)+...+power(yn,2)))
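The cosine similarity can be sketched directly from its definition: the dot product divided by the product of the two vector lengths. The function name and the sample vectors are illustrative; raising an error for a zero vector is a design choice of this sketch, since the measure is undefined there.

```python
import math


def cosine_similarity(x, y):
    """Dot product of x and y divided by the product of their Euclidean norms."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_x * norm_y)


# Hypothetical term-frequency vectors.
parallel = cosine_similarity((1, 0, 2), (2, 0, 4))   # same direction, close to 1
orthogonal = cosine_similarity((1, 0), (0, 1))       # no shared terms, 0
print(parallel, orthogonal)
```

Only the direction of the vectors matters: doubling every component of one vector leaves the similarity unchanged.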
---------------------------------------------------
This article is from the "welcome" blog; please be sure to keep this source: http://friendsforever.blog.51cto.com/3916357/1612048