Classic Paper Study: Bag of Features (2)
2014-12-17 16:38
Bag-of-Words
The bag-of-words (BoW) model is a common document representation in information retrieval. The BoW model ignores a document's word order, grammar, and syntax, treating the document simply as a collection of words, where each word's occurrence is independent of whether any other word occurs. For example, take the following two documents:
Document 1: Bob likes to play basketball, Jim likes too.
Document 2: Bob also likes to play football games.
From these two text documents, extract the individual words and build a dictionary:
Dictionary = {1. "Bob", 2. "likes", 3. "to", 4. "play", 5. "basketball", 6. "also", 7. "football", 8. "games", 9. "Jim", 10. "too"}.
The dictionary contains 10 distinct words. Using it, count each word's occurrences in the two documents; each document can then be represented as a 10-dimensional vector:
Document 1: [1, 2, 1, 1, 1, 0, 0, 0, 1, 1]
Document 2: [1, 1, 1, 1, 0, 1, 1, 1, 0, 0]
If the word histograms of documents in each category exhibit a characteristic pattern, that pattern can be exploited to classify massive document collections.
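The construction above can be sketched in a few lines of Python (a minimal illustration, not from any paper; note that this vocabulary is built in order of first appearance, so the indices differ from the dictionary above, and no stemming is applied):

```python
# Minimal bag-of-words sketch: naive tokenization (lowercase, strip
# punctuation), vocabulary in order of first appearance, raw term counts.
from collections import Counter

def tokenize(doc):
    return doc.lower().replace(",", "").replace(".", "").split()

def build_vocab(docs):
    vocab = []
    for doc in docs:
        for word in tokenize(doc):
            if word not in vocab:
                vocab.append(word)
    return vocab

def bow_vector(doc, vocab):
    counts = Counter(tokenize(doc))
    return [counts[w] for w in vocab]

docs = ["Bob likes to play basketball, Jim likes too.",
        "Bob also likes to play football games."]
vocab = build_vocab(docs)
vectors = [bow_vector(d, vocab) for d in docs]
print(vectors[0])  # [1, 2, 1, 1, 1, 1, 1, 0, 0, 0]
print(vectors[1])  # [1, 1, 1, 1, 0, 0, 0, 1, 1, 1]
```

The counts match the text's example; only the word order in the vocabulary differs.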
Bag of Features
1.1 [CVPR06] Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories
Abstract: The paper carries the BoW idea over to images, with the "word" replaced by a particular local feature descriptor. This alone discards the spatial layout of the image entirely and is "incapable of capturing shape or of segmenting an object from its background", so the authors combine it with spatial pyramid matching: "Our method involves repeatedly subdividing the image and computing histograms of local features at increasingly fine resolutions."
Comparison: the paper positions itself against prior work in three respects:
- Locally orderless images: SPM can be seen as an alternative formulation of a locally orderless image; instead of a Gaussian scale space of apertures, it defines a fixed hierarchy of rectangular windows.
- Multiresolution histograms: SPM fixes the resolution at which the features are computed, but varies the spatial resolution at which they are aggregated.
- Subdivide and disorder: the best subdivision scheme may be achieved when multiple resolutions are combined in a principled way; the reason for the empirical success of "subdivide and disorder" techniques is that they actually perform approximate geometric matching.
Pyramid Match Kernels:
Let X and Y be two sets of feature vectors. The pyramid match computes an approximate correspondence between X and Y by "placing a sequence of increasingly coarser grids over the feature space and taking a weighted sum of the number of matches that occur at each level of resolution." Two points match when they fall into the same cell; resolution levels run from 0 (coarsest) to L (finest).
At level \ell the space is divided into D = 2^{d\ell} cells (d dimensions, 2^\ell cells per dimension; are these cells the same as the cluster centers discussed later?). The number of matches at level \ell is given by the histogram intersection (1):

I^\ell = I(H_X^\ell, H_Y^\ell) = \sum_{i=1}^{D} \min(H_X^\ell(i), H_Y^\ell(i))    (1)

The weight of level \ell is set to 1/2^{L-\ell}. Note that matches at a lower (coarser) level include all matches at the finer levels, so the number of new matches at level \ell is I^\ell - I^{\ell+1}, and the pyramid match kernel is given by (2):

\kappa^L(X, Y) = I^L + \sum_{\ell=0}^{L-1} \frac{1}{2^{L-\ell}} (I^\ell - I^{\ell+1}) = \frac{1}{2^L} I^0 + \sum_{\ell=1}^{L} \frac{1}{2^{L-\ell+1}} I^\ell    (2)
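The two equations above can be sketched directly (an illustrative toy, assuming features already lie in [0, 1)^d; the paper's actual pipeline quantizes features by clustering instead of a uniform grid):

```python
# Sketch of the pyramid match kernel: bin points at levels 0..L,
# count matches by histogram intersection, weight by 1/2**(L-l+1).
from collections import Counter

def histogram(points, level):
    # 2**level cells per dimension; clamp so x == 1.0 stays in range.
    n = 2 ** level
    return Counter(tuple(min(int(x * n), n - 1) for x in p) for p in points)

def intersection(h1, h2):
    # Eq. (1): number of matches = sum of per-cell minima.
    return sum(min(c, h2[cell]) for cell, c in h1.items())

def pyramid_match(X, Y, L):
    I = [intersection(histogram(X, l), histogram(Y, l)) for l in range(L + 1)]
    # Eq. (2), second form: I^0 weighted 1/2**L, level l weighted 1/2**(L-l+1).
    return I[0] / 2 ** L + sum(I[l] / 2 ** (L - l + 1) for l in range(1, L + 1))

X = [(0.1,), (0.6,)]
Y = [(0.15,), (0.9,)]
print(pyramid_match(X, Y, L=1))  # 2.0 (both points match at both levels)
```

Note the maximal kernel value equals the set size, reached when the two sets coincide at every level.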
Spatial Matching Scheme
The scheme performs "pyramid matching in the two-dimensional image space, and uses traditional clustering techniques in feature space." (In image space the pixel coordinates already encode the geometric information, so the vectors only need to be arranged by coordinate. In feature space, clustering quantizes the features into M channels, which is what "falling into the same cell" means above; H is the per-channel histogram, and a smaller I means the two images are less correlated.) Summing the per-channel pyramid kernels gives the final kernel (3):

K^L(X, Y) = \sum_{m=1}^{M} \kappa^L(X_m, Y_m)    (3)

The dimension is:

M \sum_{\ell=0}^{L} 4^\ell = M \, \frac{4^{L+1} - 1}{3}

(In implementation, K^L(X, Y) is not computed as a sum; the I vectors at every level are concatenated into one long vector.) With M = 400 and L = 3, d = 34000: long and sparse.
Normalize all histograms by the total weight of all features in the image.
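The dimension count can be checked directly (`spm_dim` is a made-up helper name, not from the paper):

```python
# Dimension of the concatenated SPM vector: M channels, each with
# sum_{l=0}^{L} 4**l spatial cells (a 2**l x 2**l grid at level l, d = 2).
def spm_dim(M, L):
    return M * sum(4 ** l for l in range(L + 1))

print(spm_dim(400, 3))  # 34000, matching the paper's "long and sparse" example
```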
histogram intersection function
The histogram intersection function is used to match feature histograms by similarity; its formula is eq. (1):

I(H_X, H_Y) = \sum_{i=1}^{D} \min(H_X(i), H_Y(i))    (1)

[Figure (2), an illustration of the histogram intersection pyramid, is not preserved here.] In figure (2), y and z in panel (a) are two data distributions, and the three sub-figures correspond to three pyramid levels; the evenly spaced dashed lines in each mark the histogram bin width. The larger the pyramid level L, the narrower the bins and the more of them there are. The red and blue points stay fixed, but depending on the bin width they fall into different bins, as shown in (b). Panel (c) shows the result at each level, obtained by intersecting the two histograms in (b); the number of matches is printed below each plot, e.g. x0 = 2, x1 = 4, x2 = 3 (the original figure says 5; is that an error?).
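The figure's effect can be reproduced with a toy 1-D example (the point sets below are made up, not the figure's data): widening the bins can only increase the intersection count, since points that matched in a narrow bin still share the wider bin that contains it.

```python
# Same two point sets binned at increasingly coarse resolutions;
# the histogram intersection (number of matches) grows monotonically.
from collections import Counter

def matches(a, b, width):
    ha = Counter(int(x / width) for x in a)
    hb = Counter(int(x / width) for x in b)
    return sum(min(c, hb[k]) for k, c in ha.items())

y = [0.5, 1.5, 3.2, 4.8, 6.1]
z = [0.6, 2.9, 3.3, 5.9, 6.4]
for width in (1, 2, 4):  # finer -> coarser bins
    print(width, matches(y, z, width))  # -> 1 3, 2 4, 4 5
```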
Q:
Comparing with the example and the code: eq. (1) here apparently serves, via the SPM-style kernels (2)/(3), to produce H and H' (the histogram statistics over the features, though the meaning of I there differs from the original paper). A cell is a cluster center; a match means landing in the same cluster center; H is the histogram over features of the same cluster; (1) then computes I, i.e. the number of matches, via the histogram intersection function.
As the paper reads, the order should be: first compute (1), then (2)/(3). For deeper understanding see "The Pyramid Match Kernel: Discriminative Classification with Sets of Image Features". My own reading: the weight constants and the channels in (2)/(3) can be multiplied in up front (or accounted for from the start), so the example and the code appear to apply (2)/(3) first and let (1) run last to determine the number of matches, i.e. the similarity (intersection kernel) of the two histograms. The original paper expresses this with a formula (the equation image is not preserved here).
Local vs. global feature representations: the paper describes SPM as achieving "approximate global geometric correspondence", yet also as "an alternative formulation of a locally orderless image". How should these two views be reconciled, and how are traditional local and global features defined? Which methods fall into each category?
PS:
Partly sourced from: http://blog.csdn.net/v_JULY_v