FREAK: Fast Retina Keypoint
Alexandre Alahi, Raphael Ortiz, Pierre Vandergheynst
École Polytechnique Fédérale de Lausanne (EPFL), Switzerland
Abstract
A large number of vision applications rely on matching
keypoints across images. The last decade featured
an arms-race towards faster and more robust keypoints
and association algorithms: Scale Invariant Feature Transform
(SIFT) [17], Speeded-Up Robust Features (SURF) [4], and
more recently Binary Robust Invariant Scalable Keypoints
(BRISK)[16] to name a few. These days, the deployment
of vision algorithms on smart phones and embedded devices
with low memory and computation complexity has
even upped the ante: the goal is to make descriptors faster
to compute, more compact while remaining robust to scale,
rotation and noise.
To best address the current requirements, we propose a
novel keypoint descriptor inspired by the human visual system
and more precisely the retina, coined Fast Retina Keypoint
(FREAK). A cascade of binary strings is computed by
efficiently comparing image intensities over a retinal sampling
pattern. Our experiments show that FREAKs are in
general faster to compute with lower memory load and also
more robust than SIFT, SURF or BRISK. They are thus competitive
alternatives to existing keypoints in particular for
embedded applications.
1. Introduction
Visual correspondence, object matching, and many other
vision applications rely on representing images with a sparse
set of keypoints. A real challenge is to efficiently describe
keypoints, i.e. image patches, with stable, compact
and robust representations invariant to scale, rotation, affine
transformation, and noise. The past decades have produced several
key approaches to describing keypoints efficiently and matching them.
The most popular descriptor is the histogram of oriented
gradient proposed by Lowe [17] to describe the Scale Invariant
Feature Transform (SIFT) keypoints. Most of the
effort in recent years has aimed at performing as well as SIFT [14]
at lower computational complexity. The Speeded-Up Robust
Feature (SURF) by Bay et al. [4] is a good example.
It has similar matching rates with much faster performance
Figure 1: Illustration of our FREAK descriptor. A series of Differences of
Gaussians (DoG) over a retinal pattern is quantized to 1 bit.
by describing keypoints with the responses of few Haar-like
filters. In general, Alahi et al. show in [2] that a grid of descriptors,
similar to SIFT and SURF, is better than a single
one to match an image region. Typically, a grid of covariance
matrices [30] attains high detection rate but remains
computationally too expensive for real-time applications.
The deployment of cameras on every phone coupled with
the growing computing power of mobile devices has enabled
a new trend: vision algorithms need to run on mobile
devices with low computing power and memory capacity.
Images obtained by smart phones can be used to
perform structure from motion [27], image retrieval [22],
or object recognition [15]. As a result, new algorithms
are needed where fixed-point operations and low memory
load are preferred. The Binary Robust Independent Elementary
Feature (BRIEF) [5], the Oriented Fast and Rotated
BRIEF (ORB)[26], and the Binary Robust Invariant
Scalable Keypoints[16] (BRISK) are good examples. In the
next section, we will briefly present these descriptors. Their
stimulating contribution is that a binary string obtained by
simply comparing pairs of image intensities can efficiently
describe a keypoint, i.e. an image patch. However, several
problems remain: how do we efficiently select the ideal pairs
within an image patch, and how do we match them? Interestingly,
this trend is in line with nature's strategy of describing
complex observations with simple rules. We propose to address
these unknowns by designing a descriptor inspired by
the human visual system, and more precisely the retina.
We propose the Fast Retina Keypoint (FREAK) as a fast,
compact and robust keypoint descriptor. A cascade of binary
strings is computed by efficiently comparing pairs of
image intensities over a retinal sampling pattern. Interestingly,
selecting pairs to reduce the dimensionality of the descriptor
yields a highly structured pattern that mimics the
saccadic search of the human eyes.
2. Related work
Keypoint descriptors are often coupled with their detection.
Tuytelaars et al. in [29] and Gauglitz et al. in [11] presented
a detailed survey. We briefly present state-of-the-art
detectors and mainly focus on descriptors.
2.1. Keypoint detectors
A first solution is to consider corners as keypoints. Harris
and Stephens in [12] proposed the Harris corner detector.
Mikolajczyk and Schmid made it scale invariant in [20].
Another solution is to use local extrema of the responses
of certain filters as potential keypoints. Lowe in [17] filtered
the image with differences of Gaussians. Bay et al.
in [4] used a Fast Hessian detector. Agrawal et al. in [1]
proposed simplified center-surround filters to approximate
the Laplacian. Ebrahimi and Mayol-Cuevas in [7] accelerated
the process by skipping the computation of the filter
response if the response for the previous pixel is very low.
Rosten and Drummond proposed in [25] the FAST criterion
for corner detection, improved by Mair et al. in [18]
with their AGAST detector. The latter is a fast algorithm to
locate keypoints. The detector used in BRISK by Leutenegger
et al. in [16] is a multi-scale AGAST. They search for
maxima in scale-space using the FAST score as a measure
of saliency. We use the same detector for our evaluation of
FREAK.
2.2. SIFT-like descriptors
Once keypoints are located, we are interested in describing
the image patch with a robust feature vector. The most
well-known descriptor is SIFT [17]. A 128-dimensional
vector is obtained from a grid of histograms of oriented gradient.
Its high descriptive power and robustness to illumination
change have ranked it as the reference keypoint descriptor
for the past decade. A family of SIFT-like descriptors has
emerged in the past years. PCA-SIFT [14] reduces the
description vector from 128 to 36 dimensions using principal
component analysis. The matching time is reduced, but the
time to build the descriptor is increased leading to a small
gain in speed and a loss of distinctiveness. The GLOH descriptor
[21] is an extension of the SIFT descriptor that is
more distinctive, but also more expensive to compute. The
robustness to change of viewpoint is improved in [31] by
simulating multiple deformations to the descriptive patch.
A good compromise between performance and the number
of simulated patches leads to an algorithm twice as slow as
SIFT. Ambai and Yoshida proposed Compact And Real-time
Descriptors (CARD) in [3] to extract the histogram of
oriented gradient from the grid binning of SIFT or the logpolar
binning of GLOH. The computation of the histograms
is simplified by using lookup tables.
One of the most widely used descriptors at the moment is
clearly SURF [4]. It has matching performance similar to
SIFT but is much faster. It also relies on local gradient histograms.
The Haar-wavelet responses are efficiently computed
with integral images leading to 64 or 128-dimensional
vectors. However, the dimensionality of the feature vector
is still too high for large-scale applications such as image
retrieval or 3D reconstruction. Often, Principal Component
Analysis (PCA) or hashing functions are used to reduce the
dimensionality of the descriptors [24]. Such steps involve
time-consuming computation and hence affect the real-time
performance.
2.3. Binary descriptors
Calonder et al. in [5] showed that it is possible to shortcut
the dimensionality reduction step by directly building
a short binary descriptor in which each bit is independent,
called BRIEF. A clear advantage of binary descriptors
is that the Hamming distance (bitwise XOR followed
by a bit count) can replace the usual Euclidean distance.
The descriptor vector is obtained by comparing the intensity
of 512 pairs of pixels after applying a Gaussian smoothing
to reduce the noise sensitivity. The positions of the pixels
are pre-selected randomly according to a Gaussian distribution
around the patch center. The obtained descriptor
is not invariant to scale and rotation changes unless coupled
with a detector providing them. Calonder et al. also highlighted
in their work that usually orientation detection reduces
the recognition rate and should therefore be avoided
when it is not required by the target application. Rublee et
al. in [26] proposed the Oriented Fast and Rotated BRIEF
(ORB) descriptor. Their binary descriptor is invariant to
rotation and robust to noise. Similarly, Leutenegger et al.
in [16] proposed a binary descriptor invariant to scale and
rotation called BRISK. To build the descriptor bit-stream,
a limited number of points in a specific sampling pattern
is used. Each point contributes to many pairs. The pairs
are divided into short-distance and long-distance subsets. The
long-distance subset is used to estimate the direction of the
keypoint, while the short-distance subset is used to build the
binary descriptor after rotating the sampling pattern.
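One practical appeal of binary descriptors discussed above is that matching reduces to the Hamming distance, i.e. a bitwise XOR followed by a bit count. The following minimal sketch (toy 8-bit descriptors; real descriptors are hundreds of bits) illustrates the operation:

```python
def hamming_distance(d1: int, d2: int) -> int:
    """Hamming distance between two binary descriptors:
    bitwise XOR, then count the set bits (popcount)."""
    return bin(d1 ^ d2).count("1")

# Two toy 8-bit descriptors differing in 2 bit positions.
a = 0b10110100
b = 0b10011100
print(hamming_distance(a, b))  # -> 2
```

On modern CPUs this maps to a single POPCNT instruction per machine word, which is why Hamming matching is so much cheaper than the Euclidean distances used with SIFT- or SURF-style float vectors.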
In Section 5, we compare our proposed FREAK descriptor
with the above presented descriptors. But first, we
present a possible intuition on why these trendy binary descriptors
can work based on the study of the human retina.
! " # $""
! " # %""
! " # &""
! " # '""
(" )"
! " # *""
! " # +""
! " # ,""
(" )"
- $.$$..$"
/0121345462137"
/894:7"
;<=>:81="?4::7"
@5A1="6124=A<:7"
B8=<3C"7238=>"
?4::7"
D8=4<3" E1=-:8=4<3"
FGH<=""
34A=<"
?1H6G243"
I8781="
Figure 2: From human retina to computer vision: the biological pathways
leading to action potentials are emulated by simple binary tests over pixel
regions. [Upper part of the image is courtesy of the book Avian Visual
Cognition by R. Cook.]
3. Human retina
3.1. Motivations
In the presented literature, we have seen that recent
progress in image representation has shown that simple intensity
comparisons over several pairs of pixels can be good
enough to describe and match image patches [5, 26, 16].
However, some open questions remain about the ideal
selection of pairs. How should we sample and compare
them? How can we be robust to noise? Should we smooth
with a single Gaussian kernel? In this work, we show how
to gain performance by selecting a solution inspired by the
human retina, while enforcing low computational complexity.
Neuroscience has made much progress in understanding
the visual system and how images are transmitted to the
brain [8]. It is believed that the human retina extracts details
from images using Difference of Gaussians (DoG) of
various sizes and encodes such differences with action potentials.
The topology of the retina plays an important role.
We propose to mimic the same strategy to design our image
descriptor.
3.2. Analogy: from retinal photoreceptors to pixels
The topology and spatial encoding of the retina are quite
fascinating. First, several photoreceptors influence a ganglion
cell. The region where light influences the response
of a ganglion cell is the receptive field. Its size and dendritic
field increase with radial distance from the foveola
(Figure 3). The spatial density of ganglion cells drops
exponentially with the distance to the foveola. The retina is segmented
into four areas: the foveola, fovea, parafoveal, and perifoveal.
Each area plays an interesting role in the process of
detecting and recognizing objects since higher resolution is
(a) Density of ganglion cells over
the retina [10].
(b) Retina areas [13]
Figure 3: Illustration of the distribution of ganglion cells over the retina.
The density is clustered into four areas: (a) the foveola, (b) fovea, (c)
parafoveal, and (d) perifoveal.
captured in the fovea whereas a low acuity image is formed
in the perifoveal. One can interpret the decrease of resolution
as a body resource optimization. Let us now turn these
insights into an actual keypoint descriptor. Figure 2 presents
the proposed analogy.
4. FREAK
4.1. Retinal sampling pattern
Many sampling grids are possible for comparing pairs of
pixel intensities. BRIEF and ORB use random pairs.
BRISK uses a circular pattern where points are equally
spaced on concentric circles, similar to DAISY [28]. We
propose to use the retinal sampling grid, which is also circular,
with the difference of having a higher density of points
near the center. The density of points drops exponentially,
as can be seen in Figure 3.
Each sample point needs to be smoothed to be less sensitive
to noise. BRIEF and ORB use the same kernel for
all points in the patch. To match the retina model, we
use a different kernel size for every sample point, similar
to BRISK. The difference from BRISK is the exponential
change in size and the overlapping receptive fields. Figure
4 illustrates the topology of the receptive fields. Each circle
represents the standard deviations of the Gaussian kernels
applied to the corresponding sampling points.
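A retina-like grid of this kind can be sketched in a few lines: points on concentric rings whose radii shrink exponentially toward the center, with a smoothing kernel that grows with eccentricity. The ring count, points per ring, and constants below are illustrative assumptions, not the paper's exact pattern:

```python
import math

def retinal_pattern(n_rings=7, points_per_ring=6, r_max=22.0, decay=0.7):
    """Sketch of a retina-like sampling grid: ring radii (and Gaussian
    kernel sizes) shrink exponentially toward the center, so point
    density is highest near the patch center, as in the retina."""
    points = []  # one (x, y, sigma) triple per receptive field
    for ring in range(n_rings):
        r = r_max * (decay ** ring)        # exponentially decreasing radius
        sigma = 0.4 * r + 0.5              # kernel grows with eccentricity
        offset = (math.pi / points_per_ring) * (ring % 2)  # stagger rings
        for k in range(points_per_ring):
            theta = 2 * math.pi * k / points_per_ring + offset
            points.append((r * math.cos(theta), r * math.sin(theta), sigma))
    points.append((0.0, 0.0, 0.5))         # center point (foveola)
    return points

pattern = retinal_pattern()
print(len(pattern))  # 7 rings x 6 points + 1 center -> 43
```

With overlapping sigmas chosen this way, neighboring receptive fields share pixels, which is exactly the redundancy the next paragraphs argue is beneficial.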
We have experimentally observed that changing the size
of the Gaussian kernels with respect to the log-polar retinal
pattern leads to better performance. In addition, overlapping
the receptive fields also increases the performance. A
possible reason is that with the presented overlap in Figure
4, more information is captured. We add redundancy that
brings more discriminative power. Let us consider the intensities
$I_i$ measured at the receptive fields $A$, $B$, and $C$, where:
$$I_A > I_B, \quad I_B > I_C, \quad \text{and} \quad I_A > I_C. \tag{1}$$
If the fields do not overlap, then the last test $I_A > I_C$
adds no discriminant information. However,
Figure 4: Illustration of the FREAK sampling pattern similar to the retinal
ganglion cells distribution with their corresponding receptive fields. Each
circle represents a receptive field where the image is smoothed with its
corresponding Gaussian kernel.
if the fields overlap, partially new information can be encoded.
In general, adding redundancy allows us to use fewer
receptive fields, which is a known strategy employed in compressed
sensing or dictionary learning [6]. According to Olshausen
and Field in [23], such redundancy also exists in the
receptive fields of the retina.
4.2. Coarse-to-fine descriptor
We construct our binary descriptor F by thresholding the
difference between pairs of receptive fields, each smoothed
with its corresponding Gaussian kernel. In other words, F is a binary
string formed by a sequence of one-bit Difference of Gaussians
(DoG):
$$F = \sum_{0 \le a < N} 2^a \, T(P_a), \tag{2}$$
where $P_a$ is a pair of receptive fields, $N$ is the desired size
of the descriptor, and
$$T(P_a) = \begin{cases} 1 & \text{if } I(P_a^{r_1}) - I(P_a^{r_2}) > 0, \\ 0 & \text{otherwise}, \end{cases}$$
where $I(P_a^{r_1})$ is the smoothed intensity of the first receptive field of the pair $P_a$.
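The construction of F from one-bit comparisons can be sketched as follows. The inputs are illustrative stand-ins: `intensities` maps a receptive-field index to its Gaussian-smoothed intensity, and `pairs` lists the selected comparison pairs (the paper selects them by a learning procedure; here they are arbitrary):

```python
def freak_like_descriptor(intensities, pairs):
    """Build the binary string F: bit a is 1 iff the smoothed intensity
    of the first receptive field of pair P_a exceeds the second,
    i.e. F = sum over a of 2**a * T(P_a)."""
    F = 0
    for a, (r1, r2) in enumerate(pairs):
        if intensities[r1] - intensities[r2] > 0:
            F |= 1 << a  # set bit a, contributing 2**a * T(P_a)
    return F

# Toy example: 4 receptive fields, 3 comparison pairs.
I = [120.0, 85.0, 85.0, 200.0]
pairs = [(0, 1), (1, 2), (3, 0)]
print(freak_like_descriptor(I, pairs))  # bits 1, 0, 1 -> 0b101 = 5
```

Packing the bits into machine words this way is what makes the later Hamming-distance matching a handful of XOR and popcount operations.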