
Fastest HOG Feature Extraction implementation

2016-03-01 17:56
Reposted from http://stackoverflow.com/questions/18474897/fastest-hog-feature-extraction-implementation
Question

What's the fastest open-source HOG extraction code for multicore CPUs?

Motivation

I'm working on a real-time object detection application. Specifically, I've developed a variant of Deformable Parts Model cascades, targeting 30fps object detection. I've reached a point where extracting HOG features is more expensive than the rest of my pipeline combined. I'm using the [Felzenszwalb, Girshick, et al] parameters for HOG extraction: a multiresolution pyramid of HOG descriptors, where each descriptor has a total of 32 bins for orientation and a few other cues.
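For reference, the Felzenszwalb et al. descriptor is commonly described as 18 contrast-sensitive orientation features, 9 contrast-insensitive ones, and 4 texture/energy terms, with voc-release5 adding one truncation feature to reach 32. The sketch below covers only the per-pixel orientation step: it snaps a gradient to one of 18 signed bins by comparing dot products against precomputed unit vectors instead of calling atan2. The function and type names are hypothetical, and this is not code taken from voc-release5 or FFLD.

```cpp
#include <cassert>
#include <cmath>

// 18 signed (contrast-sensitive) orientation bins, 20 degrees apart; folding
// opposite directions together gives 9 unsigned (contrast-insensitive) bins.
constexpr int kSignedBins = 18;
constexpr int kUnsignedBins = kSignedBins / 2;

// Unit vectors at 0, 20, ..., 160 degrees, computed once.
struct OrientationTable {
    float ux[kUnsignedBins];
    float uy[kUnsignedBins];
    OrientationTable() {
        const double kPi = 3.14159265358979323846;
        for (int o = 0; o < kUnsignedBins; ++o) {
            const double angle = o * kPi / kUnsignedBins;
            ux[o] = static_cast<float>(std::cos(angle));
            uy[o] = static_cast<float>(std::sin(angle));
        }
    }
};

// Snap a gradient (dx, dy) to the nearest of the 18 signed orientations by
// maximizing the dot product with each unit vector and its negation (no atan2).
// A zero gradient maps to bin 0; its magnitude is zero, so it adds nothing.
int signedOrientationBin(float dx, float dy) {
    static const OrientationTable table;
    float bestDot = 0.0f;
    int bestBin = 0;
    for (int o = 0; o < kUnsignedBins; ++o) {
        const float dot = table.ux[o] * dx + table.uy[o] * dy;
        if (dot > bestDot) {
            bestDot = dot;
            bestBin = o;                  // gradient points along +u[o]
        } else if (-dot > bestDot) {
            bestDot = -dot;
            bestBin = o + kUnsignedBins;  // gradient points along -u[o]
        }
    }
    return bestBin;
}

// Contrast-insensitive bin: opposite directions share a bin.
int unsignedOrientationBin(int signedBin) {
    assert(signedBin >= 0 && signedBin < kSignedBins);
    return signedBin % kUnsignedBins;
}
```

Because the unsigned bin is just `signedBin % 9`, the 9 contrast-insensitive features can be accumulated from the 18 signed histograms without a second pass over the pixels.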

Goals

I'd like to do multiscale HOG feature extraction at 60fps (16ms) for 640x480 images on a multicore CPU.

Related Work

I've benchmarked a few off-the-shelf multiscale HOG implementations on a 6-core Intel 3930k CPU. For a 640x480 image, I observe the following performance numbers:

- HOG in Dubout's FFLD DPM code: 19fps (52ms), C++ with OpenMP but no vectorization
- HOG in voc-release5 DPM code: 2.4fps (410ms), single-threaded C++ plus a Matlab wrapper

I've also experimented with the OpenCV HOG extraction code. The OpenCV version works, but it seems to be hard-coded for the Dalal-Triggs HOG setup, and OpenCV doesn't seem to let me use the same HOG parameters (normalization scheme, binary position features, etc.) as [Felzenszwalb, Girshick, et al]. The OpenCV version also doesn't natively support multiscale HOG, though you could do the downsampling yourself and call OpenCV HOG for each scale. I don't remember what the OpenCV HOG performance looked like.
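The "downsample yourself, then call a single-scale extractor per level" approach can be sketched as below. The `Image` type, the nearest-neighbour resize, and the `singleScaleHog` callback are hypothetical placeholders (in practice the callback would wrap OpenCV's HOG or another single-scale extractor), and the schedule of scaling level i by 2^(-i/interval) is one common choice, assumed here rather than taken from any of the libraries above.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical single-channel float image, stored row-major.
struct Image {
    int width = 0;
    int height = 0;
    std::vector<float> pixels;
};

// Nearest-neighbour downsampling. A production pipeline would low-pass filter
// (or area-average) first to avoid aliasing; this keeps the sketch short.
Image resizeNearest(const Image& src, int w, int h) {
    assert(w > 0 && h > 0 && src.width > 0 && src.height > 0);
    Image dst;
    dst.width = w;
    dst.height = h;
    dst.pixels.resize(static_cast<std::size_t>(w) * h);
    for (int y = 0; y < h; ++y) {
        const int sy = std::min(src.height - 1, y * src.height / h);
        for (int x = 0; x < w; ++x) {
            const int sx = std::min(src.width - 1, x * src.width / w);
            dst.pixels[static_cast<std::size_t>(y) * w + x] =
                src.pixels[static_cast<std::size_t>(sy) * src.width + sx];
        }
    }
    return dst;
}

// Level i is the input scaled by 2^(-i / interval); stop once either side
// falls below minSide pixels. `singleScaleHog` is a placeholder for any
// single-scale extractor; it receives each level and its scale factor.
// Returns the number of levels processed.
int buildHogPyramid(const Image& img, int interval, int minSide,
                    const std::function<void(const Image&, double)>& singleScaleHog) {
    assert(interval > 0 && minSide > 0);
    int levels = 0;
    for (int i = 0;; ++i) {
        const double scale = std::pow(2.0, -static_cast<double>(i) / interval);
        const int w = static_cast<int>(img.width * scale);
        const int h = static_cast<int>(img.height * scale);
        if (w < minSide || h < minSide) break;
        if (i == 0) {
            singleScaleHog(img, scale);  // full resolution, no resize needed
        } else {
            singleScaleHog(resizeNearest(img, w, h), scale);
        }
        ++levels;
    }
    return levels;
}
```

Each level is resampled from the original image here; resampling from the previous level instead is cheaper but compounds interpolation error.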

Final Thoughts

The fastest HOG implementation, FFLD, seems to leave a lot of performance on the table. I haven't done a GFLOP/s estimate, but I do notice that FFLD's HOG code doesn't use any SSE/AVX vectorization. There isn't much control flow, so vectorization seems like a cheap speedup opportunity here.
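As an illustration of the kind of branch-free inner loop that vectorizes well, here is a minimal sketch of per-pixel gradient magnitude using 128-bit SSE intrinsics (`_mm_loadu_ps`, `_mm_mul_ps`, `_mm_add_ps`, `_mm_sqrt_ps`, `_mm_storeu_ps`), with a scalar loop for the tail and for builds without SSE. It is not taken from FFLD, and the function name is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define HOG_SKETCH_USE_SSE 1
#endif

// mag[i] = sqrt(dx[i]^2 + dy[i]^2) for i in [0, n). The loop body has no
// branches, so it maps directly onto 4-wide SSE arithmetic.
void gradientMagnitude(const float* dx, const float* dy, float* mag, std::size_t n) {
    assert(n == 0 || (dx != nullptr && dy != nullptr && mag != nullptr));
    std::size_t i = 0;
#ifdef HOG_SKETCH_USE_SSE
    // Process 4 floats (128 bits) per iteration; unaligned loads/stores keep
    // the sketch independent of buffer alignment.
    for (; i + 4 <= n; i += 4) {
        const __m128 x = _mm_loadu_ps(dx + i);
        const __m128 y = _mm_loadu_ps(dy + i);
        const __m128 sumSq = _mm_add_ps(_mm_mul_ps(x, x), _mm_mul_ps(y, y));
        _mm_storeu_ps(mag + i, _mm_sqrt_ps(sumSq));
    }
#endif
    // Scalar tail (and the whole loop on non-SSE targets).
    for (; i < n; ++i) {
        mag[i] = std::sqrt(dx[i] * dx[i] + dy[i] * dy[i]);
    }
}
```

The same pattern (load, arithmetic, compare/select, store) extends to the dot-product orientation snapping, which is where most of the remaining per-pixel work sits.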
I haven't mentioned GPU HOG implementations here. I've experimented with groundHOG/CUHOG and fasthog. The CUHOG authors claim 20fps (50ms) HOG extraction on an NVIDIA GTX560. But Intel CPUs are the target platform for my application, and copying a full HOG pyramid from the GPU to the CPU is prohibitively expensive.

Tags: c++, performance, image-processing, computer-vision
asked Aug 27 '13 at 20:33 by solvingPuzzles (edited Feb 19 '15 at 19:17)

OpenCV includes Dalal's implementation of HOG in both CPU and GPU versions. They work pretty well in my opinion, and they can easily be used for object detection with OpenCV's CvSVM. – marcos.nieto Oct 25 '13 at 17:10
The filter convolution is the most expensive part of DPM, so how do you manage this part? – Mickey Shine Jun 11 '14 at 14:35
@MickeyShine the usual stuff... massively quantizing the features, and doing cascades. I'm doing more deep learning and fewer HOG-based DPMs these days. But I reached a point where I could run the convolutions for a HOG-based 3-component, 8-part-per-component model in well under 50ms. – solvingPuzzles Jun 11 '14 at 16:15
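Neither comment shows code, but the reason filter evaluation dominates is easy to see from a naive sketch: each part filter is slid over every cell of every pyramid level, with a dot product over all feature dimensions at each placement. The `FeatureMap` layout and `filterResponse` name below are hypothetical; quantizing the features and using cascades, as mentioned above, are ways to shrink the inner loop or the number of placements it runs at.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical dense feature map: rows x cols cells, `dims` values per cell,
// stored row-major as [row][col][dim] (e.g. dims = 32 for Felzenszwalb-style HOG).
struct FeatureMap {
    int rows = 0;
    int cols = 0;
    int dims = 0;
    std::vector<float> data;
    const float* cell(int r, int c) const {
        return data.data() + (static_cast<std::size_t>(r) * cols + c) * dims;
    }
};

// Score every valid placement of `filter` on `features` (cross-correlation,
// which DPM code usually calls "convolution"). The result has one value per
// placement: (rows - filter.rows + 1) x (cols - filter.cols + 1).
FeatureMap filterResponse(const FeatureMap& features, const FeatureMap& filter) {
    assert(features.dims == filter.dims);
    FeatureMap out;
    out.rows = features.rows - filter.rows + 1;
    out.cols = features.cols - filter.cols + 1;
    out.dims = 1;
    if (out.rows <= 0 || out.cols <= 0) {
        out.rows = out.cols = 0;  // filter larger than this pyramid level
        return out;
    }
    out.data.assign(static_cast<std::size_t>(out.rows) * out.cols, 0.0f);
    for (int r = 0; r < out.rows; ++r) {
        for (int c = 0; c < out.cols; ++c) {
            float score = 0.0f;
            for (int fr = 0; fr < filter.rows; ++fr) {
                for (int fc = 0; fc < filter.cols; ++fc) {
                    const float* f = features.cell(r + fr, c + fc);
                    const float* w = filter.cell(fr, fc);
                    // Dot product over the feature dimension: the hot loop
                    // that quantization or SIMD would target.
                    for (int d = 0; d < filter.dims; ++d) score += f[d] * w[d];
                }
            }
            out.data[static_cast<std::size_t>(r) * out.cols + c] = score;
        }
    }
    return out;
}
```

The cost per filter per level is roughly out.rows × out.cols × filter.rows × filter.cols × dims multiply-adds, repeated for the root and every part of every component.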

@3yanlis1bos Thanks! I've fixed the FFLD link. – solvingPuzzles Feb 19 '15 at 19:17
Just adding a couple of updated links: ffld and ffld2. Seems to have moved again. – Jon Mar 9 '15 at 8:29

1 Answer

Have a look at the following implementation: HoG SSE

It does fit your time requirements. It is written in C and uses 128-bit SIMD instructions.

The code can also be customized further depending on the normalization strategy and output type you need.
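As an example of what a pluggable normalization strategy might look like, here is a minimal sketch of Dalal-Triggs L2-Hys block normalization (L2-normalize, clip each entry at 0.2, renormalize). This is not taken from the HoG SSE code, and the function name is hypothetical; the Felzenszwalb et al. variant differs, normalizing each cell against the four 2x2 blocks that contain it and truncating the result.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// L2-Hys block normalization: L2-normalize, clip each entry at `clip`, then
// L2-normalize again. `eps` keeps flat (all-zero) blocks from dividing by zero.
// HOG histogram entries are non-negative, so clipping only needs an upper bound.
void l2HysNormalize(std::vector<float>& block, float clip = 0.2f, float eps = 1e-6f) {
    assert(clip > 0.0f && eps > 0.0f);
    auto l2Normalize = [&block, eps]() {
        float sumSq = 0.0f;
        for (float v : block) sumSq += v * v;
        const float inv = 1.0f / std::sqrt(sumSq + eps * eps);
        for (float& v : block) v *= inv;
    };
    l2Normalize();
    for (float& v : block) v = std::min(v, clip);
    l2Normalize();
}
```

Swapping this function for L1-sqrt or the four-block Felzenszwalb scheme changes only this step, which is presumably the kind of customization meant above.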

I would be glad to hear your feedback so that I can improve this code.

answered Nov 12 '13 at 8:17 by ivan_a

Interesting! I'll give this a try. Does it do multiscale extraction (a "HOG pyramid," as some people call it)? – solvingPuzzles Nov 26 '13 at 6:28
@solvingPuzzles, did HoG SSE fit your time requirements? Which solution did you find? – Tin Feb 24 '14 at 11:32
@ivan_a could you please explain how to use this code? I see that it uses only 16 bins, and it says this can't be changed. What does that mean? – Parag S. Chandakkar Aug 17 '14 at 9:28