Fastest HOG Feature Extraction implementation
2016-03-01 17:56
Reposted from http://stackoverflow.com/questions/18474897/fastest-hog-feature-extraction-implementation
Question

What's the fastest open-source HOG extraction code for multicore CPUs?

Motivation

I'm working on a real-time object detection application. Specifically, I've developed a variant of Deformable Parts Model cascades, targeting 30fps object detection. I've reached a point where extracting HOG features is more expensive than the rest of my pipeline combined.

I'm using the [Felzenszwalb, Girshick, et al] parameters for HOG extraction. That is, a multiresolution pyramid of HOG descriptors, where each descriptor has a total of 32 bins for orientation and a few other cues.

Goals

I'd like to do multiscale HOG feature extraction at 60fps (16ms) for 640x480 images on a multicore CPU.

Related Work

I've benchmarked a few off-the-shelf multiscale HOG implementations on a 6-core Intel 3930k CPU. For a 640x480 image, I observe the following performance numbers:

- HOG in Dubout's FFLD DPM code: 19fps (52ms). C++ with OpenMP, but no vectorization.
- HOG in voc-release5 DPM code: 2.4fps (410ms). Single-threaded C++, plus a Matlab wrapper.

I've also experimented with the OpenCV HOG extraction code. The OpenCV version works, but it seems to be hard-coded for the Dalal-Triggs HOG setup, and OpenCV doesn't seem to allow me to use the same HOG parameters (normalization scheme, binary position features, etc) as [Felzenszwalb, Girshick, et al]. The OpenCV version also doesn't natively support multiscale HOG, though you could do the downsampling yourself and call OpenCV HOG for each scale. I don't remember what the OpenCV HOG performance looked like.

Final Thoughts

The fastest HOG implementation I measured, FFLD, seems to leave a lot of performance on the table. I haven't done a GFLOP/s estimate, but I do notice that FFLD's HOG code doesn't use any SSE/AVX vectorization. There isn't that much control flow, so vectorization seems like a cheap speedup opportunity here.

I haven't mentioned GPU HOG implementations above. I've experimented with groundHOG/CUHOG and fasthog. The CUHOG authors claim 20fps (50ms) HOG extraction on an NVIDIA GTX560. But Intel CPUs are the target platform for my application, and copying a full HOG pyramid from the GPU to the CPU is prohibitively expensive.

Tags: c++ performance image-processing computer-vision
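To make the "not much control flow" remark concrete, here is a minimal sketch of the per-pixel orientation step as it is commonly done for Felzenszwalb-style features: the gradient is snapped to one of 18 contrast-sensitive directions using 9 dot products. This is an illustration only, not code taken from FFLD or voc-release5; the function name `snapOrientation` and the table names are hypothetical, and the sketch assumes a single-channel gradient.

```cpp
#include <cassert>
#include <cmath>

// Unit vectors at 0, 20, ..., 160 degrees (cosines and sines).
static const float kCos[9] = { 1.0000f,  0.9397f,  0.7660f,  0.5000f,  0.1736f,
                              -0.1736f, -0.5000f, -0.7660f, -0.9397f};
static const float kSin[9] = { 0.0000f,  0.3420f,  0.6428f,  0.8660f,  0.9848f,
                               0.9848f,  0.8660f,  0.6428f,  0.3420f};

// Snap a gradient (dx, dy) to one of 18 contrast-sensitive orientation bins.
// Bins 0..8 are the directions in the tables above; bins 9..17 are the
// opposite directions. A zero gradient falls into bin 0.
int snapOrientation(float dx, float dy) {
    assert(std::isfinite(dx) && std::isfinite(dy));
    float best = 0.0f;
    int bestBin = 0;
    for (int o = 0; o < 9; ++o) {
        // Projection of the gradient onto direction o.
        float dot = kCos[o] * dx + kSin[o] * dy;
        if (dot > best) {
            best = dot;
            bestBin = o;
        } else if (-dot > best) {
            best = -dot;
            bestBin = o + 9;
        }
    }
    return bestBin;
}
```

The loop body is a couple of multiply-adds followed by comparisons, and the trip count is fixed at 9. That is the kind of code where the branches can usually be replaced by max/select operations, which is why SSE/AVX looks like a plausible next step.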
1 Answer
Have a look at the following implementation: HoG SSE. It does fit your time requirements. It is written in C and uses 128-bit SIMD instructions. The code can also be customized further, depending on the normalization strategy and output type you need. I would be glad to hear your feedback so that I can improve this code.
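For readers who have not used 128-bit SIMD before, the following is a rough sketch of what it looks like for one HOG building block, the gradient magnitude of an image row. It is not taken from HoG SSE; the function name `gradientMagnitudeRow`, the float grayscale row layout, and the central-difference gradient are all assumptions made for the example. It processes 4 pixels per iteration with SSE intrinsics and falls back to a scalar loop for the remaining pixels (or on non-SSE targets).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

// Gradient magnitude for the interior pixels of one row of a float
// grayscale image. prev/cur/next point to rows y-1, y, y+1, each holding
// `width` floats. out[x] is written for x in [1, width-2]; the two border
// columns are left untouched.
void gradientMagnitudeRow(const float* prev, const float* cur,
                          const float* next, float* out, std::size_t width) {
    assert(width >= 2);
    std::size_t x = 1;
#if defined(__SSE__)
    // Four pixels per iteration in 128-bit registers (unaligned loads).
    for (; x + 4 <= width - 1; x += 4) {
        __m128 dx = _mm_sub_ps(_mm_loadu_ps(cur + x + 1),
                               _mm_loadu_ps(cur + x - 1));
        __m128 dy = _mm_sub_ps(_mm_loadu_ps(next + x),
                               _mm_loadu_ps(prev + x));
        __m128 sq = _mm_add_ps(_mm_mul_ps(dx, dx), _mm_mul_ps(dy, dy));
        _mm_storeu_ps(out + x, _mm_sqrt_ps(sq));
    }
#endif
    // Scalar tail: remaining interior pixels.
    for (; x + 1 < width; ++x) {
        float dx = cur[x + 1] - cur[x - 1];
        float dy = next[x] - prev[x];
        out[x] = std::sqrt(dx * dx + dy * dy);
    }
}
```

A caller would invoke this once per interior row, for example `gradientMagnitudeRow(img + (y - 1) * w, img + y * w, img + (y + 1) * w, mag + y * w, w)`, and rows are independent, so the outer loop over `y` can be split across cores.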