
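Image Semantic Segmentation with FCN

2017-03-09 16:28, 801 views

Reposted from: http://blog.csdn.net/gavin__zhou/article/details/52142696

How FCN works

I covered the theory in my previous post; see the FCN theory post for details.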

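Code

FCN has an official implementation (the FCN official code).

I did not use it, though. I used a version that someone else adapted from the official code and implemented with the Chainer framework (source: the Chainer framework source code). If you have used Keras, Chainer should not feel particularly unfamiliar. Chainer describes itself as "a neural network framework".

The code I used is the Chainer implementation of FCN, available at https://github.com/wkentaro/fcn.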

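Installation

Installation is simple: either pip or a source install works. However, after installing it several times on my machine, I found that with pip the variable fcn.data_dir ends up pointing at the dist-packages directory of the system Python. Writing to that directory requires root privileges, so I do not recommend installing directly with pip. For details, see the issue about fcn.data_dir.

I therefore installed from source, and I recommend creating a virtual environment with virtualenv. In practice this was the least error-prone approach.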
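To see whether the permission problem described above applies to a given install, one option is to check whether the data directory is writable by the current user. The following is a minimal sketch; the helper name is_writable_dir is hypothetical and is not part of the fcn package.

```python
import os


def is_writable_dir(path):
    """Return True if `path` is an existing directory the current user can write to."""
    return os.path.isdir(path) and os.access(path, os.W_OK)


if __name__ == "__main__":
    # Hypothetical usage once fcn is installed (fcn.data_dir is the variable
    # named in the post):
    #     import fcn
    #     print(fcn.data_dir, is_writable_dir(fcn.data_dir))
    print(is_writable_dir(os.getcwd()))
```

If the check returns False for fcn.data_dir, that would be consistent with the root-only dist-packages location described above.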

Clone the code

git clone https://github.com/wkentaro/fcn.git --recursive

Install with virtualenv

sudo pip install virtualenv  # install virtualenv

Create the virtual environment directory

virtualenv test-fcn

cd test-fcn

Activate the virtual environment

source ./bin/activate

Clone the fcn code

git clone https://github.com/wkentaro/fcn.git --recursive

cd fcn

Install fcn

python setup.py develop

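Demo

Download the VOC2012 dataset and place it under the fcn-data-pascal-VOC2012 path (written fcn/data/pascal/VOC2012 in the rest of this post).

1. Convert the Caffe model to a Chainer model

./scripts/caffe_to_chainermodel.py

2. Load the model and run segmentation

./scripts/fcn_forward.py --img-files data/pascal/VOC2012/JPEGImages/2007_000129.jpg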

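Training on your own data

This took me almost a month from start to finish before training finally worked. There were many pitfalls along the way, so I am writing them down here for reference.

Preparing your own dataset

Build the dataset in the same layout as SegmentationClass in VOC2012. The figure below shows an example: the upper image is the original and the lower one is its segmentation map.

Note, however, that the object class of each label has its own specific color. We made many mistakes here and only found the cause by tracing through the code. The RGB value for each class is as follows: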


Index	RGB value
0	(0,0,0)
1	(128,0,0)
2	(0,128,0)
3	(128,128,0)
4	(0,0,128)
5	(128,0,128)
6	(0,128,128)
7	(128,128,128)
8	(64,0,0)
9	(192,0,0)
10	(64,128,0)

Only the values for 10 classes are listed here; for more classes, see the following code:

import numpy as np


def bitget(byteval, idx):
    # True if bit `idx` of `byteval` is set
    return ((byteval & (1 << idx)) != 0)


def labelcolormap(N=256):
    cmap = np.zeros((N, 3))  # N is the number of classes
    for i in range(0, N):
        id = i
        r, g, b = 0, 0, 0
        for j in range(0, 8):
            r = np.bitwise_or(r, (bitget(id, 0) << 7-j))
            g = np.bitwise_or(g, (bitget(id, 1) << 7-j))
            b = np.bitwise_or(b, (bitget(id, 2) << 7-j))
            id = (id >> 3)
        cmap[i, 0] = r
        cmap[i, 1] = g
        cmap[i, 2] = b
    cmap = cmap.astype(np.float32) / 255  # RGB values of the colormap, scaled to [0, 1]
    return cmap


# Excerpt: a method of the dataset class, which calls fcn.util.labelcolormap
def _label_rgb_to_32sc1(self, label_rgb):
    assert label_rgb.dtype == np.uint8
    label = np.zeros(label_rgb.shape[:2], dtype=np.int32)
    label.fill(-1)  # pixels whose color matches no class stay -1
    cmap = fcn.util.labelcolormap(len(self.target_names))
    cmap = (cmap * 255).astype(np.uint8)  # convert back to integer values
    for l, rgb in enumerate(cmap):
        mask = np.all(label_rgb == rgb, axis=-1)
        label[mask] = l
    return label


If you draw the label images according to this color table, there are no problems: the code reads the ground-truth segmentation correctly.
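Before training, it can help to verify that your own label images really use only colors from this table. The following standalone sketch needs only NumPy and does not import fcn. The function names voc_colormap and rgb_to_label are hypothetical, and the colormap is built directly as uint8 rather than going through floats. Pixels that come back as -1 have a color that matches no class.

```python
import numpy as np


def voc_colormap(n_classes):
    """Return an (n_classes, 3) uint8 array of PASCAL VOC style label colors."""
    cmap = np.zeros((n_classes, 3), dtype=np.uint8)
    for i in range(n_classes):
        idx, r, g, b = i, 0, 0, 0
        for j in range(8):
            # Bits 0, 1, 2 of idx feed R, G, B, from the most significant bit down.
            r |= ((idx >> 0) & 1) << (7 - j)
            g |= ((idx >> 1) & 1) << (7 - j)
            b |= ((idx >> 2) & 1) << (7 - j)
            idx >>= 3
        cmap[i] = (r, g, b)
    return cmap


def rgb_to_label(label_rgb, n_classes):
    """Map an (H, W, 3) uint8 color image to an (H, W) int32 class-index image.

    Pixels whose color is not in the colormap are left at -1.
    """
    label = np.full(label_rgb.shape[:2], -1, dtype=np.int32)
    for index, rgb in enumerate(voc_colormap(n_classes)):
        label[np.all(label_rgb == rgb, axis=-1)] = index
    return label
```

A quick check on a loaded label image is `(rgb_to_label(img, n_classes) == -1).any()`: if it is True, the image contains at least one color that the training code would not recognize.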

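Put the original images in

fcn/data/pascal/VOC2012/JPEGImages

and the segmentation images in

fcn/data/pascal/VOC2012/SegmentationClass

Then, in

fcn/data/pascal/VOC2012/ImageSets/Segmentation

create train.txt, trainval.txt and val.txt, each listing the IDs of the images to be used for the corresponding task.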
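The three list files can be written by hand, but for a larger dataset a small script is less error-prone. The sketch below assumes that the label images are PNG files, that an image ID is the file name without its extension, and that each file holds one ID per line as in the standard VOC lists. The function name write_split_files, the 20% validation fraction and the seed are illustrative choices, not values from the fcn repository.

```python
import os
import random


def write_split_files(voc_root, val_fraction=0.2, seed=0):
    """Write train.txt, val.txt and trainval.txt from the files in SegmentationClass.

    val_fraction and seed are illustrative assumptions, not values from the post.
    """
    seg_dir = os.path.join(voc_root, "SegmentationClass")
    out_dir = os.path.join(voc_root, "ImageSets", "Segmentation")
    os.makedirs(out_dir, exist_ok=True)

    # An image ID is the file name without its extension, e.g. 2007_000129.
    ids = sorted(os.path.splitext(f)[0] for f in os.listdir(seg_dir)
                 if f.lower().endswith(".png"))
    shuffled = ids[:]
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    splits = {
        "val": sorted(shuffled[:n_val]),
        "train": sorted(shuffled[n_val:]),
        "trainval": ids,
    }
    for name, split_ids in splits.items():
        with open(os.path.join(out_dir, name + ".txt"), "w") as f:
            f.write("\n".join(split_ids) + "\n")
    return splits
```

Calling `write_split_files("fcn/data/pascal/VOC2012")` would then produce the three files in the ImageSets/Segmentation directory named above.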

Modifying the code

fcn/scripts/fcn_train.py

# setup optimizer
optimizer = O.MomentumSGD(lr=1e-10, momentum=0.99)  # lr must be small; with a large value the program raises an error. I used 1e-9
optimizer.setup(model)

# train
trainer = fcn.Trainer(
    dataset=dataset,
    model=model,
    optimizer=optimizer,
    weight_decay=0.0005,
    test_interval=1000,
    max_iter=100000,
    snapshot=4000,
    gpu=gpu,
)


fcn/fcn/pascal.py

target_names = np.array([
    'background',
    'aeroplane',
    'bicycle',
    'bird',
    'boat',
    'bottle',
    'bus',
    'car',
    'cat',
    'chair',
    'cow',
    'diningtable',
    'dog',
    'horse',
    'motorbike',
    'person',
    'potted plant',
    'sheep',
    'sofa',
    'train',
    'tv/monitor',
])  # replace with your own classes, listed in the same order as the color table


fcn/fcn/util.py

def resize_img_with_max_size(img, max_size=500*500):  # change max_size to suit your own images
    """Resize image with max size (height x width)"""
    from skimage.transform import rescale
    height, width = img.shape[:2]
    # float() avoids integer division under Python 2, which would make scale 0
    scale = float(max_size) / (height * width)
    resizing_scale = 1
    if scale < 1:
        resizing_scale = np.sqrt(scale)
        img = rescale(img, resizing_scale, preserve_range=True)
        img = img.astype(np.uint8)
    return img, resizing_scale
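To pick a sensible max_size, it helps to see what the function does to the image dimensions. max_size is a budget on the pixel count (height x width), and because area grows with the square of the side length, each side is multiplied by the square root of the area ratio. The sketch below reproduces only that arithmetic without touching any image. The helper name resized_shape is hypothetical, and the rounding is approximate rather than guaranteed to match skimage exactly.

```python
import math


def resized_shape(height, width, max_size=500 * 500):
    """Return (new_height, new_width, resizing_scale) for a pixel-count budget."""
    scale = float(max_size) / (height * width)
    # Images already within the budget are left unchanged.
    resizing_scale = math.sqrt(scale) if scale < 1 else 1.0
    return (int(round(height * resizing_scale)),
            int(round(width * resizing_scale)),
            resizing_scale)
```

For example, a 1000 x 1000 image with the default budget has an area ratio of 0.25, so each side is halved to 500, while a 300 x 400 image is already under the budget and keeps its size.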


fcn/fcn/models/fcn32s.py

def __init__(self, n_class=21):  # change n_class to your number of classes
    self.n_class = n_class
    super(self.__class__, self).__init__(
        conv1_1=L.Convolution2D(3, 64, 3, stride=1, pad=100),
        conv1_2=L.Convolution2D(64, 64, 3, stride=1, pad=1),

        conv2_1=L.Convolution2D(64, 128, 3, stride=1, pad=1),
        conv2_2=L.Convolution2D(128, 128, 3, stride=1, pad=1),

        conv3_1=L.Convolution2D(128, 256, 3, stride=1, pad=1),
        conv3_2=L.Convolution2D(256, 256, 3, stride=1, pad=1),
        conv3_3=L.Convolution2D(256, 256, 3, stride=1, pad=1),

        conv4_1=L.Convolution2D(256, 512, 3, stride=1, pad=1),
        conv4_2=L.Convolution2D(512, 512, 3, stride=1, pad=1),
        conv4_3=L.Convolution2D(512, 512, 3, stride=1, pad=1),

        conv5_1=L.Convolution2D(512, 512, 3, stride=1, pad=1),
        conv5_2=L.Convolution2D(512, 512, 3, stride=1, pad=1),
        conv5_3=L.Convolution2D(512, 512, 3, stride=1, pad=1),

        fc6=L.Convolution2D(512, 4096, 7, stride=1, pad=0),
        fc7=L.Convolution2D(4096, 4096, 1, stride=1, pad=0),

        score_fr=L.Convolution2D(4096, self.n_class, 1, stride=1, pad=0),

        upscore=L.Deconvolution2D(self.n_class, self.n_class, 64,
                                  stride=32, pad=0),
    )
    self.train = False
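The layer parameters above determine the spatial size of the network output. The sketch below traces one spatial axis through the layers listed in the constructor. The pooling layers are not part of the excerpt, so it assumes the usual VGG16 arrangement: a 2x2, stride-2 pooling after each of the five convolution blocks, rounding the output size up. All function names are hypothetical.

```python
import math


def conv_out(size, ksize, stride=1, pad=0):
    """Output length of a convolution along one spatial axis."""
    return (size + 2 * pad - ksize) // stride + 1


def pool_out(size, ksize=2, stride=2):
    """Output length of a pooling layer that rounds up (assumed, see lead-in)."""
    return int(math.ceil((size - ksize) / float(stride))) + 1


def deconv_out(size, ksize, stride=1, pad=0):
    """Output length of a transposed convolution along one spatial axis."""
    return stride * (size - 1) + ksize - 2 * pad


def fcn32s_upscore_size(size):
    """Length of the upscore output for an input of length `size`."""
    size = conv_out(size, 3, pad=100)        # conv1_1, pad=100 enlarges the map
    size = conv_out(size, 3, pad=1)          # conv1_2 keeps the size
    size = pool_out(size)                    # pool1 (assumed)
    for _ in range(4):                       # blocks 2..5: 3x3 pad-1 convs keep the size
        size = pool_out(size)                # pool2..pool5 (assumed)
    size = conv_out(size, 7)                 # fc6 is a 7x7 convolution without padding
    size = conv_out(size, 1)                 # fc7 and score_fr are 1x1 and keep the size
    return deconv_out(size, 64, stride=32)   # upscore
```

Under these assumptions a 500-pixel input side yields a 544-pixel upscore output, that is, larger than the input, which is why FCN implementations typically crop the upsampled score map back to the input size.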


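Training

./scripts/fcn_train.py

This creates a directory named SegmentationClassDataset_db under fcn/data/, which stores the training images as pickle data. If you change the original training images, you must delete this directory; otherwise the pickle data in it is read by default in place of the original images.

It also creates a snapshot directory under fcn, containing the saved models, log files and so on. If you retrain from scratch, I recommend deleting this directory.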
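Forgetting to clear the cached dataset is an easy mistake, so the cleanup can be scripted. The sketch below uses the two directory names given above. The function name clear_training_state and the remove_snapshots flag are hypothetical. Snapshots are kept by default, because deleting them discards the trained models.

```python
import os
import shutil


def clear_training_state(repo_root, remove_snapshots=False):
    """Delete the cached dataset (and optionally the snapshots) under repo_root.

    Returns the list of directories that were actually removed.
    """
    targets = [os.path.join(repo_root, "data", "SegmentationClassDataset_db")]
    if remove_snapshots:
        targets.append(os.path.join(repo_root, "snapshot"))
    removed = []
    for path in targets:
        if os.path.isdir(path):
            shutil.rmtree(path)
            removed.append(path)
    return removed
```

Running `clear_training_state("fcn")` after changing the training images forces the pickle data to be rebuilt on the next training run.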

Using your own trained model

./scripts/fcn_forward.py -c path/to/your/model -i path/to/your/image

The results are written to fcn/data/forward_out.


Tags: image segmentation
