STS-2016-Task1-1

2015-12-31 21:51
Dataset preparation

File structure

ROOT
- train
    STS2012-test#STS.input.MSRpar.txt
    ......
- gs
    STS2012-test#STS.gs.MSRpar.txt

Generated files

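ROOT
    input.txt
    gs.txt
    input.info

Each line of input.info holds a file name followed by the start and end line numbers of that file's content in input.txt, counted from 0. scanner.py writes the three fields tab-separated, and the end line is exclusive.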
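As a rough illustration of how this index might be consumed, the sketch below parses input.info and slices one file's rows back out of the merged lines. It assumes the fields are tab-separated and that the end line is exclusive, which is what scanner.py writes. `load_index` and `lines_for` are hypothetical helper names, not part of the original script, and the sample data is made up.

```python
def load_index(info_lines):
    """Parse input.info lines into {file name: (start, end)}."""
    index = {}
    for line in info_lines:
        line = line.rstrip('\n')
        if not line:
            continue  # skip blank lines
        name, start, end = line.split('\t')
        index[name] = (int(start), int(end))
    return index


def lines_for(name, index, merged_lines):
    """Return the slice of merged_lines that came from file `name`."""
    start, end = index[name]  # 0-indexed, end exclusive
    return merged_lines[start:end]


if __name__ == '__main__':
    # Made-up miniature data, for illustration only.
    info = ['a.input.txt\t0\t2\n', 'b.input.txt\t2\t3\n']
    merged = ['s1\ts2', 's3\ts4', 's5\ts6']
    index = load_index(info)
    print(lines_for('b.input.txt', index, merged))
```

Because gs.txt is written in the same order as input.txt, the same (start, end) pair should also select the matching gold-standard rows.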

——————————————————————————————————————————————————————————

scanner.py

Merges all the files into one, which makes training easier
Generates the dict (written to input.info)

import os

ROOT = os.getcwd()
TRAIN = ROOT + '/train/'
GS = ROOT + '/gs/'

train_fw = open('input.txt', 'w')
gs_fw = open('gs.txt', 'w')
dict_fw = open('input.info', 'w')
offSet = 0  # index of the next line to be written to the merged files
dict_list = []
for fname in os.listdir(TRAIN):
    print(fname)
    train_fp = open(TRAIN + fname).readlines()
    # the matching gold-standard file has 'gs' in place of 'input' in its name
    gs_fp = open(GS + fname.replace('input', 'gs')).readlines()

    # record [file name, start line, end line (exclusive)]
    dict_list.append([fname, str(offSet), str(offSet + len(gs_fp))])
    offSet += len(gs_fp)
    for train in train_fp:
        print(train.strip(), file=train_fw)

    for gs in gs_fp:
        print(gs.strip(), file=gs_fw)

# one tab-separated index row per source file
for line in dict_list:
    print('\t'.join(line), file=dict_fw)

dict_fw.close()
gs_fw.close()
train_fw.close()
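scanner.py computes the offsets from the gs line counts while writing the input lines separately, so the index only holds if every input file has as many lines as its gs file. A minimal sketch of a sanity check under that assumption follows. `check_alignment` is a hypothetical helper, not part of the original script.

```python
def check_alignment(input_lines, gs_lines, index_rows):
    """Return a list of problems found; an empty list means the files line up.

    index_rows is a list of (name, start, end) tuples, with end exclusive.
    """
    problems = []
    if len(input_lines) != len(gs_lines):
        problems.append('input has %d lines but gs has %d'
                        % (len(input_lines), len(gs_lines)))
    expected_start = 0
    for name, start, end in index_rows:
        # each range should begin where the previous one ended
        if start != expected_start or end < start:
            problems.append('bad range for %s: %d-%d' % (name, start, end))
        expected_start = end
    if expected_start != len(gs_lines):
        problems.append('index covers %d lines but gs has %d'
                        % (expected_start, len(gs_lines)))
    return problems
```

Running it on the contents of input.txt, gs.txt, and the parsed input.info right after the merge would flag a mismatched file pair before training starts.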