
Data preprocessing with Python: converting the KDD Cup 99 data into a format libsvm can read

2015-06-21 10:53
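I have recently been studying intrusion detection based on support vector machines, using the KDD Cup 99 data set. The data set is distributed in CSV format, but the SVM tool we use is built on the libsvm library, and libsvm expects each record in the form label index:value index:value ..., where label is the class label and the index:value pairs that follow are the feature values. Below is the Python conversion code I wrote, shared here for anyone who needs it.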
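To make the target format concrete, here is a minimal sketch of how one record is laid out as a libsvm line. The helper name to_libsvm_line and the four sample values are made up for illustration; a real KDD Cup 99 record has 41 features.

```python
def to_libsvm_line(label, features):
    """Format one sample as 'label 1:v1 2:v2 ...' (feature indices start at 1)."""
    pairs = ["%d:%s" % (i, v) for i, v in enumerate(features, start=1)]
    return " ".join([str(label)] + pairs)


# Hypothetical shortened record: class label 1 followed by four feature values.
print(to_libsvm_line(1, [0, 2, 5, 181]))  # prints: 1 1:0 2:2 3:5 4:181
```

The label comes first and every feature is written with its 1-based position, which is the layout the conversion script produces for all 41 columns.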

If you have any questions, please leave a comment so we can discuss them.

import csv

# Lookup tables for the three symbolic features. Each distinct value is
# numbered in order of first appearance, starting at 1.
protocol_type = []
service = []
flag = []
# Attack names grouped into the four attack categories.
dos_type = ['back', 'land', 'neptune', 'pod', 'smurf', 'teardrop']
probe_type = ['ipsweep', 'nmap', 'portsweep', 'satan']
r2l_type = ['ftp_write', 'guess_passwd', 'imap', 'multihop', 'phf', 'spy', 'warezclient', 'warezmaster']
u2r_type = ['buffer_overflow', 'loadmodule', 'perl', 'rootkit']


def encode(value, table):
    # Return the 1-based index of value in table, adding it on first sight.
    if value not in table:
        table.append(value)
    return table.index(value) + 1


def main():
    print("data dealing...")
    # Python 3: the csv module wants text-mode files opened with newline=''.
    with open('/home/tool/svmdata/kddcup.data_10_percent.csv', newline='') as src, \
         open('/home/tool/svmdata/kddcup.data_10_percent1.csv', 'w', newline='') as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst, delimiter=" ")
        for line in reader:
            if not line:
                continue  # skip blank lines
            # Replace the symbolic columns with integer codes.
            line[1] = encode(line[1], protocol_type)  # protocol
            line[2] = encode(line[2], service)        # service
            line[3] = encode(line[3], flag)           # flag
            # Column 41 holds the label, e.g. "smurf."; drop the dot.
            name = line[41].replace('.', '')
            if name == 'normal':
                label = 1
            elif name in dos_type:
                label = 2
            elif name in probe_type:
                label = 3
            elif name in r2l_type:
                label = 4
            elif name in u2r_type:
                label = 5
            else:
                # A raw string label would not be valid libsvm input.
                raise ValueError("unknown attack type: %r" % name)
            # libsvm layout: the label first, then the 41 features as index:value.
            row = [label] + ['%d:%s' % (k, v) for k, v in enumerate(line[:41], start=1)]
            writer.writerow(row)


if __name__ == "__main__":
    main()
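After running the script it can be worth checking that the output really has the label index:value layout before passing it to libsvm. The following is only a sketch under my own assumptions: the function names parse_libsvm_line and check_line are hypothetical, and the check assumes the five class labels (1 to 5) and the 41 features used by the script above.

```python
def parse_libsvm_line(text):
    """Split 'label i:v i:v ...' into (label, {index: value})."""
    parts = text.split()
    label = int(parts[0])
    features = {}
    for item in parts[1:]:
        index, value = item.split(":", 1)
        features[int(index)] = float(value)
    return label, features


def check_line(text, n_features=41):
    """True if the label is in 1..5 and indices 1..n_features are all present."""
    label, features = parse_libsvm_line(text)
    return 1 <= label <= 5 and sorted(features) == list(range(1, n_features + 1))


# Hypothetical record: class 2 (DoS) with all 41 features set to 0.
sample = "2 " + " ".join("%d:0" % k for k in range(1, 42))
print(check_line(sample))       # True
print(check_line("2 1:0 3:1"))  # False: only two of the 41 indices are present
```

Applying check_line to each line of the converted file would catch rows with a missing column or a label that was never mapped to a number.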