Machine Learning Foundations (NTU): Homework 1
2016-01-07 16:15
PLA
DATA: https://d396qusza40orc.cloudfront.net/ntumlone%2Fhw1%2Fhw1_15_train.dat
Each line of the data set contains one (xn, yn) with xn ∈ R^4.
The first 4 numbers of the line contain the components of xn in order; the last number is yn.
Please initialize your algorithm with w = 0 and take sign(0) as −1.
Question 15:
Implement a version of PLA by visiting examples in the naive cycle using the
order of examples in the data set. Run the algorithm on the data set.
What is the number of updates before the algorithm halts?
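The naive cycle in Question 15 can be sketched as follows. This is only an illustration under stated assumptions: the function name `pla_naive_cycle`, the `max_cycles` safety cap, and the four-point toy data set are hypothetical and are not part of the homework data.

```python
import numpy as np


def sign(value):
    # The assignment takes sign(0) as -1.
    return 1 if value > 0 else -1


def pla_naive_cycle(X, y, max_cycles=1000):
    """Visit the examples in data-set order until a full pass makes no mistake."""
    w = np.zeros(X.shape[1])
    updates = 0
    for _ in range(max_cycles):
        mistakes = 0
        for x_n, y_n in zip(X, y):
            if sign(np.dot(w, x_n)) != y_n:
                w = w + y_n * x_n  # standard PLA correction
                updates += 1
                mistakes += 1
        if mistakes == 0:
            break
    return w, updates


# Hypothetical toy data (not the homework file); the first column is x0 = 1.
X = np.array([[1.0, 2.0, 1.0],
              [1.0, 1.0, 3.0],
              [1.0, -1.0, -2.0],
              [1.0, -2.0, -1.0]])
y = np.array([1, 1, -1, -1])

w, updates = pla_naive_cycle(X, y)
print(updates, w)
```

To run it on the homework file, X would be the four feature columns with a column of ones prepended, and y the last column.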
Question 16:
Implement a version of PLA by visiting examples in fixed, pre-determined random
cycles throughout the algorithm. Run the algorithm on the data set. Please repeat
your experiment for 2000 times, each with a different random seed. What is the average
number of updates before the algorithm halts?
Question 17:
Implement a version of PLA by visiting examples in fixed, pre-determined random cycles
throughout the algorithm, while changing the update rule to be:
w_{t+1} = w_t + alpha * y_{n(t)} * x_{n(t)}
with alpha = 0.5. Note that your PLA in the previous Question corresponds to alpha = 1.
Please repeat your experiment for 2000 times, each with a different random seed.
What is the average number of updates before the algorithm halts?
[code]
# Requires Python 3 and NumPy.
import random
import numpy as np

# Uncomment to download the training set once:
# import urllib.request
# url = 'https://d396qusza40orc.cloudfront.net/ntumlone%2Fhw1%2Fhw1_15_train.dat'
# with urllib.request.urlopen(url) as f, open("hw1_15_train.dat", "wb") as code:
#     code.write(f.read())


def load_data(path):
    # Each line holds 4 features followed by the label; prepend x0 = 1 for the bias term.
    data = np.loadtxt(path)
    xn = np.hstack((np.ones((len(data), 1)), data[:, :4]))
    yn = data[:, 4].astype(int)
    return xn, yn


def sign(value):
    # The assignment takes sign(0) as -1.
    return 1 if value > 0 else -1


def train_PLA(xn, yn, learn_rate=0.5, max_cycles=1000):
    # learn_rate=0.5 answers Question 17; pass learn_rate=1 for Question 16.
    wn = np.zeros(5)
    cnt = 0
    # Draw one random visiting order and reuse it in every cycle.
    idx = random.sample(range(len(yn)), len(yn))
    for _ in range(max_cycles):
        is_stop = True
        for i in idx:
            if sign(np.dot(wn, xn[i])) != yn[i]:
                wn = wn + learn_rate * yn[i] * xn[i]
                is_stop = False
                cnt += 1
        if is_stop:
            break
    return cnt


xn, yn = load_data("hw1_15_train.dat")
updates = 0
for i in range(2000):
    updates = updates + train_PLA(xn, yn)
    print('random :', i)
print('Average updates: ', updates / 2000.)
[/code]
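One property worth checking for Question 17: because w starts at 0, every weight vector produced with a step size alpha > 0 is just alpha times the vector produced with alpha = 1, so the signs, the mistakes, and the update count coincide whenever the visiting order is the same. The sketch below illustrates this on synthetic data. The function name `pla_fixed_random_cycle`, the seeds, and the data are assumptions made for the example and are not taken from the homework.

```python
import random
import numpy as np


def sign(value):
    # The assignment takes sign(0) as -1.
    return 1 if value > 0 else -1


def pla_fixed_random_cycle(X, y, alpha, seed, max_cycles=10000):
    """PLA with one pre-determined random visiting order, reused in every cycle."""
    order = list(range(len(y)))
    random.Random(seed).shuffle(order)
    w = np.zeros(X.shape[1])
    updates = 0
    for _ in range(max_cycles):
        clean_pass = True
        for n in order:
            if sign(np.dot(w, X[n])) != y[n]:
                w = w + alpha * y[n] * X[n]
                updates += 1
                clean_pass = False
        if clean_pass:
            break
    return updates


# Synthetic, linearly separable data (hypothetical, not the homework file).
rng = np.random.default_rng(0)
features = rng.integers(-5, 6, size=(40, 4)).astype(float)
X = np.hstack((np.ones((40, 1)), features))
w_true = np.array([0.5, 1.0, -2.0, 3.0, 1.0])
y = np.where(X @ w_true > 0, 1, -1)

counts_half = [pla_fixed_random_cycle(X, y, 0.5, seed) for seed in range(5)]
counts_one = [pla_fixed_random_cycle(X, y, 1.0, seed) for seed in range(5)]
print(counts_half, counts_one)
```

Under these assumptions, any difference between the averages for Questions 16 and 17 would come from the random seeds rather than from alpha.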
Pocket PLA
Train DATA: https://d396qusza40orc.cloudfront.net/ntumlone%2Fhw1%2Fhw1_18_train.dat
Test DATA: https://d396qusza40orc.cloudfront.net/ntumlone%2Fhw1%2Fhw1_18_test.dat
Question 18:
Use the train data as the training data set D, and the test data as the test set for “verifying” the g returned by your algorithm (see lecture 4 about verifying).
The sets are of the same format as the previous one.
Run the pocket algorithm with a total of 50 updates on D, and verify the performance of w using the test set.
Please repeat your experiment for 2000 times, each with a different random seed.
What is the average error rate on the test set?
Question 19:
Modify your algorithm in Question 18 to return w50 (the PLA vector after 50 updates) instead of
Wg (the pocket vector) after 50 updates. Run the modified algorithm on D, and verify the performance
using the test set. Please repeat your experiment for 2000 times, each with a different random seed.
What is the average error rate on the test set?
Question 20:
Modify your algorithm in Question 18 to run for 100 updates instead of 50, and verify the performance
of wPOCKET using the test set. Please repeat your experiment for 2000 times, each with a different random seed.
What is the average error rate on the test set?
[code]
# Requires Python 3 and NumPy.
__author__ = 'zgf'
import random
import urllib.request
import numpy as np

# Download the training and test sets next to the script.
for name in ("hw1_18_train.dat", "hw1_18_test.dat"):
    url = 'https://d396qusza40orc.cloudfront.net/ntumlone%2Fhw1%2F' + name
    with urllib.request.urlopen(url) as f, open(name, "wb") as code:
        code.write(f.read())


def load_data(path):
    # Each line holds 4 features followed by the label; prepend x0 = 1 for the bias term.
    data = np.loadtxt(path)
    xn = np.hstack((np.ones((len(data), 1)), data[:, :4]))
    yn = data[:, 4].astype(int)
    return xn, yn


def test_PLA(wn, xn, yn):
    # Fraction of misclassified examples, taking sign(0) as -1.
    predictions = np.where(xn.dot(wn) > 0, 1, -1)
    return np.mean(predictions != yn)


def train_PLA(xn, yn, updates=100):
    # updates=100 answers Question 20; pass updates=50 for Question 18.
    wn = np.zeros(5)
    wg = wn.copy()
    # The pocket vector is chosen by its error on the training set D, not the test set.
    last_error = test_PLA(wg, xn, yn)
    for _ in range(updates):
        # Visit the examples in a fresh random order and update on the first mistake.
        for i in random.sample(range(len(yn)), len(yn)):
            if (1 if np.dot(wn, xn[i]) > 0 else -1) != yn[i]:
                wn = wn + yn[i] * xn[i]
                new_error = test_PLA(wn, xn, yn)
                if last_error > new_error:
                    last_error = new_error
                    wg = wn.copy()
                break
    return updates, wg


train_x, train_y = load_data("hw1_18_train.dat")
test_x, test_y = load_data("hw1_18_test.dat")
error_rate = 0
for i in range(2000):
    wg = train_PLA(train_x, train_y)[1]
    error = test_PLA(wg, test_x, test_y)
    error_rate += error
    print('random :', i, ' error_rate: ', error)
print('Average error_rate: ', error_rate / 2000.)
[/code]
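Question 19 asks for the plain PLA vector after 50 updates instead of the pocket vector. A minimal sketch that returns both, so the two can be compared, might look like this. The function names `pocket` and `error_rate`, the seeds, and the noisy synthetic data are assumptions for illustration only, and the errors printed here are measured on the synthetic training data rather than on the homework test set.

```python
import random
import numpy as np


def error_rate(w, X, y):
    # Fraction of misclassified examples, taking sign(0) as -1.
    predictions = np.where(X @ w > 0, 1, -1)
    return float(np.mean(predictions != y))


def pocket(X, y, updates, seed):
    """Run PLA for a fixed number of updates; return (last PLA vector, pocket vector)."""
    rng = random.Random(seed)
    w = np.zeros(X.shape[1])
    w_pocket = w.copy()
    best = error_rate(w_pocket, X, y)
    for _ in range(updates):
        order = list(range(len(y)))
        rng.shuffle(order)  # look for a mistake in random order
        for n in order:
            prediction = 1 if np.dot(w, X[n]) > 0 else -1
            if prediction != y[n]:
                w = w + y[n] * X[n]
                err = error_rate(w, X, y)
                if err < best:  # keep the best vector seen so far
                    best, w_pocket = err, w.copy()
                break
    return w, w_pocket


# Synthetic data with about 10% flipped labels (hypothetical, not the homework file).
rng = np.random.default_rng(1)
X = np.hstack((np.ones((200, 1)), rng.normal(size=(200, 4))))
y = np.where(X @ np.array([0.2, 1.0, -1.0, 0.5, 2.0]) > 0, 1, -1)
flip = rng.random(200) < 0.1
y[flip] = -y[flip]

w_50, w_pocket = pocket(X, y, updates=50, seed=0)
print(error_rate(w_50, X, y), error_rate(w_pocket, X, y))
```

On the training data the pocket vector can never do worse than the last PLA vector, since it is the best of all iterates. On a separate test set that ordering is typical but not guaranteed.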