
Weka up-sampling & down-sampling

2015-06-24 20:14
up-sampling:

SMOTE algorithm: the minority class is over-sampled by creating "synthetic" examples rather than by over-sampling with replacement.

Weka supervised SMOTE filter
Two parameters:
[list]
[*]nearestNeighbors: how many nearest-neighbor instances (surrounding the currently considered instance) are used to build an in-between synthetic instance. The default is 5.
[*]percentage: how many synthetic instances are created, as a percentage of the size of the class with fewer instances. The default is 100. For example, if the minority class has 25 instances, 25 new instances are synthesized from the nearest neighbors, and the minority class then has 50 instances.
[/list]
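
The arithmetic behind these two parameters can be sketched in plain Java. This is a minimal illustration and not Weka's implementation: the class name SmoteSketch and both method names are hypothetical, only numeric attributes are handled, and rounding the instance count is an assumption made for this sketch.

```java
import java.util.Random;

// Minimal sketch of the SMOTE arithmetic for numeric attributes.
// Hypothetical helper class; this is not the Weka SMOTE filter.
class SmoteSketch {

    // Number of synthetic instances created for a minority class of the
    // given size. Rounding to the nearest integer is an assumption here.
    static int syntheticCount(int minoritySize, double percentage) {
        return (int) Math.round(minoritySize * percentage / 100.0);
    }

    // One synthetic instance on the line segment between an instance x and
    // one of its nearest neighbors: x + gap * (neighbor - x), gap in [0, 1).
    static double[] interpolate(double[] x, double[] neighbor, double gap) {
        double[] synthetic = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            synthetic[i] = x[i] + gap * (neighbor[i] - x[i]);
        }
        return synthetic;
    }

    public static void main(String[] args) {
        // The example from the text: 25 minority instances, percentage = 100.
        int created = syntheticCount(25, 100.0);
        System.out.println("synthetic instances: " + created);
        System.out.println("minority size afterwards: " + (25 + created));

        // A random gap places the new point somewhere between the two inputs.
        Random rng = new Random(42);
        double[] s = interpolate(new double[] {1.0, 2.0},
                                 new double[] {3.0, 6.0}, rng.nextDouble());
        System.out.println(s[0] + ", " + s[1]);
    }
}
```

In SMOTE the neighbor is picked at random from the nearestNeighbors closest minority instances, so a larger value spreads the synthetic points over a wider region.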

down-sampling:

The majority class is under-sampled by randomly removing samples from the majority class population until the minority class becomes some specified percentage of the majority class.
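
As a rough illustration of this idea, the following plain-Java sketch keeps a random subset of the majority class so that the minority class ends up at a chosen percentage of it. The class RandomUnderSampler, its method and the seed parameter are hypothetical names for this sketch, not part of Weka.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper: random under-sampling of the majority class.
class RandomUnderSampler {

    // Returns a random subset of `majority` whose size makes the minority
    // class (of size minoritySize) about `minorityPercent` percent of it.
    // If the majority class is already small enough, nothing is removed.
    static <T> List<T> underSample(List<T> majority, int minoritySize,
                                   double minorityPercent, long seed) {
        if (minorityPercent <= 0) {
            throw new IllegalArgumentException("minorityPercent must be > 0");
        }
        int target = (int) Math.floor(minoritySize * 100.0 / minorityPercent);
        int keep = Math.min(majority.size(), target);

        // Shuffle a copy so the caller's list is left untouched,
        // then keep the first `keep` items.
        List<T> shuffled = new ArrayList<>(majority);
        Collections.shuffle(shuffled, new Random(seed));
        return new ArrayList<>(shuffled.subList(0, keep));
    }
}
```

With 100 majority instances and 25 minority instances, asking for the minority to be 50 percent of the majority keeps 50 majority instances, and asking for 100 percent keeps 25.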

Weka supervised SpreadSubsample filter
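maxCount: can be set to n, the number of instances in the minority class.
If maxCount < n: both the positive and the negative class are reduced to maxCount instances.
If maxCount > n: the minority class keeps its n instances, and the majority class is reduced to maxCount instances.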
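
The effect of maxCount on the class sizes can be sketched as a simple cap per class. This is a plain-Java illustration under the assumption that maxCount is positive; the class MaxCountSketch and its method are hypothetical and do not reproduce the filter's random selection of which instances are kept.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: class sizes after capping every class at maxCount.
class MaxCountSketch {

    // Each class keeps min(its current count, maxCount) instances.
    static Map<String, Integer> capCounts(Map<String, Integer> classCounts,
                                          int maxCount) {
        Map<String, Integer> result = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : classCounts.entrySet()) {
            result.put(e.getKey(), Math.min(e.getValue(), maxCount));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("minority", 25);
        counts.put("majority", 100);
        // maxCount equal to the minority size balances the two classes.
        System.out.println(capCounts(counts, 25));
    }
}
```

Setting maxCount to the minority size n is the balanced case: the minority class is untouched and the majority class is cut down to the same size.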



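import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.supervised.instance.SpreadSubsample;

Instances train = DataSource.read(path);
train.setClassIndex(train.numAttributes() - 1); // the class attribute is the last one
SpreadSubsample sps = new SpreadSubsample();
sps.setMaxCount(n); // n = number of instances in the minority class
sps.setInputFormat(train); // must be called before the filter is applied
Instances ins = Filter.useFilter(train, sps); // useFilter is a static method of Filter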