
only integer scalar arrays can be converted to a scalar index

2018-03-08 13:59

I ran into the error in the title while using StratifiedShuffleSplit for cross-validation.
Here is how I found and fixed the problem.

from sklearn.model_selection import StratifiedShuffleSplit

sss = StratifiedShuffleSplit(n_splits=10, test_size=0.3, train_size=0.7, random_state=42)
for train_index, test_index in sss.split(features, labels):
    X_train, X_test = features[train_index], features[test_index]  # feature rows for each split
    y_train, y_test = labels[train_index], labels[test_index]  # corresponding labels

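The usage above follows the StratifiedShuffleSplit documentation. When I ran it, though, the code failed with an encoding error, UnicodeDecodeError: 'gbk' codec can't decode bytes in position 69-70: illegal multibyte sequence. After some digging, I traced the failure to this line: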

y_train, y_test = labels[train_index], labels[test_index]  # corresponding labels

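To track it down, I printed labels[train_index] just before that line, and it raised the error in the title: TypeError: only integer scalar arrays can be converted to a scalar index. So the encoding error really came down to an array-indexing problem.
Changing it to np.array(labels)[train_index] printed successfully.
The root cause: my labels (train_y in my code) is a plain Python list whose elements are arrays. A list can only be indexed with a single integer or a slice, not with an array of indices such as train_index. np.array(labels) turns it into a NumPy array, which does support indexing with an integer array.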
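To see the failure in isolation, here is a minimal sketch. The names labels and train_index mirror the post, but the values are made up for illustration: a short list of one-element arrays and a hand-written index array of the kind split() yields.

```python
import numpy as np

# Hypothetical stand-in for `labels`: a plain Python list whose elements are arrays
labels = [np.array([0]), np.array([1]), np.array([0]), np.array([1])]
train_index = np.array([0, 2])  # an integer index array, like those from sss.split()

error_message = None
try:
    labels[train_index]  # a list accepts only an int or a slice as its index
except TypeError as exc:
    error_message = str(exc)

# Fix 1: convert to an ndarray, which supports indexing with an integer array
y_train = np.array(labels)[train_index]

# Fix 2 (alternative): pick the list items one by one, without building an array
y_train_list = [labels[i] for i in train_index]
```

The second variant keeps the original list elements untouched, which may be handy if the per-sample arrays have different lengths and cannot be stacked into one regular ndarray.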
After the final fix, the code runs correctly:

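import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

sss = StratifiedShuffleSplit(n_splits=10, test_size=0.3, train_size=0.7, random_state=42)

for train_index, test_index in sss.split(features, labels):
    print("TRAIN:", train_index, "TEST:", test_index)  # the index arrays for this split
    X_train, X_test = features[train_index], features[test_index]  # feature rows for each split
    print("labels[train_index]", np.array(labels)[train_index])
    y_train, y_test = np.array(labels)[train_index], np.array(labels)[test_index]  # corresponding labels

Summary: When the encoding error first appeared, I kept treating it as an encoding problem. I had read the same array earlier without any decoding trouble, so I assumed that splitting it into training and validation sets couldn't cause one either. That cost me a lot of time.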
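When something goes wrong, don't agonize over the surface symptom. Narrow it down section by section and line by line until you find the root cause, and the error becomes easy to fix!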
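To try the corrected loop end to end, here is a minimal self-contained sketch. The synthetic features and labels (20 samples, two balanced classes) are made up for illustration, and it assumes scikit-learn and NumPy are installed. It converts labels once before the loop instead of calling np.array(labels) on every iteration, which has the same effect on the indexing.

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

# Hypothetical synthetic data: 20 samples with 3 features, labels kept as a list
rng = np.random.RandomState(0)
features = rng.rand(20, 3)
labels = [0] * 10 + [1] * 10  # a plain Python list, like the one that triggered the error

labels_arr = np.asarray(labels)  # convert once so integer index arrays work
sss = StratifiedShuffleSplit(n_splits=10, test_size=0.3, random_state=42)

test_ratios = []
for train_index, test_index in sss.split(features, labels_arr):
    X_train, X_test = features[train_index], features[test_index]
    y_train, y_test = labels_arr[train_index], labels_arr[test_index]
    # Fraction of class 1 in this test split
    test_ratios.append(y_test.mean())
```

With two classes of 10 samples each and test_size=0.3, every test split holds 6 samples, 3 from each class, so each entry in test_ratios is 0.5. Keeping the class proportions equal across splits is the point of using the stratified splitter.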
